EPA/620/R-94/020
                               March 1994
Chesapeake Bay Watershed

           Pilot Project
      Project Manager - L. Dorsey Worthy
     Advanced Monitoring Systems Division
            EMSL Las Vegas, NV
    EMAP Research and Assessment Center
Environmental Monitoring and Assessment Program
      Office of Research and Development
      U.S. Environmental Protection Agency
       Research Triangle Park, NC 27711
                                   Printed on Recycled Paper

-------
                                 NOTICE
      The information in this document has been funded wholly or in part by the U.S.
Environmental Protection Agency. It has been subjected to the Agency's peer and
administrative review process. It has been approved for publication as an EPA document.
Mention of trade names or commercial products does not constitute endorsement or
recommendation for use.

-------
 and industrial activities, all potentially affect the condition of the watershed and the
 Bay. The Bay extends 333 km (200 miles) from  its northern extent at Susquehanna
 Flats, Maryland, south to its southern extent between Cape Henry and Cape Charles,
 Virginia. The Bay is fed by 48 major rivers and more than 100 smaller tributaries
 draining 165,800 km2 (64,000 mi2).  Watershed tributaries reach north to
 Cooperstown, New York, south to Norfolk, Virginia, west to the Appalachian and
 Allegheny Mountains, and east into the State of Delaware. Many Bay monitoring
 activities, including non-point source water-quality assessments and process
 modelling, require recent land cover/use inventory data to adequately assess the
 status and trends of this dynamic watershed and Bay.
 OBJECTIVES

      The major objective of the EMAP Chesapeake Bay Watershed Pilot Project
 was the development and testing of methods for producing detailed digital land cover
 and land use data over large geographic areas using commercially available satellite
 imagery.  The land cover/use map generated by this project is intended to be used
 in the CBPO non-point source water quality model and will replace the currently used,
 outdated map. This project was also intended to complement other similar
 remote sensing data products being generated for the Chesapeake Bay area.  These
 include the National Oceanic and Atmospheric Administration (NOAA) Coastal
 Change Analysis Program (C-CAP) Chesapeake Bay Project, and efforts by both the
 states of Maryland and Virginia to map Bay area resources.

      A secondary goal of this project was the evaluation of the applicability of the
 EMAP hexagon sampling frame.  This frame consists of a systematic, uniformly
 distributed grid of continuous 640 km2 hexagonal units covering the United States.
 Smaller 40 km2 Stage Two hexagons are centered within each of these larger
 hexagons.  These smaller hexagons have been recommended as standard sampling
 units for EMAP resource groups.

      Results of this and similar projects have led to the formation of the Multi-
 Resolution Land Characteristics (MRLC) consortium, an interagency cooperative
effort established to pool expertise and defray the costs of producing a satellite-based
digital land cover/use database for the conterminous United States. The MRLC
consists of representatives from the USGS EROS Data Center (EDC), the USGS National
Water Quality Assessment Program (NAWQA), the NOAA Coastal Change Analysis
Program (C-CAP), the EPA North American Landscape Characterization Project (NALC),
and EMAP-LC.  This project will, therefore, serve as a model for large-scale projects
using remote sensing for environmental monitoring.

-------
DATA SOURCES

       Landsat Thematic Mapper (TM) digital multispectral imagery was selected as
the source image data due to its relatively high spatial and spectral resolution.
Spatial resolution refers to the level of spatial detail inherent in an image. Spectral
resolution relates to the widths of wavelength bands and positions of bands in the
electromagnetic spectrum measured by the sensor. Sixteen TM scenes from the
Landsat 5 satellite were used in the project. Figure 1-2 illustrates the positions of the
scenes across the watershed.

      The TM sensor system has the capability to differentiate reflected and emitted
electromagnetic radiation (EMR) in seven discrete wavelength bands or channels.
These capabilities include spectral sensitivity in the visible (three bands), near and
mid-infrared (three bands), and thermal portions of the EMR spectrum.  These band
combinations provide quantitative spectral values which can be used to discriminate
land cover/use types.

-------
     Figure 1-2. Chesapeake Bay Watershed Study
              (Location of Landsat Thematic Mapper Scenes: Paths 14-17, Rows 30-35,
              with TM scene boundaries shown across the watershed)

-------
                                  CHAPTER 2
                           CLASSIFICATION SYSTEM
      The EPA EMAP-LC participated in the development of an interagency
ecological land cover and land use classification system.  Collaborators included EPA
EMAP, US Geological Survey (USGS), US Fish and Wildlife Service - National
Wetlands Inventory (USFWS-NWI), National Oceanic and Atmospheric Administration
(NOAA) - National Marine Fisheries Service (NMFS), University of Delaware, Oak
Ridge National Laboratory, Salisbury State University, and Florida Department of
Natural Resources. The system was developed to be hierarchical, with broad
categories at the basic level, and increasing detail at subsequent levels.
CLASSIFICATION CRITERIA

      Portions of Anderson (1971), Anderson et al. (1976), and Cowardin et al.
(1979) were incorporated into the effort.  These systems were used to take
advantage of their strengths, to avoid duplication of effort, to provide commonality
between the current system and generally accepted systems, and to facilitate
understanding of the final classification product by a variety of users. The following
criteria, modified from Anderson (1971) and Anderson et al. (1976), were included in
the development of this system.

     1.  The minimum level of interpretation accuracy in the identification of land use
        and land cover categories should be 85% correct overall.
     2.  The accuracy of interpretation for the several categories should  be essentially
        equal.
     3.  Repeatable or repetitive results should be obtained from one interpreter to
        another, and from one time of sensing to another.
     4.  The classification system should  be applicable over extensive areas.
     5.  The classification system should  permit vegetation and other types of land
        cover categories to be used  as surrogates  for land use activity.
     6.  The classification system should  be suitable for use with remote sensor data
        obtained at different times of the  year.
     7.  Subcategories (finer detail) obtained from ground surveys or from the use
        of larger scale or enhanced remote sensor data can be used effectively.
     8.  Categories can be and should be aggregated when appropriate.
     9.  Current and future land use data can be  compared.
   10.  Multiple uses of land can be recognized.

-------
       Initially, 19 categories were selected, including three forest categories, four
 wetlands categories, and two agricultural categories.  Inconsistencies and low
 accuracies resulted in the eventual aggregation or elimination of many of these
 original categories, resulting in the final six-category data set.  Table 2-1 shows the
 original categories and the final database categories to which they were merged.
TABLE 2-1. Original and Final Land Cover/Land Use Categories

Level 0      Original Level 1   Original Level 2          Final Database Category
UPLAND       1 DEVELOPED        11 HIGH INTENSITY         11 DVL HIGH INTENSITY
                                12 LOW INTENSITY          12 DVL LOW INTENSITY
             2 CULTIVATED       21 WOODY                  30 WOODY
                                22 HERBACEOUS             20 HERBACEOUS
             3 GRASSLAND        31 HERBACEOUS             20 HERBACEOUS
             4 WOODY            41 DECIDUOUS              30 WOODY
                                42 MIXED                  30 WOODY
                                43 EVERGREEN              30 WOODY
             5 EXPOSED          51 SOIL                   40 EXPOSED
                                52 SAND                   40 EXPOSED
                                53 ROCK                   40 EXPOSED
                                54 EVAPORITE DEPOSITS     40 EXPOSED
             6 SNOW & ICE       61 SNOW & ICE             - NONE -
WETLAND      7 WOODY            71 DECIDUOUS              30 WOODY
                                72 MIXED                  30 WOODY
                                73 EVERGREEN              30 WOODY
             8 HERBACEOUS       81 HERBACEOUS             20 HERBACEOUS
             9 NONVEGETATED     91 NONVEGETATED           40 EXPOSED
WATER AND
SUBMERGED    10 WATER           100 WATER                 60 WATER

      The original intent was to use the USFWS National Wetlands Inventory (NWI)
digital coverages to define wetlands. However, the NWI maps for the Chesapeake
Bay watershed were incomplete.  Therefore, an attempt was made to identify

-------
wetlands using the clustered spectral data.  Early verification efforts showed these
classifications to be inaccurate, and they were merged with identifiable component
categories.  In addition, pasture, grassland,  and cultivated herbaceous categories
were found to be indistinguishable and were merged.  The mixed woody category
accuracy was also unacceptable, and required the merging of all woody categories,
since it could not be merged to evergreen or deciduous categories. The final six
categories represented the greatest thematic detail meeting EMAP-LC Data Quality
Objectives (DQO).
DEFINITIONS

      The remainder of this section provides a description of each final category, or
class, in the classification system for the Chesapeake Bay Pilot study.
The definitions are a combination of the project comments and text from Anderson
(1971) and Anderson et al. (1976): A Land Use and Land Cover Classification
System for Use with Remote Sensor Data.

      Land use is defined here as spatial divisions based on anthropogenic
activity or utilization. Land use may or may not be identifiable from above the earth's
surface, and generally requires a priori knowledge of the feature. Examples include
agricultural orchards and row crops, urban and residential areas, commercial zoning,
and transportation networks.

      Land cover is defined here as the substance existing on and visible and
recognizable from above the earth's surface. This generally requires limited or no a
priori knowledge of the feature for recognition or differentiation.  Examples include
vegetation, exposed or barren land, water, and snow and ice.

      UPLAND DIVISION
      The Upland Division includes all categories other than open water.  It is divided
      into four Level 1  categories:  Developed, Herbaceous, Woody,  and Exposed
      Land.

            10  DEVELOPED
            This category is composed of areas  of anthropogenic use, with much of
            the land covered by structures and other artificial and impervious
            surfaces.  This category does not include cultivated or other agricultural
            land.
            Included in this category are cities; towns; villages; strip
            developments along highways; transportation, power, and
            communications facilities; and areas such as those occupied by
            mills, shopping centers, industrial and commercial complexes,
                                       8

-------
and institutions that may, in some instances, be isolated from
urban areas (Anderson et al., 1976).
Developed Lands are divided into two Level 2 groups: 1.1 - High
Intensity (Solid Cover) and 1.2 - Low Intensity (Mixed Cover).

      11  HIGH INTENSITY DEVELOPED
      This class refers to built-up urban areas, and it contains areas
      primarily composed of a solid cover of human-made materials.
      They contain few mixed (human-made materials and vegetation)
      areas.  They may have a variety of land uses. A significant
       portion of the surface is covered by concrete, asphalt, and other
       artificial materials, and these areas contain little vegetation. Examples are
      apartments, large buildings, shopping centers, factories, and
      industrial areas.  This class often occurs in city centers.  Some
      major highway systems are also included in this category.

      12  LOW INTENSITY DEVELOPED
      The Low Intensity class refers to areas that contain a mixture of
      human-made materials and other land cover/use resources. They
      are typically single family housing areas, and  are often called
      suburban or residential. The category also contains roadways
      where pixels consist of a mix of highway materials and other
      resources.  As the intensity of the human-made materials
      decreases, this category grades into 2.0-Herbaceous,  3.0-Woody,
      and other appropriate categories.

20  HERBACEOUS
The Herbaceous category includes lands covered by either natural or
managed  herbaceous cover,  including agricultural row crops and
pasture.  The Herbaceous class  is defined as land where the potential
natural vegetation is predominantly grasses, grass-like plants, and forbs.
Also included in this class are lawns and other landscaped grassy areas
such as parks, cemeteries, golf courses, and road and highway rights-of-way.

30  WOODY
The class Woody  refers to land covered by shrubs or trees. This
includes any species that has an  aerial stem which persists for more
than one season,  and in most cases a cambium layer for periodic
growth in diameter (Harlow and Harrar, 1969).  The Woody category
includes deciduous, evergreen, and mixed trees,  and shrub-scrub
vegetation.

-------
      40  EXPOSED
      Exposed land includes naturally occurring areas that have limited ability
      to support plant life, or have been burned, cleared, or disturbed. In
      general, these areas are covered with soil, sand, or rocks.  Vegetation,
      if present, is widely spaced. Naturally barren areas contain less than
      one-third vegetation or other cover.  The exposed areas may be
      transitional or developed.  These areas include lands cleared for a
      variety of purposes, e.g., construction of new buildings, quarries,
      landfills, gravel pits, strip mines, etc.

WATER AND SUBMERGED LAND DIVISION
The Water and  Submerged Land Division consists of areas of open water and
land areas covered by water.  It has one Level 1 category: Water.

      60  WATER
      Water is defined as areas that contain standing shallow and deep water,
      either natural or human-made.  Water habitats include environments
      where surface water is permanent and often  deep, so that water, rather
      than air, is the principal medium within which the dominant organisms
      live. Human-made areas of water would include reservoirs,
      impoundments, dikes, ponds, and canals.
                                10

-------
                                 CHAPTER 1
                               INTRODUCTION
      The Chesapeake Bay Watershed Pilot Project was initiated to develop and test
methods for generating digital land cover and land use products from satellite-based
remotely sensed imagery. These methods and the resulting data products were
intended to fulfill specific program requirements of the Landscape Characterization
component of the Environmental Monitoring and Assessment Program (EMAP-LC)
and the Chesapeake Bay Program Office (CBPO)  of the U.S. Environmental
Protection Agency (EPA). This report presents a standardized methodology for
digitally classifying remote sensing data over relatively large and  diverse geographic
areas.

      This report describes the methodology used to produce the land cover/use
map of the Chesapeake Bay watershed.  Chapter  2 discusses the development of the
classification scheme and gives a general description of the classes.  A technical
description of all processing and quality assurance/quality control methods, including
spatial and thematic accuracy assessments, is presented in Chapter 3.  Chapter 4
presents results and summary statistics, and discusses difficulties, accomplishments,
and recommendations for future research. Conclusions are presented in Chapter 5.
The Chapters are followed by References, Acknowledgements, and Appendices.
BACKGROUND

      EMAP is an innovative research, monitoring, and assessment effort designed
to report on the condition of the nation's ecological resources, including surface
waters, agroecosystems, arid ecosystems, forests, and estuaries.  This information,
when combined with data from other monitoring programs, will provide a
comprehensive view of the effectiveness of national environmental policies.

      EMAP-LC is responsible for providing the geographic element of "condition,"
using consistent national methods now under development and testing. These
methods are intended to provide a comprehensive, consistent, and statistically valid
nationwide land cover/use product to assist in the overall EMAP effort. The use of
satellite imagery to derive this coverage is viewed as the most cost effective and
achievable approach.

      The Chesapeake Bay Program  Office has environmental stewardship over the
nation's largest estuarine complex, a more than 165,800 km2 (64,000  mi2) watershed
(Figure 1-1). A burgeoning population, combined with extensive forests, agricultural,
                                      1

-------
Figure 1-1  Location map showing the Chesapeake Bay Watershed (labeled features
            include New York, Washington DC, the Chesapeake Bay, and Norfolk).

-------
                                CONTENTS

 LIST OF FIGURES ..........................................................  iv

 LIST OF TABLES ...........................................................  iv

 CHAPTERS

       1. INTRODUCTION ....................................................   1
            BACKGROUND ....................................................   1
            OBJECTIVES ....................................................   3
            DATA SOURCES ..................................................   4

       2. CLASSIFICATION SYSTEM ...........................................   6
            CLASSIFICATION CRITERIA .......................................   6
            DEFINITIONS ...................................................   8

       3. METHODOLOGY .....................................................  11
            QA/QC PROCEDURES ..............................................  11
            TM BAND SELECTION .............................................  15
            CLASSIFICATION TECHNIQUES .....................................  17
            THEMATIC ACCURACY ASSESSMENT ..................................  25
            FINAL LAND COVER/LAND USE GENERATION ..........................  34
            SUMMARY .......................................................  34

       4. RESULTS AND DISCUSSION ..........................................  37
            CLASSIFICATION SYSTEM .........................................  37
            CLASSIFICATION METHODOLOGY ....................................  38

       5. CONCLUSIONS .....................................................  42

 REFERENCES ...............................................................  43

 ACKNOWLEDGMENTS ..........................................................  45

 APPENDIX A: INSTRUCTIONS AND TRACKING FORMS ..............................  A1

-------
                               LIST OF FIGURES


 Figure 1-1   Location map showing the Chesapeake Bay Watershed	  2

 Figure 1-2   Map showing  the distribution and  overlap of Landsat Thematic
             Mapper Scenes across the Chesapeake Bay Watershed	  5

 Figure 3-1   Flow chart illustrating the major phases in the Chesapeake Bay
             Watershed Pilot Project	  12

 Figure 3-2   Flow chart outlining the steps in the image classification analysis.   .  19

 Figure 3-3   Distribution  of  the 291  EMAP  40  km2 Hexagons  within the
             Chesapeake Bay Watershed	  30

 Figure 3-4   Final classification of the Chesapeake Bay Watershed	  35

 Figure 4-1   Landsat  Thematic Mapper  color  composite  image   of  the
             Shenandoah Mountains, VA.  Thematic Mapper bands 4,  5, and 3
             are assigned to red, green and blue, respectively. The blue areas
             along the ridges in the central portion of the image indicate areas of
              gypsy moth defoliation.	  40



                               LIST OF TABLES


Table 2-1     Original and Final  Land Cover/Land  Use Categories  	  7

Table 3-1     Reference Material Used To Aid The Classification Process	  22

Table 3-2    An Example of an  Error or Confusion Matrix 	  26

Table 3-3    Total Sample Points Evaluated by Level 2 Category	  28

Table 3-4     Land Cover/Use Statistics and Correlation Coefficients for Overall
            Chesapeake Bay Watershed, Sample Point Photo Coverage, and
            EMAP Stage 2 Hexagon Coverage	  31

Table 3-5    Final Thematic Accuracies for the  Chesapeake  Bay Watershed
            Categorized Data Set	   33
                                      IV

-------
                                  CHAPTER 3
                               METHODOLOGY
      The development of an overall methodology for executing a large area TM land
cover/use classification was a primary goal of this project. A major effort went into
the development and documentation of the techniques, taking into consideration the
large number of TM scenes to be used and requirements of the various users within
EPA and EPA-EMAP. The procedures and documentation were intended to facilitate
efforts in similar large area or small-scale projects.

      This section describes analytical procedures used in processing the Landsat
TM data to produce land cover/use digital maps of the Chesapeake Bay watershed.
This section is divided into five major topics: Quality Assurance and Quality Control;
TM Band Selection; Classification Techniques; Accuracy Assessment; and Final Land
Cover/Use Generation.   Each section describes issues which relate to the major
topic, followed by a discussion of the decision processes used to select the final
methodology.  Alternative approaches are discussed as they relate to methodology
decisions and do not represent an exhaustive discussion of all available techniques.

      The technical background of the methodology, including quality
assurance/quality control procedures, is presented here.  Detailed steps in data
processing, including all tracking forms, are contained in Appendix A. A diagram of
the image classification methodology is shown in Figure 3-1 and is discussed in this
section of the report. A step-by-step instruction guide for data handling and
processing is also found in Appendix A.
QUALITY ASSURANCE AND QUALITY CONTROL PROCEDURES

      A standard definition of the QA/QC terminology was adopted to avoid
confusion.  Taylor (1987) defines QA and QC as follows:

      Quality Assurance: A system of activities whose purpose is to provide the
      producer or user of a product or a service the assurance that it meets defined
      standards of quality with stated level of confidence.

      Quality Control: The overall system of activities whose purpose is to control
      the quality of a product or service so that it meets the needs of users.  The
      aim is to provide quality that is satisfactory, dependable, and economical.
                                      11

-------
Figure 3-1  Flow chart illustrating the major phases in the Chesapeake Bay
            Watershed Pilot Project (Receive Raw TM -> Clip to Watershed ->
            Subset & Reduce Bands -> Classify -> Build Single Coverage ->
            "Smooth" to Minimum Map Unit -> Build Final Landscape Coverage
            as an ARC/INFO GRID).


                                 12

-------
       Production of all elements of the project, from receiving the data through
evaluation of the final data classification, followed standardized and documented
QA/QC procedures.  The strategies and procedures encompassing management,
personnel, problem areas, corrective actions, and products are discussed. The six
major topics covered are:  User Instructions and Training; Tracking  Forms; Receiving
TM Data; Spatial Accuracy Tests; and Combining and Edge Matching Classified TM
Data.
User Instruction and Training

      To complete a project of this magnitude, a trained staff and consistent
procedures and analysis techniques were required. Standard Operating Procedures
(SOP's) ensured that all TM scenes within the watershed were processed alike.
Examples of these SOP's, along with the tracking forms, are found in Appendix A.

      SOP's contained detailed, step-by-step procedures to help the analyst classify
TM data from the project's beginning to end. The instructions included computer
commands,  data handling operations, file-naming conventions, file storage and
retrieval, input parameters for statistical and analytical programs, and tracking forms.
The four instruction guides in Appendix A are: Subset TM Scenes Instructions;
Classification Instructions; Subset  Editing Instructions; and Combining Subsets into
the Master-Raster. These instruction guides ensure that all stages of data processing
were carried out consistently for all data sets and by all  analysts.

      Project analysts were knowledgeable in remote sensing techniques. They also
had a working knowledge of the hardware and software  used in the project,  including
SUN and Silicon Graphics workstations, and UNIX, ERDAS, and Arc/Info software.
Each  analyst was familiar with the SOP's specified for the project.
Tracking Forms

      Among the most important QA/QC elements developed for this project were
the data tracking forms.  Examples are included in Appendix A.  The primary function
of the tracking forms was to allow all steps to be reproduced, errors to be traced,
and the results to be compared between different data subsets.  Tracking forms were
used for each of the following steps in processing: receiving TM data; visual
inspection of TM data; spatial accuracy assessment; clipping TM scenes to
watershed; subsetting TM scenes; classification of subsets; cluster labeling;
combining files into the master-raster; editing subsets; and combining subsets into the
entire file.

      The tracking forms provided a mechanism so that errors could be traced to
                                      13

-------
 specific files, parameters, or procedures. The source of error could then be identified
 and corrected by repeating the processing steps and modifying the analysis.  It was
 also instructive for the analysts to be able to compare results on tracking forms from
 different data sets.  This comparison led to a better understanding of the procedures
 and how the results compared across different ecological regions.


 Receiving TM  Data

      A QA/QC check was performed to evaluate the spatial, radiometric, and
 general image quality of each Landsat TM scene as it was received. TM scenes
 were accompanied by a description sheet and header file associated with the digital
 TM data from EOSAT Corporation, the commercial landsat vendor. This header
 information contained unique parameters associated with each TM scene.
 Parameters were recorded for the purposes of documentation and later use in the
 project.

      After the initial information from a TM scene was recorded, the quality of the
 product was checked visually. The scenes were viewed to note any clouds and cloud
 shadows, data dropouts, striping, or other irregularities in the image. This information
 was also recorded  on the tracking form (Appendix A).  If the percentage of cloud
 cover was greater than  10%, the scene was rejected and returned to EOSAT for a
 replacement.
Spatial Accuracy Tests

      An accuracy test was performed to check the spatial fidelity of the
georeferenced TM data. The TM sensor collects data at a nominal spatial resolution
of 28.8 m by 28.8 m. Data for this project were resampled by EOSAT to 25 m.  As
part of the reporting procedure EOSAT provided documentation with each TM scene
which listed the Root Mean Square (RMS) error in point and  linear coordinates, along
with all points used or rejected in the resampling process.

      A documentation check and a data validation process  were used to confirm
that the spatial accuracy of the Landsat TM  data met the error specifications of
±15m RMS. The documentation check began with a review  of the QA/QC
procedures and the RMS error reported by EOSAT for the geocoded scenes. The
project staff checked the reported RMS and the spatial accuracy of the TM
scenes by using standard georeferencing methods. Points on the TM image
were compared with the identical points on USGS topographic maps.  Four 7.5-
minute USGS quadrangle maps for each TM scene were randomly selected for
validation; eight points that were identifiable in the  image and on each map
were digitized, and the variation between image and map coordinates was determined.
                                      14

-------
       The selection process for the 7.5-minute quadrangles was based on a random
 selection from a grid of 64 maps covering each TM scene.  Each of these quadrangle
 areas was identified by an alphanumeric code related to the latitude/longitude of the
 southeast corner of the grid of 64 maps. Factors considered before the random
 selection included the  date of each 7.5-minute quadrangle map and availability of
 identifiable points in the image and the map. There were a few 7.5-minute
 quadrangles that had not been updated in the 1980's, and the dates on these maps
 ranged from the 1950's to the 1970's.  In some of the rural and mountainous areas the
 availability of identifiable points posed a problem.

       If the randomly selected 7.5-minute quadrangle did not contain a sufficient
 number of identifiable ground control points, then an adjacent 7.5-minute quadrangle
 was selected.  The  new quadrangle was chosen horizontal to the original quadrangle.
 If there were insufficient identifiable points on an adjacent horizontal quadrangle, then
 an adjacent vertical quadrangle was selected.  If this failed to produce a sufficient
 number of identifiable points a diagonal quadrangle was chosen.  This process was
 expanded and repeated until an adequate combination of map and image points
 could be identified.  TM image coordinates were identified for each identifiable map
 point, and the  information recorded on the Spatial Accuracy Assessment Tracking
 Form.

      The map was then positioned on a digitizing table and test points selected to
 determine the  accuracy of the map setup. Again this information was recorded on
 the tracking form. Map setup errors were generally found to be less than 7 m.
 Those areas with higher errors usually corresponded to older maps.

      A difference (±)  between the image and map coordinates was determined for
 each  point identified in  a TM scene.  Also for each scene a mean difference was
 determined in the x  and y directions. A graph was plotted for each TM scene
 indicating the difference between the image and map coordinates.  This information
was recorded in the project tracking  book.  The Spatial Accuracy Test indicated that
 EOSAT had met the required contract specifications. Overall, the spatial accuracy of
the TM scenes was  within ± 15m.
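
      As a rough illustration of this check, the sketch below computes the per-axis
mean offsets and the overall RMS error from a set of digitized image/map point pairs.
The function name and coordinate values are hypothetical; the project work was done
with the digitizing and image processing systems described above rather than a script.

    import numpy as np

    def spatial_accuracy(image_xy, map_xy):
        """Compare image coordinates with digitized map coordinates (both in
        UTM metres); report per-axis mean offsets and overall RMS error."""
        image_xy = np.asarray(image_xy, dtype=float)
        map_xy = np.asarray(map_xy, dtype=float)
        diff = image_xy - map_xy                       # (n, 2) signed differences
        mean_dx, mean_dy = diff.mean(axis=0)           # mean offset in x and y
        rms = np.sqrt((diff ** 2).sum(axis=1).mean())  # root-mean-square point error
        return mean_dx, mean_dy, rms

    # Hypothetical check points for one scene (metres):
    img = [(361525.0, 4350075.0), (402850.0, 4311200.0)]
    ref = [(361517.0, 4350081.0), (402861.0, 4311195.0)]
    print(spatial_accuracy(img, ref))   # accept the scene if RMS <= 15 m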
TM BAND SELECTION

      Multidimensional data sets, such as TM data, contain redundant spectral
information.  This redundancy results from the fact that spectral reflective properties
of land features in different parts of the electromagnetic spectrum are similar.
Employing more than four TM bands for spectral feature extraction does not
necessarily increase the clustering capability of computer-based classifications (Latty
and Hoffer, 1981; Stenback and Congalton, 1990). Spectral clustering for TM land
cover/use mapping typically utilizes band combinations which include at least one
                                      15

-------
 band from the visible (0.4 - 0.7 µm), near infrared (0.7 - 1.3 µm), and middle infrared
 (1.3 - 3.0 µm) spectral regions (Nelson et al., 1984).  The thermal band (TM band 6),
 due to its lower spatial resolution and feature calibration requirements, is generally
 not used for spectral cluster development.

     In order to determine the best band combination for use in classification
 algorithms, three techniques were investigated: Principal Components Analysis
 (PCA); Analysis of Correlation; and Optimum Index  Factors (OIF). The PCA method
 was investigated, but was not used. Each scene would have had a  different PCA
 transformation, making interpretation of composite images difficult, since  similar
 resources would look different from scene to scene.  In addition,  rare land resources
 may not have been distinguishable until the higher components were analyzed.

      Similarities between bands were measured by analyzing covariance and
 correlation matrices. Bands having high correlation values are considered to contain
 redundant information.  The visible bands (1, 2, and 3) of the Chesapeake TM
 scenes were shown to  be highly  correlated (most were over 90%).  This  is typical for
 TM data because most surface features have similar reflectance  properties in these
 wavelengths.  Bands 5 and 7 were also highly correlated (most over 85%). Band 4
 showed little correlation with the  other bands.
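
      A between-band correlation matrix of the kind described can be computed
directly from an image array.  The following minimal sketch assumes a band-interleaved
NumPy array and uses random values only as stand-in data.

    import numpy as np

    def band_correlation(cube):
        """cube: array of shape (bands, rows, cols) of TM digital numbers.
        Returns the bands x bands correlation matrix."""
        flat = cube.reshape(cube.shape[0], -1).astype(float)
        return np.corrcoef(flat)

    # Hypothetical 6-band (non-thermal) subset:
    cube = np.random.randint(0, 256, size=(6, 512, 512))
    r = band_correlation(cube)
    # Band pairs with |r| above roughly 0.85-0.90 carry largely redundant information.
    print(np.round(r, 2))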

      Within-band variance provides an indication of the amount of information a
 band contains.  A high  variance results from different land-cover types reflecting
 various amounts of energy within that wavelength. It was desirable to use bands with
 high variance because  they are more likely to provide high contrast between land
 features, making it easier to distinguish and segregate land cover/use types.

       Chavez et al. (1982, 1984) described a technique for calculating Optimum
 Index Factor (OIF) values to determine the best band combinations.  The OIF combines
the variance and covariance so that higher values result when within-band variance is
 high, and  between-band covariance is low. Combinations of bands with higher OIF
values are desirable because the amount of information (variance) is high, whereas
the amount of redundancy (covariance) is low. This was the technique selected.
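
      The OIF computation can be sketched as below.  Chavez's index was defined for
three-band triplets (the sum of the bands' standard deviations divided by the sum of
the absolute pairwise correlations); the sketch generalizes it to any combination size
to match the four-band combinations evaluated here.  Array shapes, variable names, and
the random stand-in data are assumptions, not the project's actual procedure.

    import itertools
    import numpy as np

    def oif(cube, combo):
        """Optimum Index Factor for the bands listed in `combo` (0-based indices
        into `cube`, shape (bands, rows, cols)): sum of the bands' standard
        deviations divided by the sum of absolute pairwise correlations."""
        flat = cube.reshape(cube.shape[0], -1).astype(float)
        stds = flat[list(combo)].std(axis=1)
        r = np.corrcoef(flat[list(combo)])
        pair_corr = sum(abs(r[i, j])
                        for i, j in itertools.combinations(range(len(combo)), 2))
        return stds.sum() / pair_corr

    cube = np.random.randint(0, 256, size=(7, 256, 256)).astype(float)
    # Consider four-band combinations, excluding the thermal band (band 6, index 5)
    combos = itertools.combinations([0, 1, 2, 3, 4, 6], 4)
    best = max(combos, key=lambda c: oif(cube, c))
    print("highest-OIF combination (0-based band indices):", best)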

      Two four-band combinations produced the  highest OIF values for the  majority
of the selected TM scenes: bands 1, 4, 5, and 7; and  bands 3, 4, 5,  and  7.  Band 1
 was not included in the final band combination for the analysis because it measures a
 wavelength (0.45 µm - 0.52 µm) that is most affected by atmospheric scattering.
Therefore, bands 3, 4,  5, and 7 were used to derive the spectral clusters.
                                      16

-------
 CLASSIFICATION TECHNIQUES

        Automated spectral classification of remotely sensed digital data is performed
 using statistical analysis. There are two primary methods of automated statistical
 image classification: supervised and unsupervised. The two methods differ in the
 way that groupings of digital values are identified in spectral space. The TM bands
 describe a multi-dimensional data set where the digital values for a pixel describe its
 location in spectral space. Within spectral space, pixels with the same reflective
 properties will group together.  The first step in the classification process is to identify
 statistics that differentiate these groupings.

       In a supervised classification, the identity and location of representative land
 cover/use types are known a priori, through a combination of field work and/or the
 analysis of aerial photography and maps.  These areas of known land cover/use,
 called training sites, are identified on the image.  Multivariate statistical parameters
 (means, standard deviations, covariance matrices, correlation matrices, etc.) are
 calculated for each training site and are used to describe the land-cover categories  in
 spectral space (Jensen, 1986).  Enough training sites must be established so that
 land cover/use classes are identified for all spectral reflective conditions present in
 the scene. This requires identification of training sites for all existing land cover/use
 features under all topographic, hydrological, and physiological conditions which might
 alter the spectral reflectance of the feature.

       In an unsupervised classification, the identities of land-cover types to be
 specified as classes within a scene are not generally known a priori, either because
 ground truth  is not available or surface features within the scene are not well  defined
 (Jensen, 1986). An unsupervised clustering  algorithm examines the spectral values
 of the imagery  and statistically groups similar values into spectral clusters.  A second
 algorithm  then  determines which cluster best represents the combination of spectral
values for each individual pixel and assigns that cluster value to the pixel.  It is then
the responsibility of the analyst to assign a land cover/use label to each cluster using
 available reference materials.

      An  alternative to pure  supervised or unsupervised classification is to use a
 combination of the two.  A test was performed to determine whether a first-pass
supervised classification could eliminate large, easily identified  land features prior to
 an unsupervised classification.  It was found that available reference material only
 allowed for identification of easily recognized, very distinct features.  These features
also were easily differentiated with  an  unsupervised clustering algorithm. It was also
found that a prohibitive number of training  sites were required to adequately cover the
widely  varying spectral signatures of the classes.

      The selection of training sites had to be made using the analysts' skills in
 manual interpretation of the TM data, since available reference materials were
                                        17

-------
 collected at different times of the year.  Analysts not familiar with the study area had
 a difficult time choosing representative training sites. The additional time spent
 choosing training sites was not worth the minimal gain.

       Several alternative unsupervised clustering routines were considered and
 tested. The four criteria used to evaluate the options were: (1) validity of theory, (2)
 coverage of spectral space, (3) separability of resources, and (4) ease of use.  A full
 discussion of all alternative unsupervised computer programs is beyond the scope of
 this report.  After evaluation of available software, none of the routines was
 considered to be entirely adequate. Additional research efforts were made to refine
 an existing clustering routine. Therefore, a two-step clustering technique was
 developed for use in this project.

       The two-step clustering process is illustrated in Figure 3-2.  Cluster statistics
 were gathered within the scene subset. Each pixel was then  assigned to a cluster
 using the allocation program  described below.  The resulting image was examined by
 the analyst and the spectral clusters were assigned to a land  cover/use category.
 There were, inevitably, some clusters  that represented  more than one land cover/use
 type or remained unclassified. These "confusion" clusters were statistically analyzed
 again to break each cluster into several better defined spectral clusters.  Pixels
 belonging to the "confusion" clusters were then assigned to one of these new
 clusters. These refined clusters were examined by the analyst and assigned to a
 land-cover category.
Subset TM Scenes

      A total of 16 Landsat TM scenes were required to cover the Chesapeake Bay
watershed.  Each TM scene included areas of overlap with its neighboring scenes
and some included areas outside the watershed.  Boundary scenes were clipped to
eliminate those portions lying outside the watershed using a Geographic Information
System (GIS) file provided  by the Chesapeake Bay Liaison Office. Visual
interpretation of the quality of the imagery was used to select which TM scene would
be used in areas of overlap.

      Computer processing limitations required that TM scenes be subset to practical
image file sizes. A single TM scene contains approximately 38 million pixels.
However, computer hardware available at EMSL-LV was limited to a maximum
instantaneous display of 1024 X 1024 pixels.  This restriction could be overcome by
displaying an image at a reduced scale.  However, the resulting image would no
longer contain the full resolution of the original data, and would limit the ability to
identify and label the clustered imagery.  Limitations associated with the file server
and backup facilities further restricted the ability to move, copy, analyze,  backup, and
otherwise maintain larger data files.
                                       18

-------
Figure 3-2  Flow chart outlining the steps in the image classification analysis
            (TM subset raw data -> gather cluster statistics -> assign pixels to
            clusters -> label clusters using reference data; confusion clusters ->
            refine cluster statistics -> reassign pixels to clusters -> label
            clusters -> recode pixels; the labeled non-confusion and refined
            clusters form the classified subset, and the classified subsets are
            assembled into classified TM scenes).
                                      19

-------
       In addition to the above, single land cover/use categories may have several
 spectral signatures depending on the characteristics of their physical location, satellite
 viewing angle, soil conditions, slope, aspect, atmospheric effects, etc.  Larger data
 sets would generally contain a greater number of land cover/use types and
 consequently more spectral signatures.  Also, large clusters tend to obscure spectral
 variability which may have otherwise generated seeds for new clusters. Therefore,
 an image subset of a quarter TM scene was chosen as a maximum file size for
 analysis.
Defining Spectral Clusters

       Cluster statistics are the foundation of any unsupervised spectral classification
and must accurately describe spectral information found within the image under
analysis. The clustering programs used for this project were developed by
programmers at EMSL-LV (Weerackoon and Mace, 1990). The algorithm used a
non-overlapping 3x3 moving window as a basis for defining cluster statistics.  It was
desirable to collect the statistics of windows containing one cover type and reject
windows containing more than one cover type.  This was done by analyzing the
variance between pixels within a window.  Windows containing more than  one land
cover/use type should have a higher variance than those representing a single land
cover/use type.  Therefore, windows with lower variance were used to develop cluster
statistics.

      A separate routine was  used to define maximum variance threshold levels for
acceptable windows. This generated a report which showed  the discrete cumulative
distribution functions of the ceiling integer values of the variances for each band.
Analysts identified variance threshold values for each band corresponding  to the 50%
cumulative level. The variance thresholds were then used as input parameters to the
clustering program.   If the variance of a window in any channel was greater than the
user-defined threshold for that channel, the window was rejected. Lower threshold
values resulted in fewer acceptable windows; higher thresholds resulted in more
acceptable windows.
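
      The threshold selection can be illustrated with a short routine that computes
the variance of every non-overlapping 3x3 window in each band and takes the
ceiling-integer variance at the 50% cumulative level.  This is a simplified reading of
the report, not the EMSL-LV program itself; the array layout and names are assumptions.

    import numpy as np

    def variance_thresholds(cube, pct=0.5):
        """For each band, return the smallest ceiling-integer window variance
        whose cumulative frequency over all non-overlapping 3x3 windows
        reaches `pct` (50% by default)."""
        bands, rows, cols = cube.shape
        rows, cols = rows - rows % 3, cols - cols % 3      # trim to a whole 3x3 grid
        thresholds = []
        for b in range(bands):
            blocks = cube[b, :rows, :cols].reshape(rows // 3, 3, cols // 3, 3)
            var = blocks.transpose(0, 2, 1, 3).reshape(-1, 9).var(axis=1)
            values = np.sort(np.ceil(var).astype(int))
            thresholds.append(int(values[int(pct * (len(values) - 1))]))
        return thresholds

    cube = np.random.randint(0, 256, size=(4, 300, 300)).astype(float)
    print(variance_thresholds(cube))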

      The  clustering algorithm used the accepted windows to build spectral clusters.
The first acceptable window became the first spectral cluster.  From then on, each
acceptable window was checked against the previously defined clusters. The closest
cluster measured in Euclidean distance (in spectral space) to the new window was
identified.   If the mean of the new window was further than two standard deviations
from the closest cluster, the window became a new cluster.  If the mean of the new
window was between one and two standard deviations from the closest cluster, it was
rejected. If the mean of the window was within one standard deviation  of the closest
cluster, the statistics of the  window were merged with that of the cluster.  By
analyzing the entire scene and checking all acceptable 3X3 windows, a set of initial
                                       20

-------
 spectral clusters was identified.
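
      A minimal sketch of these seed-clustering rules follows.  It summarizes each
cluster's spread with a single pooled standard deviation, which is an assumption; the
original program maintained fuller per-band statistics, and the window iterator shown
is hypothetical.

    import numpy as np

    def build_clusters(windows, thresholds):
        """windows: iterable of 3x3xB arrays; thresholds: per-band variance
        ceilings.  Applies the accept/merge/new/reject rules described in the
        text."""
        clusters = []   # each: running sum, sum of squares, and pixel count
        for win in windows:
            pixels = win.reshape(-1, win.shape[-1]).astype(float)  # 9 x bands
            if np.any(pixels.var(axis=0) > thresholds):            # reject mixed windows
                continue
            mean = pixels.mean(axis=0)
            if not clusters:
                clusters.append({"sum": pixels.sum(axis=0),
                                 "sumsq": (pixels ** 2).sum(axis=0), "n": 9})
                continue
            # nearest existing cluster in Euclidean spectral distance
            means = [c["sum"] / c["n"] for c in clusters]
            d = [np.linalg.norm(mean - m) for m in means]
            k = int(np.argmin(d))
            c = clusters[k]
            var = np.maximum(c["sumsq"] / c["n"] - (c["sum"] / c["n"]) ** 2, 0.0)
            sd = float(np.sqrt(var.mean())) or 1.0                 # pooled spread
            if d[k] > 2 * sd:                                       # far: new cluster
                clusters.append({"sum": pixels.sum(axis=0),
                                 "sumsq": (pixels ** 2).sum(axis=0), "n": 9})
            elif d[k] <= sd:                                        # close: merge statistics
                c["sum"] += pixels.sum(axis=0)
                c["sumsq"] += (pixels ** 2).sum(axis=0)
                c["n"] += 9
            # between one and two standard deviations: the window is rejected
        return clusters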

       Because of the complexity of the TM scenes there were more initial clusters
 than were practical for an analyst to evaluate. Further statistical analysis was
 performed to reduce the number of spectral clusters.  The user was able to specify
 the number of desired final clusters before running the program.

       The variance threshold values, used to evaluate windows above, were used
 again in this final cluster merging step.  An iterative process beginning with 0.1 times
 the standard deviation (the square root of the threshold variance) for each band was
 used.  All distances between clusters  (in spectral space) were checked.  If any cluster
 distance was found to be within  0.1 times  the standard deviation, the two clusters
 were merged.

    The next iteration compared 0.2 times the standard deviation between the cluster
 distances, and the next iteration used 0.3  times the standard deviation. The iterative
 process of merging clusters stopped at 1.5 times the standard deviation.  If, at any
 time during this  iterative process, the number of clusters was less than or equal to
 the desired number of clusters specified by the user, the program stopped.  The
 remaining clusters became the final set output by the program.  If the number of
 clusters after the iteration using  1.5 times  the standard deviation was still  larger than
 specified, the program output only those clusters built from the greatest number of pixels
 and deleted the remaining clusters.
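
      The iterative merging can be sketched as below.  Interpreting "within n times the
standard deviation" as a comparison against the norm of the per-band standard
deviations (square roots of the threshold variances) is an assumption, and the loop
structure is simplified relative to the original program.

    import numpy as np

    def merge_clusters(means, counts, band_thresholds, desired,
                       steps=np.arange(0.1, 1.6, 0.1)):
        """Iteratively merge clusters whose spectral distance is within
        factor * sigma, stopping once `desired` clusters remain."""
        sigma = np.linalg.norm(np.sqrt(np.asarray(band_thresholds, dtype=float)))
        means = [np.asarray(m, dtype=float) for m in means]
        counts = list(counts)
        for factor in steps:
            merged = True
            while merged and len(means) > desired:
                merged = False
                for i in range(len(means)):
                    for j in range(i + 1, len(means)):
                        if np.linalg.norm(means[i] - means[j]) <= factor * sigma:
                            n = counts[i] + counts[j]
                            means[i] = (means[i] * counts[i] + means[j] * counts[j]) / n
                            counts[i] = n
                            del means[j], counts[j]
                            merged = True
                            break
                    if merged:
                        break
            if len(means) <= desired:
                return means, counts
        # still too many clusters: keep those built from the most pixels
        order = np.argsort(counts)[::-1][:desired]
        return [means[k] for k in order], [counts[k] for k in order]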
Image Classification - Pixel Allocation

      The final step in the image classification, after generation of the spectral
clusters, was allocation of individual pixels to a cluster class.  The classification or
allocation step assigned each pixel within the subscene to the cluster to which it had
the highest likelihood of being a member.  The most common and generally preferred
method of pixel allocation is the maximum likelihood classifier (Richards, 1986).
Other methodologies, such as nearest neighbor and parallelepiped, can be employed
where computer time is at a premium.  This project used an optimized maximum
likelihood routine (Weerackoon and Mace, 1990) and relatively fast computers,
minimizing processing time to the point that this was not a limiting factor.
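
      A generic Gaussian maximum likelihood allocation of this kind can be written
compactly.  The sketch below is illustrative only; it omits the optimizations of the
Weerackoon and Mace (1990) routine, and the input shapes are assumptions.

    import numpy as np

    def max_likelihood_allocate(pixels, means, covs):
        """pixels: (n, bands); means/covs: per-cluster mean vectors and
        covariance matrices.  Assigns each pixel to the cluster with the
        highest Gaussian log-likelihood."""
        scores = []
        for mu, cov in zip(means, covs):
            inv = np.linalg.inv(cov)
            _, logdet = np.linalg.slogdet(cov)
            d = pixels - mu
            maha = np.einsum("ij,jk,ik->i", d, inv, d)      # squared Mahalanobis distance
            scores.append(-0.5 * (logdet + maha))           # constant term omitted
        return np.argmax(np.stack(scores, axis=1), axis=1)  # cluster index per pixel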


Cluster Labeling

      During the labeling process, an analyst assigned a  land cover/use class name
to each of the spectral  clusters. The use of reference data was  necessary to aid the
analyst in labeling the unsupervised clusters. Table 3-1 lists all reference data used
by the analysts to label the clusters.  The reference  data sets fell into two categories,
                                       21

-------
TABLE 3-1. Reference Material Used To Aid The Classification Process

      (The table is only partially legible in the source copy. Identifiable entries
      include USGS topographic maps; aerial photography from the National Aerial
      Photography Program (NAPP) and the National High Altitude Photography program
      (NHAP); Landsat Thematic Mapper image maps; agricultural statistics compiled by
      state; county boundary coverages; and geocoded land cover/use data.)

-------
 those used to aid in geographic location and those used to identify land cover/use
 types.  The USGS topographic maps and the TM image maps were used mainly to
 help analysts locate areas of interest. Most other data sets were used for labeling
 the unsupervised clusters.

       The most  important reference data set utilized for labeling clusters was aerial
 photography. A search was made to find a reference data set that was inexpensive,
 of high quality, and would give a consistent coverage across the study area. Aerial
 photographs satisfied these requirements.  The TM images were acquired during leaf-
 on conditions in 1988, 1989, and 1991.  Color infrared 1:40,000-scale USGS
 National Aerial Photography Program (NAPP) photographs, acquired between 1987 and
 1990, were available for most of the watershed. In areas of the watershed that
 lacked NAPP coverage, USGS National High Altitude Photography (NHAP) was
 substituted.  The NHAP photographs were acquired from 1982 to 1985. The NAPP
 photography was preferred because it had higher resolution and more recent dates of
 acquisition.

      The following  types of areas were recognized as requiring aerial photographs
 for  complete classification and labeling: 1) areas with clouds, 2) areas with cloud
 shadows, 3) areas of physiographic change (e.g., gypsy moth defoliation), and 4)
 areas of special interest. After inspecting five scenes (16/34, 16/33, 15/33,  14/34,
 and 14/33),  130 to 200 areas per scene were identified which required  additional
 information for categorization.  However, due to cost constraints, photos were ordered
 for  only 20 to 50 of the most problematic areas for each scene.  In addition, stereo
 pairs were ordered for the center portion of each of the 291 EMAP hexagons located
 within the watershed. Additional photos, evenly distributed across the 1:100,000 map
 indexes, were ordered to cover general land cover/use types. This resulted in a
 photo coverage of roughly 35% of the watershed.


 Cluster Refinement

      While labeling clusters, analysts sometimes found confused land surface
features within a single spectral cluster.  Clustering routines were developed to
analyze the raw data of those pixels found in confusion clusters  to form new spectral
clusters.  These programs are  analogous to the clustering and maximum likelihood
routines explained above.  They used the same statistical principles but worked only
with those pixels identified as belonging to a confusion cluster.  This refinement of the
confusion clusters usually improved the separability of land cover/use classes.


Post Classification Editing

      After all of the spectral clusters were labelled  for all subsets, the data were
                                      23

-------
 examined for classification errors. This procedure required analysts to examine every
 portion of the final classification at full resolution and manually update any areas that
 were unclassified, cloud covered, or incorrectly classified.  The reference materials
 described in Table 3-1 were used by the analysts to make image corrections.  Edits
 were necessary in areas where cloud cover or data anomalies obscured ground
 features.  Common misclassifications included confusion between gypsy moth
 damaged forest and herbaceous clusters.  Edits were made by manually drawing
 polygons around areas on the image screen and changing cluster values. In
 addition, major highways (limited access) were checked and vectors were drawn
 along the highways where they were not defined by at least a single row of pixels.
Combining and Edge Matching Classified TM Data

      After all subsets in each of the individual TM scenes were labeled, they were
mosaicked together to form a final coverage file. The original TM scenes were
georeferenced and most were projected into UTM zone 18 coordinates. Four scenes
(16-32,  16-33, 16-34,  and  17-34) were referenced to UTM zone 17.  All TM subsets
were reprojected into the same Albers Equal Area Conic projection coordinates
before building the final coverage.

      Each subset of classified data was individually mosaicked into the final
coverage.  As each data subset was added  it was checked for edge  matching.  The
edge of each subset was viewed at full resolution to check that all land cover/land
use classes were labelled  consistently and matched across boundaries of subsets
within scenes and between scenes.  This assured that all the data along the edges
was  consistently labeled and that there were no gaps. Occasionally  it was apparent
that one of the land cover/use types was slightly over or under represented due to a
mislabelling of one of the clusters. This usually happened when the  cluster was a
mixture  of two surface types.  In these cases the original cluster label was changed.

      When visual inspections were made, most features lined up well between
subsets. Geographic features crossing scene and subset boundaries matched very
well. However, some  clusters were found to have been labelled differently in
adjacent subsets, even though they contained the same resource. This occurred
primarily where clusters contained mixed cover types or there was confusion in
interpreting between similar cover features.  Whenever inconsistencies  were
discovered,  clusters were reexamined and appropriate labels assigned.

      After each TM subset was determined to be acceptable, the cluster values
were recoded to the final range of class values.  This was a two-step process.
Following classification and editing, each TM subset contained cluster values ranging
from 1 to 150, depending upon the image subset.  Each cluster of a common surface
type was assigned to a common pixel value. After all TM subsets were combined
                                      24

-------
 into a single coverage, the entire file was recoded to the class numbers as designated
 in the classification system in Chapter 2.
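
      Recoding by lookup table can be illustrated as below; the cluster-to-class
 assignments shown are hypothetical.

    import numpy as np

    # Hypothetical label table for one subset: cluster number -> final class code
    labels = {1: 30, 2: 30, 3: 20, 4: 11, 5: 12, 6: 60}   # ... up to roughly 150 clusters

    lut = np.zeros(151, dtype=np.uint8)                    # index = cluster value
    for cluster, class_code in labels.items():
        lut[cluster] = class_code

    clustered = np.array([[1, 2, 4], [3, 5, 6]])           # small clustered subset
    recoded = lut[clustered]                               # recoded to final class values
    print(recoded)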
 Smoothing

       A 1  hectare (ha = 10,000 m2) Minimum Mapping Unit (MMU) was selected for
 this project. The MMU is the smallest contiguous area incorporated in the final digital
 product. The TM sensor collects data at a nominal spatial resolution of 28.8 m by
 28.8 m.  Each TM pixel area is approximately 830 m2.  This represents the closest
 approximation of the 10-to-1 ratio recommended by Congalton et al. (1992) and
 others to minimize propagation of spatial error, and to supply a mapping product
 which  presents uniform and consistent delineation of land cover/use types over large
 areas.  The TM data was smoothed to eliminate small groups of pixels of less than
 the MMU.  Smoothing was executed on the data set after all subscenes were
 mosaicked into one final file.

       A two step process was used to smooth the data: locating areas smaller than
 the MMU; and reassigning those pixels to other adjacent land cover/use values. The
 first step identified areas smaller than 1 hectare by identifying groups of adjoining
 pixels which had the same value and by counting the number of pixels in each group.
 A Minimum Mapping Unit was specified by choosing the minimum number of pixels a
 group should have. Groups of adjoining pixels smaller than the Minimum Mapping
 Unit were eliminated and areas larger than the Minimum Mapping Unit were retained.
 Adjacent pixels included horizontal,  vertical, and diagonal strings of individual pixels,
 which helped to preserve narrow linear features.

      The  second step replaced old pixel values in areas smaller than the minimum
 map unit with values from adjoining pixels. For this study, a "majority  rule" was used,
 where  the new pixel value was changed to the value most frequently found  in a 3x3
 window surrounding the pixel.
THEMATIC ACCURACY ASSESSMENT

      The Chesapeake  Bay watershed thematic accuracy assessment involved
developing a methodology which was statistically valid and which adequately and
efficiently represented each of the classes in the categorized data set. This included
determining a representative sample number and deriving a method for comparing
verification data to the categorized data set.

      A preliminary accuracy assessment of a portion of the Chesapeake Bay
watershed categorized imagery was undertaken at Towson State University, Towson,
Maryland.  This consisted of a limited ground verification of the categorized data for
                                      25

-------
 Baltimore County.  This initial assessment indicated certain ambiguities with a
 number of original class divisions, and resulted in revisions to the original
 classification scheme.  This study underscored the need for a comprehensive
 accuracy assessment, which was subsequently accomplished.
Accuracy Assessment Method Design

      Accuracy assessment is a comparison of classified and labeled data to some
true or known data.  Designing the methods for assessing the thematic accuracy of a
large categorized data set included identification and acquisition of verification data,
design of a sampling scheme, and determining a means for comparing the
categorized and verification data sets so that accuracies could be determined.

      A common way to express the accuracy of such image or map data is by a
statement of the percentage of the map area that has been correctly classified when
compared with reference data or ground-truth (Story and Congalton, 1986).  In
accuracy assessments, the most common way to represent the classification
accuracy of remotely sensed data is in the form of an error or confusion matrix as
shown in Table 3-2 (Congalton, 1991).
Table 3-2. An Example of an Error or Confusion Matrix.

  CATEGORIES      Reference A      Reference B      Reference C      User's Accuracy
  Image A              65                5               10           65/80  = 0.81
  Image B              25               85               15           85/125 = 0.68
  Image C              10               10               75           75/95  = 0.79
  Producer's      65/100 = 0.65    85/100 = 0.85    75/100 = 0.75
  Accuracy

  Sum of major diagonal = 225     Overall accuracy = 225/300 = 75%
      An error matrix is a square array which expresses the number of sample units
(i.e., pixels, clusters of pixels, or polygons) assigned to a particular category relative
to the actual category as verified on the ground (Congalton, 1991). The columns in
an error matrix represent reference or ground-truth data, while the rows represent the
labeled data (Story and Congalton, 1986). The diagonal elements of this matrix
represent agreement or correct classifications.  The number of correct observations
                                      26

-------
 divided by the total number of points observed (times 100%) gives the overall percent
 of correct classifications, or the overall map accuracy.  The error matrix is an
 effective way of visualizing errors of omission (exclusion) and commission (inclusion).

       The accuracy for each class can be given in two ways: from the perspective of
 the user of the map, i.e., the percentage of times that a class on the map correctly
 identified the class actually on the ground; or from the  perspective of the producer,
 i.e., the percentage of times that a class on the ground was correctly identified on the
 map.  These two approaches can give very different results as is illustrated in
 Table 3-2,  where the user's accuracy for category A is 81%, while the producer's
 accuracy for category A is only 65%.

       A Kappa coefficient (Congalton 1991)  is derived from the error matrix
 calculations and is used to measure the relationship of non-random categorization
 agreement versus expected disagreement. The calculation of Kappa assumes a
 multinomial sampling model and independence (Bishop et al., 1988).  It is used to
 monitor trends in reliability from one categorization to another. The Kappa coefficient
 equals zero when the agreement between the categorized data  and ground truth
 equals chance or random agreement.  Kappa increases to one as chance agreement
 decreases.  Kappa equal to one occurs only when there is perfect agreement.


 Verification Data Set

       The  verification data set consisted of medium to small-scale aerial photographs
 acquired by the US Department of Interior under the NAPP and  NHAP programs.
 The photographic coverage included color infrared (CIR) aerial photographs over the
 entire United States at nominal scales of 1:40,000 and  1:58,000 respectively.   Early
 in the categorization  process a large quantity of NAPP photographs were acquired,
 both to assist in the labeling process, and to support the accuracy assessment effort.
 The accuracy assessment was performed using these previously acquired
 photographs. In order to avoid bias, photographs used in the  labeling process were
 excluded from the data set  used  in the verification process. Limited reserve field data
from earlier verification studies were used to validate the aerial photointerpretation of
 land cover/use.

       Dates of the NAPP photographs ranged from September,  1987 to April,  1990.
The dates of the NHAP photography ranged from April, 1981 to April, 1982.  The
temporal variation between  the NAPP prints and the TM imagery did not appear
significant.  In the case of the NHAP photographs, which varied up to nine years in
acquisition timing from the dates of the TM imagery, changes in  land cover/use were
sometimes  noted.  In these cases, the photos were used in conjunction with the
original TM imagery to evaluate the land cover/use classifications.
                                      27

-------
Sampling Scheme

      Adequately representing the categorized land cover/use data set was critical to
a valid accuracy assessment.  The sampling criteria employed in this study were
based on the multinomial equation developed by Tortora (1978).  Using this formula,
and assuming a worst-case scenario (the majority land cover/use class representing
50% of the coverage), 71 samples were required for the majority land cover/use class
to attain an 85% confidence interval. This was independent of the actual population
size and  was applied initially to three TM Scene areas and then  to the watershed as
a whole.  The sample sizes for the remaining categories were evaluated by
determining the proportional representation of each class within the scenes or
watershed relative to the majority class.  The proportional value was chosen since it
was simpler to calculate and tended to produce a larger sample.

      These  calculations indicated that small sample sizes were appropriate for
classes with low numbers of samples. In order to adequately assess these
categories, an attempt was made to obtain a minimum of 5 samples per class.  In
some cases, however, repeating the sample selection  process numerous times failed
to produce five samples. As a result, some categories were not  evaluated in certain
data sets. This procedure was repeated for three representative TM scenes and for
the watershed as a whole.  Table 3-3 shows the combined total number of samples
per category evaluated during the assessment (note that the three level 2 forest
categories were aggregated into a single woody category in the final image data).
   TABLE 3-3. Total Sample Points Evaluated by Land Cover/Use Category

              Category                          Total Samples

        11  High-Density Developed                    27
        12  Low-Density Developed                     61
        21  Deciduous Woody                          282
        22  Evergreen Woody                           43
        23  Mixed Woody                                5
            (20 Woody)                              (330)
        30  Herbaceous                               224
        40  Exposed Land                              20
        60  Water                                     42

                                     Total           704
                                     28

-------
 EMAP Hexagon Sampling Scheme

      A review of the preliminary three scene accuracy assessment effort showed
 that screen digitizing the reference photographs occupied a significant amount of
 time. In the case of the preliminary scenes, as many as 96 reference photos were
 digitized.  Building a corresponding photo data set for the watershed would have
 involved digitizing several hundred photos.  One goal of this project was to test the
 effectiveness of the EMAP hexagons for characterizing land cover/use data.  For this
 reason, a method of constraining the data set to existing EMAP Stage Two hexagons
 was devised.
      The EMAP sampling grid consists of a series of continuous 640 km2 hexagons
 covering the country.  The EMAP Stage Two hexagons are an evenly spaced grid of
 40 km2 hexagons that fall at the center of every sampling grid cell.  The Chesapeake
 Bay watershed contains 291 hexagons, uniformly distributed throughout the
 watershed. Figure 3-3 shows the location of these hexagons.  In addition, NAPP and
 NHAP aerial photos were available for virtually all of these hexagons.

      Before proceeding with the hexagon-constrained sampling routine, a test was
 performed to determine the adequacy of the EMAP hexagon grid for representing the
 watershed at large.  The boundaries of the hexagons were superimposed over the
 categorized map image and summary statistics were derived for the hexagon areas,
 the photo coverage  within the hexagons, and the watershed as a whole. The results
 of the classification for the whole watershed were compared to the  results extracted
 from within the hexagons.  Table 3-4 contains the summary statistics from the final
 classification, including the data within the 291 EMAP Stage 2 hexagons and the
 photo coverage used for the accuracy point sampling. For each of the 6 categories
 found in the watershed, areal extent is given as percentage, acres,  and hectares.

      The Woody category comprised  half (54.7%) of the land cover/use within the
 watershed. The classes making up the remaining half of the watershed are:
 Herbaceous (32.9%), Water (7.5%), Low-Intensity Developed (4.0%), High-Intensity
 Developed (0.6%), and Exposed Land (0.3%). The results confirmed that the
 hexagon grid was an appropriate approach  for a representative subsample of the
watershed land cover/use data.
Assessment Procedure

      Once the sampling scheme had been determined, the photo verification data
set was assembled. This was accomplished by screen digitizing the four corner
points of the selected photos onto the displayed TM imagery. The result was a
coordinate file representing the selected photos and their coverages.  Those areas of
the classified data set which were within the photo coverages were then extracted
                                      29

-------
[Figure 3-3 graphic: outline map of the watershed (Pennsylvania, West Virginia, and
adjoining states) with the EMAP Stage Two hexagons marked.]
Figure 3-3. Distribution of the 291 EMAP 40 km2 Hexagons within the
Chesapeake Bay Watershed.
                        30

-------
Table 3-4. Land Cover/Use Statistics and Correlation Coefficients for Overall
Chesapeake Bay Watershed, Sample Point Photo Coverage, and EMAP
Stage 2 Hexagon Coverage.

Overall Watershed
  Category        Hectares          Acres        Sq. Miles      Percent
     11            101,302        250,323          391.13        0.57%
     12            714,992      1,766,783        2,760.60        4.03%
     20          8,158,817     20,160,877       31,501.37       54.74%
     30          5,838,413     14,427,034       22,542.24       32.88%
     40             49,495        122,306          191.10        0.28%
     60          1,333,399      3,294,901        5,148.28        7.51%
  Total         16,196,419     40,022,223       62,534.72

Hexagon Coverage                        Coefficient of Correlation = 0.99190
  Category        Hectares          Acres        Sq. Miles      Percent
     11              7,089         17,517           27.37        0.63%
     12             46,171        114,090          178.27        4.13%
     20            624,417      1,542,969        2,410.89       55.87%
     30            358,233        885,212        1,383.14       32.05%
     40              3,260          8,055           12.59        0.29%
     60             78,611        194,251          303.52        7.03%
  Total          1,117,780      2,762,095        4,315.77

Photo Coverage                          Coefficient of Correlation = 0.99338
  Category        Hectares          Acres        Sq. Miles      Percent
     11             15,871         39,217           61.28        0.52%
     12            118,451        292,698          457.34        3.90%
     20          1,719,006      4,247,757        6,637.12       56.58%
     30          1,035,438      2,558,623        3,997.85       34.08%
     40              8,277         20,454           31.96        0.27%
     60            140,938        348,266          544.17        4.64%
  Total          3,037,981      7,507,015       11,729.71

        Category 11 = High Intensity Developed
        Category 12 = Low Intensity Developed
        Category 20 = Woody
        Category 30 = Herbaceous
        Category 40 = Exposed Land
        Category 60 = Water
                                      31

-------
from the overall watershed data. The resultant file, constrained to the photographic
coverage, was the data set from which the random sample points were drawn.

      Final sample point selection was performed by individuals not participating in
the accuracy assessment process. This was done to avoid introducing bias into the
photointerpretation process. In addition, the database land cover/use types were not
revealed to the analyst until the assessment was completed and the results compiled.

      Following the random point selection process, the area surrounding each point
was observed and an interpretation made as to its identity.  The categorical value
assigned to each 3x3 sample site was the majority value of the 3x3 pixel window
centered at the selected point  location. If no majority existed within the window, then
the value assigned was that of the central pixel.  The 3x3 pixel sample window was
selected as it would provide a  central pixel for point location.  It also provided a
sample  size of 0.56 ha, or approximately half the size of the MMU, which facilitated
an evaluation of small or narrow and irregularly shaped thematic units, as well as
mixed or transition areas.

      The final stage in the process involved displaying the outlines of the photo
coverage as  it overlaid on the  original TM imagery. The 3x3 pixel sample site was
displayed on the screen, located on the corresponding photograph,  plotted on an
acetate  overlay attached to the photo, and an interpretation made as to the correct
identification  of the land cover/use. This interpreted land cover/use  value was then
recorded in a Classification Accuracy Table (CAT), which contained the sample
number, its coordinates, the categorized value, and a data record field into which the
verification (photo-interpreted)  values were entered.

      A Photographic Accuracy Assessment Form (Appendix A) was used to  record
the visually interpreted land cover/use category.  This form allowed the analysts to
make comments about the nature of features within the sample sites, the quality of
the photographs,  and to assess and assign a secondary category value if the  sample
was of mixed land cover/use types. A 3x3 pixel grid was employed  to graphically
represent the position of significant features within and surrounding  mixed sample
sites.

      Results of the accuracy assessment of the watershed are presented in
Table 3-5.  Categorical detail was reduced from an original 17 classes to 6 classes to
obtain a final overall accuracy  of 80% with an  85% confidence level. Initial
accuracies for many of the original classes were less than 50%.  The Exposed Land
category had the  lowest accuracy,  with 60% User's Accuracy.  Further aggregation of
Developed and Exposed classes does not improve the final individual class thematic
accuracies. Note that the overall accuracy of the non-transition sample sites (those
more than  one pixel from a category boundary) was 90%.
                                       32

-------
     Table 3-5. Final Thematic Accuracies for the Chesapeake Bay
                  Watershed Categorized Data Set

                            Photointerpretation Classes
  Image Classes      11     12     20     30     40     60    Total
        11           19      2      1      0      2      0      24
        12            6     42      6      6      0      0      60
        20            1     12    288     29      0      2     332
        30            8     12     32    164      6      3     225
        40            0      0      4      4     12      0      20
        60            0      0      0      1      4     38      43
        Total        34     68    331    204     24     43     704

Combined Overall Watershed Accuracies
       Category                     Producer's Accuracy     User's Accuracy
  11 = High-Intensity Developed       19/34 = 55.88%         19/24 = 79.16%
  12 = Low-Intensity Developed        42/68 = 61.76%         42/60 = 70.00%
  20 = Woody                         288/331 = 87.00%       288/332 = 86.74%
  30 = Herbaceous                    164/204 = 80.39%       164/225 = 72.88%
  40 = Exposed Land                   12/24 = 50.00%         12/20 = 60.00%
  60 = Water                          38/43 = 88.37%         38/43 = 88.37%

Final Chesapeake Bay Watershed Categorized Data Set Results:
    Sum of major diagonal = 563
    Overall accuracy 563/704 = 79.97%
    Kappa Coefficient (Khat) = 0.70155
    Variance of Kappa = 0.00048

Watershed Non-Transition Site Results:
    Sum of major diagonal = 79
    Overall accuracy 79/88 = 89.77%
    Kappa Coefficient (Khat) = 0.86093
    Variance of Kappa = 0.00192

(Non-transition sites consisted of samples at least one pixel removed from
any category boundary.)
                                33

-------
 FINAL LAND COVER/LAND USE GENERATION

      The final accuracy assessment results were used to refine the land cover/use
 classifications in the Chesapeake Bay watershed data set.  Individual land cover/use
 classes whose accuracies fell significantly below 60% were merged or aggregated into similar or
 more general classification categories. The  original 17 land cover/land use types
 were first reduced to 12 classes. This action followed the preliminary accuracy
 assessments by personnel of Towson State  University, MD. Subsequently, the
 classes were merged to eight classes prior to the  start of the final accuracy
 assessment.  Results of the final accuracy assessment necessitated a further
 reduction to six classes.  The watershed data set was recoded to these final six
 classes.  These actions produced a land cover/use data set of known and acceptable
 thematic accuracy.

    The final data product was converted  to Arc/Info GRID coverage, and will be
 archived and  distributed on 8mm digital tapes. Arc/Info GRID  is a raster data format
 that is integrated into the Arc/Info vector GIS database environment. The  use of
 Arc/Info GRID will facilitate modeling efforts where synthesis of vector GIS data with
 raster image data is necessary or desirable.  A reduced scale  image of the final
 classified Chesapeake Bay watershed land cover/use map is shown in Figure 3-4.
 The final Arc/Info GRID coverage is approximately 80 Mbytes.
SUMMARY

      All methodologies described were implemented in the creation of the
Chesapeake Bay watershed land cover/use data set.  These methodologies cover the
selection of TM Bands to use in analyses, quality assurance and quality control
procedures, the subsetting of TM scenes for analysis, classification techniques,
accuracy assessment methodologies, and the generation of final products. The
following paragraphs summarize the major approaches and results:

1 - TM bands 3, 4, 5 and 7 were selected for analysis because this band  combination
      had a consistently high Optimum Index Factor (OIF), a measure of the
      information content of individual bands and band  combinations;

2 - TM scenes were subset into quarter scenes due to system limitations and
      because coarser resolution data sets were found  to be inadequate  for reducing
      statistical confusion or separating general land cover/use features;

3 - A two-step, unsupervised approach was used, utilizing a custom  clustering
      algorithm and an optimized maximum likelihood classifier to spectrally classify
      TM data;
                                      34

-------
Figure 3-4. Final classification of the Chesapeake Bay Watershed.
                                    35

-------
4 - Spectral clusters were identified and assigned land cover/use labels primarily
      through reference to USGS NAPP and NHAP color infrared  photographs.
      Aerial photographs were selected because they provided a reference data set
      that was relatively inexpensive, of high quality, and which provided consistent
      coverage across the study area;

5 - Following the first labeling step, a second unsupervised clustering and labeling
      routine was accomplished for all confusion clusters and unclassified pixels.
      This refinement was  added because it improved the separation of land
      cover/use classes within the confusion clusters and reduced the number of
      unclassified pixels;

6 - All subsets of classified TM imagery were individually merged into a larger
      coverage, edge matched, and recoded to the final values of the classification
      system;

7 - All contiguous groups of pixels having the same land cover/use value were
      examined to eliminate areas smaller than the Minimum Map Unit of 1 ha.  This
      was accomplished through use of a two-step smoothing algorithm;
8 - An assessment of the thematic accuracy of the land cover/use was accomplished
      through photointerpretation of class-stratified random points.  Land cover/use
      classes which  did not meet accuracy DQO's were combined or aggregated into
      similar or more general classes and recoded;

9 - The final land cover/use classes were then converted to Arc/Info GRID format for
      archive and distribution.
                                      36

-------
                                   CHAPTER 4
                            RESULTS AND DISCUSSION
       The primary accomplishments of the Chesapeake Bay Watershed Pilot Project
 included a classification methodology, a digital land cover/use map, and a test of the
 EMAP Hexagon sampling scheme.  This Chapter discusses the utility and limitations
 of the classification scheme and methodologies, and modifications are suggested.
 The final digital classification results and the test of the EMAP Hexagon sampling  are
 also discussed.
 CLASSIFICATION SYSTEM

       The classification system used for this project was a prototype of an
 interagency classification developed as described in Chapter 2.  However, not all
 features of the system were suited to the classification work performed in this project.
 The primary concern was the disparity between the categorization of land cover
 versus land use.  The proposed system attempted to incorporate both.

       Categories such as Woody, Herbaceous, and Water describe land cover. They
 identify basic features on the earth's surface.  Categories such as Developed and
 Cultivated are land use descriptions. They identify an associated human use or
 interpretation of surface features.  Wetlands represent conceptual environmental
 variables defined  by soil types, topography, and species assemblages. These are
 often indistinguishable in spaceborne or  aerial imagery.  Satellite sensors can only
 measure reflected or emitted radiation from the earth's surface.  It is the
 differentiation and identification of the resultant reflected or emitted spectra which
 drives traditional computer assisted digital-image classification. A priori knowledge is
 generally required to differentiate most land use classes.  Detailed photointerpretation
 and field investigations are generally required to accurately delineate wetlands.

      Results of the thematic accuracy assessment confirmed the difficulty of
 differentiating land cover and land use classes. Several of the original classes
 exhibited accuracies of less than 50%. This necessitated the aggregation of related
 land use and land cover classes into more general categories. For example, all
 herbaceous cover classes, including cultivated, were combined into one category.
 This process was repeated until the resulting overall accuracy met EMAP-LC Data
 Quality Objectives (DQO). While the remaining land  cover/land use classes may not
 serve all intended purposes, they represent an accurate segmentation  of surface
features, within which more detailed differentiation and study can be accomplished.
                                      37

-------
      There are a variety of potential methods which could produce greater thematic
detail. One possibility would be to use the original raw spectral data to develop new
spectral clusters within the final differentiated categories. This would eliminate some
of the ambiguity incurred in developing signatures from an entirely undifferentiated
TM scene or subscene, possibly improving the chances for separating desired
features. Another possibility would be to use ancillary data such as municipal
boundaries, transportation networks, population maps, tax maps, etc.,  in conjunction
with the existing land  cover classification to differentiate certain land use categories.
The National Wetlands Inventory data from the United States Fish and Wildlife
Service should serve to define wetland categories.  There are ever increasing
numbers and types of spatial data available which could be used to refine and
improve the final categorization.
CLASSIFICATION METHODOLOGY

      The TM bands used in the unsupervised classification were selected based on
an Optimum Index Factor (OIF),  as described in Chapter 3.  Based on the results of
the OIF analysis, bands 3, 4, 5, and 7 were selected to maximize spectral
information, while reducing processing complexity and time.  This combination
avoided problems of atmospheric scattering associated with the shorter wavelengths
measured by band 1.

      After completing the classification analysis, it was suspected that some of the
confusion encountered between cover types during cluster labelling could have been
reduced if TM bands 1 and 2 had been included.  For example, a confusion between
murky water and high intensity urban cover types might have been avoided. Given
the widely varying types of surface features of interest in  this study, it is suspected
that using all  reflective spectral information would yield better classification results.  In
addition, newer computer hardware has made the issue of processing complexity and
speed less of a constraint.
Subset TM scenes

      The creation of Landsat TM scene subsets was the best available method for
reducing the data volume at the time of the project. The relatively small and simple
rectangular image blocks contained sufficient spectral variability (spectral signatures)
for the development of spectral clusters.

      The spectral statistics may have been  improved had the scenes been subset
by natural land divisions, such as ecological regions.  The use of such boundaries to
subset data introduces potential sources of problems and errors, since ecologic
boundaries are likely to cross TM scene boundaries.  Because of the spectral and
                                       38

-------
temporal differences between TM scenes, each should be classified independently.
Problems of edge matching also occur.  This process requires detailed human
interaction.  With small or irregular scene divisions the edge matching of the final
data into one seamless coverage would be considerably more complex.


Classification  Techniques

      The use of unsupervised clustering and maximum likelihood classification
combined with  photo interpretation and field work proved to be an effective and
affordable method of covering the approximately 165,800 km2 watershed. Processing
and labelling instructions were as objective as possible and were standardized in the
Methods Guides (Appendix A).  However, each data set had its own peculiarities and
analysts occasionally had to modify procedures to maintain the quality of the
clustering results.

      The labelling of spectral clusters was the most subjective and time intensive
stage in the classification process. The identification of a cover type cannot be
readily automated.  An analyst must visually identify the cover type represented by
each spectral cluster.  This process depends on the experience of the analyst and
the quality of the  reference information (field notes, air photos, maps, etc.).

      The consistency of the labelling process was apparent as the data subsets
were pieced together to form the complete coverage. The occasional mismatched
clusters were checked and relabeled if necessary before recoding to the final class
values.

      The cluster refinement process was effective in improving the discrimination
between surface types.  The ability to go back and redefine clusters that  contained
more than one  surface type allowed the analyst to specify fewer clusters  at the
beginning.  Since labelling clusters is so time consuming it was less tedious for the
analyst to produce a smaller number of initial spectral clusters and then redefine
cluster statistics for those clusters which contained mixed cover types. These
clusters were indicated on the tracking forms, and editing suggestions were  noted.

      Post classification editing provided corrections for the final classification  but
little overall change to the classification of the subsets. These edits improved local
areas cosmetically but changed only a small percent of the total pixels. Changes
ranged from less than 1 to 3 percent of each image subset. An exception was in the
region of forest gypsy moth infestation (Figure 4-1).

      Gypsy moth defoliation of forest canopies resulted in an unobstructed satellite
view of the shrub and herbaceous cover of the forest floor in these areas. This
problem occurred primarily on the ridge tops in the Appalachians of the southwestern
                                       39

-------
Figure 4-1 Landsat Thematic Mapper color composite image of the
Shenandoah Mountains, VA.  Thematic Mapper bands 4, 5, and 3 are assigned
to red, green, and blue, respectively. The blue areas along the ridges in the
central portion of the image indicate areas of gypsy moth defoliation.
                                   40

-------
 and central western portions of the watershed. The resulting spectral signatures were
 indistinguishable from other areas of herbaceous cover.  The striking difference
 between healthy forest and damaged forest can be seen in Figure 4-1.  The false
 color composite image is composed of TM bands 5, 4, and 3 in red, green, and blue,
 respectively. The Shenandoah Mountains run north-northeast across the image.  The
 areas damaged by gypsy moths appear as blue tones along the ridges.  There is a
 strong contrast between these areas and the reds and oranges of the healthy forest.
 The blue and blue-green areas in the lowlands on either side of the ridge
 are primarily areas of herbaceous cover. The dark blue in the southwest corner of
 the image is the town of Harrisonburg, VA.


 Final Land  Cover/Use Classification Product Generation

       As each  image subset was pieced into the master image file, mislabelled
 clusters (as  discussed above) were relabeled.  Each boundary between subsets was
 checked for classification consistency.  Edges of the image subsets matched well,
 including subsets from within the same TM scene and from adjacent scenes.

       The smoothing algorithm, used to eliminate pixel groups of less than the one-
 hectare minimum mapping unit, effectively corrected edge-effect problems.  Isolated
 groups of pixels, misclassified because they lie on the edge of two resources and
 have a mixed signature, were essentially eliminated.  The smoothing method as
 described in Chapter 3 finds and eliminates all features smaller than a combined pixel
 area of 1 ha. Unlike traditional smoothing filters which use a specific kernel size,
 linear features as narrow as one pixel wide were  maintained as long as their
 combined area was equal to or greater than the minimum map unit. This  allowed
 features such as roads and  streams to remain.


 EMAP Hexagon Sampling Scheme

      The EMAP Hexagon Sampling Scheme was developed as a spatial sampling
 system for a wide variety of point and spatial data. One goal of this sampling design
 was to test its utility in  sampling land cover/use spatial data.  The comparison of land
 cover/use statistics for the overall watershed and the hexagons (Table 3-4) indicated
 that the hexagons provided an adequate representation. The category percent
 coverages differed  by less than 1% for all but the Woody class, which differed  by
 1.13%. The hexagon sampling scheme appears adequate, even for less common
surface types. In summary,  the EMAP Hexagon Sampling Scheme provided a useful
estimate of the percent coverage of classes for the Chesapeake Bay watershed.
                                     41

-------
                                  CHAPTER 5
                                CONCLUSIONS
      The Chesapeake Bay Watershed Pilot Project was initiated to meet the needs
of the Environmental Protection Agency (EPA) Environmental Monitoring and
Assessment Program - Landscape Characterization (EMAP-LC) and the EPA
Chesapeake Bay Program Office (CBPO).  This Chapter summarizes the main points
and recommendations from the report.

      The classification system used for the project was developed by an
interagency group for wide ranging purposes.  As a land cover and  land use system it
was not ideally suited for use with a TM data classification.  The classification of TM
spectral signatures lacked the human interpretations necessary to categorize land
use classes, as was explained  in the Discussion section. The political and social
 importance of land use categories such as croplands and urban areas is well
 understood. However, future work should carefully consider the spectral separability
of these categories using TM data.  It may be necessary to use additional data
resources and/or a methodology other than traditional classification techniques to
produce greater  land use detail.

      The methodology presented here effectively classified a number of land
cover/use categories.  However, accuracy DQOs were met only after simplifying and
aggregating the original classes.  Existing  raster or vector coverages of specialized
land use could be combined with the resultant classification to improve detail.

      A significant accomplishment of this project was the development and
implementation of tracking forms and instruction guides developed as part of the
QA/QC procedures. From the beginning of this project tracking forms traced all
procedures performed on the data.  The tracking forms allowed for easy handling of
the large number of TM scenes.  The forms facilitated  monitoring the completion of
each step in the  process for each data subset, comparison of results between data
subsets, backup  and  retrieval of data, and tracing of errors.  The instruction guides
helped the analysts perform consistent analyses on all of the image subsets by
providing step-by-step operating instructions.
                                      42

-------
                                REFERENCES
Anderson, J. R.  1971. Land Use Classification Schemes Used in Selected Recent
      Geographic Applications of Remote Sensing.  Photogrammetric Engineering
      37(4):379-387.

Anderson, J. R., E. E. Hardy, J. T. Roach, and R. E. Witmer.  1976.  A Land Use
      and Land Cover Classification System for Use with Remote Sensor Data.  U.S.
      Geological Survey Professional Paper 964, Washington, DC. 28 pp.

Bishop,  Y.M.M., S.E. Fienberg, and P.W. Holland, 1988, Discrete Multivariate
      Analysis, Theory and Practice, The MIT Press, Cambridge, MA, 557 p.

Chavez, P. S., G. L. Berlin, and L. B. Sowers. 1982. Statistical Method  for Selecting
      Landsat MSS Ratios. Journal of Applied Photographic Engineering 8:23-30.

Chavez, P. S., C. Guptill, and J. A. Bowell. 1984. Image Processing Techniques for
      Thematic Mapper Data. Technical Papers, 50th Annual Meeting of the
      American Society of Photogrammetry 2:728-742.

Congalton, R. G., 1991. A Review of Assessing the Accuracy of Classifications of
      Remotely Sensed Data.  Remote Sensing of Environment 37(1):35-46.

Cowardin, L. M., V. Carter, F. C. Golet, and E. T. LaRoe.  1979. A Classification of
      Wetlands and Deepwater Habitats of the United States, Office of  Biological
      Services, Fish and Wildlife  Service, U.S. Department of the Interior,
      Washington, DC. 103 pp.

Harlow,  W.M., and E.S. Harrar. 1969.  Textbook of Dendrology: Covering the
      Important Forest Trees of the United States and Canada. McGraw-Hill Book
      Company, New York, NY. 511 pp.

Jensen,  J. R., 1986. Introductory Digital Image Processing.  Prentice-Hall,
      Englewood Cliffs, NJ. 379 pp.

Latty, R. S., and  R. M. Hoffer.  1981.  Waveband Evaluation of Proposed Thematic
      Mapper in Forest Cover Classification.  Proceedings of American  Society of
      Photogrammetry Fall Technical meeting, Niagara Falls, NY, pp. RS2-D:1-12.

Nelson,  R. F., R. S. Latty, and G.  Mott. 1984.  Classifying Northern  Forests Using
      Thematic Mapper Simulator Data.  Photogrammetric Engineering  and Remote
      Sensing 50(5):607-617.
                                     43

-------
 Omernik, J. M.,  1987.  Ecoregions of the Conterminous United States.  Annals of the
      Association of American Geographers 77(1): 118-125.

 Richards, J. A.  1986.  Remote Sensing Digital Image Analysis. Springer-Verlag
      New York, NY. 292 pp.

 Sabins, F. F., 1987. Remote Sensing Principles and Interpretation. W.H. Freeman
      and Company, New York,  NY. 449 pp.

 Scheaffer, R. L., W. Mendenhall, and L. Ott. 1986.  Elementary Survey Sampling, 3rd
      edition. P.W.S - Kent Publishing Company, Boston, MA. 324 pp.

 Stenback, J. M. and R. G. Congalton, 1990. Using Thematic Mapper Imagery to
      Examine Forest Understory.  Photogrammetric Engineering  & Remote Sensing
      56(9):1285-1290.

 Story, M., and R. G. Congalton. 1986.  Remote Sensing Brief - Accuracy
      Assessment: A User's Perspective.  Photogrammetric Engineering & Remote
      Sensing 52(3):397-399.

Taylor, J. K.  1987.  Quality Assurance of Chemical Measurements.
     Lewis Publishers, Inc., Chelsea, MI.

Weerackoon, R. D., and T. H. Mace. 1990. A Method of Optimizing the Maximum
      Likelihood Classifier Using  Mathematical Transformations.  Proceedings of the
      1990 ACSM/ASPRS Annual Convention, Denver, CO. Vol. 4, pp. 464-474.
                                     44

-------
                            ACKNOWLEDGMENTS
       Many individuals have contributed to the success of this project.  First we
 would  like to acknowledge the many individuals who have contributed directly to the
  completion of this classification: Denice Shaw, EMAP-LC Technical Coordinator; Ross
  Lunetta, originating Project Officer; and Douglas J. Norton, EMAP-LC Technical
  Director at the initiation of the project, all of the US EPA; Mark Finkbeiner and Steven
  R. Hoffer, Lockheed Environmental Systems and Technologies Company (LESAT);
  Janice L. Thompson, The Wilderness Society; and Scott Thomasma, formerly of LESAT.
  Others who contributed significantly to the project include: John Lyon, Ohio State
  University; Russ Congalton, University of New Hampshire; Jay Morgan, Towson State
  University; William Aymard, PCI, Inc.; Mary E. Balogh, US Bureau of Reclamation;
  Edward Bright, Oak Ridge National Laboratory; Michael Cambers, US Geological
  Survey (USGS); James J. Chung, LESAT; Jerome E. Dobson, Oak Ridge National
  Laboratory; Lynn K. Fenstermaker, Desert Research Institute; Randolph L. Ferguson,
  NOAA/National Marine Fisheries Service (NMFS); Frank Golet, University of Rhode
  Island; Kenneth D. Haddad, Florida Department of Natural Resources; Jimmy
  Johnson, US Fish and Wildlife Service; Donley Kisner, the Bionetics Corporation;
  Richard Kleckner, USGS; Victor V. Klemas, University of Delaware; K. Peter Lade,
  Salisbury State University; Karen H. Lee, LESAT; Kathy Lins, USGS; James P.
  Thomas, NOAA/NMFS; and Bill O. Wilen, US Fish and Wildlife Service.  Also
  contributing were: Triana N. Burchianti, LESAT; Dominic A. Fuccillo, LESAT; Lynda
  Liptrap, Computer Sciences Corporation; James Love, EOSAT; James R. Lucas,
  LESAT; Tom Mace, US EPA; John Nietling, LESAT; Lynn Schuler, US EPA
  Chesapeake Bay Program Office; Kris Stout, the Bionetics Corporation; Ron Risty,
  USGS EROS Data Center; and Ridgeway D. Weerackoon, Desert Research Institute.
                                    45

-------

-------
                           CLIP TM SCENES TO WATERSHED

Path: _______    Row: _______

BSTATS output (file ______________________ )

File size of clipped file:
       Number of rows: _________    columns: _________

PRINCO output (file ______________________ )

OIF output (file ______________________ )

Backup Tape:
       Tape number: _______________
       File name: _______________
       Tape listing (file ______________________ )

                              LIST OF MAP NAMES

1:250,000: ____________________          1:250,000: ____________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________

1:250,000: ____________________          1:250,000: ____________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________

-------
1:250,000: ____________________          1:250,000: ____________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________
  1:100,000: __________________            1:100,000: __________________

-------
                                    APPENDIX A

                            INSTRUCTIONS AND FORMS
       Copies of instruction guides and tracking forms used for the image classification work
are shown on the following pages.  They appear in the order in which they are used in the
analysis.
 1. Receiving, TM Data Tracking Form . .	   2

 2. Clip TM Scenes to Watershed Tracking Form	 .            3

 3. Subset TM Scenes Instructions	             5

 4. Subset TM Scenes Tracking Form	           7

 5. Classification Instructions	               9

 6. Classification of Subsets  Tracking Form	  17

7. Cluster Labels Tracking Form	  19

8. Subset Editing Instructions	  23

9. Editing Subsets Tracking Form  	  27

 10. Combining Subsets into the Master-Raster Instructions	    31

11. Combining Files into Master-Raster Tracking Form	    44

12. Master-Raster Recode Tracking Form   	  45

13. Accuracy Assessment Form	  52

14. Chesapeake Bay Watershed Land Cover Metadata	  54

-------
Path:
Row:
RECEIVING TM DATA


    Scene  ID:
***** From Billing Statement  (file_

Billing order number:_	


Sequence number:	


Shipping date:	


***** From STX Header Information Sheet  (file


Acquisition date:	


UTM zone:	


Pixels per line:	


Lines per image:
                                    **************************
                Latitude                    Longitude                 UTM-X           UTM-Y
UL:       ____ ° ____ ' ____ " N      ____ ° ____ ' ____ " W      ___________     ___________
UR:       ____ ° ____ ' ____ " N      ____ ° ____ ' ____ " W      ___________     ___________
LR:       ____ ° ____ ' ____ " N      ____ ° ____ ' ____ " W      ___________     ___________
LL:       ____ ° ____ ' ____ " N      ____ ° ____ ' ____ " W      ___________     ___________

                                      Scene Center:               ___________     ___________
Blocking factor:	


Record length:	
***** From STX Rectification Information (file_


Number of points in consensus set:	


RMS X:	


RMS Y:	


RMS D:	


***** EMSL-LV Tape Library Information *****
    EMSL-LV 9-track
    tape number                  TM bands              notes

  1 _______________           ____________        _________________
  2 _______________           ____________        _________________
  3 _______________           ____________        _________________
  4 _______________           ____________        _________________

EMSL-LV 8mm number: _______________     (file MTCOUNT output ______________ )


                      2

-------
                                   SUBSET TM SCENES
 1) Get disk assignment from supervisor.  You will probably be working on drs4.  Log in as "rsches"
      and move to that directory.
            sunss03% cd /drs4/rsches

 2) Create a directory on that disk using the conventions outlined in the following example:
            sunss03% mkdir tm1533

 3) Create a link at the home directory (/drs1/rsches) to the new directory.  This link will help others
      find your files.  Use the following commands:
            sunss03% cd
            sunss03% ln -s /drs4/rsches/tm1533 tm1533

 4) Retrieve tape from library (tape numbers are listed in the tracking book) and insert in the SUN
      390's 8mm tape drive.  Shell to the server and cd to the new directory.
            sunss03% rsh sun390
            sun390% cd
            sun390% cd tm1533

 5) Retrieve the file from tape.  This step will take some time as the files are large.  The "tar"
      command will read the entire tape, even though you will be requesting the first file on tape.
      The file size can be found on the tape listing in the file for this scene.  You may monitor the
      status of the "tar" command by using another window to check the file size.  After the entire
      file is read you may Ctrl-C to stop tape processing.  (It will continue to scan the whole tape even
      if your file is the first one on the tape.)  Use the following command to retrieve the file:
            sun390% tar -xvf /dev/rst1 cltm1533.lan

6) Run BSTATS,  get header listing only, send output to printer and file results. Use the following
     example to respond to  prompts:
            ERD> bstats
            Is  this an Image or a GIS file? i
            Enter Image filename: cltm1533.lan
            Make listing go to Printer, Terminal, or Both? p
            Make a listing  of the Statistics? y
            Make a listing  of the Histogram? n
            Use the whole  image? y
            Enter X skip factor: 1
            Enter Y skip factor: 1
            Count the zeros? n

-------
 7) Decide where to subset image.  You will want a file roughly 4000x3000.  Enter subset coordinate
      information on the tracking form for each subset.

 8) Create subdirectories for each subset.  Use the conventions outlined in the following examples:
             sunss04% mkdir subset1
             sunss04% mkdir subset2

 9) Move to the appropriate subdirectory and use SUBSET to create subset image files.  Remember
      to select only TM bands 3, 4, 5, and 7. Refer to the following example for prompt responses:
             ERD > cd subset1
             ERD> subset
             Image or GIS file? i
             Enter Input Image filename:  ../cltm1533.lan
             Use the whole image: n
             Enter coordinates (X, Y) for upper left corner? 1829,712
             Enter coordinates (X,Y) for lower right corner? 2829,1712
             Enter Output Image filename: cltm1533sub1.lan
             How many columns are to be in the output file?  < default >
             How many rows are to be in the output file? < default >
             Enter coordinates of absolute upper-left corner of output file? < default >
             Enter output file coordinates at which  to place upper  left corner of  input subset?
                    < default >
             How many bands are to be in the output file? 4
             Copy all bands hi order? n
             For input band 1, enter output band? -1
             For input band 2, enter output band? -1
             For input band 3, enter output band? 1
             For input band 4, enter output band? 2
             For input band 5, enter output band? 3
             For input band 6, enter output band? 4

10) Run BSTATS on subset image, get a statistics listing only,  send output to printer  and file results.
             ERD> bstats
             Is this an Image or a GIS file? i
              Enter Image filename: tm1533sub1.lan
             Overwrite the file? y
             Make a statistics  listing? y
             Make a histogram listing? n
             Listing to go to Printer, Terminal, or Both? p
             Use the whole image? y
             Enter X skip factor? 1
             Enter Y skip factor? 1
              Count zeroes in the statistics computation? n
11) Repeat  steps 9 and 10 for each subset.

-------
                                   SUBSET TM SCENES

Path: _______    Row: _______

Directory (full path): ___________________________________

Tape number: _______________

File name: _______________

LISTIT (file ______________________ )

Number of subsets: _______

******************** Subset 1 ********************
  Directory (full path): _________________________________
  Upper left file coordinate: _______________
  Lower right file coordinate: _______________
  Output file name: _______________
  BSTATS (file ______________________ )

******************** Subset 2 ********************
  Directory (full path): _________________________________
  Upper left file coordinate: _______________
  Lower right file coordinate: _______________
  Output file name: _______________
  BSTATS (file ______________________ )

-------
******************** Subset 3 ********************
  Directory (full path): _________________________________
  Upper left file coordinate: _______________
  Lower right file coordinate: _______________
  Output file name: _______________
  BSTATS (file ______________________ )

Path: _______    Row: _______

******************** Subset 4 ********************
  Directory (full path): _________________________________
  Upper left file coordinate: _______________
  Lower right file coordinate: _______________
  Output file name: _______________
  BSTATS (file ______________________ )

-------
                              CLASSIFICATION INSTRUCTIONS
       The following instructions outline step by step how Landsat Thematic Mapper (TM) data are
 classified in the Chesapeake Bay Watershed Pilot Project.  This document is designed to be used by
 an analyst during the data classification process.  It does not explain the techniques or reasoning
 behind the different steps in the analysis (see the section on Methodology).  Before proceeding with
 these steps, TM data must be subset to the proper area and reduced to four bands (bands 3, 4, 5,
 and 7).  After completing the steps, the user will have created a classified image ready to be
 combined with other images and edited.

     Computer programs written at the Environmental Monitoring Systems Laboratory-Las Vegas
 (EMSL-LV) and Erdas software are required to complete the classification steps.  Familiarity with
 the Unix operating system and Erdas software is assumed.  The names of all programs are written
 in boldface.
       EMSL-LV programs
              gisrain       generates a rainbow file that imitates a three-band composite
              kluster       generates statistical clusters from a .lan file and input parameters
              maxopt        assigns pixels to clusters generated in kluster
              printstf      formats output from kluster to be printed; output file name
                            "printstf.log"
              unitvarneO    calculates band variance thresholds from the .lan file, excludes
                            zero values; output file VFILE.DAT
              wckluster     regenerates statistical clusters from mixed clusters from maxopt
              wcmaxopt      assigns pixels to clusters generated in wckluster

       Erdas programs
              bstats        generates image statistics
              colormod      highlights clusters on the screen
              display       displays a .gis image
              electromap    plots .lan and .gis image files
              gisedit       edits screen values of a .gis file and updates file values
              read          displays a .lan file
              recode        recodes a .gis file
              stitch        attaches two geographically adjacent images into a single image
       The naming conventions for data files used in these instructions should be followed so that
work may be traced easily.  All examples in this document use "sub1" in the file names to indicate
that subset 1 of a scene is being processed.  The numbers 2, 3, or 4 should be substituted for the 1
in "sub1" to indicate the appropriate subset.

-------
1) To begin classifying a new subset, fill out the top of a new classification tracking form and start
       a new file folder to hold the printouts.  Remember to label all printouts with the scene and
        subset numbers and keep them in the folder (i.e., 1533 subset1).  The tracking forms and the
       folder of data printouts must be kept organized for QA/QC checks.

2) Run unitvarneO
              ERD> unitvarneO sub1.lan

3) Print VFILE.DAT, this listing  should be filed in the scene folder.
              ERD> lpr VFILE.DAT

4) Find the cumulative window count of 50% for each band on VFILE.DAT and mark on the
        printout.  For example, if the percentage closest to 50% falls at the range of 11-12, then
       select the number 12 as your variance threshold.

 5) Create a parameter file (.pfile) using textedit.  Use the naming convention outlined in the example
        below.
              ERD > textedit sub1a.pfile &

6) The .pfile contains all information  required by the kluster and maxopt programs. The file
       format must be exactly correct. Make sure you use a comma to separate  items on a record
       line.  The list below describes  the items  on each line.

    Record 1: Image file name
    Record 2, item 1: option of kluster
          1 = Euclidean distance option, 2 = quadratic threshold method
    Record 2, item 2: # of windows to skip along columns
    Record 2, item 3: # of windows to skip along rows
    Record 3: output statistics file name (.stf)
    Record 4, item 1: Unitvar variance threshold for channel 1
    Record 4, item 2: Unitvar variance threshold for channel 2
    Record 4, item 3: Unitvar variance threshold for channel 3
    Record 4, item 4: Unitvar variance threshold for channel 4
    Record 5, item 1: desired number of clusters
    Record 5, item 2: segment of  .stf file (use segment 1)
    Record 6: output classified image file name
    Record 7, item 1: standard deviation for maxopt, suggested value 2.1
                                         10

-------
        The .pfile for subset 1 of a scene should look similar to the example below:
                     sub1.lan
                     1,0,0
                     sub1.stf
                     11.0,25.0,54.0,17.0
                     80,1
                     sub1a.gis
                     2.1

 7) Run kluster.
              SS03% kluster sub1a.pfile

 8) Run printstf to create a listing of kluster results.  The command line must contain the .stf file
        name and the number of channels.  File the output in the scene folder.

              ERD > printstf sub1.stf 4   (4 indicates the number of bands)
              ERD> lpr printstf.log

        Write the total number of clusters before the final merge (during kluster) at the bottom of
        the printstf.log.  This number can be found at the bottom of the sub1.stf file.

9) The desired number of clusters is very scene-dependent and you will have to decide how many
       clusters you will need.  A minimum of 80 clusters and as many as 95 may be useful.  The
       output of the printstf will tell you, among other things, how many final clusters were found.
       To generate more clusters, edit record 5 in the .pfile and choose threshold values (record 4)
       corresponding to the cumulative window count of 40% instead of 50%.  To generate fewer
       clusters, use threshold values corresponding to the cumulative window count of 60% instead
       of 50%.  In one example, only 52 clusters were originally found; consequently, the 50%
       values were replaced with 40% values in the .pfile. Document this change on the printout of
       VFILE.DAT.
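
       As an illustration only (the 40% values shown here are hypothetical and must be read from
       your own VFILE.DAT printout), record 4 of sub1a.pfile might change from
              11.0,25.0,54.0,17.0
       to
              9.0,21.0,47.0,14.0
       with the rest of the .pfile left unchanged.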

10) Repeat steps 7 through 9 if required.

11) Put final printout of printstf in scene folder.

12) Run maxopt.
             SS03% maxopt sub1a.pfile

13) Run gisedit to remove bad pixel values, if any, generated in the first row of the .gis file. Set
      the pixel values to zero.
                                          11

-------
14) Run bstats on the output .gis file.  This listing must be kept in the scene folder.
              ERD> bstats
              Is this an Image or a GIS file: g
              Enter GIS filename: sub1a.gis
              Overwrite the file? y
              Make a header information listing? y
              Make a histogram listing? y
              Make a listing of the color scheme? n
              Listing to go to Printer, Terminal, or Both? b

15) Print a listing of the .pfile.  File the printout in the scene folder.
              ERD> lpr sub1a.pfile

16) Collect the appropriate hardcopy reference material for naming clusters.
       U.S. Geological  Survey Topographic maps
       NAPP  and NHAP aerial photography
       U.S. Geological  Survey Land Use/Land Cover Maps
       U.S. Fish and Wildlife National Wetland Inventory maps
       U.S. Department of Agriculture Agricultural Statistics Bulletins, by state
       U.S. Soil Conservation Service Soil Survey Bulletins, by county
       Landsat Thematic Mapper image maps

17) Label the clusters of the new .gis file created in maxopt (sub1a.gis).  First, use gisrain to
       generate a rainbow file containing color schemes that imitate three-band composite images.
       See the gisrain help screen for use of this program. Run display to view the .gis file.  Using
       colormod, update the trailer with the color scheme of your choice.  Also in colormod,
       highlight each cluster one at a time to identify the level 2 class it belongs to.  Use the
       reference materials obtained in step 16. Record the class number for each cluster on the
       "Cluster Labels" tracking form.  At this point do not edit the .gis file.  If classes require
       editing, note them on the tracking form and mark them on the image plotted in step 36.
       Clusters which include more than one cover type should be flagged for "re-cluster" on the
       tracking form. Note any areas which should be field checked.  The following is a
       simplified list of the names and numbers of the classification system:
                                          12

-------
       Level 0             Level 1                    Level 2

       Upland              1  Developed               11  High Intensity
                                                      12  Low Intensity
                           2  Cultivated Land         21  Woody
                                                      22  Herbaceous
                           3  Grassland               31  Herbaceous
                           4  Woody                   41  Deciduous
                                                      42  Mixed
                                                      43  Evergreen
                           5  Exposed Land            51  Soil
                                                      52  Sand
                                                      53  Rock
                                                      54  Evaporite Deposits
                           6  Snow & Ice              61  Snow & Ice
       Wetland             7  Woody Wetland           71  Deciduous
                                                      72  Mixed
                                                      73  Evergreen
                           8  Herbaceous Wetland      81  Herbaceous
                           9  Nonvegetated Wetland    91  Nonvegetated
       Water and           10 Water and              100  Water
       Submerged land         submerged land
       The classification system listed above was used for the Chesapeake Bay Watershed Pilot
project. However, it was established and classification begun before the final EMAP classification
system was determined.  Future projects may wish to follow the final EMAP classification system,
which was modified from the above in some categories.

18) Identify areas on the image for which air photos or other reference material is available.
       Record the file coordinates of several 1024x1024 windows and the photo numbers covered by
       each window on the "Cluster Labels" tracking form.  This step may be done while kluster
       and maxopt are running.
                                         13

-------
 19) Copy the old .gis to a new file name.
              sunss04% cp sub1a.gis sub1b.gis

 20) Copy the old .pfile to a new file name.
              sunss04% cp sub1a.pfile sub1b.pfile

21) Edit the new sub1b.pfile.  Make the following changes:

       Record 5, items 3-13: add the class numbers for those clusters needing re-clustering.  Cluster
             255 (unclassified pixels) may be included.  Make sure the numbers are separated by
             commas.  A maximum of 10 clusters may be listed.
       Record 6: specify the new output .gis file created in step 19.

       The file should look similar to the example below:
                     sub1.lan
                     1,0,0
                     sub1.stf
                     11.0,25.0,54.0,17.0
                     80,1,1,2,7,21,255
                     sub1b.gis
                     2.1
22) Run wckluster.
             SS03% wckluster sub1b.pfile

23) Use printstf to get a listing of wckluster results.  The command line must contain the .stf file
       name and the number of channels.
             ERD> printstf wckluster.stf 4  (4 indicates the number of bands)
             ERD> lpr printstf.log

24) Run wcmaxopt.
             SS03% wcmaxopt sub1b.pfile

25) Update the trailer.  Copy a trailer file containing names and numbers for the clusters into the
       working directory. This trailer file is useful when printing the recode.aud file later on.
             sunss04% cp /drs1/rsches/defaults/wckluster.trl sub1b.trl
                                          14

-------
 26) Run bstats on the output .gis file.  File the output in the scene folder.
              ERD> bstats
              Is this an Image or a GIS file: g
              Enter GIS filename: sub1b.gis
              Overwrite the file? y
              Make a header information listing? y
              Make a histogram listing? y
              Make a color table listing? n
              Listing to go to Printer, Terminal, or Both? b
              Printer unit? 0

 27) Print a hard copy listing of wcmaxopt.log.  File it in the scene folder.
              ERD> lpr wcmaxopt.log

 28) Run gisrain again to create a new rainbow file for your new .gis image.  Name the new clusters
       as in step 17.  Write the class numbers on the "Cluster Labels" tracking form.

 29) Run recode to create the final .gis file for this subset.  Create an "audit" file of this step to
        document how the clusters are recoded to class values.
               ERD> prep
               Enter audit file name: recode.aud
               ERD> recode

               Enter output file name: sub1r.gis

               ERD> noprep

30) Print a hardcopy of the audit file.  Check the recode values on the hard copy and make any
       necessary corrections to the audit file with textedit.  Make a new hard copy if necessary and
       put it in the scene folder.  Run the audit file with a batch command.

              ERD> lpr recode.aud
              ERD> batch recode.aud

31) Copy the final trailer file. A standard trailer file that contains the color scheme and class names
       for the final .gis files was created at the start of the project.  This file may be copied into
       your directory.
              sunss04% cp /drs1/rsches/defaults/final.trl sub1r.trl
                                          15

-------
32) Run bstats on the output .gis file.
              ERD> bstats
              Is this an Image or a GIS file: g
              Enter GIS filename: sub1r.gis
              Overwrite the file? y
              Make a header information listing? y
              Make a histogram listing? y
              Make a listing of the color scheme? n
              Listing to go to Printer, Terminal, or Both? b

33) Use display to view the .gis image with the new trailer.  Look at the image reduced to fit on
       the display screen and look for any problems.

34) Repeat steps 1-30 for each subset in the scene.  If the other subsets are completed, compare
       them to your newly completed subset.  Either display them side by side or temporarily stitch
       them together. Rerun any recodes that will improve the match of the subsets.

35) Plot a hard copy of the .lan file for your subset.  Run electromap to plot a linear stretch of
       bands 2, 3, 1 (TM bands 3, 5, 4) as R, G, B.  Use the default stretch options.  Plot at a
       scale of 1:250,000.  Your plot will probably use more than one strip of paper.  Tape the
       strips together.  Mark on the plot any unusual features and areas that will need to be edited
       in the classified .gis file and write an explanation in the margins.  Store the plot in the map
       drawer marked RSCHES.

36) Generate a plot of the recoded .gis image to mark any major edits.  First display your recoded
       .gis file. Run colormod and call up a rainbow file containing the class colors for the printer.
       The path name is /drs1/rsches/defaults/legend.rnb. Retrieve the look up table for the printer
       and update the trailer. Then run electromap and print the .gis file scaled to fit the page.
       Mark any areas that need editing.  Circle areas with a heavy pen and mark in the margin the
       class numbers that are to be changed (for example 51 -> 22). Put this image in the tracking
       book following the "Cluster Labelling" tracking form.

37) When all of the subsets for a scene are completed, back up all files for the scene on 8mm tape
       and print a copy of the tape log.  Use the following tar commands:
              SUN390% tar -cv tm1533
              SUN390% tar -tv > 1533back.up
              SUN390% lpr 1533back.up
       Record the tape number and the date on the tracking form and the printout of the backup log.
       File the backup log in the scene folder.
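
       The tar commands above assume the system's default tape device.  If the 8mm drive must be
       named explicitly on your workstation, the same backup and tape log can be made with the
       standard -f option, using the /dev/rst1 device name that appears elsewhere in these
       instructions (confirm the correct device name for your system):
              SUN390% tar -cvf /dev/rst1 tm1533
              SUN390% tar -tvf /dev/rst1 > 1533back.up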
                                          16

-------
                                 CLUSTER LABELS
TM path:
row:
subset #

     file coordinates
        X        Y      air-photo numbers
A:
B:
C:
D:
E:
F:
G:
H:

cluster   scan   wc   A   B   C   D   E   F   G   H   final   notes
   (blank rows for clusters 1 through 25)
                                      19

-------
cluster   scan   wc   A   B   C   D   E   F   G   H   final   notes
   (blank rows for clusters 26 through 70)
                                                   20

-------
                           CLASSIFICATION OF SUBSETS
TM path:
row:
subset #
Directory:
File name:
UNITVARNEO
      VFILE.DAT (file_

KLUSTER
      final pfile name:
      output stats file name:

      output of printstf (file

MAXOPT
      output GIS file name:
      BSTATS (file	)

      pfile after maxopt - with class names (file_

WCKLUSTER
      new GIS file name:
      new pfile name:
      output of printstf (file_

WCMAXOPT
      output GIS file name:
      BSTATS (file	)

       WCMAXOPT.LOG - with class names (file
                                     17

-------
 RECODE
       audit file name:
       output GIS file name:

       BSTATS (file	)
                 (file
 FILE CHECK LIST
       Image files
            sub1.lan
            sub1.sta
            sub1a.gis
            sub1a.trl
            sub1b.gis
            sub1b.trl
            sub1r.gis
            sub1r.trl
Miscellaneous files
    VFILE.DAT
     sub1a.pfile
     sub1b.pfile
     sub1.stf
     printstf.log
     wckluster.stf
     wcmaxopt.log
     recode.aud
        (Your file names may vary from sub1, sub2, sub3, etc.)

PRINT OUT CHECK LIST
           VFILE.DAT
           printstf.log - after first clustering
           bstats - for gis image after first clustering
           .pfile
           printstf.log - after "within class" clustering
           bstats - for gis image after "within class" clustering
           wcmaxopt.log
           recode.aud
            bstats - for gis image after recoding to the classification scheme

BACKUP
       tape number:

       date:

       tar listing (file_
                                         18

-------
cluster   scan   wc   A   B   C   D   E   F   G   H   final   notes
   (blank rows for clusters 71 through 115)
21

-------
cluster   scan   wc   A   B   C   D   E   F   G   H   final   notes
   (blank rows for clusters 116 through 150)
                                                  22

-------
                              SUBSET EDITING INSTRUCTIONS
        The following instructions outline step by step how classified Landsat Thematic Mapper
 (TM) data are edited in the Chesapeake Bay Watershed Pilot Project.  This document is designed to
 be used by an analyst during the process of editing the imagery.  It does not explain the
 methodology or results of processing the Chesapeake Bay imagery (see the section on Methodology).
 Before proceeding with these steps the TM data must be classified, projected into the final
 projection, and assigned a recode parameter file and rainbow file associated with it.  After
 completing the steps the user will have an edited, classified image ready to be combined with other
 subsets for the final coverage.

    Computer programs written at the Environmental Monitoring Systems Laboratory - Las Vegas
 (EMSL-LV) and Erdas software are required to complete the classification steps. Familiarity with the Unix
 operating system and Erdas software is assumed. The names of all programs are written in
 boldface.

              EMSL-LV programs
              classord - generates a new parameter file for recode2 with clusters grouped by similar
              surface types, and a new rainbow file
              recode2 - recodes cluster values in a .gis file

              Erdas programs
              listit - lists file header information
              display - displays a .gis image
              colormod - loads and modifies rainbow files (color lookup tables)
              gisedit - edits values in a .gis image

      The naming conventions for data files  should be followed so that work may be traced  easily.
All examples of image files use the scene path and row numbers followed by an "s" and a single
digit for the subset number (for example, 1533s2: path 15, row 33, subset  2).

1) Three files should be present in a new working directory: the classified TM subset file
       (tm1633s3.gis); the recode parameter file used when balancing and recoding the subset to its
       surrounding subsets in the larger coverage (recode.pfile); and the rainbow file containing
       color schemes for the "old" classification, the "new" balanced classification, and the
       imitation of the color composite (tm1633s3.rnb).  Retrieve these files from the tape backup
       of the master raster if necessary.
              sunss03% rsh sun390
              sun390% cd tm1633
              sun390% tar -xvf /dev/rst1 tm1633s3.gis recode.pfile tm1633s3.rnb
                                          23

-------
 2)  Copy the .gis file to a new file name for further processing:
              ERD> cp tm1633s3.gis 1633s3ue.gis  ("ue" for unedited)

 3)  Run listit to obtain the number of rows and columns in the .gis file.  Send the output to the
       terminal only.
              ERD>  listit

 4)  Edit the existing recode parameter file for use in classord.  The format of the file is as follows:
               line 1:  the new .gis file name   (1633s3ue.gis)
               line 2:  -1    (no change)
               line 3:  1,1, number of rows, number of columns
               line 4 to end:  cluster number, recode number    (no change)
        The only changes to be made are on line 1 (the file name) and on line 3 (the number of
        rows and columns).
               sunss03%  textedit recode.pfile
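
        As an illustration only (the image size and the cluster/recode pairs below are hypothetical;
        your retrieved recode.pfile will already contain one pair for every cluster), an edited
        recode.pfile for 1633s3ue.gis might look similar to the following:
               1633s3ue.gis
               -1
               1,1,5800,6200
               1,206
               2,206
               3,204
               4,204
               255,200
        with one "cluster number, recode number" line for every cluster in the file.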

 5)  Run the program classord. This program creates a new recode parameter file that will be used
        to recode the .gis image so that clusters of the same surface type will be grouped together.
        It also creates a rainbow file adjusted to be used for the new image.  Classord will run as
        follows:
               sunss03%  classord
               enter input parameter file name:  recode.pfile
               enter output parameter file name:  neworder.pfile
               enter input rainbow file name: tm1633s3.rnb
               enter output rainbow file name: neworder.rnb

6)  Print a copy of the log file output from running classord.
              sunss03%  lpr classord.log
       Copy the cluster range for each class in the space provided on the "Editing Subsets" tracking
       form.  This will be useful during editing.  Note that the first range of clusters (new value
       200) includes clusters that were unclassified, reclustered in wckluster, and clusters missed in
       the labelling process.

7)  Run the program recode2.  This program uses the recode parameter file created in classord to
       recode the .gis image file.  The program overwrites the original file.  The resulting image
       will look identical; only the order of the clusters will be different.  The program runs as
       follows:
              agws02% recode2 neworder.pfile

8)  Use Erdas display to display the .gis image after recoding.
                                          24

-------
9)  Use Erdas colormod to access the rainbow file neworder.rnb created by classord in step 5.
       Update the trailer of the .gis image to display the "new" classified color scheme created in
       the last major step of balancing the subsets.  The image should look just as it did before
       these editing steps were begun.  To ensure that all clusters have been assigned the correct
       colors, reassign colors to all clusters. Reassignment can be done very quickly in colormod
       by using the ranges of cluster values from the tracking form.  Note the colors of the
       unclassified pixels (old cluster value 255).  These may need to be changed to white to
       improve their color contrast on the image.  Save the new lookup table in the "neworder.rnb"
       rainbow file as "new2."  A color palette with the appropriate colors and code numbers can
       be found at the following path name:
              /drs1/rsches/defaults/codes.dat

10)  Retain the file with the "ue" unchanged in your directory.  Copy the file and its trailer to new
       names and perform the edits on the new file.
              sunss03% cp 1633s3ue.gis 1633s3e.gis
              sunss03% cp 1633s3ue.trl 1633s3e.trl

11)  If the image is in a region containing gypsy moth damage, record all cluster numbers that
       contain moth damage.  Do this by using colormod to first display the look up table that
       imitates the color composite and then "flash" each cluster with the color palette.  Record
       cluster numbers on the "Editing Subsets" tracking form.

12)  Gather all reference material available to aid in image editing. This material may  include air
       photos, 1:100,000 scale topographic maps, 1:24,000 topographic maps, soils maps, etc.


13)  Set the Erdas image-display drivers and display portions of imagery to begin editing the data.
       The imagery will be edited at full resolution; therefore, it must be displayed in smaller
       overlapping sections.  First be sure two image drivers are open; one should be 1024x1024,
       the other can be any size.  Use display to display the upper left corner of the classified
       image in the 1024x1024 driver at a magnification of 1 (not a reduction of 1).  Resize the
       second driver down so that only the button panel on the right of the window is showing.
       Use colormod to load the "unclassified" rainbow file in the second driver.  Note that the
       color scheme in the first driver changes to the lookup table used in the second driver because
       the display screen can only use one lookup table at a time. However, the RGB button on
       each driver window will return the lookup table of the window.  All work will be done in the
       first driver, but note that by "clicking" on the RGB buttons of the two windows you can
       alternately view the classified and the color composite lookup tables.
                                          25

-------
14)  Perform the image editing with gisedit.  All edits will assign pixel values in the 201 to 219
       range of pixel values to prevent overlapping with the pixel values of the clusters.  See the list
       of class names and corresponding pixel values on the "Editing Subsets" tracking form.  For
       large areas that can be outlined exactly, all pixels in a polygon may be changed to a single new
       value. However, for the majority of the edits, scattered pixels of a particular class (color)
       will need to be changed to a different class. The range of the clusters for each class copied
       in step 6 will make these edits easier. For example, if scattered pixels labelled (colored) as
       "soil" in a field need to be changed to the label (color) of "cultivated," just circle the whole
       field and change the range of cluster values listed for "soil" to 204, the value for
       "cultivated."

       As sections of each subset are edited, be sure to record the center coordinates of each section
       on the "Editing Subsets" tracking form.  Space is also provided on the tracking form to note
       any unusual features or problems in each section.  In addition to editing cluster values,
       vectors should be drawn down major roads (202), power lines (205), and airport runways
       (201 or 202) where features would be lost to smoothing.  Do not bother drawing vectors on
       anything but major highways.  Where a road with a vector crosses out of a subset, mark an
       arrow on the classified printout in the tracking book so that the adjacent subset will continue
       the vector.
15)  Run display to view the entire subset after editing to look for mistakes and consistency in
       editing across the subset. Correct any errors that are found.

16)  Run bstats on the final image.  Save the hard copy output in the scene folder.
                                           26

-------
                                    EDITING SUBSETS
TM Path:
Row:
Subset #:
old gis file name:
old recode parameter file name:
old rainbow file name:
new gis file name (before editing):
gis file - number of rows           number of columns
new recode parameter file name:
new rainbow file name:
print out classord.log
gis file name after editing:
bstats from edited gis file (        )
                                         27

-------
The range of cluster values for each class:

                                          class values    cluster
Level 1                    Level 2          old    new     range
Unclassified                                255    200
Developed  . . . . . . . . high density      11    201
                           low density       12    202
Cultivated land  . . . . . woody             21    203
                           herbaceous        22    204
Grassland  . . . . . . . . herbaceous        31    205
Woody  . . . . . . . . . . deciduous         41    206
                           mixed             42    207
                           evergreen         43    208
Exposed Land . . . . . . . soil              51    209
                           sand              52    210
                           rock              53    211
                           evaporites        54    212
Snow and Ice . . . . . . . snow and ice      61    213
Woody Wetland  . . . . . . deciduous         71    214
                           mixed             72    215
                           evergreen         73    216
Herbaceous Wetland . . . . herbaceous        81    217
Nonvegetated Wetland . . . nonvegetated      91    218
Water/Submerged Land . . . water            100    219
Numbers of clusters with moth damage:
                                        28

-------
 1024 x 1024 Edit Areas

          Center
        Coordinate
  Area    X      Y     Comments
   10
   11
   12
   13
   14
   15
   16
   17
   18
   19
   20
   21
   22
                                         29

-------
   23
   24
   25

Check Lists

For each image subset, make sure the following printouts are in the scene folder, and the
following files are left in the subset directory:
Print Outs
     classord.log
     bstats.out
Files
      1633s3ue.gis
      1633s3ue.trl
      1633s3e.gis
      1633s3e.trl
       classord.log
      neworder.pfile
      neworder.rnb
      recode.pfile
                                            30

-------
                 COMBINING SUBSETS INTO THE MASTER-RASTER
        The following process steps are used to combine the subsets into a single raster file.
 This single file is called the "master-raster" throughout this document.  The files that were
 output from the wcmaxopt program will be added to the master-raster.  The labels
 previously assigned to the clusters will be compared to the subsets previously added to the
 master-raster.  Modifications may have to be made to the labels to match, as best as possible,
 the other subsets.  Once final labels have been identified, the subset will be recoded to a
 classification numbering scheme that is unique to the master-raster. This numbering scheme
 uses values between 200 and 219, thus making it possible to manipulate the colors of cluster
 values (ranging from 1 to 199) without affecting the colors of the surrounding completed data.

        Computer programs written at the Environmental Monitoring Systems Laboratory - Las Vegas
 (EMSL-LV) and ERDAS version 7.5 software are required to complete these steps.
 Familiarity with the UNIX operating system and ERDAS software is assumed. The names
 of all programs are written in boldface.

        EMSL-LV programs
        dighead - generates a ".dig" file from the header of a raster file
        listhead - lists contents of the header of a raster file
        digcorners - lists the extreme X and Y coordinates in a ".dig" file
        mapcon - generates GCP points used to transform a raster file into a new projection
        georef - uses the output of mapcon to calculate coefficients for projecting a raster file
        geomap - uses the output of georef to create a raster file in a new projection
        recode2 - changes the pixel values in a ".gis" file

        ERDAS programs
        ccvrt - converts coordinates in a ".dig" file to a new projection
        subset - overwrites a raster file with data from another raster file
        display - displays a ".gis" file on the computer screen
        colormod - allows interactive changes to colors associated with pixel values
        fixhed - allows manipulation of the contents of the header in a raster file
        curses - displays the pixel values currently displayed on the computer screen

        Documentation of the ERDAS programs may be found in the ERDAS 7.5 software
manuals.  All  EMSL-LV software is considered public domain and was developed for EPA
under contract number 68-CO-0050 to Lockheed Engineering & Sciences Company.  Source
code is available upon request.  Documentation of the EMSL-LV  software is included in this
final report of the Chesapeake Bay Pilot Project.
                                          31

-------
        The naming conventions for data files used in these instructions should be followed to
 facilitate QC checks and process tracking.  Most of the files have standard names.  A few
 file names contain the TM path and row and subset number.  Several of the examples in this
 document use "tm1533s1" within the file names to indicate that subset 1 of path 15, row 33
 is being processed.  The appropriate path, row, and subset numbers should be substituted
 where applicable.

        Retrieve File, Project to Albers, Subset, Recode, and Archive, a total of five major
 operations comprising 41 steps, are described below.
Retrieve File

       Retrieve the output of the classification from the tape archive to begin the procedure.
You will be using the output from wcmaxopt.  The pixels in this file contain the cluster
numbers.

1) Record the TM path, row number, and the subset number on the top of the "Combine
       Files into  Master Raster" tracking form.

2) Get 8-mm tape number from  "Classification of Subsets" tracking form.  Record it on the
       "Combine Files into Master Raster" tracking form.

3) The entire file name, including the path, must be specified to retrieve the file. This name
       can be found on the listing of the 8-mm tape. If a listing is not available, one can be
       made using a tar command. Insert the proper 8-mm tape into the tape drive, log onto
       the SUN390, and use the following command:
              sun390% tar -tvf /dev/rst1 > tape.log

       This command will create a file called "tape.log,"  which can be printed, containing a
       listing of the entire tape,  specifying complete file names.
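
       If a hard copy of the listing is wanted, it can be printed in the usual way, for example:
              sun390% lpr tape.log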

4) Record the ".gis" file name, including the path, on the "Combine Files into Master
       Raster" tracking form.  If there is a rainbow (".rnb") file name instead of or hi
       addition to the ".gis" file name, record its  name on the tracking form.  If no rainbow
       file  exists, record the trailer (".trl") file name.

5) Insert the proper 8-mm tape into the tape drive, if you have not already done so.  The tar
       command used to retrieve the files must include the complete file names with the
       path.  Log onto the SUN390 and retrieve the files with a command line similar to the
       following (this is all one long command line):
              sun390% tar -xvf /dev/rst1 tm1533/subset1/sub1b.gis
              tm1533/subset2/sub1bgis.rnb
                                          32

-------
       Your command line will vary, depending on the path and file name. If you are
       retrieving the trailer file rather than the rainbow file, the command will differ
       accordingly.

6) Record the output file name on the "Combine Files into Master Raster" tracking form.

Project to Albers

Project the subset from Universal Transverse Mercator (UTM) to Albers by following these steps.

7) Record UTM zone on "Combine Files Into Master Raster" tracking form. The UTM zone
       may  be found on the "Receiving TM Data" tracking form.

8) Run the dighead program to create a ".dig" file of scene boundaries. The output file name
       should be specified as "utm1.dig."  The command line should be similar to this:
              sunss03% dighead sub1b.gis utm1.dig

9) Record the upper left and lower right UTM coordinates,  displayed by the dighead
       program, on the "Combine Files Into Master Raster" tracking form.

10) Run listhead to display the contents of the ".gis" file header.  The corner UTM
       coordinates are listed so you may verify the coordinates recorded in step 9.  Record
       the number of rows and number of columns in the file.  These will be listed as
       number of columns and number of rows.  They are recorded on the tracking form in
       reverse order (rows, columns) to simplify later processing steps.
              sunss03% listhead sub1b.gis

11) The program mapcon will be run to create a set of control points to be used to calculate
       the projection parameters.  Create a parameter file called "geo1.pfile" to be used as
       input to the mapcon program.  This file has the following parameters (correct entries
       on right):
              Line 1: Name of control point file  . . . . . . . . . . . . . . . .   geo.cfile
             Line 2: C,  L, or W, where C requests a cubic fit, L a linear fit,  and W a
                   weighted linear fit	   C
             Line 3: The following elements  must be separated by commas:
               element 1: scan tolerance  value measured in pixels  	   1.0
               element 2: element tolerance value measured in pixels	   1.0
               element 3: UTM zone of input file   	   see step 7
               element 4: latitude of the origin of output projection 	   0.0
               element 5: longitude of the origin of output projection  . .   -77.83333333333
               element 6: output projection type: 1 = Albers,  2 = Lambert   	  1
               element 7: first standard parallel (Albers only)   	  38.0
               element 8: second standard parallel (Albers only)	  42.0
             Line 4: The word YES or the word NO, for listing of proceedings.  If 'YES',

                                         33

-------
                    georef will generate a file named GEOPNTS.DAT and output relevant
                    information  	   Yes
             Line 5: The following elements must be separated by commas:
               element 1: Name of input file	  see step 6
               element 2: Type of input file - ELAS  or ERDAS  	ERDAS
                element 3: BL for a bilinear interpolation or NN for a nearest neighbor fit . .  NN
              Line 6: Name of output file   . . . . . . . . . . . . . . . . . . .  tmPPRRs#.gis
                    where PP is the TM path, RR is the TM row, and # is the subset
                    number.
              Line 7: The following elements must be separated by commas:
                element 1: Pixel width in meters  . . . . . . . . . . . . . . . .  25.0
                element 2: Pixel width in meters  . . . . . . . . . . . . . . . .  25.0
              Line 8: The following elements must be separated by commas:
               element 1: UTM X coordinate of the upper left	   see step 9
               element 2: UTM Y coordinate of the upper left	   see step 9
               element 3: UTM X coordinate of the lower right   	   see step 9
               element 4: UTM Y coordinate of the lower right   	   see step 9
               element 5: line to start processing   	  1
               element 6: number of lines to process   	  see step 10
               element 7: element to begin processing  	  1
               element 8: number of elements to process   	  see step 10

       All parameters for line 8 can be taken off the "Combine Files Into Master Raster"
       tracking form. The file should be called "geo1.pfile" and be similar to the following:

              geo.cfile
              C
              1.0,1.0,18,0.0,-77.83333333333,1,38.0,42.0
              YES
              sub1b.gis, ERDAS, NN
              tm1533s1.gis
              25.0,25.0
              265450.0,4397225.0,397675.0,4219050.0,1,7128,1,5290
       Lines 1, 2, 4, and 7 will be exactly the same. Line 3 may differ depending on the
       input UTM zone only (element 3); all other parameters must be the same.  Lines 5 and
       6 will differ only in the input and output file names, respectively. Line 8 will differ
       depending on the UTM coordinates and file size of the input file.

12) Run the program mapcon to create a set of control points to be used to calculate the
      projection parameters.  Use the above parameter file (step 11).
              sunss03% mapcon geo1.pfile
                                         34

-------
 13) Make a hard-copy listing of both the parameter file and the log file.  Mark the TM path
       and row and the subset number on these printouts and file them in the folder for this
       scene.
              sunss03% lpr geo1.pfile
              sunss03% lpr mapcon.log

 14) Use the ccvrt program to convert the UTM coordinates in the "utm1.dig" file (see step
        8) into Albers projection. The following example answers should help respond to the
        program prompts.  All answers below will be the same for all subsets except the
        UTM zone.
              ERD> ccvrt
              Options: (D,T,A) [DIG file] : DIG file
              Enter INPUT filename : utm1.dig
              Enter UTM zone number ? [] : 18
              North or South of the equator? (N,S) [North] :  North
              Enter spheroid number ? [] : 1
              Enter OUTPUT filename : albers1
              What is the OUTPUT Coordinate Type? 3 (Albers Conical Equal Area)
              Enter LATITUDE of FIRST STANDARD PARALLEL? [] : 38
              Enter LATITUDE of SECOND STANDARD PARALLEL? [] : 42
              Enter LONGITUDE of CENTRAL MERIDIAN? [] : -77.83333333333 (use
                    10 3's)
              Enter LATITUDE of ORIGIN of PROJECTION? [] : 0
              Enter FALSE EASTING at CENTRAL MERIDIAN? [] : 0
              Enter FALSE NORTHING at ORIGIN? [] : 0

15) Run the digcorners program to obtain the new raster file coordinates. The command line
       should be exactly like this:
              sunss03% digcorners albers1 albers2

16) Record the upper left and lower right Albers coordinates displayed by the  digcorners
      program on the "Combine Files into Master Raster" tracking form. Verify  these
      coordinates by comparing them to the "mapcon.log" file. Due to differences in
      orientation, the X and Y will not necessarily be listed in the same corner, and will
      probably be slightly different.  The mapcon program simply projects the old UTM
      file corners. The output of digcorners is more exact and lists the extreme minimum
      and maximum X and Y coordinates.  If you cannot find any coordinates in the
      "mapcon.log" file that are similar (within 10 m) to the coordinates output from
      digcorners, then check the contents of "geol.pfile" (see step 11) and repeat steps 12
      to 15.  It is more likely that  mapcon was run incorrectly than dighead, ccvrt, and
      digcorners.  If you are sure  mapcon is correct,  rerun dighead, ccvrt,  and
      digcorners  (steps 8, 14, and  15, respectively).  This  is a critical step; do not proceed
      until you are sure everything was done correctly to this point.
                                       35

-------
 17) Round off corners to multiples of 50 so they match the master-raster.  Use the following
       conventions and be careful with negatives:
              Upper left X:  round DOWN to the nearest multiple of 50.
              Upper left Y:  round UP to the nearest multiple of 50.
              Lower right X: round UP to the nearest multiple of 50.
              Lower right Y: round DOWN to the nearest multiple of 50.
       Remember that rounding a negative up results in a smaller negative number.  Record
       these coordinates on the "Combine Files into Master Raster" tracking form.
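
        As a hypothetical illustration, an upper left corner of (-12,337.4, 4,146,262.8) would round
        to (-12,350, 4,146,300) and a lower right corner of (8,337.4, 3,963,912.6) would round to
        (8,350, 3,963,900).  Note that rounding the negative upper left X down gives the larger
        negative value -12,350, while rounding a negative up (as for a lower right X) would give the
        smaller negative value, -12,300.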

18) Copy the above parameter file (see step 11) into a new file for editing.  The new file
       should be called "geo2.pfile."
               sunss03% cp geo1.pfile geo2.pfile

 19) Edit line 8 of the parameter file and replace the UTM coordinates in line 8 with the new
       Albers coordinates from step  17.  The new parameter file should be similar to the
       following:
                    geo.cfile
                    C
                    1.0,1.0,18,0.0,-77.83333333333,1,38.0,42.0
                    YES
                     sub1b.gis, ERDAS, NN
                     tm1533s1.gis
                    25,25
                    8350,4146250,146100,3963900,1,7128,1,5290
20) Make a hard-copy printout of the new parameter file and save it in the folder for this
        scene.
               sunss03% lpr geo2.pfile

21) Before running the georef program (the next step), the file coordinates of the upper left
       corner  must be 1,1.  This position may be verified using the listhead program (see
        step 10). If the upper left file coordinates are not 1,1 use the ERDAS program
        fixhed to change it.  Do not change any other header information.
22) The georef program uses the output from mapcon to calculate the necessary
        transformation coefficients to project the file from UTM into Albers projection.  Run
        the georef program, using the above parameter file (step 19) as input.
              sunss03% georef geo2.pfile
23) The geomap program uses the output from georef and actually creates the new raster
       file.  Run geomap using the above parameter file (step 19) as input.
             sunss03% geomap geo2.pfile
                                          36

-------
 24) Record the output ".gis" file name (line 6, step 11) on the "Combine Files into Master
       Raster" tracking form.

 25) Check the output using the ERDAS display program.  If you were able to retrieve a
        rainbow file from the tape archive (see steps 4 and 5) you may use colormod to retrieve
        a color scheme.  Otherwise, the trailer will have to be copied to the new file name
        using the following command line as an example:
               sunss03% cp sub1b.trl tm1533s1.trl

        Create a new rainbow file using the ERDAS colormod program option "r -
        RAINBOW file I/O."  Use a file name similar to that recorded in step 24 (for the
        above example the rainbow file name would be "tm1533s1.rnb"). Save the color
        scheme that resembles the raw data and call it "unclassed."

        When viewing the file, make sure none of the corners was inadvertently lost in the
        projection process.  If they were lost, the rounded corner coordinates (step 17) were
        probably not computed correctly, or were entered into "geo2.pfile" incorrectly (step
        19).  Repeat steps 17 to 24 until the entire file is transformed correctly.

Subset

Add the subset to the master-raster by the following steps.

26) Figure out the master-raster file coordinates for the upper left and lower right corners of the
        subset.  Use the rounded coordinates from above (step 17) and the following formulas:
               Master X file coordinate = (257525 + Albers X) / 25
               Master Y file coordinate = (4507925 - Albers Y) / 25

        Record these coordinates on the "Combine Files into Master Raster" tracking form.
        Do these calculations carefully and double-check the results.  A miscalculation can
        result in loss of data in the master-raster.
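
        As a hypothetical check of the arithmetic, a rounded Albers corner of X = 10,000 and
        Y = 4,400,000 would give:
               Master X file coordinate = (257525 + 10000) / 25 = 10701
               Master Y file coordinate = (4507925 - 4400000) / 25 = 4317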
                                          37

-------
27) Use the ERDAS subset program to add the subset to the master-raster.  Make sure that
        you use the proper file coordinates from step 26 above as the place for the upper left
        corner (line 5 below).  Also specify that you want the output file overwritten (line 4
        below), but that zero values should NOT overwrite existing data (line 6 below). Use
        care when answering the prompts because the master-raster will be overwritten and it
        will be difficult to correct mistakes.  The following responses illustrate important
        correct answers to prompts:
               Image or GIS file? GIS
               Enter Input GIS filename : tm1533s1
               Enter Output GIS filename : /drs4/rsches/master-raster/master
               Overwrite the file? Yes
               Enter output file coordinates at which to place
                     upper left corner of input subset? 10636 14468
               Should input zero values overwrite data in the output file? No

28) Use the ERDAS display program to view the master-raster and assure that the subset fits
        into its proper place.  Stay in the current directory and specify the full path name,
        "/drs4/rsches/master-raster/master.gis."

29) Use the ERDAS program colormod, option "r," to retrieve the color scheme created in
       step 25.  This program will shade the new subset correctly, but black out the other
       values of the  master-raster.  Use colormod option "c - color  palette entry" to reset
       the master-raster colors correctly.  Use the "o - open new palette file" option and use
       the file: "/drsl/rsches/defaults/codes.dat."  Modify the colors according to the
       following table:
GIS value   Color name    Red   Green   Blue
   200      sun-tan       174    171     128
   201      11            175      0       0
   202      12            255      0       0
   203      21            200    125      50
   204      22            240    185     130
   205      31            255    255       0
   206      41              0    255       0
   207      42              0    200       0
   208      43              0    150       0
   209      51            200    200     200
   210      52            150    150     150
   211      53            100    100     100
   212      54             50     50      50
   213      61            255    255     255
   214      71              0    255     255
   215      72              0    201     200
   216      73              0    150     150
   217      81            255      0     255
   218      91            175      0     175
   219      100             0      0     200
                                           38

-------
        When you are done, the new subset should have colors resembling the original data,
        and the data previously entered into the master-raster should have the standard
        classification colors.  Using the colormod program option, "r - RAINBOW file I/O,"
        retrieve the rainbow file created in step 25 and save the color scheme, using the name
        "unclassed."  This action will replace the color scheme created in step 25 with the
        one currently on display.

 30) Using the colormod program option  "t - trailer update of GIS file," place the color
        scheme into  the master-raster trailer file.

 Recode

 Review the previous cluster labels, make changes to the previous  labels  as needed to
edge-match the subset to the master-raster, and recode the subset to the classification values
 by the following steps.

31) Fill out the "Master-Raster Recode" tracking form.  This form is used to record the
        original cluster labels and the new labels within the master-raster.  The columns for
        "old" labels and for the "change" should contain numbers from the classification
        numbering scheme (11 = Developed - high intensity, 12 = Developed - low
        intensity, etc.).  The only entries in the "change" column should be for those clusters
       whose label will be changed (including the "recluster" clusters whose new value will
       be 200).   The column marked "new" labels will correspond to a numbering scheme
       that ranges from 200 to 219 (see step 35 below).

       Enter the  Path, Row,  and Subset at the  top of the form.  Retrieve the "Cluster
       Labels" tracking form from the folder and fill out the "old label"  column of the
       "Master-Raster Recode" tracking form.

32) This step will create a color scheme that reflects the  classes as originally labelled.
        Display the master-raster using the ERDAS display program.  Take the default
        reduction factor that allows the entire master-raster to be displayed.  Using the
        ERDAS program colormod option "c - color palette entry" set the cluster colors to
        correspond to the label classes from the "Master-Raster Recode" tracking form (step
        31). Use the "o - open new palette file" option and use the file:
        "/drs1/rsches/defaults/codes.dat."  Modify the colors of the original clusters according
       to the following table:
                                           39

-------
Label value   Color name   Red   Green   Blue   Description
     0        black        174    171     128   Areas outside the watershed
    11        11           175      0       0   Developed - High Intensity
    12        12           255      0       0   Developed - Low Intensity
    21        21           200    125      50   Cultivated - Woody
    22        22           240    185     130   Cultivated - Herbaceous
    31        31           255    255       0   Herbaceous
    41        41             0    255       0   Woody - Deciduous
    42        42             0    200       0   Woody - Mixed
    43        43             0    150       0   Woody - Evergreen
    51        51           200    200     200   Exposed - Soil
    52        52           150    150     150   Exposed - Sand
    53        53           100    100     100   Exposed - Rock
    54        54            50     50      50   Exposed - Evaporite Deposits
    61        61           255    255     255   Snow & Ice
    71        71             0    255     255   Woody Wetlands - Deciduous
    72        72             0    200     200   Woody Wetlands - Mixed
    73        73             0    150     150   Woody Wetlands - Evergreen
33) Using the colormod program option "r - RAINBOW file I/O," retrieve the rainbow file
       created in step 25 and save the color scheme using the name "old."

34) The entire edge between the new subset and the previously added data must be visually
        inspected to determine if changes to the cluster labels must be made.  Display the
        master-raster using the ERDAS program display with a magnification factor of 1.
        This step must be repeated until all portions of the subset border are visited.

        During this step, determine whether a change in a cluster label can be made in a way
        that minimizes the differences between subsets. Look for areas of homogeneous cover
        type that straddle the edge and make sure that there is no difference between subsets.
        Any changes to a cluster label must be noted on the "Master-Raster Recode" tracking
        form.  Changes to the cluster labels should be kept to a minimum.  Remember that
        the original analyst used a variety of reference material and studied areas throughout
        the scene to determine the original cluster labels.  Before making label changes,
        consider the original analyst's notes and the impact of the changes.

        There are no step-by-step instructions for this process. The following is a list of
        routines, programs, and processes that may be of some use.
        - Use the colormod option "r - RAINBOW file I/O" to retrieve the rainbow file
              created in step 25 and updated in step 33.  Toggling between the two color
              schemes may be useful for interpreting the image.
        - If you decide on a label change, make a new color scheme reflecting that change
              and save it in a temporary rainbow file, along with the "old" scheme from the
              rainbow file list above.  Use the same technique as above to toggle between
              the "old" and "new" color schemes.  NOTE: A bug in the colormod program
              causes problems in saving the correct color scheme with the correct name. In
                                          40

-------
               the past, this bug has resulted in loss of the entire file.  Since you will be
               saving the rainbow file created in step 25, it is suggested that another
               temporary rainbow file be made for this technique.
        - The ERDAS program curses can be used to find the cluster number that may need
               changing.
        - Keep the color scheme from step 30 in the trailer of the master-raster. This action
               will make it easier to see the edge when a new section is displayed.
        - If you are advancing the display along a relatively vertical edge of the subset, use
               the "Keyboard" option when entering file coordinates in display and simply
               add or subtract 1000 to the Y coordinate.  Similarly, along relatively horizontal
               edges, add or subtract 1000 to the X coordinate.
        - Look for notes on the "Cluster Labels" tracking form that may indicate which
               clusters the analyst had problems labelling.  Sometimes these notes can
               indicate an alternative label that may be a better match.  Discuss the changes
               with the original analyst to determine if the change is appropriate.
    This step must be repeated so that the entire edge of the subset is viewed at a
        magnification factor of 1.

35) Fill out the "new label" column of the "Master-Raster Recede" tracking form. This
       column will have an entry for eveiy cluster in the file.  Most of the entries should be
       a simple translation of the  "old label" column values into the "new label" column
       values using the table below.  The exceptions are those clusters with an entry hi the
       "change" column. The following new values will be used as the  master-raster class
       numbers:
New value   Old value   Description
    0           0       Areas outside of the watershed
  200         255       Unclassed areas within the watershed
  201          11       Developed - High Intensity
  202          12       Developed - Low Intensity
  203          21       Cultivated - Woody
  204          22       Cultivated - Herbaceous
  205          31       Herbaceous
  206          41       Woody - Deciduous
  207          42       Woody - Mixed
  208          43       Woody - Evergreen
  209          51       Exposed - Soil
  210          52       Exposed - Sand
  211          53       Exposed - Rock
  212          54       Exposed - Evaporite Deposits
  213          61       Snow & Ice
  214          71       Woody Wetland - Deciduous
  215          72       Woody Wetland - Mixed
  216          73       Woody Wetland - Evergreen
  217          81       Herbaceous Wetland
  218          91       Non-Vegetated Wetland
  219         100       Water
                                          41

-------
36) The program recode2 will be used to change the pixel values from the cluster numbers to
        the master-raster classification numbers.  Create a parameter file to be used for
        recoding the portion of the master-raster containing the subset.  This file should be
        called "recode.pfile" and contain the following (correct entries on right):
              Line 1: The input file name  . . . . . . .  /drs4/rsches/master-raster/master.gis
              Line 2: A default recode value for pixels with values other than those in the
                     recode table starting on line 4. Any negative value specifies that these
                     pixels retain their original values   . . . . . . . . . . . . .    200
              Line 3: The following parameters must be separated by commas or spaces:
                element 1: line to begin processing  . . . . . . . . . . . . . . .   see step 25
                element 2: column to begin processing  . . . . . . . . . . . . . .   see step 25
                element 3: line to end processing  . . . . . . . . . . . . . . . .   see step 25
                element 4: column to end processing  . . . . . . . . . . . . . . .   see step 25
              Line 4 - end of file: Starting in line 4, each line should have two integers
                     separated by commas or spaces.
                element 1: old pixel value   . . . . . . . . . . . . . . . . . . .   see step 28
                element 2: new pixel value   . . . . . . . . . . . . . . . . . . .   see step 28
   Your file should be longer, but similar to the following:

                     /drs4/rsches/master-raster/master.gis
                     200
                     9369,20795,14607,23517
                     0   0
                     1   206
                     2   206
                     3   204
                     4   204
                     5   208
                     6   205
                     7   205
                     8   216
                     255 200

       Lines 1, 2 and 4 should always be the same as above. The line 3 parameters can be
       taken from the "Combine Files Into Master Raster" tracking form (see step 25).  The
       file coordinates in line 3 must be entered in reverse order from the way they are listed on the
       tracking form (i.e. Y,X,Y,X not X,Y,X,Y).  The file should have a line for every
       cluster number.  All "new" values (the right column) must be between 200 and 219.
       The last line should change pixels with value 255 (unclassed) to the new class 200.
                                           42

-------
 37) Obtain a printout of the above (step 36) parameter file and compare it to the "Master-
        Raster Recode" tracking form.  Make sure the contents of this file are correct before
        proceeding to the next step.  File the hardcopy in the folder for this subset.
               sunss03% lpr recode.pfile

 38) Run the recode2 program to change the values in the master-raster.
               sunss03% recode2 recode.pfile
Archive


Rather than making separate backup 8-mm tapes for each subset, all important interim files
will be kept on this system until all subsets are completed.  The entire master-raster directory
will be archived at once.

39) Remove the original files that were retrieved from archive in step 5. Also remove any
       temporary files that you may have generated during processing.  Use the checklist on
       the "Combine Files Into Master Raster" tracking form to ensure that you delete only the
       unnecessary files.
              sunss03% rm sublb*

40) Compress all files in the directory.
              sunss03% compress *

41) Use the checklist  on the "Combine Files Into Master Raster" tracking form to make sure
      all files and hardcopy outputs exist.
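
      If desired, this check can be scripted.  The following Python sketch is illustrative
      only (it is not part of the standard procedure; the file names are taken from the
      File Checklist on the tracking form, and the subset-specific ".gis", ".rnb", and
      ".trl" names will differ for each subset).  It reports any checklist file that is
      missing in either plain or compressed (".Z") form:

              # Illustrative checklist verification (not part of the standard procedure).
              import os

              checklist = ["GEOPNTS.DAT", "albers1.dig", "albers1.pro", "albers2.dig",
                           "geo.cfile", "geo1.pfile", "geo2.pfile", "mapcon.log",
                           "recode.pfile"]   # add the subset-specific .gis/.rnb/.trl files

              missing = [name for name in checklist
                         if not (os.path.exists(name) or os.path.exists(name + ".Z"))]
              print("missing files:", ", ".join(missing) if missing else "none")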
                                          43

-------
                             COMBINE FILES INTO MASTER RASTER

  Path: ________     Row: ________     Subset: ________

  Retrieve file
      8-mm tape number: ____________________
      ".gis" file name: ____________________
      ".trl" file name: ____________________
      output file:      ____________________

  Project to Albers
      old UTM coordinates                         zone: ________
          upper left:  ____________________
          lower right: ____________________
      old file size
          number of rows: ________     number of columns: ________
      parameter file (file ________)
      mapcon.log file (file ________)
      new Albers coordinates
          upper left:  ____________________
          lower right: ____________________
      Albers coordinates rounded to multiple of 50
          upper left:  ____________________
          lower right: ____________________
                                              44

-------
      parameter file (file ________)
      output ".gis" file name: ____________________

  Subset
      Master file coordinates
          upper left:  ____________________
          lower right: ____________________

  Recode
      recode parameter file (file ________)

  File Checklist:
         _____ GEOPNTS.DAT
         _____ albers1.dig
         _____ albers1.pro
         _____ albers2.dig
         _____ geo.cfile
         _____ geo1.pfile
         _____ geo2.pfile
         _____ mapcon.log
         _____ recode.pfile
         _____ tm1533s1.gis
         _____ tm1533s1.rnb
         _____ tm1533s1.trl
         _____ utm1.dig
         _____ utm1.pro

  Printout Check List:
         _____ geo1.pfile
         _____ mapcon.log
         _____ geo2.pfile
         _____ recode.pfile
                                                 45

-------
                               MASTER-RASTER RECODE
   Path: ________     Row: ________     Subset: ________
   new label   old label   change     notes
 1:
 2:
 3:
 4:
 5:
 6:
 7:
 8:
 9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
                                          46

-------
     new label    old label    change      notes
 26:
 27:
 28:
 29:
 30:
 31:
 32:
 33:
 34:
 35:
 36:
 37:
 38:
 39:
 40:
 41:
 42:
 43:
 44:
 45:
 46:
 47: _
48:
49:
50:
                                             47

-------
    new label    old label    change      notes
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:
63:
64:
65:
66:
67:
68:
69:
70:
71:
72:
73:
74:
75:
                                             48

-------
     new label   old label    change      notes
 76:
 77:
 78:
 79:
 80:
 81:
 82:
 83:
 84:
 85:
 86:
 87:
 88:
 89:
 90:
 91:
 92:
 93:
 94:
 95:
 96:
 97: _
 98: _
 99:
100:
                                             49

-------
     new label    old label    change      notes
101:
102:
103:
104:
105:
106:
107:
108:
109:
110:
111:
112:
113:
114:
115:
116:
117:
118:
119:
120:
121:
122:
123:
124:
125:
                                              50

-------
      new label    old label    change      notes
 126:
 127:
 128:
 129:
 130:
 131:
 132:
 133:
 134:
 135:
 136:
 137:
 138:
 139:
 140:
 141:
 142:
 143:
 144:
 145: _
 146:
 147: _
 148: _
 149: _
150:
                                             51

-------
               Photographic Accuracy Assessment Form
Landsat Scene ID #: ______________________
Classified Data Set: ______________________
Classification System: ______________________
Classification Accuracy Table, File: ______________________
Photo Coverage .dig File: ______________________
Date of Photograph: ______________________
Stereo Coverage: ______________________
Sample Site #: ______________________
Projection: ______________________
Projection: ______________________
1:100,000 Scale Map: ______________________
Photo Quality: ______________________
Projection: ______________________
Frame #: ______________________
Photo Media: ______________________
Primary Class: ______________________
Secondary Class: ______________________
Sample Site Characteristics and Components:

Analyst: ______________________
Date of Interpretation: ______________________
                                52

-------
                    Sample Site Component Grid

Photograph Frame #: _____________          Sample Site #: _____________
                               53

-------
 CHESAPEAKE BAY WATERSHED METADATA

 Data_set_identity: Chesapeake Bay Watershed Thematic Land Coverage/Land Use.
 Theme_keywords: Chesapeake, thematic, land cover, land use,  watershed.
 Representation_model:  Vector-topologic.
 Spatial_object_types:  Pixel/Grid
 Native_data_set_size:  57 MB
 Transfer_format:  ARC Grid
 Transfer_size:  57 MB
 Data_set_description:  Landsat Thematic coverage of the Chesapeake Bay Watershed.  Final data set
        includes ten thematic land cover categories at a 25 meter resolution.  The final data set is in an
        ARC Grid format.
 Intended_use: Data set to be used for the Chesapeake Bay Program Office's non-point pollution models.
        Also for use as a thematic map of the land cover within the Chesapeake Bay Watershed.
 Data_set_extent:  -257500,4507900,287400,3800500
 Geographic_area:   Chesapeake Bay Watershed
 Intended_scale(s)_of_use:  24000,100000,250000
 Resolution_of_data: 25 m
 Projection_name:  Albers Conical Equal Area
 Horizontal_datum_or_ellipsoid:  NAD83
 Vertical_datum:  NGVD
 Projection_units:  meters
 Standard_parallel:  38.0
 Standard_parallel:  42.0
 Longitude_of_central_meridian:  -77.5
 Latitude_of_projection's_origin:  0
 Coordinate_precision:  Single
 Contact_type: Source/Authority.
 Contact_organization: U.S. Environmental Protection Agency, Environmental Monitoring and Assessment
        Program - Landscape Characterization.
 Contact_person_title:  Denice Shaw, Technical Coordinator.
 Contact_mailing_address: U.S. Environmental Protection Agency,  EMAP Center, Catawba Building,
        Research Triangle Park, NC 27711.
 Contact_telephone: (919) 541-2698
 Contact_email: denice.shaw@heart.epa.gov
 Contact_instructions: contact for technical information via e-mail or regular mail
Contact_type: Distributor.
Contact_organization:  Customer Services, U.S. Geological Survey, EROS Data Center.
Contact_person:  Customer Services
Contact_mailing_address:  U.S. Geological Survey, EROS Data Center, Customer Services, Sioux Falls,
        SD 57198
Contact_telephone: (605) 594-6511
Contact_instructions:  Data are available on  8mm  data tapes.   Tape requests  are  filled  at cost of
        duplication.
Transfer_mode: 8mm data tape
                                             54

-------
  Transfer_instructions: Data are transferred in an ARC Grid format.
  Degree_of_digital_completion:  Complete
  Completion_status:  Completed
  Completion_date:  19940215
  Percentage_complete:  Complete.
  Degree_of_availability: Complete.
  Policy_status:  Users may obtain these data at the cost of reproduction.
  Copyright_status:  Public domain.
  Custodial_liability: Custodian does not assume liability.
  Table_identity:  chesgrid.vat
  Table_definition:  Polygon attribute table for land cover codes.
  Table_definition_source:  author
  Attribute_identity: area
  Attribute_definition:   area measured in equal area meters.
  Attribute_definition_source:  software-defined
 Attribute_table_identity: chesgrid.vat
 Attribute_domain_value:  positive real numbers.
  Attribute_domain_value_definition:  none
 Attribute_format:  real
 Attribute_format_length:  12
 Attribute_units_of_measure:  square meters.
  Attribute_authority:  U.S. EPA, Environmental Monitoring Systems Laboratory - Las Vegas
 Source_name:  Land  Cover from Landsat Thematic Mapper imagery
  Bibliographic_reference: U.S. Environmental Protection Agency, Environmental Monitoring Systems
         Laboratory - Las Vegas, 1994, EMAP Chesapeake Bay Watershed Pilot Project Final Report.
         U.S. Environmental Protection Agency, Office of Research and Development, Environmental
         Monitoring Systems Laboratory, Las Vegas, NV.
 Source_scale:  25 meter pixel units
 Source_scale:  N/A
 Source_medium: Landsat Thematic Mapper digital data files in band sequential format
 Creator_of_source: EOSAT Corporation, Lanham, Maryland
 Date(s)_of_source_materials:   1988-1991
  Source_projection: Universal Transverse Mercator (UTM), UTM Zones 17 and 18
  Final_projection: Albers Equal Area
  Procedure:  1. The Landsat Thematic Mapper imagery was geocorrected by Hughes STX Corporation.
         The final data product was geocorrected to 25 meters.  The data were then shipped to the U.S.
         EPA's Environmental Monitoring Systems Laboratory - Las Vegas for image processing.
         2. The Landsat data were processed using a modified unsupervised image processing technique.
         The data were clustered using an unsupervised clustering algorithm.  The data were then reviewed
         and the confusion clusters identified.  These confusion clusters were then placed into the
         unsupervised clustering algorithm for reclustering of the spectral data.
         3. After clustering, the data were labelled and the clusters recoded into the appropriate land
         cover categories.
Procedure_date:  199404
Procedure_contact:  Dorsey Worthy, Remote Sensing Program Manager, U.S. Environmental Protection
        Agency, 944 E. Harmon, Las Vegas, NV  89119,  telephone (702) 798-2274
Positional_accuracy:  +/- 15 meters
                                              55

-------
Positional_accuracy_method:  Spatial Accuracy Test
Positional_accuracy_explanation:  Landsat Thematic Mapper data met the spatial accuracy desired.
Attribute_accuracy:  80% overall within 85% confidence interval
Attribute_accuracy_method: Stratified systematic random point photointerpretation with field validation.
Data_model_integrity:  Data  set contains thematic land  cover categories  for the  Chesapeake Bay
       Watershed
Completeness:  Complete.
Metadata_revision_date:  19940320
Metadata_contact:  Dorsey  Worthy, amdldw@vegasl.las.epa.gov
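
As an aid to users of these metadata, the following short Python sketch is illustrative
only; it assumes the pyproj package, which is not part of the original project software.
It builds the Albers Conical Equal Area projection from the parameters listed above and
converts a sample geographic point near the Bay into the grid's map coordinates:

        # Illustrative use of the projection parameters listed above (assumes pyproj).
        from pyproj import CRS, Transformer

        albers = CRS.from_proj4(
            "+proj=aea +lat_1=38.0 +lat_2=42.0 +lat_0=0 +lon_0=-77.5 "
            "+x_0=0 +y_0=0 +datum=NAD83 +units=m"
        )
        nad83_geo = CRS.from_epsg(4269)            # NAD83 geographic (longitude/latitude)
        to_albers = Transformer.from_crs(nad83_geo, albers, always_xy=True)

        # Hypothetical point near the mouth of the Patuxent River, Maryland
        lon, lat = -76.4, 38.3
        x, y = to_albers.transform(lon, lat)
        print(round(x), round(y))                  # meters in the project's Albers grid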
                                              56

-------