I
3

\
5
o
^
p*
       Geospatial Data
   Accuracy Assessment
            Edited by
          Ross S. Lunetta
           John G. Lyon
                          145LCB04 RPT •> 8/27/2004

-------
                            EPA/600/R-03/064
                             December 2003
    Geospatial Data
Accuracy Assessment
             by

      R.S. Lunetta and J.G. Lyon
                                145LCB04.RPT <• 8/27/04

-------

-------
                                         Foreword
The development of robust accuracy assessment methods for the validation of spatial data represents a
difficult scientific challenge for the geospatial science community. The importance and timeliness of this
issue is related directly to the dramatic escalation in the development and application of spatial data
throughout the latter 20* century. This trend, which is expected to continue, will become evermore
pervasive and continue to revolutionize future decision making processes. However, our current ability
to validate large-area spatial data sets represents a major impediment to many future applications.
Problems associated with assessing spatial data accuracy are primarily related to their valued
characteristic of being continuous data, and to the associated geometric or positional errors implicit with
all spatial data. Continuous data typically suffer from the problem of spatial autocorrelation which
violate the important statistical assumption of "independent" data, while positional errors tend to
introduce anomalous errors with the combining of multiple data sets or layers.  The majority of large-area
spatial data coverages are derived from remote sensor data and subsequently analyzed in a GIS to provide
baseline information for data driven assessments to facilitate the decision making process.

This important topic was the focus of a special symposium sponsored by the U.S. Environmental
Protection Agency (EPA) on "Remote Sensing and GIS Accuracy Assessment" on December 11-13,
2001 in Las Vegas, NV. The symposium evaluated the important science elements relevant to the
performance of accuracy assessments for remote sensing derived data and GIS data analysis and
integration products. A keynote address was delivered by Dr. Russell G. Congalton that provided
attendees with an historical accuracy assessment overview and that identified current technical gaps and
established important issues that were the topic of intense debates throughout the symposium. A total of
27 technical papers were presented over the two and one-half day symposium by an international group
of scientists representing federal, state and local governments, academia, and non-governmental
organizations. Specific technical presentations examined sampling issues, reference data collection, edge
and boundary effects, error matrix and fuzzy assessments, error budget analysis, and special issues
related to change detection accuracy assessment.

Abstracts submitted for presentation were evaluated for technical merit and assigned to technical sessions
by the program committee members.  Members then served as technical session chairs, thus maintaining
responsibility for session content. Subsequent to  the symposium, presenters were invited to submit
manuscripts for consideration as chapters.  This book contains 20 chapters that represent the important
symposium outcomes. All chapters have undergone peer review and were determined to be suitable for

-------
 publication. The editors have arranged the book into a series of complementary science topics to provide
 the reader with a detailed treatise of spatial data accuracy assessment issues.

 The symposium chairs would like to thank the program committee members for their organization of
 individual technical sessions and participation as session chairs and presenters.

 Ross S. Lunetta and John G. Lyon
 U.S. Environmental Protection Agency
IV

-------
                               Acknowledgments
Symposium Sponsor
U.S. Environmental Protection Agency
Office of Research and Development

Symposium Chairs and Book Editors
Ross S. Lunetta, U.S. Environmental Protection Agency
John G. Lyon, U.S. Environmental Protection Agency

Program Committee and Session Chairs
Gregory S. Biging, University of California at Berkeley
Russell G. Congalton, University of New Hampshire
Christopher D. Elvidge, National Oceanographic and Atmospheric Administration
John S. liames,  U.S.  Environmental  Protection Agency
S. Taylor Jarnagin, U.S. Environmental Protection Agency
Michael Jennings, U.S. Geological Survey
K. Bruce Jones, U.S. Environmental Protection Agency
Siamak Khorram, North Carolina State University
Thomas R. Loveland, U.S. Geological Survey
Thomas H. Mace, National Aeronautics and Space Administration
Anthony R. Olsen, U.S. Environmental Protection Agency
Elijah Ramsey III, U.S. Geological Survey
Terrence E. Slonecker, U.S. Environmental Protection Agency
Stephen V. Stehman, State University of New York
James D. Wickham,  U.S. Environmental Protection Agency
L. Dorsey Worthy, U.S. Environmental Protection Agency

-------
VI

-------
                                Table of Contents
Foreword 	 m
Acknowledgments  	v

Chapter 1:  Putting the Map Back in Map Accuracy Assessment  	  1
  1.1  Introduction	  1
  1.2  Accuracy Assessment Overview	  2
        1.2.1  Historical Review	  2
        1.2.2  Established Techniques and Considerations	  3
        1.2.3  The Error Matrix	  3
        1.2.4  Discrete Multivariate Analysis	  4
        1.2.5  Sampling Size and Scheme	  5
        1.2.6  Spatial Autocorrelation	  7
  1.3  Current Issues and Needs 	  7
        1.3.1  Sampling Issues	  7
        1.3.2  Edge and Boundary Effects	  7
        1.3.3  Reference Data Collection 	  8
        1.3.4  Beyond the Error Matrix:  Fuzzy Assessment  	  8
        1.3.5  Error Budget Analysis	  9
        1.3.6  Change Detection Accuracy Assessment  	 10
  1.4  Summary	 11
  1.5  References	 12

Chapter 2:  Sampling Design for Accuracy Assessment of Large-Area, Land-
            Cover Maps: Challenges and Future Directions	 15
  2.1  Introduction	 15
  2.2  Meeting the Challenge of Cost-Effective Sampling Design	 17
        2.2.1   Strata Versus Clusters: The Cost Versus Precision Paradox	 17
        2.2.2   Flexibility of the NLCD Design	 19
        2.2.3   Comparison of the Three Options	 20
        2.2.4   Stratification and Local Spatial Control	 20
                                                                                        VII

-------
  2.3 Existing Data	  24
        2.3.1 Added Value Uses of Accuracy Assessment Data	  25
  2.4 Non-Probability Sampling 	  25
        2.4.1 Policy Aspects of Probability Versus Non-Probability Sampling  	  26
  2.5 Statistical Computing 	  27
  2.6 Practical  Realities of Sampling Design	  28
        2.6.1 Principle 1	  28
        2.6.2 Principle 2	  28
        2.6.3 Principle 3	  28
        2.6.4 Principle 4	  29
  2.7 Discussion	  29
  2.8 Summary	  30
  2.9 References	  31

Chapter 3: Validation of Global Land-Cover Products by the Committee on
            Earth Observing Satellites	  35
  3.1 Introduction	  35
        3.1.1 Committee on Earth Observing Satellites	  35
        3.1.2 Approaches to Land-Cover Validation	  36
        3.1.3 Lessons Learned from IGPB Discover	  38
  3.2 Validation of the European Commission's Global Land-Cover 2000 	  39
  3.3 Validation of the MODIS Global Land-Cover Product  	  40
  3.4 CEOS Land Product Validation Subgroup	  41
        3.4.1 Fine-Resolution Image Quality and Availability  	  42
        3.4.2 Local Knowledge Requirements	  43
        3.4.3 Resource Requirements	  44
  3.5 Summary	  44
  3.6 Acknowledgments	  44
  3.7 References	  44

Chapter 4: In Situ Estimates of Forest LAI for MODIS Data Validation	  47
  4.1 Introduction	  47
        4.1.1 Study Area	  49
  4.2 Background	  49
        4.2.1 TRAC Measurements 	  49
        4.2.2 Hemispherical Photography Measurements  	  51
        4.2.3 Combining TRAC and Hemispherical Photography	  53
        4.2.4 Satellite Data	  53
        4.2.5 MODIS LAI and NDVI Products  	  53
VIII

-------
 4.3  Methods	 54
       4.3.1  Sampling Frame Design  	 54
       4.3.2  Biometric Mensuration	 57
       4.3.3  TRAC Measurements  	 58
       4.3.4  Hemispherical Photography  	 59
       4.3.5  Hemispherical Photography Quality Assurance	 60
 4.4  Discussion	 60
       4.4.1  LAI Accuracy Assessment 	 60
       4.4.2  Hemispherical Photography  	 61
       4.4.3  Satellite Remote Sensing Issues 	 62
 4.5  Summary	 62
 4.6  Acknowledgments	 63
 4.7  References	 63

Chapter 5:  Light Attenuation Profiling as an Indicator of Structural Changes
            in Coastal Marshes  	 67
 5.1  Introduction	 67
       5.1.1  Marsh Canopy Descriptions  	 69
 5.2  Methods	 69
       5.2.1  Field Collection Methods	 69
             5.2.1.1   Area Frequency Sampling  	 71
             5.2.1.2  Vertical Frequency Sampling	 71
             5.2.1.3  Atypical Canopy Structures 	 72
             5.2.1.4  Changing Sun Zenith 	 72
 5.3  Results  	 73
       5.3.1  Vertical Frequency Sampling	 73
       5.3.2  Atypical Canopy Structures  	 74
       5.3.3  Changing Sun Zenith	 78
 5.4  Discussion	 82
 5.5  Summary	 84
 5.6  Acknowledgments	 85
 5.7  References	 85

Chapter 6:  Participatory Reference Data Collection Methods for
            Accuracy Assessment of Land-Cover Change Maps	  87
 6.1  Introduction	  87
       6.1.1  Study Objectives	  88
       6.1.2  Study Area	  89
 6.2  Methods	 90
       6.2.1  Imagery	 90
                                                                                         IX

-------
        6.2.2  Reference Data Collection 	  91
        6.2.3  Data Processing	  93
        6.2.4  Image Classification	  94
        6.2.5  Accuracy Assessment	  94
  6.3  Results and Discussion	  95
        6.3.1  Classified Imagery and Land-Cover Change 	  95
        6.3.2  Map Accuracy Assessment	  97
        6.3.3  Bringing Users  Into the Map	  100
  6.4  Conclusions	  101
  6.5  Summary	  102
  6.6  Acknowledgments	  103
  6.7  References	  103

Chapter 7: Thematic Accuracy Assessment of Regional Scale Land-Cover Data  	  107
  7.1  Introduction	  107
  7.2  Approach	  108
        7.2.1  Sampling Design	  108
        7.2.2  Training	  109
        7.2.3  Photographic Interpretation	  110
              7.2.3.1  Interpretation Protocol  	  110
              7.2.3.2  Interpretation Procedures	  110
              7.2.3.3  Quality Assurance and Quality Control	  Ill
  7.3  Results	  113
        7.3.1  Accuracy Estimates	  113
        7.3.2  Issues and Problems  	  116
              7.3.2.1  Heterogeneity	  116
              7.3.2.2  Acquisition Dates	  116
              7.3.2.3  Location Errors	  116
  7.4  Further Research  	  118
  7.5  Acknowledgments	  118
  7.6  References	  118
Appendix A: MRLC Classification Scheme and Class Definitions  	  121
  Al  Water   	  121
        Al.l  Water	  121
  A2  Developed	  121
        A2.1  Low Intensity Residential	  121
        A2.2  High Intensity Residential	  121
        A2.3  High Intensity Commercial/Industrial/Transportation 	  122
  A3  Barren  	  122
        A3.1  Bare Rock/Sand	  122

-------
       A3.2  Quarries/Strip Mines/Gravel Pits  	  122
       A3.3  Transitional	  122
 A4 Natural Forested Upland (non-wet)  	  122
       A4.1  Deciduous Forest	  122
       A4.2  Evergreen Forest	  122
       A4.3  Mixed Forest	  123
 A5 Herbaceous Planted/Cultivated	  123
       A5.1  Pasture/Hay	  123
       A5.2  Row Crops	  123
       A5.3  Other Grasses 	  123
 A6 Wetlands	  123
       A6.1  Woody Wetlands	  123
       A6.2  Emergent Herbaceous Wetlands	  123

Chapter 8:  An Independent Reliability Assessment for the Australian
            Agricultural Land-Cover Change Project 1990/91 -1995  	  125
 8.1 Introduction	  125
 8.2 Methods	  129
 8.3 Results  	  131
 8.4 Discussion and Conclusions	  134
 8.5 Summary	  135
 8.6 Acknowledgments	  135
 8.7 References	  135

Chapter 9:  Assessing the Accuracy of Satellite-Derived
            Land-Cover Classification Using Historical Aerial Photography,
            Digital Orthophoto Quadrangles, and Airborne Video Data	 137
 9.1  Introduction	 138
 9.2  Background	 139
       9.2.1  Upper San Pedro Watershed Study Area  	  139
       9.2.2 Reference Data Sources for Accuracy Assessment 	  139
       9.2.3 Reporting Accuracy Assessment Results  	  140
  9.3  Methods	  140
       9.3.1  Image Classification 	  140
       9.3.2 Sampling Design	  143
       9.3.3 Historical Aerial Photography 	  144
             9.3.3.1  Image Collection, Preparation, and Site Selection	  144
             9.3.3.2  Photograph Interpretation and Assessment 	  144
       9.3.4 Digital Orthophoto Quadrangles	  145
             9.3.4.1  Interpreter Calibration  	  145
                                                                                         xi

-------
             9.3.4.2  Sample Point Selection	  145
       9.3.5  Airborne Videography  	  145
             9.3.5.1  Video and CIS Data Preparation  	  146
             9.3.5.2  Video Sample Point Selection  	  146
             9.3.5.3  Random Frame Selection and Evaluation	  146
 9.4  Results	  147
       9.4.1  Aerial Photography Method  	  147
       9.4.2  Digital Orthophoto Quadrangle Method	  148
       9.4.3  Airborne Videography Method	  149
 9.5  Discussion	  150
       9.5.1  Map Accuracies	  150
       9.5.2  Class Confusion  	  151
       9.5.3  Future Research	  152
 9.6  Conclusions	  152
 9.7  Summary	  153
 9.8  Acknowledgments	  154
 9.9  References	  154

Chapter 10:  Using Classification Consistency in Inter-Scene Overlap Areas
             to Model Spatial Variations in Land-Cover Accuracy Over
             Large Geographic Regions 	  157
 10.1  Introduction	  157
 10.2  Link Between Classification Consistency and Accuracy	  158
 10.3  Using Consistency Within a Classification Methodology  	  161
 10.4  Great Lakes Results  	  162
       10.4.1  Variation of Consistency Among Clusters of a Given Class 	  163
       10.4.2  Aspects of Scene-Based Consistency Overlays  	  163
       10.4.3  Aspects of the Accumulated Confidence Layer	  165
       10.4.4  Relationship of Accumulated Confidence and User's Accuracy  	  168
 10.5  Conclusions	  169
 10.6  Summary	  169
 10.7  References	  170

Chapter 11:  Geostatistical Mapping of Thematic Classification Uncertainty	  171
 11.1  Introduction	 171
 11.2  Methods	  173
       11.2.1  Classification Based on Remotely Sensed Data	 173
       11.2.2  Geostatistical Modeling of Context  	 174
       11.2.3  Combining Spectral and Contextual Information	 176
XII

-------
      11.2.4  Mapping Thematic Classification Accuracy	  178
      11.2.5  Generation of Simulated TM Reference Values	  179
 11.3  Results  	  180
      11.3.1  Spectral and Spatial Classifications 	  181
      11.3.2  Merging Spectral and Contextual Information	  183
      11.3.3  Mapping Classification Accuracy	  186
 11.4  Discussion	  186
 11.5  Conclusions	  187
 11.6  Summary	  187
 11.7  References	  188

Chapter 12:  An Error Matrix Approach to Fuzzy Accuracy Assessment:
             The NIMA Geocover Project	  191
 12.1  Introduction	  191
 12.2  Background	  192
 12.3  Methods	  194
      12.3.1  Classification Scheme	  194
      12.3.2  Sampling Design	  195
      12.3.3  Site Labeling	  195
      12.3.4  Compilation of the Deterministic and Fuzzy Error Matrix	  196
 12.4  Results  	  198
 12.5  Discussion and Conclusions	  198
 12.6  Summary	  199
 12.7  References	  200
Appendix A: Classification Rules	  201

Chapter 13:   Mapping Spatial Accuracy and Estimating Landscape Indicators
              from Thematic Land-Cover Maps Using Fuzzy Set Theory	  203
 13.1  Introduction	  203
 13.2  Methods	  205
      13.2.1  Multi-Level Agreement	  206
      13.2.2  Spatial Accuracy Map	  208
      13.2.3  Degrees of Fuzzy Membership	  208
      13.2.4  Fuzzy Membership Rules 	  209
      13.2.5  Fuzzy Land-Cover Maps	  210
      13.2.6  Deriving Landscape Indicators	  212
 13.3  Results and Discussion	  212
 13.4  Conclusions	  218
 13.5  Summary	  219
                                                                                       XIII

-------
 13.6 Acknowledgments	 219
 13.7 References	 219

Chapter 14: Fuzzy Set and Spatial Analysis Techniques for
             Evaluating Thematic Accuracy of a Land-Cover Map	 223
 14.1  Introduction	 223
       14.1.1 Accuracy Assessment	 223
 14.2 Analysis of Reference Data  	 224
       14.2.1 Binary Analysis	 224
       14.2.2 Fuzzy Set Analysis  	 224
       14.2.3 Spatial Analysis	 225
 14.3  Background	 226
 14.4 Methodology	 227
       14.4.1 Reference Data  	 227
       14.4.2 Binary Analysis	 227
       14.4.3 Fuzzy Set Analysis  	 227
       14.4.4 Spatial Analysis	 228
 14.5  Results  	 230
       14.5.1 Binary Analysis	 230
       14.5.2 Fuzzy Set Analysis  	 233
       14.5.3 Spatial Analysis	 233
 14.6  Discussion	 238
 14.7  Summary	 239
 14.8  References	 240

Chapter 15: The Effects of Classification Accuracy on Landscape Indices 	 247
 15.1  Introduction	 247
 15.2  Methods	 248
       15.2.1 Relative Errors of Area (REA)	 249
 15.3  Results  	 252
 15.4  Discussion	 257
 15.5  Conclusions	 258
 15.6  Summary	 259
 15.7  Acknowledgments	 259
 15.8  References	 259

Chapter 16: Assessing Uncertainty in Spatial Landscape Metrics
             Derived from Remote Sensing Data  	 263
 16.1  Introduction	 263
 16.2  Background	 265
XIV

-------
 16.3  Methods	  265
      16.3.1   Precision of Landscape Change Metrics 	  265
      16.3.2   Comparing Class Definitions	  266
              16.3.2.1  Landsat Classifications	  266
              16.3.2.2  Aerial Photography Interpretations	  266
      16.3.3   Landscape Simulations	  267
              16.3.3.1  Ecotone Abruptness 	  267
              16.3.3.2  Fragmentation	  267
 16.4  Results	  269
      16.4.1   Precision of Landscape Metrics	  269
      16.4.2   Comparing Class Definitions	  270
              16.4.2.1  Comparing TM Classifications	  270
              16.4.2.2  Comparing Photographic Classifications	  271
      16.4.3   Landscape Simulations	  271
              16.4.3.1  Ecotone Abruptness 	  271
              16.4.3.2  Forest Fragmentation 	  271
 16.5  Discussion	  273
 16.6  Conclusions	  273
 16.7  Summary	  274
 16.8  Acknowledgments	  274
 16.9  References	  275

Chapter 17:  Components of Agreement Between Categorical Maps at
             Multiple Resolutions  	  277
 17.1  Introduction	  277
      17.1.1  Map Comparison	  277
      17.1.2  Puzzle Example	  278
 17.2  Methods	  279
      17.2.1  Example Data 	  279
      17.2.2  Data Requirements and Notation  	  280
      17.2.3  Minimum Function	  282
      17.2.4  Agreement Expressions and Information Components	  282
      17.2.5  Agreement and Disagreement	  285
      17.2.6  Multiple Resolutions	  287
 17.3  Results  	  289
 17.4  Discussion	  293
      17.4.1  Common Applications 	  293
      17.4.2  Quantity Information	  294
      17.4.3  Stratification and Multiple Resolutions  	  294
 17.5  Conclusions	  295
                                                                                          xv

-------
 17.6  Summary	  295
 17.7  Acknowledgments	  295
 17.8  References	  296

Chapter 18:  Accuracy Assessments of Airborne Hyperspectral Techniques for
             Mapping Opportunistic Plant Species in Freshwater Coastal Wetlands .  297
 18.1  Introduction	  298
 18.2  Background	  298
 18.3  Methods	  300
       18.3.1  Remote Sensor Data Acquisition and Processing	  301
       18.3.2  Field Reference Data Collection	  303
       18.3.3  Accuracy Assessment of Vegetation Maps	  306
 18.4  Results  	  306
       18.4.1  Field Reference Data Measurements  	  306
       18.4.2  Distinguishing Between Phragmites and Typha	  308
       18.4.3  Semi-Automated Phragmites Mapping	  309
       18.4.4  Accuracy Assessment	  311
 18.5  Discussion	  311
 18.6  Conclusions	  312
 18.7  Summary	  312
 18.8  Acknowledgments	  312
 18.9  References	  313

Chapter 19:  A Technique for Assessing the Accuracy of Sub-Pixel
             Impervious Surface  Estimates Derived from LandsatTM Imagery  	  315
 19.1  Introduction	  315
 19.2  Methods	  317
       19.2.1  Study Area	  317
       19.2.2  Data	  317
       19.2.3  Spatial Processing	  319
       19.2.4  Statistical Processing  	  321
 19.3  Results and Discussion	  321
 19.4  Conclusions	  324
 19.5  Summary	  324
 19.6  Acknowledgments	  325
 19.7  References	  325

Chapter 20:  Area and Positional  Accuracy of DMSP Nighttime Lights Data	  327
 20.1  Introduction	  327
 20.2  Methods	  330
XVI

-------
      20.2.1  Modeling a Smoothed OLS Pixel Footprint	 330
      20.2.2  OLS Data Preparation	 332
      20.2.3  Target Selection and Measurement	 332
20.3  Results 	 334
      20.3.1  Geolocation Accuracy	 334
      20.3.2  Comparison of OLS Lighting Areas to ETM+ Areas  	 336
      20.3.3  Multiplicity of OLS Light Detections	 336
20.4  Conclusions	 337
20.5  Summary	 338
20.6  Acknowledgments	 339
20.7  References	 339
                                                                                         XVII

-------
                                      Chapter 1

         Putting the Map Back in Map Accuracy Assessment
                                           by

                                   Russell G. Congalton
                                  Corresponding Author
                             Department of Natural Resources
                                     215 James Hall
                               University of New Hampshire
                                   Durham, NH 03824

                            Telephone:  (603) 862-4644
                             Facsimile:  (603) 862-4976
                                E-mail:  russ.congalton@unh.edu
1.1    Introduction

The need for assessing the accuracy of a map generated from any remotely sensed data has become
universally recognized as an integral project component. In the last few years, most projects have
required that a certain level of accuracy be achieved for the project and map to be deemed a success.
With the widespread application of Geographic Information Systems (GIS) employing remotely sensed
data as layers, the need for such an assessment has become even more critical. There are a number of
reasons why this assessment is so important including:

      • The need to perform a self-evaluation and to learn from your mistakes.
      • The ability to quantitatively compare method/algorithms/analysts.
      • The desire to use the resulting maps/spatial information in some decision-making process.

There are many examples in the literature as well as an overwhelming selection of anecdotal evidence to
demonstrate the need for accuracy assessment. Many different groups have mapped and/or quantified the
amount of tropical deforestation occurring in the South America or Southeast Asia. Estimates have
ranged by almost an order of magnitude. Which estimate is correct? Without a valid accuracy
assessment we may never know. Several federal, state, and local agencies have created maps of wetlands
in a county on the eastern shore of Maryland.  Techniques used to make these maps included satellite
                                                                               Page 1 of339

-------
imagery, aerial photography (various scales and film types), and ground sampling. Comparing the various
maps yielded little agreement about where wetlands actually existed.  Without a valid accuracy
assessment we may never know which of these maps to use.

It is no longer sufficient to just make a map using remotely sensed or other spatial data. It is absolutely
necessary to take some steps towards assessing the accuracy or validity of that map.  There are a number
of ways to investigate the accuracy/error in spatial data including but not limited to:  visual inspection,
non-site specific analysis, generating difference images, error budget analysis, and quantitative accuracy
assessment.

The goal of this chapter is to review the current knowledge of accuracy assessment methods and stimulate
the reader to further the progression of diagnostic techniques and information to support the appropriate
application of spatial data.  The ultimate objective is to motivate everyone to conduct or demand an
appropriate accuracy assessment or validation included as an essential metadata element.
1.2   Accuracy Assessment Overview

1.2.1  Historical Review

The history of accuracy assessment of digital remotely sensed data is relatively short, beginning in about
1975. Before 1975 maps derived from analog, remotely sensed data (i.e., photo interpretation) were
rarely subjected to any kind of quantitative accuracy assessment either.  Field checking was typically
performed as part of the interpretation process, but no overall map accuracy or other quantitative
measures of quality were typically incorporated into the analysis.  Only after photo interpretation began to
be used as reference data to compare maps derived from digital remote sensor data, did issues concerning
the accuracy of the photo interpretation arise.  All the accuracy assessment techniques mentioned in this
chapter can be applied to assessing the accuracy of both analog and digital remotely sensed data
(Congalton and Mead, 1983; Congalton et al., 1983).

The history of accuracy assessment can be effectively divided into four developmental epochs or ages. In
the beginning, no real accuracy assessment was performed but rather an "it looks good" mentality
prevailed. This approach is typical of many new, emerging technologies. Despite the maturing of the
technology over the last 25 years, some remote sensing analysts are still stuck in this mentality. Of
course, the map must "look good" before any further analysis should be performed. Why assess a map
that is obviously poor?  However, while "looking good" is a required characteristic, it is not sufficient for
a valid assessment.

The second age of accuracy assessment could be called the epoch of non-site specific assessment.  During
this period, overall acreage's were compared between ground estimates  and the map without regard for
location (Meyer et al.,  1975).  In some instances, such as imagery with very large pixels (e.g., AVHRR
imagery), a non-site specific assessment may be the best and/or only choice for validation. For most
imagery, the age of non-site specific assessment quickly gave way to the age of the site-specific
assessment (third age).  In a site-specific assessment, actual places on the ground (i.e., locations) were
Page 2 of 339

-------
compared to the same place on the map and a measure of overall accuracy (i.e., percent correct)
computed.

Finally, the fourth and current age of accuracy assessment could be called the age of the error matrix.
This epoch includes a significant number of analysis techniques, most importantly the Kappa analysis. A
brief review of the techniques and considerations of the error matrix age can be found below and is
described in more detail in Congalton and Green (1999).

1.2.2   Established Techniques and Considerations

Since the mid-1980s the error matrix has been accepted as the standard descriptive reporting tool for
accuracy assessment of remotely sensed data. The use of the error matrix has significantly improved our
ability to conduct accuracy assessments.  In addition, analysis tools including discrete multivariate
techniques have facilitated the comparison and development of various methodologies, algorithms, and
approaches. Many factors affect the compilation of the error matrix and must be considered when
designing any accuracy assessment. The current state of knowledge concerning the error matrix, analysis
techniques and some considerations are briefly reviewed here.


1.2.3   The Error Matrix

An error matrix is a square array of numbers organized in rows and columns which express the number of
sample units (i.e., pixels, clusters of pixels, or polygons) assigned to a particular category relative to the
actual category as indicated by the reference data (see Table 1-1). The columns typically represent the
reference data while the rows indicate the map generated from the remotely sensed data. Reference data
are assumed correct and can be collected from a variety of sources including: photo interpretation, ground
or field observation, and ground or field measurement.

The error matrix, once correctly generated, can be used as a starting point for a series of descriptive and
analytical statistical techniques.  The most common and simplest descriptive statistic is overall accuracy,
which is computed by dividing the total correct (i.e., the sum of the major diagonal) by the total number
of sample units in the error matrix. In addition, individual category accuracies can be computed in a
similar manner.  Traditionally, the total number of correct sample units in a category is divided by the
total number of sample units of that category from the reference data (i.e., the column total).  This
accuracy measure relates to the probability of a reference sample unit being correctly classified and is
really a measure of omission error. This accuracy measure is often called the "producer's accuracy"
because the producer of the classification is interested in how well a certain area can be classified.  On the
other hand, if the total number of correct sample units in a category is divided by the total number of
sample units that were classified into that category on the map (i.e., the row total), then this result is a
measure of commission error. This measure  is called "user's accuracy" or reliability, and is indicative of
the probability that a sample unit classified on the map actually represents that category on the ground
(Story and Congalton, 1986).
                                                                                      Page 3 of 339

-------
                          Table 1-1. Example error matrix.

Classified
Data

D
C
AG
SB
Column
Total
Reference Data
D
63
6
0
4
73
C
4
79
11
7
101
AG
22
8
85
3
118
SB
24
8
11
89
132
Row
Total
113
101
107
103
424
                                                            Land Cover Categories
                                                            D = deciduous
                                                            C = conifer
                                                            AG = agriculture
                                                            SB = shrub
                                                            Overall Accuracy =
                                                            (63+79+85+89)7424 =
                                                            316/424 = 75%
                        Producer's Accuracy
                           D =  63/73 = 86%
                           C= 79/101 = 78%
                         AG= 85/118=72%
                         SB= 89/132 = 67%
 User's Accuracy
 D= 63/113 = 56%
 C= 79/101 = 78%
AG = 85/107 = 79%
SB= 89/103 = 86%
1.2.4  Discrete Multivariate Analysis

In addition to these descriptive techniques, an error matrix is an appropriate beginning for many
analytical statistical techniques, especially discrete multivariate techniques. Starting with Congalton et al.
(1983), discrete multivariate techniques have been used for performing statistical tests on the
classification accuracy of digital remotely sensed data. Since that time many others have adopted these
techniques as the standard accuracy assessment tools (Rosenfield and Fitzpatrick-Lins, 1986; Hudson and
Ramm, 1987; Campbell, 1987; and Lillesand and Kiefer, 1994).

One analytical step to perform once the error matrix has been built is to "normalize" or standardize the
matrix using a technique known as "MARGFIT" (Congalton et al., 1983).  This technique uses an
iterative proportional fitting procedure that forces each row and column in the matrix to  sum to one. The
rows and column totals are called marginals, hence the technique name, MARGFIT.  In  this way,
differences in sample sizes used to generate  the matrices are eliminated and therefore, individual cell
values within the matrix are directly comparable.  Also, because the iterative process totals the rows and
columns, the resulting normalized matrix is  more indicative of the off-diagonal cell values (i.e., the errors
of omission and commission) than is the original matrix. The major diagonal of the normalized matrix
can be summed and divided by the total of the entire matrix to compute a normalized overall accuracy.

A second discrete multivariate technique of use in accuracy assessment is called Kappa (Cohen, 1960).
Kappa can be used as another measure of agreement or accuracy.  Kappa values can range from +1 to -1.
Page 4 of 339

-------
However, since there should be a positive correlation between the remotely sensed classification and the
reference data, positive values are expected.  Landis and Koch (1977) lumped the possible ranges for
Kappa into three groups:

     (1)  A value greater than 0.80 (i.e., 80%) represents strong agreement;

     (2)  a value between 0.40 and 0.80 (i.e., 40% - 80%) represents moderate agreement; and

     (3)  a value below 0.40 (i.e., 40%) represents poor agreement.


The equations for computing Kappa can be found in Congalton et al. (1983); Rosenfield and Fitzpatrick-
Lins (1986); Hudson and Ramm (1987), and Congalton and Green (1999), to list just a few. It should be
noted that the Kappa equation assumes a multinomial sampling model and that the variance is derived
using the Delta method (Bishop et al., 1975).

The power of the Kappa analysis is that it provides two statistical tests of significance. Using this
technique, it is possible to test if an individual land-cover (LC) map generated from remotely sensed data
is significantly better than if the map had been generated by randomly assigning labels to areas. The
second test allows for the comparison of any two matrices to see if they are statistically, significantly
different. In this way, it is possible to determine that one method/algorithm/analyst is different than
another one and based on a chosen accuracy measure (e.g.,  overall accuracy) to conclude which is better.


1.2.5   Sampling  Size and Scheme

Sample size is another important consideration when assessing the accuracy of remotely sensed data.
Each sample point collected is expensive and therefore sample size must be kept to a minimum and yet it
is critical to maintain a large enough sample size so that any analysis performed is statistically valid.
Many researchers, notably van Genderen and Lock (1977); Tortora (1978); Hay (1979); Hord and
Brooner (1976); Rosenfield et al. (1982); and Congalton (1988b), have published equations and
guidelines for choosing the appropriate sample size.  The majority of researchers have used an equation
based on the binomial distribution or the normal approximation to the binomial distribution to compute
the required sample size.  These techniques are statistically sound for computing the sample size needed
to compute the overall accuracy of a classification or the overall accuracy of a single category.  The
equations are based on the proportion of correctly classified samples (pixels, clusters, or polygons) and on
some allowable error.  However, these techniques were not designed to choose  a sample size for creating
an error matrix. In the case of an error matrix, it is not simply a matter of correct or incorrect. Given an
error matrix with n land-cover categories, for a given category there is one correct answer and n-1
incorrect answers.  Sufficient samples must be acquired to be able to adequately represent this confusion.
Therefore, the use of these techniques for determining the sample size for an error matrix is not
inappropriate. Instead, the use  of the multinomial distribution is recommended (Tortora,  1978).

Traditional thinking about sampling does not often apply because of the large number of pixels in a
remotely sensed image. For example, a 0.5% sample of a single Landsat Thematic Mapper (TM) scene
can be over 300,000 pixels. Most, if not all  assessments should not be performed on a per pixel basis
because of problems with exact single pixel  location.  Practical considerations more often dictate the
                                                                                     Page 5 of 339

-------
sample size selection.  A balance between what is statistically sound and what is practically attainable
must be found.  A generally accepted rule of thumb is to use a minimum of 50 samples for each LC
category in the error matrix. This rule also tends to agree with the results of computing sample size using
the multinomial distribution (Tortora, 1978).  If the area is especially large or the classification has a large
number of LC categories (i.e., more than 12 categories), the minimum number of samples should be
increased to 75 - 100 samples per category.

The number of samples for each category can also be weighted based on the relative importance of that
category within the objectives of the mapping or  by the inherent variability within each of the categories.
Sometimes it is better to concentrate the sampling on the categories of interest and increase their number
of samples while reducing the number of samples taken in the less important categories.  Also, it may be
useful to take fewer samples in categories that show little variability, such as water or forest plantations,
and increase the sampling in the categories that are more variable such as uneven-aged forests or riparian
areas. In summary, the goal is to balance the statistical recommendations to obtain an adequate sample
from which to generate an appropriate error matrix within the objectives, time, cost,  and practical
limitations of the mapping project.

Along with sample size, sampling scheme is an important part of any accuracy assessment. Selection of
the proper scheme is absolutely critical to generating an error matrix that is representative of the entire
classified image. Poor choice in sampling scheme can result in significant biases being introduced into
the error matrix that may over or under estimate the true accuracy. In  addition, the use of the proper
sampling scheme may be essential depending on  the analysis techniques to be applied to the error matrix.

Many researchers have expressed opinions about the proper sampling scheme to use, including everything
from simple random sampling to stratified systematic unaligned sampling. Despite all these opinions,
very little work has actually been performed in this area. Congalton (1988b) performed sampling
simulations on three spatially diverse areas (forest, agriculture, and rangeland) and concluded that in  all
cases simple random without replacement and stratified random sampling provided satisfactory results.
Despite the desirable statistical properties of simple random sampling, this sampling scheme is not always
that practical to apply. Simple random sampling tends to under sample small but possibly very important
areas unless the sample size is significantly increased.  For this reason, stratified random  sampling is
recommended where a minimum number of samples are selected from each strata (i.e., category).  Even
stratified random sampling can be somewhat impractical because of having to collect ground information
for the accuracy assessment at random locations on the ground.

Two difficult problems arise when using random locations: (1) the location can be very difficult to
access; and (2) they can only be selected after the classification has been performed. This second
condition limits the accuracy assessment data to being  collected late in the project instead of in
conjunction with the training data collection, thereby increasing the costs of the project.  In addition,  in
some projects, the time between the project beginning  and the accuracy assessment may be so long as to
cause temporal problems in collecting reference data.
Page 6 of 339

-------
1.2.6  Spatial Autocorrelation

Spatial autocorrelation is said to occur when the presence, absence or degree of a certain characteristic
affects the presence, absence or degree of the same characteristic in neighboring units (Cliff and Ord,
1973). This condition is particularly important in accuracy assessment if an error at a certain location can
be found to positively or negatively influence errors at surrounding locations (Campbell, 1981).  Work by
Congalton (1988a) on Landsat MSS data from three areas of varying spatial diversity (agriculture, range,
and forest) showed a positive influence as much as 30 pixels (1.8 km) away. More recent work by Pugh
and Congalton (2001) using Landsat TM data in a forested environment showed similar issues with
spatial autocorrelation.  These results affect the choice of sample size and especially sampling scheme
used in the accuracy assessment.
1.3    Current Issues and Needs

1.3.1  Sampling Issues

The major sampling issue of importance today is the choice of the sample unit. Historically, a single
pixel has often been chosen as the sample unit.  However, it is extremely difficult to know exactly where
that pixel is on the reference data especially when the reference data are generated on the ground (using
field work).  Despite recent advances in Global Positioning System (GPS) technology, it is very rare to
achieve adequate location information for a single pixel. Many times the GPS unit is used under dense
forest canopy and the GPS signals are weak or absent. Location becomes even more problematic with the
new high-spatial resolution sensors such as Space Imaging IKONOS or Digital Globe imagery with pixels
as small as 1.0 m. Also, it is nearly impossible to match the corners of a pixel on an image to the ground
despite our best registration algorithms. Therefore, using a single pixel as the sampling unit can cause
much of the  error represented in the error matrix to be positional error rather than thematic error. Since
the goal of the error matrix is to measure thematic error, it is best to take steps to avoid including
positional error. Single pixels should not be used for the sample unit. Instead, some cluster of pixels or a
polygon should be chosen.


1.3.2  Edge and Boundary Effects

Traditionally, accuracy assessment has been performed to avoid the boundaries between different LC
classes by taking samples near the center of each polygon or at least away from the edges. Avoiding the
edges also helps to minimize the locational error as discussed in the last section. Where exactly to draw
the line between different cover types on the ground is very subjective.  Most LC or vegetation maps
divide a rather continuous environment called Earth into a number of discrete categories. The number of
categories varies with the objective of the mapping and our ability to separate different categories
depends on the variability within and between each category.

All this information should be represented in a well-defined, mutually exclusive and totally exhaustive
classification scheme. However, in many instances, it would be useful to know more about the
boundaries or edges of different LC types. For example, when performing change detection (i.e., looking
                                                                                    Page 7 of 339

-------
for changes over time), it is important to know if real change exists and that change is going to occur
along the boundaries between cover types. Therefore, it is important that more research and study be
undertaken to better understand this boundary and edge issue.


1.3.3  Reference Data Collection

Reference data are typically assumed to be correct and are used to evaluate the results of the LC mapping.
If the reference data are wrong, then the LC map will be unfairly judged.  If the reference data are
inefficiently collected then the project may suffer from unnecessarily high costs or an insufficient number
of samples to properly evaluate the results.  Reference data are a critical, very expensive, and yet often
overlooked component of any spatial analysis.  For example, aerial photo interpretation is often used as
reference data for assessing a LC map generated from digital satellite imagery. The photo interpretation
is assumed correct because it often has greater spatial resolution than the satellite imagery and because
photogrammetry has become a time-honored skill that is accepted as accurate.  However, photo
interpretation is subjective and can be significantly wrong. If the interpretation is wrong, then the results
of the accuracy assessment could indicate that the satellite-based map is of poor accuracy when actually it
is the reference data that is inappropriate.

There are numerous examples in the literature documenting problems with collecting improper or
inadequate reference data.  One especially insidious problem with reference data collection is the size of
the sample area in which to collect the reference data. Clearly,  it is important to collect reference
information that is representative of the mapped area. In other words, if the map is generated with
remotely sensed data that has 30 x 30 m pixels,  it does not make sense to collect reference data for a 5.0
m2 area. A current example of this situation is the use of the Forest Inventory and Analysis (FIA) plots
collected by the U.S. Forest Service across the country.  It is important that these inventory plots be  large
enough to provide valid reference data.

The opposite situation must also be carefully monitored.  For example, it is not appropriate to assess the
accuracy of a 1.0 ha mapping unit with 5.0 ha reference data. The reference data must be collected  with
the pixel size and/or the  minimum mapping unit of the map in mind. Additionally, the same exact
classification scheme using the same exact rules must be used to label the reference data as was used to
generate the map. Otherwise, errors will be introduced by classification scheme (definitional) differences
and the error matrix created will not be indicative of the true accuracy of the map.  Using well design
field forms that steps the collector through the process can be very helpful in ensuring that the reference
data are collected at the proper scale and with the same or appropriate classification scheme to accurately
assess the map.

1.3.4  Beyond the Error Matrix: Fuzzy Assessment

As remote sensing projects have grown in complexity, so have the associated classification schemes. The
classification scheme then becomes a very important factor influencing the  accuracy of the entire project.
Recently, papers have appeared in the literature that point out some of the limitations of using only an
error matrix with a complex classification scheme. A paper by Congalton and Green (1993) recommends
the error matrix as a jumping off point for identifying sources of confusion  and not just error  in the
Page 8 of 339

-------
remotely sensed classification. For example, the variation in human interpretation can have a significant
impact on what is considered correct and what is not.  As previously mentioned, if photo interpretation is
used as the reference data in an accuracy assessment and the interpretation is not completely correct, then
the results of the accuracy assessment will be very misleading. The same statements are true if ground
observations, as opposed to ground measurements, are used as the reference data set.  As classification
schemes get more complex, more variation in human interpretation is introduced. Other factors beyond
just variation in interpretation are important also.

In order to deal with ambiguity/variation in remotely sensed maps, Gopal and Woodcock (1994) proposed
the use of fuzzy sets to "allow for explicit recognition of the possibility that ambiguity might exist
regarding the appropriate map label." In such an approach, it is recognized that instead of a simple
system of correct (agreement) and incorrect (disagreement) there can be a variety of responses such as:
absolutely right, good answer, acceptable, understandable but wrong, and absolutely wrong. This
approach deals well with the ambiguity issue. However, the results are not presented in a standard error
matrix format. Therefore, Congalton and Green (1999) and Green and Congalton (2003) presented a
fuzzy assessment methodology that not only deals with variation/ambiguity, but also allows for the results
of the assessment to be presented in an error matrix.

 1.3.5   Error Budget Analysis

Over the last 25 years, many papers have been written about the quantification of error associated with
remotely sensed and other spatial data (Congalton and Green,  1999). As documented in this chapter, our
ability to quantify the total error in a spatial data set has developed substantially. However, little has been
done to partition this error into its component parts and construct an error budget. Without this division
into parts, it is not possible to evaluate or analyze the  impact a specific error has on the entire mapping
project. Therefore, it is not possible to  determine which components contribute the most errors or which
are most easily corrected. Some early work in this area was demonstrated in a paper by Lunetta et al.
(1991) and resulted in an often-cited diagram that lists the sources of error accumulating throughout a
remote sensing project.

It should be noted that each of the major error sources adds to the total error budget separately, and/or
through a mixing process. It is no longer sufficient to always just evaluate the total error.  For many
applications, there is a definite need to identify and understand: (1) error sources; and (2) the appropriate
mechanisms for controlling, reducing, and reporting errors.  Perhaps the simplest way to begin to look at
an error budget is to create a special error budget analysis table (Congalton and Green, 1999). This table
is generated, column-by-column, beginning with a listing of the possible sources of error for the project.
Once the various components that comprise the total error are listed, then each component can be
assessed to determine its contribution to the overall error. Next, our ability to deal  with this error is
evaluated.  It should be noted that some errors may be very large but are easy to correct while others may
be rather small.  Finally, an error index can be created directly by multiplying the error contribution
potential by the error control difficulty. Combining these two factors allows one to establish priorities for
best dealing with individual errors within a mapping project. A template to be used to conduct just such
an error budget analysis is presented in Table 1-2.
                                                                                       Page 9 of 339

-------
                  Table 1-2. Template for conducting an error budget analysis.
Error Source





Error
Contribution
Potential





Error
Control
Difficulty





Error
Index





Error
Priority





               Error Contribution Potential:
     Relative potential for this source as
     contributing factor to the total error
     (1 = low, 2 = medium, and 3 = high).
               Error Control Difficulty:
Given the current knowledge about this
source, how difficult is controlling the error
contribution (1 = not very difficult to 5 = very
difficult).
               Error Index:  An index that represents the combination of error potential
                            and error difficulty.

               Error Priority.  Order in which methods should be implemented to
                              understand, control, reduce, and/or report the error due
                              to this source based on the error index.
1.3.6  Change Detection Accuracy Assessment

Much has recently been written in the literature about change detection (Lunette and Elvidge, 1998;
Khorram et al., 1999). This technique is an extremely popular and powerful use of remotely sensed data.
Assessing the accuracy of a change detection analysis has all the issues, complications and difficulties of
a single date assessment plus many additional, unique problems.  For example, how does one obtain
information on reference data for images/maps from the past? Likewise, how can one sample enough
areas that will change in the future to generate a statistically valid assessment? Most of the studies on
change detection do not present any quantitative results with their work. Without the desired accuracy
assessment, it is difficult to determine which change detection methods are best and should be applied to
future projects.

Congalton et al. (1993) provided the first example comparing a single date and change detection error
matrix. It should be noted that if a single date error matrix has n map categories, then a change detection
error matrix would contain n2 map categories. This is because we are no longer dealing with a single
classification, but rather a change between two different classifications generated at different times. In a
Page 10 of 339

-------
single date error matrix there is one row and column for each map category.  However, in a change
detection error matrix the question of interest is, "What category was this area at T, versus T2?" This
comparison uses the exact same logic as for the single classification error matrix; it is just complicated by
the two time periods (i.e., the change).  As always, the major diagonal indicates correct classification
while the off-diagonal elements indicate the errors or confusion.
1.4   Summary

Validation or accuracy assessment is an integral component of most mapping projects incorporating
remotely sensed data. In fact, this topic has become so important as to spawn regular conferences and
symposia. This emphasis on data quality was not always the case. In the 1970s, only a few enlightened
scientists and researchers dared ask the question, "How good is this map derived from Landsat MSS
imagery?" In the 1980s, the use of the error matrix became a common tool for representing the accuracy
of individual map categories. By the 1990s, most maps derived from remotely sensed imagery were
required to meet some minimum accuracy standard. Now, it is important that with all the statistics and
spatial analysis available to us that we do not lose track of the primary goal of why we perform an
accuracy assessment in the first place.

This chapter presented a review of techniques and considerations necessary to assess or validate maps
derived from remotely sensed and other spatial  data.  Although it is important to perform a visual
examination of the map, it is not sufficient.  Other techniques, such as non-site specific analysis and
difference images, can help. Error budgeting is a very useful exercise in helping to realize error and
consider ways to minimize it. Quantitative accuracy assessment provides a very powerful mechanism for
both descriptive and analytical evaluation of the spatial data. However, given all these techniques and
considerations, it is most important that we remember why we are performing the accuracy assessment in
the  first place.

Both as makers and users of our maps, our goal is to make the best map possible for a given objective.
To achieve this goal we must not  get lost in all  the statistics and analyses, but must apply the correct
analysis techniques and use the proper sampling approaches.  However, all these things will do us no
good if we forget about the map we are trying to assess. We must "put the map back in the map
assessment process." We must do everything we can to ensure that the assessment is valid for the map
and not simply a statistical exercise. It is key that the reference data match the map data not only in
classification scheme, but in sampling unit (i.e., minimum mapping unit) as well.  It is also important that
we make every effort to collect accurate and timely reference data. Finally, there is still much to do.
Many maps generated from remotely sensed data still have no validation or accuracy assessment. There
are  numerous steps that can be taken to evaluate how good a map is.  Now we must move past the age of
"it looks good" and move towards the more quantitative assessments outlined in this chapter.
                                                                                    Page 11 of 339

-------
1.5   References

Bishop, Y., S. Fienberg, and P. Holland. Discrete Multivariate Analysis - Theory and Practice. MIT
    Press, Cambridge, MA, 575 p., 1975.

Campbell, J. Spatial autocorrelation effects upon the accuracy of supervised classification of land cover.
    Photogrammetric Engineering and Remote Sensing, 47,355-363, 1981.

Campbell, J. Introduction to Remote Sensing. Guilford Press, New York, NY, 551 p., 1987.

Cliff, A.D., and J.K. Ord.  Spatial Autocorrelation. Pion Limited, London, England, 178p., 1973.

Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement,
    20,37-46, 1960.

Congalton,  R.G., and R.A. Mead.  A quantitative method to test for consistency and correctness in photo-
    interpretation. Photogrammetric Engineering and Remote Sensing, 49, 69-74,  1983.

Congalton,  R.G., R.G. Oderwald, and R.A. Mead.  Assessing Landsat classification accuracy using
    discrete multivariate statistical techniques. Photogrammetric Engineering and Remote Sensing, 49,
    1671-1678, 1983.

Congalton,  R.G.  Using spatial autocorrelation analysis to explore errors in maps generated from remotely
    sensed data.  Photogrammetric Engineering and Remote Sensing, 54, 587-592, 1988a.

Congalton, R.G.  A comparison of sampling schemes used in generating error matrices for assessing the
    accuracy of maps generated from remotely sensed data. Photogrammetric Engineering and Remote
    Sensing, 54,  593-600, 1988b.

Congalton R., and K. Green. A practical look at the sources of confusion in error matrix generation.
    Photogrammetric Engineering and Remote Sensing, 59,641-644, 1993.

Congalton,  R., R. Macleod, and F. Short. Developing accuracy assessment procedures for change
    detection analysis. Final Report submitted to NOAA CoastWatch Change Analysis Program,
    Beaufort, NC, 57 p., 1993.

Congalton,  R. and K. Green. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices.
    CRC/Lewis Press, Boca Raton, FL, 137 p., 1999.

Gopal, S., and C. Woodcock.  Theory and methods for accuracy assessment of thematic maps using fuzzy
    sets.  Photogrammetric Engineering and Remote Sensing, 60,  181-188, 1994.

Green, K., and R. Congalton.  An error matrix approach to fuzzy accuracy assessment: The NIMA
    Geocover project example.  Geospatial Data Accuracy Assessment. R. Lunetta and J. Lyon (Editors),
    U.S. Environmental Protection Agency, Report No. EPA/600/R-03/064, 335 p., 2003.

Hay, A.M.  Sampling designs to test land-use  map accuracy. Photogrammetric Engineering and Remote
    Sensing, 45,  529-533, 1979.
Page 12 of 339

-------
Hord, R.M., and W. Brooner. Land use map accuracy criteria. Photogrammetric Engineering and
    Remote Sensing, 42, 671-677, 1976.

Hudson, W., and C. Ramm.  Correct formulation of the kappa coefficient of agreement. Photogrammetric
    Engineering and Remote Sensing, 53, 421-422, 1987.

Khorram, S., G. Biging, N. Chrisman, D. Colby, R. Congalton, J. Dobson, R. Ferguson, M. Goodchild, J.
    Jensen, and T. Mace. Accuracy Assessment of Remote Sensing-Derived Change Detection.  A
    Monograph published by the American Society for Photogrammetry and Remote Sensing, Bethesda,
    MD, 64 p., 1999.

Landis, J., and G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33,
    159-174, 1977.

Lillesand, T., and R. Kiefer.  Remote Sensing and Image Interpretation (Third Edition). John Wiley and
    Sons, New York, NY, 750 p., 1994.

Lunetta, R., R. Congalton, L. Fenstermaker, J. Jensen, K. McGwire, and L. Tinney. Remote sensing and
    geographic information system data integration: error sources and research issues.  Photogrammetric
    Engineering and Remote Sensing, 57, 677-687, 1991.

Lunetta, R.L., and C.D. Elvidge (Editors). Remote Sensing Change Detection: Environmental Monitoring
    Methods and Applications.  Taylor and Francis, London, UK, 318 p.,  1998.

Meyer, M., J. Brass, B. Gerbig, and F. Batson. ERTS data applications to surface resource surveys of
    potential coal production lands in southeast Montana.  IARSL Final Research Report 75-1,
    University of Minnesota, 24 p., 1975.

Pugh, S., and R. Congalton.  Applying spatial autocorrelation analysis to evaluate error in New England
    forest cover type maps derived from Landsat Thematic Mapper Data.  Photogrammetric Engineering
    and Remote Sensing, 67(5), 613-620, 2001.

Rosenfield, G.H., K. Fitzpatrick-Lins, and H. Ling. Sampling for thematic map accuracy testing.
    Photogrammetric Engineering and Remote Sensing, 48, 131- 137, 1982.

Rosenfield, G., and K. Fitzpatrick-Lins. A  coefficient of agreement as a measure of thematic
    classification accuracy.  Photogrammetric Engineering and Remote Sensing, 52, 223-227, 1986.

Story, M., and R. Congalton. Accuracy assessment:  A user's perspective. Photogrammetric Engineering
    and Remote Sensing, 52, 397-399, 1986.

Tortora, R. A note on sample size estimation for multinomial populations. The American Statistician, 32,
    100-102, 1978.

van Genderen, J.L., and B.F. Lock. Testing land use map accuracy.  Photogrammetric Engineering and
    Remote Sensing, 43, 1135-1137,1977.
                                                                                   Page 13 of 339

-------
Page 14 of 339

-------
                                      Chapter 2

             Sampling Design for Accuracy Assessment of
Large-Area,  Land-Cover Maps:  Challenges and Future Directions
                                           by
                        Stephen V. Stehman, Corresponding Author
                   SUNY College of Environmental Science and Forestry
                                      320 Bray Hall
                                   Syracuse, NY 13210

                              Telephone: (315)470-6692
                               Facsimile: (315)470-6535
                                   E-mail: svstehma@svr.edu
2.1    Introduction

This chapter focuses on the application of accuracy assessment as a final stage evaluation of the thematic
quality of a land-cover (LC) map covering a large region such as a state or province, country, or
continent. The map is assumed classified according to a crisp or hard classification scheme, as opposed
to a fuzzy classification scheme (Foody, 1999). The standard protocol for accuracy assessment is to
compare the map LC label to the reference label at a sample of locations, where the reference label is
assumed to be correct. The source of reference data may be aerial photography, ground visit, or
videography. Discussion will be limited to the case in which the assessment unit for comparing the map
and reference label is  a pixel. Similar issues apply to sampling both pixels and polygons, but a greater
assortment of design options has been developed for pixel-based assessments. Most of the chapter will
focus on site-specific accuracy, which is accuracy determined on a pixel-by-pixel basis. In contrast, non-
site-specific accuracy  provides a comparison aggregated over some spatial extent. For example, in a non-
site-specific assessment, the area of forest mapped for a county would be compared to the true area of
forest in that county.  Errors of omission for a particular class may be compensated for by errors of
commission from other classes such that non-site-specific accuracy may be high even if site-specific
accuracy is poor. Site-specific accuracy may be viewed as spatially explicit, whereas non-site-specific
accuracy addresses map  quality in a spatially aggregated framework.

A sampling design is a set of rules for selecting which pixels will be visited to obtain the reference data.
Congalton (1991), Janssen and van der Wei (1994), Congalton and Green (1999), and Stehman (1999)
                                                                              Page 15 of 339

-------
provide overviews of the basic sampling designs available for accuracy assessment. Although these
articles describe designs that may serve well for small-area, limited objective assessments, they do not
convey the broad diversity of design options that must be drawn upon to meet the demands of large-area
mapping efforts with multiple accuracy objectives. An objective here is to expand the discussion of
sampling design to encompass alternatives available for more demanding, complex accuracy assessment
problems.

The diversity of accuracy assessment objectives makes it important to specify which objectives a
particular assessment is designed to address.  Objectives may be categorized into three general classes:
(1) description of the accuracy of a completed map; (2) comparison of different classifiers; and
(3) assessment of sources of classification error. This chapter focuses on the descriptive objective.
Recent examples illustrating descriptive accuracy assessments of large-area LC maps include Edwards et
al. (1998), Muller et al. (1998), Scepan (1999), Zhu et al. (2000), Yang et al. (2001), and Laba et al.
(2002). The foundation of a descriptive accuracy assessment is the error matrix and the variety of
summary measures computed from the error matrix such as overall, user's and producer's accuracies,
commission and omission error probabilities, measures of chance-corrected agreement, and measures of
map value or utility.

Additional descriptive objectives are often pursued. Because classification schemes are often hierarchical
(Anderson et al., 1976), descriptive summaries may be required for each level of the hierarchy.  For large-
area LC maps, there is frequently interest in accuracy of various sub-regions (for example, a  state or
province within a national map, or a county or watershed within a state or regional map). Each identified
sub-region could be characterized by an error matrix and accompanying summary measures.  Describing
spatial patterns of classification error is yet another objective. Reporting accuracy for various subsets of
the data, for example, homogeneous 3x3 pixel blocks, edge pixels, or interior pixels may address this
objective. Another potential objective would be to describe accuracy for various aggregations of the data.
For example, if a map constructed with a 30 m pixel resolution is converted to a 90 m pixel resolution,
what is the accuracy of the 90 m product? Lastly, non-site-specific accuracy may be of interest. For
example, if a primary application of the map were to provide LC proportions for a 5 x 5 km spatial unit
(e.g., Jones et al., 2001), non-site-specific accuracy would be of interest. Non-site-specific accuracy has
typically been thought of as applying to the entire map (Congalton and Green, 1999). However, when
viewed in the wider context of how maps are used, non-site-specific accuracy at various spatial extents
becomes relevant.

The basic elements of a statistically rigorous sampling strategy are encapsulated in the specification of a
probability sampling design, accompanied by consistent estimation following principles of Horvitz-
Thompson estimation.  These fundamental characteristics of statistical rigor are detailed in Stehman
(2001). Choosing a sampling design for accuracy assessment may be guided by the following additional
design criteria:  (1) adequate precision for key estimates; (2) cost-effective; and (3) appropriately simple
to implement and analyze (Stehman, 1999).  These criteria hold whether the reference data are crisp or
fuzzy and will be prioritized differently for different assessments.  Because these criteria often lead to
conflicting design choices, the ability to compromise among criteria is a crucial element of the art of
sampling design.
Page 16 of 339

-------
2.2   Meeting the Challenge of Cost-Effective Sampling Design

Effective sampling practice requires constructing a design that affords good precision while keeping costs
low. Strata and clusters are two basic sampling structures available in this regard, and often both are
desirable in accuracy assessment problems. Unfortunately, implementing a design incorporating both
features may be challenging. This topic will be addressed in the next subsection.  A second approach to
enhance cost-effectiveness is to use existing data or data collected for purposes other than accuracy
assessment (e.g., for environmental monitoring). This topic is addressed in the second subsection.

2.2.1   Strata Versus Clusters: The Cost Versus Precision Paradox

The objective of precise estimation of class-specific accuracy is a prime motivation for stratified
sampling. In the typical implementation of stratification in accuracy assessment, the mapped LC classes
define the strata, and the design is tailored to  enhance precision of estimated user's accuracy or
commission error.  Stratified sampling requires all pixels in the population  to be identified with a stratum.
If the map is finished, stratifying by mapped LC class is readily accomplished. Geographic stratification
is also commonly used in accuracy assessment. It is motivated by an objective specifying accuracy
estimates for key geographic regions (e.g., an administrative unit such as a  state or an ecological unit such
as an ecoregion), or by an objective specifying a spatially well-distributed sample. It is possible, though
rare, to stratify by the cross-classification of land-cover class by geographic region.  The drawback of this
two-way stratification is that resources are generally not sufficient to obtain an adequate sample size to
estimate accuracy precisely in each stratum (e.g., Edwards et al.,  1998).

The rationale for cluster sampling is to obtain cost-effectiveness by sampling pixels in groups defined by
their spatial proximity. The decrease in the per-unit cost of each sample pixel achieved by cluster
sampling may result in more precise accuracy estimates depending on the spatial pattern of classification
error. Cluster sampling is a means by which  to obtain spatial control (distribution) over the sample. This
spatial control can occur at two scales, termed regional and  local. Regional spatial control refers to
limiting the macro-scale spatial distribution of the sample, whereas local spatial control reflects the
logical consequence that sampling several spatially proximate pixels requires little additional effort
beyond that needed to sample a single pixel. Examples of clusters achieving regional control over the
spatial distribution of the sample include a county, quarter-quad, or 6 x 6 km area. Examples of design
structures used to implement local control include blocks of pixels (e.g., 3  x 3 or 5 x 5 pixel blocks),
polygons of homogeneous LC or linear clusters of pixels. Both regional and local controls are designed
to reduce costs, and for either option, the assessment unit is still an individual pixel.

Regional spatial control is designed to control travel costs or reference data material costs.  For example,
if the reference data consist of interpreted aerial photography, restricting the sample to a relatively small
number of photos will reduce cost. If the reference data are collected by ground visit, regional control
can limit travel to within  a much smaller total area (e.g., within a sample of counties or 6 x 6 km blocks,
rather than among all counties or 6 x  6 km blocks). When used alone, local spatial control may not
achieve these cost advantages. For example, a simple random or systematic sample of 3 x 3 pixel blocks
providing local spatial control may be widely dispersed across the landscape, therefore requiring many
photos or extensive travel to reach the sample clusters.
                                                                                      Page 17 of 339

-------
In practice, both regional and local control may be employed in the same design. The most likely
combination in such a multi-stage design would be to exercise regional control via two-stage cluster
sampling, and local control via one-stage cluster sampling, as follows.  Define the primary sampling unit
as the cluster constructed to obtain regional spatial control (e.g., a 6 x 6 km area). The secondary
sampling unit would be chosen to provide the desired local spatial control (e.g., 3x3 block of pixels).
The first-stage sample consists of PSUs, but not every 3x3 block in each sampled PSU is observed.
Rather, a second-stage sample of 3 x 3 blocks would be selected from those available in the first-stage
sample. The 3x3 blocks would not be further subsampled, but instead, reference data would be obtained
for all 9 pixels of the 3x3 cluster.

Stratifying by LC class can directly conflict with clustering. The essence of the problem is illustrated by
a simple example.  Suppose the clusters are 3 x 3 blocks of pixels that, when taken together, partition the
mapped region.  The majority of these clusters will not consist of nine pixels all belonging to the same LC
class. Stratified sampling directs us to select individual pixels from each LC class, in opposition to
cluster sampling in which the selection protocol is based on a group of pixels.  Because cluster sampling
selects groups of pixels, we forfeit the control over the sample allocation that is sought by stratified
sampling. It is possible to sample clusters via a stratified design, but it is the cluster, not the individual
pixel, that must determine stratum membership.

A variety of approaches to circumvent this conflict between stratified and cluster sampling can be posed.
One that should not be considered is to restrict the sample to only homogeneous 3x3 clusters.  This
approach clearly results in a sample that cannot be considered representative of the population, and it is
well known that sampling only homogeneous areas of the map tends to inflate accuracy (Hammond and
Verbyla, 1996).  A second approach, and one that maintains the desired statistical rigor of the sampling
protocol, is to employ two-stage cluster sampling in conjunction with stratification by LC class. A third
approach in which the clusters are re-defined to permit stratified selection will also be described.

The sampling design implemented in the accuracy assessment of the National Land Cover Data (NLCD)
map illustrates how cluster sampling and stratification can be combined to achieve cost-effectiveness and
precise class-specific estimates (Zhu et al., 2000; Yang et al., 2001; Stehman et al., 2003). The NLCD
design was implemented across the United States using ten regional assessments based on the U.S.
Environmental Protection Agency's (U.S.  EPA) federal administrative regions.  Within a single region,
the NLCD assessment was designed to provide regional spatial control and stratification  by LC class. For
several regions, the primary sampling unit (PSU) was constructed from non-overlapping, equal-sized
areas of National Aerial Photography Program (NAPP) photo-frames, and in other regions, the PSU was a
6 x 6 km spatial unit.  Both PSU constructions were designed to reduce the number of photos that would
need to be purchased for reference data collection.  A first stage sample of PSUs was selected at a
sampling rate of approximately 2.0%. Stratification by LC class was implemented at the second stage of
the design. Mapped LC classes were used to stratify all pixels found within the first-stage sample PSUs.
A simple random sample of pixels from each stratum was then selected, typically with  100 pixels per
class.  This design proved effective for ensuring that all LC classes, including the rare classes, were
represented adequately so that estimates of user's accuracies were reasonably precise. The clustering
feature implemented to achieve regional control succeeded at reducing costs considerably.
Page 18 of 339

-------
2.2.2  Flexibility of the A/LCD Design

The flexibility of the NLCD design permits other options for selecting a second-stage sample. An
alternative second-stage design could improve precision of the NLCD estimates (Stehman et al., 2000b),
but such improvements are not guaranteed and would be gained at some cost. Precision for the rare LC
classes is the primary consideration.  Often the rare class pixels cluster within a relatively small number
of PSUs. The simple random selection within each class implemented in the second stage of the NLCD
design will result in a sample with representation proportional to the number of pixels of each class within
each PSU. That is, if many of the pixels of a rare class are found in only a few first-stage PSUs, many of
the 100 second-stage sample pixels would fall within these same few PSUs.  This clustering could result
in poor precision for the estimated accuracy of this class. Ameliorating this concern is the fact that the
NLCD clustering is at the regional level of control. The PSUs were large (e.g., 6x6 km), so pixels
sampled within  the same PSU will not necessarily exhibit strong infra-cluster correlation. In the case of
weak intra-cluster correlation of classification error, cluster sampling will not result in precision
significantly different from a simple random sample of the same size (Cochran, 1977).

Two alternatives may counter the clustering effect for rare class pixels. One is to select a single pixel at
random from 100 first-stage PSUs containing at least one pixel of the rare class.  If the class is present in
more  than 100 PSUs, the first-stage PSUs could be sub-sampled to reduce the eligible set to 100. If fewer
than 100 PSUs contain the rare class, the more likely scenario, the situation is slightly more complicated.
A fixed number of pixels may be sampled from each first-stage PSU containing the rare class so that the
total sample size for the rare class is maintained at 100. The complication is choosing the sample size for
each PSU. This will depend on the number of eligible first-stage PSUs, and also on the number of pixels
of the class in the PSU. This design option counters the potential clustering effect of rare class pixels by
forcing the second-stage sample to be widely dispersed among the eligible first-stage PSUs. In contrast to
the outcome of the NLCD, PSUs containing a large proportion of the rare class will  not receive the
majority of the second-stage sample.

The second option to counter clustering of the sample into a few PSUs is to construct a "self-weighting"
design (i.e., an equal probability sampling design in which all pixels have the same probability of being
included in the sample).  The term self-weighting arises from the fact that the analysis requires no
weighting to account for different inclusion probabilities. At the first-stage, 100  sample PSUs would be
selected with inclusion probability proportional to the number of pixels of the specified rare class in the
PSU. A wide variety of probability proportional to size designs exists, but simplicity would be the
primary consideration when selecting the design for an accuracy assessment application. At the second
stage, one pixel would be selected per PSU. A consequence of this two-stage protocol is that within each
LC stratum, each pixel has an equal probability of being included in the sample (Sarndal et al., 1992), so
no individual pixel weighting is needed for the user accuracy estimates. The design goal of distributing
the sample pixels among 100 PSUs is also achieved.
                                                                                     Page 19 of 339

-------
2.2.3   Comparison of the Three Options

Three criteria will be used to compare the NLCD design alternatives: (1) ease of implementation; (2)
simplicity of analysis; and (3) precision.  The actual NLCD design will be designated as "Option 1,"
sampling one pixel from each of 100 PSUs will be "Option 2," and the self-weighting design will be
referred to as "Option 3." Options 1 and 2 are the easiest to implement, with Option 3 being the most
complicated because of the potentially complex, unequal probability first-stage protocol.  Not only would
such a first-stage design be more complex than what is typically done in accuracy assessment, Option 3
requires much more effort because we need the number of pixels of each LC class within each PSU in the
region.

Options 1 and 3 share the characteristic of being self-weighting within LC strata. Self-weighting designs
are simpler to analyze, although survey sampling computational software would mitigate this analysis
advantage. Option 2 is not self-weighting, as demonstrated by the following example.  Suppose a first-
stage PSU has 1,000 pixels of the rare class and another PSU has  20 pixels of this class. At the first stage
under Option 2, both PSUs have an equal chance of being selected. At the second stage, a pixel in the
first PSU has a probability of 1/1,000 of being chosen, whereas a  pixel in the second PSU has a 1/20
chance of being sampled. Clearly, the probability of a pixel being included in the sample is dependent
upon how many other pixels of that class are found within the PSU. The appropriate estimation weights
can be derived for this unequal probability design, but the analysis is complicated.

In addition to evaluating options based on simplicity,  we would like to compare precision of the different
options.  Unfortunately, such an evaluation would be  difficult, requiring either complicated theoretical
analysis, or extensive simulation studies based on acquiring reasonably good approximations to spatial
patterns of classification error. A key point of this discussion of design alternatives for two-stage cluster
sampling is that while the problem can be simply stated and the objectives for what needs to be achieved
are clear, determining an optimal solution is elusive.  Simple changes in sampling protocol may lead to
complications in the analysis, whereas maintaining a simple analysis may require a complex sampling
protocol.

2.2.4   Stratification and Local Spatial Control

Clustering to achieve local spatial control also conflicts with the effort to stratify by cover types.  Several
design alternatives may be considered to remedy this problem. An easily  implemented approach is the
following. A stratified random sample of pixels is obtained using the mapped LC classes as strata.  To
incorporate local spatial control and increase the sample size, the eight pixels touching each sampled pixel
are also included in the sample.  That is, a cluster consisting of a 3 x 3 block of pixels is created, but the
selection protocol is based on the center pixel of the cluster.  Two potential drawbacks exist for this
protocol. First, the sample size control feature of stratified random sampling is diminished because the
eight pixels surrounding an originally selected  sample pixel could be any LC type, not necessarily the
same type as the center pixel of the block. Sample size  planning becomes trickier because we do not
know which LC classes will be represented by the surrounding eight pixels, nor how many pixels will be
obtained for each LC class present.  This will not be a problem if we have abundant resources because we
could specify the desired minimum sample size for each LC  class  based on the  identity of the center
Page 20 of 339

-------
pixels. However, having an overabundance of accuracy assessment resources is unlikely, so the loss of
control over sample allocation is a legitimate concern.

Second, and more importantly, this protocol creates a complex inclusion probability structure because a
pixel may be selected into the sample via two conditions:  it is an originally selected center pixel of the 3
x 3 cluster, or it is one of the eight pixels surrounding the initially sampled center pixel.  To use the data
within a rigorous probability-sampling framework, the inclusion probability determined for each pixel
must account for this joint possibility of selection. We require the probability of being selected as a
center pixel, the  probability of being selected as an accompanying pixel in the 3 x 3 block, and the
probability of being selected by both avenues in the same sample (i.e., the intersection event). The first
probability is readily available because it is the inclusion probability of a stratified random sample, nh/Nh,
where nh and Nh  are the sample and population numbers of pixels for stratum h.  The other two
probabilities are  much more complicated.  The probability of a pixel being selected because it is adjacent
to a pixel selected in the initial sample depends on the map LC labels of the eight pixels surrounding the
pixel in question, and this probability differs among different LC types. Although  it is conceptually
possible to enumerate the necessary information to obtain these probabilities, it is practically difficult.
Finding the intersection probability would be equally complex.  Rather than derive the actual inclusion
probabilities, we could use the stratified random sampling inclusion probabilities as an easy to implement,
but crude approximation. This would violate the principle of consistent estimation and raise the question
of how well such an approximation worked.

A second general alternative is to change the way the stratification is implemented. The problem arises
because the strata are defined at the pixel level while the selection procedure is applied to the cluster
level. Stratifying at the cluster level, for example a 3 x 3 block of pixels, resolves this problem, but
creates another.  The non-homogeneous character of the clusters creates a challenge when deciding to
which stratum a block should be assigned  if it consists of two or more cover types.  Rules to determine
the assignment must be specified.  For example, assigning the block to the most common class found in
the 3  x 3 block is one possibility, with a tie-breaking provision defined for equally common classes. A
drawback of this approach  is that few 3x3 blocks may be assigned to  strata representing rare classes if
the rare class pixels are often found in small patches of 2-4 pixels. An alternative is to construct a rule
that forces greater numbers of blocks into  rare class strata. For example, the presence of a single pixel of
a rare class may trigger assignment of that pixel's block to the rare class stratum. An obvious difficulty of
this assignment protocol is what to do if two or more rare classes are represented within the same cluster.
Because stratification requires that each block be assigned to exactly one stratum, and all blocks in the
region must be assigned to strata, an elaborate set of rules may be needed to encompass all cases.  A two-
stage protocol such as implemented in the NLCD would reduce the workload of assigning blocks to strata
because this assignment would be necessary only for the first-stage sample PSUs, not the entire area
mapped.  Estimation of accuracy parameters would be straightforward in this approach because each pixel
in the 3 x 3 cluster has the  same inclusion probability.  This is an advantage of this option compared to
the first option in which the pixels within  a 3 x 3 block may have different inclusion probabilities.  As is
true for most complex designs, constructing a variance estimator and implementing it via existing
software may be difficult.

This discussion  of how to resolve design conflicts created by the desire to incorporate both cover type
stratification and local spatial control via clustering illustrates that the  solutions to practical problems may
                                                                                       Page 21 of 339

-------
not be simple. We know how to implement cluster sampling and stratified sampling as separate entities,
but we do not necessarily have simple, effective ways to construct a design that simultaneously
accommodates both structures.  Simple implementation procedures may lead to complex analysis
protocols (e.g., difficulty in specifying the inclusion probabilities), and procedures permitting simpler
analyses may require complex implementation protocols (e.g., defining strata at the 3 x 3 block level).
The situation is even more complex than the treatment in this section indicates. It is likely that these
methods focusing on local spatial control will need to be embedded in a design also incorporating
regional spatial control.  The 3x3 pixel clusters would represent sub-samples from a larger primary
sampling unit such as a 6 x 6 km area. Integrating regional and local spatial control with stratification
raises still additional challenges to the design.

The NLCD case study may also be used as the context for addressing concerns related to pixel-based
assessments.  Positional  error creates difficulties with any accuracy assessment because of potential
problems in achieving exact spatial correspondence between the reference location and the map  location.
Typically, the problem is more strongly associated with pixel-based assessments relative to polygon-
based assessments, but it is not clear that this association  is entirely justified. The effects of positional
error are most strongly manifested along the edges of map polygons. Whether the assessment is based  on
a pixel, polygon, or other spatial unit does not change the amount of edge present in the map. What may
be changed by choice of assessment unit is how edges are treated in the collection and use of reference
data. For example, suppose a polygon assessment employs an agreement protocol in which the entire
map polygon is judged either in complete agreement or complete disagreement with the reference data.
In this approach, the effect of positional error is greatly diminished because the error associated with a
polygon edge may be obscured when blended with the more homogeneous, polygon interior. The
positional error problem has not disappeared; it has to some extent been swept under the rug.  This
particular version of a polygon-based assessment is valid for certain map applications, but not all. For
example, if the assessment objective is site-specific accuracy, the assessment must account for possible
classification error along polygon boundaries.  Defining agreement as a binary outcome based on the
entire polygon will not achieve that purpose.

In a pixel-based  assessment, provisions should be included to accommodate the reality of positional error
when assessing edge or boundary pixels. No option is perfect, because we are dealing with a problem
that has no practical,  ideal solution. However, the option chosen  should address the problem directly.
One approach is to construct the reference data  protocol so that the potential influence of positional error
can be assessed. The protocol may include a rating of location confidence (i.e., how confident is the
observer that the reference and map locations correspond exactly), followed  by reporting results  for the
full reference data as well as subsets of the data defined by the location confidence rating.  Readers may
then judge the potential effect of positional error by comparing accuracy at various levels of location
confidence.  A related approach would be to report accuracy results separately for edge and interior
pixels. An alternative approach is to define agreement based on more information than comparing a
single map pixel to a single reference pixel.  In the NLCD assessment, one definition of agreement used
was to compare the reference label of the sample pixel with a mode class determined from the map labels
of the 3x3 block of pixels centered on the nominal sample pixel  (Yang et al.,  2001).  This definition
recognizes the possibility that the actual location used to determine the reference label could be offset by
one pixel from the location identified on the  map.
Page 22 of 339

-------
Another important feature of a pixel-based assessment is to account for the minimum mapping unit
(MMU) of the map. When assigning the reference label, the observer should choose the LC class keeping
in mind the MMU established.  That is, the observer should not apply tunnel vision restricted only to the
area covered by the pixel being assessed, but rather should evaluate the pixel taking into account the
surrounding spatial context. In the 1990 NLCD, the MMU was a single pixel. It is expected that NLCD
users may choose to define a different MMU depending on their particular application, but the NLCD
accuracy assessment was pixel-based because the base product made available was not aggregated to a
larger MMU.

The problems associated with positional error are largely specific to the response or measurement
component of the accuracy assessment (Stehman and Czaplewski, 1998).  However, a few points related
to sampling design should be recognized. Although the MMU is a relevant feature of a map to consider
when determining the response design protocol, it is important to recognize that a MMU does not define a
sampling unit. A pixel, a polygon, or a 3 x 3 block of pixels, for example, are all legitimate sampling
units, but a "1.0 ha MMU" lacks the necessary specificity to define a sampling unit. The MMU does not
create the unambiguous definition required of a sampling unit because it permits various shapes of the
unit, it does not include specification of how the unit is accounted for when the polygon is larger than the
MMU, and it does not lead directly to a partitioning of the region into sampling units.  While it may be
possible to construct the necessary sampling unit partition based on a MMU, this approach has never been
explicitly articulated. When sampling polygons, the basic methods available are simple random,
systematic, and stratified (by LC class) random sampling from a list frame of polygons. Less obvious are
how to incorporate clustering and spatial sampling methods for polygon assessment units.  Polygons may
vary greatly in size, so a decision is required whether to stratify by size so as not to have the sample
dominated by numerous small polygons. A design protocol of locating sample points systematically or
completely at random and including those polygons touched by these sample point locations creates a
design in which the probability of including a polygon is proportional to its area.  This structure must be
accounted for in the analysis, and is a characteristic of polygon sampling that has yet to be discussed
explicitly by proponents of such designs. Most of the comparative studies of accuracy assessment
sampling designs are pixel-based assessments (Fitzpatrick-Lins, 1981; Congalton, 1988b;  Stehman, 1992,
1997) and analyses of potential  factors influencing design choice (e.g., spatial correlation of error) are
also pixel-based investigations (Congalton, 1988a; Congalton and Pugh, 2001).

Problems associated with positional error in accuracy assessment merit further investigation and
discussion. Although it is easy to dismiss pixel-based assessments with a "you can't find a pixel"
proclamation, a less superficial treatment of the issue is called for. Edges are a real characteristic of all
LC maps, and the accuracy reported for a map should account for this reality.  Whether the assessment is
based on a pixel or larger spatial unit, the accuracy assessment should confront the edge feature directly.
Although there is no perfect solution to the problem, options exist to specify the analysis or response
design protocol in such a way that the effect of positional error on accuracy is addressed.   Sampling in a
manner that permits evaluating the effect of positional error seems preferable to sampling  in a way that
obscures the  problem (e.g., limiting the sample to homogeneous LC regions).
                                                                                      Page 23 of 339

-------
2.3   Existing Data

It is natural to consider whether existing data or data collected for other purposes could be used as
reference data to reduce the cost of accuracy assessment. Such data must first be evaluated to ascertain
spatial, temporal, and classification scheme compatibility with the LC map that is the subject of the
assessment.  Once compatibility has been established, the issue of sampling design becomes relevant.
Existing data may originate from either a probability or non-probability sampling protocol. If the data
were not obtained from a probability sampling design, the inability to generalize via rigorous, defensible
inference from these data to the full population is a severe limitation. The difficulties associated with
non-probability sampling are detailed in a separate subsection.

The greatest potential for using existing data occurs when the data have a probability-sampling origin.
Ongoing environmental monitoring programs are prime candidates for accuracy assessment reference
data.  The National Resources Inventory (NRI) (Nusser and Goebel, 1997) and Forest Inventory and
Analysis (FIA) (USFS, 1992) are the most likely contributors among the monitoring programs active in
the United States. Both programs include LC description in their objectives, so the data naturally fit
potential accuracy assessment purposes. Gill et al. (2000) implemented a successful accuracy assessment
using FIA data, and Stehman et al. (2000a) discuss use of FIA and NRI data within a general strategy of
integrating environmental monitoring with accuracy assessment.

At first glance, using existing data for accuracy assessment appears to be a great opportunity to control
cost.  However, further inspection suggests that deeper issues are involved.  Even when the data are from
a legitimate probability sampling design, these data will not be tailored exactly to satisfy all objectives of
a full-scale accuracy assessment.  For example, the sampling design for a monitoring program may be
targeted to specific areas or resources so that coverage is very good for some LC classes and sub-regions,
but possibly inadequate for others. For example, NRI covers non-federal land and targets agriculture
related questions, whereas the FIA focus is obviously on forested land.  To complete a thorough accuracy
assessment, it may be necessary to piece together a patchwork of various sources of existing data plus a
supplemental, directed sampling effort to fill in the gaps of the existing data coverage. The effort
required to cobble together a seamless, consistent assessment may be significant, and the statistical
analysis of the data complex.

Data from monitoring programs may carry provisions for confidentiality.  This is certainly true of NRI
and FIA.  Confidentiality agreements permitting access to the data will need to be negotiated and strictly
adhered to. Because of limited access to the data, progress may be slow if human interaction with the
reference data materials is required to complete the accuracy assessment.  For example, additional
photographic interpretation for reference data using NRI or FIA materials may be problematic because
only one or two qualified interpreters may have the necessary clearance to handle the materials.
Confidentiality requirements will also preclude making the reference data generally available for public
use. This creates problems for users wishing to conduct sub-regional assessments or error analyses, to
construct models of classification  error or to evaluate different spatial aggregations of the data.  It is
difficult to assign costs to these features. Existing data obviously saves on data collection costs, but there
are accompanying hidden costs related to complexity and completeness of the analysis, timeliness to
report results, and public access to the data.
Page 24 of 339

-------
2.3.1  Added Value Uses of Accuracy Assessment Data

In the previous section, accuracy assessment is considered an add-on to objectives of an ongoing
environmental monitoring program. However, if accuracy data are collected via a probability sampling
design, these data may have value for more general purposes.  For example, a common objective of LC
studies is to estimate the proportional representation of various cover types and how they change over
time.  We can use complete coverage maps such as the NLCD to provide such estimates, but these
estimates are biased because of the classification errors present. Although the maps represent a complete
census, they contain measurement error. The reference data collected for accuracy assessment supposedly
represent higher quality data (i.e., less measurement error), so these data may serve  as a stand-alone basis
for estimates of LC proportions and areas.  Methods for estimating area and proportion of area covered by
the various LC classes have been developed (Czaplewski and Catts, 1992; Walsh and Burk, 1993; Van
Deusen, 1996).  Recognizing this potentially important use of reference data provides further rationale for
implementing a statistically defensible,  probability sampling designs.  This area estimation application
extends to situations in which LC proportions for small areas such as a watershed or county are of
interest. A probability sampling design provides a good foundation for implementing small-area
estimation methods to obtain the area proportions.
2.4   Non-Probability Sampling

Because non-probability sampling is often more convenient and less expensive, it is useful to review
some manifestations of this departure from a statistically rigorous approach. Restricting the probability
sample to near roads for convenient access or to homogeneous 3x3 pixel clusters to reduce confounding
of spatial and thematic error are two typical examples of non-probability sampling. A positive feature of
both examples is that generalization to some population is statistically justified (e.g., the population of all
locations conveniently accessible by road or all areas of the map consisting of 3 x 3 homogeneous pixel
blocks). Extrapolation to the full map is problematic. In the NLCD assessment, restricting the sample to
3x3 homogeneous blocks would have represented roughly 33% of the map, and the overall accuracy for
this homogeneous subset was about 10% higher than for the full map.  Class-specific accuracies could
increase by 10% to 20% for the homogeneous areas relative to the full map.

Another prototypical non-probability sampling design results when the inclusion probabilities needed to
meet the consistent estimation criterion of statistical rigor are unknown. Expert or judgment samples,
convenience samples (e.g., near roads, but not selected by a probability sampling protocol), and complex,
ad hoc protocols are common examples. "Citizen participation" data collection programs are another
example in which data are usually not collected via a probability sampling protocol, but rather are
purposefully chosen because of proximity and ease of access to the participants.  This version of non-
probability sampling creates adverse conditions for statistically defensible inference to any population.
Peterson et al. (1999) demonstrate inference problems in the particular case of a citizen-based, lake water-
quality monitoring program.  To support inference from non-probability samples, the options are to resort
to a statistical model, or to simply claim "the sample looks good." In the former case, rarely are the
model assumptions explicitly stated or evaluated  in accuracy assessment. The latter option is generally
                                                                                    Page 25 of 339

-------
regarded as unacceptable, just as it is unacceptable to reduce accuracy assessment to an "it looks good"
judgment.

Another use of non-probability sampling is to select a relatively small number of sample sites that are,
based on expert judgment, representative of the population.  In environmental monitoring, these locations
are referred to as "sentinel" sites, and serve as an analogy to hand picked confidence sites in accuracy
assessment.  In both environmental monitoring and accuracy assessment, judgment samples can play an
invaluable role in understanding processes, and their role in accuracy assessment for developing better
classification techniques should be recognized. Although non-probability samples may serve as a useful
initial check on gross quality of the data because poorly classified areas may be identified quickly,
caution must be exercised when broad-based, population level description is desired (i.e., when the
objective is to generalize from the sample). Edwards (1999) emphasizes that the use of sentinel sites for
population inference in environmental monitoring is  suspect.  This concern is applicable to accuracy
assessment as well.

More statistically formal approaches to non-probability sampling have been proposed.  In the method of
balanced sampling, selection of sample units is purposefully balanced on one or more auxiliary variables
known for the population (Royall and Eberhardt, 1975). For example, the sample might be chosen so that
the mean elevation of the sample pixels matches the mean elevation of all pixels  mapped as that LC class
(i.e., the population mean). The method is designed to produce a sample robust to violations in the model
used to support inference. Most non-probability sampling designs implemented in accuracy assessment
lack the underlying model-based rationale of balanced sampling, and instead are  the result of simply
convenience, judgment, or poor design. Schreuder and Gregoire (2001) discuss other potential uses of
non-probability sampling data.

2.4.1   Policy Aspects  of Probability Versus Non-Probability Sampling

Considering implementation of a non-probability sampling protocol has policy implications in addition to
the scientific issues discussed in the previous section. The policy issues arise because both scientists and
managers using the LC map have a vested  interest in the map's accuracy. Federal sponsorship to create
these maps adds an element of governmental responsibility to ensure, or at least document, their quality.
The stakes are consequently high and the accuracy assessment design will need to be statistically
defensible.  Most government sampling programs responsible for providing national and broad regional
estimates are conducted using probability sampling protocols. The Current Population Survey (CPS)
(McGuiness, 1994) and National Health and Nutrition Examination Survey (NHANES) (McDowell et al.,
1981) are two such programs designed  as probability samples. Similarly, national environmental
sampling programs are typically based on probability sampling protocols (Olsen et al., 1999).

The expense of LC maps covering large geographic regions combined with the multitude of applications
these maps serve elevates the importance of accuracy assessment to a level commensurate with these
other national sampling programs.  Accordingly, the  protocols employed to evaluate the quality of the LC
data must achieve standards of sampling design and statistical credibility established by other national
sampling programs. These standards of accuracy assessment protocol will exceed those acceptable for
more local use, lower profile maps.  The exposure, or perhaps notoriety, accruing to maps such as the
Page 26 of 339

-------
NLCD will elicit intense scrutiny of their quality. Concerns related to litigation may become more
prevalent as use of LC maps affecting government decisions increases. Map quality may be challenged
not only scientifically, but also legally.  Because the sampling design is such a fundamental part of the
scientific basis of an accuracy assessment, the credibility of this component of accuracy assessment must
be assured. To provide this assurance, the use of scientifically defensible probability sampling protocols
should be a matter of policy.
2.5   Statistical Computing

The requirements for statistically rigorous design and analysis will tax the capability of traditional
computing practice in accuracy assessment. Stehman and Czaplewski (1998) noted the absence of readily
accessible, easy to use statistical software that could perform the analyses associated with the more
complex sampling designs that will be needed for large-area map assessments. Recent upgrades in
computing software have improved this situation. For example, the SAS analysis software now includes
survey sampling estimation procedures that can be adapted for accuracy assessment applications. Nusser
and Klaas (2003) implemented these procedures to obtain the typical suite of accuracy estimates and
accompanying standard errors for complex sampling designs. The SAS procedure accomplishing these
tasks is PROC SURVEYMEANS.

Survey sampling software will be invaluable if data from ongoing monitoring programs is to be used for
accuracy assessment. For example, suppose NRI data serve as the source of reference data. Two
characteristics of the NRI data, confidentiality and the unequal probability design used, may be resolved
by the  capabilities available in SAS. To adhere to the estimation criterion of consistency, the accuracy
estimates must incorporate weights for the sample pixels derived from the unequal  inclusion probabilities.
The SAS estimation procedures are designed to accommodate these weights. Confidentiality of sample
locations can be maintained because the necessary estimation weights need not refer to any location
information.  The possibility exists that with the location information stripped away, the  data could be
made available for limited general use for applications requiring only the sample weights, and the map
and reference labels.  Users would need to conduct their analyses via SAS or another software package
that implements design-based estimation procedures  incorporating the sampling weights. Analyses
ignoring this feature may produce badly misleading results.

Use of SAS  for accuracy assessment estimation provides two other advantages. SAS includes estimation
of standard errors as standard output.  Standard error formulas are complex for the  sampling designs
combining the advantages of both strata and clusters. Having available software to compute these
standard errors is highly beneficial relative to the alternative of writing one's own variance estimation
code and having to confirm its validity.  Secondly, SAS readily accommodates the  fact that many
accuracy estimates, for example producer's accuracy, are ratio estimators (i.e., ratios of two estimates).
For ratio estimators, the  SAS standard error estimation procedures employ the common practice of using
a Taylor Series approximation. The more complex design structures that arise from more cost-effective
assessments or use of existing data obtained from an ongoing monitoring program will likely require
more sophisticated analysis software than is available in standard GIS and classification  software. SAS
                                                                                    Page 27 of 339

-------
does not provide everything that is needed, but its capabilities represent a major step forward in
computing for accuracy assessment analyses.
2.6    Practical Realities of Sampling Design

In comments directed toward sampling design for environmental monitoring, Fuller (1999) captured the
essence of many of the issues facing sampling design for accuracy assessment.  These principles are re-
stated, and in some cases paraphrased, to adapt them to accuracy assessment sampling design: (1) every
new approach sounds easier than it is to implement and analyze; (2) more will be required of the data at
the analysis stage than had been anticipated at the planning stage; (3) objectives and priorities change
over time; and (4) the budget will be insufficient.


2.6.1   Principle 1

Every new approach sounds easier than it is.  Incorporating existing data for accuracy assessment is a
good case in point. While the data may be "free," the analysis and research required to evaluate the
compatibility of the spatial units and classification scheme are not without costs.  Confidentiality
agreements may need to be negotiated and strictly adhered to, spatial and temporal coverage of the
existing data may be incomplete and/or inadequate, and the response time for interaction with the agency
supplying the data may be slow because this use of their data may not be a top priority among their
overburden of responsibilities.  Existing data that do not originate from a probability sampling protocol
are even more difficult to incorporate into a rigorous protocol and may be useful only as a qualitative
check of accuracy and to provide limited anecdotal, case study information.


2.6.2   Principle 2

More will be required of the data at the analysis stage than had been anticipated at the planning stage.
This principle applies to estimating accuracy of sub-regions and other subsets of the data.  That is, a
program designed for regional accuracy assessments will be asked to provide state-level estimates and
possibly even county-level estimates. Not only will overall accuracy be requested for these small sub-
regions, but also class-specific accuracy within the sub-region will be seen as desirable information.
Accuracy estimates for other subsets of the data will become  appealing.  For example, are the
classification errors associated with transitions between cover types? How accurate are the classifications
within relatively large homogeneous areas of the map?  Deriving a spatial representation of classification
error is another relevant, but supplemental objective that places additional  requirements on the accuracy
assessment analysis that may not have been planned for at the design stage.


2.6.3  Principles

Over time, objectives and/or priorities of objectives may change.  This may not represent a major problem
in accuracy assessment projects, but one example is changing the classification scheme if it is recognized
that certain LC classes cannot be mapped well. Another example illustrating this principle occurs when
Page 28 of 339

-------
the map is revised (updated) while the accuracy assessment is in progress. Some of the additional
analyses described in Principle 2 represent a change in objectives also.


2.6.4  Principle 4

Insufficient budget is a common affliction of accuracy assessments (Scepan, 1999).  Resource allocation
is dominated by the mapping activity, with scant resources available for accuracy assessment.  Adequate
resources may exist to obtain reasonably precise, class-specific estimates of accuracy over broad spatial
regions. For example, the NLCD accuracy assessment provides relatively low standard errors for class-
specific accuracy for each of 10 large regions of the United States. However, once Principle 2 manifests
itself, data that serve well for regional estimates may look woefully inadequate for sub-regional accuracy
objectives.  Edwards et al. (1998) and Scepan (1999) recognized these phenomena for state level and
global mapping.  In the former case, resources were inadequate to estimate class-specific accuracy with
acceptable precision for all three ecoregions found in the state of Utah. In the global application, the data
were too sparse to provide precise class-specific estimates for each continent.

Timeliness of accuracy assessment reporting is hampered by the need for the map to be completed prior
to drawing an appropriately targeted sample, and any accuracy assessment activity concurrent with map
production detracts from timely completion of the map. Managing and quality checking data is a time-
consuming, tedious task for the large datasets of accuracy assessment, and the statistical analysis is non-
trivial when the design is complex and standard errors are required. Lastly, neither the time nor the
financial resources are usually available to support research that would allow tailoring the sampling
design to specifically target objectives and characteristics of each individual mapping project. Comparing
different sampling designs using data directly relevant to the specific mapping project requires both time
and money.  Instead of this focused research approach, often design choices must be based on judgment
and experience, but without hard data to support the decision.
2.7   Discussion

Sampling design is one of the core challenges facing accuracy assessment, and future developments in
this area will contribute to more successful assessments. The goal is to implement a statistically
defensible sampling design that is both cost-effective and addresses the multitude of objectives that
multiple users and applications of the map generate. The future direction of sampling design in accuracy
assessment must go beyond the basic designs featured in textbooks (Campbell, 1987; Congalton and
Green, 1999) and repeated in several reviews of the field (Congalton, 1991; Janssen and van der Wei,
1994; Stehman, 1999; McGwire and Fisher, 2001; Foody, 2002).  While these designs are fundamentally
sound and introduce most of the basic structures required of good design (e.g., stratification, clusters,
randomization), they are inadequate for assessing large-area maps given the reality of budgetary and
practical constraints.

For both policy and scientific reasons, probability sampling is a necessary characteristic of the sampling
design. Within the class of probability sampling designs, we must seek to develop or identify methods
that resolve the conflicts of a design combining stratifying by LC class and clustering. Protocols
                                                                                     Page 29 of 339

-------
incorporating the advantages of two or more of the basic sampling designs need to be implemented when
combining data from different ongoing monitoring programs to take advantage of existing data, or when
augmenting a general sampling design to increase the sample size for rare classes or small sub-regions.
Sampling methods need to be explored for assessing accuracy for different spatial aggregations of the
data and for non-site-specific accuracy assessments. As is often the case for any developing field of
application, sampling design for accuracy assessment may not require developing entirely new methods,
but rather learning better how to use existing methods.

Implementing a scientifically rigorous sampling design provides a secure foundation to any accuracy
assessment.  Accuracy assessment data have little or no value to inform us about the map's utility if the
data are not collected via a credible sampling design. Sampling design in accuracy assessment is still
evolving according to a progression common in other fields of application.  Early innovators identified
the need for sound sampling practice (Fitzpatrick-Lins, 1981; Card, 1982; Congalton 1991). As more
familiarity was  gained with traditional survey sampling methods, more complex sampling designs could
be introduced and integrated into practice.  The challenges confronting sampling design for descriptive
objectives of accuracy assessment were recognized as daunting, but by no means insurmountable. The
platitude that we must choose a sampling design that "balances statistical validity and practical utility"
was raised (Congalton, 1991), and specificity was added to this generic recommendation by stating
explicit criteria of both validity and utility (Stehman, 2001).

The future direction of accuracy assessment sampling design demands new developments. Practical
challenges are a reality. For most, if not all  of these problems, statistical solutions already exist, or the
fundamental concepts and techniques upon which to derive the solutions can be found in the survey
sampling literature. The key to implementing better, more cost-effective sampling procedures in accuracy
assessment is to move beyond the parochial, insular traditions characterizing the early stage  of accuracy
assessment sampling and to recognize more clearly the broad expanse of opportunities offered by
sampling theory and practice. The book on sampling design for accuracy assessment is by no means
closed.  Sampling design in accuracy  assessment may have progressed to an advanced stage of
adolescence,  but it has yet to reach a level of consistency in good  practice and sound conceptual
fundamentals necessary to be considered a scientifically mature endeavor.  More statistically
sophisticated sampling designs not only contribute to the value of map accuracy assessments, they are the
result of our current needs for more information related to map utility.  If our needs were simple and  few
the basic sampling designs receiving the bulk of attention in the 1980s and early 1990s would suffice. It
is the increasingly demanding questions related to utility of these  maps that compel us to seek better,
more cost-effective sampling designs. Identifying these designs and implementing them in practice is the
future of sampling practice in accuracy assessment.
2.8   Summary

As maps delineating LC play an increasingly important role in natural resource science and policy
applications, implementing high quality statistically rigorous accuracy assessments becomes essential.
Typically, the primary objective of accuracy assessment is to provide precise estimates of overall
accuracy and class-specific accuracies (e.g., user's or producer's accuracies). An extended set of
Page 30 of 339

-------
objectives exists for most large-area mapping projects because multiple users interested in different
applications will employ the map. Constructing a cost-effective accuracy assessment is a challenging
problem given the multiple objectives the assessment must satisfy. To meet this challenge, a more
integrated sampling approach combining several design elements such as stratification, clustering, and use
of existing data must be considered. These design elements are typically found individually in current
accuracy assessment practice, but greater efficiency may be gained by more innovatively combining their
strengths.  To ensure scientific credibility, sampling designs for accuracy assessment should satisfy the
criteria defining a probability sample.  This requirement places additional burden on how various design
elements are integrated.  When exploring alternative design options, the apparently simple answers may
not be as straightforward as they first appear. Combining basic design structures such as strata and
clusters to enhance efficiency has some significant complicating factors, and use of existing data for
accuracy assessment has associated hidden costs even if the data are free.
2.9   References

Anderson, J. R., E.E. Hardy, J.T.Roach, and R.E. Witmer. A land use and land cover classification
    system for use with remote sensor data. U.S. Geological Survey Prof. Paper 964, U.S. Geological
    Survey, Washington, DC, 28 p., 1976.

Campbell, J.B. Introduction to Remote Sensing.  Guilford Press, New York, NY, 1987.

Card, D. H.  Using known map category marginal frequencies to improve estimates of thematic map
    accuracy. Photogrammetric Engineering & Remote Sensing, 48, 431-439, 1982.

Cochran, W. G. Sampling Techniques. Wiley, New York, NY, 1977.

Congalton, R. G. Using spatial autocorrelation analysis to explore the errors in maps generated from
    remotely sensed data. Photogrammetric Engineering and Remote Sensing, 54, 587-592, 1988a.

Congalton, R. G. A comparison of sampling schemes used in generating error matrices for assessing the
    accuracy of maps generated from remotely sensed data. Photogrammetric Engineering and Remote
    Sensing, 54, 593-600, 1988b.

Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote
    Sens. Environ., 37, 35-46, 1991.

Congalton, R.G., and K. Green. Assessing the Accuracy of Remotely Sensed Data: Principles and
    Practices. CRC Press, Boca Raton, FL, 1999.

Czaplewski, R.L., and G.P. Catts.  Calibration of remotely sensed proportion or area estimates for
    misclassification error. Remote Sens. Environ., 39, 29-43, 1992.

Edwards, D. Issues and themes for natural resources trend and change detection. Ecological
    Applications, 8, 323-325, 1998.
                                                                                    Page 31 of 339

-------
 Edwards, T. C., Jr., G.G. Moisen, and D.R. Cutler. Assessing map accuracy in an ecoregion-scale cover-
     map. Remote Sens. Environ., 63, 73-83, 1998.

 Fitzpatrick-Lins, K. Comparison of sampling procedures and data analysis for a land-use and land-cover
     map. Photogrammetric Engineering and Remote Sensing, 47, 343-3 51, 1981.

 Foody, G.M.  The continuum of classification fuzziness in thematic mapping. Photogrammetric
    Engineering and Remote Sensing, 65, 443-451, 1999.

Foody, G.M. Status of land cover classification accuracy assessment.  Remote Sens. Environ., 80, 185-
    201,2002.

Fuller, W.A. Environmental surveys over time. Journal of Agricultural, Biological, and Environmental
    Statistics, 4, 331-345,  1999.

Gill, S., J., J. Milliken, D. Beardsley, and R. Warbington. Using a mensuration  approach with FIA
    vegetation plot data to assess the accuracy of tree size and crown closure classes in a vegetation map
    of Northeastern California.  Remote Sens. Environ., 73, 298-306, 2000.

Hammond, T.O., and D.L. Verbyla. Optimistic bias in classification accuracy assessment.  Int. J.  Remote
    Sensing, 17, 1261-1266, 1996.

 Janssen,  L.L.F., and F.J.M. van der Wei. Accuracy assessment of satellite derived land-cover data:  A
    review. Photogrammetric Engineering and Remote Sensing, 60, 419-426,  1994.

 Jones, K.B., A.C. Neale, M.S. Nash, R.D. Van Remotel, J.D. Wickham, K.H. Riitters, and R.V. O'Neill.
    Predicting nutrient and sediment loadings to streams from landscape metrics: a multiple watershed
    study from the United States Mid-Atlantic Region.  Landscape Ecology, 16, 301 -312, 2001.

 Laba, M., S.K.. Gregory, J. Braden, D. Ogurcak, E. Hill, E., Fegraus, J. Fiore, and S.D. DeGloria.
    Conventional and fuzzy accuracy assessment of the New York Gap Analysis Project land cover maps
    Remote Sens. Environ., 81, 443-455, 2002.

 McDowell, A., A. Engel, J.T. Massey, and K. Maurer. Plan and Operation of the Second National Health
    and Nutrition Examination Survey, 1976-1980.  Vital and Health Stat. Rep., Series 1(15), National
    Center for Health Statistics, 1981.

 McGuiness, R.A. Redesign of the sample for the Current Population Survey. Employment and Earnings,
    41,7-10,1994.

McGwire, K.C., and P. Fisher.  Spatially variable thematic  accuracy: Beyond the confusion matrix, In:
    Spatial Uncertainty in Ecology: Implications for Remote Sensing and GIS Applications. (C.T.
    Hunsaker, M.F. Goodchild, M.A. Friedl, and T.J. Case, Editors), Springer,  New York, NY, 329 p.,
    2001.

 Muller, S.V., D.A. Walker, F.E. Nelson, N.A. Auerbach, J.G. Bockheim, S. Guyer, and D. Sherba.
    Accuracy assessment of a land-cover map of the Kuparuk  River Basin, Alaska: Considerations for
    remote regions. Photogrammetric Engineering and Remote Sensing, 64,619-628, 1998.
 Page 32 of 339

-------
Nusser, S.M., and J.J. Goebel.  The National Resources Inventory: A long-term multi-resource monitoring
    programme.  Environmental and Ecological Statistics, 4,181 -204, 1997.

Nusser, S.M., and E.E. Klaas.  Survey methods for assessing land cover map accuracy. Environmental
    and Ecological Statistics, 10, 309-331, 2003.

Olsen, A.R., J. Sedransk, D. Edwards, C.A. Gotway, W. Liggett, S. Rathbun, K.H. Reckhow, and L.J.
    Young.  Statistical issues for monitoring ecological and natural resources in the United States.
    Environmental Monitoring and Assessment, 54, 1-45, 1999.

Peterson, S.A., N.S. Urquhart,  and E.B. Welch. Sample representativeness: A must for reliable regional
    lake condition estimates. Environmental Science and Technology, 33, 1559-1565, 1999.

Pugh, S. A., and R. G. Congalton. Applying spatial autocorrelation analysis to evaluate error in New
    England forest-cover-type  maps derived from  Landsat Thematic Mapper data. Photogrammetric
    Engineering & Remote Sensing, 67, 613-620, 2001.

Royall, R.M., and K..R. Eberhardt.  Variance estimates for the ratio estimator.  Sankhya, C (37), 43-52,
    1975.

Sarndal, C.E., B. Swensson, and J. Wretman. Model-Assisted Survey Sampling. Springer-Verlag, New
    York, NY, 1992.

Scepan, J. Thematic validation of high-resolution global land-cover data sets. Photogrammetric
    Engineer ing and Remote Sensing, 65, 1051-1060, 1999.

Schreuder, H.T., and T.G. Gregoire.  For what applications can probability and non-probability sampling
    be used? Environmental Monitoring and Assessment, 66, 281-291, 2001.

Stehman, S.V. Comparison of systematic and random sampling for estimating the accuracy of maps
    generated from remotely sensed data. Photogrammetric Engineering and Remote Sensing, 58, 1343-
    1350, 1992.

Stehman, S.V. Estimating standard errors of accuracy assessment statistics under cluster sampling.
    Remote Sens. Environ., 60, 258-269, 1997.

Stehman, S.V. Basic probability sampling designs for thematic map accuracy assessment. Int. J. Remote
    Sensing, 20, 2423-2441, 1999.

Stehman, S.V.  Statistical rigor and practical utility in thematic map accuracy assessment,
    Photogrammetric Engineering and Remote Sensing, 67, 727-734, 2001.

Stehman, S.V., and R.L. Czaplewski. Design and analysis for thematic map accuracy assessment:
    Fundamental principles. Remote Sens. Environ., 64, 331-344, 1998.

Stehman, S.V., R.L. Czaplewski, S.M. Nusser, L. Yang, and Z. Zhu. Combining accuracy assessment of
    land-cover maps with environmental monitoring programs.  Environmental Monitoring and
    Assessment, 64, 115-126,  2000a.
                                                                                   Page 33 of 339

-------
Stehman, S.V., J.D. Wickham, L. Yang, and J.H. Smith. Assessing the accuracy of large-area land cover
    maps: Experiences from the Multi-resolution Land-Cover Characteristics (MRLC) project. In:
    Accuracy 2000: Proceedings of the 4lh International Symposium on Spatial Accuracy Assessment in
    Natural Resources and Environmental Sciences, (G.B.M. Heuvelink and M.J.P.M. Lemmens,
    Editors), Delft University Press, The Netherlands, pp. 601-608, 2000b.

Stehman, S.V., J.D. Wickham, L. Yang, and J.H. Smith. Accuracy of the national land-cover dataset
    (NLCD) for the eastern United States: statistical methodology and regional results. Remote Sens.
    Environ., 86, 500-516, 2003.

USFS (U.S. Forest Service).  Forest Service Resource Inventories: An Overview.  USGPO 1992-341-
    350/60861, U.S. Department of Agriculture, Forest Service, Forest Inventory, Economics, and
    Recreation Research, Washington, DC, 39 p., 1992.

Van Deusen, P.C.  Unbiased estimates of class proportions from thematic maps, Photogrammetric
    Engineering & Remote Sensing, 62, 409-412, 1996.

Walsh, T.A., and I.E. Burk. Calibration of satellite classifications of land area. Remote Sens. Environ.,
    46,281-290, 1993.

Yang, L., S.V.  Stehman, J.H. Smith, and J.D. Wickham. Thematic accuracy of MRLC land cover for the
    Eastern United States. Remote Sens. Environ., 76, 418-422, 2001.

Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski. Accuracy assessment for the U.S. Geological
    Survey regional land-cover mapping program: New York and New Jersey region.  Photogrammetric
    Engineering and Remote Sensing, 66,  1425-1435, 2000.
Page 34 of 339

-------
                                   Chapter 3

             Validation of Global Land-Cover Products by
             the Committee on Earth Observing Satellites

                                        by

                      Jeffrey T. Morisette, Corresponding Author1*
                                 Jeffrey L. Privette1
                                   Alan Strahler2
                                  Philippe Mayaux3
                               Christopher O. Justice4
   NASA Goddard Space Flight Center
   Code 923
   Greenbelt, MD 20771

   *'Corresponding Author Contact:

   Telephone: (301)614-6676
    Facsimile: (301)614-6695
       E-mail: ieff.morisette@asfc.nasa.Qov
Boston University
Geography Department
Boston, MA 02215
   Global Vegetation Monitoring Unit
   Institute for Environment and Sustainability
   Joint Research Centre
   European Commission
   Ispra, Italy
University of Maryland
Geography Department
Colleague Station, MD 20742
3.1   Introduction

3.1.1  Committee on Earth Observing Satellites

The Committee on Earth Observation Satellites (CEOS) is an international organization charged with
coordinating international civil space-borne missions designed to observe and study planet Earth. Current
membership is comprised of 41 space agencies and other national and international organizations.  It was
created (1984) in response to a recommendation from the "Economic Summit of Industrialized Nations
                                                                        Page 35 of 339

-------
Working Group on Growth, Technology, and Employment's Panel of Experts on Satellite Remote
Sensing," which recognized the multidisciplinary nature of satellite Earth observation and the value of
coordination across all proposed missions. The main goals of CEOS are to ensure that: (1) critical
scientific questions relating to Earth observation and global change are covered; and (2) satellite missions
do not unnecessarily overlap (http://www.ceos.org).  The first goal can be achieved by providing timely
and accurate information from satellite-derived products. Proper use of these products, in turn, relies on
our ability to ascertain their uncertainty. The second goal is achieved through coordination among CEOS
members.

As validation efforts are an integral part of "satellite missions," part of the CEOS mission is to reduce the
likelihood of unnecessary overlap in validation efforts. The particular CEOS work related to validation
falls within the Working Group on Calibration  and Validation (WGCV); which is one of two standing
working groups of CEOS (the other being the Working Group on  Information Systems and Services,
WG1SS).  The ultimate goal of the WGCV is to ensure long-term confidence in the accuracy and quality
of Earth observation data and products through (a) sensor-specific calibration and validation, and
(b) geophysical parameter and derived product  validation. To ensure long-term confidence in the
accuracy and quality of Earth observation data and products, the WGCV provides a forum for calibration
and validation information exchange, coordination, and cooperative activities. The WGCV promotes the
international exchange of technical information and documentation, joint experiments, and the sharing of
facilities, expertise, and resources (http://wgcv.ceos.org). There are currently six established subgroups
within WGCV including: (1) atmospheric  chemistry; (2)  infrared and visible optical sensors (IVOS);
(3) land product validation (LPV); (4) terrain mapping (TM); (5) synthetic aperture radar (SAR); and
(6) microwave sensors subgroup (MSSG).

Each subgroup has a specific mission. For example, the relevant subgroup for global land product
validation is LPV.  The mission of LPV is to increase the quality and economy of global satellite product
validation by developing and promoting international standards and protocols for field sampling, scaling
error budgeting, data exchange and product evaluation, and to advocate mission-long validation programs
for current and future earth observing  satellites (Justice et al., 2000).  In this chapter, by considering the
lessons learned from previous and current  programs, we describe a strategy to utilize LPV for current and
future global land-cover (LC) validation efforts.

3.1.2  Approaches to Land-Cover Validation

Approaches to LC validation may be divided into two primary types:  statistical approaches and
confidence-building measures. Confidence-building measures include studies or comparisons made
without a firm statistical basis that provide confidence in  the map. When presented with a LC map
product, user's typically first carry out "reconnaissance measures" by examining the map to see how well
it conforms to regional landscape attributes, such as mountain chains, valleys, or agricultural regions.
Spatial structure is  inspected to ensure that the map has sensible patterns of LC that are without excessive
"salt-and-pepper" noise or excessive smoothness and generalization. Land-water boundaries are checked
for continuity to reveal the quality of multi-date registration.  The map is carefully examined for gross
errors,  such as cities in the Sahara or water on high mountain slopes.  If the map seems reasonable based
on these and similar criteria, validation can proceed to more time-consuming confidence measures.  These
Page 36 of 339

-------
include ancillary comparisons, in which specific maps or datasets are compared to the map. However,
such comparisons are not always straightforward, since ancillary materials are typically prepared from
input data acquired at a different time. Also, map scales and LC units used in the ancillary materials may
not be directly comparable to the map of interest.

The Global Land Cover 2000 program has established a systematic approach for qualitative confidence
building in which a global map is divided into small cells, each of which is examined carefully for
discrepancies. This procedure is described more fully in section 3.2.

Statistical approaches may be further broken down into two types:  model-based inference and design-
based inference (Stehman, 2000 and 2001). Model-based inference is focused on the classification
process, not the mapper se. A map is viewed as one realization  of a classification process that is subject
to error, and the map's accuracy is characterized by estimates of errors in the classification process that
produced it. For example, the MODIS LC product provides a confidence value for each pixel that
measures of how well the pixel fits the training examples presented to the classifier. Design-based
inference uses statistical principles in which samples are acquired to infer characteristics of a finite
population, such as the pixels in a LC map.  The key to this approach is probability-based sampling, in
which the units to be sampled are drawn with known probabilities. Examples include random sampling,
in which all possible sample units have equal probability of being drawn, or stratified random sampling,
in which all possible sample units within a particular stratum have equal probability of being drawn.

Probability-based samples are used to derive consistent estimates of population parameters that equal the
population parameters when the entire population is included in  the sample.  Consistent estimators
commonly used in LC mapping from remotely sensed data include the proportion of pixels correctly
classified (global accuracy); "user's accuracy," which is the probability that a pixel is truly of a particular
cover to which it was classified; and "producer's accuracy," which is the probability that a pixel was
mapped as a member of a class to which it's truly a member. These estimators are typically derived from
a confusion matrix, which tabulates true class labels with those assigned on the map according to the
sample design.

While design-based inference allows proper calculation of these very useful consistent estimators, it is not
without its difficulties. Foremost is the difficulty of verifying the accuracy of the label assigned to a
sampled pixel. In the case of a global map, it is not possible to go to a randomly assigned location on the
Earth's surface. Thus, the accuracy of a label is typically assessed using finer resolution remotely sensed
data. In this case, accuracy is assessed by photointerpretation, which is subject to its own error.
Registration errors also occur and commonly restrict or negate a pixel-based assessment strategy.

Another practical problem may lie in the classification scheme itself. Sometimes the LC types are not
mutually exclusive or are difficult to resolve. For example, in the International Geosphere/Biosphere
Project (IGBP) legend, permanent wetland may also be forest (Loveland et al., 1999). Or, the pixel may
fall on a golf course. Is it grassland, savanna, agriculture, urban or built-up land? A related problem is
that of mixed pixels. Where fine resolution data show a selected pixel to contain more than one cover
class, how is a correct label to be assigned?
                                                                                      Page 37 of 339

-------
 Additionally, the classification error structure as assessed by the consistent estimators above may not be
 the most useful measure of classification accuracy.  Some errors are clearly more problematic than others
 For example, confusing forest with water is probably a more serious error than confusing open and closed
 shrubland for many applications.  This problem leads to the development of "fuzzy" accuracies that better
 meet user needs (Gopal and Woodcock, 1994).

 A final concern is that a design-based sample designed to validate a specific map cannot necessarily be
 used to validate another. A proper design-based validation procedure normally calls for stratified
 sampling so that accuracies may be established for each class with equal certainty. With stratified
 sampling, the probability of selection of all pixels within the same class is equal.  If a stratified sample is
 overlain on another map, the selected pixels do not retain this property, thus introducing bias. While an
 unstratified (random or regular) sample does not suffer from this problem, very large sample sizes are
 typically required to gain sufficient samples from small classes to establish their accuracies with needed
 precision.

 While the foregoing discussion described the major elements for validating LC maps, particularly at the
 global scale,  it is clear that a proper validation plan requires all three. Confidence-building measures are
 used at early stages both to refine a map that is under construction and to characterize the general nature
 of errors of a specific map product.  Model-based inference, implemented during  the classification
 process, can provide users with a quantitative assessment of each classification decision. Design-based
 inference, although costly, provide unbiased map accuracy statements using consistent estimators.


 3.1.3  Lessons Learned from IGPB Discover

 The IGBP Discover LC dataset, produced from 1.1  km spatial resolution AVHRR data by Loveland et
 al. (2000),  remains a milestone in  global LC classification using satellite data. The validation process
 used incorporated a global random sample stratified by cover type. Selected pixels were examined at
 high spatial resolution using Landsat and SPOT data in a design that featured multiple photographic
 interpreters classifying each pixel. Although not without difficulties, the validation process was very
 successful, yielding the first global validation of a global thematic map.

 Recent research by Estes et al. (1999) summarized the lessons learned in the IGBP DISCover validation
 effort that apply to current and future global LC validation efforts.  A primary conclusion was that the
 information of coarse-resolution satellite datasets is limited by such factors as multi-date registration,
 atmospheric correction, and directional viewing effects.  These limits in turn impose limits on the
 accuracies achievable in any global classification scenario.  It should be noted that coarse-resolution
 satellite imaging instruments continue to produce data of improved quality. For example, data from
 MODIS that are used to develop LC products include nadir-looking surface reflectance's that are obtained
 at multiple  spatial resolutions (250, 500, and 1,000 m).

 Secondly, LC products developed  using the spectral and temporal information available from coarse-
resolution satellite imagers will always be an imperfect process, given the high intrinsic  variance found in
the global range (variability) of cover types. While the natural variation within many cover types is large
new instruments may yield new data streams that increase the certainty of identifying them uniquely.
Page 38 of 339

-------
Among these are measures of vegetation structure derived from multi-angular observations, measures of
spatial variance obtained from finer-resolution channels, and ancillary datasets such as land surface
temperature.

A third lesson concerned the quality and availability of fine-resolution imagery for use in validation. Not
only were Landsat and SPOT images costly, they were also very scarce for some large and ecologically
important regions, such as Siberian conifer forest. However, the present Landsat 7 acquisition policy,
which includes acquiring at least four relatively cloud-free scenes per year for every path and row,
coupled with major price decreases, has eased this problem significantly for future validation efforts.
Although, the recent (May 2003) degradation of ETM+ capabilities may significantly reduce future data
acquisition capabilities.

A fourth lesson documented was that interpreter skill and the quality of ancillary data were major factors
that significantly impacted assessment results. Best results were obtained using local interpreters that
were familiar with the region of interest. The most important observation was that proper validation was
an essential component of the mapping process and requires a significant amount of the total effort.
Roughly, one-third of the mapping resources were expended equally to each of the following: (1) data
assembly; (2) data classification; and (3) assessing the quality and accuracy of the result.  Supporting
agencies need to understand that a map classification is not completed until it is properly validated.
3.2   Validation of the European Commission's Global Land-Cover 2000

The general objective of the European Commission's Global Land Cover (GLC) 2000 was to provide a
harmonized global LC database. The year 2000 was considered a reference year for environmental
assessment in relation to various activities, and in particular the United Nation's Ecosystem-related
International Conventions. To achieve this objective, GLC 2000 made use of the VEGA 2000 dataset:
a dataset of 14 months of pre-processed daily global data acquired by the VEGETATION instrument on
board SPOT 4. These data were made available through a sponsorship from members of the
VEGETATION program (http://www.gvm.sai.jrc.it/glc2000/defaultGLC2000.htm).

The validation of the GLC 2000 products incorporated the following methods: (1) confidence-building
based on a comparison with ancillary data; and (2) quantitative accuracy assessment using a stratified
random sampling design and high-resolution sites. First, the draft products are reviewed by experts and
compared with reference data (thematic maps, satellite images, etc.). These quality controls meet two
important objectives:  (1) the elimination of macroscopic errors; and (2) the improvement of the global
acceptance by the customers associated in the process. Each validation cell (200 x 200 km) was
systematically compared with reference material and documented in a database containing intrinsic
properties of the GLC2000 map (thematic composition and spatial pattern) and identified errors (wrong
labels or limits).

This design-based inference had the objective of providing a statistical assessment of the accuracy by
class and is based on a comparison with high-resolution data interpretations. The main characteristics
were:  (1) stratified randomly by cover class; (2) broad network of experts with local knowledge;
                                                                                    Page 39 of 339

-------
(3) decentralized approach; (4) visual interpretation of the higher resolution imagery; and
(5) interpretations based on the hierarchal classification scheme (Di Gregorio, 2000). Both the confidence
building and designed based components occurred sequentially. Confidence building started with         '
problematic areas (as expected by the map producer). This allowed for the correction of macro-errors
found during the check. Then, a systematic review of the product using the same procedure was
conducted before implementing the final quantitative accuracy assessment.
3.3   Validation of the MODIS Global Land-Cover Product

A team of researchers at Boston University currently produces a global LC product at 1.0 km spatial
resolution using data from the MODIS instrument (Friedl et al., 2002). The primary product is a map of
global LC using the IGBP classification scheme, which includes 17 classes that are largely differentiated
by the life-form of the dominant vegetation layer. Included with the product is a confidence measure for
each pixel as well as the second-most-likely class label. Input data are MODIS surface reflectance
obtained in seven spectral bands coupled with an enhanced vegetation index product also derived from
MODIS. These are obtained at 16-day intervals for each 1.0 km pixel.  The classification  is carried out
using a decision tree classifier operating on more than 1,300 global training sites identified from high-
resolution data sources, primarily Landsat Thematic Mapper and Enhanced Thematic Mapper Plus
(ETM+).  The product is produced at three- to six-month intervals using data from the prior 12-month
period (http://geography.bu.edu/landcover/userguidelc/intro.html).

The validation plan  for the MODIS derived LC product incorporates all approaches identified in section
3.1.2.  Confidence-building exercises are used to provide a document accompanying the product that
describes its strengths and weaknesses in qualitative terms for specific regions. A web site also
accumulates comments from users, providing feedback on specific regions. Confidence-building
exercises also include comparisons with other datasets including; the Landsat Pathfinder for the humid
tropics, United Nation's Food and Agricultural Organization (FAO) forest resource assessment, the
European Union's Co-ordination of Information on the Environment (CORTNE) database  of LC for
Europe, and the U.S. interagency sponsored Multi-Resolution Land Characteristics (MRLC) database

Model-based inference of classification accuracy is represented by the layer of per-pixel confidence
values, which quantifies the posterior probability of classification for each pixel. This probability is first
estimated by the classifier, which uses information on class signatures and separability obtained during
the  building of the decision tree using boosting (Friedl et al., 2002) to calculate the classification
probability.  This probability is then adjusted by three weighted prior probabilities,  associated with (a) the
global frequency of all classes taken from the prior product; (b) the frequency of class types  within the
training set; and (c) the frequency of classes within a 200 x 200 pixel moving window.  The result is a
posterior probability that merges present and prior information and is used to assign the most likely class
label to each pixel. The  posterior probabilities are then summarized by cover type and region to convey
information to users about the quality of the classification.

A form of design-based inference is used in the preparation of a confusion matrix taken from the
classification of training sites. In this process, all training sites are divided into five equal  sets.  The
Page 40 of 339

-------
classifier is trained using four of the five sets and then classifies the unseen sites in the fifth set. This
procedure is repeated for each set as the unseen set, yielding a pair of labels - "true" and "as classified" -
for every training site. Cross-tabulation of the two labels for the training site collection yields a confusion
matrix that provides estimates of global,  user's, and producer's accuracies. This matrix is provided in the
documentation of the LC data product.

Note that these estimated accuracies will be biased because the training sites are not chosen randomly,
and thus may not properly reflect the variance encountered across the full extent of the true land cover
class. However, in selection of training sites, every effort is made to identify sites that do reflect the full
range of variance of each class. Accordingly, the accuracies obtained are thought to be reasonable
characterizations of the true accuracies, even though they cannot be shown to be proper unbiased
estimators. In a final application of design-based inference, the MODIS team plans to conduct a random
stratified sample of its LC product at regular intervals.  The methodology will be similar to that of the
IGBP DISCover validation effort (see section 3.1.3). However, funds have not yet been secured to
support this costly endeavor.
3.4   CEOS Land Product Validation Subgroup

The lessons learned from previous and ongoing projects point to several areas where LPV can help with
validation efforts. Perhaps most fundamental is that CEOS/WGCV/LPV provides a forum to discuss
these issues and develop and maintain a standardized protocol. Indeed the authors are all involved with
LPV and it was through this association that we have developed this article.  There is also the opportunity
to communicate on LC classification systems, although each project will have its own system,
coordination between the two projects result in synergy between the two systems (Di  Gregorio and
Jansen, 2000; Thomlinson et al., 1999). Here we present methods by which LPV can help address the
specific lessons learned from IGBP  in the context of the two current projects. Table 3-1 lists the various
subgroups and their corresponding URLs.
           Table 3-1.  CEOS land-cover validation participants and contributions.
Entity
CEOS
Working Group on Calibration and Validation
Land Product Validation subgroup
httrj://www.wacvceos.ora/
Global Observation of Forest Cover
Global Observation of Land Dynamics
httD://www.fao.ora/qtos/aofc-aold/index.html
European Commission's Global Land Cover 2000
httD://www.avm. sai.irc.it/alc2000/
defaultGLC2000.htm
NASA's Global Land Cover product
httD7/edcdaac.usas.aov/modis/mod12a1.html
Role in global land cover validation
Coordinate validation activities of CEOS
members
Coordination of Regional networks to provide
"local" expertise
Data Producer
Data Producer
                                                                                    Page 41 of 339

-------
  Table 3-1. Continued
  Entity
Role in global land cover validation
  EOS Land Validation Core Sites
  http://modis.asfc.nasa.QOV/MODIS/LANDA/AL/CEOS
  WGCV/lai intercomp.html
Sites under consideration for
CEOS Land Product Validation Core Sites
 VALERI (VAIidation of Land European Remote
 sensing Instruments)
 http://www.avianon.inra.fr/valeri/
Sites under consideration for
CEOS Land Product Validation Core Sites
  CEOS "LAI-intercomparison"
  http.7/landval.gsfc.nasa.qov/LPVS/BIO/lai
  intercomp.html
Sites under consideration for
CEOS Land Product Validation Core Sites
As satellite sensors and related algorithms continue to improve, many of the technical obstacles addressed
above may be overcome. However it is essential groups producing global LC products have a thorough
awareness of technology improvements across the range of satellite sensors including optical, microwave
LIDAR, and SAR. Such awareness can be supported through LPV's interaction with the other WGCV
subgroups; including discussions at the semi-annual WGCV plenary meetings and utilizing the projects
and publications available through the other subgroups of WGCV. Further, coordination of various LC
products can help determine the most suitable approach to using multiple products.  For example, the
MODIS product has been operationally produced since 2001. Careful examination of this product as well
as the GLC 2000 product could lend insight into the best way to use both in a complementary fashion.


3.4.1  Fine-Resolution Image Quality and Availability

Data sharing of high-resolution imagery may be one of the most immediate and concrete ways in which
LPV can support global land product validation. Using the NASA Earth Observing System Land
Validation Core Sites (Morisette et al., 2002) as an example, the LPV and WGISS subgroups are
establishing an infrastructure for a set of "CEOS Land Product Validation Core Sites." The initial sites
being considered for this project are shown in Plate 3-1, which represents an agglomeration of three
entities: (1) the EOS Land Validation Core Sites, (2) the VAIidation of Land European Remote sensing
Instruments (VALERI) project, and (3) the CEOS "LAI Inter-comparison" activity (see Table  3-1,
above).  The concept is to establish a set of sites where high-resolution data will be archived and proved
free or at minimal cost over locations where field and/or tower measurements are continuously or
periodically collected (see Plate 3-1). These core sites are intended to serve as validation sites for
multiple satellite products. Specific products appropriate for validation depend on the individual field
tower measurements parameters (Morisette et al., 2002). Practically, the limited number of sites
(approximately 50), which are not based on a random sample, cannot be used for statistical inferences on
a global product. However,  in terms of LC validation, the high  resolution data from these sites would
allow a set of common "confidence building sites" that could be shared by GLC 2000 and MODIS as well
as future global  LC mapping efforts. LC product comparisons with high-resolution data  and cross-
comparison with other global LC products over the core sites would provide substantive  information for
initial quality control.  Additionally, within a given site a random sample could  be collected and design-
based inference  carried out for that particular "sub-population." So, while the core site concept has
Page 42 of 339

-------
limitation with respect to statistical inference, the opportunities for data sharing and initial cross-
comparison at a set of core sites seems worthwhile.

                                                                                                 M

       0 Water
       1 Evergreen Needleleaf Forest
       2 Evergreen Broadleat Forest
       3 Deciduous Needleleaf Forest
       4 Deciduous Broadteef Forest
       5 Mixed Forests
] 6 Closed Shrublands
j 7 Open Shrublands
 8 Woody Savanna*
 9 Savannas
 10 Grasslands
111 Permanent Wetlands
    i 12 Croplands
•• 13 Urban and Built-Up
^•B 14 Cropland/Natural Vegetation Mosaic
I    115 Snow and Ice
    ] 16 Barren or Sparsely Vegetated
    1254 Unclassified
                Plate 3-1.  CEOS land-cover product evaluation core site locations.
3.4.2   Local Knowledge Requirements

The LPV was strategically designed to complement the objectives of the Global Observation of Forest
Cover/Land Dynamics (GOFC/GOLD) program (http://www.fao.org/gtos/gofc-gold/).  This partnership
provides a context for validation activities (through LPV) within the specific user group (GOFC/GOLD).
GOFC/GOLD is broken down into three implementation teams that include: (I) LC characteristics and
change; (2) fire-related products; and (3) biophysical processes.  Initial activities of LPV have also
focused on these three areas through topical workshops and initial projects. A major component of
GOFC/GOLD is to build on "regional networks." These networks involve local and regional partners
who are interested in using the global products and serve to provide feedback to the data producers.  This
regional network concept has proven to be a significant resource to support validation efforts. The IGBP
experience indicates that the knowledge gained through regional collaborators is critical.  LPV can use
the regional networks as an infrastructure to gain local expertise for product validation. This
                                                                                      Page 43 of 339

-------
 infrastructure can provide assistance with the difficult and labor-intensive task of design-based inference
 planned for both MODIS and GLC 2000.

 3.4.3  Resource Requirements

 The LPV has been established with the realization that proper validation requires a significant scientific
 effort. Indeed the subgroup has been established to conduct global validation activities as efficiently as
 possible. The validation approaches described here have all been conceived to minimize the resource
 requirements for global LC validation. To this end, the LPV has capitalized on the most current sensor
 technologies (high-resolution) and exploited data sharing opportunities with both the CEOS core sites an
 the use of GOFC/GOLD regional networks, to reduce the cost and effort of global validation efforts.  Th
 LPV subgroup is collaborating with the MODIS LC and GLC 2000 programs to help realize and develo
 these suggestions.  This, in turn, can be applied to future global LC products.
 3.5    Summary

 This chapter presents the approach for the use of the CEOS to coordinate the validation efforts of global
 land products. This premise is based on experience from previous global validation through the IGBP
 which depended on the good will, support, cooperation, and collaboration of interested organization and
 institutions. Two global LC efforts are now underway including: (1) NASA's MODIS Global LC
 product; and (2) the European Commission's GLC 2000. These validation efforts will likewise require
 coordination and collaboration - much of which has been, or is being, established. In this chapter we
 discussed issues pertaining to validation of global  LC products, presented a brief overview of the
 validation strategy for the two current efforts, then described a mutually beneficial strategy for both to
 realize some efficiencies by using CEOS to further coordinate their validation efforts. This strategy
 should be applicable to other global LC mapping efforts, such as those being developed for the
 GOFC/GOLD and beyond.
3.6   Acknowledgments

Thanks are extended to Yves-Louis Desnos, as chair of the CEOS Working Group on Calibration and
Validation, for continued attention to the CEOS Core Site concept. The authors would like to
acknowledge John Hodges, Boston University, for providing Plate 3-1. Also, reviews from Ross Lunetta
and anonymous reviewers were helpful and appreciated.
3.7   References

Di Gregorio, A. and L.J.M. Jansen. Land Cover Classification System (LCCS):  Classification Concepts
    and User Manual. Food and Agriculture Organization of the United Nations, Rome
    (http://www.fao.org/DOCREP/003/X0596E/X0596EOO.HTM), 2000.
Page 44 of 339

-------
Estes, J., A. Belward, T. Loveland, J. Scepan, A. Strahler, J. Townshend, J., and C. Justice.  The way
    forward. Photogrammetric Engineering and Remote Sensing, 65, 1089-1093, 1999.

Friedl, M. A., D. K. Mclver, J.C.F. Hodges, X.Y. Zhang, D. Muchoney, A.H. Strahler, C.E. Woodcock,
    S. Gopal, A. Schneider, A. Cooper, A. Baccini, F. Gao, and C. Schaaf.  Global land cover mapping
    from MODIS: Algorithms and early results. Remote Sens. Environ, 83(1-2), 287-302, 2002.

Gopal, S. and C. Woodcock. Theory and methods for accuracy assessment of thematic maps using fuzzy
    sets.  Photogrammetric Engineering and Remote Sensing, 60, 181-188, 1994.

Justice, C., A. Belward, J. Morisette, P. Lewis, J. Privette and F. Baret. Developments in the 'validation1
    of satellite sensor products for the study of land surface, Int. J. Remote Sensing, 21(17), 3383-3390,
    2000.

Loveland, T.R., Z. Zhu, D.O. Ohlen, J.F. Brown, B.C. Reed and Y. Limin.  An Analysis of the IGBP
    Global Land-Cover Characterization Process.  Photogrammetric Engineering and Remote Sensing,
    65(9), 1021-1032,  1999.

Loveland, T.R., B.C. Reed, J.F. Brown, D.O. Ohlen, Z. Zhu, L. Yang, and J.W. Merchant.  Development
    of a global land cover characteristics database and IGBP DISCover from 1-km AVHRR data.  Int. J.
    Remote Sensing, 21, 1303-1330, 2000.

Morisette J.T., J.L. Privette and C.O. Justice. A framework for the validation of MODIS land products,
    Remote Sens. Environ., 83(1-2), 77-96, 2002.

Stehman, S.V.  Practical implications of design-based sampling inference for thematic map accuracy
    assessment. Remote Sens. Environ., 72, 34-45, 2000.

Stehman, S.V.  Statistical rigor and practical utility in thematic map accuracy assessment.
    Photogrammetric Engineering and Remote Sensing, 67, 727-734, 2001.

Thomlinson J.R., P.V. Bolstad and W.B. Cohen. Coordinating Methodologies for Scaling Landcover
    Classifications from Site-Specific to Global: Steps toward Validation Global Map Products. Remote
    Sens. Environ., 70(1), 16-28, 1999.
                                                                                   Page 45 of 339

-------
Page 46 of 339

-------
                                     Chapter 4
     In Situ Estimates of Forest LAI for MODIS Data Validation
                                           by
                                   John S. liames, Jr.1
                         Andrew N. Pilant, Corresponding Author1*
                                    Timothy E. Lewis2
  U.S. Environmental Protection Agency
  National Exposure Research Laboratory
  (E243-05)
  Research Triangle Park, NC 27711

  *'Corresponding Author Contact:

   Telephone:  (919)541-0648
       E-mail:  pilant.drew@epa.gov
U.S. Environmental Protection Agency
National Center for Environmental Assessment
(B243-01)
Research Triangle Park, NC 27711
4.1    Introduction

Satellite remote sensor data is commonly used to assess ecosystem condition through synoptic monitoring
of terrestrial vegetation extent, biomass, and seasonal dynamics. Two commonly used vegetation indices
that can be derived from various remote sensor systems include the Normalized Difference Vegetation
Index (NDVI) and Leaf Area Index (LAI).  Detailed knowledge of vegetation index performance is
required to characterize both the natural variability across forest stands and the intra-annual variability
(phenology) associated within individual stands. To assess performance accuracy, in situ validation
procedures can be applied to evaluate the accuracy of remote sensor derived indices. A collaborative
effort was established with researchers from the U.S. Environmental Protection Agency (EPA), National
Aeronautics and Space Administration (NASA), academia, and non-governmental organizations to
evaluate the Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI and LAI products across
six validation sites in the Albemarle-Pamlico Basin (APB), NC and VA (see Figure 4-1).

The significance of LAI and NDVI as source data for process-based ecological models has been well
documented. LAI has been identified as the variable of greatest importance for quantifying energy and
mass exchange by plant canopies (Running et al., 1986), and has been shown to explain 80-90% of the
                                                                              Page 47 of 339

-------
variation in the aboveground forest net primary production (NPP) (Gholz, 1982; Gower et al., 1992;
Fassnacht and Gower, 1997).  LAI is an important biophysical state parameter linked to biological
productivity and carbon sequestration potential, and is defined here as one-half the total green leaf area
per unit ground surface area (Chen and Black, 1992). NPP is the rate at which carbon is accumulated by
autotrophs and is expressed as the difference between gross photosynthesis and autotrophic respiration
(Jenkins et al., 1999).
                                                            Albemarle-
                                                            Pamlico Basin
                              Kilometwi 0
                                         50
          Figure 4-1.
LAI field validation site locations within the Albemarle-Pamlico
Basin in southern Virginia and northern North Carolina (USA).
(1) Hertford; (2) South Hill; (3) Appomattox; (4) Fairystone; (5)
Duke FACE; (6) Umstead.
NDVI has been used to provide LAI estimates for the prediction of stand and foliar biomass (Burton et
al., 1991) and as a surrogate to estimate stand biomass for denitrification potential in forest filter zones for
agricultural non-point source nitrogenous pollution along riparian waterways (Verchot et al., 1998).
Interest in tracking LAI and NDVI changes includes the role forests play in the sequestration of carbon
from carbon emissions (Johnsen et al., 2001), and the formation of tropospheric ozone from biogenic
emissions of volatile organic compounds naturally released into the atmosphere (Geron et al., 1994). The
NDVI has commonly been used as an indicator of biomass (Eidenshink and Haas, 1992), and vegetation
vigor (Carlson and Ripley, 1997).  NDVI has been applied in monitoring seasonal and inter-annual
vegetation growth cycles, land-cover (LC) mapping, and change detection. Indirectly, it has been used as
a precursor to calculate LAI, biomass, the fraction of absorbed photosynthetically active radiation
(fAPAR), and the areal extent of green vegetation cover (Chen, 1996).
Page 48 of 339

-------
Direct estimates of LAI can be made using methods such as destructive sampling and leaf litter collection
methods (Neumann et al.,  1989). Direct destructive sampling is regarded as the most accurate approach,
yielding the closest approximation of "true" LAI.  However, destructive sampling is time consuming and
labor intensive, motivating development of more rapid indirect field optical methods. Field optical
techniques include hemispherical photography, LiCOR Plant Canopy Analyzer (PCA) (Deblonde et al..
1994), and the Tracing Radiation and Architecture of Canopies (TRAC) sunfleck profiling instrument
(Leblanc et al., 2002).  In situ forest measurements serve as both reference data for satellite product
validation and as baseline  measurements of seasonal vegetation dynamics, particularly the seasonal
expansion and contraction of leaf biomass.

The development of appropriate ground-based sampling strategies is critical to the accurate specification
of uncertainties in LAI products (Tian et al., 2002). Other methods that have been  implemented to assess
the MODIS LAI  product have  included a spatial cluster design and a patch-based design (Burrows et al.,
2002).  Privette et al. (2002) used multiple parallel 750 m TRAC sampling transects to assess to measure
LAI and other canopy properties at scales approaching that of a single MODIS pixel. Also, a stratified
random sampling (SRS) design element provided for sample intensification less frequent occurring LC
types (Lunetta et al., 2001).


4.1.1   Study Area

The study area is the Albemarle-Pamlico Basin (APB) of North Carolina and Virginia (Figure 4-1).  The
APB has a drainage area of 738,735 km2 and  includes three physiographic provinces: mountain,
piedmont and coastal plain, ranging in elevation from 1280 m to sea level.  The APB sub-basins include
the Albemarle-Chowan, Roanoke, Pamlico, and Neuse River basins.  The Albemarle-Pamlico Sounds
comprise the second largest estuarine system within the continental United States.  The 1992 LC in the
APB consisted primarily of forests (50%), agriculture (27%) and wetlands (17%).  The forest component
is distributed as follows:  deciduous (48%), conifer (33%) and mixed (19%) (Vogelmann et al., 1998).
4.2   Background

4.2.1   TRAC Measurements

The TRAC sunfleck profiling instrument consists of three quantum PAR sensors (LI-COR, Lincoln, NE,
Model LI-190SB) mounted on a wand with a built-in data logger (Leblanc et al., 2002) (see Figures 4-2a
and b). The instrument is hand carried along a linear transect at a constant speed, measuring the
downwelling solar photosynthetic photon flux density (PPFD) in units of umol/nr/second. The data
record light-dark transitions as the direct solar beam is alternately transmitted and eclipsed by canopy
elements (see Figure 4-3). This record of sunflecks and shadows  is processed to yield a canopy gap size
distribution and other canopy architectural parameters, including LAI and a  foliage element clumping
index.
                                                                                   Page 49 of 339

-------
 Figure 4-2a. Photograph of TRAC instrument
             (length ~ 80 cm).
                                                   Figure 4-2b.  PAR detectors (close up).
                                       Position along transect (m)

             Figure 4-3.  TRAC transect in loblolly pine plantation (site: Hertford).
                         Peaks (black spikes) are canopy gaps. Computed
                         parameters for this transect were gap fraction = 9%;
                         clumping index (Qe) = 0.94 ; PAI = 3.07; Le = 4.4 (assuming
                         Y = 1.5, a = 0.1, and mean element width = 50 mm).
From the downwelling solar flux recorded along a transect, the TRACWin software (Leblanc, et al
2002) computes the following derived parameters describing forest canopy architecture: (I) canopy pa,
size (physical dimension of a canopy gap); (2) canopy gap fraction (percentage of canopy gaps);
(3) foliage element clumping index, Qt.(6X plant area index (LAI which includes both foliage and wooc
material); and LAI with clumping index (Qt>) incorporated.  Note that in each case the parameters are fo
the particular solar zenith angle 0 at the time of data acquisition, defining an inclined plane slicing the
canopy between the moving instrument and the sun.
Page 50 of 339

-------
Parameters entered into the TRACWin software to invert measured PPFD to the derived output
parameters include the mean element width (the mean size of shadows cast by the canopy), the needle-to-
shoot area ratio (y) (within-shoot clumping index), woody-to-total area ratio (a),  latitude/longitude, and
time.  Potential uncertainties were inherent in the first three parameters, and will be assessed in future
computational error analyses.

Solar zenith and a/irnuth influence data quality. Optimal results are achieved with a solar zenith angle 6
between 30 and 60 degrees. As 6 approaches the horizon (6 > 60 degrees), the relationship between LAI
and light extinction becomes increasingly nonlinear. Similarly, best results are attained when TRAC
sampling is conducted with a solar azimuth perpendicular to the transect azimuth. Sky condition is a
significant factor for TRAC measurements. Clear blue sky with unobstructed sun is optimal. Overcast
conditions are unsuitable; the methodology requires distinct sunflecks and shadows.

The TRAC Manual (Leblanc et al., 2002) lists the following as studies validating the TRAC instrument
and approach: Chen (1996), Chen et al. (1997), Chen and Cihlar(1995), Kucharik et al. (1997), and
Leblanc (2002).  TRAC results were compared with direct destructive sampling which is generally
regarded as the most accurate sampling technique.


4.2.2  Hemispherical Photography Measurements

Hemispherical photography is an indirect optical method that has been used in studies of forest light
transmission and canopy structure. Photographs taken upwards from the forest floor with a 180-degree
hemispherical (fisheye) lens produce circular images that record the size, shape, and location of gaps in
the forest overstory. Photographs can be taken using 35mm film cameras or digital cameras. A properly
classified  fish-eye photograph  provides a detailed map of sky visibility and obstructions (sky map)
relative to the location where the photograph was taken.  Various software programs, such as Gap Light
Analyzer (GLA), were available to process film or digital fish-eye camera images into a myriad of metrics
that reveal information about the light regimes  beneath the canopy and the productivity of the plant
canopy. These programs rely on an accurate projection of a three-dimensional hemispherical coordinate
system onto a two-dimensional surface (see Figure 4-4).  Accurate projection requires calibration
information for the fisheye lens that is used and any spherical distortions associated with the lens.  GLA
used in this analysis was available for download at http://www.ecostudies.org/gla/ (Frazer et al., 1999).

The calculation of canopy metrics depends on accurate measures of gap fraction  as a function of zenith
angle and  azimuth. The digital image can be divided into zenith and  azimuth "sky addresses" or sectors
(see Figure 4-5). Each sector can be described by a combined zenith angle and azimuth value.  Within a
given sector,  gap fraction is calculated with values between zero (totally "obscured" sky) and one (totally
"open" sky), and was defined as the proportion of unobscured sky as seen from a position beneath the
plant canopy  (Delta-T Devices, 1998).
                                                                                    Page 51 of 339

-------
                                                     B
Figure 4-4.  Illustration of (A) a hemispherical coordinate system. Such a system is used to
            convert a hemispherical photograph into a two-dimensional circular image (B), WK
            the zenith (B) is in the center, the horizon at the periphery, east is to the left, and w
            is to the right. In a equiangular hemispherical projection, distance along a radius
            proportional to zenith angle (Rich, 1990).
                     Figure 4-5.  Sky-sector mapping using GLA image
                                analysis software. Eight zenith by 18
                                azimuth sectors are shown.
Page 52 of 339

-------
4.2.3   Combining TRAC and Hemispherical Photography

LAI calculated using hemispherical photography or other indirect optical methods does not account for
the non-randomness of canopy foliage elements. Hence, the term effective leaf area index (Le) is used to
refer to the leaf area index estimated from optical measurements including hemispherical photography.
Le typically underestimates "true" LAI (Chen et al., 1991). This underestimation is due in part to non-
randomness in the canopy (i.e., foliage "clumping" at the scales of tree crown), whorls, branches, and
shoots. The TRAC instrument was developed at the Canada Centre for Remote Sensing (CCRS) to
address canopy nonrandomness (Chen and Cihlar, 1995). In the APB study, hemispherical photography
(Le) and TRAC measurements (foliage clumping index) were combined to provide a better estimate of
LAI following the method of Leblanc et al. (2002).


4.2.4   Satellite Data
             /

MODIS was launched in 1999 aboard the NASA Terra platform (EOS-AM) and in 2002 aboard the Aqua
platform  (EOS-PM), and provides daily coverage of most of the earth (Justice et al., 1998; Masuoka et al.,
1998). MODIS sensor characteristics include a spectral range of 0.42-14.35 um in 36 spectral bands,
variable pixel sizes (250, 500 and 1,000 m), and a revisit interval of 1-2 days. Landsat ETM+ images
were acquired at various dates throughout the year and were used for site characterization, and in
subsequent analysis for linking field measurements of LAI with MODIS LAI. ETM+ data characteristics
include a spectral range of 0.45-12.5 um, pixel sizes of 30 m (multi-spectral), 15m (panchromatic) and
60 m (thermal), and a revisit interval of 16 days. They also play a vital  role in linking meter-scale in situ
LAI measurements with kilometer-scale MODIS LAI imagery. 1KONOS is a high spatial resolution
commercial sensor that was launched in 1999 that provides 4.0 m multi-spectral (four bands, 0.45 -
0.88 um) and 1.0 m panchromatic data (0.45-0.90 um) with a potential revisit interval of 1-3 days.

4.2.5   MODIS LAI and NDVI Products

Numerous land, water and atmospheric geophysical products are  derived from MODIS radiance
measurements. Two MODIS land products established the primary time-series data for this research:
NDVI (MOD13Q1) (Huete et al., 1996) and LAI/FPAR  (MOD15A2) (Knyazikhin et al., 1999). The
NDVI product was a 16-day composite at a nominal pixel size of 250 m.  The LAI product was a 8-day
composite product with a pixel size of 1,000 m.  Both products were adjusted for atmospheric effects and
viewing geometry (bidirectional reflectance distribution function, BRDF). The NDVI product used in
this study was produced using the standard MODIS-NDVI algorithm (Huete et al., 1996).

The MODIS LAI product algorithms were considerably more complex. The primary approach for
calculating LAI involved the inversion of surface reflectance in two to  seven spectral bands, and
comparison of the output to biome-specific look up tables derived from three-dimensional canopy
radiative transfer modeling.  All terrestrial LC was assigned to six global biomes, each with distinct
canopy architectural properties which drove photon transport equations. The six biomes included grasses
and cereal crops, shrubs, broad leaf crops, savannas, broadleaf forests, and needle forests. The secondary
technique was invoked when insufficient high-quality data were available for a given compositing period
                                                                                Page 53 of 339

-------
 (e.g., cloud cover, sensor system malfunction) and calculated LAI based on empirical relationships with
 vegetation indices.  However, a deficiency inherent with the second approach was that NDVI saturates
 high leaf biomass (LAI values between 5-6). The computational approach used for each pixel was
 included with the metadata distributed with each data set.
 4.3    Methods

 Here we describe a field sampling design and data acquisition protocol implemented in 2002 for
 measuring in situ forest canopy properties for the analysis of correspondence to MODIS satellite NDVl
 and LAI products. The study objective was to acquire field measurement data to evaluate LAI and NDVi
 products using in situ measurement data and indirectly using higher spatial resolution imagery sensors
 including Landsat Enhanced Thematic Mapper Plus (HTM1) and IKONOS.


 4.3.1   Sampling Frame Design

 Six long-term forested research sites were established in the APB (see Table 4-1). The objective was to
 collect ground-reference data using optical techniques to validate seasonal MODIS NDVI and  LAI
 products. Baseline forest biometrics were also measured for each site.  Five sites were located in the
 Piedmont physiographic region and one site (Hertford) in the coastal plain. 'Hie Hertford and South Hill
 sites were composed of mixed conifer forest (loblolly pine), Fairystone homogeneous deciduous forest
 (oak/hickory), Uinstead mixed conifer and deciduous forest, and both Duke and Appomattox sites
 contained homogeneous stands of conifer and deciduous forest managed  under varying silvicultural
 treatments (e.g., thinning). At Duke and South Hill university collaborators monitored LAI using direct
 means (destructive harvest and leaf litter), with data being used to validate the field optical techniques
 used in this study.

       Table 4-1. Location summary for six validation sites in the Albemarle-Pamlico Basin.
Site
Appomattox
Duke_FACE
Fairystone
Hertford
South Hill
Umstead
State
VA
NC
VA
NC
VA
NC
Location
(Lat, Long)
37.219, -78.879
35.975, -79.094
36.772, -80.093
36.383, -77.001
36.681, -77.994
35.854, -78.755
Elevation
(m)
165-215
165-180
395-490
8-10
90
100-125
Physiographic
Region
Piedmont
Piedmont
Upper Piedmont
Coastal Plain
Piedmont
Piedmont
Ownership
Private
Private
State
Private
Private
State
	 	 1
Area
1200-m2 (144 h^T
1200-m2 (144~rm)~"
1200-m2 (144~~rmT~
1200-m2(144Ti^~
1200-m2 (144l^r
1200-m2 (144~fteT~
The fundamental field sampling units are referred to as quadrants and subplots (see Figure 4-6). A
quadrant was a 100 x 100 in grid with five 100 in east-west TRAC sampling transects and five
interspersed transects for hemispherical photography (lines A-E). The TRAC transects were spaced at 20
m intervals (north-south), as were the interleaved hemispherical photography sampling transects. A
subplot consisted of two 50 in transects intersecting at the 25 m center point. The two transects were
Page 54 of 339

-------
oriented at 45° and 135° to provide flexibility in capturing TRAC measurements during favorable
morning and afternoon solar zenith angles.
                              Quadrant
               LI 0
                12 0
                L3 0
                                   100m
/
A_10 • |
>
* 20n v A /wv

                                                       >L3 100
                   Sub
                                                                  L2 0
                                                                                  LI 50
               LI  0
                            L2 50
                 L4
    L4  100
                L5 0
                                                        >L5 100
                   E 10
• E 90
               Hemi"
               TRAC —
         Figure 4-6.  Quadrant and subplot designs used in the Albemarle-Pamlico Basin
                     study area.
Quadrants were designed to approximate an ETM' 3x3 pixels window.  Subplots were designed to
increase sample site density and were selected on the basis of ETM' NDVI values to sample over the
entire range of local variability. Quadrants and subplots were geographically located on each LAI
validation site using real-time (satellite) differentially corrected GPS to a horizontal accuracy of ±  1.0 m.
TRAC transects were marked every 10 m with a labeled 46 cm wooden stake. The stakes were used in
TRAC measurements as walking-pace and distance markers. Hemispherical photography transects were
staked and marked at the 10, 30, 50, 70, and 90 m locations.  Hemispherical photographs were taken at
these sampling points.

The APB quadrant design was similar to a measurement design used in a Siberian LAI study in the
coniferous forest of Krasnoyarsk, Russia (Leblanc et al., 2002). Here, each validation site had a
minimum of one quadrant.  Multiple quadrants at Faitystone were established across a 1,200 x 1,200 m
oak-hickory forest delineated on a georeferenced ETM1 image to approximate a MODIS pixel (1.0 km2),
with a 100 m perimeter buffer to partially address spatial mis-registration of a MODIS pixel (see
Figure 4-7). The stand was quartered into 600 x 600 m units. The NW corner of a LAI sampling
quadrant was assigned within each quarter block using a random number generator.
                                                                                    Page 55 of 339

-------
 A SRS design was used to select ground reference data spanning the entire range ofl.AI-NDVI values
 Fairystone sites were stratified based on a NDVI surface map calculated from July 2001 HTM imagerv
 Analysis of the resulting histogram allowed for the identification of pixels beyond i I.() standard
 deviation.  From these high/low NDVI regions, eight locations (four high, four low) were randomly
 selected from each of the four 600 x 600 m units. Subplots were established at these points to sainnle
 high or low, and mid-range NDVI regions within each of the four quadrants.
                                      1200m
                       Quadrant 1
                       Quadrant 4
           ,ioomj
Quadrant 2
  Quadrants
                     I
                   J
       Figure 4-7.   Multiple quadrant design used at the Fairystone and Umstead sites.
                    The 1,200 x 1,200 m region approximates a MODIS LAI pixel, with a 100
                    m buffer on each edge. Quadrants are randomly located within each
                    600 x 600 m quarter.
Page 56 of 339

-------
4.3.2  Biometric Mensuration
The measurement of crown closure was included
in quadrant sampling to establish the relationship
between LAI and NDVI.  Wulder et al. (1998)
found that the inclusion of this textural
information strengthened the LAI:NDVI
relationship, thus increasing the accuracy of
modeled LAI estimates. Crown closure was
estimated directly using two field-based
techniques; the vertical tube (see Figure 4-8) and
the spherical densiometer (see Figure 4-9)
(Becker et al., 2002).  Measurement estimates
were also performed using the TRAC instrument
and hemispherical photography.
Top View
Bottom View
             Cross-hairs
               eveling
             Bubbles
                           Sighting Hole

Figure 4-8.  Schematic of vertical tube used for
            crown closure estimation.
 Figure 4-9.  Illustration of (A) a spherical densiometer 60° field of view and (B) convex spherical
             densiometer (courtesy of Ben Meadows).


 Measurements of forest structural attributes (forest stand volume, basal area, and density) were made at
 each quadrant and subplot using a point sampling method based on a ten-basal-area-factor prism. Point
 sampling by prism is a plot-less technique (point-centered) where trees were tallied on the basis of their
 size rather than frequency of occurrence on a plot (Avery and Burkhart, 1983). Large trees at a distance
 had a higher probability of being tallied than small trees at that same distance. Forest structural attributes
 were measured on trees that fell within the prism angle of view included (1) diameter at breast height
 (dbh) at  1.4 m, (2) tree height, (3) tree species, and (4) crown position in canopy (dominant; co-dominant;
 intermediate; suppressed).

 At each quadrant, forest structural attributes were sampled at the 10, 50, and 90 m stations along the A, C,
 and E hemispherical photography transects (see Table 4-2).  Point sampling was performed at the subplot
                                                                                   Page 57 of 339

-------
25 m transect intersection. Physical site descriptions were made at each quadrant and subplot by
recording slope, aspect, elevation, and soil type. Digital images were recorded at the /ero-meter station of
each TRAC transect during each site visit for visual documentation. Images were collected at 0°, 45°,
and 90° from horizontal facing east along the transect line.

      Table 4-2. Vegetation summary for six validation sites in the Albemarle-Pamlico Basin.
Site
Appomattox
Duke
Fairystone
Hertford
South Hill
Umstead
Type
Pine
Hardwood
Pine-
Thinned
Hardwood
Hardwood
Pine
Pine
Pine
Hardwood
%
25
25
bU
30
100
100
100
30
70
Over
TPH
1250
1255
313
-
725-1190
1740
-
-
Under
TPH
3790
-

-
-
2830
-
-
AvgHt
(m)
15.9
21.3
16.9
-
15.5-19.5
14.3
-
-
Avg dbh
(cm)
21.6
24.3
23.2
-
8.5-11.5
18.5
-
-
CC%
Dom
71
-

-
-
71
-
-
CC%
Sup
34
-
"
-
-
29
-

BA/H
(m*/H)
36.7
22.9
11.5
^Ti^7
37^3
-

 Note:  Over TPH = trees per ha for trees greater than 5.08 cm dbh; Under TPH = trees per ha less than
       5.08 cm in dbh; Avg Ht = average Height; Avg dbh = average diameter at breast height; CC%
       Dom = crown closure for dominant crown class determined by vertical tube method; CC% Sup =
       crown closure for suppressed crown class determined by fixed radius plot method; BA/H  = basal
       area per hectare.
 4.3.3   TRAC Measurements

 The TRAC instrument was hand-carried at waist height (~ 1.0-1.5 m) along each transect at a constant
 speed of 0.3 rn/s. The operator traversed 10 m between survey stakes in 30 seconds, monitoring speed bv
 wristwatch.  The spatial sampling interval at 32 Hz at a cruising speed of 0.3 m/s was approximately
 10 mm (i.e., 100 samples/m). To the degree possible, transects were sampled during the time of day at
 which the solar azimuth was most perpendicular to the  transect azimuth. Normally, quadrants were
 traversed in an east-west direction, but if the solar azimuth at the time of TRAC sampling was near 90° Or
 270° (early morning or late afternoon in summer), quadrants were traversed on a north-south alignment

 PPFD measurements were made  in an open area before and after the under-canopy data acquisition for
 data normalization to the maximum solar input. Generally, large canopy gaps provided an approximation
 of the above-canopy PPFD, used to define the above canopy solar flux at times when access to open areas
 was limited. Under uniform sky  conditions, above canopy solar flux was interpolated between measured
 values.  Under partially cloudy conditions, the operator stopped recording photon flux during cloud
 eclipse of the solar beam.
 Page 58 of 339

-------
Operators performed a check on the data in the field immediately after download to a portable computer.
Typically, this involved plotting the PPFD in graphical form, and comparing the number of segments
collected to the number of 10 m intervals traversed. An important quality assurance measure was the use
of paper and computer forms for data entry. To ensure that all relevant ancillary data (i.e., weather
conditions, transect orientation, operator names, data file names) were captured in the field, operators
filled out paper forms on site for TRAC, hemispherical photography and biometric measurements. These
data forms were then entered into a computer database via prescribed forms, preferably immediately after
data collection.  This was a simple but valuable step to ensure that critical data acquisition and processing
parameters were not inadvertently omitted from field notes. The computer forms provided a user
interface to the relational database containing all the metadata for the APB project.

4.3.4   Hemispherical Photography

Jwo Nikon Coolpix 995 digital cameras with Nikon FC-E8 fisheye converters were  used  in conjunction
with TRAC at all six APB research sites. Exposures were set to automatic with normal file compression
(approximately  1/8) selected at 1,600 x 1,200 pixel image size. Hemispherical images were not collected
while the sun was above the horizon, unless the sky was uniformly overcast.  Images were primarily
captured at dawn or dusk  to avoid the issue of nonuniform brightness, resulting in the foliage  being
"washed out" in the black-and-white binary image.

The camera was mounted on a tripod and leveled over each wooden stake along each "A  through E"
photo transect.  The height of the camera was adjusted to approximately breast height (1.4 m) and leveled
to ensure that the "true" horizon occurred at a 90° zenith angle in the digital photographic image.  The
combination of two bubble levclers, one mounted on the tripod and the other on the lens  cap,  ensured the
capture of the "true" horizon in each photograph.  Using a hand-held compass, the camera was oriented to
true north so that the azimuth values in the photograph corresponded to the true orientation of the canopy
architecture in the forest stand. Orientation did not affect any of the whole-image canopy metrics (i.e.,
LAI, canopy openness, site openness) calculated by GLA.  However, comparison of metrics derived by
hemispherical photography, TRAC, densiometer, or forest mensuration measurements required accurate
image orientation.

After the images were captured in the field, they were downloaded from the camera disk, placed in a
descriptive file directory structure, and were renamed to reflect the site and transect point. A  GLA
configuration file (image  orientation, projection distortion and lens calibration, site  location coordinates,
growing-season length, sky-region brightness,  and atmospheric conditions) was created for each site.
Next, images were registered  in a procedure that defined an image's circular area and location of north in
the image.  Image registration entailed  entering pixel  coordinates (image size and camera dependent) for
the initial and final X and Y points.  The FC-E8 fisheye lens used  in this study had  an actual  field of view
greater than 180° (-185°). The radius  of the image was reduced accordingly so that the 90° zenith angle
represented the true horizon.  Frazer et al. ( 2001) described the procedure for calibrating a fisheye lens.
Calibration results were entered into the GLA  configuration file (Canham et al., 1994).

The analyst-determined threshold setting in GLA adjusted the number of black ("obscured" sky) and
white ("unobscured" sky) pixels  in the working image. This was perhaps the most subjective setting in
                                                                                     Page 59 of 339

-------
  the entire measurement process and potentially the largest source of error in the calculation of LAI and
  other canopy metrics from hemispherical photographs.  As a rule-of-thumb, the threshold value was
  increased so that black pixels appeared that were not represented by canopy elements in the registered
  color image. The threshold was then decreased from this point until the black dots or blotches
  disappeared and the black-and-white  working image was a reasonable representation of the registered
  color image (Frazer et al., 1999).


  4.3.5   Hemispherical Photography Quality Assurance

  The height at which each hemispherical photograph was taken represented a potential source of position
  errors (-5-10 cm).  At relatively level sampling points, the tripod legs and center shaft were fully
  extended to attain a height that approximated breast height. However, at sites with steep and/or uneven
  slopes, the camera height may have varied between repetitive measurement dates, due to variations in th
  extension of the tripod legs possibly resulting in inclusion or exclusion of near-lens vegetation.

  Several comparisons of hemispherical photographic estimates of LAI with direct estimates in broad leaf
 and conifer forest stands have been reported (Chason, 1991; Chen and Black, 1991; Deblonde et al
  1994; Fassnacht et al.,  1994; Neumann et al., 1989; Runyon et al., 1994). These comparisons all showed
 that there was a high correlation between the indirect and direct methods, but the indirect methods were
 biased low. This was because the clumping factor was not accounted  using a random foliar distribution
 model (Chen eta!.,  1991).

 To assess analyst repeatability, a set of 3 I hemispherical photographic images collected in eastern Oreeo
 were analyzed and threshold values were charted using SAS QC software (SAS,  1987). Two analysts in
 the APB study repeatedly analyzed the 31 images to develop an on-going quality control assessment of
 precision  when compared to the Oregon assessment.
 4.4    Discussion

 4.4.1   LAI Accuracy Assessment

 Chen (1996) provided an estimate of errors in optical measurements of forest LAI using combined TRAC
 and LiCOR 2000 PCA instruments. We assumed that the PCA was equivalent to digital hemispherical
 photography for this discussion.  Chen states that based on error analysis, carefully executed optical
 measurements can provide LAI accuracies of close to or better than 80% as compared to destructive
 sampling. The approximate errors accumulated as follows:  PCA measurements (3-5%); estimate of
 needle-to-shoot area ratio (y) (5-10 %); estimate of foliage element clumping index (3-10%); and
 estimate of woody-to-total area ratio (5-12%). These factors sum to an approximate total error of 15-40%
 in ground based optical instrument estimates of LAI.

 Chen (1996) also reports that the highest accuracy (-85%) (relative to destructive  sampling) "can be
 achieved by carefully operating the PCA and TRAC, improving the shoot sampling strategy and the
 measurement of woody-to-total area ratio." A crucial issue for this analysis was to better understand the
Page 60 of 339

-------
robustness of published values of needle-to-shoot area ratio (y) and woody-to-total area ratio (a), because
direct sampling of these quantities was logistically infeasible in this research effort. Published values
have been used in this analysis (Leblanc et al., 2002).
4.4.2   Hemispherical Photography

Figure 4-10 presents a chronosequence of hemispherical
photographic images taken at the midpoint (50 m) of the C
transect at the Hertford site at five different dates in 2002.
The images were the registered black-and-white bitmap
images produced by GLA. The date and LAI Ring 5
values were displayed to the right of each image. LAI
Ring 5 represented a 0° to 75° field of view. In the March
5, 2002 image, near-lens understory foliage was observed
 in the lower-left portion.  However, in subsequent images,
the large-leafed obstruction was absent. The reason for
the disappearance of this understory image component
 was unclear. The tripod height may have been adjusted to
 place the camera above the near-lens foliar obstruction, or
 perhaps field-crew  impacts may have resulted in the
 disappearance of the obstruction.  The presence of the
 near-lens foliage in the March 5 image may account for
 the somewhat elevated LAI value before leaf out.

 The orientation of the camera can be assessed by noting
 the position of the large tree bole that originates from the
 five-o'clock position  in the image.  The April 9, 2002
 image places the bole closer to the 4:30 position.
 However, as mentioned previously, camera orientation
 does not affect whole-image calculations of LAI or
 canopy openness.  Orientation was important only if it
 become necessary  to match TRAC data with a particular
 sector of the hemispherical photograph.

  L, values derived by hemispherical photography increased
  over the course of phenological development at the
  Hertford site. A decrease in Lt, from 2.13 to 1.88 was
  observed between the July 25 and August 5, 2002 images.
  The decrease may have partly been a result of the
  understory removal operation that occurred between July
  25-30, 2002. However, decreases of this magnitude were
  observed at other  APB sites in mid- to late-summer where
  no understory canopy removal was performed. The
  Hertford site was  primarily coniferous forest.  Needle loss
              Hertford, VA - TRANSECT C-50
              05 March 2002
              LAI Ring 5-1.6
              Hertford, VA - TRANSECT C-50
              05 April 2002
              LAI Ring 5-1.7
               Hertford, VA - TRANSECT C-50
               05 June 2002
               LAI Ring 5 = 2.29
               Hertford, VA - TRANSECT C-50
               05 July 2002
               LAI Ring 5 = 2.13
                Hertford, VA - TRANSECT C-50
                05 August 2002
                LAI Ring 5 - 1.88
Figure 4-10. Chronosequence of
            hemispherical photographs
            taken at the Hertford site
            along transect C, and the
            50 m midpoint. Dates and
            LAI Ring 5 values are
            shown to the right of each
             image.
                                                                                      Page 61 of 339

-------
due to the extreme drought conditions experienced in the APB study area, may partially account for the
observed decrease in Le.


4.4.3  Satellite Remote Sensing Issues

The MODIS LAI product was produced at 1.0 km2 spatial resolution. Inherent in this product were a
number of spatial factors that may contribute to uncertainty in the final accuracy of this analysis. MODIS
pixels were nominally 1.0 km2 at nadir, but expand considerably as the scan moves off nadir toward the
edges of the 2,330 km-wide swath. As a result, off-nadir pixels sampled a larger area on the ground than
near-nadir pixels.  The compositing scheme partially compensated for this by preferentially selecting
pixels closer to nadir. Mixed pixels contained more than one LC type. In the APB study region, the
landscape exhibits varying degrees of fragmentation, producing a mosaic of parcels on the  ground.
Within a 1.0 km2 block, agricultural, urbanized, and forested LC types may be mixed to such a degree that
assigning a  single LAI value is questionable.  There were also angular effects to consider.  The NDVI and
LAI products were adjusted for the bidirectional reflectance distribution function (BRDF; MODIS
product MOD43). Still, angular effects produced by variable viewing geometry may have  degraded the
accuracy or interpretability of the results.

An important issue was that of spatial scaling from in situ reference data measurements (m2) to MODIS
products (1.0 km2). ETM+ data provided the link between in situ measurements and MODIS
measurements. Quadrants correspond to a ground region of a 3 x 3 ETM+ pixel window (90 x 90 m), and
the subplots correspond to a region of approximately 2x2 pixels (60 x 60 m).  The Landsat data were
precision registered to ground coordinates using ground control-points, providing an accuracy between
0.5 to 1.0 pixel. Once the in situ data and Landsat image were coregistered, an ETM* LAI map was
generated by establishing a regression relationship between in situ LAI and Landsat NDVI. Then, the
Landsat LAI map could be generalized from a 30 m resolution to 1.0 km2 resolution.  Overall accuracy
was influenced by accuracy of co-registered data sets, interpolation methods used to expand in situ
measurements to ETM* NDVI maps, and the spatial coarsening approach applied to scale the ETM*
imagery to the MODIS scale of 1.0 km2 pixels.
4.5   Summary

Research efforts at the U.S. Environmental Protection Agency's National Exposure Research Laboratory
and National Center for Environmental Assessment include development of remote sensing
methodologies for detection and identification of landscape change.  This chapter describes an approach
and techniques for estimating forest LAI for validation of the MODIS LAI product, in the field using
ground-based optical instruments.  Six permanent field validation sites were established in the Albemarle-
Pamlico Basin of North Carolina/Virginia USA for multi-temporal measurements of forest canopy and
biometric properties that affect MODIS NDVI and LAI products. LAI field measurements were made
using hemispherical photography and TRAC sunfleck profiling, in the landscape context of vegetation
associations and physiography, and in the temporal context of the annual phenological cycle. Results of
these field validation efforts will contribute to a greater understanding of phenological dynamics evident
in NDVI time series, and will provide valuable data for the validation of the MODIS LAI product.
Page 62 of 339

-------
4.6   Acknowledgments

The authors express their sincere appreciation to Mark Murphy, Chris Murray, and Maria Maschauer for
their assistance in the field.  We thank Conghe Song for sharing of field instruments, Malcolm Wilkins
for research support, Paul Ringold for providing hemispherical photographic images for use in the quality
assurance and quality control aspects of the study, and Ross Lunetta, David Holland, and Joe Knight for
assistance in study design.  International Paper Corporation, Westvaco Corporation, the States of Virginia
and North Carolina, Duke University and North Carolina State University provided access to sampling
sites. We also thank three anonymous reviewers for helpful comments on this manuscript. The U.S.
Environmental Protection Agency funded and partially conducted the research described in this chapter.
It has been  subject to the Agency's programmatic review and has been approved for publication.  Mention
of any trade names or commercial products does not constitute endorsement or recommendation for use.
4.7    References

Avery, I.E. and H.E. Burkhart. Forest Measurements (Third Edition).  McGraw-Hill, New York, 331 p.,
   1983.

Becker, M.L., R.G. Congalton, R. Budd, and A. Fried. A GLOBE collaboration to develop land cover
   data collection and analysis protocols. J. Sci. Ed. Tech., 7, 85-96,2002.

Burrows, S.N., S.T. Gower, M.K. Clayton, D.S. MacKay, D.E. Ahl, J.M. Norman, and G. Diak.
   Application of geostatistics to characterize leaf area index (LAI) from flux tower to landscape scales
   using a cyclic sampling design.  Ecosystems, 5(7), 667-679, 2002.

Burton, A J., K..S. Pregitzer, and D.D. Reed. Leaf area and foliar biomass relationships in northern
   hardwood forests located along an 800-km acid deposition gradient. For. Sci., 37, 1041-1059,1991.

Canham, C.D., A.C. Finzi, S.W. Pascala, and D.H. Burbank. Causes and consequences of resource
   heterogeneity in forests: interspecific variation in light transmission by canopy trees. Can. J. For.
   Res., 24, 337-349, 1994.

Carlson, T.N. and D.A. Ripley. On the relation between NDVI, fractional vegetation cover, and leaf area
   index. Rem. Sens. Environ., 62, 241-252, 1997.

Chason, J.W. A comparison of direct and indirect methods for estimating forest canopy leaf area. Agr.
   For. Meteor., 57, 107-128, 1991.

Chen, J.M.  Optically-based methods for measuring seasonal variation of leaf area index in boreal conifer
   stands. Agr. For. Meteor., 80, 135-163,1996.

Chen, J.M. and T.A. Black. Measuring leaf area index of plant canopies with branch architecture. Agr.
   For. Meteor., 51,1-12, 1991.

Chen, J.M. and T.A. Black. Defining leaf area index for non-flat leaves. Plant Cell Environ., 15, 421-
   429,1992.
                                                                                  Page 63 of 339

-------
Chen, J.M., T.A. Black, and R.S. Adams.  Evaluation of hemispherical photography for determining plant
    area index and geometry of a forest stand. Agr. For. Meteor., 56, 129-143, 1991.

Chen, J.M. and J. Cihlar. Plant canopy gap-size analysis theory for improving optical measurements of
    leaf-area index. Applied Optics, 34, 6211-6222, 1995.

Chen, J.M., P.M. Rich, S.T. Gower, J.M. Norman, and S. Plummer. Leaf area index of boreal forests:
    Theory, techniques, and measurements.  J. Geophys. Res., 102(D24), 29429-29443, 1997.

Deblonde, G., M. Penner, and A. Royer. Measuring leaf area index with the LI-COR LAI-2000 in pine
    stands. Ecology,75, 1507-1511, 1994.

Delta-T Devices, Ltd. Hemiview User Manual (2.1), Cambridge, U.K., 1998.

Eidenshink, J.C. and R.H. Haas.  Analyzing vegetation dynamics of land systems with satellite data.
    Geocarto International, 1 (1), 53-61, 1992.

Fassnacht, K.S. and S.T. Gower. Interrelationships among the edaphic and stand characteristics, leaf area
    index, and aboveground net primary production of upland forest ecosystems in north central
    Wisconsin.  Can. J. For. Res., 27, 1058-1067, 1997.

Fassnacht, K.S., S.T. Gower, J.M. Norman, and R.E. McMurtrie. A comparison of optical and direct
    methods for estimating foliage surface area index in forests. Agr. For. Meteor., 71, 183-207, 1994.

Frazer, G.W., C.D. Canham, and K.P. Lertzman. Gap Light Analyzer [GLAJ: Imaging software to extract
    canopy structure and gap light transmission  indices from true-color fisheye photographs, users
    manual and program documentation (Version 2). Simon Frazer University, Bumaby, British
    Columbia, Canada, Simon Frazer University and the Institute of Ecosystem Studies, 1999.

Frazer, G.W., R.A. Fournier, J.A. Trofymow, and R.J. Hall.  A comparison of digital and film fisheye
    photography for analysis of forest canopy structure and gap light transmission. Agr. For. Meteor.,
    109,249-263,2001.

Geron, C.D., A.B. Guenther, and T.E. Pierce. An improved model for estimating emissions of volatile
    organic compounds from forests in the eastern United States. J. Geophys. Res., 99, 12773-12791,
    1994.

Gholz, H.L.  Environmental limits on aboveground net primary production, leaf area and biomass in
    vegetation zones of the Pacific Northwest. Ecology, 53, 469-481, 1982.

Gower, S.T., K.A. Vogt, and C.C. Grier.  Carbon dynamics of Rocky Mountain Douglas-fir: influence of
    water and nutrient availability. Ecological Monogr., 62,43-65, 1992.

Huete, A., C.O. Justice, and W. van Leeuwen. MODIS Vegetation Index (MODI3) Algorithm Theoretical
    Basis Document (Version 2).  http://eospso. gsfc. nasa. gov/atbd/modistables. html, 1-142,  1996.
 Page 64 of 339

-------
Jenkins, J.C., S.V. Kicklighter, J.D. Ollinger, J.D. Aber, and J.M. Melillo.  Sources of variability in net
    primary production predictions at a regional scale: a comparison using PnET-11 and TEM 4.0 in
    Northeastern US forests.  Ecosystems, 2, 555-570, 1999.

Johnsen, K.H., D. Wear, R. Oren, R.O. Teskey, F. Sanchez, R. Will, J. Butnor, D. Markewitz, D. Richter,
    T. Rials, H.L. Allen, J. Seiler, D. Ellsworth, C. Maier, G. Katul, and P.M. Dougherty.  Meeting global
    policy commitments: carbon sequestration and southern pine forests. J. Forestry, 99, 14-21, 2001.

Justice, C.O., E. Vermote, J.R.G. Townshend, R. Defries, D.P. Roy, O.K. Hall, V.V. Salomonson,
    J.L. Privette, G. Riggs, A. Strahler, W. Lucht, R.B. Myneni, Y. Knyazikhin, S.W. Running,
    R.R. Nemani, Z. Wan, A.R. Huete, W. van Leeuwen, R.E. Wolfe, L. Giglio, J.P. Muller, P. Lewis,
    and M.J. Barnsely. The Moderate Resolution Imaging Spectroradiometer (MOD1S): land remote
    sensing for global  change research. IEEE Trans. Geosci. Rein. Sens., 36, 1228-1249,  1998.

Knyazikhin, Y., J. Glassy, J.L. Privette, Y. Tian, A.  Lotsch, Y. Zhang, Y. Wang, J.T. Morisette,
    P. Votava, R.B. Myneni, R.R. Nemani, and S.W. Running. MODIS leaf area index (LAI) and
    fraction of photosyntheticully active radiation absorbed by vegetation (FPAR) Product (MODI5)
    Algorithm Theoretical Basis Document.  http://eospso. gsfc. nasa, 1999.

Kucharik, C.J., J.M. Norman, and L.M. Murdock. Characterizing canopy nonrandomness with a
    multiband vegetation  imager (MVI), J. Geophys. Res.. 102 (29), 455-29,473, 1997.

Leblanc, S.G.  Correction to the plant canopy gap size analysis theory used by the Tracing Radiation and
    Architecture of Canopies (TRAC) instrument. Applied Optics, 41 (36), 7667-7670, 2002.

Leblanc, S.G., J.M. Chen, and M. Kwong. Tracing radiation and architecture of canopies: TRAC
    Manual (Version 2.1.3), 3rd Wave Engineering, 14 Aleutian Road, Nepean, Ontario, Canada, 2002.

Lunetta, R.S., J.S. liames, J.  Knight, R.G. Congalton, and T.H. Mace. An assessment of reference data
    variability using a "Virtual Field Reference Database." Photogrammetric Engineering and Remote
    Sensing, 67 (6), 707-715, 2001.

Masuoka, E., A. Fleig, R.E. Wolfe, and F. Patt.  Key characteristics of MODIS data products, IEEE
    Transactions on Geoscience and Remote Sensing, 36, 1313-1323,  1998.

Myneni, R.B., S.  Hoffman, Y. Knyazikhin, J.L. Privette, J. Glassy, Y. Tian, Y. Wang, X.  Song,
    G.R. Smith, A. Lotsch, M. Friedl, J.T. Morisette, P. Votava, R.R. Nemani, and S.W. Running. Global
    products of vegetation leaf area and fraction of absorbed PAR from year one of MODIS data. Rein.
    Sens.  Environ., 83, 214-231, 2002.

Neumann, H.H., G. Den Hartog, and R.H. Shaw. Leaf area measurements based on hemispheric
    photographs and leaf-litter collection in a deciduous forest during autumn leaf-fall. Agr.  For.
    Meteor., 45, 325-345, 1989.

 Privette, J.L., R.B. Myneni, Y. Knyazikhin, M. Mukelabai, G. Rogerts, Y. Tian, Y. Wang, S.G. Leblanc.
    Early spatial  and temporal validation of MODIS LAI product in the Southern Africa Kalahari. Rein.
    Sens. Environ., 83, 232-243, 2002.
                                                                                    Page 65 of 339

-------
 Rich, P.M. Characterizing plant canopies with hemispherical photographs. Rein. Sens. Rev., 5, 13-29
     1990.

 Running, S. W., D.L. Peterson, MA. Spanner, and K.A. Tewber. Remote sensing of coniferous forests
     leaf-area. Ecology, 67, 273-276, 1986.

 Runyon, J., R.H. Waring, S.N. Goward, and J.M. Welles. Environmental limits on net primary
     production and light use efficiency across the Oregon Transect. Ecological Applications, 4, 226-237
     1994.

 SAS (Statistical Analysis Software). SAS/Graph Guide for Personal Computers (Version 6),
     SAS Institute, Inc., Cary, NC, 1987.

 Tian, Y., C.E. Woodcock, Y. Wang, J.L. Privette, N.L. Shabanov, Y. Zhang, W. Buermann,
    B. Veikkanen, T. Hame, Y. Knyazikhin, and R.B. Myneni. Multiscale analysis and MODIS LAI
    product.  II. Sampling strategy.  Rein. Sens. Environ, 83, 431-441, 2002.

 Verchot, L.V., B.C. Frankling, and J.W. Gilliam.  Effects of agricultural runoff dispersion on nitrate
    reduction in forested filter zone soils.  Soil Sci. Soc. Am. J., 62, 1719-1724, 1998.

Vogelmann, J.E., T. Sohl,  S.M. Howard, and D.M. Shaw. Regional land cover characterization using
    Landsat Thematic Mapper data and ancillary data sources. Environ. Monitor. Assess., 51,415-428,
    1998.

Wulder, M.A., F.L. Ellsworth, S.E. Franklin, and M.B. Lavigne. Aerial image texture information in the
    estimation of northern deciduous and mixed wood forest Leaf Area Index (LAI).  Rein. Sens.
    Environ.,  64, 64-76, 1998.
Page 66 of 339

-------
                                   Chapter 5

                         Light Attenuation  Profiling
                              as an Indicator of
                Structural Changes in  Coastal Marshes

                                         by

                       Elijah Ramsey III, Corresponding Author1*
                                    Gene Nelson1
                                   Frank Baarnes2
                                     Ruth Spell3
   USGS
   National Wetlands Research Center
   700 Cajundome Boulevard
   Lafayette, LA 70506

   *'Corresponding Author Contact:

    Telephone: (337) 266-8575
        E-mail: Elijah ramsev@usgs.gov
Oak Ridge Associated Universities
Oak Ridge, TN 37831

Presently with:

Department of Hydrology & Water
    Resources
University of Arizona
Tucson, AZ  85721
                           3 Ducks Unlimited
                            3074 Gold Canal Drive
                            Rancho Cordova, CA 95670
5.1   Introduction

To best respond to natural and human induced stresses, resource managers and researchers require remote
sensing techniques that can map the biophysical characteristics of natural resources on regional to local
scales. The implementation of advanced measurement techniques would provide significant
improvements in the quantity, quality, and timeliness of biophysical data useful in understanding the
sensitivity of vegetation communities to external influences. In turn, this biophysical data would provide
resource planners with a rational decision-making system for resources allocation and response action
development planning.
                                                                          Page 67 of 339

-------
Remotely sensed imagery can be analyzed to provide an accurate, instantaneous, synoptic view of the
spatial characteristics of vegetation environments (Ustin et al., 1991; Wickland, 1991). By
simultaneously recording reflectances in the visible to short-wave region of the electromagnetic spectrum,
die canopy reflectance associated with these spatial characteristics may be used to provide information on
the biophysical characteristics of vegetation (Goudriaan, 1977). To predict the vegetation response to
external stresses, it is essential to identify biophysical characteristics observable by remote sensing
techniques that have well defined connections to vegetative community type and condition.

In complex vegetation communities, canopy structure and leaf spectral properties are biophysical
characteristics that can vary in response to changes in vegetation type, environmental conditions, and
vegetation health. These changes can modify the spectrum of light reflected from the canopy, and thus,
directly influence the remotely sensed signal. Transformed into reflectance, variations in the image are
directly related to changes in the canopy properties broadly defined by the leaf composition, canopy
structure, and background reflectance. Direct links, however, cannot be inferred unless vegetation type
covaries directly and uniquely with these canopy parameters, or when one canopy property dominates the
canopy reflectance (e.g., leaf reflectance). Historically, limited ground-based observations circumvented
the need for directly incorporating variation in canopy properties into the remote sensing classification by
defining reflectance ranges (e.g., class ranges) that incorporate within-type canopy variability and
acceptable between vegetation type classification errors. Currently, the trend is to transform the temporal
patterns revealed in the remote sensing data into quantitative rate determinations to support qualitative
judgments of external effects on these resources (Lulla and Mausel, 1983). As we strive to extract more
detailed and accurate information about vegetation class variability, a greater understanding is needed of
how each canopy property (e.g., canopy structure) influences the canopy reflectance portion of the
remotely sensed signal.

Leaf spectral properties have been directly related to vegetation type and stress; and are general indicators
of the leaf chlorophyll, water content, and leaf biomass.  Numerous studies have related the canopy
structure variable - leaf area index (LAI); to vegetation type, health, and phenology (Goudriaan,  1977).
In essence, to map vegetation type and especially to monitor status, it is necessary to relate - both
individually and in aggregate, changes in leaf spectral properties, structural, and background parameters
to changes in the canopy reflectance.  In  the pursuit of extracting more detailed and accurate information
about vegetation type and status from remote sensing data, our goal is to provide an accurate assessment
of canopy structure that will not covary with leaf spectral and background properties with respect to
location or time. As part of this goal, the canopy structure indicator must be ultimately linkable to the
remote sensing signal in complex wetland and adjacent upland forest environments. Our challenge is to
provide this information based upon routine measurements that are cost-effective and easily implemented
into operational resource management and verified and calibrated with current operational ground-based
measurements (Teuber, 1990; Nielsen and Werle, 1993).

This chapter will examine light  attenuation profiling as an indicator of changes in marsh canopy
structures. Reported here are techniques that were tested and implemented to gain a useful measure of
canopy light attenuation over space and time. Within the constraints of the data collected, the
consistency, reliability, and comparability of the collected light attenuation data are related to the (1) area
sampling frequency (horizontal  spacing between profile samples), (2) canopy profile (vertical) sampling
frequencies, (3) exclusion of atypical  canopy structures, and (4) collections at different sun elevations. In
Page 68 of 339

-------
addition, we present some relationships observed between and within coastal wetland types and changes
in the canopy structural properties. These relationships are presented to indicate the spatial and temporal
stability of these biophysical indicators as related to mapping and monitoring with remote sensing
imaging.


5.1.1  Marsh Canopy Descriptions

Measurements of canopy light attenuation and canopy reflectance spectra were collected at 20 marsh sites
(30 x 30 m) in  coastal Louisiana and at 15 marsh sites in the Big Bend area of coastal Florida (Ramsey et
al.,  1992a,  1992b, 1993).  To provide a description of marsh characteristics, a few data subsets were
selected based  on presence of marsh grasses that dominate three of the gulf coast wetland zones
(Chabreck, 1970):  Juncus roemericmus (Juncus R.) and Spartina alterniflora (Spartina A.) for saline
marsh; Spartina patens (Spartina P.) for intermediate (brackish) marsh; and Panicum hemitomon
(Panicum H.) for fresh marsh (Chabreck 1970).  Juncus R. dominates the landscape and makes up the
majority of biomass in marshes of the northeast gulf coast and Spartina A. dominates the northcentral gulf
coast marshes (Stout 1984). In these marshes, except  for sites recovering from recent burns, canopies
usually contain a high proportion of dead canopy material (Hopkinson et al., 1978; Ramsey et al., 1999).

After reaching maturity, turnover rates of both live and dead biomass can remain nearly constant showing
no clear seasonal pattern.  Although mostly vertical, Juncus R. and Spartina A. (relatively less  vertical and
more leafy) canopy structures vary depending on local conditions (e.g., flushing strength)  and dominant
leaf orientation can change from top to bottom. Spartina P. and Panicum H. marshes dominate the
interior marshes of Louisiana. Generally, Spartina P.  canopies are hummocky with vertical shoots rising
above a layer of thick and logged  dead material. As in Juncus R. and Spartina A. canopies, Spartina P.
canopies seem to have a low turnover  with little seasonal pattern in live and dead composition. Panicum
H. canopies exhibit yearly turnover. Beginning with nearly vertical shoots in the late winter to early
spring, the canopy  gains height and increasingly adds mixed orientations until maturity in the late spring
to summer, then senesces in winter.
 5.2    Methods

 Light measurements were collected along transects centered at flag markers, as were all measurements
 describing the canopy characteristics. A 30 x 30 m area was used to encompass the spatial resolution of
 Landsat Thematic Mapper (TM) and similar Earth resource sensors. Additional recordings and
 observations collected at each site included upwelling radiance from a helicopter platform, canopy species
 type, percent cover, and height; photography; and estimates of live and dead biomass percentages.

 5.2.1   Field Collection Methods

 Canopy light attenuation measurements were acquired by using a Decagon Sunfleck Ceptometer
 (Decagon Devices, 1991). The ceptometer measures both photosynthetically active radiation (PAR) (400
 to 700 nm) and the canopy gap fraction (sunflecks). Canopy light attenuation curves were derived from
                                                                                    Page 69 of 339

-------
PAR measurements. The ceptometer probe has 80 light sensors (calibrated to absolute units) placed at
equal intervals along a 80 cm probe covered with a diffuse plate. The narrow probe (approximately 1.3 x
1.3 cm) is constructed with a hard and pointed plastic tip so that it can be inserted horizontally with
minimal disruption of the marsh canopy.  After inserting the probe into the canopy and obtaining a
horizontal probe orientation relative to gravity (bubble level), the 80 sensors are scanned and an average
light intensity value for the probe calculated, displayed and recorded (Decagon Devices, 1991). At each
site, measurements for estimating PAR canopy reflectance and the fraction of direct beam  PAR (1-
skylight/direct sun irradiance) were collected. A correction for PAR canopy reflectance was not included
in the calculations. Disregarding this correction, in general, results in <5% error in the intercepted
radiation  (Decagon Devices, 1991).  The direct beam fraction was used to estimate the leaf area and angle
distributions.  Normally, measurements were collected when clouds did not obstruct or influence
(intensified by cloud reflection) the downwelling sunlight; sky conditions were documented.

The depiction in Figure 5-1 presents our standard method of depicting light falloff with depth in the
canopy. Each point on the graph reflects the mean of all light measurements collected throughout the site
at the associated height above ground level. Error bars showing plus and minus one standard error (65%
confidence interval) depict the variance about the mean at each canopy height. The light attenuation
curve represents the percent of above-canopy  PAR sunlight (abscissa) reaching varying depths in the
                                                                           Fr*»h Marsh Canopy
                                                                           —0— Winter Profile
                                                                           —B— Summer Profile
60
                                                                          ' i ' ' ' i ' '  ' i ' ' ' i •  • • i
                                                                          80  100   120  140   160
                                                                   Canopy Height (cm)
Figure 5-1.  Left photo: a Ceptometer showing the sunflect (22.5) and PAR (299) readings; middle
            photo:  Ruth Spell is shown measuring the above PAR intensity after collecting
            readings with canopy depth at one point along one of the two transect directions (see
            Figure 5-2). The above canopy PAR intensity (shown) is used to normalize PAR
            measurements at each canopy depth. Right graph: The resulting summer and winter
            profiles of a fresh marsh canopy site showing the Light Penetration (PAR at each
            depth/above canopy PAR) with depth averaged over the 22 measurements points
            along the two transect directions. The standard errors of the 22 measurements at
            each canopy depth also are shown as horizontal bars attached to each symbol.
Page 70 of 339

-------
canopy (ordinate) throughout the site area. The curve typifies light attenuation in an undisturbed and
fully formed Panicum H. marsh.

5.2.1.1  Area Frequency Sampling

The three considerations combined to set the distance between profile collections included: (1) the
estimated canopy spatial variability; (2) the decision to sample 30 m transects in two cardinal directions;
and (3) the necessity to restrict the site occupation time. Early analyses of the collected light attenuation
data suggested profiles collected every 3.0 m and averaged over the 30 m transects provided the best
compromise to all sampling considerations (see Figure 5-2).
                                                                       N
                                                                      A
                     • Sampl* Locations
 ,3m,
-*	*-
                                                                               o
                                                                               3
                                              -30 m
                Figure 5-2.  Light attenuation profile locations every 3.0 m along
                            the 30-m east to west and south to north transects.
 5.2.1.2 Vertical Frequency Sampling

 Two vertical sampling distances were compared to assess the accurate and reliable portrayal of canopy
 light attenuation.  The earliest measurements were collected relative to the canopy height, not at a
 constant height above the ground level.  In a few of these early site occupations the relative top, middle,
 and bottom measurements collected every 3.0 m along the 30 m transects were supplemented with light
 attenuation profiles (every 20 cm) collected at three to four transect locations.
                                                                                      Page 71 of 339

-------
5.2.1.3 Atypical Canopy Structures

At each profile location, sky condition and canopy structure were recorded. Indicator flags were inserted
into the database to indicate whether (1) sunny or (2) cloudy sky existed, and whether the profile location
was (1) undisturbed, (2) a partial gap or hole, or (3) completely lodged. These flags could be used during
the data processing to exclude or include any combination of sky and canopy conditions.  In almost all
cases only sunny sky conditions were processed. Similarly, undisturbed canopy was most often solely
processed for generation of PAR light attenuation profiles typifying each site. In relation to remotely
sensed data, however, all canopy conditions will be incorporated in the reflectance returned to the sensor.
Ability to include or exclude atypical structures is expected to enhance remote sensing reflectance and
canopy structure comparisons.

5.2.1.4 Changing Sun Zenith

Light attenuation measurements collected at different times correspond to different sun elevations or
zeniths. In order to relate PAR recordings at  different sun zeniths, we  used the following relationships
equating the beam  transmittance coefficient to the product of leaf area  and canopy extinction coefficient
(Decagon Devices, 1991):
                                  lntei / Intez = Ke, / K^ = p                                    (1)
where lnte, = the log of the canopy transmission coefficient (T) at sun zenith 1 (8,), Inte2 = the log of T at
82, Ka, = the extinction coefficient at 8,, K^ = at 82, and p = the 8 correction factor. Although this
relationship is based on the penetration probability without interference and is directly relevant to
sunfleck measurements, we tested the application of the sun zenith normalization to PAR measurements.
The canopy extinction coefficient expression (K) taken from Decagon Devices (1991) was presented by
Campbell (1986) as:
                                  K = (x2 + tan28)1/2 /  [x + 1 .774(x+ 1 . 1 82)-° 733]                    (2)

where x equates to the ratio of area projected by an average canopy element on a horizontal to vertical
plane. An or of 1 ,000 defines a horizontal, an x of 1 .0 a spherical and an AC of 0.0 a vertical leaf
distribution, and the 8 represents the sun zenith angle.

Accounting for no change in leaf area and x between light attenuation measurements and after
simplification, a correction factor for off nadir sun angles is constructed as  follows:

                                                                                               (3)
where the sun zenith at the time of measurement is 82.  Assuming :c is 1.0 (spherical) and choosing a
standard zenith angle of 8, = 0 (sun directly overhead), then:

                                  p = (1.0/(1.0 + tan282)I/2                                      (4)

where normalization of PAR measurements to a sun nadir zenith (= 0.0) was estimated to be:

                                  PAR(8,) = PAR(82)P                                          (5)



Page 72 of 339

-------
5.3    Results

5.3.1  Vertical Frequency Sampling

Two examples were selected to illustrate noticeable differences between samples taken only at the top,
bottom, and middle canopy positions and at every 20 cm above the bottom (see Figure 5-3). These early
measurements were collected during July and August, when the canopies reached full growth. In both the
Spartina P. (more hummocky and logged) and Panicum H. (more vertical and leafy) marsh sites, similar
             20   40    60   80   100   120  140
                  Canopy Height (cm)	
 20   40   60   80   100  120   140
	Canopy Height (cm)	
 Figure 5-3.  Aggregate site profiles (Spartina patens and Panicum hemitomon) associated with
             PAR intensity collections at top, middle, and bottom canopy depths (shown as u with
             a dashed line) and at every 20 cm (shown as • with a solid line).

 differences were revealed between curves associated with the higher and  lower (relative) frequency depth
 sampling.  Light attenuation was over predicted nearer the top of the canopy and under predicted in the
 lower canopy.  Even though measurement techniques were further refined follow ing these early
 collections to expressly test field sampling techniques, these and similar results laid the basis for the
 vertical sampling frequency used throughout field collects in all marsh types. The choice of vertical
 sampling frequency relied on what was necessary to obtain our primary purpose; to detect and monitor
 canopy structure differences between wetland types and within-wetland types that might influence
 variability in the remote sensing image data. An additional consideration was  our goal to use the data we
 collected to estimate what influence these structural differences have on the canopy reflectance and
 whether these differences could be detected at some level with remote sensing data.  We felt light
 penetration collections limited to a few positions in the canopy profile would severely jeopardize our
 ability to fulfill this purpose and especially to reach our goal.

 Following these initial tests, our standard collection technique was to profile light intensities from the
 ground level to above the canopy in 20 cm increments at 3.0 m  intervals along each transect.  To ensure
                                                                                    Page 73 of 339

-------
 proper measurement height at each profile, a pole marked in 20 cm increments was driven into the ground
 until the zero mark was at ground-surface level (flood or non-flood).  In Spartina P. canopies the pole was
 placed between grass clumps.  Profile measurements were collected perpendicular to the transect direction
 and towards the hemisphere containing the sun. At each site occupation, either 11 or 22 (most often)
 PAR recordings (one or two transects) were taken at each profile height. The above canopy PAR
 measured at the associated profile location normalized each recorded PAR.


 5.3.2   A typical Canopy Structures

 Examples where disturbed or logged conditions existed at one or more locations within a site were found
 in all marsh types. Relative to  the number of occurrences, however, Juncus R. marshes contained the
 least of these and Panicum H. and Spartina P. the most. Juncus R. canopies were most often impacted by
 wrack deposits or the subsequent marsh dieback or by fire and the subsequent recovery.  Animal
 herbivory and fire often impacted Panicum H. marshes, but higher salinity water deposited by a storm
 surge seemed to have a lingering impact evident in the post-storm collections. In the two years of data
 collections, a major storm and a fire impacted Spartina A. sites. To a lesser extent, Spartina P. sites were
 impacted by storms and fire. The typical hummocky nature of this marsh limited the usefulness of the
 logged indicator.

 The first example contains light attenuation curves generated from two occupations one day apart of a
 Panicum H. marsh site that was severely impacted by animal activity following the occupation (see
 Figure 5-4, A & B).  Other than the magnitude of variance depicted by the error bars, little evidence was
 present in the impacted curves (see Figure 5-4, C & D) indicating the widespread abnormal canopy
 structure. In fact, neglecting that only a day has elapsed between collections, aggregating all profile
 locations results in fairly reasonable profiles (see Figure 5-4, A). Excluding all impacted profiles left few
 observations in the undisturbed sample set; however, the aggregate of these remaining profiles showed a
 more consistent depiction of canopy structure of little or no change in canopy structure (see
 Figure 5-4, B).

 A second example shows curves depicting site occupations in a lightly impacted  Panicum H. marsh
 chronologically from October (before full senescence), February (after senescence and removal of most
 dead material), September (substantially before the initiation of senescence), and December (after full
 senescence but before total dead material removal). Although  differences between the undisturbed (see
 Figure 5-5, B) and aggregated (see Figure 5-5, A) sequences were not dramatic and only two of the
 occupations contained severely logged locations, inclusion of locations with partial gap (see
 Figure  5-5, C & D) reduces the clarity of the trend consistent with expected seasonal changes. Similar to
 the second example, curves are shown depicting site occupations in a lightly impacted Juncus R. marsh
 chronologically in April, September, November, and March (see Figure 5-6, A through C).  Although
 little canopy structural change is expected in these marshes, curves including only undisturbed profiles
(see Figure 5-6, B) indicate a slight tendency for increased canopy light attenuation in the spring versus
 late summer and winter seasons. A final example shows herbivory impact of a Spartina P. marsh closely
preceding the March occupation and regrowth towards recovery by July (see Figure 5-7, A through C).
 Inclusion of locations with partial canopy (see Figure 5-7, C) fully distorted the consistency shown in the
 undisturbed profiles (see Figure 5-7, B).
Page 74 of 339

-------
  1.00-
  0.90-
  0.80-
c 0.70-
.2
  0.60-
  0.50-
£ 0.40-
  0.30-
  0.20-
  0,10-
  0.00
             l   '   I   '   I   '   I  '   I   '   I
            20    40    60    80    100   120
                 Canopy Height (cm)	
                                                 1.00-1
                                                 0.90-
                                                 0.80-
                                                 0.70-
                                                 0.60-
                                                 0.50-
                                                 0.40-
                                                 0.30-
                                                 0.20-
                                                 0.10-
                                                 0.00
                                                           I   '   I   '  I   '   I   '   I   '   I
                                                          20    40     60     80    100    120
                                                              Canopy Height (cm)	
                  I   '   I   '   T
            20    40     60    80
                Canopy Height (cm)
                                  100
 1
120
 20    40    60    80    100
	Canopy Height (cm)
120
Figure 5-4.  Light attenuation profiles ([•] 09 and [o] 10 September) associated with a Pan/cu/n
           hemltomon marsh representing the aggregate of (A) all profiles collected every 3 m
           along the 30 m transects, (B) only profiles associated with undisturbed canopy
           locations, (C) only profiles associated with partial canopy gaps, and (D) only profiles
           associated with severely lodged canopy locations.
                                                                               Page 75 of 339

-------
                                                               1   I—I—'—1—I—I—I
                                                               40    60   80   100
                                                               Canopy Height (cm)
                1
 40    60   80   100
 Canopy Height (cm)
                                                              40   60   80   100
                                                              Canopy Height (cm)
     1
40   60    80   100
Canopy Height (cm)
 Figure 5-5.  Light attenuation profiles ([a] 16 October, [•] 07 February, 09 [•] September, and [o]
            11 December) associated with a Panicum hemitomon marsh representing the
            aggregate of (A) all profiles, (B) only undisturbed canopy locations, (C) partial canopy
            gaps and (D) only severely lodged canopy locations.
Page 76 of 339

-------
                                                                      1   '   I
                                                          20     40     60    80
                                                              Canopy Height (cm)
              i
 40    60    80
Canopy Height (cm)
                                                   1   i
                                         40    60    80
                                        Canopy Height (cm)
Figure 5-6.  Light attenuation profiles ([n] 26 April, [•] 04 September, [•] 03 November, and [ >] 09
            March) associated with a Spartina altemiflora representing the aggregate of (A) all
            profiles, (B) only undisturbed canopy locations and (C) partial canopy gaps.
                                                                                Page 77 of 339

-------
               -1—I—I—1—'—1—'  I   '   I  '   I
            20    40    60   80   100   120   140
                  Canopy Height (cm)
                                          40    60   80   100
                                          Canopy Height (cm)
          80   100   120
Canopy Height (cm)
Figure 5-7.  Light attenuation profiles ([•] 30 March and [o] 09 July) associated with a Spartina
            patens representing the aggregate of (A) all profiles, (B) only undisturbed canopy
            locations, and (C) partial canopy gaps.


5.3.3  Changing Sun Zenith

Normalization of canopy light penetration measurements to a nadir sun zenith was more successful in
more vertical canopies such as Juncus R. and less effective in more lodged and horizontal canopies like
Spartina P. Canopy light penetration recordings collected at one Juncus R. site at four different times and
sun zeniths shows a number of results of applying the sun zenith correction factor (see Figure 5-8, A &
B).  First, as sun zenith  increased the rate of PAR falloff with canopy depth increased. After application
of the sun zenith correction factor, the PAR falloff was greatly decreased, and conversely the magnitude
Page 78 of 339

-------
of PAR reaching lower within the canopy was vastly increased. Second, after correction light attenuation
profiles associated with the first three occupation times completely aligned.  The latest occupation
indicated a limit to the correction.  Sun zenith angles higher than about 60° over corrected the rate of PAR
falloff through the canopy. This led to excluding PAR canopy penetration measurements to times when
the sun zenith was at least 60° or higher.

A second example shows the application of the correction to PAR profile measurements during nearly
three years of occupations (see Figure 5-9, A & B). The Juncus R. site was recovering from a burn when
the first occupation occurred in September. Occupations then followed chronologically in April,
September, January, June, and finally September two years after the initial occupation. Inspection of the
two  plot series shows the consistency and accuracy resulting from the application of the sun zenith
correction factor. In contrast to the non-corrected plot series, the corrected canopy PAR attenuation
profiles became increasingly steep and reached lower levels of PAR with time-since-burn.

The third example shows a series of PAR attenuation profiles created from measurements collected at a
Spartina A. site during a year and nine months of site occupations beginning in February (see
Figure 5-10, A & B).  The corrected and uncorrected series are dramatically different.  After February, the
site  occupations chronologically occurred in April, July, September, November. March, and again in
September of the following year. The July occupation took place about a month before a hurricane
impacted this site (August 26), and the first September occupation was about 10 days after impact. The
canopy light attenuation profile deepened each occupation before the impact, but consistent with this
impact occurrence, the attenuation profile shallowed sharply  immediately after the impact.  PAR
attenuation profiles changed little for a year after the dramatic change in September.

PAR attenuation profiles over nearly a year and a half at a Panicum H. site highlights the sun zenith factor
correction effectiveness as well as depicts typical seasonal changes and canopy  recovery following a bum
(see Figure 5-11, A & B). Site occupations occurred chronologically beginning in  February and
subsequently occurred in March, April, September, December, and the following year in January, April,
and July.  Although differences between the corrected and uncorrected profiles were subtler than in the
Juncus R. marshes, the sun zenith-corrected profile sequencing is a more convincing depiction of
expected canopy structure changes over the selected time period. From the earliest occupation  in
 February, PAR attenuation increased and the profiles deepened until October, after which the December
profile shows the expected decrease in PAR attenuation.  After the December occupation, the site was
burned as is confirmed in the February and subsequent April profiles. The final July profile shows the
 canopy recovering to the late summer and early fall profile.

 A final series of profiles associated with a Spartina P. site illustrate the increased consistency in the sun
 zenith corrected versus  uncorrected profiles (Figure 5-12, A & B). The final two profiles in  these series
 again show the zenith angle limitation of the correction technique in more vertical  canopies.  Canopy
 PAR penetration measurements collected at very low sun zeniths are not comparable to those collected at
 sun zenith angles at least <60°. The movement of the PAR attenuation profile  from shallower to deeper
 with decreasing sun zenith angle in the Spartina P. canopy was similar to what occurred in the Juncus R.
 marsh canopy (Figure 5-9, A & B).
                                                                                      Page 79 of 339

-------
              I
              20
 \  '   1  'I  'I
40   60   80   100
Canopy Height (cm)
T
I
120   140
                                                c
                                                o
                                                0)
                                                c
                                                0>
                                                0-
                                                01
             0    20
         T
40   60   80  100  120
Canopy Height (cm)
 1
14Q
Figure 5-8.  Aggregate light attenuation profiles of a Juncus roemerians marsh associated with
            PAR collections at a sun zenith angles of [V] 54°, [o] 59°, [*] 66°, and [i i] 75° (A)
            without normalization and (B) with normalization of PAR recordings to a nadir
            collection.
                  Canopy Height (cm)
                                                                                    B
                                                  I   '  I ~T~~r~'
                                                 60   80  100  120
                                             Canopy Height (cm)
                                                                                      140
                                                                                   	   H.
Figure 5-9.  Aggregate light attenuation profiles of a Juncus roemerians marsh associated with
            PAR collections at different sun zenith angles on [A] 25 September, [A] 29 April rni
            September, [*J  13 January, [o] 24 June, and [v] 22 September the following year (A*1 *
            without normalization and (B) with normalization of PAR recordings to a nadir
            collection.
Page 80 of 339

-------
 c
 V
 Q.
                 T
             20   40   60   80   100
                 Canopy Height (cm)
                                              a>
0   20
-i—i—i—i—'i  '  r
  40   60   80   100
  Canopy Height (cm)
                  120
Figure 5-10. Aggregate light attenuation profiles of a Spartina alterniflora marsh associated with
           PAR collections at different sun zenith angles on [+] 16 February, [A] 26 April, [A] 10
           July, [ii] 04 September, [*] 03 November, and the following year on [o] 09 March and
           [v] 02 September (A) without normalization and (B) with normalization of PAR
           recordings to a nadir collection.
                  40   60    80   100
                  Canopy Height (cm)
     20
40   60   80   100
Canopy Height (cm)
 Figure 5-11. Aggregate light attenuation profiles of a Pan/cum hemitomon marsh associated with
             PAR collections at different sun zenith angles on [•] 07 February, [+] 13 March, [A] 21
             April, [A] 09 October, [n] 11 December, and the following year on [*] 22 January! [°]
             22 April and [v] 06 July (A) without normalization and (B) with normalization of PAR
             recordings to a nadir collection.
                                                                                Page 81 of 339

-------
 s.
 O)
                                  100
                   Canopy Height (cm)
120   140
 \  '   i  '   r
20   40   60
 1   '  I   'I
100   120  140
                            Canopy Height (cm)
Figure 5-12. Aggregate light attenuation profiles of a Spart/na patens marsh associated with PAR
            collections at different sun zenith angles oh [A] 05 August, [o] 21 October, [*] 30
            December, [v] 21 July at a sun zenith of 49°, and [p] 21 July at 18° (A) without
            normalization and (B) with normalization of PAR recordings to a nadir collection.
5.4   Discussion

Light penetration field measurements were described and discussed in terms of their completeness,
reliability or consistency and accuracy to characterize canopy structure.  Marsh types discussed included
Sportina alterniflora and Juncus roemerianus (saline marshes), Spartinapatens (brackish marsh), and
Panicum hemitomon (fresh marsh). Our primary purpose was to devise simple operational field
collection methods and post analyses procedures that could detect and monitor canopy structure
differences between and within wetland types that might influence variability in the remote sensing image
data.  Our goal was to improve reliability and accuracy of current classifications based on remote sensor
data, and as a consequence extend the biophysical detail extractable with remotely sensed data.

Our first objective was to test the light penetration measurements for reliability, accuracy and
comparability over time and space and within and between marsh types. A ceptometer device that
measures PAR along a 80 cm probe was chosen for the light penetration measurements.  In addition to the
choice of measuring device, four variables related to field data collection and post data analysis design
were examined with respect to fulfilling data reliability and accuracy but also to maximizing the  potential
for operational use for remote  sensing calibration and assessment of classification accuracy. These
variables were the (1) horizontal (planar), (2) vertical (canopy profile) spatial sampling frequencies,
(3) description and possible exclusion of atypical canopy structures, and (4) normalization of
measurements at different sun elevations.
/Page 82 of 339

-------
At each site, we used 30 m transects in the north and south and east and west directions. Initially sample
locations were at the site center, the transect extremes, and midpoints; however, sample variability
indicated higher sample frequency was needed. Within the early testing, sampling protocol along
transects was changed to collecting light penetration measurements every 3.0 m. This 30 x 30 m site area
and higher 3.0 m transect sampling frequency helped ensure more accurate depiction of the local
variability, more reliable mean and variance measures, and matched or encompassed the spatial resolution
of most common resource remote sensing sensors. Similarly, after testing a relative canopy profile
sampling of top, middle, and bottom, the vertical  sampling frequency was standardized to light
penetration measurements every 20 cm from the ground surface to above the canopy.  At each profile
location, the above-canopy light recordings were  used to normalize lower canopy light recordings
transforming light absolute magnitudes to percent penetrations.  The relative profile sampling (top,
middle, bottom) did not adequately replicate the canopy light attenuation profile as compared to the
20 cm sampling frequency, especially in fully formed canopies. The 20 cm sampling interval was
selected to ensure our goals to increase the canopy detail extractable and improve the predictability of
canopy structure with remote sensing data.

To improve the reliability, accuracy, and detail of the canopy light attenuation data, at each profile
location the state of the sky condition (sunny or cloudy) and canopy structure (undisturbed, partial gap, or
completely lodged) was recorded. In this analysis, only sunny sky conditions were processed. Observed
differences relative to including or excluding disturbed canopy profiles were mostly attributable to the
level of canopy was disturbance.  In most cases, excluding the disturbed profiles at each site from the
aggregate site light attenuation profile increased the ability to compare aggregate profiles taken at the
same site during multiple occupations. If the aggregate profiles were more comparable within a site after
exclusion, comparison between sites and marsh types also would improve with exclusion.  The inclusion
of all profiles, each designated with a flag as to the sky and canopy condition, allows us to view and
analyze selected aspects (include or exclude) about the canopy variability and thus greatly enhance the
ability to understand and relate remote sensing reflectance to canopy structure.

Although more noticeable in Juncus roemerianus and less in Spartinapatens canopies, in all marshes as
sun zenith increased the rate of light falloff with canopy depth increased.  To insure comparability of
aggregate site light attenuation profiles, canopy light penetration measurements were normalized to a
nadir sun zenith. The successfulness of the normalization seemed to be associated with the preferred
orientation of the marsh canopy. In the most vertical canopies such as Juncus roemerianus, the correction
worked well up to a sun zenith around 60°. In highly lodged or horizontal canopies like Spartina patens,
the light penetration seemed less effective and restricted to sun zenith angles <49°.  Nonetheless,
normalized light attenuation profiles were more consistent with the expected changes in canopy structure.
Even in this normally highly lodged canopy, the normalization improved the comparability and accuracy
of the generated light attenuation profiles. In between these two extremes of vertical to lodged canopy
orientations, improvement of Spartina altemiflora aggregate attenuation profiles was somewhat similar to
that associated with the more vertical Juncus roemerianus canopy measurements. Panicum hemitomon
improvement, however, seemed dependent on the seasonal stage of canopy development.  In the early
stage of regrowth and spring "green-up," the canopy is more vertical and therefore more conducive to
normalization.  As the canopy transforms and becomes a dense mixture of leaf orientations, the
normalization tends to be less successful.
                                                                                    Page 83 of 339

-------
In all marshes, application of the sun zenith correction factor decreased the perceived light falloff with
canopy depth and increased the recorded light intensity reaching lower within the canopy. After
normalization the light attenuation profiles taken at different sun zeniths were more closely aligned with
each other and with the expected progression of canopy structure.  A limitation of the normalization
seemed to be about a sun zenith of 60° in vertical canopies and less than 49° in more horizontal canopies;
however, these normalizations used a spherical canopy orientation parameter (x = 1, p = (x / (x +
tan282)1/2).  In most cases, preferred orientations deviate highly from spherical in Spartinapatens and,
depending on the season, Panicum hemitomon canopies.  Inclusion of more appropriate values of* (0.0 =
vertical, 1.0 = spherical, 1,000 = horizontal) could improve the reliability and accuracy of light
attenuation profiles in more horizontal canopies such as Spartina patens. Future analyses will examine
inclusion of more appropriate orientation parameters.

It is difficult to relate canopy structure (as defined by light attenuation) to canopy reflectance without
further analysis, but a few observations are possible.  In combination, the light attenuation profiles show
at least one major difference between marsh grass structures:  the amount of vertical to lodged grass.  The
relative amounts remain: (1) relatively stable throughout the year as shown in the fairly vertical canopies
ofJuncus roemerianus and Spartina alterniflora; (2) relatively transitional as in Panicum hemitomon
typifying a more vertical canopy in the winter-early spring, a thicker lodged canopy in summer, and a
transition back to less lodged material through fall; and (3) highly variable as in Spartina patens which
shows typically a highly lodged hummocky character. Even though local areas may show a fairly
consistent trend or pattern in light attenuation profiles, high variability in light attenuation seems more
common within Spartina patens marshes than within other marsh types.

For remote sensing, structural influences would be the least variable in the Juncus roemerianus and
Spartina alterniflora marshes, but background variability may be relatively higher because of the higher
base light levels throughout the year. More variable influence of canopy structure on spectral reflectance
may be expected in the Panicum hemitomon marsh, with possibly higher influences of background in the
winter-early spring. Structure in the Panicum hemitomon marsh is closely related to the seasonal
occurrences of "green-up" and senescence. Less light penetration in the summer because of increased
lodging would decrease  spectral information from deeper within the canopy but in the winter dieback,
background influences may be higher through this more open canopy. Further, without the ability to
separate structure and leaf optical influences on canopy reflectance in these marshes, it would be difficult,
or impossible to detect what canopy property was changing as both were dramatically varying during
these periods. Higher structural variability within Spartina patens marshes would be expected to cause
variability in canopy reflectances, with reflectances least  affected by structural variation in the summer
and fall periods. During winter and  spring, however,  increased  high base light levels in Spartina patens
marshes could further complicate interpretation of canopy reflectance variability.
5.5   Summary

Light penetration field measurements were tested and described in terms of their completeness, reliability
or consistency and accuracy to characterize canopy structure. A ceptometer device measuring
Photosynthetic Active Radiation (PAR) along a 1.0 m probe was chosen for the light penetration
Page 84 of 339

-------
measurements. Marsh types included Spartina alterniflora and Juncus roemerianus (saline marshes),
Spartina patens (brackish marsh), and Panicum hemitomon (fresh marsh). Four variables related to field
data collection and post data analysis design were examined with respect to fulfilling data reliability and
accuracy and maximizing the potential for operational use for remote sensing calibration and assessment
of classification accuracy. These variables included: (a) the horizontal (planar); (b) the vertical (canopy
profile) spatial sampling frequencies; (c) the description and possible exclusion of atypical canopy
structures; and (d) the normalization of measurements at different sun elevations.

 Early testing showed 30 m transects in the north and south and east and west directions combined with
 light penetration measurements every 3.0 m helped ensure more accurate depiction of the local variability
 and matched or encompassed the spatial resolution of most common resource remote sensing sensors.
 Similarly, vertical light attenuation profiles derived from sampling the canopy every 20 cm from the
 ground surface to above the canopy improved reliability, consistency, and completeness of repeated
 measurements. Accounting for the state of the canopy as undisturbed, partial gap, or completely lodged
 at each profile location was found to increase the comparability and detail of PAR attenuation profiles
 taken at the same site during multiple occupations and between sites and marsh types.

 In all marshes as sun zenith increased the rate of light falloff with canopy depth increased, although this
 effect was more noticeable in Juncus roemerianus and less in Spartina patens canopies. To remove the
 sun zenith influences, a method was tested to normalize canopy PAR penetration measurements to a nadir
 sun zenith. The success of the removal was linked to the spherical canopy leaf orientation used by the
 normalization. PAR normalizations seemed more successful when used in more vertical canopies such as
 Juncuswemerianus and Spartina alterniflora, least successful in highly lodged canopies like Spartina
 patens, and more dependent on seasonal canopy development in marshes like Panicum hemitomon. In all
 marshes, application of the normalization increased alignment of PAR attenuation profile taken at
 different sun zeniths and alignment with the expected progression of canopy structure over time.
 5.6   Acknowledgments

 We thank Joe White and James Burnett of the U.S. Fish and Wildlife Service for access to the St. Marks
 National Wildlife Refuge, Florida, and John Fort and Doug Scott for help in field logistics and data
 collections.  We are grateful to U.S. Geological Survey personnel, Allison Graver, Kevin McRae,
 Dal Chappell, Richard Day, and Steve Laine for the many hours of work on planning field logistics and
 data collections on this study and also thanks to Ms. Beth Vairin for editing this manuscript.  Mention of
 trade names or commercial products is not an endorsement or recommendation for use by the U.S.
 Government.
 5.7    References

 Chabreck, R. Marsh zones and vegetative types in the Louisiana coastal marshes. PhD Dissertation,
    Louisiana State University, Baton Rouge, LA, 112 pp., 1970.
                                                                                    Page 85 of 339

-------
Campbell, G. Extinction coefficients for radiation in plant canopies calculated using an ellipsoidal
    inclination angle distribution. Agriculture, Forestry, and Meteorology, 36, 317-321, 1986.

Decagon Devices. Sunfleck ceptometer reference guide. Decagon Devices, Inc., Pullman, WA, 28 pp.,
    1991.

Goudriaan, J. Crop micrometeorology: a simulation study. Wageningen, Centre for Agricultural
    Publishing and Documentation, 249pp., 1977.

Hopkinson, C., J. Gosselink, and R. Parrondo.  Aboveground production of seven marsh plant species in
    coastal Louisiana. Ecology, 59, 760-769, 1978.

Lulla, K., and Mausel, P.  Ecological applications of remotely sensed multispectral data. Introduction to
    remote sensing of the environment, B. F. Richasen, Jr. (Editor), Kendall/Hall Publishing, Dubuque,
    IA, 354-377, 1983.

Nielsen, C., and D. Werle. Do long-term  space plans meet the needs of the Mission to Planet Earth?
    Space Policy, February 11 -16, 1993.

Ramsey, E. HI, R. Spell, and J. Johnston.  Preliminary analysis of spectral data collected for the purpose
    of wetland discrimination. Technical Papers of the ASPRS/ACSM Conference, Albuquerque, NM, 1,
    386-394, 1992a.

Ramsey III, E., and Spell, R., and Day, R., 1992b. Light attenuation and canopy reflectance as
    discriminators of gulf coast wetland types.  Proceedings of the International Symposium on Spectral
    Sensing Research, Maui, Hawaii, November 15-20,  1992, 1992b.

Ramsey III, E., R. Spell, and R. Day. Measuring and monitoring wetland response to acute stress by
    using remote sensing  techniques. Proceedings of the 25th International Symposium on Remote
    Sensing and Global Environmental Change, Graz, Austria on April 4-8, 1993.

Ramsey III, E., G. Nelson, S. Sapkota, S. Laine, J. Verdi., and S. Krasznay. Using multiple-polarization
    L-band radar to monitor marsh burn recovery.  IEEE Transactions on Geoscience and Remote
    Sensing, 37, 635-639, 1999.

Stout, J. The ecology of irregularly flooded salt marshes of the northeastern Gulf of Mexico:  a
    community profile. U.S. Fish and Wildlife Service Biological Report, 85(7.1), 98 pp., 1984.

Teuber, K.B. Use of AVHRR imagery for large-scale forest inventories.  Forest Ecology and
    Management, 33/34, 621-631, 1990.

Ustin, S., C. Wessman, B. Curtiss, E. Kasischke, J. Way, and B. Vanderbilt. Opportunities for using the
    EOS imaging spectrometers and synthetic aperture radar in ecological models. Ecology, 72(6), 1934-
    1945, 1991.

Wickland, D. Mission to  planet Earth: the ecological perspective. Ecology, 72(6), 1923-1933, 1991.
-Page, 86 of 339

-------
                                   Chapter 6

        Participatory Reference Data Collection Methods for
        Accuracy Assessment of Land-Cover Change Maps
                                        by

                              John Sydenstricker-Neto1*
                              Andrea Wright Parmenter2
                                Stephen D. DeGloria3
  Department of Development Sociology
  118 Warren Hall
  Cornell University
  Ithaca, New York 14853-7801
Center for the Environment
Rice Hall
Cornell University
Ithaca, New York 14853
                 3 Cornell Institute for Resource Information Systems
                     and Department of Crop and Soil Sciences
                   232 Emerson Hall
                   Cornell University
                   Ithaca, New York 14853

                   *Corresponding Author Contact:

                   Telephone:  (607) 255-9729
                     Facsimile:  (607) 254-2896
                        E-mail:  ims56@cornell.edu
6.1   Introduction

Development strategies aimed at settling the landless poor and integrating Amazonia into the Brazilian
national economy have led to the deforestation of between 23-50 million ha of primary forest. Over 75%
of the deforestation has occurred within 50 km of paved roads (Skole and Tucker, 1993; INPE, 1998;
Linden, 2000). Of the cleared areas, the dominant land-use (LU) practice continues to be conversion to
low productivity livestock pasture (Serrao and Toledo, 1990; Feaniside. 1987).  Meanwhile, local fanners
and new migrants to Amazonia continue to clear primary forest for transitory food, cash crops, and
pasture systems, and eventually abandon the land as it loses productivity. Though there are
                                                                         Page 87 of 339

-------
disagreements on the benefits and consequences of this practice from both economic, agronomic, and
environmental perspectives, there is a need to link land-cover (LC) change in Amazonia with more global
externalities.

Rehabilitating the productivity of abandoned pasture lands has the potential to convert large areas from
sources to sinks of carbon (C), while providing for the well-being of people in the region and preserving
the world's largest undisturbed area of primary tropical rainforest (Fernandes et al., 1997). Primary
forests and actively growing secondary forests sequester more C, cycle nutrients more efficiently, and
support more biodiversity than abandoned pastures (Fearnside, 1996; Fearnside and Guimaraes, 1996).
Results from research on LU options for agriculture in Amazonia point to agrosilvopastoral LU systems
involving rotations of adapted crops, pasture species, and selected trees as being particularly appropriate
for settlers of western Amazonia (Fernandes and Matos, 1995; Szott et al., 1991; Sanchez and Benites,
1987).  Coupled with policies that encourage the sustainability of these options and target LU
intensifications, much of the vast western Amazonia could be preserved in its natural state (Sanchez,
1987; Vostietal., 2000).

Many studies have focused on characterizing the spatial extent, pattern, and dynamics of deforestation in
the region using various forms of remotely sensed data and analytical methods (Alves et al., 1999; Boyd
et al., 1996; Peralta and Mather, 2000; Roberts et al., 1998). Given the importance of secondary forests at
sequestering carbon, the focus of more recent investigations in the region has been on developing spectral
models and analytical techniques in remote sensing to improve our ability to map these secondary forests
and pastures in both space and time, primarily in support of global carbon modeling (Asner et al., 1999;
Foody et al., 1996; Kimes et al.,  1999; Lucas et al., 1993; Mausel et al., 1993; Steininger,  1996).

The need to better integrate the human and biophysical dimensions with the remote sensing of LC change
in the region has been reported extensively (Frohn et al., 1996; Liverman et al., 1998; McCracken et al.,
1999; Moran and Brondizio, 1998; Moran et al., 1994; Rignot et al., 1997; Rindfuss and Stern, 1998;
Wood and  Skole, 1998; Vosti et al., 2000; http://www.uni-bonn.de/ihdp/lucc/). Most investigations that
integrate remote sensing, agroecological, or socio-economic dimensions focus on the prediction of
deforestation rates and the estimation of land-cover/land-use (LCLU) change at regional scale.

Local stakeholders have seldom been involved  in remote sensing research in the area. This is unfortunate
because municipal authorities and local organizations represent a window of opportunity to improve
frontier governance (Nepstad et al., 2002). These stakeholders have been increasingly called upon to
provide new services or fill gaps in services previously provided by federal and state government.  Small-
scale farmer associations are key local organizations because some of the obstacles to changing current
land use patterns  and minimizing deforestation cannot be instituted by farmers working individually but
are likely to require group effort (Ostrom E, 1999; Sydenstricker-Neto, 1997).


6.1.1   Study Objectives

The objectives of our study were to: (1) determine LC change in the recent colonization area (1986-1999)
of Machadinho D'Oeste, Rondonia, Brazil; (2) engage community stakeholders in the processes of
mapping and assessing the accuracy of LC maps; and (3) evaluate the relevance of LC maps (inventory)
:Page.88of339

-------

for understanding community-based LU dynamics in the study area. The objectives were defined to
compare stakeholder estimates and perceptions of LC change in the region to what could be measured
through the classification of multi-spectral, multi-temporal remotely sensed data. We were interested in
learning if there would be increased efficiencies, quality, and ownership of the inventory and evaluation
process by constructively engaging stakeholders in local communities and farmer associations.  In this
chapter, we focus our presentation on characterizing and mapping LC change between 1994 and 1999.


6.1.2   Study Area

Established in 1988, the municipality of Machadinho D'Oeste (8,502 km2) is located in the northeast
portion of the State of Rondonia, western Brazilian Amazonia (see Figure 6-1). The village of
Machadinho  D'Oeste is  150 km from the nearest paved road (BR-364 and cities of Ariquemes and Jaru),
and 400 km from Porto Velho, the state capital. When first settled, the majority of the area was originally
composed of untitled public lands.  A portion of the area also included old, privately owned rubber estates
(seringais), which were expropriated (Sydenstricker-Neto, 1992).
                                                 Machadinho D'Oeste and Vale do Anari
                                                 Rondonia
                                                 Legal Amazonia
A
                   Figure 6-1.  Legal Amazonia, Rondonia, and study area, Brazil.

 The most recent occupation of the region occurred during the mid-1980s with the development of the
 Machadinho Colonization Project (PA Machadinho) by the National Institute for Colonization and
 Agrarian Reform (INCRA).  In 1984, the first parcels in the south of the municipality were delivered to
 migrant fanners and since then the area has experienced recurrent migration  inflows.  From hundreds of
 inhabitants in the early-1980s, Machadinho's 1986 population was estimated to be 8,000 and in 1991 it
                                                                                     Page 89 of 339

-------
had increased to 16,756 (IBGE, 1994; Sydenstricker-Neto, 1992; Sydenstricker-Neto and Torres, 1991).
In 2000, the demographic census counted 22,739 residents. This amounted to an annual population
increase during the decade of the 1990's of 3.5%.  Although Machadinho is an agricultural area by
definition, 48% of its population lives in the urban area (IBGE, 2001).

Despite the importance of colonization in Machadinho, forest reserves comprise 1,541 km2 or 18.1% of
its area. Most of these reserves became state extractive reserves in 1995, but there are also state forests
for sustained use.  Almost the entire area of the reserves is covered with primary forest (Olmos et al.,
1999).

In biophysical terms, Machadinho's landscape combines areas of altiplano with areas at lower elevation
between 100-200 m above sea  level.  The major forest cover types are tropical semi-deciduous forest and
tropical flood plain forest. The weather is hot and humid with average annual temperature of 24 °C and
relative humidity between 80%-85%.  A well-defined dry season occurs between June and August and
annual precipitation is above 2,000 mm. The soils have medium to low fertility and most of them require
high inputs for agriculture development (Brasil, MIRAD-INCRA e SEPLAN - Projeto RADAMBRASIL,
1985; EMBRAPA, 1982).

The study area is 215,000 ha and is divided between the municipalities of Machadinho D'Oeste (66%)
and the north of Vale do Anari (34%).  It includes more recent colonization areas, but its core comprises
the first phase (land tracts 1 and 2) of the former Machadinho Settlement settled in 1984 and 1985.  These
two land tracks have a total area of 119,400 ha.  The land tracks are composed of multiple land uses of
which 3,000 ha are designated  for urban development, 35,165 ha are in extractive reserves, and 81,235 ha
are divided into 1,742 parcels (average size 46 ha) distributed to migrant farmers by INCRA
(Sydenstricker-Neto, 1992).
6.2   Methods

6.2.1  Imagery

Landsat Multi-Spectral Scanner (MSS), Thematic Mapper (TM), and Enhanced Thematic Mapper
(ETM+) digital images were acquired for the study area (path 23 I/row 67) for one date in 1986, 1994, and
1999, respectively. The 1994 and 1999 TM images were 30 m resolution and the 1986 MSS image was
re-sampled to 30 m to match the TM images. The images were acquired during the dry season (July or
August) of each year to minimize cloud cover. The Landsat images used for LC analysis were the best
available archived scenes.

The 1986 MSS image (August 10) and the 1999 ETM+ image (August 6) were obtained from the Tropical
Rainforest Information Center (TRFIC) at Michigan State University. The 1994 TM image (July 15) was
provided by the Center for Development and Regional Planning (CEDEPLAR) at the Federal University
of Minas Gerais (UFMG) in Brazil.  Although a TM image for the 1986 data was available, random offset
striping made this scene unusable. The MSS image acquired on the same date was used instead, though
thin clouds obscured part of the study area.
Page 90 of 339

-------
The geometrically corrected 1999 ETM+ image provided by TRFIC had the highest geometric accuracy as
determined using Global Positioning System (GPS) coordinates collected in the field and resulting in a
Root Mean Square Error (RMSE) <1.0 pixel. Therefore, we co-registered the 1986 and 1994 images to
the "base" 1999 ETM+ image using recognizable fixed objects (such as road intersections) in ERDAS
Imagine 8.4.  We used nine "fixed" locations, known as Ground Control Points (GCPs), to register both
images.  For the 1986 and 1994 MSS images, the RMSE was 0.54 and 0.47 pixels, respectively.

Additional image processing included the derivation of tasseled-cap indices for each image.  Tasseled-cap
transformed spectral bands 1, 2, and 3 (indices of brightness, greenness, and wetness, respectively) were
calculated for the TM images using Landsat-5 coefficients published by Crist et al. (1986). Although
Huang et al. (2002) has recommended using a reflectance-based tasseled-cap transformation for Landsat 7
(ETM+) based on at-satellite reflectance, these recommended tasseled-cap coefficients for Landsat 7 were
not published at the time of this study. Tasseled-cap bands 1 and 2 (brightness and greenness) were
calculated for the MSS image using coefficients published by Kauth and Thomas (1976).  These
investigators have shown tasseled-cap indices to be useful in differentiating vegetation types on the
landscape and the tasseled-cap indices were therefore included  in this analysis of mapping LC.  Image
stacks of the raw spectral bands and tasseled-cap bands were created in ERDAS Imagine 8.4. This
resulted in one 6-band image for 1986 (MSS spectral bands 1, 2, 3, 4, and tasseled-cap bands 1 and 2), a
10-band image for 1994 (TM spectral bands 1-7 and tasseled-cap bands 1, 2 and 3), and an 11-band
image for 1999 (ETM+ spectral bands 1-8 and tasseled-cap bands 1-3). The 15 m  panchromatic band in
the 1999 ETM+ image was not used in this analysis.

6.2.2  Reference Data Collection

As in many remote areas in developing countries, data sources for producing and assessing accuracy of
LC maps for our study area were limited. Upon project initiation (2000) no suitable LC reference data
were available. Historical aerial photographs were not available for discriminating between LC types for
our study area. In this context, satellite  imagery was the only spatially referenced data source for
producing reliable LC maps for the area.

Because we wanted to document LC change from the early stages of human settlement and development
(beginning in 1985), when major forest  conversion projects were established, our objective was to
compile retrospective data to develop and validate a time series of LC maps. The challenge of compiling
retrospective data became an opportunity to engage community stakeholders in the mapping process and
"bring farmers into the map." We decided to enlist the help of farmers, who are very knowledgeable
about  land occupation practices and the major forces of land use dynamics, to be our source for
contemporary and retrospective data collection. Also, by engaging the locals early in the process, we
could examine the advantages and limitations of this strategy for future resource inventory projects in the
region conducted by researchers and local stakeholders.

We utilized a seven category LC classification scheme as defined in Table 6-1. The level of detail of this
classification scheme is similar to others used in the region, and should permit some level of comparative
analysis with collaborators and stakeholders (de Moraes et al. 1998; Rignot, et al.  1997).
                                                                                   Page 91 of 339

-------
                Table 6-1.  Land-cover classification scheme and definitions.
Land Cover
Primary forest
Secondary
forest
Transition
Pasture
Crops
Bare soil
Water
Definition
Mature forest with at least 20 years growth
Secondary succession at any height and less than 20 years
growth
Area recently cleared, burned or unburned and not currently in
use
Area planted with grass, ranging from overgrazed to bushy
Area with agriculture, including perennial and annual crops
Area with no vegetation or low sparse vegetation
Waterbody, including major rivers, water streams, and
reservoirs
In August 2000, with the assistance of members of nine small-scale farmer associations in the study area,
we collected field data to assist in the development of spectral models of each cover type for image
classification and to validate the resulting LC maps. All associations that we contacted participated in the
mapping project. Initially, we met with the leadership of each association and presented our research
goals and objectives, answered questions, and invited members of each association to participate in the
study. After developing mutual trust and actively engaging the association, data collection groups were
formed averaging 12 individuals  per association (total over 100 individuals). Special effort was made to
include individuals in each group who were long-term residents and who were knowledgeable about
historical LU practices in the region. Nearly half of the members in each of the nine groups were farmers
who settled prior to 1986.

An introductory meeting was conducted with each group to provide a hardcopy (false-color composite) of
the 1999 ETNT image with parcel boundaries overlaid and to solicit comments and observations
regarding farm locations, significance of color tones on the image, and clarification of LU practices and
associated cover types. Each participant was then asked to indicate retrospective and current LU for
his/her parcel and for other parcels with which they were familiar. Any questions that could not be
answered by individuals were referred to the group for discussion, elaboration, and decision-making. For
each identified cover type, we annotated and labeled polygons on stable acetate overlaid on the false-
color composite image.  Each polygon consisted of a homogeneous area labeled as one of seven LC types
for each year corresponding to the dates of the Landsat images used in the study.

Notes were taken during the  interview process to indicate the date each farmer started using the land,
areas of the identified LC types for each of the three years considered in the  study (1986,  1994,  1999),
changes over time, level of uncertainty expressed by participants while providing information for each
annotated polygon, and other information farmers considered relevant. After each meeting, the research
team traveled the main roads in the area just mapped by the farmer association and compared the
identified polygons with what could be  observed.  The differences between the cover type provided by the
farmers and what was observed were minimal.  In areas where such meetings could not be organized, the
.Page 92 of 339

-------
research team traveled the feeder roads and annotated the contemporary LC types that could be
confidently identified.

Field data were collected for over 1,500 polygons, including all seven LC types of interest. We
considered this to be an adequate sample for image classification and validation of our maps. Although
an effort was made to ensure all land cover types were well represented in the database, some types such
as bare soil were represented by a relatively small sample sizes (n < 200 pixels).


6.2.3  Data Processing

More than 1,000 polygons identified during the farmer association interviews were screen digitized and
field notes about the polygons were compiled into a table of attributes. Independent random samples of
polygons for each of the seven land cover types were selected for use in image classifier training and land
cover map validation, respectively.  Although the number of homogenous polygons annotated in the field
was large, polygons varied greatly in size from <5.0 to > 1,000 ha and were not evenly distributed among
the seven cover types (see Table 6-2). For cover types that had a large number of polygons, half of the
polygons were used for classifier training and half for map validation.  For two cover types, however, the
polygon samples were so large in area (and therefore contained so many pixels) that they could not be
used effectively because of software limitations. The primary forest and pasture cover type polygons
were therefore randomly subdivided so that only one half of the pixels was set aside for both classifier
training and for map validation (i.e., one quarter of total eligible data pixels were used for each part of the
analysis).  However, this approach did not yield a sufficient number of sample polygons for some of the
more rare cover types (i.e., <1% land area). To address this issue, we randomly sampled individual pixels
within these polygons of the rare cover types and equally partitioned the pixels into the two groups used
for classifier training and map validation.
               Table 6-2.  Number of pixels sampled for classifier training and
                          map validation for the 1999 Image.
Land Cover Class
Forest
Secondary Forest
Transition
Crops
Pasture
Bare Soil
Water
Total
# Polygons
Total
189
108
43
306
261
17
106
1,030
# Pixels
Total
16,755
3,060
10,054
2,693
4,496
140
1,705
38,903
# Pixels/Polygons
Mean
89
28
33
63
17
8
16
38
Variance
5,349
401
917
1,358
120
18
244
2,089
                                                                                    Page 93 of 339

-------
6.2.4  Image Classification

Spectral signature files were generated in ERDAS Imagine 8.4 to be used in supervised classification
using a maximum likelihood algorithm.  The spectral signatures included both image and tasseled-cap
bands created for each image of each analysis year.  LC maps were produced for each of the three years
containing all seven LC types in each of the resulting maps.  Post-classification 3x3 pixel majority
convolution filter was applied to all three LC maps to eliminate some of the speckled pattern (noise) of
individual pixels. The result of this filter was to eliminate pixels that differed in LC type from their
neighbors and tended to thereby eliminate both rare cover types as well as those that exist in small patches
on the landscape (such as crops).  However, we concluded that the filtering process introduced an
unreasonable amount of homogeneity onto the landscape and obscured valuable information relevant to
the spatial pattern of important cover types within our unit of analysis, which was the land parcel. All
subsequent analyses were performed  on the unfiltered LC maps for all three dates of imagery.


6. 2. 5  Accuracy Assessment

We assessed the accuracy of the three LC maps at the pixel level using a proportional sampling scheme
based on the distribution of validation sample points (pixels) for each of the cover types in the study.
This methodology was efficiently applied in this study because the distribution of our field collected
validation sample points was  representative of the distribution in area of each cover type in the study area
(see Table 6-2).

The proportional sample of pixels used for the accuracy assessment for each year was selected by first
taking into account the cover type having the smallest area based on  the number of validation pixels we
had for that cover type. Once the number of pixels in the validation  data set was determined for the cover
type occupying the smallest area,  the total number of validation pixels to be used for each analysis year
was calculated by the general formula:

                                  S, = N/PS                                                    (1)

where, S, = total number of validation pixels to be sampled for use in accuracy assessment; Ns= number
of pixels in the land cover type  with the smallest number of validation pixels; Ps = proportion of the
classified map predicted to be the cover type with the smallest amount of validation pixels.

The total number of validation pixels to be used to assess the accuracy for each cover type was then
calculated by the general formula:
                                                                                              (2)
where Ve = the total number of validation pixels to be used for a specific cover type, S, = total number of
validation pixels to be sampled for use in accuracy assessment, and Pc = the proportion of the classified
map predicted to be that cover type.
Page 94 of 339

-------
To illustrate this proportional sampling accuracy method, we describe the forest cover type for the 1999
map. The cover type with the smallest number of validation pixels in 1999 was the bare soil cover type
with a total of 79 validation pixels (NJ.  Of the total number of pixels in the  1999 classified map
(8,970,395), the bare soil cover type was predicted to be 201,267 pixels; or a proportion of 0.0224 of the
total classified map (Pg). Using equation 1, above, the resulting sample size of validation pixels to be
used for accuracy assessment of the 1999 LC map (SJ was 3,521 pixels. In the 1999 map, the forest
cover type was predicted to cover 68.6% of the classified map (i.e. 6,155,275 pixels out of 8,970,395 total
pixels). Using equation 2, above, the sample size of validation pixels to be used for the forest cover type
(Vc) was then 2,414 (i.e., 3,521 * 0.686).

Once the validation sample sizes were chosen for each cover type, a standard accuracy assessment was
performed whereby the cover type of each of the validation pixels is compared with the corresponding
cover type on the classified map. Agreement and disagreement of the validation data set pixels with the
pixels on the classified map are calculated in the form of an error matrix where producer's, user's, and
overall accuracy are evaluated.
6.3    Results and Discussion

6.3.1   Classified Imagery and Land-Cover Change

Presentation and discussion of accuracy assessment results will focus only on the 1994 and 1999 LC
maps. (The 1986 map was not directly comparable because it was based on coarser resolution, re-
sampled MSS data and because it contained cirrus cloud cover over parts of the study.) A visual
comparison of 1986-1999 LC maps shows significant change. Plate 6-1 presents the classified imagery
with parcel boundaries overlaid for a portion of the study area near one of the major feeder roads. In
1986, approximately two years after migrant settlement, only some initial clearing was observed near
roads; however, 13 years later (1999) there were significant open areas and only a small number of
parcels that remain mostly covered with primary forest. The extensive deforestation illustrated in
Plate 6-1 is confirmed by the numeric data presented in Table 6-3. In 1994,147,380 ha or 68.5% of the
total study area (215,000 ha) was covered in primary forest. The amount of primary forest decreased in
1999 by 30,000 ha, a negative change of 20.2% in primary forested area. The area of deforestation
observed between 1994 and 1999 was more than twice that estimated for the 1986-1994 period (not
shown).  This represented a 4.5x increase compared to the 1986-1994 deforestation rate.  Table 6-3
presents the change in LC for 1994-1999 as both percent area and percent change.

For the non-forest cover types, all had an increase in area between 1994-1999.  This was largely at the
expense of primary forest.  Increases in secondary forest had the dominant "gain" in area during this
period with a total increase in area of almost 31,000 ha in 1999, followed by slightly smaller increases in
crops and pasture (27,832 ha and 22,386 ha respectively). The most significant increases on a
proportional basis occurred with the crops and pasture cover types, both increasing over 200% during this
time period.
                                                                                 Page 95 of 339

-------
           1986 AASS
              N
1994 TM
1999 6TM
             A
 Parcel Boundanes
| Forest
j Secondary Forest
j Transition
I Crops
j Pasture
I Bare Soil
I Water
                                                                         1   o   1
        Kilomtltrs

Scale 1:75,000
       Plate 6-1.   Land-cover classification for three time periods between 1986 and 1999.
                Table 6-3.  Land-cover change in study area, Rondonia 1994-1999.
Class
Forest
Secondary Forest
Transition
Crops
Pasture
Bare Soil
Water
Total:
Area (ha)
1994
147,380
27,759
2,234
12,072
16,253
5,183
4,251
215,132
1999
117,573
30,732
5,555
27,833
22,386
6,823
4,252
215,154
Change in Area (ha)
1994-99
-29,806
2,973
3,321
15,760
6,133
1,640
1

Percent of Area
1994
68.5
12.9
1.0
5.6
7.6
2.4
2.0
100.0
1999
54.6
14.3
2.6
12.9
10.4
3.2
2.0
100.0
Percent
Change
1994-99
-20.2
10.7
148.6
130.5
37.7
31.6
0.0

The increase in pasture area was inflated by a tremendous deforestation event totaling approximately
5,000 ha in 1995 in the southeast portion of the study site.  Subsequent to clearing, the area was partially
planted with grass and later divided into small-scale farm parcels in 1995-1996, creating a new settlement
called Pedra Redonda. The most important and broadly distributed crop among the small-scale farms was
Page 96 of 339

-------
coffee (Coffea robusta), which received special incentives through subsidized federal government loans
and the promotional campaign conducted by the State of Rondonia "Plant Coffee" (1995-1999).

The LC change matrix provides more detailed change information including the distribution of deforested
areas into different agricultural uses (see Table 6-4).  For 1994-1999, we determined that 61.1% of the
area did not undergo LC change. This metric was calculated by summing the percentages along the major
diagonal of the matrix. Note that primary forest showed the greatest decrease in area, while concurrently
exhibiting the largest area unchanged (48.9%), due to the large area occupied by this cover type.  For the
remaining cover types, the no-change was significantly lower (as  shown throughout the diagonal of the
matrix) because of the proportionally smaller area occupied by these cover types.

        Table 6-4.  Land-cover change matrix and transitions in study area, Rondonia 1994-1999.
1994
Forest
Sec. Forest
Transition
Crops
Pasture
Bare Soil
Water
Total %
Total Area (ha):
1999
Forest
48.9
4.8
0.1
0.3
0.2
0.3
0.0
54.6
117,553
Sec.
Forest
8.3
3.5
0.2
1.2
0.7
0.4
0.0
14.3
30,731
Transition
1.8
0.3
0.0
0.2
0.1
0.1
0.0
2.6
5,554
Crops
5.9
2.5
0.4
2.1
1.3
0.8
0.0
12.9
27,833
Pasture
2.3
1.3
0.2
1.4
4.5
0.7
0.0
10.4
22,386
Bare
Soil
1.2
0.5
0.1
0.4
0.8
0.2
0.0
3.2
6,823
Water
0.0
0.0
0.0
0.0
0.0
0.0
2.0
2.0
4,252
Total
%
68.5
12.9
1.0
5.6
7.6
2.4
2.0
100.0

Total
Area
(ha)
147,380
27,759
2,234
12,072
16,253
5,183
4,251

215,132
                                    I  No change 1994-99: 61.1%  |

 The 8.3% conversion rate of primary forest to secondary forest indicates that some recently deforested
 areas remained in relative abandonment allowing vegetation to partially recover in a relatively short
 period of time (see Table 6-4).  Increase  in classes such as transition and bare soil also indicates the same
 trend of new areas incorporated into fanning and their partial abandonment as well. Of areas that were
 primary and secondary forest in 1994, crops were the most dominate change category (>8%) followed by
 pasture (<4%).  While the change in LC  mapped from the image classification fits with what we expect to
 see in the region, it is important to differentiate (when possible) real change from misclassification.
 Potential errors associated with the mapping are discussed below.

 6.3.2   Map Accuracy Assessment

 The user's accuracy is summarized in Table 6-5. The increase in overall map accuracies for each
 subsequent year in the analysis was attributed to several factors. First, we used three different sensors
 (MSS, TM and ETM+), which introduced increased spatial and spectral resolution of the sensors over
                                                                                     Page 97 of 339

-------
 time.  Second, the 1986 MSS image had clouds that introduced some classification errors. Third,
 collecting retrospective data was a challenge because interviewees sometimes had difficulty in recalling
 LC and associated LU practices over the study period. In general, retrospective LU information had a
 higher level of uncertainty than for time periods closer to the date of the interview.

                         Table 6-5. User's accuracy in study area,
                                    Rondonia 1986-1999.
Classified Data
Forest
Secondary Forest
Transition
Crops
Pasture
Bare Soil
Water
Overall Accuracy
Kappa Statistic
1986
89.8%
45.5%
42.9%
25.0%
80.0%
—
100.0%
84.6%
0.52
1994
93.5%
63.1%
75.0%
53.6%
77.5%
66.7%
100.0%
88.3%
0.69
1999
96.7%
77.4%
57.5%
67.5%
89.6%
28.7%
93.6%
89.0%
0.78
 Despite these difficulties, however, overall accuracy was between 85% and 89% for 1986 and 1999,
 respectively.  Accuracy for specific classes ranged between 50% and 90%, achieving >96% for primary
 forest in 1999.  Some bare soil (1999) and crops (1986) classes were particularly difficult to map and
 attained accuracies below 30%.  The sample size for these particular cover types was relatively small
 which may have contributed to this poor outcome.  When coupled with the fact that areas of bare soil and
 crops tend to be small in the study area (< 1.0 ha) the lower accuracies were not unexpected for these
 classes. Error matrices for 1994 and 1999 are presented in Tables 6-6 and 6-7, respectively.  The overall
 accuracy for 1999 was 89.0% (Kappa 0.78).  With the exception of bare soil, all the remaining classes had
 user's accuracies that ranged from 57.5% to 96.7% and producer's accuracies between 66.5% and
 100.0%. The overall accuracy for the 1994 land cover map was 88.3%. In general, accuracy for specific
 cover types range between 50% and 90%, achieving a high of 96.7% for primary forest in 1999. The bare
 soil (1999) accuracy was below 30%; however, the limited proportion  of training sample pixels relative to
 the total amount of pixels comprising the study area for this specific class  may have contributed to this
 poor outcome.

 The pattern of misclassification and confusion between LC classes is similar for both the 1994 and 1999
 error matrices (Figures 6-6 and 6-7) although different image sensors (TM and ETM+) were used.
 Confusion between primary and secondary forest was expected because our classification scheme did not
 separate secondary forest for different successional stages. Some of the polygons delineated  in the field
 as secondary forest exceeded 12 years of re-growth, and closely resembled semi-deciduous primary
 forest.  Accordingly, there was probably some spectral overlap between "old" secondary forest and the
 semi-deciduous primary forest. Confusion between secondary forest and crops occurred because many
 coffee areas were shaded with native species such as rubber tree (Hevea brasiliensis), freijo (Cordia
Page 98 of 339

-------
goeldiana), and Spanish cedar (Cedrela odoratd), or included pioneers species such as embauba
(Cecropia sp). Therefore, shaded crops appeared as partially forested areas.
               Table 6-6.  Error matrix for the land-cover map in study area, Rondonia 1994.
Classified Data
Forest
Secondary Forest
Transition
Crops
Pasture
Bare Soil
Water
Total
Producer's Accuracy
Reference Data
Forest
1218
40
0
9
15
4
0
1286
94.7%
Sec.
Forest
76
82
0
15
0
0
0
173
47.4%
Transition
0
0
9
1
0
0
0
10
90.0%
Crops
5
6
3
30
6
5
0
55
54.6%
Pasture
0
1
0
0
79
0
0
80
98.8%
Bare
Soil
1
1
0
1
1
18
0
22
81.8%
Water
3
0
0
0
1
0
25
29
86.2%
Total
1303
130
12
56
102
27
25
1655

User's
Accuracy
93.5%
63.1%
75.0%
53.6%
77.5%
66.7%
100.0%


                              I   Overall Classification Accuracy = 88.3%   |
                                          Kappa Statistic = 0.69
               Table 6-7. Error matrix for the land-cover map in study area, Rondonia 1999.
Classified Data
Forest
Secondary Forest
Transition
Crops
Pasture
Bare Soil
Water
Total
Producer's Accuracy
Reference Data
Forest
2370
44
0
0
0
0
0
2414
98.2%
Sec.
Forest
73
233
0
58
7
0
2
373
62.5%
Transition
0
0
54
0
0
0
0
54
100.0%
Crops
3
12
19
206
5
64
1
310
66.5%
Pasture
0
8
5
14
198
7
0
232
85.3%
Bare
Soil
0
4
16
21
9
29
0
79
36.7%
Water
6
0
0
6
2
1
44
59
74.6%
Total
2452
301
94
305
221
101
47
3521

User's
Accuracy
96.7%
77.4%
57.5%
67.5%
89.6%
28.7%
93.6%


                               |    Overall Classification Accuracy = 89.0%    j
                                           Kappa Statistic - 0.78
                                                                                       Page 99 of 339

-------
 Despite the lack of homogeneity within the "transition" LC class, confusion with other cover types (crops,
 pasture, and bare soil) was minimal. Confusion most likely occurred because the transition cover type
 was not particularly unique (see Tables 6-6 and 6-7). The confusion seen in the error matrices between
 pasture and bare soil and the confusion between bare soil and recently planted coffee areas were expected.
 Overgra/ed pasture had little vegetative matter allowing these areas to easily be misclassified as bare soil.
 Also, it was unlikely that spectral reflectance by coffee plants less than 0.5 m tall planted in a 3.0 x 3.0 m
 spacing was detected and discriminated from the surrounding soil background, resulting in confusion
 between young coffee and bare soil. Water, although spectrally distinct, was easily biased along edge
 pixels. This was particularly true in the case of small and circuitous watercourses in mixed systems.

 6.3.3   Bringing Users Into  the Map

 Initially, the local fanners expressed substantial distrust and skepticism about the mapping project,
 however, trust was established throughout the mapping process and a good working relationship was
 established. To best present our findings, we organized community meetings in the areas of the farmer
 associations involved earlier in the  process.  Participation in these meetings ranged from as few as six
 individuals to packed rooms with more than 30 people. These meetings intentionally included the
 broader community and farmers who had not taken part in the data collection.  Each farmer that had
 provided input during the data collection phase of the study received a color copy of the  1999 LC
 mapping results.  Additional meetings  were arranged with agricultural extension agents,  leaders of the
 local rural  labor union, municipal officials, and middle school students.

 Upon examination, fanners provided verbal confirmation of our estimates and errors.  Specific concerns
 closely resembled the classification errors shown in the accuracy assessment matrices (see Tables 6-6 and
 6-7). More than thirty farmers who did not participate in the data collection process compared their
 estimates of LC for their individual parcels, with the statistics generated from the LC  maps.  In all cases,
 the general patterns were the same and differences in LC class areas were small. Relevant ideas provided
 by local fanners were that the map provided  a  common ground to engage participants in  a discussion on
 environmental awareness and appreciation, and that the maps became an instrument of empowerment to
 local communities.  For example, farmers were shocked at the significant changes in LC  overtime for the
 whole area (1994-1999).  This stimulated a debate on the incentives for forest conversion versus the
 constraints imposed by the agricultural systems adopted by farmers.

 Areas with perennial crops increased dramatically over the years of the study, with coffee becoming the
 single most important cash crop. Consequently, farmers tended to decrease the amount of land planted
 with annual subsistence crops such  as rice, maize, beans, and cassava. As a result, food security became
 an issue for some communities. Although new areas would typically come into production within two
 years, the incentives were to expand areas of pasture. An important economic  incentive was the dramatic
 drop in coffee prices worldwide. In the 2001-02 season sale prices of coffee in Machadinho D'Oeste
 were only  half of market value two years earlier. Many small farms were not entirely  harvested and many
 farmers reported that they were very inclined to change areas with old coffee trees into pasture.  However,
 the decline in coffee prices motivated enlightened discussions on the economic and environmental
 dangers in converting most of the land  into pasture.
Page 100 of 339

-------
The importance of common forest reserves in the region and the potential and constraints to foster forest
conservation were discussed extensively.  There was great appreciation for the fact that the map clearly
indicated that the major water resources were within the forest reserves that had not been cleared.
Identification of secondary forest along major water streams within the settled areas stimulated a debate
on stream bank erosion and nutrient loss into rivers. The general agreement was that farmers went too far
in clearing the land and needed to focus efforts on reforesting the areas around the rivers.  Farmers voiced
the reasons, incentives, and constraints they face in trying to deliberately reforest areas along the water
streams. In most of the reported cases of forest recovery, natural re-growth was happening rather than
seeded reforestation. The reported lack of available water in areas in which farmers had irrigated their
coffee was a surprise to the researchers.

Two outcomes contributed greatly to farmer empowerment. First, our map offered a synoptic perspective
of development patterns that farmers had not entirely realized previously.  Farmers felt that having a
deeper knowledge of what was happening in their area would enable them to better respond to local needs
and contribute to state wide discussions on promoting environmental sustainability. Second, farmers
voiced the collective opinion that the actual participation in the mapping project contributed to better
organizing themselves  into interest groups.  The explicit acknowledgment  in the LC map legends of the
local associations' contributions was a source of pride within  the broader community.
6.4    Conclusions

Visual inspection and comparison of LC maps with other data sources enable us to conclude that our
efforts provided good estimation of LC change in the study area. The study area changed over the 13
year study period from a typical new colonization area in its early stage, where higher proportions of
forests and areas in transition dominate, to one in which these cover types are diminished in area in
comparison to the proportional increase in crops and pasture.  Statistically based evaluations (error
matrices) demonstrated acceptable levels of accuracy with classification errors that were easily
explainable and understandable. Participation and input from local farmers was very useful in producing
cover maps and proved to be an extremely effective means for collecting  classifier training and validation
data in areas where other sources were not available.  Follow-up meetings with farmers were very
constructive for addressing conservation issues with regional and global implications.

Study weaknesses, included the intrinsic limitations imposed by the use of the different satellite sensors
(i.e., MSS, TM, and ETNT).  Also, reference data sizes for some cover classes were relatively small and
interviewees expressed greater levels of uncertainty in retrospective data  reconstruction than for the
current time period. Despite these constraints in data collection, we were confident that they do not
represent an extra burden when compared to the challenge of obtaining good levels  of agreement among
remote sensing specialists when using other techniques such as high resolution videography. The
application of an integrated field data collection process  would have enhanced the quality of our data.
Such an integrated process would comprise simultaneous collection of remotely sensed ground data and
household socioeconomic surveys on  LU/LC to facilitate the direct comparison between data sources.
                                                                                     Page 101 of 339

-------
 The level of detail of our classification scheme was similar to that used by other investigators in the
 region, and our map accuracies compared favorably with their results (de Moraes et al. 1998; Rignot, et
 al. 1997).  For local stakeholders, however, our classification scheme was not sufficiently detailed.
 Stakeholders would most like to clearly distinguish specialty crops such as cacao, coffee, and shade
 coffee, which would not be practical with the resolution of these data sets.
 6.5    Summary

 This study assessed LC change in a recent colonization area in the municipalities of Machadinho D'Oest
 and Vale do Anari, State of Rondonia, Brazil. Landsat MSS, TM, and ETM' data were used to create
 maps of LC conditions for 1986, 1994, and  1999.  Images were obtained in July/August (dry season) and
 field data was collected during August 2000 with the assistance of nine local farm associations and
 approximately 100 independent farmers.  At meetings with the associations, hardcopy false color
 composites of imagery data with parcel boundaries were presented to individual landowners. Each
 individual provided historical and contemporary LU for known areas.  Polygons were annotated and
 labeled on  stable acetate for each cover type, corresponding to the seven-category classification scheme
 Notes were taken during the interview process to indicate the dates of land clearing, cover type, and level
 of uncertainty expressed by the participants.

 Approximately 1,000 polygons were field annotated and random samples were selected for classifier
 training and map validation. Spectral signature files were generated from training polygons and used in
 supervised  classification using maximum likelihood classification. Overall accuracy for each year ranged
 between 85-95% (Kappa 0.52-0.78).  LC changes were consistent with the trends observed in the study
 area and reported by others. The participatory process involving local farmers was crucial for achieving
 the objectives of the study. The specific  protocol developed for data collection should be applicable in a
 wide range of cases and contexts.

 The building of trust with the local stakeholders is important with contested issues such as deforestation
 in the topics. Systematic data collection  among farmers (the primary land users) provided a valuable
 source of information based on their direct observation in the field and historical data not  directly
 available through other sources. This procedure provided greater confidence for interpreting and
 understanding classification errors. Finally,  the process itself empowered local farmers and provided a
 forum for discussing land use processes in the region, including challenges to alleviate poverty, increase
 agrosilvopastoral farming systems, arrest deforestation, and study its implications for developing more
effective land use policies.

 Including the local stakeholders in the research was very effective process for evaluating LC change in
the region.  For stakeholders and researchers, the mapping and reporting process fosters better
 understanding of the patterns and processes of environmental change in the study area.  We foresee that
 participatory mapping projects such as the one reported in this paper have the potential to  become an
 important planning device for regional-scale development in Brazil.  With greater economic opportunity
and stronger institutions at the local level, society is likely to improve the ability to identify and adopt
more environmentally sound LU activities.
Page 102 of 339

-------
6.6    Acknowledgments

We acknowledge the Brazil farmer associations in Machadinho D'Oeste, and associates in the Center for
Development and Regional Planning (CEDEPLAR) and Center for Remote Sensing (CSR), Federal
University of Minas Gerais (UFMG). The Brazilian National Council for Scientific and Technological
Research (CNPq), the Teresa Heinz Scholars for Environmental Research program, the Rural
Sociological Society (RSS), and the World Wildlife Fund (WWF, Brazil) provided major financial
support. At Cornell University, project sponsors included Cornell International Institute for Food,
Agriculture, and Development (CIIFAD), Department of Crop and Soil Sciences, Department of
Development Sociology and Graduate Field of Development Sociology, Einaudi Center for International
Studies, Latin American Study Program (LASP), Institute for Resource Information Systems (IRIS), and
the Population Development Program (PDP). In addition, the Tropical Rain Forest Information Center
(TRF1C) at Michigan State University provided the satellite imagery without which this study would not
have been possible.
6.7   References

Alves, D.S., J.L.G. Pereira, C.L. DeSouza, J.V. Soares, and F. Yamaguchi.  Characterizing land use
    change in central Rondonia using Landsat TM imagery. Int. J. Remote Sensing, 20, 28-77, 1999.

Asnet, G.P., A.R. Townsend, and M.M.C. Bustamante.  Spectrometry of pasture condition and
    biogeochemistry in the Central Amazon.  Geophysical Research Letters, 26, 2769-2772, 1999.

Boyd, D.S., G.M. Foody, P.J. Curran, R.M. Lucas, and M. Honzak. An assessment of radiance in Landsat
    TM middle and thermal infrared wavebands for detection of tropical forest regeneration. Int. J.
    Remote Sensing 17, 249-261, 1996.

Brasil, MIRAD-INCRA e SEPLAN-Projeto RADAMBRASIL, Estudo da vegetacao e do inventdrio
    florestal, Projeto de Assentamento Machadinho, Glebas 1 e 2.  Goiania, GO, Brazil,  1985.

Crist, E.P., R. Laurin, and R.C. Cicone. Vegetation and soils information contained in transformed
    thematic mapper data.  Proceedings oflGARSS' 1986 Symposium, Ecological Society of America
    Publications, Division, ESA SP-254,  1986.

de Moraes, J.F.L, F. Seyler, C.C. Cerri, and B. Volkoff. Land cover mapping and carbon pool estimates
    in Rondonia, Brazil. Int. J. Remote Sensing, 19(5), 921-934, 1998.

EMBRAPA/SNLCS. Levantamento de reconhecimento de media intensidade dos solos e avaliacao da
    aptidao agricola das terras em 100.000 hectares da gleba Machadinho, no municipio de Ariquemes,
    Rondonia. Boletim de Pesquisa n.  16, EMBRAPA/SNLCS, Rio de Janeiro, RJ, Brazil, 1982.

Fearnside, P.M. The causes of deforestation in the Brazilian Amazon, in The Geophysiology of
    Amazonia: Vegetation and Climate Interactions.  R.E. Dickenson (Editor), John Wiley, New  York,
    1987.
                                                                                Page 103 of 339

-------
Fearnside, P.M. Amazonian deforestation and global warming:  Carbon stocks in vegetation replacing
    Brazil's Amazon forest. Forest Ecol. & Management, 80, 21-34, 1996.

Fearnside, P.M. and W.M. Guimaraes.  Carbon uptake by secondary forests in Brazilian Amazon. Forest
    Ecol. & Management, 80, 35-46, 1996.

Fernandes, E.C.M. and J.C. Matos.  Agroforestry strategies for alleviating soil chemical constraints to
    food and fiber production in the Brazilian Amazon, in Chemistry of the Amazon:  Biodiversity,
    Natural Products, and Environmental Issues.  P.R. Seidl, et al. (Editors), American Chemical Society
    Washington, DC, 1995.

Fernandes, E.C.M., Y. Biot., C. Castilla, A.C. Canto, J.C. Matos, S. Garcia, R. Perin, and E. Wandelli.
    The impact of selective logging and forest conversion for subsistence agriculture and pastures on
    terrestrial nutrient dynamics in the Amazon. Ciencia e Cultura, 49, 34-47, 1997.

Foody, G.M., G. Palubinskas,  R.M. Lucas, P.J. Curran, and M. Honzak. Identifying terrestrial carbon
    sinks:  Classification of successional stages in regenerating tropical forest from Landsat TM data.
    Remote Sensing of Environment, 55, 205-216, 1996.

Frohn, R.C., K.C. McGwire, V.H. Dales,  and J.E. Estes. Using satellite remote sensing to evaluate a
    socio-economic and ecological model of deforestation in Rondonia, Brazil. Int. J. Remote Sensing,
    17,3233-3255,1996.

Huang, C., B. Wylie, L. Yang, C. Homer, and G. Zylstra. Derivation of a tasseled cap transformation
    based on Landsat 7 at-satellite reflectance.  Int. J. Remote Sensing, 23, 1741-1748, 2002.

IBGE. Censo Demogrqfico 1991 - Rondonia, Institute Brasileiro de Geografia e Estatfstica. (IBGE), Rjo
    de Janeiro, RJ, Brazil, 1994.

IBGE. Censo Demogrqfico 2000.  Available at Institute Brasileiro de Geografia e Estatistica (IBGE)
    website, http:// www.ibge.gov.br, December 2001.

INPE. Deflorestamento da Amazonia 1995-1997, Institute Nacional de Pesquisas Espaciais (INPE). Sao
    Jose dos Campos, SP, Brazil, 1998.

Kauth, R.J. and G.S. Thomas.  The Tasseled Cap - A graphic description of the spectral-temporal
    development of agricultural crops as seen by Landsat, in Symposium on Machine Processing of
    Remotely Sensed Data, IEEE, 76CH 1103 - IMPRESO, 41-51, 1976.

Kimes, D.S., R.F. Nelson, W.A. Salas, D.L. Skole. Mapping secondary tropical forest and forest age from
    SPOT HRV data.  Int. J. Remote Sensing, 20, 3625-3640, 1999.

Linden E.  The road to disaster. Time, October  16, p. 97-98, 2000.

Liverman, D., E.F. Moran, R.R. Rindfuss, and P.C. Stern (Editors). People and Pixels:  Linking Remote
    Sensing and Social Science.  National Academy Press, Washington, DC, 1998.
Page 104 of 339

-------
Lucas, R.M., M. Honzak, G.M. Foody, P.J. Curran, and C. Corves. Characterizing tropical secondary
    forests using multi-temporal Landsat sensor imagery.  Int. J. Remote Sensing, 14, 3061-3067, 1993.

Mausel, P., Y. Wu, Y. Li, E. Moran, and E. Brondizio.  Spectral identification of successional stages
    following deforestation in the Amazon. Geocarto International, 8, 61-71, 1993.

McCracken, S.D., E. Brondizio, D. Nelson, E.F. Moran, A.D. Siqueira, and C. Rodriquez-Pedraza.
    Remote sensing and G1S at farm level: demography and deforestation in the Brazilian Amazon.
    Photogrammetric Engineering and Remote Sensing, 65, 1311-1320, 1999.

Moran, E.F. and E. Brondizio. Land-use change after deforestation in Amazonia, in People and Pixels:
    Linking Remote Sensing and Social Science. D. Liverman, et al. (Editors), National Academy Press,
    Washington, DC, 1998.

Moran, E.F., E. Brondizio, P. Mausel, and Y. Wu. Integrating Amazonian vegetation, land use, and
    satellite data.  Bioscience, 44, 329-338, 1994.

Nepstad, D., D. McGrath, A. Alencar, A.C. Barros, G.  Carvalho, M. Santilli, and M. del C. Vera Diaz.
    Frontier Governance in Amazonia. Science, 295(5555), 629-631, 2002.

Olmos, F., A.P. de Queiroz Filho, and C.A. Lisboa.  As Unidades de Conservaqao de Rondonia
    Rondonia/SEPLAN/PLAN AFLORO/PNUD, Porto Velho, RO, Brazil, 1999.

Ostrom, E.  Self-governance and forest resources. Occasional Paper, 20, Center for International Forestry
    Research, Jakarta, Indonesia, 1999.

Peralta, P. and P. Mather. An analysis of deforestation patterns in the extractive reserves of Acre,
    Amazonia, from satellite imagery:  A landscape ecological approach.  Int. J. Remote Sensing ~> \
    2555-2570, 2000.

Rignot, E., W. Salas, and D.L. Skole. Mapping deforestation and secondary growth in Rondonia, Brazil,
    using imaging radar and Thematic Mapper data. Remote Sens. Environ., 59, 167-179, 1997.

Rindfuss, R.R. and P.C. Stern. Linking remote  sensing and social science: The need and the challenges,
    in People and Pixels: Linking Remote Sensing and Social Science. D. Liverman, et al. (Editors),
    National Academy Press, Washington, DC, 1998.

Roberts, D.A., G.T. Batista, J.L.G. Pereira, E.K. Waller, and B.W. Nelson. Change identification using
    multitemporal spectral mixture analysis: Applications  in eastern Amazonia, in Remote Sensing
    Change Detection:  Environmental Monitoring Methods and Applications. R.S. Lunettaand
    C.D. Elvidge (Editors), Ann Arbor Press, Chelsea, MI, 1998.

Sanchez, P.A. Soil productivity and sustainability in agroforestry systems, in Agroforestry: A Decade of
    Development. H.A. Steppler and P.K.R. Nair (Editors), Institute Council for Research in
    Agroforestry, Nairobi, Kenya, 1987.

Sanchez, P.A. and J.R. Benites. Low input cropping for acid soils of the humid tropics  Science  ^38
     1521-1527, 1987.
                                                                                   Page 105 of 339

-------
Serrao, E.A.S. and J.M. Toledo. The search for sustainability in Amazonian pastures, in Alternatives to
    Deforestation: Steps Toward Sustainable Use of the Amazon Rain Forest. A.B. Anderson (Editor)
    Columbia University Press, New York, NY, 1990.

Skole, D.L and C.J. Tucker. Tropical deforestation and habitat fragmentation in the Amazon: Satellite
    data from 1978-1988.  Science, 260, 1905-1910, 1993.

Steininger, M.K. Tropical  secondary forest regrowth in the Amazon: Age, area, and change estimation
    with Thematic Mapper data. Int. J. Remote Sensing,  17, 9-27, 1996.

Sydenstricker-Neto, J. Parceleiros  de Machadinho: Historia migratoria e as intera9oes entre a dinamica
    demografica e o ciclo agricola  em Rondonia. MA Thesis, State University of Campinas
    (UN1CAMP), Campinas, SP, Brazil, 1992.

Sydenstricker-Neto, J. Organizacdes Locals e Sustentabilidade nos  Tropicos Urnidos:  Urn estudo
    Exploratorio. Documentos de Trabaloho, EMBRAPA-IFPRI, Rio Branco, AC, Brazil, 1997.

Sydenstricker-Neto, J. and  H.G. Torres.  Mobilidade de migrantes: Autonomia ou subordinacao na
    Amazonia? Revista Brasileira de Estudos de Populacao. 8(1/2), 33-54,  1991.

Szott, L.T., E.C.M. Fernandes, and P.A. Sanchez.  Soil-plant interactions in agroforestry systems.
    For. Ecol. & Management,^, 127-152, 1991.

Vosti, S.A., J. Witcover, C.L. Carpentier, S.J. Magalhaes de Oliveira, J. Carvalho dos Santos.
    Intensifying small-scale agriculture in the western Brazilian Amazon:  Issues, implications and
    implementation in Tradeoffs or Synergies? Agricultural Intensification, Economic Development and
    the Environment. D. Lee and C. Barrett (Editors), CAB International, Wallingford, UK, 2000.

Wood, C.H. and D. Skole.  Linking satellite, census, and survey data to study deforestation in the
    Brazilian Amazon, in People and Pixels: Linking Remote Sensing and Social Science.  D. Liverman
    et al. (Editors), National Academy Press, Washington, DC, 1998.
Page 106 of 339

-------
                                   Chapter 7

                   Thematic Accuracy Assessment of
                     Regional Scale Land-Cover Data
                                        by

                                  Siamak Khorram1*
                                  Joseph F. Knight2
                                    Mali! I. Cakir1
      Center for Earth Observation
      North Carolina State University
      5123 Jordan Hall
      PO  Box 7106
      Raleigh, NC 27695-7106

      -Corresponding Author Contact:

       Telephone: (919)515-3430
        Facsimile: (919)515-3439
           E-mail: khnrram@ncsu.edu
National Research Council
U.S. Environmental Protection Agency
Research Triangle Park, NC 27711
7.1   Introduction

The Multi-Resolution Land Characteristics (MRLC) consortium, a cooperative effort of several U.S.
federal agencies, including the U.S. Geological Survey (USGS) EROS Data Center (EDC) and the U.S.
Environmental Protection Agency (EPA), have jointly conducted the National Land Cover Data (NLCD)
program. This program used Landsat Thematic Mapper (TM) 30 m resolution imagery as the baseline
data and successfully produced a consistent and conterminous land-cover (LC) map of the lower 48 states
at approximately an Anderson Level II thematic detail.  The primary goal of the program was to provide a
generalized and regionally consistent LC product for use in a broad range of applications (Lunetta et al.,
1998). Each of the 10 U.S. federal geographic regions was mapped independently. EPA funded the
Center for Earth Observation (CEO) at North Carolina State University (NCSU) to assess the accuracy of
the NLCD for federal geographic Region IV.
                                                                          Page 107 of 339

-------
An accuracy assessment is an integral component of any remote sensing-based mapping project.
Thematic accuracy assessment consists of measuring the general and categorical qualities of the data
(Khorram et al., 1999). An independent accuracy assessment was implemented for each federal
geographic  region after LC mapping was completed.  The objective for this study was to specifically
estimate the overall accuracy and category-specific accuracy of the LC mapping effort. Federal
geographic  Region IV included the states of Kentucky, Tennessee, Mississippi, Alabama, Georgia,
Florida, North and South Carolina (see Figure 7-1).
                   Figure 7-1.   Randomly selected photograph center points.
7.2    Approach

7.2.1  Sampling Design

Quantitative accuracy assessment of regional scale LC maps, produced from remotely sensed data,
involves comparing thematic maps with reference data (Congalton, 1991). Since there was no suitable
existing reference data that could be used for all federal regions, a practical and statistically sound
sampling plan was designed by Zhu et al. (2000), to characterize the accuracy of common and rare classes
for the map product using National Aerial Photography Program (NAPP) photography as the reference
data.

The sampling design was developed based on the following criteria: (1) ensure the objectivity of sample
selection and validity of statistical inferences drawn from the sample data; (2) distribute sample sites
spatially across the region to ensure adequate coverage of the entire region; (3) reduce the variance for
estimated accuracy parameters; (4) provide a low cost approach in terms of budget and time; and (5) be
easy to implement and analyze (Zhu et al., 2000).
Page 108 of 339

-------
The sampling was a two-stage design. The first stage, the primary sampling unit (PSU), was the size of a
NAPP aerial photograph. One PSU (photo) was randomly selected from a cluster of 128 photographs.
These clusters were formed using a geographic frame of 30' x 30'.  Randomly selected PSU locations are
shown in Figure 7-1. The second stage was a stratified random sample, within the extent of all of the
PSUs only, of 100 sample sites per LC class.  The selected sites were referred to as secondary sampling
units (SSU). The number of sites per photograph ranged from one to approximately 70 (see Figure 7-2).
The total number of sample sites in the study was 1,500 (100 per cover class), although only 1,473 sites
were interpreted due to missing NAPP photos. This sampling approach was chosen by the Eros Data
Center (EDC) over a standard random sample to reduce the cost of purchasing the NAPP photography
(Zhuetal.,2000).
                                            \
                                         Htoto Center
                                            06
                                                    »,2 KXX)
                  Figure 7-2.  Sample sites clustered around the photograph
                              center.
7.2.2  Training

Before the NAPP photo interpretation for the sample sites could begin, photo interpreters were trained to
accomplish the goals of the study. To provide consistency among the interpreters, a comprehensive
training program was devised.  The program consisted of a full-day training session and subsequent "on
the job" training. Two experienced aerial photo interpretation and photogrammetry instructors led the
formal classroom training sessions. The training sessions included the following topics: (a) discussion of
color theory and photo interpretation techniques; (b) understanding of the class definitions; (c)
interpretation of over 100 sample sites of different classes during the training sessions followed by
interactive discussions about potential discrepancies; (d) creation of sample sites for later reference; and
(e) repetition of interpretation practice after the sessions.
                                                                                  Page 109 of 339

-------
The focus was on real world situations that the interpreters would encounter during the project. Each
participant was presented with over 100 pre-selected sites and was asked to provide their interpretation of
the land cover for these sites. Their calls were analyzed and subsequently discussed to minimize any
misconceptions.  During the "on the job" portion of the training, each interpreter was assigned
approximately 500 sites to examine.  Their progress was monitored daily for accuracy and proper
methodology.  The interpreters kept a log of their calls and the sites for which they were uncertain about
the land cover classes.  On a weekly basis, their questions were addressed by the project Photo
Interpretation Supervisor. The problem sites (approximately 400) were discussed until each team member
felt comfortable with the class definitions and, their consistency in interpretation.  Agreement analysis
between the three interpreters resulted in an average agreement of 84%.


7.2.3  Photographic Interpretation

7.2.3.1 Interpretation Protocol

The standard protocol used by the photo interpreters was as follows:

        ° Each interpreter was assigned 500 of the 1,500 total sites

        ° Interpretation was based on NAPP photographs

        o The sample site locations on the NAPP photos were found by first plotting the sites on TM
          false color composite images, then finding the same area on the photo by context

        ° During the interpretation process, cover type and other related information such as site
          homogeneity were recorded for later analysis

        ° When there was some doubt as to the correct class or there was the possibility that two classes
          could be considered correct,  the interpreters selected an alternate class in addition to the
          primary class

        ° The interpretations were based on the majority of a 3 x 3 pixel window (Congalton and Green,
          1999)

7.2.3.2 Interpretation Procedures

The Landsat TM images were displayed using ERDAS Imagine. By plotting the site locations on the
Landsat TM false color composite images, the interpreters precisely located each site. Then based on the
context from the image, the interpreters located the site on the photographs as best they could. Clearly
there was  some error inherent in this location process; however, this was the simplest and most cost
effective procedure available. The use of a 3 x 3 pixel window for interpretation was intended to reduce
the effect of location errors.

The interpreters examined each site's characteristics using the aerial photograph and TM image and
determined the appropriate LC label for the site according to the classification scheme, then entered the
information into the project database. The following data were entered in the database: site identification
Page 11 Oof 339

-------
number (sample site), coordinates, photography acquisition date, photograph identification code, imagery
identification number, primary or dominant LC class, alternate LC class (if any), general site description,
unusual observations, general comments, and any temporal site changes between image and photo
acquisition dates. The interpreters did not have prior access to the MRLC classification values during the
interpretation process.

Individual interpreters analyzed  15% (n=75) of each of the other interpreters' sample sites to create an
overlap database to evaluate the  performance of the interpreters and the agreement between them.
Selection of these 75 sites was done through random sampling. This scheme provided 225 sites that were
interpreted by all three interpreters.  Agreement analysis using these overlap sites indicated an average
agreement of 84% between the three interpreters (see Table 7-1).

Table 7-1-  Agreement analysis between Pis: interpreter call vs. overlap consensus for the 225
            overlap sites.
Overlap Consensus
MRLC
Class





1
i
5.
1
§•
&
—



1.1
2.1
2.2
2.3
3.1
3.2
3.3
4.1
4.2
4.3
8.1
8.2
8.5
9.1
9.2
Total
%
Corr
^_^^^^MMM

1.1 2.1 2.2 2.3 3.1 3.2 3.3 4.1 4.2 4.3 8.1 8.2 8.5 9.1 9.2
18 1 1
21 1
3 1
9
4 2
1 6
16 1 1
2 14 111
2 7
3 2 1 26 4
10 1
3 10 1
1 16 2
2 13 1
15
19 22 3 10 4 6 23 18 8 29 14 14 18 19 18
0.9 1 1 0.9 1 1 0.7 0.8 0.9 0.9 0.7 0.7 0.9 0.7 0.8
18 21 3 9 4 6 16 14 7 26 10 10 16 13 15

Total % Corr
20 0.9 18
22 1 21
4 0.8 3
9 1 9
6 0.7 4
7 0.9 6
18 0.9 16
19 0.7 14
9 0.8 7
36 0.7 26
11 0.9 10
14 0.7 10
19 0.8 16
16 0.8 13
15 1 15
225
0.84
188
  7 2 3-3 Quality Assurance and Quality Control

  Quality Assurance (QA) and Quality Control (QC) procedures were vigorously implemented in the study
  as designed in the Interpretation Organization Chart (see Table 7-2). Discussions among the interpreters
  and project supervisors during the interpretation process provided an opportunity to discuss the probl»™<
  that occurred and to resolve problems on the spot.
lems
                                                                                     Page 111 of 339

-------
 Table 7-2.  Photographic Interpretation (PI) team organization.
                                    Interpreter Organization
 Photo Interpreters
PI #1 (500 pts + 75 pts
from PI #2 and 75 pts
from PI #3)
PI #2 (500 pts + 75 pts
from PI #1 and 75 pts
from PI #3)
PI #3 (500 pts + 75~jjr
from PI #1 and 75
from PI #2)
 PI Supervisor
Random checking for consistency, checking 225 overlapped sites, sites with"
question from three Pis.	
 Project Supervisor
Checking sites with question from PI Supervisor, Random checking of overaiT
sites, Overall QA/QC.
 Project Director
Procedure establishment, Discussions on issues, Random checking OveraiT
QA/QC.
The quality assurance and quality control plan is shown in Figure 7-3. Upon completion of training, a
was performed to determine how similarly the interpreters would call the same sites.  The initial results
the analysis revealed that some misunderstandings about class definitions had crept through the trainin
process. As a result, the interpreters were retrained as a group to "calibrate" themselves. This helped t
ensure that their calls were more consistent between interpreters. Upon satisfactory completion of the r*»_
training, the interpreters were assigned to complete interpretation of the 1,500 sample sites.
                   Classroom Photo
                 Interpretation Training
                     Independent
                   and Supervised
                 Photo Interpretation
                 for Each Interpreter
                   Interpretation of
                  225 Overlap Points
                 Photo Interpretation
                 of the 1500 Random
                    Sample Points
                                               t_
                                        Interpreters Work
                                        Through Overlap
                                      Points as a Group to
                                      Resolve Differences
               —•]  Accuracy Analysis
                     MRLC
                    Region 4
                 Classified Data
               Figure 7-3.   Training, photo interpretation (PI), and quality
                            assurance/quality control (QA/QC) procedures.
Page 112 of 339

-------
7.3   Results
7.3.1  Accuracy Estimates

Table 7-3 presents the error matrix for MRLC Level II classes. The numbers across the top and sides of
the matrices represent the 15 MRLC classes (Appendix A). Table 7-4 presents the error matrix for
MRLC Level I classes The Level II classes were grouped into the following Level I categories: (1) water;
(2) urban or developed; (3) bare surface; (4) agriculture and other grasslands; (5) forest (upland); (6)
wetland (woody or non-woody). The overall accuracy for the Level I and II classes were 66% and 44%,
respectively.

Table 7-3 illustrates the confusion between low intensity residential, high intensity residential,  and
commercial/transportation. Many factors may have contributed to the confusion; however, we  believe the
complex classification scheme used was a dominant factor. For example, the most ambiguous  categories
were the three urban classes that were each distinguished only by percent vegetation. Technically, it was
beyond the methods employed in this study to quantify sub-pixel vegetation content. As a result, many
high intensity residential areas in the classified image were assigned to low intensity residential and
commercial/transportation. This occurred because high intensity residential classes occurred in the
middle of the urban categories (by percent vegetation) and were easily confused with lower intensity and
higher intensity urban development.

  Table 7-3. Error matrix for the Level II MRLC data (15 classes).
Classified MRLC Data
MRLC
Class






±!
i
&
Z





1.1
2.1
2.2
2.3
3.1
3.2
3.3
4.1
4.2
4.3
8.1
8.2
8.5
9.1
9.2
Total
%
Corr

1.1 2.1 2.2 2.3 3.1 3.2 3.3 4.1 4.2 4.3 8.1 8.2 8.5 9.1 9.2
87 3352 242
47 49 22 1 2 2 1 5 1 24 1
1 10 2 42
3 22 32 1 5 1 4 1
2 3 6 33 18 12 1 12
1 3 34
1 13 33 4 2 12 3 4 51
6 3 86 46 373719
1111 7 34 4 264
24 3 7 2 6 16 29 42 62 9 4 4 16 4
2 1 2 15 4 11 4 4 28 18 11 1 2
1 3 11 1 6 3 1 1 37 57 3 2 2
1 10 11 13 20 3 7 4 1 3 8 4 41 2 3
42 1 1 2 8 2 10 4 3 1 43 15
41 2 10 1 2 1 1 9 60
98 100 100 98 100 100 100 94 99 98 93 99 97 99 98
0.9 0.5 0.1 0.3 0.3 0.3 0.3 0.5 0.3 0.6 0.3 0.6 0.4 0.4 0.6
87 47 10 32 33 34 33 46 34 62 28 57 41 43 60

Total % Corr
108 0.8 87
155 0.3 47
19 0.5 10
69 0.5 32
69 0.5 33
38 0.9 34
78 0.4 33
99 0.5 46
61 0.6 34
228 0.3 62
103 0.3 28
128 0.4 57
131 0.3 41
96 0.4 43
91 0.7 60
1473
0.44
647
                                                                                    Page 113 of 339

-------
            Table 7-4.  Error matrix for the Level I MRLC data.
MRLC data


£
(A
1
E

1
2
3
4
8
9
Total
%
Corr
123489
87 3 10 0 2 6
0 188 9 4 38 4
1 12 134 21 8 9
1 46 45 227 30 39
1 43 78 21 207 12
8 6 24 18 4 127
98 298 300 291 289 197
0.89 0.63 0.45 0.78 0.72 0.64
87 188 134 227 207 127
Total % Corr
108 0.81 87
243 0.77 188
185 0.72 134
388 0.59 227
362 0.57 207
187 0.68 127
1473
0.66
970
Also, many problems were encountered with the interpretation of cropland and pasture/hay since both of
the classes had very similar spectral and spatial patterns that occurred within the same agricultural areas.
In addition, cropland was frequently converted to pasture/hay during the interval of two acquisition dates,
or vice versa.  Confusion also existed within classes of evergreen forest and mixed forest, deciduous
forest and mixed forest, barren ground and other grassland, low intensity residential and mixed forest,
transitional and all other classes.

The difference between image classification and photo interpretation was that the classification is mostly
based on the spectral values of the pixels, whereas the photo interpretation incorporates color (tones),
pattern recognition, and background context in combination.  These issues are inherent in any accuracy
assessment project using aerial photos as the reference data (Ramsey, et al. 2001). For this project,
however, aerial photos were the only reasonable reference data source.

The interpretation process is not the only component of the accuracy assessment process (Congalton and
Green, 1999). Additional factors that should be considered are positional and correspondence error.  To
account for these errors, the following additional criteria for correct classification were considered in this
project: (I) primary matches classified pixel; (2) primary or alternate matches classified pixel;
(3) primary is most common in classified 3x3 areas; (4) primary matches any pixel in classified 3x3
area; (5) primary is most common in classified 3x3 area;  and (6) primary or alternate matches any pixel
in 3 x 3 area.  "Interpreted" refers to the classes chosen during the air photo interpretation process,
"primary" and "alternate" are the  most probable LC classes for a particular site, and "classified" refers to
the MRLC classification result for that site.  The analysis results for each cover class in six cases are
presented in Tables 7-5 and 7-6.  The overall accuracies under various scenarios ranged from 44% to
79.4% (n=l,473) for cases "a" and "f," respectively.
Page 114 of 339

-------
Table 7-5. Summary of further accuracy analysis by interpreted cover class:  number of sites.
Class
1.1
2.1
2.2
2.3
3.1
3.2
3.3
4.1
4.2
4.3 1
8.1
8.2
8.5
9.1
9.2
Totals
Num
108
155
19
69
69
38
78
99
61
228
103
128
131
96
91
1473
Primary
PI Matches
MRLC
87
47
10
32
33
34
33
46
34
62
28
57
41
43
60
647
Prim or Alt
Matches
MRLC
95
69
11
39
35
36
44
55
39
98
39
82
61
53
68
824
Primary PI is
Mode of 3x3
84
60
8
35
27
34
33
60
44
68
27
56
33
47
58
674
Primary PI
Matches any
3x3
92
81
11
41
30
36
42
68
48
110
38
83
53
59
67
859
Prim or Alt
PI is Mode
of 3x3
94
124
15
44
34
35
40
79
52
148
46
83
56
68
67
985
Prim or Alt
PI Matches
any 3x3
100
135
16
49
42
37
52
83
54
187
64
102
91
84
74
1170
 Table 7-6. Summary of further accuracy analysis by interpreted cover class: percentage of sites
           for each class.
Class
1.1
2.1
2.2
2.3
3.1
3.2
3.3
4.1
4.2
4.3
8.1
8.2
8.5
9.1
9.2
% Total:
%
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
100.0
Primary
PI Matches
MRLC
80.6
30.3
52.6
46.4
47.8
89.5
42.3
46.5
55.7
27.2
27.2
44.5
31.3
44.8
66.3
44.0
Prim or Alt
PI Matches
MRLC
88.0
44.5
57.9
56.5
50.7
94.7
56.4
55.6
63.9
43.0
37.9
64.1
46.6
55.2
73.9
55.9
Primary
PI is Mode
of 3x3
77.8
38.7
42.1
50.7
39.1
89.5
42.3
60.6
72.1
29.8
26.2
43.8
25.2
49.0
63.0
45.7
Primary
PI Matches
any 3x3
85.2
52.3
57.9
59.4
43.5
94.7
53.8
68.7
78.7
48.2
36.9
64.8
40.5
61.5
72.8
58.3
Prim or Alt
PI is Mode
of 3x3
87.0
80.0
78.9
63.8
49.3
92.1
51.3
79.8
85.2
64.9
44.7
64.8
42.7
70.8
72.8
66.8
Prim or Alt
PI Matches
any 3x3
92.6
87.1
84.2
71.0
60.9
97.4
66.7
83.8
88.5
82.0
62.1
79.7
69.5
87.5
80.4
79.4
                                                                             Page 115 of 339

-------
 7.3.2   Issues and Problems

 7.3.2.1  Heterogeneity

 The heterogeneity of many areas caused confusion in assigning the sites to an exact class label. Since the
 spatial resolution of the Landsat TM data was 30 x 30 m, pixel heterogeneity was a common problem (see
 Plate 7-1 a). For example, a site on the image frequently contained a mixture of trees, grassland and
 several houses. Thus, the reflectance of the pixel was actually a combination of different reflectance
 classes within that pixel. This factor contributed to confusion between evergreen forest and mixed forest
 deciduous forest and mixed forest, low intensity residential and other grassland, and transitional and
 several classes.

 7.3.2.2  Acquisition Dates

 Temporal discrepancies between photograph and image acquisition  dates, if not reconciled, would
 negatively impact the classification accuracy (Plate 7-1 b). For example, to interpret early forest growth
 areas, the interpreter had to decide whether the site was  a transitional or a forested area. If the photograph
 was acquired before the image (e.g. as much as six years earlier), it was clear that those early forest
 growth sites would show up as forest cover on the satellite image. In this case, the interpreters decided
 the appropriate cover class based on satellite imagery time.

 7.3.2.3  Location Errors

 Locating the reference site on the photo was sometimes  problematic. This frequently occurred when:
 (1) the LC had changed between the image and photo acquisition dates; (2) there were few clearly
 identifiable features for positional reference;  and (3) the reference site was on the border of two or more
 classes (boundary pixel problem). When the LC had changed between acquisition dates, locating
 reference sites was difficult because the features surrounding the reference site were also changed.
 Similarly, when a reference site fell in  an area with a few identifiable features for positional reference, the
 interpreter had to approximate the location of reference site. For example, when the reference site was on
 the shadowy side of a mountain, it was impossible to see the reference features except the ridgeline of the
 mountain, thus requiring the interpreter to locate the reference site based on the approximate distance to
 and the direction of the ridgeline.  The third case was the most common source of confusion in the
 interpretation process.  Reference sites were frequently on the border of two or more classes. In these
 situations, the interpreter decided between two or more classes by determining which class covered the
 majority of the 3x3 window.
Page 116 of 339

-------
         SAW Aerial Photo
(*)
LANDSAT TM Imigc
           CIR Aerial Photo
(b)
 LAND SAT TM Image
         CIR Aerial Photo
(c)
LANDSAT TM linage
Plate 7-1. (a) Heterogeneity problem: reference site consists of several classes; (b) LC
         class changed between acquisition dates in the reference site; (c) ambiguity of
         class definitions; it was difficult to differentiate between high density and
         commercial class according to definition.
                                                                       Page 117 of 339

-------
 7.4    Further Research

 The results of this study point to numerous opportunities for further research to improve accuracy
 assessment methods for regional scale assessments. They include: (1) examine the impact of alternate
 classes in the accuracy assessment; (2) evaluate and analyze the effect of positional errors on accuracy
 assessment; (3) collect field data for the 225 overlapping sample sites to validate the interpretation; and
 (4) analyze satellite data with a higher temporal resolution to better identify changes between the
 acquisition of TM data and NAPP photography (e.g., using NOAA-AVHRR and MODIS data).
 7.5   Acknowledgments

 The results reported here were generated through an agreement funded by the Environmental Protection
 Agency (EPA). The views expressed in this report are those of the authors and do not necessarily reflect
 the views of EPA or any of its sub-agencies. The authors would like to thank EPA, USGS-EROS Data
 Center (EDC) for their support and for the assistance given on this project. The authors would also like to
 thank Dr. Heather Cheshire and Ms. Linda Babcock of CEO at NCSU  for contributing their expertise in
 photo interpretation and extended help throughout the duration of this project.
 7.6    References

Congalton, R.  A review of assessing the accuracy of classifications of remotely sensed data.  Remote
    Sens. Environ., 37, 35-46, 1991.

Congalton, R., and K. Green. A practical look at the sources of confusion in error matrix generation.
    Photogrammetric Engineering and Remote Sensing, 59(5), 641-644, 1999.

Dai, X.L., and S. Khorram. The effects of image misregistration on the accuracy of remotely sensed
    change detection. IEEE Transactions on Geoscience and Remote Sensing, 36(5), 1566-1577, 1998

Khorram, S., G.S. Biging, N.R. Chrisman, D.R. Colby, R.G. Congalton, J.E. Dobson, R.L. Ferguson,
    M.F. Goodchild, J.R. Jensen, and T.H. Mace. Accuracy assessment of remote sensing-derived change
    detection.  Monograph, American Society of Photogrammetry and Remote Sensing (ASPRS),
    Bethesda, MD, 64 p., 1999.

Lunetta, R.S., R.G. Congalton, L.F. Fenstermaker, J.R. Jensen, K.C. McGwire, and L.R. Tinney. Remote
    sensing and geographic information system data integration: error sources and research issues.
    Photogrammetric Engineering and Remote Sensing, 57(6), 677-687, 1991.

Lunetta, R.S., J.G. Lyon, B. Guidon, and C.D. Elvidge. North American Landscape Characterization
    Dataset Development and Data Fusion Issues. Photogrammetric Engineering and Remote Sensing
    64(8), 821-829, 1998.

Ramsey, E., G. Nelson, K. Sapkota. Coastal Change Analysis Program implemented in Louisiana.
    Journal of Coastal Research, 17, 53-71, 2001.
Page 118 of 339

-------
Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski.  Accuracy Assessment for the U.S. Geological
    Survey Regional Land Cover Mapping Program: New York and New Jersey Region.
    Photogrammetric Engineering and Remote Sensing, 66(12), 1425-1438, 2000.
                                                                               Page 119 of 339

-------
Page 120 of 339

-------
                                     Appendix A

          MRLC Classification Scheme and Class Definitions
The MRLC program utilizes a consistent classification scheme for all EPA Regions at approximately an
Anderson Level II thematic detail.  While there are 21 classes in the MRLC system, only 15 were mapped
in EPA Region IV. The following classification scheme was applied to EPA Region  IV data set.
A1    Water

All areas of open water or permanent ice/snow cover


A 1.1  Water

All areas of open water, generally with less than 25% vegetation cover.


A2    Developed

Areas characterized by high percentage of construction materials (e.g., asphalt, concrete, building, etc).


A2.1  Low Intensity Residential

Land includes areas with a mixture of constructed materials and vegetation or other cover. Constructed
materials account 30-80 percent of the total area. These areas most commonly include single-family
housing areas, especially suburban neighborhoods. Generally, population density values in this class will
be lower than in high-intensity residential areas.


A2.2 High Intensity Residential

Includes heavily built-up urban centers where people reside. Examples include apartment complexes and
row houses.  Vegetation occupies  less than 25% of the landscape.  Constructed materials account for 80-
 100% of the total area. Typically, population densities will be quite high in these areas.
                                                                              Page 121 of 339

-------
 A2.3  High Intensity Commercial/lndustrial/Transportation

 Includes all highly developed lands not classified as "High Intensity Residential," most of which is
 commercial, industrial, and transportation.
A3    Barren

Bare rock, sand, silt, gravel, or other earthen material with little or no vegetation regardless of its inherent
ability to support life. Vegetation, if present, is more widely spaced and scrubby than that in the
vegetated categories.


A3.1   Bare Rock/Sand

Includes areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, beach,
and other accumulations of rock and /or sand without vegetative cover.


A3.2  Quarries/Strip Mines/Gravel Pits

Areas of extractive mining activities with significant surface expression.


A3.3  Transitional

Areas dynamically changing from one land cover to another, often because of land use activities.
Examples include forest lands cleared  for timber, and may include both freshly cleared areas as well as
areas in the earliest stages of forest growth.
A4    Natural Forested Upland (non-wet)

A class of vegetation dominated by trees generally forming > 25% canopy cover.


A4.1  Deciduous Forest

Areas dominated by trees where 75% or more of the tree species shed foliage simultaneously in response
to an unfavorable season.


A 4.2  Evergreen Forest

Areas dominated by trees where 75% or more of the tree species maintain their leaves all year. Canopy is
never without green foliage.
Page 122 of 339

-------
A4.3 Mixed Forest
Areas dominated by trees where neither deciduous nor evergreen species represent more than 75% of the
cover present.
A5    Herbaceous Planted/Cultivated
Areas dominated with vegetation, which has been planted in its current location by humans, and/or is
treated with annual tillage, modified conservation tillage, or other intensive management or manipulation.
The majority of vegetation in these areas is planted an/or maintained for the production of food, fiber,
feed, or seed.
A5.1  Pasture/Hay
Grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay
crops.
,45.2  Row Crops
 All areas used for the production of crops, such as corn, soybeans, vegetables, tobacco, and cotton.

 A5.3  Other Grasses
 Vegetation planted in developed settings for recreation, erosion control, or aesthetic purposes. Examples
 include parks, lawns, and golf courses.

 A6   Wetlands
 Non-woody or woody vegetation where the substrate is periodically saturated with or covered with water.

 A6.1   Woody Wetlands
 Areas  of forested or shrubland vegetation where soil or substrate is periodically saturated with or covered
 with water.
 A6.2  Emergent Herbaceous Wetlands
 Non-woody vascular perennial vegetation where the soil or substrate is periodically saturated with or
 covered with water.
           Note: Cover class types 5,0, 6.0 and 7.0 did not occur in federal geographic Region 4.
                                                                                 Page 123 of 339

-------
Page 124 of 339

-------
                                  Chapter 8

           An Independent Reliability Assessment for the
        Australian Agricultural Land-Cover Change Project
                               1990/91-1995

                                       by

                                 Michele Barson1*
                                 Vivienne Bordas1
                                   Kim Lowell2
                                  Kim Malafant3
   Bureau of Rural Sciences
   Post Office Box 858
   Canberra, ACT, Australia 2604

   *Corresponding Author Contact:

    Telephone:  +61 2 62724347
     Facsimile:  +61 2 62724747
        E-mail:  vivienne.bordas@brs.gov.au
               michele.barson@brs.gov.au
Centre de recherche en geomatique
Universite Laval
Pavilion Casault
Ste-Foy
Quebec G1K 7P4 Canada
                           3 Complexia
                            Post Off ice Box 3011
                            Belconnen ACT, Australia 2617
8.1   Introduction

Australia's first National Greenhouse Gas Inventory (NGGI) suggested that agricultural clearing could
represent as much a quarter of Australia's total greenhouse gas emissions (DOEST, 1994). The
Agricultural Land Cover Change (ALCC) project was undertaken by the Bureau of Rural Sciences (BRS)
and eight Australian state government agencies to document rates of deforestation and reforestation
(1990/91-1995) for the purpose of improving estimates of greenhouse gas emissions (Barson et al., 2000).
For this study, woody vegetation was defined as all vegetation, native or exotic, with a height of > 2.0 m
                                                                       Page 125 of 339

-------
and a crown cover density of >20% (McDonald et al., 1990).  This was the definition of "forest," agreed
to by state and commonwealth agencies for Australia's National Forest Inventory (NFI), and the
definition used for Australia's NGGI (NGGI, 1999). The definition includes vegetation usually referred
to as forest (50-100% crown cover), as well as woodlands (20-50% crown cover), and plantations
(silvaculture operations), but not open woodlands where crown cover is <20%. Woodlands occupy 112.0
• 106 ha, followed by forests (43.6 • 106 ha),  and plantations (1.0 • 106 ha) (NFI, 1998).

The project documented increases and decreases in woody vegetation over the period 1990/91-1995 for
the intensive land-use (LU) zone (see Figure 8-1).  Decreases or clearing were defined as the removal of
woody vegetation resulting in a crown cover of < 20%. Increases in woody vegetation (usually due to
tree planting) result in a crown cover that exceeds 20%). The intensive land-use (LU) zone comprises
some 288 • 106 ha, representing approximately 38% of the Australian continent, and is where most land
clearing has taken place.  Outside this zone in the Australian outback, the LC is disturbed but relatively
intact (Graetz et al.,  1995). The project produced change maps for 156 pairs of TM scenes which  showed
that clearing for agriculture and  development activities over the study period totaled approximately 1.2 •
106 ha (308,000 ha/yr).  The results indicated that more than 80% of the clearing for agriculture was
taking place in the state of Queensland (see Figure 8-2), and that the annual rates of clearing for the
continent were almost 210.000 ha or 40% lower than the figures compiled for the first NGGI.
                                           Northern
                                            Territory
                         Western
                         Australia
                                                               New South
                                                                  Wales
                       Study area
                         500     1000 km
                                                     Tasmania
               Figure 8-1.  Study area for the Agricultural land-cover change for
                           the Australian continent 1990/91-1995 project.
Page 126 of 339

-------
                                                                   Jericho region
                                                                              Yarmouth region
                                                                               Cootabch region
     Number of hectares cleared in each
     277,000 hectare region due to
     agriculture, grazing and development.
         Outside study area
     	No change
         1 -1000
        • 1001-5000
         5001  -15000
     • 15001-32000
                                Bombala region
  
-------
            •  Reliability assessed
           Percentage change (increase & decrease)
           n 0-3
           D 3-6
           • 6-8

              Figure 8-3.  Percentage land-cover change by TM image (all causes).

A methodology that did not require a reference data set, and could be applied to change data produced
using a variety of approaches to image processing and radiometric calibration (see Table 8-1), had been
developed by Lowell (2001) to evaluate the LC change maps produced for the ALCC project by the state
agencies. This chapter reports on the application of Lowell's method, and on the reliability of the
estimates of change produced for the Australian ALCC project.
              Table 8-1. Land-cover change detection methods used for each state.
  State
Method
  NSW
Unsupervised classification of combined 1991 and 1995 images
  NT
Band 5 subtraction of 1990 and 1995 images
  OLD
Thresholding of band 2, 5 and NDVI difference images
  SA
Unsupervised classification (150 classes) of combined 1990 and 1995 images
  TAS
Thresholding of NDVI difference data
  VIC
Unsupervised classification of combined 1990 and 1995 images to create woody, non-woody,
woody increase and woody decrease
  WA
Combined 1990 and 1995 images and carried out canonical variant analysis based on
biogeographic regions to identify suitable indices and bands to classify land cover change
Page 128 of 339

-------
8.2    Methods

Lowell's method produces an area based independent estimate of change for which confidence intervals
can be calculated; the estimates of change derived from image analysis are then compared against these
confidence intervals. The method uses 500 x 500 pixel sample units (see Figure 8-4) selected for each
change image in a two-stage procedure to ensure that a minimum number (n) of samples containing
change are included. First, sample units are selected containing change areas (increase or decrease),
proportional to the amount of change across the image. The remaining sample units are then selected
according to a spatially distributed random sample without replacement.  Lowell demonstrated that 33
samples were required to obtain stable confidence intervals for TM scenes in which change was relatively
rare (i.e., approximately 0.13% of the scene).  The n could be reduced to 22 for scenes where change was
more common (i.e., approximately 2.5%) (Lowell, 1998).  Following sample unit selection, the area of
change (increase or decrease in woody vegetation) was estimated for each sample using image
enhancement techniques such as unsupervised classification or by displaying band differences or
differences amongst various combinations of bands. Confidence intervals for woody vegetation changes
were then calculated for the independent estimates of change made for each scene. These were then
compared with the amount of change reported for the scene by the corresponding state agency.
                                     Increase sample unit
                                                                       Decrease sample unit
          Increase
          Decrease

    Figure 8-4.  Sampling strategy for estimating land-cover change (increase plus decrease
                in woody vegetation) for a Thematic Mapper image.
                                                                                 Page 129 of 339

-------
Lowell's method was applied to the ALCC results within a spatial hierarchy to enable analysis of
variations in LC change at multiple resolutions (i.e., study area, state, TM scene, sample unit).
Approximate r-score values were calculated using the state estimates of change, the overall sample
estimate, and the individual sample results. Significance levels for the r-scores were calculated and
compared for both the overall and individual sample unit results. Approximate confidence intervals
(95%) for the estimated proportion of change were also compared to the overall state estimate. Dual
assessment criteria (Jupp, 1998) were implemented to consider both the overall scene results as well as
the individual sample results; the  state change estimates for a scene were only accepted as reliable when
both the overall and individual sampler-scores were not significantly different. Preparation of the grid for
sample units was automated by writing a program to determine the areal extent of each change image, and
to establish a grid of non-overlapping 500 x 500 pixel sample units.

In addition to the original estimates of change reported on hard copy maps (Kitchin and Barson, 1998),
the BRS obtained 156 change images from state agencies and calculated the total change (woody
vegetation) for each image.  Due to resource limitations, it was decided that only half the scenes in the
study area would be assessed.  Scenes containing only a small percentage of landmass were excluded,
leaving 151 for possible assessment. Within each state, scenes were classified as having "low" or "high"
levels of change. A weighting program, which took into account the amount of change in each scene, the
individual states' overall contribution to Australia's LC change and the methods used for change
detection, was used to select scenes for sampling.  A total of 67 scenes were selected for reliability
assessment (see Figure 8-3).

Analysts who had not previously been involved in the project were selected through a competitive
tendering process to prepare the independent sample based estimates of change for comparison with the
results produced by state agencies.  The analysts were provided with co-registered TM images for the 67
scenes, pixel coordinates for the upper left hand corners for sample units within scenes, and 1990/91  LC
maps showing the distribution of woody vegetation in 1990/9, but no LC change information.  They
verified the image co-registration, and after some preliminary testing, calculated the Normalised
Vegetation Index (NDVI) for each image, then subtracted the NDVI images and displayed the difference
image (Jensen, 1996).

The NDVI difference image for each sample unit was examined at a range of threshold values to
determine the location of positive (i.e. increases in woody vegetation) and negative (decreases)
differences. These  areas were checked in detail against the 1990/91 and 1995 images and the upper and
lower thresholds for increases and decreases were recorded.  A final classification of change was
performed using selected thresholds, and the change areas checked against the LC image.  This ensured
that only areas that were woody in 1990/91 could be identified as "decrease," and non-woody in  1990/91
as "increase" in woody cover.  The analysis provided an estimate of the number of pixels of increase and
decrease for each sample unit in the image.

For quality assurance (QA) purposes, four to six sample units from half the images being assessed were
randomly sampled.  Change for these sample units was assessed as described above, but by different
operators.  Differences in interpretation were discussed and evaluated statistically using a paired /-test.  A
sample unit failed QA if the average of the differences found by two operators was not the same.  In this
case, the main operator re-examined all the sample units for the image.  The analysts provided BRS with a
Page 130 of 339

-------
spreadsheet for each image containing the sample locations and the number of pixels of increase and
decrease for each sample in the image, plus notes on any other areas of possible change identified. The
BRS implemented an automatic analysis to evaluate the differences between the state estimate of change
and those provided by the analysts. For scenes where the state's and the analyst's estimates of change
differed substantially, the BRS investigated the possible reasons.  The approximate spatial distribution of
the change in the state change map was also examined to determine if a lack of acceptance was due to
highly localised changes difficult to sample effectively using the current method.

The investigation of lack of acceptance included inspecting the state's sources of information used for
initial checking of the change (i.e., aerial photography, other satellite imagery, or through field data)
(Kitchin and Barson, 1998). Discrepancies were forwarded to respective states for advice on likely
reasons for such differences.  If no reason could be identified (e.g., where severe drought had led to leaf
drop so that spectrally the area appeared to have been cleared, but ground inspection showed that it had
not), the scene was reprocessed by the analysts.
8.3    Results

In the first assessment, 60 of 67 scenes met the acceptance criteria described above. The seven
noncompliant scenes were forwarded to the states for comment, and a new set of 500 x 500 pixel sample
units generated for reprocessing these scenes.  On reprocessing, five additional scenes met the acceptance
criteria (see Table 8-2). Of the remaining two scenes, one contained a significant proportion of change
due to fire on rocky hillsides that was difficult to map from the image without additional photo
interpretation. The second also contained changes due to fire and the loss of native vegetation followed
by plantation establishment. These changes were difficult to detect without local knowledge and ancillary
data.
                   Table 8-2. Distribution of scenes for independent assessment.



State
New South Wales, including ACT
Northern Territory
Queensland
South Australia
Tasmania
Victoria
Western Australia
Total
Total
Number of
Scenes in
Study Area
37
10
53
17
5
16
18
156


Scenes
Assessed
10
3
22
11
3
9
9
67


Scenes
Reprocessed
2
0
0
3
1
0
1
7
Scenes
Meeting
Acceptance
Criteria
9
3
22
10
3
9
9
65
                                                                                     Page 131 of 339

-------
Further analyses of change maps were undertaken to estimate the variability in overall state and
continental estimates of change, and change estimates within each state.  The analyst and state results
provide a spatial hierarchy in which change proportions and variability could be estimated  (see Figure
8-5). The overall mean proportions and variability were estimated for the change scenes. Approximate
95% confidence intervals for the means were calculated at each of the levels, and were used to identify
any significant differences between the estimates at each of the levels. In most cases, the mean change
proportions estimated from the analyst's process provided values well within the 95% confidence interval
estimated by the states (see Table 8-3).  The only exceptions were two scenes from Queensland with
mean change estimates in excess of the confidence interval estimated by the state. Although the analyst's
process provided lower mean change proportion estimates than that of the state, the two estimates of
variability were generally similar.  The continental estimates were within the 95% confidence interval
although the state estimate of continental change was 1.3% versus the consultant's lower estimate of
0.9%.  The variability of the state estimate was lower (0.2%) than the estimate provided by the analyst's
process (0.3%).  Table 8-4 summarizes the mean and variability of change for the spatial hierarchy. It
shows that the variability estimates from the analyst's results are greater than those from the state at
comparable levels.  The ranges of means for the change proportions were consistently lower for the
analyst's results, but not statistically different.  Our results indicated that the states' results  were the most
accurate as was evidenced by the relatively small confidence intervals.
State's Change Maps Analyst's Sub-sample Results
Continental
L> State
^> Scene

Continental
U* State
I



Scene
^> Sub-sample
           Figure 8-5.   Estimation hierarchy from the state's and analyst's results sets.
Of the 67 scenes evaluated, 90% were determined acceptable after initial processing and 97% after
additional processing. This high level of acceptance provided confidence in the results of the ALCC
project. The total potential error in LC change estimates across Australia is shown in Table 8-5.  If the
interpretation that 10% of the change scenes failed (even though an additional 7.0% passed with
reprocessing), and the error on the failing change scenes is as high as 30%, the difference in LC change is
only 3.0%. It would be surprising if the error were as high as this, since no scenes failed for the state
(Queensland) with the largest amount (>80%) of clearing.  Thus, the error was likely to be distributed
among the states having the least amount of change.
Page 132 of 339

-------
Table 8-3. Comparative analysis for state's and analyst's sample unit change maps.
State
New South Wales

Northern Territory
Queensland
South Australia
Victoria
Tasmania
Western Australia
Continental Estimate
Source
State
Analyst
State
Analyst
State
Analyst
State
Analyst
State
Analyst
State
Analyst
State
Analyst
State
Analyst
Mean
0.003864
0.001831
0.000644
0
0.011593
0.011075
0.012995
0.005625
0.013600
0.006055
0.039160
0.014050
0.032650
0.020183
0.012898
0.008903
Standard
Deviation
0.007294
0.008726
0.000478
0
0.012670
0.025121
0.014141
0.027887
0.019658
0.029289
0.026792
0.026977
0.026897
0.061761
0.018069
0.032335
95%
Confidence
Interval
0-0.018160
0-0.018935
0-0.001579
0
0-0.036427
0-0.060313
0-0.040712
0-0.060284
0-0.052130
0-0.063462
0-0.091672
0-0.066925
0-0.085369
0-0.141235
0-0.048313
0-0.072279
Significance
Level*

n.s.

n.s.

n.s.

n.s.

n.s.

n.s.

n.s.

n.s.
      * Significance (p = 0.05) for differences between state and analyst's estimates
 Table 8-4. Comparative analysis for state's and analyst's sample unit change maps.
Source
State Change Maps
Analysts
Level
Continental
State
Scene
Continental
State
Scene
Sub-sample
Mean Range
0.012898
0.000644-0.039160
0-0.100215
0.008903
0.001831-0.020183
0.000041-0.068090
0-0.57344
Standard Deviation Range
0.018069
0.000478-0.026897

0.032335
0.008726-0.061761
0.000163-0.138222

  Table 8-5. Potential error in land-cover change estimates for Australia if 90% or
            97% of images are reliable, and the remaining 10% to 30% of images
            have various amounts of error.
Error Per Image
10%
20%
30%
90% Correct
1%
2%
3%
97% Correct
0.3%
0.6%
0.9%
                                                                        Page 133 of 339

-------
 8.4    Discussion and Conclusions

 The goal of this study was to provide an independent evaluation of the reliability of the estimates made by
 state agencies of LC change in Australia 1990/91-1995. Traditional approaches for assessing the
 accuracy of LC product derived from remote sensor data were determined to be inappropriate for an
 assessment of our LC change products because these methods require statistically sufficient class
 representation (n), and relatively homogeneous distribution (Congalton and McLeod, 1994; CGC, 1994).
 In our study LC change was a relatively rare event, and tended to be concentrated in relatively few areas.
 Reference data to support a traditional accuracy assessment approach were not available.

 The area-based method developed by Lowell (2001) was implemented to provide an independent estimate
 of change for which confidence intervals were calculated.  State estimate of change were then compared
 against these confidence intervals to provide a means of evaluating the accuracy of the state produced LC
 change products. State estimates were within the established 95% confidence intervals for 60 of the 67
 scenes initially tested.  The seven scenes that did not meet the acceptance criteria were reprocessed and
 retested, with five subsequently accepted. LC change rates were underestimated by the analysts or
 overestimated by the states for the two scenes not accepted. The method overcame the difficulties caused
 by the lack of a suitable reference data.  This is likely to be a common difficulty in large area studies of
 LC change. Suitable data reference data will rarely be available to match the multiple dates for LC
 change studies.  Even when multiple date data are available, obtaining a "true" change map will be
 difficult, since the overlay of multiple date LC is likely to introduce error (Lowell, 2001).

 We used an area-based sampling unit rather than a discrete sample-based approach because of the relative
 rarity of woody vegetation change,  and the difficulty (and cost) of sampling enough points across a
 change map to support a rigorous statistical assessment. Based on extensive testing of the sample unit
 size, Lowell (2001) determined that the 500 x 500 pixel sample unit provided stable estimates of the
 confidence intervals after relatively few sample units had been examined.  The present study
 demonstrated that a 500-pixel sample unit was a practical size for evaluation. When sample unit location
 had been automated, one operator could evaluate a change map with 33 samples in approximately 10
 hours. The area-based reliability method provided a cost-effective evaluation of the results of the ALCC
 project, and represented only 3.5% of the total project budget.

 Overall, the assessment demonstrated that the process of detecting LC change from TM data provided
 repeatable and reliable results. Different change techniques and approaches to radiometric calibration
 among individual states did not negatively impact results.  Because LC change was a relatively rare event
 the area based methodology had a considerable advantage over more traditional point-based evaluation
 methods that require a large number of points (n) to support a rigorous statistical analysis. The method
 was particularly useful  where suitable reference data for testing the change estimates are unavailable.

 Digital data sets and the final report are  available on CD ROM.  Copies can be obtained from the first two
 authors or downloaded (http://adl.brs.gov.au/ADLsearchA
Page 134 of 339

-------
8.5   Summary

Australia's first NGGI identified that land clearing could be contributing as much as 25% of Australia's
total greenhouse gas emissions.  These figures were regarded as very uncertain, and a collaborative
project was undertaken with eight state agencies using TM imagery and other data to document the rates
of change in woody vegetation 1990/91-1995.  The reliability of this project's results was assessed using
a method developed by Lowell (2001) for this purpose. Traditional methods of accuracy assessment \\ere
impractical given the large size of the study area, the relative rarity of the change detected and the lack of
an appropriate reference data set. Lowell's method was implemented to provide an independent estimate
of change against which state agency estimates were compared. The reliability assessment demonstrated
that the process of detecting land cover change from TM imagery was repeatable and provided consistent
results across states.  This approach may be useful in other environments where reference data suitable for
checking land cover change are unavailable.
 8.6   Acknowledgments

 The authors gratefully acknowledge the substantial efforts of the following state agencies which
 participated in the project: Agriculture Western Australia; New South Wales Land Information Centre.
 Department of Information Technology and Management; Northern Territory Department of Lands.
 Planning and Environment; Queensland Department of Natural Resources; South Australian Department
 of Primary Industries and Resources; Tasmanian Department of Primary Industry. Water and
 Environment; Victorian Department of Natural Resources and Environment; and the Western Australian
 Department of Land Administration.

 The authors would like to thank David Jupp (CS1RO Earth Observation Centre) for his contributions to
 the development of the independent LC change assessment method, and the consultants, the Royal
 Melbourne Institute of Technology University's Geospatial Science Initiative and Geoimage, which
 undertook the independent assessment. David Jupp, Eric Lambin (Catholic University. Louvain,
 Belgium) and the Australian Greenhouse Office are thanked for comments on an earlier manuscript. The
 Australian government and state government agencies jointly funded this project.
  8.7   References

  Barson, M.M., L.A. Randall, and V.M. Bordas. Land Cover Change in Australia.  Results of the
      collaborative Bureau of Rural Sciences - State agencies'project on remote sensing of land cover
      change.  Bureau of Rural Sciences, Canberra, Australia, 2000.

  Congalton, R., and R. McLeod.  Change detection accuracy assessment on the NOAA Chesapeake Bay
      pilot study.  Proceedings of the First International Symposium on the Spatial Accuracy of \atural
      Resource Data Bases, American Society for Photogrammetry and Remote Sensing. Bethesda,
      Maryland, pp. 78-87, 1994.
                                                                                   Page 135 of 339

-------
(CCC) Computer Graphics Centre.  Accuracy assessment of land cover change detection. CGC Report
    Number 101, North Carolina State University. Raleigh, North Carolina, 1994.

(DOEST) Department of Environment, Sport and Territories.  National Greenhouse Gas Inventory 1988
    and 1990.  National Greenhouse Gas Inventory Committee, Canberra, Australia, 1994.

Graetz, R.D., M.A. Wilson, and S.K. Campbell. Landcover Disturbance over the Australian Continent:
    a contemporary assessment. Biodiversity Series. Paper Number 7, Biodiversity Unit, Department of
    Environment, Sport and Territories, Canberra, Australia, 1995.

Jensen, J.J.  Introductory Digital Image Processing - a remote sensing perspective. Second Edition,
    Prentice Hall, New Jersey, 1996.

Jupp, D.L.B. Report on the development of an independent accuracy assessment methodology for the
    remote sensing of agricultural land cover change 1990 - 1995 project for the Australian continent.
    Bureau of Resource Sciences Report, Canberra, Australia, 1998.

Kitchin, M. and M Barson. Monitoring Land Cover Change.  Specifications for the Agricultural Land
    Cover Change 1990-1995 project (version 4). Bureau of Rural Sciences, Canberra, Australia, 1998.

Lowell, K.  Development of an independent accuracy assessment methodology for the remote sensing of
    agricultural land cover change 1990-1995 project for the Australian continent.  Report to the Bureau
    of Resource Sciences Report, Canberra, Australia, 1998.

Lowell, K.  An area based accuracy assessment methodology for digital change maps. Int. J. Remote
    Sensing, 22 (17), 3571-3596. 2001.

McDonald, R.C., R.F. Isbell, J.G. Speight, J. Walker, and M.S. Hopkins. Vegetation. In: Australian Soil
    and Land Survey:  Field handbook (Second Edition). Inkata Press, Melbourne, Australia, 1990.

(NFI) National Forest Inventory. Australia's State of the Forests Report  1998. Bureau of Rural Sciences,
    Canberra, Australia, 1998.

(NGGI) National Greenhouse Gas Inventory. National Greenhouse Gas Inventory Land Use Change and
    Forestry Sector 1990-1997. In: Workbook 4.2 and Supplementary Methodology.  Australian
    Greenhouse Office, Canberra, Australia, 1999.
Page 136 of 339

-------
                                Chapter 9


             Assessing the Accuracy of Satellite-Derived
Land-Cover Classification  Using Historical Aerial Photography,
     Digital Orthophoto Quadrangles, and Airborne Video Data

                                     by

                               Susan M. Skirvin1*
                               William G. Kepner2
                                Stuart E. Marsh3
                               Samuel E. Drake3
                                John K. Maingi4
                               Curtis M. Edmonds2
                             Christopher J. Watts5
                               David R. Williams6
 1  USDA Agricultural Research Service
   Southwest Watershed Research Center
   2000 E. Allen Road
   Tucson, AZ 85719

   *Corresponding Author Contact:

   Telephone:  (520) 670-6380, Ext. 149
     Facsimile:  (520) 670-5550
       E-mail:  sskirvin@tucson.ars.ag.gov
U.S. Environmental Protection Agency
National Exposure Research Laboratory
944 East Harmon Avenue
Las Vegas, NV 89119
   Arizona Remote Sensing Center
   Office of Arid Lands Studies
   University of Arizona
   1955 E. 6th Street
   Tucson, AZ 85719
Department of Geography
Miami University
Oxford, OH 45056
 5 DPTO. de Fisica
   Universidad de Sonora
   Hermosillo, Sonora, Mexico 83000
Lockheed Martin Environmental Services
980 Kelly Johnson Drive
Las Vegas, NV 89119
                                                                  Page 137 of 339

-------
9.1    Introduction
There is intense interest among Federal agencies, States, and the public to evaluate environmental
conditions at community, watershed, regional, and national scales. Advances in computer technology,
geographic information systems (GIS) and the use of remotely sensed image data have provided the first
opportunity to assess ecological resource conditions at a number of scales and to determine cross-scale
relationships between landscape composition and pattern, fundamental ecological processes, and
ecological goods and services.  Providing quantifiable information on the thematic and spatial accuracy of
land-cover (LC) data derived from remotely sensed sources is a fundamental step in achieving goals
related to performing large spatial assessments using space-based technologies.

Remotely sensed imagery obtained from Earth-observing satellites now spans three decades, making
possible the mapping of LC across large regions by the classification of satellite images.  However, the
accuracy of these derived maps must be known as a condition of the classification. Theoretically, the best
reference data against which to evaluate classifications are those collected on the ground at or near the
time of satellite overpass. However, such data are rarely available for retrospective multi-temporal
studies, thus mandating the use of alternative data sources. Accordingly, the U.S. Environmental
Protection Agency (EPA) has established a priority research area for the development and implementation
of methods to document the accuracy of classified LC  and  land characteristics databases  (Jones et al.,
2000).
                                  Upper San Pedro Watershed
To meet the ever-growing need
to generate reliable LC products
from current and historical
satellite remote sensing data, the
accuracy of derived products must be
assessed using methods that
are both effective and
efficient. Therefore,  our
objective was to
demonstrate the viability of
utilizing new high-
resolution digital
orthophotography, along
with other airborne data, as
an effective substitute when
historical ground sampled data were
not available. The achievement of
consistent accuracy assessment results using
these diverse sources of reference data
would indicate that these
techniques could be more
widely applied in              Figure 9-1.   Location of the Upper San Pedro River watershed study
retrospective LC studies.                     area with shaded relief map.
Page 138 of 339

-------
In this study, classification accuracies for four separate LC maps of the San Pedro River watershed in
southeastern Arizona, and northeastern Sonora, Mexico (see Figure 9-1) were evaluated using historical
aerial photography, digital orthophoto quadrangles, and high-resolution airborne video. Landsat
Multispectral Scanner (MSS) data (60 m pixels) were classified for years 1973, 1986, and 1992. Lastly,
1997 Landsat Thematic Mapper (TM) data (30 m pixels) were resampled to 60 m to match the MSS
resolution and classified.  All data were analyzed at the Institute del Medio Ambiente y el Desarrollo
Sustentable del Estado de Sonora (IMADES) in Hermosillo, Mexico. Map accuracy was assessed by
Lockheed-Martin (Las Vegas, Nevada) for 1973 and 1986 and at the University of Arizona (Tucson,
Arizona) for 1992 and 1997. This study incorporated previous accuracy assessment methods developed
for the San Pedro watershed by Skirvin et al. (2000) and Maingi et al. (2002).
9.2    Background

9.2.1   Upper San Pedro Watershed Study Area

The study location comprised the upper watershed of the San Pedro River, which originates in Sonora,
Mexico and flows north into southeastern Arizona. Covering approximately 7,600 km2 (5,800 km2 in
Arizona and 1,800 km2 in Sonora, Mexico), this area represents the transition between the Sonoran and
Chihuahuan deserts, and topography, climate, and vegetation vary substantially across the watershed.
Elevation ranges from 900-2,900 m and annual rainfall ranges from 300-750 mm. Biome types include
desertscrub, grasslands, oak woodland-savannah, mesquite woodland, riparian forest, and conifer forest,
with limited areas of irrigated agriculture.  Urban areas, including several small towns and the rapidly
growing U.S. city of Sierra Vista, are fringed by low-density development that also occurs far from
population centers. Numerous geospatial data sets covering the upper San Pedro watershed can be
viewed and downloaded at the  U.S. Environmental Protection Agency San Pedro website (U.S. EPA,
2000).

9.2.2  Reference Data  Sources for Accuracy Assessment

Aerial photography has long served in the creation of LC maps, both as a mapping base and more recently
as a source of higher resolution reference data for comparison with maps produced by classification of
satellite imagery. Coverage for the conterminous United States at 1:40,000-scale is available through the
National Aerial Photography Program (NAPP) and is scheduled for update on a 10-year repeat cycle.
Digital orthophoto quarter quadrangles (DOQQs) are produced from the  1:40,000-scale NAPP or
equivalent high-altitude aerial photography that has been ortho-rectified using digital elevation models
(DEMs) and ground control points of known location.  A DOQQ image pixel represents 1.0 m2 on the
ground, permitting detection of landscape features as small as approximately 2.0 m in diameter.
However, the image analyst may need site visits and/or supplementary higher resolution images to
visually calibrate for DOQQ-based LC interpretation.

 Marsh et al. (1994) described the utility of airborne video data as a cost-effective means to acquire
 significant numbers of reference data samples for classification accuracy assessment. In that study, very
 similar classification accuracies were derived from airborne video reference data and from  aerial color 35
                                                                                   Page 139 of 339

-------
mm reference photography acquired under the same conditions. With the addition of Global Positioning
System (GPS) coordinate data encoded directly onto the videotape for georeferencing, sample points can
be rapidly located for interpretation during playback.


9.2.3  Reporting Accuracy Assessment Results

The current standard for reporting results of classification accuracy assessment focuses on the error or
confusion  matrix, which summarizes the comparison of map class labels with reference data labels. Some
easily computed summary statistics for the error matrix include overall map accuracy, proportion correct
by classes  (user and producer accuracy), and errors of omission and commission.  Additional summary
statistics usually include a Kappa (Khat) coefficient that adjusts the overall proportion correct for the
possibility of chance agreement (Congalton et al., 1983; Rosenfield and Fitzpatrick-Lins, 1986;
Congalton and Green, 1999).  Although Kappa is widely used, some authors have criticized its
characterization of actual map accuracy (Foody, 1992).  Ma and Redmond (1995) proposed some
alternatives to the Kappa coefficient, including a Tau statistic that is more readily computed and easier to
interpret than Kappa.  Stehman (1997) reviewed a variety of summary statistics and concluded that
overall map accuracy and user and producer accuracies have direct probabilistic interpretations for a
given map, whereas other summary statistics must be used with caution. The error matrix itself is
recognized as the most important accuracy assessment result when accompanied by descriptions of
classification protocols, accuracy assessment design, source of reference data, and confidence in reference
sample labels (Stehman and Czaplewski, 1998; Congalton and Green,  1999; Foody, 2002).
9.3    Methods

Four LC maps for the Upper San Pedro River Watershed (see Plate 9-1) were generated using 1973, 1986
and  1992 North American Landscape Characterization (NALC) project MSS data (Lunetta et al., 1993)
and the 1997 TM data.  All images were coregistered and georeferenced to a 60 x 60 m Universal
Transverse Mercator (UTM) ground coordinate grid with a nominal geometric precision of 1.0-1.5 pixels
(60-90 m).

9.3.1   Image Classification

The same LC classes (n=lO) were used to develop all four maps (see Tables 9-1 & 9-2). Vegetation cover
classes represented very broad biome-level categories of biological organization, similar to the ecological
formation levels as described in the classification system for biotic communities of North America
(Brown et al., 1979). The classes included forest, oak woodland, mesquite woodland, grassland,
desertscrub, riparian, agriculture, urban, water, and barren and were selected after direct consultation with
the major land managers and stakeholder groups within the San Pedro watershed in Arizona and Mexico
(Kepner et al., 2000).

The classification process for each data set began with an unsupervised classification using the green, red
and near-infrared spectral bands to produce a map with 60 spectrally distinct classes. The choice of 60
Page 140 of 339

-------
classes was based on previous experience with NALC data that usually gave a satisfactory trade-off
between the total number of classes and the number of mixed classes. In this context, it proved helpful to
define a set of 21 intermediate classes, which were easier to relate to the spectral information. For
example, the barren class contained bare rock, chalk deposits, mines, tailing ponds, etc., which had
unique spectral signatures. Each class was then displayed over the false-color image and assigned to one
of the LC categories or to a mixed class.

Interactive manipulation of spectral signatures for each class permitted many of the mixed classes to be
resolved. The remaining mixed classes were separated into different categories using a variety of
ancillary information sources, such as topographic maps produced by the Mexican National Institute of
Statistics, Geography and Information (INEGI) (l:50,000-scale) and the U.S. Geological Survey
(1:24,000-scale). The ancillary information used depended on the image being analyzed; for example,
classification of the 1992 image relied heavily on field  visits to establish  ground control.  Five three-day
site visits were conducted from September 1997 to June 1998 to enable analysts to collect specific LC
data with the aid of GPS equipment.
                                                                         j Ctoudi (1992 to 1997 only)
            Oak Woodland
            Mesquite Woodland
     Plate 9-1. 1973,1986,1992 and 1997 land-cover maps of the Upper San Pedro River watershed
               with key to classes.
                                                                                       Page 141 of 339

-------
Forest
Oak
Woodland
Mesquite
Woodland
Grassland
Desertscrub
Riparian
Agriculture
Urban
(Low and
High Density)
Water
Barren
V
fr
e
P
V
u
Is
g
V
0
d
rr
V
h
e
le
Ic
g
d
V
0
c
a
0
V
P
rr
la
a
C
fc
T
in
T
h
s
C
S
re
a
re
c
A
(i
             Table 9-1.  Land-cover class descriptions for the Upper San Pedro Watershed.

                Vegetative communities comprised principally of trees potentially over 10m in height and
                frequently characterized by closed or multi-layered canopies. Species in this category are
                evergreen (with the exception of aspen), largely coniferous (e.g., ponderosa pine, pinyon
                pine), and restricted to the upper elevations of mountains that arise off the desert floor
                Vegetative communities dominated by evergreen trees (Quercus spp.) with a mean height
                usually between 6 and 15m.  Tree canopy is usually open or interrupted and singularly
                layered.  This cover type often grades into forests at its upper boundary and into semi-arid
                grassland below.
                Vegetative communities dominated by leguminous trees whose crowns cover 15% or more
                of the ground often resulting in dense thickets.  Historically maintained maximum
                development on alluvium of old dissected flood plains; now present without proximity to
                major watercourses.  Winter deciduous and generally found at elevations below 1,200m
                ^      —^   ,    .	.
                Vegetative communities dominated by perennial and annual grasses with occasional
                herbaceous species present.  Generally grass height is under 1m and they occur at
                elevations between 1,100 and 1,700m;  sometimes as high as 1,900m.  This is a landscape
                largely dominated by perennial bunch grasses separated by intervening bare ground or
                low-growing sod grasses and annual grasses with a less-interrupted canopy. Semi-arid
                grasslands are mostly positioned in elevation between evergreen woodland above and
                desertscrub below.	

                Vegetative communities comprised of short shrubs with sparse foliage and small cacti thai
                occur between 700 and 1,500m in elevation. Within the San Pedro river basin this
                community is often dominated by one of at least three species, i.e., creosotebush, tarbush
                and whitethorn acacia. Significant areas of barren ground devoid of perennial vegetation '
                often separate individual plants.  Many desertscrub species are drought-deciduous.
                ———^—.	.	       '
                Vegetative communities adjacent to perennial and intermittent stream reaches. Treescan—
                potentially exceed an overstory height of 10m and are frequently characterized by closed
                multi-layered canopies depending on regeneration. Species within  the San Pedro basin ar*^
                largely dominated by two species, i.e., cottonwood and Goodding willow.  Riparian specie
                are  largely winter deciduous.
                —,                                                         .	________^^^^^^^
                Crops actively cultivated and irrigated.  In the San Pedro River basin these are primarily"
                found along the upper terraces of the riparian corridor and are dominated by hay and alfalfe*
                They are minimally represented in overall extent (less than 3%) within the basin and are
                irrigated by ground and pivot-sprinkler systems.

                This is a land-use dominated by small ejidos {farming villages or communes), retirement
                homes, or residential neighborhoods (Sierra Vista).  Heavy industry is represented by a
                single open-pit copper mining district near the headwaters of the San Pedro River near
                Cananea, Sonora (Mexico),	
                —•	—.	
                Sparse free-standing wafer is available in the watershed. This category would be mostly
                represented by perennial reaches of the San Pedro and Babocomari rivers with some
                attached pools or represses (earthen reservoirs), tailings ponds near Cananea, ponds
                recreational sites such as parks and golf courses, and sewage treatment ponds east
                city  of Sierra Vista, Arizona^	
                	_	.	.	____^
                A cover class represented by large rock outcropping or active and abandoned mines
                (including tailings) that are largely absent of above-ground vegetation.
Page 142 of 339

-------
                      Table 9-2.  Upper San Pedro watershed land-cover
                                 classes:  absolute and relative areas.
                                 Representative values from 1997 land-
                                 cover classification.
Land Cover Class
Grassland
Desertscrub
Woodland Mesquite
Woodland Oak
Urban
Agriculture
Riparian
Forest
Barren
Water
Total
Area
(hectares)
263475
229571
101559
90540
16562
14530
9217
7193
6814
417
739878
Proportion of
Total Area (%)
36
31
14
12
2
2
1
1
1
<0.1
100
9. 3. 2  Sampling Design

Because available reference data only partially covered the study area, pixels within each map were not
equally likely to be selected for sampling; thus a tradeoff between practical constraints and statistical
rigor was necessary (Congalton and Green, 1999).  Sample points were selected using a stratified random
sampling design, stratified by LC area for each of the four accuracy assessments (Table 9-2). Reference
data covering the Mexican  portion of the study area were not available. The number of sample points was
calculated using the following equation based on binomial probability theory (Fitzpatrick-Lins, 1981):
where N = number of samples, p = expected or calculated accuracy (%), q = 100-p, E = allowable error,
and Z = standard normal deviate for the 95% two-tail confidence level = 1 .96.

For the lowest expected map accuracy of 60% with an allowable error of 5%, 369 sample points were
required. Under area-stratified sampling, rare classes of small total area (i.e., water and barren) would not
be sampled sufficiently to detect classification errors, so the minimum sample size was set to 20 where
available (van Genderen and Lock, 1977). Work by Congalton (1991) and Congalton and Green (1999)
suggests that sample sizes derived from multinomial theory are appropriate for comparing class
accuracies, with a minimum sample size of 50 per class; however, this goal was not attainable for rare
classes in this study.
                                                                                   Page 143 of 339

-------
After evaluation of selected sample points in each reference data set, an error matrix was constructed,
comparing map class labels to reference data labels for each LC classification. Overall map accuracy and
class-specific user and producer accuracies were calculated for each class.  A Khat (Cohen's Kappa) and
Tau (Ma and Redmond, 1995) were computed for the four error matrices, followed by a significant
difference test (Z-statistic) based on Khat values (Congalton and Green,  1999).


9.3.3  Historical Aerial Photography

Reference data for the 1973 and 1986 LC maps  were developed using aerial photography stereo pairs
covering the Arizona portion of the study area (1:40,000-scale).  A team including members from the
following disciplines conducted accuracy assessments: (1) photo interpreter; (2) image processing
specialist;  (3) CIS specialist; and (4) statistician. A preliminary study was conducted, using data
collected during a field trip to the study area, to  evaluate  the effectiveness and accuracy of using aerial
photographs to discriminate grassland, desertscrub, and mesquite woodland classes. These classes were
particularly difficult to distinguish on the aerial photographs.

9.3.3.1 Image Collection, Preparation, and Site Selection

Landsat MSS data registration and other data integrity issues were reviewed for the 1973 and 1986 maps.
These efforts included checking projection parameters and visual alignment using G1S data layers (i.e.,
roads, streams, digital raster graphics, and digital elevation models). Random sample points were
generated using DOQQs acquired in 1992 (1:25,000-scale), and individual sample points were located on
the aerial photographs using the DOQQs for accurate placement. A 180 x 180m interpretation grid was
generated and overlaid onto the LC maps.

Two mutually exclusive sets of sample points were generated for both 1973 and 1986 maps. The second
set of sample points served as a pool of substitute points when no aerial photographs were available for a
sample point in the first set. Whenever possible, pixels selected as sample sites represented the center of
a 3 x 3 pixel window representing a homogeneous cover  type.  For rare classes (e.g., water), pixel sample
points were chosen with at least six pixels in the window belonging to the same class. A total of 813
reference samples were used to assess the  1973 (n=429) and 1986 (n=384) maps. Multiple dates of aerial
photographs were used in assessment:  June  1971 and April 1972 (1973  map), and June 1983, June  1984
and September  1984 (1986 map).

9.3.3.2 Photograph Interpretation and Assessment

Photointerpreter training included using a subset of the generated sample points identified during visits to
the San Pedro watershed locations as interpretation keys.  To avoid bias, photointerpreters did not know
what classifications had been assigned to sample points on the digital LC maps.  To locate the randomly
chosen sample sites on the aerial photographs, the site locations were first displayed on the DOQQ.
Interpreters could then visually transfer the location of each site to the appropriate photograph by
matching identical spatial data such as roads, vegetation patterns, rock outcrops, or other suitable features
visible on the DOQQ and on the photograph.  Each transferred sample point was examined on
stereoscopic photographs and identified using the definitions shown in Table 9-1.  LC categories for each
sample point were recorded on a spreadsheet. A comment column  on the spreadsheet allowed the
Page 144 of 339

-------
interpreter to enter any notes about the certainty or ambiguity of the classification. The Senior
Photointerpreter checked the accuracy of 10% of the sample point locations and 15% of the spreadsheet
entries to assure completeness and consistency.  All LC class interpretations noted by a photointerpreter
as "difficult" were classified by consensus opinion of all the interpreters.


9.3.4  Digital Orthophoto Quadrangles

Approximately 60 panchromatic DOQQs acquired in 1992 for the U.S. portion of the study area were
available as reference data to evaluate the 1992  results. To obtain a precise geographic matching between
the DOQQs and the satellite-derived map, the 1992 source MSS  image data were geometrically registered
to an orthorectified 1997 TM scene, and the resulting transformation parameters were applied to the 1992
thematic map.

9.3.4.1  Interpreter Calibration

To effectively visualize conditions represented by the LC class descriptions (Table 9-1), University of
Arizona and 1MADES team members participated in a field visit to numerous sites in the San Pedro
watershed  study area,  including areas that were intermediate between classes. The analyst performing the
 1992 assessment also reviewed high-resolution color airborne video data for comparison with the
appearance of LC classes in the DOQQs. The video data was acquired over the watershed in 1995 and
vegetation in selected  frames at 1:200-scale was identified to species or species groups (Drake, 2000).
Irnage "chips" were extracted from the DOQQs as an aid to LC class recognition in the reference data
(Maingi et al., 2002).

9.3.4.2 Sample Point Selection

Generation of sample points from LC maps relied on a window majority rule. A window kernel of 3 x 3
 pixels was moved across each cover  class and resulted in selection of a sample point if a majority of six of
the nine pixels belonged to the same class.  This ensured that points were extracted from areas of
 relatively homogenous LC.  A 180 x 180 m DOQQ sample size was used to match the 3 x 3 pixel map
 window and a map class was assigned  and recorded for the DOQQ sample. A  total of 457 points were
 sampled to assess the 1992 map.


 9.3.5  Airborne  Videography

 Accuracy assessment of the 1997 LC map was performed using airborne color video data encoded with
 GPS time and latitude and longitude coordinates. The video data were acquired on May 2-5, 1997 and
 were therefore nearly coincident with the June Landsat TM scene. There were eleven hours of
 continuously recorded videography of the San Pedro Watershed for the area north of the US-Mexico
 border, acquired at a  flying height of 600 m above ground level. The nadir-looking video camera used a
 motorized 15X-zoom lens that was computer controlled to cycle every 12 seconds during acquisition,
 with a full-zoom view held for three seconds.  The swath width at wide angle was about 750 m, and
 approximately 50 m at full zoom. At full zoom, the ground pixel size was about 7.0 cm, and the frame
 was approximately 1:200-scale when displayed on  a 13-inch monitor. Although the nominal accuracy  of
                                                                                   Page 145 of 339

-------
the encoded GPS coordinates was only 100 m, ground sampling revealed that average positional accuracy
was closer to 40 m (McClaran et al., 1999; Drake, 2000). The video footage was acquired by flying
north-south transects spaced 5.0 km apart and the total flight coverage encompassed a distance of nearly
2000 km.

9.3.5.1  Video and CIS Data Preparation

The encoded GPS time and geographic coordinate data were extracted from the video into a spreadsheet
for each flight line.  Coordinate data from the spreadsheets were used to create GIS point coverages of
frames from each flight line.  Individual frames of the video data were identified during viewing by a time
display showing hours, minutes and seconds, in addition to a counter that numbered the 30 frames
recorded per second.  The time display information was included as an attribute to the GIS point
coverages, which were inspected for erroneous coordinate or time data indicated by points that fell  off the
flight lines or were out of time sequence; such points were deleted.

9.3.5.2  Video Sample Point Selection

To minimize the likelihood of video sample points falling on boundaries between cover classes, selection
of random sample points along the video flight lines was restricted to relatively homogeneous areas
within classes.  This was accomplished by applying a 3 x 3 diversity or variety filter to the 1997 map,
which replaced the center pixel in a moving window by the number of different data file values (cover
classes)  present in the window.  Pixels assigned the value of one therefore represented centers of 180 x
180 m homogeneous areas on the map. Background, clouds and cloud-shadowed pixels were excluded to
prevent  the selection of pixels that fell at the edge of the map, within openings in clouds or in cloud-
shadowed areas, where the adjacent cover classes were not known.

Video flight line coverages were overlaid on the map of homogeneous cover, and a subset of frames
falling on homogeneous areas (n = 4567) was drawn from all study area frames (n = 18,104). The  map
class under each subset frame was added as  an attribute to the "candidate frames" GIS point coverage for
stratification purposes.

9.3.5.3  Random Frame Selection  and Evaluation

Video sample points were drawn randomly from the homogeneous subset, stratified by map class area,
and were distributed throughout the Arizona portion of the study area. The water class was excluded
from analysis for lack of adequate reference data (n=6) and was not presented in the final error matrix.  A
surplus of approximately 15% over the calculated minimum number of frames needed for each cover
class was selected. The videography interpreter was provided with spreadsheet records containing  the
videotape library identifier, latitude, and longitude for each sample frame, along with GPS time for frame
location on the tape. A cover class was assigned to each sample point and recorded in the spreadsheet.

Although the accuracy of video frame  interpretation was not assessed in this study, it is expected to be
very high. Drake (1996) reported that LC identification of similar airborne videography at the more
detailed  biotic community level averaged 80% accuracy after only three hours of interpreter training.  The
interpreter for this study had substantial prior experience in both video frame interpretation and ground
sampling for videography accuracy assessment in this region.
Page 146 of 339

-------
9.4   Results

9.4.1  Aerial Photography Method

Results of accuracy assessment are presented in Table 9-3 (1973) and Table 9-4 (1986). Overall map
accuracies were similar at 70% for 1973 and 68% for 1986. Khat and Tau statistics were also similar at
0.62 and 0.59 (Khat), and 0.66 and 0.65 (Tau) for 1973 and 1986, respectively.  The user's and
producer's accuracies were similar to overall accuracy for all except the mesquite woodland and barren
classes, which showed substantially less than average accuracies in both years. The water class in 1973
had very low accuracies of 25% (producer's) and 10% (user's), and could not be assessed for 1986.

    Table 9-3.  Error matrix comparing aerial photo interpretation and 1973 digital land-cover
               classification, with producer's and user's accuracy by  class. Overall accuracy =
               70%. Tau = 0.66. Cohen's Kappa (Khat) = 0.62; standard error = 0.027.

1973 Land-Cover Class
Grand
1
2
3
4
5
6
7
8
9
10
Total
Reference
(Aerial Photo Interpretation Class)
1
19
1
0
0
0
0
0
0
0
0
20
2
1
33
1
0
0
0
0
0
0
0
35
Land-Cover Class
1. Forest
2. Woodland Oak
3. Woodland Mesquite
4. Grassland
5. Desertscrub
6. Riparian Forest
7. Agriculture
8 Urban
9. Water
10. Barren
Total
3
0
0
16
13
14
3
3
0
4
0
53
4
0
3
1
92
11
0
0
2
3
2
114
1973
Map Total
20
37
20
128
122
20
22
20
20
20
429
5
0
0
0
21
96
2
7
5
6
15
152
6
0
0
2
0
0
15
1
0
0
0
18
Photo-
interpreter
Total
20
35
53
114
152
18
11
13
5
8
429
7
0
0
0
0
0
0
10
0
1
0
11
8
0
0
0
0
0
0
0
13
0
0
13
Number
Correct
19
33
16
92
96
15
10
13
3
2
299
9
0
0
0
1
0
0
0
0
3
1
5
10
0
0
0
1
1
0
1
0
3
2
8
Producer's
Accuracy
(%)
95
94
30
81
63
83
91
100
60
25

Grand
Total
20
37
20
128
122
20
22
20
20
20
429
User's
Accuracy
(%)
95
89
80
72
79
75
45
65
15
10

                                                                                 Page 147 of 339

-------
  Table 9-4.  Error matrix comparing aerial photo interpretation and 1986 land-cover
             classification, with producer's and user's accuracy by class. Overall accuracy =
             68%. Tau = 0.65.  Cohen's Kappa (Khat) = 0.61; standard error = 0.029.

1986 Land-Cover Classes
1
2
3
4
5
6
7
8
10
Grand Total
Reference
(Aerial Photo Interpretation Class)
1
19
3
0
0
0
0
0
0
0
22
2
1
35
0
0
0
0
0
0
0
36
Land-Cover Class
1. Forest
2. Woodland Oak
3. Woodland Mesquite
4. Grassland
5. Desertscrub
6. Riparian Forest
7. Agriculture
8. Urban
10. Barren
Total
3
0
0
17
12
8
0
1
0
3
41
4
0
1
3
77
13
1
4
5
10
114
1986 Map
Total
20
39
42
104
95
23
20
21
20
384
5
0
0
19
12
74
1
3
3
7
119
6
0
0
0
0
0
19
2
0
0
21
Photo-
interpreter
Total
22
36
41
114
119
21
13
14
4
384
7
0
0
1
1
0
2
9
0
0
13
8
0
0
0
0
0
0
1
13
0
14
Number
Correct
19
35
17
77
74
19
9
13
0
263
10
0
0
2
2
0
0
0
0
0
4

Grand Total
20
39
42
104
95
23
20
21
20
384
Producer's
Accuracy
(%)
86
97
42
68
62
91
69
93
0
User's
Accuracy
(%)
95
90
41
74
78
83
45
62
0

9.4.2  Digital Orthophoto Quadrangle Method

Accuracy assessment results are summarized in Table 9-5. Overall accuracy was about 75%, with Khat
of 0.70 and Tau of 0.72. The producer's accuracy was 100% for four classes (forest, urban, water,
barren), indicating that all pixels examined in the DOQQs for these classes were correctly labeled in the
1992 map. The user's accuracy was also high for forest and water classes, but was substantially less for
urban and barren classes at 44% and 55% respectively. Accuracies of mesquite woodland and grassland
classes were lower than for other classes.
Page 148 of 339

-------
Table 9-5.  Results of DOQ-based accuracy assessment of 1992 land-cover classification:  error
           matrix and producer's and user's accuracy by class. Overall accuracy = 75%. Tau =
           0.72. Cohen's Kappa (Khat) = 0.70; standard error = 0.025.

1992 Land-Cover Classes
1
2
3
4
5
6
7
8
9
10
Grand Total
Reference
(Digital Orthophoto Quads)
1
22
0
0
0
0
0
0
0
0
0
22
2
2
44
2
6
1
0
0
0
0
0
55
Land-Cover Class
1. Forest

2. Woodland Oak
~~3. Woodland Mesquite
^4. Grassland
~~5. Desertscrub
~~6. Riparian Forest
""7. Agriculture
8. Urban

9. Water
10. Barren
Total
3
0
0
40
12
8
0
1
2
1
0
64
4
0
3
9
68
11
0
0
1
0
7
99
1992 Map
Total
24
48
62
103
109
23
23
25
20
20
457
5
0
1
10
17
89
0
0
10
0
2
129
6
0
0
1
0
0
20
4
0
0
0
25
DOQ
Total
22
55
64
99
129
25
22
11
19
11
457
7
0
0
0
0
0
3
18
1
0
0
22
8
0
0
0
0
0
0
0
11
0
0
11
Number
Correct
22
44
40
68
89
20
18
11
19
11
342
9
0
0
0
0
0
0
0
0
19
0
19
10
0
0
0
0
0
0
0
0
0
11
11
Producer's
Accuracy
(%)
100
80
63
69
69
80
82
100
100
100

Grand Total
24
48
62
103
109
23
23
25
20
20
457
User's
Accuracy
(%)
92
92
65
66
82
87
78
44
95
55

9.4.3  Airborne Videography Method

Overall 1997 map accuracy was 72%, with Khat of 65% and Tau of 68% (see Table 9-6). A detailed
examination of results by cover class shows substantial variability in classification accuracy, with
producer's accuracies ranging from 54% to 100%, and user's accuracies from 13% to 100%.  For most
classes the two measures were roughly comparable and fell within the range of 60-90%. Exceptions were
the mesquite woodland class with accuracies around 50%, and agriculture and barren classes with
relatively high producer's accuracies (71%-100%), but lower user's accuracies (13%-21%).
                                                                               Page 149 of 339

-------
    Table 9-6. Results of video-based accuracy assessment of the 1997 land-cover
              classification: error matrix and user's and producer's accuracy by class.
              Overall accuracy = 72%.  Tau = 0.68.  Cohen's Kappa (Khat) = 0.65; standard
              error = 0.024.

1997 Land-Cover Classes
Grand
1
2
3
4
5
6
7
8
10
Total
Reference
(Video Frame Data)
1
20
2
0
0
0
0
0
0
0
22
2
4
50
1
8
4
0
0
0
0
67
Land-Cover Class
1. Forest
2. Woodland Oak
3. Woodland Mesquite
4. Grassland
5. Desertscrub
6. Riparian
7. Agriculture
8. Urban
9. Water
10. Barren
Total
3
0
0
27
16
4
0
1
0
2
50
4
0
3
13
113
12
0
0
0
0
141
1997 Map
Total
24
55
56
159
137
24
24
24
N/A
24
527
5
0
0
12
21
115
0
15
0
19
182
6
0
0
2
0
0
21
2
0
0
25
Video
Total
22
67
50
141
182
25
7
30
N/A
3
527
7
0
0
0
0
0
2
5
0
0
7
8
0
0
1
1
2
1
1
24
0
30
Number
Correct
20
50
27
113
115
21
5
24
N/A
3
378
10
0
0
0
0
0
0
0
0
3
3

Grand Total
24
55
56
159
137
24
24
24
24
527
Producer's
Accuracy
(%)
91
75
54
80
63
84
71
80
N/A
100
User's
Accuracy
(%)
83
91
48
71
84
88
21
100
N/A
13

9.5    Discussion

9.5.1   Map Accuracies

Statistics describing map accuracy were very similar among the four dates tested regardless of differences
in assessment methods and reference data. Overall map accuracies ranged from 67% to 75% and Tau
values from 0.65 to 0.72.  There were no statistically significant differences among Khat values (0.61 to
0.70) for all possible date comparisons.
Page 150 of 339

-------
One aspect of sampling that differed among the assessments was the application of homogeneity
standards to the context of map sample points. Selection was made from the center of uniform 3x3 pixel
windows for the 1973 and 1986 assessments, with an exception for rare cover classes requiring only a
majority of >5 pixels to match the center pixel. All sample points were selected from uniform 3x3
windows in the 1997 assessment. In contrast, for the 1992 assessment, a map class label was assigned as
the majority of >6 pixels within a 3 x 3 window centered on the sample point.  Although a positive bias
may have been introduced by sampling only in homogeneous areas (Hammond and Verbyla,  1996), this
effect was not apparent in results presented here.


9.5.2  Class  Confusion

For all dates evaluated, the producer's and user's accuracies tended to be similar to the overall
classification accuracies and ranged between 61%-100%. Generally low classification accuracies were
expected in a spatially heterogeneous setting such as the San Pedro watershed where cover types were
distributed in a patchy fashion across this landscape due to climatic and edaphic effects and land-use
practices.  Classes mapped with lower than average accuracy included the small-area agriculture, urban,
water and barren classes and the widespread mesquite woodland class. Factors likely to have contributed
to class confusions included: (1) LC changes between the dates of image and reference data (especially
for the 1973 and 1986 maps); (2) high spatial variability within classes (including areas dominated by soil
background reflectance); (3) variable interpretations of class definitions by independent assessment
teams; and (4) errors in  reference data interpretation. Geometric misregistration did not appear to be a
factor in the results presented here.

The agriculture class had higher producer than user accuracies for all dates and was most frequently
confused with riparian,  desertscrub and mesquite woodland classes. The spatial distribution of
agricultural areas in the watershed essentially outlined the riparian corridors, contributing to mixed pixel
spectral response and classification confusion. There may have been difficulty in distinguishing fallow
and abandoned agricultural  fields from adjacent desertscrub and mesquite woodland, since the spectral
response of these cover types was generally dominated by soil background.

 The urban class included low-density settlement on both sides of the border. Low-density development
 was difficult to distinguish from surrounding cover types even at the DOQQ scale, suggesting the
 possibility of error in both maps and reference data. The accelerating pace of development in the
 watershed, particularly in Arizona, may have contributed to cover changes occurring between the dates of
 imagery and reference data.

 The water class had the smallest area and was likely to have changed between the dates of images and
 reference data, due to the ephemeral nature of most surface water in this semi-arid environment.  For
 example, the 1973 NALC scene was acquired after a high rainfall El Nino-Southern Oscillation (ENSO)
 event during the winter of 1972-73 and portrayed wetter conditions than reference aerial photography
 acquired in 1971 and 1972 (Easterling et al., 1996; NOAA, 2001).  The water class was not evaluated in
  1986 and 1997 assessments due to insufficient representation in reference data.
                                                                                     Page 151 of 339

-------
The barren class was mapped with poor accuracy overall, including 0% correct in 1986.  This class was
most often confused with mesquite woodland, grassland, and desertscrub. These classes generally have
sparse vegetation cover, with many image pixels dominated by soil or rock spectral responses, and were
difficult to distinguish from truly barren areas at the MSS 60 m pixel size. A total of 38% of samples
interpreted as barren on reference aerial photography from 1971  and 1972 were mapped as water in 1973;
this was probably due to the inter-annual variations in precipitation mentioned above.

The mesquite woodland class may be interpreted as an indicator of landscape change in the San Pedro
watershed (Kepner et al., 2000; Kepner et al., 2002). Conversion of many grassland areas to shrub
dominance during the last 120 years  is well documented for this region (Bahre,  1991 and 1995; Wilson et
al., 2001), and these change detection results were of potential interest to many researchers. However,
both user and producer accuracies of all four dates were generally low for mesquite woodland (30% and
80% respectively for 1973 and 40% to 65% for other years). Class confusions included all but the forest
class, with especially large errors in the grassland and desertscrub classes. This result may substantially
reflect both the spatially and temporally transitional nature of the class and differences in interpretation
among the groups performing image classification and accuracy assessment. Additionally, it was likely
that the neither the spectral  nor the spatial resolution of MSS imagery was adequate to distinguish the
mesquite woodland class in  a heterogeneous semi-arid environment, where most pixels are mixtures of
green and woody vegetation, standing litter, and soils of varying brightness (Asner et al., 2000).


9.5.3  Future Research

Assessment of future LC classifications for the Upper San Pedro area should incorporate some measure of
the reference data variability, perhaps also allowing a secondary class label (Zhu et al., 2000; Yang et al.
2001). This may help to clarify the results for some cover classes. For example, the low accuracies and
class confusions associated  with the mesquite woodland class may have been due, in large part, to its
gradational nature. If the interpreter were able to quantify the confidence associated with reference point
interpretations, there would  not have been a need to select sample points from homogeneous map areas,
thus reducing the possibility of a  positive accuracy bias (Foody,  2002). Another useful tool for future San
Pedro LC  work is the map of all sample points used in the accuracy assessment.  Each point was
attributed  with geographic coordinates and both map and reference data labels (Skirvin et al., 2000).
These data could be applied to generate a geographic representation of the continuous spatial distribution
of LC errors (Kyriakidis and Dungan, 2001) to highlight especially difficult  areas that should be field-
checked or otherwise handled in the  future.
9.6   Conclusions

The results discussed in this report indicate that historical aerial photography, DOQQ data, and high
resolution airborne video data can be used successfully to perform classification accuracy assessment on
LC maps derived from historical satellite data. Archived aerial photographs may be the only reference
data available for retrospective analysis before 1992. However, their resolution (l:40,000-scale for
N APP data) often makes this task difficult. Successful use of DOQQ data requires precise geometric
registration of the LC map to allow the overlay of ortho-rectified DOQQs.  The use of georeferenced
Page 152 of 339

-------
high-resolution airborne videography as a proxy for actual ground sampling in accuracy assessment
provided the best method for current reference data development in the San Pedro watershed. The
advantages include: (1) cost-effective collection of a statistically meaningful number of sample points;
(2) effective control of coordinate locational error; and (3) variable-scale videography that permits the
identification of specific plant species or communities of interest. Additionally, the videography provides
a clear depiction of cultural features and land-use activities. The main limitation of this method is that
data are collected along pre-determined flight paths, thus constraining the sampling frame design.
9.7    Summary

Because the rapidly growing archives of satellite remote sensing imagery now span decades, there is
increasing interest in the study of long-term regional LC change across multiple image dates. However,
temporally coincident ground sampled data may not be available to perform an independent accuracy
assessment of the image-derived LC map products.  This study explored the feasibility of utilizing
historical aerial photography, DOQQs, and high-resolution airborne color video data to assess the
accuracy of satellite derived LC maps for the upper San Pedro River w atershed in southeastern Arizona
and northeastern Sonora, Mexico. Satellite image data included Landsat Multi-Spectral Scanner (MSS)
and Landsat Thematic Mapper (TM) data acquired over an approximately 25-year period.  Four LC
classifications were performed using three dates of MSS imagery (1973,  1986, 1992) and one TM  image
/1997). The TM imagery was aggraded from 30 m to 60 m the match the coarser MSS pixel size.

A stratified random sampling design was incorporated with samples apportioned by LC area, using a
rninimum sample size of n=20 for rare classes.  Results indicated similar map accuracies were obtained
 using the three alternative methods. Aerial photography provided reference data to assess the 1973 and
 1986 LC maps with overall classification  accuracies of 70% (1973) and 67% (1986).  Assignments of
class labels to sample points on 1992 reference DOQQs were verified by comparison with higher
 resolution airborne video data, with overall 1992 map classification accuracy of 75%. Accuracy
 assessment of the 1997 products used contemporaneous airborne color video data, and resulted in  an
 overall map accuracy of 72%. There was no evidence of positive bias in accuracy resulting from use of
 homogeneous versus heterogeneous pixel contexts in sampling the LC maps.

 The use of historical aerial photography, high-resolution DOQQs, and airborne videography as a proxy
 for actual ground sampling for satellite image classification accuracy has merit.  Selection of a reference
 data set for this study depended on the date of image  acquisition.  For example, prior to 1992, historical
 aerial photography was the only data available.  DOQQs covered the period since initiation of the high-
 resolution NAPP in 1992, and high resolution airborne videography provided a cost-effective means of
 acquiring many reference sample points near the time of image acquisition. Problems that were difficult
 to avoid included inadequate sampling of rare classes and reconciling cover changes between acquisition
 dates of aerial photography or DOQQs and satellite image data. Other issues, including the need  for
 consistent geometric rectification and criteria for mutually exclusive and reproducible LC class
 descriptions, need special attention when satellite image classification and subsequent LC map accuracy
 assessment are performed by different teams.
                                                                                     Page 153 of 339

-------
9.8   Acknowledgments

The U.S. Environmental Protection Agency, Office of Research and Development, provided funding for
this work. The authors wish to thank participants from U.S. EPA, Lockheed Martin  Environmental
Services, Institute del Medio Ambiente y el Desarrollo Sustentable del Estado de Sonora (1MADES), and
the Arizona Remote Sensing Center at the University of Arizona for their assistance.
9.9   References

Asner, G.P., C.A. Wessman, C.A. Bateson. and J.L. Privette.  Impact of tissue, canopy, and landscape
    factors on the hyperspectral reflectance variability of arid ecosystems. Remote Sens. Environ., 74, 69-
    84, 2000.

Bahre, C.J. A Legacy of Change. The University of Arizona Press, Tucson, AZ, 180 p., 1991.

Bahre, C.J. Human impacts on the grasslands of southeastern Arizona, In:  The Desert Grassland.
    M.P. McClaran and T.R. Van Devender (Editors), The University of Arizona Press, Tucson, AZ,
    264 p., 1995.

Brown, D.E., C.H. Lowe, and C.P. Pase, A digitized classification system for the biotic communities of
    North America, with community (series) and association  examples for the Southwest, Journal of the
    Arizona-Nevada Academy of Science. 14(Suppl. 1), 1-16, 1979.

Congalton, R.  A review of assessing the accuracy of classifications of remotely sensed data. Remote
    Sens. Environ., 37, 35-46, 1991.

Congalton, R.G., R.G. Oderwald, and R.A. Mead.  Assessing Landsat classification accuracy using
    discrete multivariate statistical techniques.  Photogrammetric Engineering and Remote Sensing
    49(12), 1671-1678, 1983.

Congalton, R.G., and K. Green.  Assessing the Accuracy of Remotely Sensed Data: Principles and
    Practices. CRC Press, Inc., Boca Raton, FL, 86 p., 1999.

Drake, S.E.  Visual interpretation of vegetation classes from airborne videography: an evaluation of
    observer proficiency with minimal training.  Photogrammetric Engineering and Remote Sensing,
    62(8), 969-978,  1996.

Drake, S.E.  "Climate-Correlative Modeling of Phytogeography at the Watershed Scale," dissertation
    presented to the University of Arizona, Tucson, Arizona, in partial fulfillment of the requirements for
    the degree of Doctor of Philosophy, 2000.

Easterling, D.R., T.R. Karl, E.H. Mason, P.Y. Hughes, D.P. Bowman, R.C. Daniels, and T.A. Boden.
    United States Historical Climatology Newark (U.S. HCN) Monthly Temperature and Precipitation
    Data. Revision  3, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory,
    Oak Ridge, TN, 1996.
 Page 154 of 339

-------
Fitzpatrick-Lins, K.  Comparison of sampling procedures and data analysis for a land-use and land-cover
    map. Photogrammetric Engineering and Remote Sensing, 47(3), 343-351. 1981.

foody, G.M.  On the compensation for chance agreement in image classification accuracy assessment.
    Photogrammetric Engineering and Remote Sensing, 58(10),  1459-1460. 1992.

Foody, G.M.  Status of land cover accuracy assessment. Remote Sens. Environ.. 80, 185-201,2002.

Hammond, T.O., and D.L. Verbyla.  Optimistic bias in classification accuracy assessment. //;/. J. Remote
    Sensing,  17, 1261-1266, 1996.

Jones, K.B., L.R. Williams, A.M. Pitchford, E.T. Slonecker, J.D. Wickham. R.V. O'Neill. D. Garofalo,
    and W.G. Kepner. A National Assessment of Landscape Change and Impacts to Aquatic Resources:
    a 10-Year Strategic Plan for the Landscape Sciences Program. EPA/600/R-00/001. U.S.
    Environmental Protection Agency, Office of Research and Development, Washington, D.C.. 2000.

Kepner, W.G., C.J. Watts, and C.M. Edmonds.  Remote Sensing and Geographic Information Systems for
    Decision Analysis in Public Resource Administration: A Case Stuck of 25 Years of Landscape
    Change in a Southwestern Watershed. EPA/600/R-02/039, U.S. Environmental Protection Agency,
    Office of Research and Development, Washington, D.C., 2002.

Kepner, W.G., C.J. Watts,  C.M. Edmonds, J.K. Maingi, S.E. Marsh, and G.  Luna.  A landscape approach
    for detecting and evaluating change in a semi-arid environment. Environmental Monitoring and
    Assessment, 64( 1), 179-195, 2000.

Lunetta, R.L., J.G. Lyon, J.A. Sturdevant, J.L. Dwyer, C.D. Elvidge, L.K.. Fenstermaker, D. Yuan.
    S.R. Hoffer, and R. Werrackoon.  North American Landscape Characteri:ation: Research Plan.
    EPA/600/R-93/135, U.S. Environmental Protection Agency, Las Vegas, NV, 419 p.. 1993.

Kyriakidis, P.C., and J.L. Dungan. A geostatistical approach for mapping thematic classification accurac}
    and evaluating the impact of inaccurate spatial data on ecological model predictions. Environmental
    and Ecological Statistics, 8, 311 -330, 2001.

Ma, Z., and R.L. Redmond. Tau coefficients for accuracy assessment of classification of remote sensing
    data, Photogrammetric Engineering and Remote Sensing. 61(4), 435-439. 1995.

Maingi, J.K., S.E. Marsh, W.G. Kepner, and C.M. Edmonds. An Accuracy Assessment of 1992 Landsat-
    MSS Derived Land Cover for the Upper San Pedro Watershed  (U.S./<\fexico).  EPA/600/R-02/040,
     U.S. Environmental Protection Agency, Office of Research and Development, Washington, D.C.,
     2002.

 Marsh, S.E., J.L. Walsh, and C. Sobrevila.  Evaluation of airborne video data for land-cover classification
    accuracy assessment in an isolated Brazilian forest.  Remote Sens. Environ., 48, 61-69. 1994.

McClaran, M., S.E. Marsh, D. Meko, S.M. Skirvin, and S.E. Drake. Evaluation of the Effects of Global
    Climate Change on the San Pedro Watershed:  Final Report. Cooperative Agreement Number A950-
    Al-0012 between the University of Arizona and the U.S. Geological Survey. Biological Resource
     Division, 1999.
                                                                                  Page 155 of 339

-------
NOAA. Climate Prediction Center: ENSO Impacts on the U.S.: Previous Events, Web page [accessed
    22 October 2002], available at http://www.cpc.ncep.noaa.gov/products/analysis monitoring/
    ensostuft7ensovears.html.

Rosenfield, G.H., and K. Fitzpatrick-Lins. A coefficient of agreement as a measure of thematic
    classification accuracy. Photogrammetric Engineering and Remote Sensing, 52(2), 223-227, 1986.

Skirvin, S.M., S.E. Drake, J.K. Maingi, S.E. Marsh, and W.G. Kepner. An Accuracy Assessment of 1997
    Landsat Thematic Mapper Derived Land Cover for the Upper San Pedro Watershed (U.S./Mexico).
    EPA/600/R-00/097, U.S. Environmental Protection Agency, Office of Research and Development,
    Washington, D.C., 2000.

Stehman, S.V.  Selecting and interpreting measures of thematic classification accuracy. Remote Sens.
    Environ., 62, 77-89,  1997.

Stehman, S.V., and R.L. Czaplewski.  Design and analysis for thematic map accuracy assessment:
    fundamental principles.  Remote Sens. Environ.,  64, 331-344, 1998.

U.S. EPA (U.S. Environmental Protection Agency).  Upper San Pedro River, Web page [accessed 17
    October 2002]. Available at http://www.epa.gov/nerlesdl/land-sci/html2/sanpedro  home.html.

van Genderen, J.L., and B.F. Lock.  Testing land use map accuracy. Photogrammetric Engineering and
    Remote Sensing, 43,  1135-37, 1977.

Wilson, T.B., R.H. Webb, and T.L. Thompson, Mechanisms of Range Expansion and Removal of
    Mesquite in Desert Grasslands of the Southwestern United States, General Technical Report RMR.S-
    GTR-81, U.S. Forest Service, Rocky Mountain Research Station, 2001.

Yang, L., S.V. Stehman, J.H. Smith, and J.D. Wickham. Thematic accuracy of MRLC land cover for the
    eastern United States. Remote Sens. Environ., 76, 418-422, 2001.

Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski. Accuracy assessment for the U.S. Geological
    Survey regional land-cover mapping program: New York and New Jersey region. Photogrammetric
    Engineering and Remote Sensing, 66(12), 1425-1435, 2000.
Page 156 of 339

-------
                                   Chapter 10

                   Using Classification Consistency in
      Inter-Scene Overlap Areas to Model Spatial Variations in
       Land-Cover Accuracy Over Large Geographic Regions

                                         by

                                    Bert Guindon1*
                                  Curtis M. Edmonds2
                                               2
                                                 U.S. Environmental Protection Agency
                                                 944 East Harmon Avenue
                                                 Las Vegas, NV 89119
1  Natural Resources Canada
  588 Booth Street
  Ottawa, Canada K1AOY7

  •Corresponding Author Contact:

   Telephone: (613)947-1228
    Facsimile: (613)947-1383
       E-mail: bert.guindon@ccrs.nrcan.gc.ca
10.1  Introduction

Over the past decade a number of programs have been undertaken to create definitive data sets of
processed satellite imagery that encompass national and global coverage at specific acquisition epochs
Initial initiatives include the Multi-Resolution Land Characteristics (MRLC), the North American
Landscape Characterization (NALC), and the GEOCover programs (Loveland and Shaw. 1996; Sohl and
pvvyer, 1998; Dykstra et al., 2000).  Subsequent initiatives have been spawned to generate information
layers from these data sets including the National Land Cover Data (NLCD) layer (Vogelmann et al.,
2001)- It is recognized that a quantitative assessment to characterize product accuracies is needed to
support their acceptance and application by the general scientific community (Zhu et al., 2000).  An
"ideal" accuracy assessment methodology for large-area products would meet the following objectives:
(a) provide an estimation of classification confidence; (b) effectively characterize spatial variations in
accuracy; (c) can be implemented coincident with the classification process (feedback mechanism); (d)
consistent and repeatable; and (e) sufficiently robust in design to support subsequent change detection
assessments.
                                                                          Page 157 of 339
                                                                               is.

-------
The most common approach to classification assessment is through the analysis of confusion matrices
(Congalton, 1991).  In this approach product classifications for a statistically robust number of samples
(n) are compared with "reference" data derived from an independent source (e.g., interpretation of aerial
photography). The cost of "reference" data acquisition represents a significant challenge. This results in
numerous limitations which include:  (1) only a small fraction of the area-of-interest is used in the
assessment process; (2) the content of a single confusion matrix is used to characterize the accuracy of
diverse areas (Zhu et al., 2000); (3) rare classes are frequently underrepresented (n)\ and (4) accuracy
characterization is limited to "macroscopic" levels (i.e., overall product and individual class levels).

Cost and logistics preclude highly detailed accuracy characterization based solely on conventional ground
reference data and therefore one must investigate complementary, albeit indirect, methods of accuracy
assessment. This chapter describes an assessment strategy based on classification consistency.  For most
land resources satellites (e.g., Landsat), extensive image overlap occurs between scenes from adjacent
World Reference System (WRS) frames.  For a given adjacent path/row pair, each scene provides a quasi-
independent classification estimate of those pixels resident in the overlap region.  Intuitively, we would
expect the level of classification agreement, hereafter referred to as classification consistency, to be
indicative of the absolute levels of classification accuracy (i.e., high levels of consistency should be
associated with high levels of classification accuracy).

The objectives here are to (1) establish a statistical link between classification consistency and both user's
and producer's accuracies, (2) develop an integrated accuracy assessment strategy to quantify
classification consistency and hence infer classification confidence, and (3) illustrate and assess  this
approach using synoptic land-cover (LC) products.
10.2   Link Between Classification Consistency and Accuracy

To develop the statistical relationship between classification consistency for user's and producer's
accuracies consider the case of two adjacent scenes, hereafter referred to as scenes number 1 and 2.  If
each scene is independently classified to a common scheme, the overlap region can be used to quantify
the classification consistency. For example, the consistency of class A in scene number 1 can be written
as:
                 M                   M
          c1A=  SP^P™  /(  SNTPITA).
                                      =]
(1)
where C,A = the consistency, is defined as the fraction of overlap pixels, classed as A in scene number 1
that are also classed as A in scene number 2, M = the number of classes, PkTA = the probability that a pixel
of true class T is labelled as class A in scene number k and NT = number of true class T pixels in the
overlap region. Note that PkTT is the producer accuracy of class T in scene k.

The user accuracy for scene number 1A will be equal to the ratio of the number of correctly classified
class A pixels to the total number labelled as A.
Page 158 of 339

-------
             = NAP1AA/(     NTP1TA).                                                         (2)
                           T=l
The restricted 2 -class scenario (i.e., classes A and B) provides useful insights for those classes within a
larger class mix whose labelling accuracy is limited primarily by pair-wise class confusion. In this case
Equation (1) reduces to:

         CIA = [f P1AA PZAA + P.BA P2BA] / [f P,AA + P1BJ,                                          (3)

where f is the ratio of numbers of true class A to true class B pixels. That is:

         f=NA/NB.                                                                            (4)

It can be seen that consistency is a function not only of the producer accuracies but also the relative class
proportions.  Similarly, user accuracy can be expressed as a function of producer accuracy and f.  For
example:

                           A + P,BA]                                                             (5)
If the two classifications are derived from similar data sources, e.g., scenes from the same sensor, each
scene will typically exhibit similar producer accuracies (i.e., P1AA = P2AA = P^, etc.). In this instance,
consistency and user accuracy will be the same for each scene:

         C,A = CM = CA =-[f PAAZ + PBA2]  / [f PAA + PBA].                                          (6)

         and

         Q,A = QZA = QA = f PAA / [f PAA + PBA].                                                  (7)
We have examined the relationships of consistency and user's accuracy as functions of producer's
accuracy and f for a range of parameters applicable to the Laurentian Great Lakes region in which LC has
been classed as either forest or non-forest. Producer's accuracies in the range 0.5 to 1.0 need only be
considered since 0.5 corresponds to random class assignment. Also, for this level of stratification, we
would expect high producer's accuracy performance (e.g., >0.8 with Landsat MSS data).  Finally, in the
Great Lakes region,/varies dramatically from approximately 0.1 in the agricultural south to 10 in the
north for forested land and vice versa for non-forested land.

Figures 10-1 and 10-2 illustrate the relationships of consistency and user accuracy with producer accuracy
respectively for f values ranging from 0.1 to 10 and a nominal class B producer's accuracy of 0.8. These
results are typical of a range of realistic cases. From an inspection of these plots we can draw a number
of conclusions:  (1) both consistency and user's accuracy increase monotonically with producer's
accuracy suggesting that consistency is an indicator of classification accuracy performance; and (2)
consistency and user's accuracy exhibit similar sensitivities to/  We hypothesize that consistency can be
                                                                                     Page 159 of 339

-------
employed as a "surrogate" of user's accuracy to monitor variations in accuracy at scene-level spatial
scales.
   o
   c
   0)
   o
   O
          Consistency as a Function of Producer's
       Accuracy for a Range of Class Proportions (f)
f=0.1
f=0.5
f=1.0
f=10.0
           0.5    0.6   0.7   0.8    0.9
                 Producer's Accuracy
Figure 10-1
Relationship of
classification consistency
as a function of producer's
accuracy for a range of
class proportions (f).  The
four cases shown span the
range of forested and non-
forested class proportions
encountered in scenes of
the Laurentian Great Lakes
watershed.
        User's Accuracy as a Function of Producer's
       Accuracy for a Range of Class Proportions (f)
                                               -0-f=0.1
                                               HD-f=0.5
                                               -A-f=1.0
                                               -0-f=10.0
                 0.6    0.7   0.8   0.9     1
                 Producer's Accuracy
          Figure 10-2
          User's accuracy as a
          function of producer's
          accuracy for a range of
          class proportions (f}.  The
          four cases shown spanned
          the range of forested and
          non-forested class
          proportions encountered in
          scenes of the Laurentian
          Great Lakes watershed.
Page 160 of 339

-------
10.3   Using Consistency Within a Classification Methodology

Our approach for applying consistency measures is dependent on the specific algorithms and
methodologies employed for our study area. The following discussion addresses key aspects of our Great
Lakes LC methodology and how they incorporate consistency and address our accuracy objectives.
Figure 10-3 illustrates the overall data processing flow.

  •  Each Landsat scene is independently classified and composited with other scenes to generate a final
    large area LC product. This approach was labor-intensive and is suitable primarily for synoptic
    mapping (i.e., categorization into a few broad classes).  However, it did have a number of important
    practical advantages which included:  (1) the thorough exploitation of image information content; and
    (2) consistency analyses was undertaken on each scene by comparing its classification with those of
    its nearest four neighbours (cross- and along-track). Thus, regional variations in classification
    accuracy, arising from inter-scene quality  differences and spatial diversity in class proportions, were
    monitored at the scene level.

  • Scene classification was achieved through unsupervised spectral clustering (K-means algorithm, 150
    clusters), followed by cluster labelling. For synoptic mapping (i.e., <10 classes), each class was
    described by a number of clusters (5-50).  Cluster-based classification had some important
    ramifications for accuracy considerations  including: (a) the true "unit of classification" was the cluster
    since it was at this level that label decision-making occurs; (b) since each class was represented by a
    number of clusters, we did not expect that the labelling of each cluster would be equally reliable; and
    (c) if consistency was evaluated at the cluster level and not at the "conventional" class level, it
    provided a better model of "microscopic" aspects of user's accuracy and an accuracy estimate closer
    to the individual pixel level than conventional class-level assessment methods.

  • Accuracy assessment was undertaken during the LC product generation process. Inter-scene
    classification comparison identified potentially mislabelled clusters since these exhibited low
    classification consistency levels. The statistical foundation for "grading" cluster label quality is
     described elsewhere (Guindon and Edmonds, 2002). Suspect clusters were then revisited and re-
     labelled before the scene classifications were composited into the final product.

  •  Consistency played a pivotal role in the classification compositing process. Consistency can be
     viewed as an indicator of the "confidence" that can be assigned to the accuracy of the class label.  For
     overlap regions, relative consistency was used to select the most likely correct classification if two or
     more  scenes predict conflicting class labels. Additionally, net consistency or confidence was
     accumulated during compositing leading to a confidence overlay, sampled at the pixel level for the
     final product. This layer encapsulated (1) parent cluster confidence, (2) the spatial distribution of
     available image data, and (3) inter-scene  information agreement where multiple scene coverage is
     available.  As such it provided a valuable ancillary product both for accuracy assessment and to
     support post-production interpretation activities.
                                                                                      Page 161 of 339

-------
Archival
Landsat Scenes (
for a Given Epoch
Land-Cover /
Classifications ^
Classification
Confidence
Based on
Consistency
Land-Cover +
Accumulated
Confidence
Scene #1^ (jScene #2^) ^Scene #3
1 1 1
Classify Individual Scenes
1 1 1
Classf n #y (Classf n #2y (ciassf n #3
i
Analyze Inter-Scene Classification
Consistency
1 1 1
1 1 I
Composite Scene Classifications
t
C^Large-Area Land-Cover PrpcJucP]>
)
N
;
)

                                                                         Figure 10-3
                                                                         Schematic diagram
                                                                         illustrating the
                                                                         processing flow used
                                                                         in the Laurentian
                                                                         Great Lakes land-
                                                                         cover mapping
                                                                         initiative.
                                                                         Classification
                                                                         consistency was
                                                                         used both to check
                                                                         individual scene
                                                                         classifications and in
                                                                         the classification and
                                                                         compositing process
                                                                         to rationalize
                                                                         multiple
                                                                         classifications in
                                                                         overlap regions and
                                                                         to generate a
                                                                         classification
                                                                         confidence layer.
10.4  Great Lakes Results


The classification and accuracy assessment methodologies outlined above were implemented using
QUAD-LACC (Guindon, 2002). Here we will illustrate example outputs relevant to the accuracy
components. These processing examples were drawn from the creation of two synoptic LC products, of
the mid 1980s and early 1990s NALC epochs. Each was sampled at 6" (longitude) by 4" (latitude) or
approximately  140 m and included four general cover classes (i.e., water, forest, urban or developed, and
other non-forest land).  For illustrative purposes, here we stratified the cover classes into two categories
(i.e., forest versus non-forest).

A total of 5,300 reference sites were identified within extended regions of thematically homogeneous
cover based on supporting evidence from aerial photography interpretation and topographic map
inspection. They represented the spectral dispersion of each class.  Since each Landsat scene
encompassed 100-150 sites, the classification labels of pixels within 5x5 pixel neighborhoods of each
site was analyzed to derive estimates of producer's accuracy. These estimates were optimistic since
pixels near inter-class boundaries are not included and should not be viewed as a measure of accuracy in
the absolute sense.
Page 162 of 339

-------
10.4.1  Variation of Consistency Among Clusters of a Given Class
                                              Distribution of Consistency Measures for
                                                   Forest Clusters of Scene 16/29
Classification consistency analysis was undertaken on a scene once it and its immediate neighboring
scenes have been classified.  The scenes from adjacent paths were most important since they provide the
greatest overlap and were not temporally correlated to the central scene.  Using QUAD-LACC,
consistency evaluations were performed at the cluster level with each cluster assigned an integral
consistency measure of 0.0 to 10.0 corresponding to a range of classification agreement of 0.0 to 100%.
As an example, we use the case of scene 16/29 from the 1990s epoch. The LC of this scene was
approximately equally divided between forest and non-forest classes with the forest class encompassing a
total of 52 clusters. An analysis of the two cross-track overlap regions (i.e., with scenes 15/29 and 17/29),
indicated that 76.4% of 710,610 overlap pixels, classed as forested in scene 16/29, were also labelled as
forest in  one of the cross-track neighboring scenes leading to an overall class measure of 8.0. For the
hypothesis that all  clusters were
equivalent in terms of consistency, we
estimated the approximate dispersion in
cluster consistency measures from
binomial theory (Thomas and Allcock,
1984). Assuming equal pixel
populations per cluster, the predicted
1.0-sigma spread in consistency among
clusters should be only ± 0.05% (i.e.,
practically all clusters should exhibit a
consistency measure of 8.0).
Figure 10-4 shows the spread in
observed consistency measures for the
clusters of scene 16/29. Note that the
histogram contained 104 entries since
each overlap region provides an
independent measure estimate for each
cluster. The observed distribution was
much broader than predicted by the
binomial model, indicating that there is  Figure 1(M  Histogram of consistency levels for forest
a significant spread in classification                  clusters of scene 16/29.  The dispersion among
quality among clusters  and hence,                   values is indicative of the broad differences in
added accuracy information was                     classification "quality" among member clusters
available at the cluster  level.                        within a given class.
                                                         345678
                                                             Consistency Measure
10
10.4.2  Aspects of Scene-Based Consistency Overlays

Once a final classification was obtained for a given scene, a "confidence" overlay was produced where
the confidence value of each pixel corresponded to the consistency level of its parent cluster. Example
results are shown in Figure  10-5 for the 1980s path/row scene  17/29. Figure 10-5 illustrates the three
primary classes (i.e., water (dark), non-forested  land (medium grey) and forested (white)).  Figure 10-6
shows an enlargement of the confidence layer of the central portion of 10-5. The confidence range 0.0 to
                                                                                  Page 163 of 339

-------
10.0 was presented as a grey level scale from black to white. The following points are worthy of note: (1)
Water was easily recognized and hence the central portions of most water bodies exhibited a high,
uniform confidence level; (2) Pixels along inter-class boundaries, such as the edges of lakes or forest
patches, tend to be of low confidence (Figure 10-6). They are members of clusters containing primarily
"mixed" pixels and therefore have a low accuracy; and (3) Forested areas exhibit a slightly higher average
confidence than non-forested areas. This is related to the fact that this scene has more forest than non-
forest cover. Consequently, the population of pixels classed, as forest will contain a relatively lower
proportion of commission errors, resulting in a corresponding higher level of inter-scene classification
consistency.
        Figure 10-5.  LC classification of a portion of scene 17/29.  Three classes are
                     shown, water (dark), non-forest land (medium grey) and forest (bright).
Page 164 of 339

-------
       UASy~F»i
       r*^  > ^ * .T '••*••
       Figure 10-6. Confidence overlay, derived from cluster consistency analyses, for the
                   central quadrant of the classification where brightness is proportional
                   to confidence.
10.4.3  Aspects of the Accumulated Confidence Layer

Figures 10-7 and 10-8 illustrate a portion of a three-class LC product and accompanying confidence
overlay, respectively. The inter-scene overlap regions are readily distinguishable in Figure 10-8 by their
higher levels of accumulated confidence. In these regions, significant confidence variations still arise
either from conflicting classifications or information loss in one of the constituent scenes because of
cloud contamination (e.g., in central Michigan). Finally, in Figure 10-7 there are data gaps, appearing as
near-horizontal black lines, that arise because of along-track data loss during the pre-processing steps of
resolution reduction and haze removal (Guindon and Zhang, 2002).
                                                                                 Page 165 of 339

-------
 Figure 10-7. Three class (water (dark), non-forest (medium grey), and forest (bright)) land-cover
             product of the central portion of the Laurentian Great Lakes watershed.
Page 166 of 339

-------
Figure 10-8. Accumulated confidence layer for the classification where brightness is proportional
           to confidence. On the consistency scale described in the text, numerical values
           range from 0.0 to 10.0 in non-overlap regions and 0.0 to 40.0 in regions where up to
           four individual scenes contribute classification estimates.
                                                                             Page 167 of 339

-------
 10.4.4  Relationship of Accumulated Confidence and User's Accuracy

 The data set of 5,300, 5x5 pixel reference sites was used to investigate the relationship between
 accumulated confidence and user's accuracy. For each pixel, the appropriate reference class was
 compared to the assigned class of the final LC product. Confusion matrices were generated for pixels
 grouped according to number of contributing scenes and accumulated confidence. User's accuracies of
 both the forested and non-forested classes were computed for each matrix.  Figure 10-9 shows the
 relationship of user's accuracy versus accumulated confidence for those pixels whose classification is
 determined based on two scenes. The monotonic relationship between these variables confirms the earlier
 statistical arguments that consistency is a legitimate "surrogate" of user's accuracy.
                     User's Accuracy versus Accumulated Confidence
                                       (Consistency)
               o

               3
               O
               O
               (Ti

               OJ
               Cfi
     -&-Forest
     HU- Non-forest
                                5       10       15
                            Accumulated Confidence
20
           Figure 10-9.  Plot of user's accuracy versus accumulated confidence for
                       forested and non-forested reference sites located in areas
                       where two scene classifications are available. The results
                       indicated that classification confidence based on consistency
                       monotonically increases with increasing  user's accuracy and
                       therefore is a useful indicator of the latter.
Page 168 of 339

-------
10.5   Conclusions

Multiple scene LC products can be expected to exhibit significant internal variations in user accuracy.
Detailed characterization of this variability was not feasible using conventional ground reference
sampling because of cost and logistics.  However, the level of inter-scene classification consistency
provides an indirect "surrogate" measure and was used to gauge local accuracy. This alternative approach
was especially attractive for application with Landsat-based maps since extensive overlap areas exist for
adjacent orbital paths located in non-equatorial latitudes.

Consistency measures were effectively employed using a number of processing steps.  First, assessments
were evaluated at the cluster level thereby providing an estimation of performance at the level of the
labeling unit rather than only at the class level. Then, by analyzing the consistency during the product
generation phase, detection and correction of incorrectly labelled clusters was accomplished prior to the
creation of the final product thereby improving its quality. Finally, within the inter-scene overlap regions.
consistency served as a "compositing" criterion to select an optimum label and can be accumulated to
encapsulate the added confidence associated with multiple independent class estimations.
 10.6   Summary

 During the past decade, a number of initiatives have been undertaken to create systematic national and
 global data sets of processed satellite imagery. An important application of these data is the derivation  of
 large geographic area (i.e., multi-scene) LC products. These products exhibit internal variations in
 information quality for two principal reasons. First, they have been assembled from a multi-temporal mix
 of satellite scenes acquired under differing seasonal  and atmospheric conditions. Second, intra-product
 landscape diversity will lead to spatially varying levels of class commission errors. Detailed modeling  of
 these variations with conventional ground truth is prohibitively expensive and hence an alternative
 accuracy assessment method must be sought.

 In this chapter we presented a method for confidence estimation based on the analysis of classification
 consistency in regions of overlapping coverage between Landsat scenes from adjacent orbital paths and
 rows.  A LC mapping methodology has been developed that exploits consistency evaluation to
 CD improve scene-based  classification performance; (2) support the integration of scene classifications
 through compositing; (3) provide a detailed confidence characterization of the final product; and
 (4) conduct post-generation accuracy assessment. This methodology was implemented within a prototype
 mapping system, QUAD-LACC, to derive  synoptic LC products of the Laurentian Great Lakes
 watershed. It should be noted that other researchers have suggested using overlap regions to assess the
 accuracy of landscape metrics (Brown et al., 2000).
                                                                                      Page 169 of 339

-------
10.7   References

Brown, D.G., J.D. Duh, and S.A. Drzysga. Estimating error in the analysis of forest fragmentation change
    using North American landscape characterization (NALC) data. Remote Sens. Environ., 71, 106-117,
    2000.

Congalton, R.G.  A review of assessing the accuracy of classifications of remotely sensed data. Remote
    Sens. Environ., 3 7, 3 5-46, 1991.

Dykstra, J.D., M.C. Place, and R.A. Mitchell.  GEOCOVER-ORTHO: Creation of a seamless
    geodetically accurate, digital base map of the entire earth's land mass using Landsat multispectral
    data. Proceedings of the ASPRS 2000 Conference, Washington, D.C., 7, pp. 2000.

Guindon, B. QUAD-LACC: A proto-type system to generate and interpret satellite-derived land cover
    products. Proceedings of the IGARSS2002 Symposium, Toronto, Ontario, pp. 1459-1461, June 24-28,
    2002.

Guindon, B. and, C.M. Edmonds. Large-area land cover mapping through scene-based classification
    compositing. Photogrammetric Engineering and Remote Sensing, 68(6), 589-596, 2002.

Guindon, B., and Y. Zhang. Robust haze reduction: an integral processing component in satellite-based
    land cover mapping.  Proceedings of the Joint International Symposium on Geospatial Theory,
    Processing and Applications, Ottawa, Ontario, CD, 5 p., July 8-12, 2002.

Loveland, T.R., and D.M. Shaw. Multiresolution land characterization: Building collaborative
    partnerships. In: GAP Analysis: a Landscape Approach to Biodiversity Planning (J.M. Scott and
    F. Davis, Editors), ASPRS, Bethesda, MD, pp. 83-89,1996.

Sohl, T.L., and J.L. Dwyer. North American landscape characterization project: The production of a
    continental scale three-decade Landsat data set.  Geocarto International, 13, 43-51,1998.

Thomas, I.L., and C.M. Allcock. Determining the confidence level for a classification. Photogrammetric
    Engineering and Remote Sensing, 50, 1491-1496, 1984.

Vogelmann, J.E., S.M. Howard, L. Yang, C.R. Larson, B.K. Wylie, and N. Van Driel. Completion of the
    1990s national land cover data set for the conterminous United States from Landsat Thematic Mapper
    data and ancillary data sources.  Photogrammetric Engineering and Remote Sensing, 67, 650-662,
    2001.

Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski. Accuracy assessment for the U.S. Geological
    Survey regional land-cover mapping program: New York and New Jersey region. Photogrammetric<
    Engineering and Remote Sensing, 66, 1425-1435, 2000.
.Page 170 of 339

-------
                                              by

                                    Phaedon C. Kyriakidis1*
                                        Xiaohang Liu2
                                    Michael F. Goodchild1
      Department of Geography
      University of California at Santa Barbara
      Santa Barbara, CA 90106

      * Corresponding Author Contact:
      Telephone: (805) 893-2266
        Facsimile: (805)893-3146
           E-mail: phaedon@qeoa.ucsb.edu
Department of Geography and
   Human Environmental Studies
San Francisco State University
San Francisco, CA 94132
HH.H

Thematic data derived from remotely sensed imagery lie at the heart of a plethora of environmental
models at local, regional, and global scales. Accurate thematic classifications are therefore becoming
increasingly essential for realistic model predictions in many disciplines. Remotely sensed information
and resulting classifications, however, are not error free, but carry the imprint of a suite of data
acquisition, storage, transformation, and representation errors and uncertainties (Zhang and Goodchild,
2002).  The increased interest in characterizing the accuracy of thematic classification has promoted the
practice of computing and reporting a set of different, yet complementary, accuracy statistics all derived
from the confusion matrix (Congalton, 1991; Stehman, 1997; Congalton and Green, 1999; Foody, 2002).
Based on these accuracy statistics, users of remotely sensed imagery can evaluate the appropriateness of
different maps on their particular application, and subsequently decide to retain one classification versus
another.

Accuracy statistics, however, express different aspects of classification quality, and consequently appeal
differently to different people, a fact that hinders the use of a single measure of classification accuracy
(Congalton, 1991; Stehman, 1997, Foody, 2002). Recent efforts to provide several measures of map
accuracy based on map value (Stehman, 1999) constitute a first attempt to address this problem, but in
practice map accuracy is still communicated in the form of confusion-matrix-based accuracy statistics.
The confusion matrix, and all derived accuracy statistics, however, is a regional (location-independent)
measure of classification accuracy: it does not pertain to any pixel or sub-region of the study area. For
example, user's accuracy denotes the probability that any pixel classified as forest is actually forest on the
                                                                                 Page 171 of 339

-------
ground. In this case, all pixels classified as forest have the same probability of belonging to that class on
the ground, a fact that does not allow identification of pixels or sub-regions (of the same class) that
warrant additional sampling.  A new sampling campaign based on this type accuracy statistic, would just
place more samples at pixels allocated to the class with the lower user's accuracy measure, irrespectively
of the location of these pixels and their proximity to known (training) pixels. In other words, confusion-
matrix-based accuracy assessment has no explicit spatial resolution; it only has explicit class resolution.

In this work, we capitalize on the fact that conventional (hard) class allocation is typically based on the
probability of class occurrence at each particular pixel calculated during the classification procedure.
Maps of such posterior probability values portray the spatial distribution of classification quality, and are
extremely useful supplements to traditional accuracy statistics (Foody et al., 1992). As opposed to
confusion-matrix-based accuracy assessment, such maps could identify pixels of the same category where
additional sampling is warranted, based precisely on a measure of uncertainty regarding class occurrence
at each particular pixel.

Evidently, the above classification uncertainty maps will depend on the classification algorithm adopted.
Conventional classifiers typically use the information brought by reflectance values (feature vector)
collocated at the particular pixel where classification is performed. In some cases, however, classes are
not easily differentiated in the spectral (feature) space, due to either sensor noise or to the inherently
similar spectral response of certain classes. Improvements to the above classification procedures could be
introduced in a variety of ways, including geographical stratification, classifier operations, post-
classification sorting, and layered classification (Hutchinson, 1982; Jensen, 1996; Atkinson; Lewis,
2000). The above methods enhance the classification procedure by introducing, explicitly or implicitly,
contextual information (Tso and Mather, 2001). Within this contextual classification framework, one of
the most widely used avenues of incorporating ancillary information is that of pixel-specific prior
probabilities (Strahler, 1980;  Switzer et al., 1982).

Along these lines,  we propose a simple, yet efficient, method for modeling pixel-specific context
information using  geostatistics (Isaaks and Srivastava, 1989; Cressie, 1993; Gooyaerts, 1997).
Specifically, we adopt indicator kriging to estimate the conditional probability that a pixel belongs to a
specific class, given the nearby training pixels and a model of the spatial correlation for each class
(Journel, 1983; Solow, 1986; van der Meer, 1996).  These context-based probabilities  are then combined
with conditional probabilities of class occurrence derived from a conventional (non-contextual)
classification via Bayes' rule to yield posterior probabilities that account for both spectral and spatial
information. Steele (2000) and Steele (2001) used a similar approach based on Bayesian integration of
spectral and spatial information, the latter being derived using the nearest neighbor spatial classifier. In
this work, we also use Bayes' rule to merge spatial and spectral information, but we use the indicator
kriging classifier that incorporates texture information via the indicator covariance of each class. De
Bruin (2000) and Goovaerts (2002) also adopted similar approaches using indicator kriging, but did not
link them to contextual classification. This research extends the above approaches in a formal contextual
classification framework, and illustrates their use for mapping thematic classification uncertainty.

Once posterior probabilities of class occurrence are derived at each pixel, they can be converted to
classification accuracy values. In this work, we distinguish between classification uncertainty and
classification accuracy: A measure of classification uncertainty, such as the posterior probability of class
occurrence, at a particular pixel does not pertain to the allocated class label at that pixel, whereas a
measure of classification accuracy pertains precisely to the particular class label allocated at that pixel.
We propose a simple procedure for converting posterior probability values to classification accuracy
values, and we illustrate its application in the case study section of this chapter using a realistically
simulated data set.
Page 172 of 339

-------
HH.2  Methods

Let C(n) denote a categorical random variable (RV) at a pixel with 2D coordinate vector m = («„ «2)
within a study area A. The RV C(m) can take K mutually exclusive and exhaustive outcomes
(realizations): (c(m) = clc,k = 1,..., K}, which might correspond to K alternative land cover types.  In this
chapter, we do not consider fuzzy classes, i.e., we assume that each pixel on is comprised only of a single
class, and do not consider the case of mixed pixels.

Let/>t[c(rai)] = Prob{C(ti) = ct} denote the probability mass function (PMF) modeling uncertainty about
the fc-th class ck at location a. In absence of any relevant information, this probability Pk[c(m)] is deemed
constant within the study area A, i.e.: pt [c(m)] =pl[,VueA.  For the set of K classes, these K probabilities
are typically estimated from the class proportions based on a set of G training samples cg = [c(m^, g =
                                      1  C"
1,..., G]' within the study area A, s&:p\ =— £ »0ng), where it(ng)= 1 if pixel ng belongs to the £-th
                                     G g=\

class, 0 if not (superscript' denotes transposition). In a Bayesian classification framework of remotely
sensed imagery, these K probabilities {ph k =  I,..., K} are termed prior probabilities, because they are
derived before the remote sensing information is accounted for.

11.2.1   Classification Based on Remotely Sensed Data

Traditional classification algorithms, such as the maximum likelihood (ML) algorithm, update the prior
probability pt of each class by accounting for local information at each pixel w derived from reflectance
data recorded in various spectral bands.  Given a vector s(ra) = [x,(mi)>—^sOO]'  of reflectance values at a
pixel E in the study area, an estimate of the conditional (or posterior) probability />t[c(mi)|s(ra)] =
Prob{C(im) = cjs(nn)} for a pixel m to belong to the fc-th class can be derived via Bayes' rule as:


           •r  /• M  / vi   n   L- r™ \    I   i M   P*[s(u)  c(u) = C*]-  D
          p t [c(m) s(m)] = Prob {C(ui) = cJ  s(ta)} =         	—	  ^*                      m
                                                          •r^x^*                    **
where: p [s(ra)|c(n) = ck] = Prob'{Xt(n) = xt(u),...J[B(m) = xe(m) \c(m) = ck} denotes the class-
conditional multivariate likelihood function, i.e., the PDF for the particular spectral combination x(nn) =
[r,(iia),...^cB(iui)]' to occur at pixel m, given that the pixel belongs to class k.  In the denominator,
/[s(m)] = Prob'{Xt(n) = JC,(M),... ,XB(n) = xe(w)} denotes the unconditional (marginal) PDF for
the same spectral combination S(M) to occur at the same pixel. For a particular pixel m, this latter
marginal PDF is just a normalizing constant (a scalar). It is common to all K classes, i.e., does not affect
                                                            f£
the allocation decision, and it is typically computed as: /?'[S(E)] = X p' [x(u) \c(n)  = cj -p \, to ensure
                                                           *=i

that the sum of the resulting K conditional probabilities {p\ [c(w) \ S(M)], k = I,..., K} is 1.  The final
step in the classification procedure is typically the allocation of pixel in to the class cm with the largest
conditional probability: p'jc(m) \ s(ui)] = max{pj [c(u) \x(u)],k = l,...,K}. which is termed
maximum a posteriori (MAP) selection.      *

In the case of Gaussian maximum likelihood (GML), the likelihood function is 5-variate Gaussian, and
fully specified in  terms of the (Bxl) class-conditional  multivariate mean vector
                                                                                   Page 173 of 339

-------
m* =[E{Xb(n)\c(n) = ck},b = \,.

St = [Cov{Xb(n),Xb.(m) \ c(u) =
likelihood function then becomes:
                                   ,B]' and the (BxB) variance-covariance matrix

                                    ,M' = 1.....5] of reflectance values.  The exact form of the
                                                                                              (2)
where  Et| and £~' denote, respectively, the determinant and inverse of the class-conditional variance-
covariance matrix Et .

In many cases, there exists ancillary information that is not accounted for in the classification procedure
by conventional classifiers.  One approach to account for this ancillary information is that of local prior
probabilities, whereby the prior probabilities p\ are replaced with, say, elevation-dependent probabilities
p'k[c(n) | e(n)] , where e(w)  denotes the elevation or slope value at pixel u. Such probabilities are
location-dependent due to the spatial distribution of elevation or slope.

In the absence of ancillary information, the spatial correlation of each class (which can be modeled from a
representative set of training samples) provides important information that should be accounted for in the
classification procedure.  Fragmented classifications, for example, might be incompatible with the spatial
correlation of classes inferred from the training pixels.  This characteristic can be expressed in
probabilistic terms via the notion that a pixel in is more likely to be classified in class k than in class k\
i.e., pjc(m) | S(M)] > pt.[c(u) \ s(iu)] , if the information in the neighborhood of that pixel indicates the
presence of a A>class neighborhood. This notion of context is typically incorporated in the remote sensing
literature via Markov random field models (MRFs), see for example Li (2001) or Tso and Mather (2001)
for details.

11.2.2   Geostatistical Modeling of Context

In this chapter, we propose an alternative procedure for modeling context based on indicator geostatistics,
which provides another way for arriving at local prior probabilities pk [c(in) | cg ]  given the set of G class
labels £g = [c(mg ), g = 1, . . . , G] ' , see for example Goovaerts (1997). Contrary to the MRF approach, the
geostatistical alternative: (i) does not rely on a formal parametric model, (ii) is much  simpler to explain
and implement in practice, (iii) can incorporate complex spatial correlation models which could also
include large-scale (low-frequency) spatial variability, and (iv) provides a formal way of integrating other
ancillary sources of information to yield more realistic local prior probabilities.

Indicator geostatistics (Joumel, 1983; Solow, 1986) is based on a simple, yet effective, measure of spatial
correlation:  the covariance  
The indicator covariance 0"t(h) quantifies the frequency of occurrence of any two pixels of the same
category k, found to distance units apart. Intuitively, as the modulus of vector h becomes larger, that
frequency of occurrence would decrease. Note that the indicator covariance is related to the bivariate
Page 174 of 339

-------
probability Prob{lk (m + Bn) = 1, Ik (in) = 1} of two pixels of the same Jt-th category being to distance
units apart, and is thus related to joint count statistics. For an application of joint count statistics in
remote sensing accuracy assessment, the reader is referred to Congalton (1988).

Under second-order stationarity, the sample indicator covariance crj(to)  of the £-th category for a
separation vector to is inferred as:

                                                      1   G(h)
                                           
-------
When modeling context at pixel u via the local conditional probability /?j [c(u)|c K], the G(u) weights
{w*(ux), g = l,...,G(u)} for the k-th category indicators are derived per solution of the (ordinary indicator
kriging) system of equations:
                       «'=!
                       ,;«„>                                                                        (6)
where y/t denotes the Lagrange multiplier that is linked to the constraint on the weights, see Goovaerts
(1997) for details.  The solution of the above system yields a set of G(u) weights that account for:  (i) any
spatial redundancy in the training samples by reducing the influence of clusters, and (ii) the spatial
correlation between each sample indicator it(uK) of the k-th category and the unknown indicator /t(u) for
the same category.

A favorable property of OIK is its data exactitude: At any training pixel, the estimated probability
pi [c(u)|cx] identifies the corresponding observed indicator. For example: pi [c(ux)|cs] = it(ug).  This
feature is not shared by traditional spatial classifiers, such as the nearest neighbor classifier (Steele, et al.,
2001), which allow for misclassification at the training locations.  On the other hand, at a pixel u that lies
further away from the training locations than the correlation length of the indicator covariance model ak ,
the estimated OIK probability is very similar to the corresponding prior class proportion (i.e.,
pi [c(u) |  cx] a pk).  In short, the only information exploited by IK is the class labels at the training sample
locations, and their spatial correlation.  Nearby training locations, IK is faithful to the observed class
labels, whereas away from these locations IK has no other information apart from the K prior (constant)
class proportions {pk,k= \,...,K}.

 11.2.3  Combining Spectral and Contextual Information

Once the two conditional probabilities/^ [c(u)|x(u>] ar\dp'k [c(u)|cs] are derived from  spectral and spatial
 information, respectively, the goal is to fuse these probabilities into an updated estimate of the conditional
probability/?^ [c(u) | x(u),c^, ] = Prob{C(u) = ck x(u),c^,}, which accounts for both information sources. In
what follows, we will drop the superscript * from the notation for simplicity, but the reader should bear in
mind that all quantities  involved are estimated probabilities. In accordance with Bayesian terminology,
we will refer to the individual source conditional probabilities, />j [c(u)|x(u)J and pi [c(u)\cg], as pre-
posterior probabilities, and retain the qualifier posterior only for the final conditional probability
p*k [c(u)|x(u), CK] that accounts for both information sources.

 Bayesian updating of the individual source pre-posterior probabilities for,  say, the &-th class  is
accomplished by writing the posterior probability /»t[c(u)|x(u),cg] in terms of the prior probability/^ and
the joint likelihood function/? [x(u), cjc(u) = ck}\


                                  {C(u) = ct \ x(u),cg} = P ['(u)'C'  ' c(°) = C"}'P"
                                                                 p [x(u),cj
 Page 176 of 339

-------
where:

 p [x(u), cg | c(u) = ck ] = Prob{ AXu,) = x(Ul),..., *(us) = x(ufl), C(u,) = c,  ,..., C(u(;) = ci(, | c(u) = ct}
denotes the probability that the particular combination of B reflectance values and G sample class labels
occurs at pixel u and its neighborhood (for simplicity, G and G(u) are not differentiated notation-wise).
In the denominator,/? [x(u), CK] denotes the marginal (unconditional) probability, which can be expressed
in terms of the entries of the numerator using the law of total probability.

Assuming class-conditional independence between the spatial and spectral information, that is,
/7[x(u), tg \ c(u) = c k] = />[x(u) I c(u) = ck ] • p{dg, \ c(u) = ct], one can write:
Class-conditional independence implies that the actual class c(u) = ck at pixel u suffices to model the
spectral information independently from the spatial information and vice versa. Although conditional
independence is rarely checked in practice, it has been extensively used in the literature because it renders
the computation of the conditional probability tractable.  It appears in evidential reasoning theory
(Bohnam-Carter, 1994), in multisource fusion (Benediktsson et al., 1990; Benediktsson and Swain, 1992),
and in spatial statistics (Cressie, 1993).  The consequence of this assumption is that one can combine
spectra-derived and spatial derived probabilities without accounting for the interaction of spectral and
spatial information.

Using Bayes' rule, one arrives at the final form of posterior probability under conditional independence
(Lee, et al., 1987; Benediktsson and Swain, 1992):
    A[c(u) x(u),cj = ft [c(u) , x(u)] . Pk [c(u) , Cg j  ' A ^ , x(ojj. Pk ^ ! — )               0)
                                     Pk                             Pk
 where c(u) denotes the complement event of the k-th class, and pk denotes the prior probability for that
 event. In the case of three mutually exclusive and exhaustive classes, forest, shrub and rangeland for
 example, if the fc-th class corresponds to forest then the complement event is the absence of forest (i.e.,
 presence of either shrub or rangeland), and the probability for that complement event is the sum of the
 shrub and rangeland probabilities.

 In words, the final posterior probability/^ [c(u)|x(u), cg] that accounts for both sources of information
 (spectral and spatial) under conditional independence is a simple product of the spectra-based conditional
 probability pk [c(u)\ x(u)] and the space-based conditional probability pk [c(u) | cg ] divided by the prior
 class probability pk.  Each resulting probability ft [c(u)|x(u), cj is finally standardized by the sum

 V /»JC(U) I *(u),cj of all resulting probabilities over all A: classes to ensure a unit sum.
 *=i
                                                                                      Page 177 of 339

-------
A more intuitive version of the above fusion equation is easily obtained as:
                                                K
where the proportionality constant is still the sum T^ pk [c(u) | x(u),c J of all resulting probabilities,
which ensures that they sum to 1.0.              *=i
This version of the posterior probability equation entails that the ratio /jt[c(u)|x(u), c^,]//^ of the final
posterior probability /^ [c(u) | x(u), cj to the prior probability^ is simply the product of the ratio
pk [c(u)|x(u)]/pt of the spectra-derived pre-posterior probability/^ [c(u)|x(u)] to the  prior probability pk
times the ratio pk[c(u)\cK]/ pk of the derived pre-posterior probability /^ [c(u) | cx ] to the prior probability
pk. Note that this  is a congenial assumption whose consequences have not received much attention in the
remote sensing literature (and in other disciplines). Under this assumption, the final posterior probability
pk [c(u)|x(u), CK] can be seen as a modulation of the prior probability/?* by two factors:  The first factor pk
[c(u)\\(u)]/ pk quantifies the influence of remote sensing, while the second factor/^ [c(u)\cg]/pt
quantifies the influence of the spatial information.

Note that, in the above formulation, both information sources are deemed equally reliable, which need not
be the case in practice.  Although individual source pre-posterior probabilities in the  fusion Equation 9
can be discounted via the use of reliability exponents (Benediktsson and Swain, 1992; Tso and Mather,
2001), this avenue is not explored in this chapter due to space limitations.

11.2.4   Mapping Thematic Classification Accuracy

The set of A: posterior probabilities of class occurrence {pk, [c(u)|x(u), cj, k' = \,...,K} derived at a
particular pixel u can be readily converted into a classification accuracy value a(u). If pixel u is allocated
to, say, category ck, then a measure of accuracy associated with this particular class allocation is simply
o(u) =pk. Jc(u)|jc(u), CK], whereas a measure of inaccuracy (error) associated with this allocation is
1  - a(u) =1 - pk.  k [c(u)|jc(u), c^]. If such posterior probabilities are available at each pixel u, any
classified map product can be readily accompanied by a map (of the same  dimensions), which depicts the
spatial distribution of classification accuracy.

The accuracy value at each pixel u is a sole function of the K posterior probabilities available at that
pixel; different probability values will therefore yield different accuracy values at the same pixel.
Evidently, the more realistic is the set of posterior probabilities at a particular pixel u, the more realistic
the accuracy value at that pixel will be.  Consider for example, the set of K pre-posterior probabilities
(Pk- [c(u)|x(u)J, k' = \,...,K} derived from a conventional maximum likelihood classifier (subsection
11.2.1), and the set of K posterior probabilities (/v[c(u)|x(u), cj, k '= \,...,K} derived from the proposed
fusion of spectral and spatial information (subsection 11.2.3).  These two sets of probability values will
yield two different accuracy measures ac (u) and  af(u) at the same pixel u (subscripts c and/distinguish
the use of conventional versus fusion-based probabilities). It is argued that the use of contextual
information for deriving the latter posterior probabilities yields a more realistic accuracy map than that
typically constructed using the former pre-posterior probabilities derived from a conventional classifier
(Foody etal., 1992).
Page 178 of 339

-------
11.2.5   Generation of Simulated TM Reference Values

This section describes a procedure used in the case study (subsection 11.3.1) to realistically simulate a
reference classification and the corresponding set of six TM spectral bands.  Availability of an exhaustive
reference classification allows computation of accuracy statistics without the added complication of a
particular sampling design.

Starting from a raw TM imagery, a subscene is classified into L clusters using the Iterative Self-
Organizing Data Analysis Technique (ISODATA) clustering algorithm (Jensen, 1996).  These L clusters
are assigned into K known classes.  To reduce the degree of fragmentation in the resulting classified map,
the classification is smoothed using MAP selection within a window around each pixel u (Deutsch, 1998).
The resulting land-cover (LC) map is regarded as the exhaustive reference classification.

Based on this reference classification, the class-conditional joint PDF of the six TM bands is modeled as
multivariate Gaussian with mean and covariance derived from raw TM bands.  Let mX|t and EXI* denote
the (6x1) vector of class-conditional mean and the (6  x 6) matrix of class-conditional (co)variances of
the raw reflectance values in the k-th class.  Let mx and Ex denote the (6x1) mean vector and (6 x 6)
covariance matrix, respectively, of the above K class-conditional mean vectors  jm'jL,& = !,...,&}.  A
set of ^simulated (6 x 1) vectors \mx]lc,k = !,...,#}  of class-conditional means are generated from a
six-variate Gaussian distribution with mean mx and covariance Ex. In the case  study, simulated class-
conditional mean vectors [mx|t,&  = !,...,£ j were used instead of their original counterparts
/ m ™, fc = 1,..., K }  in order to introduce class confusion.  Simulated reflectance values are then
generated for each pixel in the reference classification from the appropriate  class-conditional distribution,
which is assumed Gaussian with mean mxjjt, and covariance  L^ •  F°r example,  if a pixel in the
reference classification has land cover forest (k=J), six simulated reflectance values are simulated at that
pixel from a Gaussian distribution with mean mx|1 and covariance E'^,.  A similar procedure for
generating synthetic satellite imagery (but without the simulation of class-conditional mean values
fm    k = \,..., K\) was adopted  by Swain et al.  (1981) and Haralick and  Joo (1986).  The simulated
reflectance values are further degraded by introducing  white noise generated by a six-variate Gaussian
distribution with mean 0 and (co)variance 0.2 Sx; this  entails that the simulated noise is correlated from
one spectral band to another.

Independent simulation of reflectance values from one pixel to another implies the non-realistic feature of
low spatial correlation in the simulated reflectance values.  In the case study, in order to enhance spatial
correlation as well as positional error, typical of real images, a motion blur filter with a horizontal motion
of 21 pixels in the -45° direction was applied to each band to  simulate the linear motion of a camera. The
resulting reflectance values were further degraded by addition of a realization of an independent
multivariate white noise process, which implies correlated noise from one spectral band to another. This
latter realization was generated using a multivariate Gaussian distribution with mean 0 and (co)variance
0.05 2JX-  To avoid edge effects introduced by the motion blur filter, the results of Gaussian maximum
likelihood classification, as well as those for indicator  kriging, were reported on a smaller (cropped)
subscene.

The last step in the simulated TM data generation consists of a band-by-band histogram transformation:
The histogram of reflectance values for each spectral band in the simulated image is transformed to the
histogram of the original TM reflectance values for that band through histogram equalization. The
purpose of this transformation is to force the simulated TM imagery to have the same histogram as that of
the original TM imagery, as well as similar covariance among bands. The (transformed) simulated
reflectance values are finally rounded to preserve the integer digital nature  of the data.
                                                                                     Page 179 of 339

-------
 11.3  Results

 To illustrate the proposed methodology for fusing spatial and spectral information for mapping thematic
 classification uncertainty, a case study was conducted using simulated imagery based on a Landsat
 Thematic Mapper subscene from path 41 / row 27 in western Montana, and the procedure described in
 subsection 11.2.5.  The TM imagery, collected on September 27, 1993, was supplied by the U.S.
 Geological Survey's (USGS) Earth Resources Observation Systems (EROS) Data Center and is one of a
 set from the Multi-Resolution Land Characteristics (MRLC) program (Vogelmann et al., 1998).  The
 study site consisted of a subscene covering a portion of the Lolo National Forest (541 x 414 pixels). The
 original 30m TM data served as the basis for generating the simulated TM imagery used in this case
 study.

 The subscene was classified into L=150 clusters using the ISODATA algorithm, and these L clusters were
 assigned to K=3 classes: forest (k=l), shrub (k=2), and rangeland (k=3).  The resulting classification was
 smoothed using MAP selection within a 5 x 5 window around each pixel u. The resulting LC  map is
 regarded as the exhaustive reference classification (unavailable in practice). A small subset (G=314) of
 the 541 x 414 pixels (a 0.14% of the total  population) was selected as training pixels through stratified
 random sampling.  The sample and reference class proportions of forest, shrub and rangeland were: p =
 0.65,/?2 = 0.21, p3 = 0.14, respectively. The remaining non-sampled reference pixels were used as
 validation data for assessing the accuracy of the different methods. The cropped (ranging from 7 to 530
 and from 9 to 406 pixels) reference classification and the G=314 training samples used in this study are
 shown in Figure 11-1 (a & b).
                       (a)
                   (b)
       SO  100  180  200  260  300  3SO  400  .
                                                       .11.-

                                                       MO

                                                       00

                                                       2M

                                                       MO

                                                       uo

                                                       100

                                                       n

                                                        "
.* * ***+     / '  4*
;    4-         ~ ff    0

r*4   *.   '•!'
                                                                                4*4

                                                               100  1SO 2CD  2»0  MO  380 400  450~
 •  Forest       C3 Shrub    CH  Rangeland            A  Shrub     +  Forest     O Rangeland
 Figure 11-1.  Reference classification (a) and 314 training pixels (b) selected via stratified random sampling


The class labels and the corresponding simulated reflectance values at the training sample locations were
used to derive statistical parameters, such that, the class-conditional means mx|l,mx|,,mxp  and the class-
conditional (co)variances EX|l,£'X|2, SX|3, for forest, shrub, and rangeland, respectively.  The class labels"
of the training pixels were also used to infer the three indicator covariance models, 
-------
practical range 25-30 pixels (59-61% of the total variance), and a larger-scale structure of practical range
100-120 pixels (37-38% of the total variance). The rangeland indicator covariance model, 0,, consisted
of a nugget component (1.0% of the total variance), a small-scale structure of practical range 22 pixels
(75% of the total variance), and one larger-scale  structure of practical range 400 pixels (24% of the total
variance). These covariance model parameters imply that forest and shrub have very similar spatial
correlation, which differs slightly from that of rangeland. The latter class has more pronounced small-
scale variability, and less large-scale variability, which is also of longer range than that of forest and
shrub. For further details regarding the interpretation of variogram and covariance functions computed
from remotely sensed imagery (Woodcock et al., 1988).

 Table 11-1.  Parameters of the three indicator covariance  models, o^, o2, o3, for forest, shrub, and
              rangeland, respectively.  All indicator covariances were modeled using a nugget
              contribution, and two exponential covariance structures with respective sills and
              practical ranges: sill(1), sill(2), range(1), and  range(2). Sill values are expressed as
              percentage of the total variance:  pk(1  - p») = 0.23,  0.17,0.12, for forest, shrub, and
              rangeland, respectively, range values  are expressed in numbers of pixels.

Forest
Shrub
Rangeland
Nugget
0.02
0.03
0.01
Sill
(1)
0.61
0.59
0.75
L <2>
0.37
0.38
0.75
Range
(D
30
25
22
(2)
120
100
400
 11.3.1   Spectral and Spatial Classifications

 Using the class-conditional means mx|1 ,mx|2, mxp and (co)variances £'x,, £/x,. £x,, tnree Gaussian
 likelihood functions were established for any vector x(u) of reflectance values at any pixel u not in the
 training set (Equation I).  The three Gaussian likelihood functions were subsequently inverted
 (Equation 2) to compute the three spectra-derived pre-posterior probabilities, ;:>, [e(u)|x(u)],
 p2 [c(u)  x(u)], and /?3 [c(u) | x(u)] for forest, shrub and rangeland, respectively.  These GML pre-posterior
 probabilities are shown in Figure 11-2 (a-c). Note: (1) the high degree of noise in the probabilities, (2)
 the confusion of shrub and rangeland (probabilities close to 0.5), and (3) the motion-like appearance that
 entails diffuse class boundaries. The corresponding MAP selection at each pixel u is shown in Figure 1 1-
 2 (d). Note, again, the high degree of fragmentation in  the classified map. The overall classification
 accuracy (evaluated against the reference classification) was 0.73 (Kappa=0.44) indicating a rather severe
 misclassification.

 Arguably, in the presence of noise, the original spectral vector could have been replaced by a vector of
 the same dimensions whose entries are averages  of reflectance values within a (typically 3x3)
 neighborhood around each pixel (Switzer, 1980). This, however, amounts to implicitly introduce
 contextual information into the classification procedure: Spatial variability in the reflectance values is
 suppressed via a form of low-pass filter to introduce more spatial correlation, and thus produce less
 fragmented classification maps. In the absence of noise-free data, any such filtering procedure is rather
 arbitrary: There is no reason to use a 3 x 3 versus a 5 x 5 filter, for example.  In this chapter, we propose
 a method for introducing that notion of compactness in classification via a model  of spatial correlation
 inferred from the training pixels themselves.
                                                                                     Page 181 of 339

-------
                    (a)
(b)
         100  1»  200  250 y» vo 400 ten  ton
                                                           90  100 100 200 290  300  3SO  400  450 500
      90  100  190  200  2W  300 MO 400 490  SOO
                                                           SO  100 160 200 260  MO  MO  400  450 SfO
                                                      Overall accuracy = 73.36%, Kappa - 43.93%
 Figure 11-2.  Conditional probabilities for forest (a), shrub (b), and rangeland (c), based on Gaussian
             maximum likelihood (GML), and corresponding MAP selection (d).


 Ordinary indicator kriging (OIK) (Equation 5-6) was performed using the three sets of G training class
 indicators and their corresponding indicator covariance models to compute the space-derived pre-
 posterior probabilities />, [c(u)|cj, />2[c(u)|c ],  p3[c(u)\cg] for forest, shrub, and rangeland,
 respectively. These OIK pre-posterior probabilities are shown in Figure 11-3 (a-c). Note the very smooth
 spatial patterns, and the absence of clear boundaries, as opposed to those found in the spectra-derived
 posterior probabilities of Figure 11-2.  Note, also, that the training sample class labels are reproduced at
 the training locations, per the data-exactitude property of OIK.  The corresponding MAP selection at each
 pixel u is shown in Figure 11-3 (d).  The overall classification accuracy is 0.73 (Kappa=0.44) the same as
 those computed from the spectra-derived classification, indicating the same level of severe
 misclassification for the space-derived classification.
Page 182 of 339

-------
                                       —, 1
      SO  100 150 200  260  300  360 400 460  SCO
                                                                                            	1 1
      90  100  1«0 200 260  300  MO  400  400 WO
                                                           ID  WO  150 ZOO 250  300  3SO 400
                                                           90 100  '»  J«0 :» 300  5»  4OO  4W

                                                      Overall accuracy - 73.34%, Kappa = 43.92%

Figure 11-3.  Conditional probabilities for forest (a), shrub (b), and rangeland (c), based on ordinary indicator
             kriging (OIK), and corresponding MAP selection (d).
 11.3.2   Merging Spectral and Contextual Information

 Bayesian fusion (Equation 9) was performed to combine the individually derived spectral and spatial pre-
 posterior probabilities into posterior probabilities p} [c(u)|*(u), cj, p,[c(u)|*(u), cj, and
p3 [c(u)   *(u), cj, for forest, shrub, and rangeland, respectively; these'posterior probabilities account for
 both information sources, and are shown in Figure  11-4 (a-c). Compared to the spectra-derived pre-
 posterior probabilities of Figure 11-2, the latter posterior probabilities have smoother spatial patterns, and
 much less noise.  Compared to the space-derived pre-posterior probabilities of Figure 11-3, the latter
 posterior probabilities more variable patterns, and indicate clearer boundaries. The corresponding MAP
 selection at each pixel u is shown in Figure 11-4 (d). The overall classification accuracy increased to
 0.80, and the Kappa coefficient to 0.59, a 9.6% and 34.1% improvement, respectively,  relative to the
 corresponding accuracy statistics computed from the GML classification.
                                                                                     Page 183 of 339

-------
                     (a)
      60  100  160 200  250  300  350  400 490  900

                     (c)
                                                                                                 0.5
50  100  150  200 2SO  300 350  400  460  500

                (d)
      60  100  150 200  250  300  360  400 490  COO
                                                         60 100  150 200  250 300  350  400

                                                      Overall accuracy = 79.75%, Kappa = 59.26%
 Figure 11-4.  Conditional probabilities for forest (a), shrub (b), and rangeland (c), based on Bayesian
             integration of spectra-derived and space-derived pre-posterior probabilities (GMUOIK), and
             corresponding MAP selection (d).


 For comparison, accuracy assessment statistics, including producer's and user's accuracy, for all
 classification algorithms considered in this chapter are tabulated in Table 11-2.  Clearly, classification
 accuracy using the proposed contextual classification methods was superior to that using only spectral or
 only spatial information. As stated above, overall accuracy and the Kappa coefficients are significantly
 higher for the proposed methods.  In addition, both producer's and user's accuracy for all three classes are
 higher from the corresponding values computed from the spectra-derived or the space-derived
 classifications.
Page 184 of 339

-------
        Table 11-2.  Accuracy statistics for classification based on MAP selection from
                    conditional probabilities computed using different methods:
                    Gaussian maximum likelihood (GML), ordinary Indicator kriging
                    (OIK), and Bayesian integration of GML and OIK probabilities
                    (GMUOIK).

Forest
Shrub
Rangeland

Forest
Shrub
Rangeland
GML
OIK
GMUOIK
Overall accuracy
0.73
0.73
0.80
Kappa
0.44
0.44
0.59
Producer's accuracy
0.92
0.44
0.30
0.88
0.52
0.39
0.91
0.63
0.51
User's accuracy
0.82
0.48
0.55
0.78
0.61
0.63
0.86
0.64
0.68
The reference and classification-derived class proportions are also provided in Table 11-3 for comparison.
Clearly, MAP selection from the fused posterior probabilities/?, [c(u)|jc(u), cg] yielded the closest class
proportions to the reference ones: 0.69 versus 0.65 (reference) for forest, 0.21 versus 0.21 for shrub, and
0.10 versus 0.14 for rangeland. The other methods performed worse with respect to reproducing the
reference class proportions.

            Table 11-3.  Class proportions from reference and classified maps
                        based on MAP selection from conditional probabilities
                        computed using different methods: Gaussian maximum
                        likelihood (GML), ordinary indicator kriging (OIK), and
                        Bayesian integration of GML and OIK probabilities
                        (GMUOIK).

Forest
Shrub
Rangeland
Reference
0.65
0.21
0.14
GML
0.73
0.19
0.08
OIK
0.73
0.18
0.09
GMUOIK
0.69
0.21
0.10
                                                                               Page 185 of 339

-------
 11.3.3  Mapping Classification Accuracy
 The three spectra-derived pre-posterior probabilities,pt [c(u)|x(u)],p, [c(u)|x(u>], and/?, [c(u)|x(u>] for
 forest, shrub and rangeland, respectively, were converted into an accuracy value ac(u) f°r the particular
 class reported at pixel u (i.e., for the classification of Figure 11 -2 (d)), as described in subsection 11.2.4.
 These accuracy values were mapped in Figure 11-5 (a).  The same procedure was repeated using the three
 fusion-based posterior probabilities p[ [c(u)|x(u), cj, p2 [c(u)|*(u), cj, and p3 [c(u)  *(u), cj, for
 forest, shrub, and rangeland, respectively, to yield an accuracy value oy(u) for the particular class reported
 at pixel u (i.e., for the classification of Figure 11-4 (d)).  These accuracy values were mapped in
 Figure 11-5 (b).  The accuracy map of Figure 11-5 (b) exhibited much higher values than the
 corresponding map of Figure 11-5 (a), indicating an increased confidence in classification due precisely
 to the consideration of contextual information. In addition, the low accuracy values (-0.4-0.6) of
 Figure 11-5 (b) were found near class boundaries, as opposed to the low accuracy values of
 Figure 11-5 (a), which just corresponded to pixels classified as shrub and rangeland.  This latter
 characteristic implied that contextual information yielded a more realistic map of classification accuracy
 which could be useful for designing additional sampling campaigns.





                                                                        <\
                                                   •


                                                                       MO  3M  «»  00  tO>
                               i l

                                O 9

                                0 »

                                D 7

                                0.0

                                O 5

                                04

                                O 3

                                O 3

                                O 1

                                0
                  (a)
(b)
      Figure 11-5.  Pixel-specific accuracy values for GML-derived classes (a), and for GML/OIK-derived
                  classes (b).
 11.4  Discussion

 A geostatistical approach for mapping thematic classification uncertainty was presented in this chapter
 The spatial correlation of each class, as inferred from a set of training pixels, along with the actual
 locations of these pixels, was used via indicator kriging to estimate the location-specific probability that a
 pixel belongs to certain class, given the spatial information contained in the training pixels.  The proposed
 approach for estimating the above pre-posterior probability accounted for texture information via the
 corresponding indicator covariance model for each class, as well as for the spatial proximity of each pixel
 to the training pixels after this proximity was discounted for the spatial redundancy (clustering) of the
 training pixels. Space-derived pre-posterior probabilities were merged via Bayes rule with spectra-
 derived pre-posterior probabilities, the latter based on the collocated vector of reflectance values at each
 pixel.  The final (fused) posterior probabilities accounted for both spectral  and spatial information
Page 186 of 339

-------
The performance of the proposed methods was evaluated via a case study that used realistically simulated
reflectance values.  A subset of 0.14% (314) of the image pixels was retained as a training set. The
results indicated that the proposed method of context estimation, when coupled with Bayesian integration,
yielded more accurate classifications than the conventional maximum likelihood classifier.  More
specifically, relative improvements of 10% and 34% were found for overall accuracy and the Kappa
coefficient.  In addition, contextual information yielded more realistic classification accuracy maps,
whereby pixels with low accuracy values tended to coincide with class boundaries.
11.5  Conclusions

The proposed geostatistical methodology constitutes a viable means for introducing contextual
information into the mapping of thematic classification uncertainty. Since the results presented in the
case study of this chapter appear promising, further research is required to evaluate the performance of
the proposed contextual classification, and its use for mapping thematic classification uncertainty, over a
variety of real-world data sets.  In particular, issues pertaining to the type and level of spatial correlation,
the density of the training pixels, and their effects on the resulting classification uncertainty maps should
be investigated in greater detail.

Concluding, we suggest that the final posterior probabilities of class occurrence be used in a stochastic
simulation framework, whereby multiple, alternative, synthetic representations of land cover maps would
be generated using various algorithms for simulating categorical variables (Deutsch and Journel, 1998).
These alternative representations would reproduce: (1) the observed classes at the training pixels; (2) the
class proportions; (3) the spatial correlation of each class inferred from the training pixels; and
(4) possible relationships with spectral or other ancillary spatial information.  The ensemble of simulated
land cover maps could be then used for error propagation, e.g., Kyriakidis and Dungan (2001), thus
allowing one to go beyond simple map accuracy statistics and address map use (and map value) issues.


11.6  Summary

Thematic classification accuracy constitutes a critical factor in the successful application of remotely
sensed products in various disciplines, such as ecology and environmental sciences.  Apart from
traditional accuracy statistics based on the confusion matrix, maps of posterior probabilities of class
occurrence are extremely useful for depicting the spatial variation of classification uncertainty.
Conventional classification procedures, such as Gaussian maximum likelihood, however, do not account
for the plethora of ancillary data that could enhance such a meta-data map product.

In this chapter, we propose a geostatistical approach for introducing contextual information into the
mapping of classification uncertainty using information provided only by the training pixels.
Probabilities of class occurrence that account for context information are first estimated via indicator
kriging,  and are then integrated in a Bayesian framework with probabilities for class occurrence based on
conventional classifiers, thus yielding improved maps of thematic classification uncertainty. A case study
based on realistically simulated TM imagery illustrates the applicability of the proposed method:
(I) regional accuracy scores indicate relative improvements over traditional classification algorithms in
the order of 10% for overall accuracy and 34% for the Kappa coefficient; and (2) maps of pixel-specific
accuracy values tend to pinpoint class boundaries as the most uncertain regions, thus appearing as a
promising means for guiding additional sampling campaigns.
                                                                                    Page 187 of 339

-------
11.7  References

Atkinson, P.M., and P. Lewis. Geostatistical classification for remote sensing: An introduction.
    Computers & Geosciences, 26, 361-371, 2000.

Benediktsson, J.A., and P.H. Swain. Consensus theoretic classification methods.  IEEE Transaction on
    Systems, Man, and Cybernetics, 22, 688-704, 1992.

Benediktsson, J.A., P.H. Swain, and O.K. Ersoy. Neural network approaches versus statistical methods in
    classification of multisource remote sensing data. IEEE Transaction on Geoscience and Remote
    Sensing, 28, 540-552, 1990.

Bonham-Carter, G.F.  Geographic Information Systems for Geoscientists. Pergamon, Ontario, 398 p.,
    1994.

Congalton, R.G. Using spatial autocorrelation analysis to explore the errors in maps generated from
    remotely sensed data.  Photogrammetric Engineering and Remote Sensing, 54, 587-592, 1988.

Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data.  Remote
    Sens. Environ., 37, 35-46, 1991.

Congalton, R.G. and K. Green. Assessing the Accuracy of Remote Sensed Data: Principles and
    Practices, Lewis, Boca Raton, 180 p., 1999.

Cressie, N.A.C. Statistics for Spatial Data. John Wiley & Sons, NY, 900 p., 1993.

DeBruin, S. Predicting the areal extent of land-cover types using classified imagery and geostatistics,
    Remote Sens. Environ., 74, 387-396, 2000.

Deutsch,  C.V.  Cleaning categorical variable (lithofacies) realizations with maximum a-posteriori
    selection. Computers & Geosciences, 24, 551-562, 1998.

Deutsch,  C.V. and A.G. Journel.  GSLIB: Geostatistical Software Library and User's Guide (2nd Edition).
    Oxford University Press, New York, NY, 369 p., 1998.

Foody, G.M., N.A. Campbell, N.M. Trood, and T.F. Wood.  Derivation and applications of probabilistic
    measures of class membership from the maximum-likelihood classifier.  Photogrammetric
    Engineering and Remote Sensing, 58, 1335-1341, 1992.

Foody, G.M. Status of land-cover classification accuracy assessment.  Remote Sens. Environ., 80, 185-
    201,2002.

Goovaerts, P. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York, NY,
    483 p., 1997.

Goovaerts, P. Geostatistical incorporation of spatial coordinates into supervised classification of
    hyperspectral data. Journal of Geographical Systems, 4, 99-111, 2002.

Haralick, R.M. and H. Joo. A context classifier, IEEE Transactions on Geoscience and Remote Sensing,
    24,997-1007, 1986.

Hutchinson, C.F. Techniques for combining Landsat and ancillary data for digital classification
    improvement. Photogrammetric Engineering and Remote Sensing, 48,  123-130, 1982.
Page 188 of 339

-------
Isaaks, E.H. and R.M. Srivastava.  An introduction to Applied Geostatistics. Oxford University Press,
   New York, NY, 561 p., 1989.

Jensen, J.R. Introductory Digital Image Processing: A Remote Sensing Perspective.  Prentice Hall,
   Upper Saddle River, NJ, 316 p., 1996.

Journel, A.G.  Non-parametric estimation of spatial distributions.  Mathematical Geology, 15, 445-468,
   1983.

Kyriakidis, P.C. and J. L. Dungan.  A geostatistical approach for mapping thematic classification accuracy
   and evaluating the impact of inaccurate spatial data on ecological model predictions.  Environmental
   and Ecological Statistics, 8, 311-330, 2001.

Lee, T., J.A. Richards, and P.H. Swain. Probabilistic and evidential approaches for multisource data
   analysis. IEEE Transaction on Geoscience and Remote Sensing, 25, 283-293, 1987.

Li, S.Z. Markov Random Field Modeling in Image Analysis. Springer-Verlag, Tokyo, 323 p., 2001.

Solow,  A.R. Mapping by simple indicator kriging. Mathematical Geology, 18, 335-352, 1986.

Stehman, S.V.  Selecting and interpreting measures of thematic classification accuracy. Remote Sens.
   Environ., 62, 77-89, 1997.

Stehman, S.V.  Comparing thematic maps based on map value. Int. J. Remote Sensing, 20, 2347-2366,
   1999.

Steele, B.M. Combing multiple classifiers: An application using  spatial and remotely sensed information
   for land cover type mapping. Remote Sens. Environ., 74, 545-556, 2000.

Steele, B.M., and R.L. Redmond.  A method of exploiting spatial  information for improving classification
   rules: application to the construction of polygon-based land cover type maps. Int. J. Remote Sensing,
   22,3143-3166,2001.

Strahler, A.H. Using prior probabilities in maximum likelihood classification of remotely sensed data.
   Remote Sens. Environ., 47, 215-222, 1980.

Swain, P.H., S.B. Vardeman, and J.C. Tilton. Contextual classification of multispectral image data.
   Pattern Recognition, 13,429-441, 1981.

Switzer, P.  Extensions of linear discriminant analysis for statistical classification of remotely sensed data.
   Mathematical Geology, 12, 367-376, 1980.

Switzer, P., W.S. Kowalik, and R.J.P. Lyon. A prior method for smoothing discriminant analysis
   classification maps. Mathematical Geology, 14,433-444, 1982.

Tso, B. and P.M. Mather. Classification Methods for Remotely Sensed Data.  Taylor and Francis,
   London, 332 p., 2001.

van der Meer, F.  Classification of remotely sensed imagery using an indicator kriging approach:
   Application to the problem of catcite-dolomite mineral mapping. Int. J. Remote Sensing, 17, 1233-
   1249,  1996.
                                                                                  Page 189 of 339

-------
Vogelmann, I.E., T.L. Sohl, P.V. Campbell, and D.M. Shaw. Regional land cover characterization using
    Landsat Thematic Mapper data and ancillary data sources. Environ. Monitoring and Assess. ,51,415-
    428, 1998.

Woodcock, C.E., A.H. Strahler, and D.L.B. Jupp. The use of variograms in remote sensing. I: Scene
    models and simulated images.  Remote Sens. Environ., 25, 323-348, 1988.

Zhang, J. and M. Goodchild.  Uncertainty in Geographic Information.  Taylor and Francis, London,
    266 p., 2002.
 Page 190 of 339

-------
                                    Chapter 12

    An Error Matrix Approach to Fuzzy Accuracy Assessment:
                         The NIMA Geocover Project

                                           by
                                      Kass Green1
                                 Russell G. Congalton
                                                     ,2*
               1
                Space Imaging
                5915 Hollis Street
                Emeryville, CA 94608
                                               2
Department of Natural Resources
215 James Hall
University of New Hampshire
Durham, NH 03824

*Corresponding Author Contact:

 Telephone: (603) 862-4644
  Facsimile: (603) 862-4976
     E-mail: russ.congalton@unh.edu
12.1  Introduction

As remote sensing applications have grown in complexity, so have the classification schemes associated
with these efforts. The classification scheme then becomes a very important factor influencing the
accuracy of the entire project. A review of the recent accuracy assessment literature points out some of
the limitations of using only an error matrix approach to accuracy assessment with a complex
classification scheme.  Congalton and Green (1993) recommend the error matrix as a jumping-off point
for identifying sources of confusion (i.e., differences between the map created from remotely sensed data
and the reference data) and not simply the "error." For example, the variation in human interpretation can
have a significant impact on what is considered correct. If photographic interpretation is used as the
source of the reference data and that interpretation is not completely correct, then the results of the
accuracy assessment could be very misleading. The same holds true even for observations made in the
field. As classification schemes become more complex, more variation in human interpretation is
introduced (Congalton, 1991; Gong and Chen, 1992; Lowell, 1992; Congalton and Biging, 1992).

Gopal and Woodcock (1994) proposed the use of fuzzy sets to "allow for explicit recognition of the
possibility that ambiguity might exist regarding the appropriate map label for some locations on a map.
The situation of one category being exactly right and all other categories being equally and exactly wrong
often does not exist." They allowed for a variety of responses such as: absolutely right, good answer,
                                                                            Page 191 of 339

-------
acceptable, understandable but wrong, and absolutely wrong. While dealing with the ambiguity, this
approach does not allow the accuracy assessment to be reported as an error matrix.

This chapter introduces a technique using fuzzy accuracy assessment that allows for the analyst to
incorporate the variation or ambiguity in the map label and also present the results in the form of an error
matrix.  This approach is applied here to a worldwide mapping effort funding by the National Imagery
and Mapping Agency (NIMA) using Landsat Thematic Mapper (TM) imagery. The Earth Satellite
Corporation (Earthsat) performed the mapping and Pacific Meridian Resources of Space Imaging
conducted the accuracy assessment.  The results presented here are for one of the initial prototype test
areas (for an undisclosed location of the world) used for development this fuzzy accuracy assessment
process.
12.2  Background

The quantitative accuracy assessment of maps produced from remotely sensed data involves the
comparison of a map with reference information that is assumed to be correct. The purpose of a
quantitative accuracy assessment is the identification and measurement of map errors. The two primary
motivations include: (1) to provide an overall assessment of the reliability of the map (Gopal and
Woodcock, 1994); and (2) to understand the nature of map errors.  While more attention is often paid to
the first motivation, understanding the errors is arguably the most important aspect of accuracy
assessment.  For any given map class, it is critical to know the probability of the site being labeled
correctly and what classes are confused with one another.  Quantitative accuracy assessment provides
map users with a consistent and objective analysis of map quality and error. Quantitative analysis is
fundamental to map use; without it, users would make decisions without knowing the reliability of the
map as a whole or the  sources of confusion.

The error matrix is the most widely accepted format for reporting remotely sensed data classification
accuracies (Story and Congalton, 1986; Congalton, 1991).  Error matrices simply compare map data to
reference data.  An error matrix is an array of numbers set out in rows and columns which express the
number of pixels or polygons assigned to a particular category in one classification relative to those
assigned to a particular category in another classification (Table 12-1). One of the classifications is
considered to be correct (reference) and may be generated from aerial photography, airborne video,
ground observation or ground measurement, while the other classification is generated from the remotely
sensed data (observed).

An error matrix is an effective way to represent accuracy because both the total and the individual
accuracies of each category are clearly described and confusion between classes is evident. Also
indicated are errors of inclusion (commission errors) and errors  of exclusion (omission errors) that may be
present in the classification.  A commission error occurs when an area is included into a category when it
doesn't belong.  An omission error is excluding an area from the category in which it does belong. Every
error is an omission from the correct category and a commission to a wrong category. For example, in the
error matrix in Table 12-1 there are four areas that were classified as deciduous when the reference data
shows that they were actually conifers.  Therefore, four areas were omitted from the correct coniferous
category and committed to the incorrect deciduous category.  Utilizing this information, users can
ascertain the relative strengths and weaknesses of each map class, creating a more solid basis for decision-
making.
Page 192 of 339

-------
  Classified
    Data
                   SB
               Column
                 Total
                          Table 12-1.  Example error matrix.

                                   Reference Data
                                                      Row
                          D      C     AG    SB    Total
                     Land-Cover Categories
                       D = deciduous

                       C = conifer

                      AG = agriculture

                      SB = shrub

                     Overall Accuracy =
                     (63 + 79 + 85 + 89)/424 =
                     316/424 = 75%
               Producer's Accuracy
               D  = 63/73  = 86%
               C  = 79/101 = 78%
               AG = 85/118 = 72%
               SB = 89/132 = 67%
   User's Accuracy

D   = 63/113 =  56%
C   = 79/101 =  78%
AG = 85/107 =  79%
SB = 89/103 =  86%
Additionally, the error matrix can be used to compute overall accuracy, and producers and users
accuracies (Story and Congalton 1986). Overall accuracy is simply the sum of the major diagonal (i.e.,
the correctly classified sample units) divided by the total number of sample units in the error matrix. This
value is the most commonly reported accuracy assessment statistic. User's and producer's accuracies are
ways of representing individual category accuracies instead of just the overall classification accuracy.

One of the assumptions of the traditional or deterministic error matrix is that an accuracy assessment
sample site can have only one label. However, classification scheme rules often impose discrete
boundaries on continuous conditions in nature. In situations where classification scheme breaks represent
artificial distinctions along a continuum of land-cover (LC), observer variability is often difficult to
control and, while unavoidable, can have profound effects on results (Congalton and Green, 1999).
While it is difficult to control observer variation, it is possible to use a fuzzy assessment approach to
compensate for differences between reference and map data that are caused not by map error, but by
variation in interpretation (Gopal and Woodcock, 1994). In this study, both deterministic error matrices
and those using the fuzzy assessment approach were compiled.
                                                                                 Page 193 of 339

-------
12.3  Methods

Accuracy assessment requires the development of a statistically rigorous sampling design of the location
(distribution) and type of samples to be taken, or collected. Several considerations are critical to the
development of a robust design to support an accuracy assessment that is truly representative of the map
being assessed. Important design considerations include the following:

   •  What are the map classes and how are they distributed?  How a map is sampled for accuracy will
      partially be driven by how the categorical information of interest is spatially distributed.  These
      distributions are a function of how the features of interest have been categorized - referred to as the
      "classification scheme."

   •  What is the appropriate sample unit?  Sampling units are the portions of the landscape that will be
      sampled for the accuracy assessment.

   •  How many samples should be taken? Accuracy assessment requires that an adequate number of
      samples be gathered so that any analysis performed  is statistically valid.  However, the collection of
      data at each sample point can be very expensive, requiring that sample size be kept to a minimum to
      be affordable.
   •  How should the samples be chosen? The choice and distribution of samples, or sampling scheme,
      is an important part of any inventory design.  Selection of the proper scheme is critical to
      generating results that are representative of the map being assessed.  First, the samples must be
      selected without bias.  Second, further data analysis will depend on which sampling scheme is
      selected. Finally, the sampling scheme will determine the distribution of samples across the
      landscape, which will significantly affect accuracy assessment costs.

This chapter addresses all of the above considerations relative to the NIMA Geocover study. Major study
elements included (1) the final ization of the NIMA GeoCover classification scheme, (2) accuracy
assessment sample design and selection, (3) accuracy assessment site labeling, and (4) the compilation of
the deterministic and fuzzy error matrix.

12.3.1   Classification Scheme
The first task in this project was to specify the
NIMA GeoCover classification system rules.  A
classification scheme has two critical components:
(1) a set of labels (e.g., deciduous forest, urban,
shrub/scrub, etc.); and (2) a set of rules or
definitions such as a dichotomous key for assigning
labels. Without a clear set of rules, the assignment
of labels to types can be  arbitrary and lack
consistency. In addition to having labels and  a set
of rules, a classification scheme should be
(a) mutually exclusive and (b) totally exhaustive.
All study partners worked together to develop and
finalize a classification scheme with the necessary
labels and rules.  Table 12-2  presents the labels; the
classification rules can be found in Appendix A of
this report.
Table 12-2. Classification Labels.
Class #
1
2
3
4
5
6
7
8
9
10
11
12
13
Class Name
Forest, Deciduous
Forest, Evergreen
Shrub/Scrub
Grassland
Barren/Sparsely Vegetated
Urban/Built-Up
Agriculture, Other
Agriculture, Rice
Wetland, Permanent Herbaceous
Wetland, Mangrove
Water
Ice/Snow
Cloud/Cloud Shadow/No Data
Page 194 of 339

-------
12.3.2  Sampling Design

Sample design often requires trade-offs between the need for statistical rigor and the practical constraints
of budget and available reference data. To achieve statistically reliable results and keep costs to a
minimum, a multi-staged, stratified random sample design was employed for this project.  Research by
Congalton (1988) indicates that random and stratified random samplings are the optimal sampling designs
for accuracy assessment.

One of the most important aspects of sample design is that the reference data must be independent from
data used to create the map. The need for independence posed a dilemma for the assessment of the NIMA
Geocover prototype because the National Technical Means (NTM) used for reference data development
were not available for the entire study area.  NTM can be defined as classified intelligence gathering
systems and the data they generate.

As a result of this limited NTM availability, a choice needed to be made to either (1) constrain the
accuracy assessment sample to the areas with existing NTM data, and thereby risk sampling only some of
the mapped area, or (2) allow samples to be chosen randomly, resulting in some samples landing in areas
where existing NTM was not immediately available for reference data development. The  latter approach
was selected because limiting the accuracy assessment area was considered statistically unacceptable.  To
overcome the NTM data gaps, first stage samples were chosen prior to receipt of the final  map. This
provided additional time for the acquisition of new NTM data.  Persistent data gaps were supplemented
by the interpretation of TM composite images.

First stage sample units were 15-minute quadrangle areas. To ensure that an adequate number of
accuracy assessment sites per cover class were sampled, quadrangles were selected for inclusion in
accuracy assessment based on the diversity and number of cover classes in the quadrangle. A relative
diversity index  was determined through the screening of TM composite images of the study area. The
number and diversity of cover type polygons were summarized for each quadrangle, and the six
quadrangles with the greatest cover type diversity and largest number of classes were selected as the first
stage samples.

The second stage  sample units were the polygons of the LC map vector file. Fifty polygons per class
were randomly  selected across all the  six quadrangles. If less than 50 polygons of a particular class
existed within the six quadrangles, then all the available polygons in that class were selected. Both
primary and secondary sample selection was automated using accuracy assessment software developed
for this project.

72.3.3  Site Labeling

All accuracy assessment samples had two class labels: A map label and a reference site label.  For this
project, the "map" label was automatically derived from the LC polygon map label provided by Earthsat
and stored for later use in the compilation of the error matrix.  An expert analyst, based on image
interpretation of NTM data, manually assigned the corresponding "reference" label.  Each sample
polygon was automatically displayed on the computer screen simultaneously with the assessment data
form (see Figure 12-1). The analyst entered the label for the site into the form using the imagery and
other ancillary data available. To insure independence, at no time did the image analyst labeling the
samples have access to map data.
                                                                                 Page 195 of 339

-------
                                                     Accuracy Assessment
                                                             lJoyPaw.ll
                                                   AnaVttt: |Airon HentcKO   V]
                                                   S»l Den: Tua Sap 28 00:00:00 1 999
                                                   Commaott
                                                    Badi
 Clo««i*cwion

P Deciduous Forest

r EveigiMn Fores!

r Shrub/Scmb

r Grwiland

P Ba/r»n/Spai«« Vag

r Urtxm

r In/Snow

r Ag.OOm

r Ag Rica

P Wit Permanent Herfoact

r Mangrove

r WoWr

l~ Ctoud/Shadow/NoCalf
                                                                                     Good
To account for variation in interpretation, the
accuracy assessment analyst also completed a LC
type fuzzy logic matrix for every accuracy
assessment site (Figure 12-1). Each polygon was
evaluated for the likelihood of being identified as
each of the possible cover types. First, the analyst
determined the most appropriate label for the site,
and the label was entered in the appropriate box
under the "classification" column in the form.
This label determined in which row of the matrix
the site will be tallied, and was used for calculation
of the deterministic error matrix. After assigning
the label  for the site, the remaining possible map
labels were evaluated as "good," "acceptable," or
"poor" candidates for the site's label. For example,
a site might fall near the classification scheme
margin between forest and shrub/scrub.  In this
instance, the analyst might rate forest as most
appropriate, but shrub/scrub as "acceptable." As each site was interpreted, the deterministic and fuzzy
assessment reference labels were entered into the accuracy assessment software for creation of the error
matrix.
                                                      dose
                                                                M	>
                                                 F'gure 12'1-
                                                                                      assessment
 12.3.4   Compilation of the Deterministic and Fuzzy Error Matrix

 Following reference site labeling, the error matrix was automatically compiled in the accuracy assessment
 software. Each accuracy assessment site was tallied in the matrix in the column (based on the map label)
 and row (based on the most appropriate reference label).  The deterministic (i.e., traditional) overall
 accuracy was calculated by dividing the total of the diagonal by the total number of accuracy assessment
 sites. The producer's and user's accuracies were calculated by dividing the number of sites in the diagonal
 by the total number of reference (producer's  accuracy) or map (user's accuracy) for each class.  That is,
 from a map producer's viewpoint, given the  total number of accuracy assessment sites for a particular'
 class, what was the proportion of sites correctly mapped? Conversely, class accuracy by column
 represents "user's" class accuracy. For a particular class on the map, user's class accuracy estimates the
 percent of time the class was mapped correctly.

 Non-diagonal cells in the matrix contain two tallies, which can be used to distinguish class labels that are
 uncertain or that fall on class margins, from class labels that are most probably in error.  The first number
 represents those sites in which the map label matched a "good" or "acceptable" reference label in the
 fuzzy assessment (see Table 12-3). Therefore, even though the label was not considered the most
 appropriate, it was considered acceptable given the fuzziness of the classification system and the minimal
 quality of some of the reference data.  These sites are considered a "match" for estimating fuzzy
 assessment accuracy. The second number in the cell represents those sites where the map label was
 considered poor (i.e., an error).

 The fuzzy assessment overall accuracy was estimated as the percentage of sites where the "best," "good "
 or "acceptable" reference label(s) matched the  map label. Individual class accuracy was estimated by
 summing the number of matches for that class' row or column divided by the row or  column total.  Class
 accuracy by row represents "producer's" class accuracy.
Page 196 of 339

-------
        Table 12



Initial Prototype Area
3.  Error matrix for the initial prototype area showing the computations for the deterministic and fuzzy assessments.



                                        MAP                                         Producer's  Accuracies
13
0)
CO
rt>
CD
-J
O^
OJ
c~>
en
Labels
Deciduous Forest
R Evergreen Forest
E Shrub/Scrub
c Grassland
R Barren/Sparse Veg
E Urban
_ Ice/Snow
I*
£ Agriculture Other
Agriculture Rice
D
. Wet, Perm. Herb.
A
j Mangrove
A Water
Cloud/Shadow
Wet,
Decid. EG Scrub/ Barren/ Ice/ Ag. Ag. Perm. Man- Cloud/ Deterministic Percent Fuzzy %
Forest Forest Shrub Grass Sparse Urban Snow Other Rice Herb, grove Water Shadow Totals Deterministic Totals Fuzzy
48
4,0
2,0
0,1
0,0
0,0
0,0
0,1
0,0
0,0
0,0
0.0
0,0
24,7
17
0,1
0,0
0,0
0,0
0,0
0,1
0,0
0,0
0,0
0,0
0,0
0,1
0,1
15
5,1
0,2
0,0
0,0
7,15
0,0
0,0
0,0
0,0
0,0
0,3
0,0
8,1
14
0,0
0,0
0,0
18,6
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,1
0,0
0,0
0,0
0,0
20
0,0
2,0
0,0
0,1
0,0
0,0
0,0
0,0
0,0
0,0
0.0
0,0
0,0
0
0,0
0,0
0,0
0,0
0,0
0,0
0,11
0,1
2,2
3,0
0,1
2,0
0,0
29
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0
0,0
0.0
00
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0
0,0
0.0
0,18
0,3
0.0
0,0
0,0
0,0
0,0
1,2
0,0
1,0
0,0
8
0,0
User's Accuracies
Totals Deterministic 48/56 17/50 15/47 14/50 NA 20/24 NA 29/51 NA NA NA 8/33
Percent Deterministic 85.7% 34.0% 31.9% 28.0% NA 83.3% NA 56.9% NA NA NA 24.2%
Totals Fuzzy 54/56 41/50 27/47 40/50 NA 22/24 NA 36/51 NA NA NA 10/33
Percent Fuzzy 96.4% 82.0% 57.4% 80.0% NA 91.7% NA 70.6% NA NA NA 30.3%
Kappa: 37.2
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0,0
0
NA
NA
NA
48/113 42.5% 72/113 63.7%
17/26 65.4% 21/26 80.8%
15/31 48.4% 27/31 87.1%
14/24 58.3% 22/24 91.7%
0/3 0.0% 0/3 0.0%
20tt2 90.9% 22/22 100.0%
NA NA NA NA
29/82 35.4% 57/82 69.5%
NA NA NA NA
0/2 0.0% 1/2 50.0%
NA NA NA NA
8/8 100.0% 8/8 100.0%
NA NA NA NA
Overall Accuracies
Deterministic Fuzzy
151/311 48.6% 230/311 74.0%
NA

-------
12.4  Results

Table 12-3 reports both the deterministic and fuzzy assessment accuracies. The overall and individual
class accuracies and the Kappa statistic are displayed. Overall accuracy is estimated in a deterministic
way by summing the diagonal and dividing by the total number of sites. For this matrix, overall
deterministic accuracy would be estimated at 48.6% (151/311).  However, this approach ignores any
variation in the interpretation of reference data and the inherent fuzziness  at class boundaries. Including
the "good" and "acceptable" ratings, overall accuracy is estimated at 74% (230/311). The large
difference between these two estimates reflects the difficulty in distinguishing several of the classes, both
from TM imagery and from the NTM.  For example, a total of thirty-one sites were labeled as evergreen
forest on the map and deciduous forest in the reference data. However, twenty-four of those sites were
labeled as acceptable, meaning they were either at or near the class break or were inseparable from the
TM and/or NTM data (see Appendix A).

The Kappa statistic was 0.37%.  The Kappa statistic adjusts the estimate of overall accuracy for the
accuracy expected from a purely random assignment of map labels and is  useful for comparing different
matrices.  However, it does not account for fuzzy class membership and variation in interpretation of the
reference data. From a map user's perspective, individual fuzzy assessment class accuracies vary from
30% (for water) to 96% (for deciduous forest). Producer's accuracies range from 0% (for barren/sparse
vegetation and wet, permanent herbaceous) to 100% (for water and urban).  The highest combined user's
and producer's accuracies occur in the urban class (100% and 91.7%, respectively).

A useful comparison is the total number of sites for a particular class by row and by column. For
example, for deciduous forest there are a total of 113 reference sites and a total of 56 map sites. This
indicates that the map underestimates deciduous forest.  Another underestimated class is agriculture-other
(51 versus 82). Conversely, for evergreen forest there are a total of 50 map sites and 26 reference sites,
indicating that the map overestimates evergreen forest. Other overestimated classes include shrub (47
versus 31) and grassland (50 versus 24).
12.5   Discussion and Conclusions

The following text discusses and analyzes the major sources of confusion and agreement in the LC map
for the initial prototype study. The highest user's accuracy occurs in the deciduous forest class (96.4%).
However, producer's accuracy in deciduous forest is low (63.7%), indicating that there is more deciduous
forest in the area than indicated on the map.  The highest producer's accuracy is in water and urban
(100%). While the urban user's accuracy is also high (91.7%) (indicating that urban is a very reliable
class), the user's accuracy for water is low (30.3%), indicating that significant commission errors may
exist in the water class. For example, eighteen water map sites were determined to be deciduous in the
reference data. After the matrix was generated, these sites were reviewed. In each case, the sites were
small, scattered polygons in forested areas. Because the water was maintained at full resolution (no
filtering was performed), any scattered pixels of water were maintained in the polygon coverage. Many
of these polygons came from one or two pixels of water. Because there are many of these  small
polygons, more than half of the accuracy assessment sites for water came from these polygons.

Confusion also existed in the agriculture-other class, which tends to be confused with shrub/scrub,
grassland, or deciduous forest. User's class accuracy for agriculture-other is estimated at 71% (36/51).
Eleven sites were labeled as deciduous forest.  These sites were also re-examined.  In most all cases, the
polygons came from small groups of pixels (greater than the minimum mapping unit of 1.4 ha) labeled as
 Page 198 of 339

-------
agriculture within forested areas.  The matrix also identifies confusion between agriculture and shrub, and
between agriculture and grasslands. For the shrub/scrub map class, twenty-two sites were labeled as
agriculture in the reference data, with fifteen sites rated as "poor." Subsequent review of the maps
revealed scattered pixels and polygons of shrub within agricultural areas and scattered agriculture within
shrub. For grasslands, twenty-four sites were labeled as agriculture in the reference data, with eighteen
sites labeled as "acceptable." This reflects the uncertainty with separating grassland from agriculture in
many cases.  Often, they have identical spectral responses, and unless there are distinct geometric spatial
patterns or other contextual features, it is very difficult to distinguish these classes from TM imagery
alone.

Map error is often the result of scattered polygons in otherwise homogeneous areas. For example
scattered small polygons of water (particularly in forested areas) accounted for the low estimate of class
accuracy for water. Likewise, scattered polygons of agriculture in shrub and grassland, and scattered
polygons of shrub and grassland in agriculture influenced the accuracies of these classes. This type of
error points to the need for increased precision in the image classification  algorithms,  additional map
editing, and/or refinement of the polygon generating algorithms.

Finally, it should be noted that the first stage sample units contained no polygons of barren/sparse
vegetation, agriculture-rice, ice/snow, mangrove, cloud/shadow or wet, permanent herbaceous.
Therefore, these map classes were not sampled for accuracy assessment. Because the first stage samples
are chosen for their diversity, this indicates that the entire map also has no or few polygons with these
classes.  Considering the location of the prototype, it is reasonable to assume that ice/snow, agriculture-
rice, and mangrove do not exist in the area.  However, a few reference sites (n=5) were labeled
barren/sparse vegetation and wet, permanent herbaceous, indicating that these classes do exist in the area
and may be underrepresented in the map.
12.6  Summary

The error matrix or contingency table has become widely accepted as the standard method for reporting
the accuracy of CIS data layers derived from remotely sensed data. The matrix provides descriptive
statistics including overall, producer's, and user's accuracies as well as sample size information by
category and in total. In addition, the matrix is a starting point for a variety of analytical tools, including
normalization and Kappa analysis.  More recently, the incorporation of fuzzy accuracy assessment has
been suggested and adopted by many remote sensing analysts. As proposed, most of these current
techniques use a variety of metrics to represent the fuzzy analysis. This paper introduces the use of a
fuzzy error matrix for applying fuzzy accuracy assessment.  The fuzzy matrix has the same benefits as a
traditional deterministic error matrix, including the computation of all the descriptive statistics.  A
detailed, practical case study is presented to demonstrate the application of this fuzzy error matrix.

A total of 311 accuracy assessment sites were utilized to estimate the accuracy of the initial prototype
area. The traditional estimate of overall accuracy is 48.6%.  Accounting for fuzzy class membership and
variation in interpretation, overall accuracy is estimated at 74.0%. The spread between the deterministic
and fuzzy assessment estimates is large, but not unusual. Part of this spread is a function of the lack of
NTM for several of the reference sites (n=84), resulting in the reference label being determined from
manual interpretation of the TM data.  Hopefully, more NTM will be available as the project progresses,
which will reduce the spread between deterministic and fuzzy logic estimates. However, some spread
will remain because of fuzziness in the boundaries of LC classes.  Therefore, acceptable fuzziness
                                                                                     Page 199 of 339

-------
between deciduous and evergreen forest (especially in mixed conditions) and deciduous forest and shrub
will remain.
12.7  References

Congalton, R. A comparison of sampling schemes used in generating error matrices for assessing the
    accuracy of maps generated from remotely sensed data. Photogrammetric Engineering and Remote
    Sensing, 54(5), 587-592, 1988.

Congalton, R. A review of assessing the accuracy of classifications of remotely sensed data.  Remote
    Sens. Environ., 37, 35-46, 1991.

Congalton, R. and G. Biging. A pilot study evaluating ground reference data collection efforts for use in
    forest inventory. Photogrammetric Engineering and Remote Sensing, 58(12), 1669-1671, 1992.

Congalton R., and K. Green. A practical look at the sources of confusion in error matrix generation.
    Photogrammetric Engineering and Remote Sensing, 59(5), 641-644, 1993.

Congalton, R. and K. Green. Assessing the Accuracy of Remotely Sensed Data:  Principles and Practices.
    Lewis Publishers, Chelsea, MI, 1999.

Gong, P. and J. Chen. Boundary uncertainties in digitized maps:  Some possible determination methods.
    Proceedings ofGIS/LIS  '92, San Jose, CA, pp. 274-281, 1992.

Gopal, S. and C. Woodcock. Theory and Methods for Accuracy Assessment of Thematic Maps Using
    Fuzzy Sets.  Photogrammetric Engineering and Remote Sensing, 60(2), 181-188, 1994.

Lowell, K. On the incorporation of uncertainty into spatial data systems.  Proceedings ofGIS/LIS '92,
    San Jose, CA, pp. 484-493, 1992.

Story, M. and R. Congalton.  Accuracy assessment:  A user's perspective. Photogrammetric Engineering
    and Remote Sensing, 52(3), pp 397-399, 1986.
Page 200 of 339

-------
    Appendix A



Classification Rules
Parcel Appearance
If pixel appears as water
If > 35% Man-made Impervious material
If cultivated (excluding forest plantations)
If Rice
Otherwise
If total natural vegetation cover >= 10%
If coastaltestuarine AND vegetation cover is
Mangrove
If >= 35% woody vegetation AND > 3 m in height
~lf~v\/oody vegetation deciduous w/ <25% evergreen
intermixture
If woody vegetation deciduous w/ >= 25% evergreen
intermixture OR if woody vegetation is 100%
evergreen
If woody vegetation >= 10% cover AND height < 3 m
OR if woody vegetation between 10% and 35% cover
at any height
If herbaceous cover >= 10% OR mixed shrub and
nrass AND no evidence of seasonal or permanent
saturation (topo position = upland)
Else
If non-vegetated
If soil intermittently or permanently saturated
TFsnow or ice cover
"TTview of ground obscured by cloud, shadow, satellite
sensor artifact or lack of TM data
Else
Categorization Call
Water (Category 11)
Urban (Category 6)
Examine for evidence of Rice cultivation
Agriculture, Rice (Category 8)
Agriculture, Other (Category 7)
Examine for content
Wetland, Mangrove (Category 10)
Examine for forest type
Forest, Deciduous (Category 1)
Forest, Evergreen (Category 2)
Shrub/Scrub (Category 3)
Grassland (Category 4)
Wetland, Permanent Herbaceous (Category 9
Examine for content
Wetland, Permanent Herbaceous (Category 9)
Perennial Ice or Snow (Category 12)
Cloud/Cloud Shadow/No Data (Category 13)
Barren/Sparsely Vegetated (Category 5)
                                  Page 201 erf 339

-------
Page 202 of 339

-------
                                   Chapter 13

 Mapping Spatial Accuracy and Estimating Landscape Indicators
     from Thematic Land-Cover Maps Using Fuzzy Set Theory

                                         by

                                    Liem T. Iran1*
                                  S. Taylor Jarnagin2
                                  C. Gregory Knight1
                                   Latha Baskaran1
  Center for Integrated Regional Assessment
     and Department of Geography
  The Pennsylvania State University
  University Park, PA 16801

  ''Corresponding Author Contact:

   Telephone: (814)865-1587
    Facsimile: (814)865-3191
       E-mail: LTT1@psu.edu
U.S. Environmental Protection Agency
National Exposure Research Laboratory
12201 Sunrise Valley Drive
Reston, VA 20192
13.1  Introduction

The accuracy of thematic map products is not spatially homogenous, but instead variable across most
landscapes. Properly analyzing and representing the spatial distribution (pattern) of thematic map
accuracy would provide valuable user information for assessing appropriate applications for land-cover
(LC) maps and other derived products (i.e., landscape metrics). However, current thematic map accuracy
measures, including the confusion or error matrix (Story and Congalton, 1986) and Kappa coefficient of
agreement (Congalton and Green, 1999), are inadequate for analyzing the spatial variation of thematic
map accuracy. They are not able to answer several important scientific and application-oriented questions
related to thematic map accuracy. For example: Are errors distributed randomly across space? Do
different cover types have the same spatial accuracy pattern? How do spatial accuracy patterns affect
products derived from thematic maps?  Within this context, methods for displaying and analyzing the
spatial accuracy of thematic maps and bringing the spatial accuracy information into other calculations,
such as deriving landscape indicators from thematic maps, are important issues to advance scientifically
appropriate applications of remotely sensed image data.
                                                                         Page 203 of 339

-------
Our study objective was to use fuzzy set approach to examine and display the spatial accuracy pattern of
thematic LC maps and to combine uncertainty with the computation of landscape indicators (metrics)
derived from thematic maps. The chapter is organized by (1) current methods for analyzing and mapping
thematic map accuracy, (2) presentation of our methodology for constructing fuzzy LC maps, and
(3) deriving landscape indicators from fuzzy maps.

There have been several studies analyzing the spatial variation of thematic map accuracy (Campbell,
1981; Congalton, 1988). Campbell (1987) found a tendency for misclassified pixels to form chains along
boundaries of homogenous patches. Townshend et al. (2000) explained this tendency by the fact that, in
remote sensed images, the signal coming from a land area represented by a specific pixel can include a
considerable proportion of signal from neighboring pixels. Fisher (1994) used animation to visualize the
reliability in classified remotely sensed images.  Moisen et al. (1996) developed a generalized linear
mixed model to analyze misclassification errors in connection with several factors, such as distance to
road, slope, and LC heterogeneity. Recently Smith et al. (2001) found that accuracy decreases as LC
heterogeneity increases and patch sizes decrease.

Steele et al. (1998) formulated a concept of misclassification probability by calculating values at training
observation locations, and then use spatial interpolation (kriging) to create accuracy maps for thematic LC
maps.  However, this work used the training data employed in the classification process but not the
independent reference data usually collected after the thematic map has been constructed for accuracy
assessment purposes. Steele et al. (1998) stated that the misclassification probability is not specific to a
given cover type. It is a population concept indicating only the probability that the predicted cover type is
different from the reference cover type, regardless of the predicted and reference types as well as the
observed outcome, and whether correct or incorrect. Although  this work brought in a useful approach to
constructing accuracy maps, it did not provide information for the relationship between misclassification
probabilities and the independent reference data used for accuracy assessment (i.e., the "real" errors).
Furthermore, by combining training data of all different cover types together, it produced similar
misclassification probabilities for pixels with different cover types that were co-located. This point
should be open to discussion as our analysis described below indicates that the spatial pattern of thematic
map accuracy varies from one cover type to another, and pixels with different cover types located in close
proximity, might have different accuracy levels.

Recently,  fuzzy set theory has been applied for thematic map accuracy assessment using two primary
approaches. The first was to design a fuzzy matching definition for a crisp classification, which allows
for varying levels of set membership for multiple map categories (Gobal and Woodcock, 1994;  Muller et
al., 1998;  Townsend, 2000; Woodcock and Gopal, 2000). The second approach defines a fuzzy
classification or fuzzy object (Zhang and Stuart, 2000; Cheng et al., 2001).  Although the fuzzy theory-
based methods take into consideration error magnitude and ambiguity in map classes while doing the
assessment, like other conventional measures, they do not show spatial variation of thematic map
accuracy.

To overcome shortcomings  in mapping thematic map accuracy, we have developed a fuzzy set-based
method that is capable of analyzing and mapping spatial accuracy patterns of different cover types. We
expanded that method further in this study to bring the spatial accuracy information into the calculations
of several landscape indicators derived from thematic LC maps. As the method of mapping spatial
accuracy was at the core of this study, it will be presented to a reasonable extent in this chapter.
Page 204 of 339

-------
13.2  Methods

This study used data collected for the accuracy assessment of the National Land Cover Data (NLCD) set.
The NLCD is an LC map of the contiguous United States derived from classified Landsat Thematic
Mapper (TM) images (Vogelmann et al., 1998; Vogelmann et al., 2001). The NLCD was created b\ the
Multi-Resolution Land Characterization (MRLC) consortium (Loveland and Shaw, 1996) to provide a
national-scope and consistently classified LC data set for the country. Methodology and results of the
accuracy assessment have been described in Stehman et al. (2000). Yang et al. (2000; 2001), and Zhu
et al. (1999; 2000). While data for the accuracy assessment were taken  by federal region and available for
several regions, this study only used  data collected for Federal Geographic Region 111 and the Mid-
Atlantic Region (MAR) (see Figure  13-1).  Table  13-1  shows the number of photographic interpreted
-•reference" data samples  associated with each class in the LC map (Level I) for the MAR.  Note that the
reference  data for Region HI did not  include alternate reference cover type labels or information
concerning photographic  interpretation confidence, unlike data associated with other Federal geographic
regions.
                                               50   0    50   100   150 K>lom«tan
               Figure 13-1.  The Mid-Atlantic Region; 10 watersheds used in later analysis
                           are highlighted on the map.
                                                                                   Page 205 of 339

-------
                  Table 13-1. Number of samples by Andersen Level I classes.
Class name
Water
Developed
Barren
Forested Upland
Shrubland
Non-Natural Woody
Herbaceous Upland Natural/Semi-natural Vegetation
Herbaceous Planted/Cultivated
Wetlands
MRLC
Code
11
20s
30s
40s
51
61
71
80s
90s
Total:
Number of
samples
79
222
127
338
0
0
0
237
101
1104
Major analytical study elements were:  (1) define a multi-level agreement between sampled and mapped
pixels; (2) construct accuracy maps for six LC types; (3) define cover-type conversion degrees of
membership for mapped pixels; (4) develop a cover-type conversion rule set for different conditions of
accuracy and LC dominance; (5) construct fuzzy LC maps; and (6) develop landscape indicators from
fuzzy LC maps.

13.2.1   Multi-Level Agreement

In the MRLC accuracy assessment performed by Yang et al. (2001), agreement was defined as a match
between the primary or alternate reference cover type label of the sampled pixel and a majority rule LC
label in a 3 x 3 window surrounding the sample pixel. Here, we defined a multi-level agreement at a
sampled pixel (see Table 13-2) and applied it to all available sampled pixels. It has been demonstrated
that the multi-level agreement went beyond the conventional binary agreement and covered a wide range
of possible results, ranging from "conservative bias" (Verbyla and Hammond, 1995) to "optimistic bias"
(Hammond and Verbyla, 1996). We define a discrete fuzzy set A (A = {(a,,  ^,),...,(a6, u6)}) representing
the multi-level agreement at a mapped pixel regarding a specific cover type as follows:
                                                          dk
                                                                        M,
where a^ i=l,..,6, are six different levels (or categories) of agreement at a mapped pixel; jut is fuzzy
membership of the agreement level / of the pixel under study; d is the distance from sampled point k to
the pixel (k ranges from 1 to n, where n is the number of nearest sampled points taken into consideration);
Ik is a binary function which equals 1  if the sampled point k has the agreement level / and 0 otherwise; p is
the exponent of distance used in the calculation; 6t is photographic interpretation confidence score of the
sampled pixel k.  As information of photographic interpretation confidence was not available for the
Region in data set, 6t was set as constant (6t = 1.0) in this study.  The division by the maximum of At was
Page 206 of 339

-------
to normalize the'fuzzy membership function (Equation 1). Verbally, the fuzzy number of multi-level
agreement at a mapped pixel defined in Equation 1 is a modified inverse distance weighted (IDW)
interpolation of the n nearest sample points for each agreement level defined in Table 13-2.  But instead
of using all n data points together in the interpolation like in conventional IDW for continuous data, the n
sample pixels were divided into six separate groups based on their agreement levels and six iterations of
IDW interpolation (one for each agreement level) were run.  For each iteration of a particular agreement
level, only those samples (among n sample pixels) with that agreement level would be coded as 1 while
other reference samples were coded as 0 by the use of the binary function  7t. IDW then returned a value
between 0.0 and 1.0 for Mt in each iteration. In other words, M, is an IDW-based weight of sample pixels
at the agreement level i among the n closest sample pixels surrounding the pixel under study. With the
"winner-takes-all" rule, the agreement level with maximum Mt (i.e., maximum membership value ft, = 1)
will be assigned as the agreement level of the mapped pixel under study.
                         Table 13-2. Multi-level agreement definitions.
           Levels
Description
             I
A match between the LC label of the sampled pixel and the center
pixel's LC type as well as a LC mode of the 3x3 window (662 sampled
points)	
                    A match between the LC label of the sampled pixel and a LC mode of
                    the 3x3 window (39 sampled points)	
                    A match between the LC label of the sampled pixel and the LC type of
                    any pixel in the 3x3 window (199 sampled points)	
             IV
A match between the LC label of the sampled pixel and the LC type of
any pixel in the 5x5 window (84 sampled points)
             V
A match between the reference LC label of the sampled pixel and the
LC type of any pixel in the 7x7 window (31  sampled points)
             VI
Failed all of the above (89 sampled points)
After the multi-level agreement fuzzy set A was calculated (Equation 1), its scalar cardinality was
computed as follows (Bardossy and Duckstein, 1995):
                                                                                             (2)
Thus, the scalar cardinality of the multi-level agreement fuzzy set A is a real number between 1.0 and 6.0.
This is an indicator of the agreement-level "homogeneity" of sampled pixels surrounding the pixel under
study. [fcar(A) is close to 1.0, the majority of sampled pixels surrounding the mapped pixel under study
have the same agreement level.  Conversely, the greater car(A) is, the more heterogeneous in agreement
levels the sampled pixels are. Note that there is another way for a mapped pixel to have a near 1.0
cardinality. That is when the distance between the mapped pixel and a sampled pixel is very close
compared to those of other sampled pixels, reflecting die local effect in the inverse distance weighted
(IDW) interpolation. However, this case occurs only in small areas surrounding each sampled pixel.
                                                                                  Page 207 of 339

-------
 13.2.2   Spatial Accuracy Map

Using the above equations, discrete fuzzy sets representing multi-level agreement and their cardinalities
were calculated for all mapped pixels associated with a particular cover type. Then the cardinality values
of all pixels were divided into three unequal intervals (1.0 - 2.0; 2.0 - 3.0; and >3.0). They were assigned
(labeled) to the appropriate category, representing different conditions of agreement-level heterogeneity
of neighboring sampled pixels.  The three cardinality classes were then combined with six levels of
agreement to create 18-category accuracy maps.

 13.2.3   Degrees of Fuzzy  Membership

This step calculated the possible occurrence of multiple cover types for any given pixel(s) locations
expressed in terms of degrees of fuzzy membership. This was done by comparing cover types of mapped
pixels and sampled pixels at the same location based on individual pixels and a 3 x 3 window-based
evaluation. To illustrate, assume that the mapped pixel and the sampled pixel had cover types x andy,
respectively. In the one-to-one comparison between the mapped and sampled pixels, if x and y are the
same, then it is reasonable to state that the mapped pixel was classified correctly. In that case, the degree
of membership for cover type x to remain the same is assigned to 1.0.  On the other hand, if x is different
fromy, then it can be stated that the mapped pixel is wrongly classified, and the degree of membership of
x to become y would be 1.0. The above statements can be summarized as follows:

              Ma(x^x)=l                                 if       \=y
                                                                                             (3)
              Afa (*->>)= 1 and Afa(jc-x)=0                  if       x*y

Using a 3 x 3 window, if there was a match between x and_y, then it is reasonable to state that the cover
type of the more dominant pixels (x) in the 3 x 3 window was probably most representative.  However, if
the mapped pixels was wrongly classified (e.g., no match between x and_y), then the more dominant cover
type x is, the higher possibility that the mapped pixel with cover type x will have cover type y.  Within
that context, the cover-type-conversion degrees of membership regarding x and y at the mapped pixel
were computed as follows:

              Mb (*-*) = nx 19                            if       \=y
                                                                                             (4)
              Mb (x-y) = nx 19 and Mb (x~x) = !-(«, /9)      if \*y

where nx is the number of pixels in the 3x3 window with cover type x.  The ultimate degrees of
membership of cover types at the mapped pixel were computed as the weighted-sum average of those
from the one-to-one and 3x3 window-based comparisons as follows:

              M(x~y).= o)a. Ma(x-y) + <*>b.Mb(x-y)                                            (5)

where o>0 and wb were weights for Ma and Mb, respectively, with a)0+coA=l  (note that x and y  in  Equation 5
can be different or the same). In this study, we applied equal weights (i.e., o)a=a>A=0.5) for the  two one-
to-one and 3x3 window-based comparisons. Figure 13-2 demonstrates degrees of fuzzy membership if a
mapped pixel were computed.
;Page 208 of 339

-------
   •  Rule 2:  If x is "subordinate" and the accuracy is "high," then:

   •  Rule 3:  If x is "dominant" and the accuracy is "low," then:
                                                                                              0)
      Rule 4:  If A: is "subordinate" and the accuracy is "low," then:

        ;
where ^4. is accuracy level for land-cover type x at point i with its values ranging from 0.0 to 1.0; nsi is
the number of pixels labeled x in the 3x3 window surrounding the mapped,pixel /'. We assigned values
of Ai based on the multi-level agreement for cover type x at that point.  Ai  is equal 1.0 if the agreement
level is I; and 0.8, 0.6, 0.4, 0.2, and 0.0 for agreement levels II, III, IV, V, and VI, respectively.  While
Equations 7-10 are based on fuzzy set theory and the error or confusion matrix is associated with
probability theory, outcomes of Equations 7-10 are somewhat similar to information in a row of the error
matrix.  Note that, while one sampled point is used only once in computing the error matrix, it is
employed four times at different degrees in constructing the four fuzzy rules. For example, a sampled
point in a high accuracy area dominated by cover type x will contribute more to rule-1 than rules 2-4.  In
contrast, a sampled point in a low accuracy area and subordinate cover type x will have a significant
contribution to rule four above, compared with other rules.  Consequently, each rule represents the
degrees of membership of cover type conversion for specific conditions of accuracy and  dominance that
vary spatially on the map.  In contrast, a row in the error matrix is a global summary of a cover type for
the whole map and does  not provide any localized information.


13.2.5  Fuzzy Land-Cover Maps

The fiizzy rule set derived in the previous step was used to construct various LC conversion maps
representing the degrees of fuzzy membership (or possibility) from x to y of all mapped pixels associated
with cover type x. For example, to construct the "barren-to-forested upland" map, the four fuzzy rules
were applied to all pixels mapped as barren (Table 13-3 (a-d)).  In contrast to ordinary rules - where only
one rule is activated at a  time - the four fuzzy rules were activated simultaneously at different degrees
depending on levels of accuracy and LC dominance at that particular location. Consequently, four
outcomes resulted from the four fuzzy rules. There are different methods for combining fuzzy rule
outcomes (Bardossy and Duckstein, 1995). Here we applied the weighted sum combination method
whose details and application can be found in Bardossy and  Duckstein (1995) and Tran (2002).
Page 21 Oof 339

-------
                  One-to-one comparison of
                  the map-based pixel (MP)
                  and the sampled pixel (SP)
                        SP        MP
                     14C30-MO) = 1.0

                                 = 0.0
                           3x3 window-based
                           comparison of MP with SP
                            M,.(30-NlO) - 5/9-0 56
                                        . 1-5/9=0 44
                         Land-cover -to-land-cover fuzzy membership at MP
                                     = 0.5 -1.0 -i- 0.5 -0.56= 0.78
M4(30->30)
                                                      4toc.
                                       05*00 + 0 5*0.44= 0.22
           Figure 13-2.  Illustration of calculating the cover-type-conversion degrees of
                       membership.

13.2.4  Fuzzy Membership Rules

Here we integrate degrees of membership at individual locations derived from the previous step into a set
Of fuzzy rules. Theoretically, a fuzzy rule generally consists of a set of fuzzy set(s) as argument(s) A.k and
an outcome B also in the form of a fuzzy set such that:
               If (A, and A2 and	and At)
                     then   B
                                                                                            (6)
where k is the number of arguments. We constructed four fuzzy rules for each cover type for four
different combinations of two arguments including (I) accuracy level (i.e., low and high) and (2) majority
(i.e., dominant or subordinate).  Both of the arguments were available spatially with the first obtained
from the accuracy maps constructed in previous steps and the second derived directly from the I  i
thematic map.  The four fuzzy rules for cover type x are stated as follows:

   •  Rule I:  If* is "dominant" and the accuracy is "high," then the degree of membership of x to
              become y is:
                                                                                            (7)
                                                                                 Page 209 of 339

-------
   •  Rule 2:  If x is "subordinate" and the accuracy is "high," then:

   •  Rule 3:  If x is "dominant" and the accuracy is "low," then:
                                                                                              0)
      Rule 4:  If A: is "subordinate" and the accuracy is "low," then:

        ;
where ^4. is accuracy level for land-cover type x at point i with its values ranging from 0.0 to 1.0; nsi is
the number of pixels labeled x in the 3x3 window surrounding the mapped,pixel /'. We assigned values
of Ai based on the multi-level agreement for cover type x at that point.  Ai  is equal 1.0 if the agreement
level is I; and 0.8, 0.6, 0.4, 0.2, and 0.0 for agreement levels II, III, IV, V, and VI, respectively.  While
Equations 7-10 are based on fuzzy set theory and the error or confusion matrix is associated with
probability theory, outcomes of Equations 7-10 are somewhat similar to information in a row of the error
matrix.  Note that, while one sampled point is used only once in computing the error matrix, it is
employed four times at different degrees in constructing the four fuzzy rules. For example, a sampled
point in a high accuracy area dominated by cover type x will contribute more to rule-1 than rules 2-4.  In
contrast, a sampled point in a low accuracy area and subordinate cover type x will have a significant
contribution to rule four above, compared with other rules.  Consequently, each rule represents the
degrees of membership of cover type conversion for specific conditions of accuracy and  dominance that
vary spatially on the map.  In contrast, a row in the error matrix is a global summary of a cover type for
the whole map and does  not provide any localized information.


13.2.5  Fuzzy Land-Cover Maps

The fiizzy rule set derived in the previous step was used to construct various LC conversion maps
representing the degrees of fuzzy membership (or possibility) from x to y of all mapped pixels associated
with cover type x. For example, to construct the "barren-to-forested upland" map, the four fuzzy rules
were applied to all pixels mapped as barren (Table 13-3 (a-d)).  In contrast to ordinary rules - where only
one rule is activated at a  time - the four fuzzy rules were activated simultaneously at different degrees
depending on levels of accuracy and LC dominance at that particular location. Consequently, four
outcomes resulted from the four fuzzy rules. There are different methods for combining fuzzy rule
outcomes (Bardossy and Duckstein, 1995). Here we applied the weighted sum combination method
whose details and application can be found in Bardossy and  Duckstein (1995) and Tran (2002).
Page 21 Oof 339

-------
0)
CO
(D
M
a
w
01
           Table 13-3. The fuzzy cover-type-conversion rule set.
Land Cover Types
Water (11)
Developed (20s)
Barren (30s)
Natural Forested Upland (40s)
Herbaceous Planted/Cultivated (80s)
Wetlands (90s)
Dominant
Subordinate
Dominant
Subordinate
Dominant
Subordinate
Dominant
Subordinate
Dominant
Subordinate
Dominant
Subordinate
Rules
1-a
1-b
2-a
2-b
3-a
3-b
4-a
4-b
5-a
5-b
6-a
6-b
Low Accuracy
11
0.20
0.35
0.03
0.00
0.01
0.01
0.04
0.02
0.01
0.04
0.07
0.05
20s
0.09
0.07
0.08
0.32
0.21
0.36
0.04
0.16
0.18
0.20
0.06
0.11
30s
0.43
0.17
0.17
0.08
0.06
0.17
0.36
0.09
0.37
0.19
0.07
0.16
40s
0.06
0.23
0.33
0.32
0.47
0.33
0.08
0.36
0.27
0.16
0.69
0.34
80s
0.00
0.00
0.35
0.27
0.24
0.10
0.35
0.34
0.12
0.42
0.06
0.13
90s
0.21
0.28
0.03
0.00
0.01
0.03
0.13
0.04
0.04
0.00
0.06
0.21
Rules
1-c
1-d
2-c
2-d
3-c
3-d
4-c
4-d
5-c
5-d
6-c
6-d
High Accuracy
11
0.98
0.53
0.00
0.00
0.00
0.00
0.00
0.01
0.00
0.02
0.02
0.04
20s
0.01
0.16
0.91
0.72
0.05
0.08
0.01
0.14
0.05
0.12
0.01
0.06
30s
0.00
0.00
0.00
0.00
0.67
0.48
0.01
0.03
0.01
0.05
0.01
0.03
40s
0.00
0.09
0.03
0.11
0.21
0.36
0.91
0.58
0.06
0.14
0.13
0.24
80s
0.00
0.00
0.05
0.16
0.06
0.07
0.06
0.20
0.88
0.67
0.02
0.08
90s
0.01
0.22
0.00
0.00
0.00
0.01
0.01
0.04
0.00
0.00
0.82
0.55

-------
A fuzzy LC map for a given cover type was constructed by combining six cover-type-conversion maps.
For example, to develop the fuzzy forested upland map, six maps were merged including: (1) forested
upland-to-forested upland; (2) \vater-to-forested upland, developed-to-forested upland; (3) barren-to-
forested upland; (4) herbaceous planted/cultivatcd-to-forested upland; and (5) wetlands-to-forested
upland. The final fuzzy forested upland map represented the degrees of membership of forested upland
for all pixels on the map.  The degree of membership at a pixel on the fuzzy LC map was a result of
several factors, including the thematic mapped cover type at that pixel and the dominance and accuracy of
that LC type in the area surrounding the pixel under study.  To illustrate, in a forested dominated  upland
area with high accuracy, the degrees of membership of forested upland will be high (i.e., close to  1.0).
Conversely, in a barren dominating area with high accuracy, the degrees of membership of forested
upland will be very low (i.e., close to 0.0) for barren labeled pixels.  In contrast, in a barren dominated
area with low accuracy, the degrees of membership of forested upland increases to some extent (i.e.,
approximately 0.3 to 0.4)  for barren labeled pixels. Focusing on forest-related landscape indicators, we
used only the fuzzy forested upland map in the next section.


13.2.6   Deriving Landscape Indicators

First, several a-cut maps were created from the fuzzy forested upland map.  Each a-cut map was  a binary
map of forested upland with the degrees of membership < a.  For example, a 0.5-cut forested  upland map
is a binary map with two lumped  categories; forest for pixels with degrees of membership for  forested
upland < 0.5 and non-forest otherwise. Then, landscape indicators of interest were derived from  these
a-cut maps in a similar way to those from an ordinary LC map. The difference was,  instead of having a
single number for the indicator under study (as with an ordinary LC map), there were several  values of
the indicator in accordance to various a-cut maps.  Generally, the more variable those values were, the
more uncertain the indicator was  for that particular watershed.
13.3   Results and Discussion

Plate 13-1 presents accuracy maps for six cover types.  All maps were created with the values of 10 for
the number of sampled pixels n and 2.0 for the exponent of distance/) (Equation I).  The smaller the
number of/? and/or the larger the value ofp are, the more the local effects of sampled points on the
accuracy maps are taken into account.  One important point illustrated by these maps is that the spatial
accuracy patterns were different from one cover type to another. For example, while forested upland was
understandably more accurate in highly forested areas, herbaceous planted/cultivated tended to be more
accurate in populated areas. On the other hand, developed  areas around Richmond and Roanoke had
lower accuracy levels compared with other urbanized areas, such as Baltimore, Washington, DC,
Philadelphia, and Pittsburgh.

For the  forested upland accuracy map, some areas had  abnormally low accuracy levels, such as those in
central and southern Pennsylvania.  The southwest corner of Virginia had a very low level of accuracy
(agreement level-6), indicating that there was almost no match at all between sampled pixels and mapped
pixels in this area. This raised questions on both the thematic map classification process as well as the
quality of the reference data. Thus, the fuzzy accuracy maps  indicated irregularities or accumulated
errors associated with both the thematic map and reference data set. This information is not illustrated
using conventional accuracy measure, however, it's very beneficial for designing sampling schemes to
support reference data cross-examination.
Page 212 of 339

-------
18 accuracy
categories

••1-1
  • 1-2
    1-3
  • 11-1
    11-2
   " lt-3
    111-1
    111-2
    IM-3
    IV-1
    IV-2
    IV-3
    | V-1
    V-2
    V-3
    I Vl-1
*
Vl-2
VI-3
not available
  Plate 13-1.    Fuzzy accuracy maps of (a) water, (b) developed, (c) barren, (d) forested upland,
                (e) herbaceous planted/cultivated, and (f) wetlands.
                                                                                        Page 213 of 339

-------
Table 13-3 presents the fuzzy cover type conversion rule set that is, as mentioned above, somewhat
similar to a combination of four error matrices in one. The possibilities derived from each fuzzy rule
should be interpreted relatively. For example, for a low accuracy barren dominant area, the possibility for
a barren labeled pixel to be forested upland (i.e., rule 3-a) was the highest compared with other cover
types, including barren, and it was double the second highest possibility of barren-to-herbaceous
planted/cultivated (i.e., 0.47 versus 0.24).  Note that the outcomes of each fuzzy rule were not normalized
(i.e., to have the highest possibility equal 1.0) for the purpose of global rule-to-rule comparison.  For
instance, the wetlands-to-forested upland possibility of a wetlands labeled pixel in a low accuracy
wetlands dominant area (rule 6-a) was double (0.69 versus 0.33) the developed-to-forested upland
possibility of a developed labeled pixel in a low accuracy developed dominant area (rule 2-a). Unlike an
error matrix, the fuzzy rule set table provided significant insights on spatial accuracy variation of the
thematic map under study. As the size of the referenced data set was relatively small compared with the
area it covered, we used only two arguments  (inputs): the accuracy levels and cover type dominance.  If
there are more sampled data in future analyses, additional arguments (factors) that might affect the
classification process (e.g., slope, altitude,  sun angle, fragmentation) can be included in the fuzzy rules,
and potentially more insights of the thematic  map spatial accuracy patterns can be revealed.

Figure 13-3 presents six fuzzy conversion maps of water-to-forested upland, developed-to-forested
upland, barren-to-forested upland, forested upland-to-forested upland,  herbaceous planted/cultivated-to-
forested upland, and wetlands-to-forested upland. These maps resulted from spatially applying the fuzzy
rule set to six LC types on the thematic map.  Each map had a distinct  pattern as the degree of
membership of a cover type reclassified as forested  upland at each location on a map was decided by the
dominance and accuracy of that cover type at that spot.  Figure 13-4 shows the fuzzy forested upland map
that was a combination of the six cover type conversion maps (Figure  13-3).  An abnormality in the
southwest corner of Virginia apparently resulted form very  low level of accuracy for most of the forest
upland sampled pixels in the vicinity.  This made the forested upland degrees of membership for this area
very low, although the area was dominated by forest.  This irregularity can be verified only through the
additional reference data.  For other forested  areas with low accuracy levels, like southern Pennsylvania,
the degrees of membership were greater (around 0.5 to 0.6). This value implies that a forested upland
labeled pixel in such an area has a low probability (0.1 to 0.2) of being another cover type (i.e.,
herbaceous planted/cultivated or developed).

Figure 13-5 (a-d) presents the crisp binary map and three a-cut maps of the fuzzy forested upland map at
the levels of 0.1, 0.25, and 0.5.  One can see  that the 0.1 cut forested upland map (b) had more forest than
the crisp binary map (a) in all areas other than southwest Virginia.  This result is because the 0.1 cut man
included pixels that were labeled to other cover types but had possibilities >0.1 of being forested upland
This was somewhat similar to the result if a rule to include only the forested upland omission errors into
the forested upland category had been used.  Conversely, the 0.25 cut  forested upland map (c) appeared to
be similar to the crisp binary map in terms  of forest coverage. This can be explained by the fact only
pixels with moderate forested upland  degrees of membership (>0.25) were included in the 0.25 cut map
This excluded the forested upland omission errors but maintained the commission errors.  For the 0 50 cut
map (d), forest coverage was proportionately less than those on the binary map and areas with low forest
accuracy were excluded from the map. By exploring various a-cut maps of forested upland,  the different
forested upland map outcomes can be explored including and or excluding omission and commission
errors.
Page 214 of 339

-------
 Land-cover-to-land-
 cover degrees of
 fuzzy membership
  •••00-01
       0.1-0.2
       0.2-0.3
       0.3 - 04
       0.4 - 0.5
       0.5 - 06
       0.6-1.0
       No Data
Figure 13-3. Fuzzy cover-type-conversion maps of (a) water-to-forested upland, (b) developed-to-
            forested upland, (c) barren-to-forested upland, (d) forested upland-to-forested
            upland, (e) herbaceous planted/cultivated-to-forested upland, and (f) wetlands-to-
            forested upland.
                                                                                Page 215 of 339

-------
                 Degrees of fuzzy
                 membership
                      0.0-0.1
                      0.1 - 0.2
                      0.2 - 0.3
                      0.3 - 0.4
                      0.4-0.5
                      0.5 - 0.6
                      0.6 - 0.7
                      0.7-0.8
                      0.8-0.9
                      0.9-1.0
                      No Data
                              Figure 13-4. Fuzzy forested upland map.
Page 216 of 339

-------

                                       forest
                                       non-forest


Figure 13-5.  Crisp binary forested upland map (a) and three a-cut maps derived from the
            fuzzy forested upland map:  (b) 0.1=cut, (c) 0.25-cut, and (d) 0.5-cut.
                                                                              Page 217 of 339

-------
Table 13-4 presents two forested landscape indicators (FOR% and INT20) for 10 watersheds in MAR
(see Figure 13-5). FOR% was computed to extract the number of pixels with forested upland cover on a
watershed basis divided by the total number of pixels for each watershed to yield the watershed-based
index value. INT20 was used to calculate the proportion of forested upland cover within each window
using a threshold of 90% to determine interior habitat suitability (i.e., suitable if ^90% forest coverage).
Then the proportion of watershed with suitable interior habitat was determined as INT20 (bases on 450 x
450 m window). Various values of FOR% and INT20 at three cc-cut maps provided possible values of
these landscape indicators for the watersheds understudy.
 Table 13-4. Values of FOR% and INT20 for 10 watersheds in the Mid-Atlantic region.
Watershed
Schuylkill
Lower West Branch Susquehanna
Lower Susquehanna
Nanticoke
Cacapon-Town
Pamunkey
Upper James
Hampton Roads
Connoquenessing
Little Kanawha
FOR0/.
Crisp
47.5
68.8
29.0
30.1
84.9
64.2
86.9
16.2
55.4
86.2
0.10-cut
55.4
73.0
36.3
57.5
96.0
78.4
95.3
35.0
65.2
90.4
0.25-cut
47.7
69.0
29.2
31.2
84.9
65.2
87.1
7.3
54.1
86.4
0.50-cut
45.4
68.8
28.1
21.9
84.0
60.1
86.9
4.4
50.3
86.2
INT20
Crisp
23.6
54.9
10.8
6.8
72.0
39.1
77.4
2.4
25.0
71.8
0.10-cut
31.1
60.3
16.2
37.6
92.3
61.9
91.4
14.0
39.4
80.5
0.25-cut
24.0
55.3
10.9
8.0
72.6
40.5
77.8
1.6
25.0
72.4
0.50-cut
22.8
54.9
10.7
4.5
71.1
36.4
77.3
1.1
23.3
71.8
For the Schuylkill watershed (2040203) located in urbanized area with moderate accuracy for forested
upland pixels. FOR% ranged from 55.4 to 45.4 with a 10% change from 0.1 to 0.5 cut.  Also, the FOR%
value at 0.25 cut was very close to those for crisp binary forested upland map (i.e., 47.7 versus 47.5) and
INT20 values at this watershed changed about 8.3% from 0.1 to 0.5 cut. The Lower Susquehanna
watershed (2050306), also located in an urbanized area, had a relatively higher accuracy level, 0.10
to -0.25 cut variations of FOR% and INT20 at were only 8.2% and 5.5%, respectively. Conversely, the
Little Kanawha watershed (5030203) located  in a forested area with high accuracy level, FOR% changed
only 4.2% from 0.1 to 0.5 cut (from 90.4 - 86.2%). However, the INT20 0.10 to 0.25 cut variation went
up to 8.7%.  These analysis can be applied to  other watersheds, providing valuable insights on the
accuracy of the landscape indicators across the region. These two landscape indicators serve as an
example of how landscape indicators derived  from thematic LC maps can be analyzed to reveal their
spatial accuracy and possible values in the study area.
 13.4  Conclusions

 We have developed a fuzzy set-based method to map the spatial accuracy of thematic maps and compute
 landscape indicators while taking into account the spatial variation of accuracy associated with different
 LC types. This method provides valuable information not only on the spatial patterns of accuracy
 associated with various cover types but also on the possible values of landscape indicators across the
 study area. Such insights have not previously been incorporated into any of the existing thematic map-
 related accuracy assessment methods. We believe that including a spatial assessment to the accuracy
 assessment process would greatly enhance the users capability to evaluate map suitability for numerous
 environmental applications.
 Page 218 of 339

-------
13.5  Summary

This chapter presented a fuzzy set-based method of mapping the spatial accuracy of thematic maps and
computing landscape indicators while taking into account the spatial variation of accuracy associated with
different LC types.  First, a multi-level agreement was defined, providing a framework to accommodate
different levels of matching between sampled pixels and mapped pixels. Then the multi-level agreement
data at the sampled pixel locations were used to construct spatial accuracy maps for six cover types
approximating an Anderson Level I classification for the Mid-Atlantic region.  A set of fuzzy rules were
developed that determined degrees of fuzzy membership for cover type conversion under different
conditions of accuracy and cover type dominance. Operations of the fuzzy rule set created a set of fuzzy
cover type conversion maps. Fuzzy LC maps were then created from a combination of six fuzzy cover-
type-conversion maps from all cover types. Then the LC maps were used to derive several cc-cut maps
that were binary maps for representative cover types in accordance to different degrees of fuzzy
membership.  Finally, landscape indicators were derived from those binary cc-cut LC maps. Variations of
the value of indicator values derived from different a-cut map illustrated the level of accuracy
(uncertainty) associated with watershed specific indicators.


13.6  Acknowledgments

The authors would like to thank Mr. James Wickham, U.S. EPA Technical Director of the Multi-
Resolution Land Characterization (MRLC) consortium, for his valuable remarks. In addition, comments
from Drs. Elizabeth R. Smith and Robert O'Neill were greatly appreciated. The first author gratefully
acknowledges partial financial support from the National Science Foundation and National Oceanic and
Atmospheric Administration (Grant SBE-9978052, Brent Yarnal, Principal Investigator) and from the
United States Environmental Protection Agency via Cooperative Agreement number R-82880301 with
Pennsylvania State University. Any opinions, findings, and conclusions or recommendations expressed
in this material are those of the authors and do not necessarily reflect those of the National Science
Foundation or the U.S. Environmental Protection Agency.
13.7  References

Bardossy, A., and L. Duckstein. Fuzzy Rule-Based Modeling with Applications to Geophysical,
   Biological and Engineering Systems. CRC Press, Boca Raton, 232 p., 1995.

Campbell, J.B. Spatial autocorrelation effects upon the accuracy of supervised classification of land
   cover. Photogrammetric Engineering and Remote Sensing, 47, 355-363, 1981.

Campbell, J.B. Introduction to Remote Sensing.  Guilford Press, New York, NY, 622 p., 1987.

Cheng, T., M. Molenaar, and H. Lin.  Formalizing fuzzy objects from uncertain classification results.
   International Journal of Geographical Information Science, 15( 1), 2001, 27-42.

Congalton, R.G. Using spatial autocorrelation analysis to explore the errors in maps generated from
   remotely sensed data. Photogrammetric Engineering and Remote Sensing, 54(5), 587-592, 1988.

Congalton, R.G., and K. Green.  Assessing the Accuracy of Remote Sensed Data: Principles and
   Practices. Lewis Publishers, Boca Raton, Florida, 137 p., 1999.
                                                                                 Page 219 of 339

-------
Fisher, P.P. Visualization of the reliability in classified remotely sensed images.  Photogrammetric
    Engineering and Remote Sensing, 60, 905-910, 1994.

Gopal, S. and C. Woodcock. Theory and methods of accuracy assessment of thematic maps using fuzzy
    sets.  Photogrammetric Engineering and Remote Sensing, 60, 181-189,1994.

Hammond, T.O., and D.L. Verbyla.  Optimistic bias in classification accuracy assessment. Int. J. Remote
    Sensing, 17,1261-1266, 1996.

Loveland, T.R., and D.M. Shaw.  Multiresolution land characterization: building collaborative
    partnerships. Proceedings of the ASPRS/GAP Symposium on Gap Analysis: a landscape approach to
    biodiversity planning (J.M. Scott, T. Tear, and F. Davis, Editors), Charlotte, NC, National Biological
    Service, Moscow, ID, pp. 83-89, 1996.

Moisen, G.C., D.R. Cutler, and T.C. Edwards, Jr. Generalized linear mixed models for analyzing error in
    a satellite-based vegetation map of Utah.  In: Spatial Accuracy Assessment in Natural Resources and
    Environmental Sciences: Second International Symposium (H.T. Mowrer, R.L. Czaplewski, and
    R.H.  Hamre, Editors), USDA Forest Service, General Technical Report RM-GTR-277, Fort Collins,
    CO, pp. 459^66, 1996.

Muller, S.V.,  D.A. Walker, F.E. Nelson, N.A. Auerbach, J.G. Bockheim, S. Guyer, and D. Sherba.
    Accuracy assessment of a land-cover map of the Kuparuk River Basin, Alaska: considerations for
    remote regions.  Photogrammetric Engineering and Remote Sensing, 64(6), 619-628, 1998.

Shao, G., D. Liu, and G. Zhao. Relationships of image classification accuracy and variation of landscape
    statistics.  Canadian Journal of Remote Sensing, 27(1), 33-43, 2001.

Smith, J.H., J.D. Wickham, S.V. Stehman, and L. Yang.  Impacts of patch size and land-cover
    heterogeneity on thematic image classification accuracy. Photogrammetric Engineering and Remote
    Sensing, 68(1), 65-70,2001.

Steele, B.M.,  J.C. Winne, and R.L. Redmond.  Estimation and mapping of misclassification probabilities
    for thematic land cover maps. Remote Sens. Environ., 66, 192-202, 1998.

Stehman, S.V., J.D. Wickham, L. Yang, and J.H. Smith.  Assessing the accuracy of large-area land cover
    maps: Experiences from the Multi-Resolution Land-Cover Characteristics (MRLC) project.
    Proceedings of the Fourth International Symposium on Spatial Accuracy Assessment in Natural
    Resources and Environmental Sciences (G.B.M. Heuvelink and M.J.P.M. Lemmens, Editors),
    Amsterdam (July 12-14, 2000),  pp: 601-608, 2000.

Story, M., and R.G. Congalton. Accuracy assessment: A user's perspective.  Photogrammetric
    Engineering and Remote Sensing, 52(3), 397-399, 1986.

Townsend, P. A quantitative fuzzy approach to assess mapped vegetation classifications for landscape
    applications. Remote Sens. Environ., 72(3), 253-267, 2000.

Townshend, J.R.G., C. Huang, S.N.V. Kalluri, R.S. DeFries, and S. Liang. Beware of per-pixel
    characterization of land cover. Int. J.  Remote Sensing, 21(4), 839-843, 2000.

Tran, L.T., M.A. Ridgley, L. Duckstein, and R. Sutherland.  Application of fuzzy logic-based modeling to
    improve the performance of the Revised Universal Soil Loss Equation. Catena 47(ER3), 203-226,
    2002.
 Page 220 of 339

-------
Verbyla, D.L., and T.O. Hammond. Conservative bias in classification accuracy assessment due to pixel-
   by-pixel comparison of classified images with reference grids.  Int. J. Remote Sensing, 16, 581-587,
   1995.

Vogelmann, J.E., T. Sohl, and S.M. Howard. Regional characterization of land cover using multiple
   sources of data. Photogrammetric Engineering and Remote Sensing, 64(1), 45-57, 1998.

Vogelmann, J.E., S.M. Howard, L. Yang, C.R. Larson, B.K. Wylie, and N. Van Driel. Completion of the
   1990s National Land Cover Data Set for the conterminous United States from Landsat Thematic
   Mapper data and ancillary data source. Int. J. Remote Sensing, 67(6),650-662, 2001.

Woodcock, C. and S. Gobal.  Fuzzy set theory and thematic maps: accuracy assessment and area
   estimation. Int. J. Geographical Info. Science, 14(2), 153-172, 2000.

Yang, L., S.V. Stehman, J.D. Wickham, J.H. Smith, and N.J. Van Driel. Thematic validation of land
   cover data for the eastern United States using aerial photography: feasibility and challenges.
   Proceedings of the Fourth International Symposium on Spatial Accuracy Assessment in Natural
   Resources and Environmental Sciences (G.B.M. Heuvelink and M.J.P.M. Lemmens, Editors),
   Amsterdam (July 12-14, 2000), pp. 747-754, 2000.

Yang, L., S.V. Stehman, J.H. Smith, and J.D. Wickham. Thematic accuracy of MRLC land cover for the
   eastern United States.  Remote Sens. Environ., 76, 418-422, 2001.

Zhang, J., and N. Stuart. Fuzzy methods for categorical mapping with image-based land cover data. Int.
   J. Geographical Info. Science, 15(2), 175-195, 2000.

Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski. Designing an accuracy assessment for a USGS
   regional land cover mapping program. In:  Spatial Accuracy Assessment—Land Information
   Uncertainty in Natural Resources (K. Lowell and A. Jaton, Editors), Sleeping Bear Press/Ann Arbor
   Press, Chelsea, Michigan, pp. 393-398, 1999.

Zhu, Z., L. Yang, S.V. Stehman, and R.L. Czaplewski. Accuracy assessment for the U.S. Geological
   Survey regional land cover mapping program: New York and New Jersey region.  Photogrammetric
   Engineering and Remote Sensing, 66(12), 1425-1435,2000.
                                                                                Page 221 of 339

-------
Page 222 of 339

-------
                                    Chapter 14
               Fuzzy Set and Spatial Analysis Techniques
       for Evaluating Thematic Accuracy of a Land-Cover Map

                                          by

                                  Sarah R. Falzarano1*
                                  Kathryn A. Thomas2
   National Park Service
   Grand Canyon National Park
   2255 N. Grmini Drive
   Flagstaff, AZ 86001

   *Corresponding Author Contact:

    Telephone: (928)556-7164
     Facsimile: (928)556-7169
         E-mail: Sarah Falzarano@NP2S.gov
U.S. Geological Survey
Southwest Biological Science Center
Colorado Plateau Field Station
P.O. Box 5614
Flagstaff, AZ 86011-5614
141  Introduction

14.1.1   Accuracy Assessment

Accuracy assessments of thematic maps have often been overlooked.  With the increasing popularity and
availability of Geographic Information Systems (CIS), maps can be readily produced with minimal regard
for accuracy. Frequently, a map that looks good is assumed to be 100% accurate. Understanding the
accuracy of meso-scale (1:100,000- to 1:500,000-scale) digital maps produced by government agencies is
especially important because of the potential for broad dissemination and use. Meso-scale maps
encompass large areas, and thus the information may affect significantly large populations. Additionally,
digital information can be shared much more easily than hard copy maps in the rapidly growing
technological world.  Finally, information produced by public agencies is freely available and sometimes
actively disseminated. These combined factors highlight that a thorough understanding of the thematic
accuracy of a map is essential for proper use.

A rigorous assessment of a map allows for users to determine the suitability of the map for particular
applications. For example, estimates of thematic accuracy are needed to assist land managers in
providing a defensible basis for use of the map in conservation decisions (Edwards et al., 1998).
                                                                           Page 223 of 339

-------
Errors can occur and accumulate throughout a land-cover (LC) mapping project (Lunetta et al.,  1991).
The final map can have spatial (positional) and/or thematic (classification) error. Spatial error may occur
during the registration of the spatial data to ground coordinates or during sequential analytical processing
steps, while thematic errors occur as a result of cover type misclassifications. Thematic errors may
include variation in human interpretation of a complex classification scheme or an inappropriate
classification system for the data used (e.g., understory classification when satellite imagery can only
visualize the overstory).

This chapter focuses on analysis and estimation of thematic accuracy of a LC map containing 105 cover
types. Using a single reference data set, three methods of analysis were conducted to illustrate the
increase in accuracy information portrayed by fuzzy set theory and spatial visualization. This added
information allows a user to better evaluate use of the map  for any given application.
14.2  Analysis of Reference Data

14.2.1   Binary Analysis

The analysis and estimation of thematic accuracy of meso-scale LC maps has traditionally been limited to
a binary analysis (i.e., right/wrong) (Congalton and Green, 1999; Congalton, 1996).  This type of
assessment provides information about agreement between cover types as mapped (classified data) and
corresponding cover types as determined by an independent data source (reference data). The binary
assessment is summarized in an error matrix (Congalton and Green, 1999), also referred to as a confusion
or contingency table.  In the matrix, the cover type predicted by the classified data (map) is assigned to
rows and the observed cover type (reference data) is displayed in columns. The values in each cell
represent the count of sample points matching the combination of classified and reference data
(Congalton, 1996). Errors of inclusion (commission errors) and errors of exclusion (omission errors) for
each cover type, and overall map accuracy can be calculated using the error matrix. "User's accuracy"
corresponds to the area on the map that actually represents that land cover type on the ground.
"Producer's accuracy" represents the percentage of sampling points that were correctly classified for each
cover type.

A binary analysis of accuracy data using an error matrix omits information in two ways; (1) it does not
take into account the degree of agreement between reference and map data, and (2) it ignores spatial
information from the reference data. The error matrix forces each  map label at each reference point into a
correct or incorrect classification.  However, a LC classification is often not discrete where one type is
exclusive of all others. Instead, types grade from one to another, and may be related, justifying one or
more map labels for the same geographic area.  The binary assessment does not take into account that the
reference data may be incorrect. In addition, the error matrix does not use the locations of the reference
points directly, and accuracy is assumed to  be spatially constant within each  land cover type. Instead,
accuracy may  vary spatially across the landscape in a manner partially or totally unrelated to land cover
type (Steele et al., 1998).  This has led to the utilization of two additional analysis techniques, fuzzy set
analysis and spatial analysis, to describe the thematic accuracy of a LC map.


14.2.2  Fuzzy Set Analysis

An alternative method of analysis of thematic accuracy uses fuzzy set theory (Zadeh, 1965).  Adapted
from  its original application to describe the ability of the human brain to understand vague relationships
Gopal and Woodcock (1994) developed fuzzy set theory for thematic accuracy assessment of digital
Page 224 of 339

-------
     s   A fuzzy set analysis provides more information about the degree of agreement between the
^f rence and mapped cover types.  Instead of a right or wrong analysis, map labels are considered
r£ rtially right or partially wrong, generally on a five-category scale. This is more useful for assessing
Pa  etation types that may grade into one another, yet must be classified into discrete types by a human
v ^g _,er (Gopal and Woodcock, 1994). The fuzzy set analysis provides a number of measures with
  -hich to judge the accuracy of a LC map.
\v
       set theory aids in the assessment of maps produced from remotely sensed data by analyzing and
    ntifying vague, indistinct, or overlapping class memberships (Gopal and Woodcock.  1994).  Distinct
£ua  paries between LC types seldom exist in nature. Instead, there are often gradations  from one cover
f°U etation) type to another. Confusion results when a location can legitimately be labeled as more than
     cover type (i.e., vegetation transition zones). Unlike a binary assessment, fuzzy set analysis allows
°nrtial agreement between different land cover types.  Additionally, the fuzzy set analysis prov ides
?   •  ht into the types of errors that are being made.  For example, the misclassification of ponderosa pine
'"^odland as juniper woodland may be a more acceptable error compared to a desert shrubland label.  In
*   first instance, the misclassification may not be important if the map user wishes to know where all
coniferous woodlands exist in an area.


 14.2.3   Spatial Analysis

 Advanced techniques in assessing the thematic accuracy of maps are continually evolving.  A new
    hnique proposed in this chapter uses the spatial locations of the reference data to interpolate accuracy
 u  tween sampling sites to create a continuous spatial view of accuracy. This technique  is termed a
     matic spatial analysis, however, it should not be confused with assessing the spatial error of the map.
 The thematic spatial analysis portrays thematic accuracy in a spatial context.

     ference data inherently contain spatial information that is usually ignored in both binary and fuzzy set
     ivses.  For both analyses, the spatial locations of the reference data are not utilized in the summary
 al1 tistics, and results are given in tabular, rather than spatial format. The most fundamental drawback of
 S.   confusion matrix is its inability to provide information on  the spatial distribution of the uncertainty in
 1  classified scene (Canters, 1997). A thematic spatial analysis addresses this spatial issue by using the
 a  ographic locations gathered using a global positioning system (GPS) with the reference data. These
 focations are used in  an interpolation process to assign accuracy to locations that were not directly
    mpled.  Accuracy is not tied to cover type, but rather to the  location of the reference sites. Therefore.
 ^curacy can be displayed for specific locations of the LC map.

     ta that are close together in space are often more alike than those that are far apart. This spatial
     tocorrelation of the reference data is accounted for in spatial models. In fact, spatial models are more
  3  neral than non-spatial models (Cressie, 1993), and have less strict assumptions, specifically about
  ^dependence of the samples.  Therefore, randomly located reference data will be accounted for in a
  spatial model.

    iterature on the spatial variability of thematic map accuracy is limited. Congalton (1988) proposed a
      thod of displaying accuracy by producing a binary difference image to represent agreement or
   ,. agreement between the classified and reference images. Fisher (1994) proposed a dynamic portrayal
    fa variety of accuracy measures.  Steele et al. (1998) developed a map of accuracy illustrating the
  °  snitude and distribution of classification errors. The latter used kriging to  interpolate misclassification
  "\jrnates (produced from a bootstrapping method) at each reference  point.  The interpolated estimates
      re then used to construct a contour map showing accuracy estimates over the map extent.  This work
                                                                                       Page 225 of 339

-------
 provided a starting point for this study. The fuzzy set analysis described earlier was used in conjunction
 with kriging to produce a fuzzy spatial view of accuracy.
14.3   Background

A LC map, or map of the natural vegetation communities, water, and human alterations that represent the
landscape (e.g., agriculture, urban, etc.). provide basic information for a multitude of applications by
federal, state, tribal, and local agencies.  Several public (i.e., the USDA Forest Service and USDI Fish &
Wildlife Service) and private agencies (i.e., The Nature Conservancy) use meso-scale land cover maps for
local and regional
conservation planning.
LC maps can be used in
land use planning, fire
modeling, inventory,
and other applications.
Because of the potential
for use in a variety of
applications by
different users, it is important to
determine the thematic map accuracies.

A thematic accuracy  assessment was
conducted on the northern half of a
preliminary Arizona Gap Analysis
Program (AZ-GAP) LC map (Graham,
1995). The map (see Plate 14-1) was
derived primarily from Landsat
Thematic Mapper (TM) satellite
imagery from 1990.  Aerial video and
ground measurements were used  to
facilitate classification of spectral
classes into 105
discrete cover types for
Arizona using  a
modified Brown, et al.
(1979) classification
system. This system
attempted to model
natural hierarchies in
the southwestern
United States.
However, Graham's
procedures were  not
well  described  or
documented.
                        Formation

                            Tundra

                          | Forest

                            Wbodland

                            Chaparral
                            Oied Scrub

                           [ Ripiriin Forest/Woodland

                            Riparian Sen*

                           I Water


                                                           '*
                            Developed

                       Plate 14-1.
                                                                    50   100
                                                  200
                                                   i Kilometers
Preliminary AZ-GAP land-cover map to formation level classification
The preliminary map contained 58,170 polygons describing 105
vegetation types.
Page 226 of 339

-------
The preliminary LC map consists of polygons labeled with cover types contained in a GIS with a 40 ha
minimum mapping unit (MMU) and smaller in riparian locations.  This resolution is best suited for
interpretation at the 1:100,000-scale (meso-scale).
14.4  Methodology

14.4.1   Reference Data

A random sampling design, stratified according to cover type, was used to determine the set of polygons
to be sampled in the accuracy assessment. A total of 930 sampling sites representing 59 different cover
types in northern Arizona were visited during the summer of 1997. Field technicians identified dominant,
co-dominant, associate plant species and ancillary data for a 1.0 ha area. The field data at each site was
assigned to one of the 105 cover classes by the project plant ecologist using the incomplete definitions
provided  by Graham.  Each reference site was tied to the GPS-measured point location at the center of the
1.0 ha field plots. The resulting reference dataset, therefore, consisted of 930 points with a field assigned
cover type and associated point location.

14.4.2   Binary Analysis

Traditional measures of map accuracy were calculated by comparing the cover label at each reference site
to the map. Matches between the two were coded as either agreed (one) or disagreed (zero). These
statistics were incorporated into an error matrix from which User's and Producer's accuracies for each
cover type were calculated, as well as overall accuracy of the LC map.

14.4.3   Fuzzy  Set Analysis

The Gopal-Woodcock (1994) fuzzy set ranking system was refined for application to the reference data
for the  northern AZ-GAP LC map (see Table 14-1). The fuzzy set ranks reflected a hierarchical approach
to LC classification. While Gopal and Woodcock (1994) suggested that fuzzy set ranks for each cover
type be assigned at  each sampling point, this method would have been impractical in the field. Instead,
the fuzzy set ratings were predefined rather than assessed at each sampling site. A matrix of the 105
cover classes (reference versus map) assigned a fuzzy set rank to each reference site by comparing its
reference data assignment to the map assignment.

Using the fuzzy set rank for each reference site, the functions that described the thematic accuracy of the
classification were calculated (Gopal and Woodcock 1994).  For this study, we calculated the following
functions.

             Max (M)  = number of sites with an absolutely right answer (accuracy rank of 5)

             Right (R)  = number of sites with a reasonable, good, or absolutely right answer
                         (accuracy ranks of 3, 4, and 5)

      Increase (R-M)  = difference between the Right and Max functions
                                                                                   Page 227 of 339

-------
The Max (M) function calculated the same information as User's accuracy in a binary assessment. The
Right (R) function allowed reasonable and better answers to be counted. For this study, the R function
calculated the accuracy of the LC map to the life form level or better. The Increase (R-M) function
reflected the improvement in accuracy associated with using the R function instead of the M function.
Since the Gopal-Woodcock (1994) fuzzy set assessment was altered to save time in the field, certain data
for calculating membership, difference, ambiguity, and confusion statistics, were not collected.

    Table 14-1. Accuracy ranks assigned to the reference data of the AZ-GAP land-cover map.
Rank
1
2
3
4
5
Answer
Wrong
Understandable but Wrong
Reasonable or Acceptable
Good
Absolutely Right
Description
The reference and map types did not correspond, and there was
no ecological reason for the non-correspondence.
The reference and map types did not correspond, but the reason
for non-correspondence was understood.1
The reference and map types were all the same life form
(i.e., formation types2 ).
The reference and map types were characterized by the same
species at the dominant species level.
The reference and map types were exactly the same.
   These reasons include vegetation types that are ecotonal and/or vegetation types that can occur as
   inclusions within other vegetation types.
   Tundra, Coniferous Forest, Evergreen Woodland, Chaparral, Grasslands, Desert Scrub, Riparian
   Broadleaf, Woodland/Forest, Riparian Leguminous Woodland/Forest, Riparian Scrub, Wetlands, Water,
   and Developed.


14.4.4   Spatial Analysis

The nature of the accuracy ranks was explored by calculating the mean, median, and mode, and a
histogram was plotted.  The points were mapped to display the accuracy rank and location of the data.
Interpolating the accuracy ranks produces a continuous map of thematic accuracy. Kriging was data-
driven, and exploited the spatial autocorrelation exhibited by the data. An ordinary kriging regression
technique for estimating the best linear unbiased estimate of variables at a non-sampled location was
applied to reduce the local variability by calculating a moving spatial average.

The kriging  interpolation produces continuous values even though the accuracy ranks are ordinal.
However, because a value between two of the ranks is meaningful suggests that the kriged results are also
meaningful. For example, a value between "reasonable or acceptable" and "good" can be characterized as
"reasonably good."

The first step in the kriging process was to calculate the empirical variogram, or an analogous measure of
the spatial autocorrelation present in the data. The variogram is one of the most common measures of
spatial autocorrelation used in geostatistics. It is calculated as 0.5 the average difference squared of all
data values separated by a specified distance (lag):
                                                               1
Page 228 of 339

-------
where

                            h  = distance measure with magnitude only,
                         N(h)  = set of all pair-wise Euclidean distances /' -j = h,
                        \N(h)\  = number of distinct pairs in N(h),
                      z, and Zj  = fuzzy set ranks at spatial locations / and j

For the accuracy ranks in this study, we chose to use a modified version of the variogram to calculate the
empirical variogram, as follows (Cressie and Hawkins, 1980):
                                                                  ..-Zj
                                          v(M = \-\--v\»w	J_             (2)
                                          n  '        0.457 + 0.494
                                                           \N(h)\

This modified form of the variogram has the advantage of reducing the effect of outliers in the data
without removing specific data points.  The estimation is based on the fourth power of the square root of
the absolute differences in z-values.

Once an appropriate empirical variogram is calculated, a model is fit to the data (see Figure 14-1). The
model variogram has known mathematical properties (such as positive defmiteness), and is used in
kriging equations to determine the estimator weights. Possible valid models  include exponential,
spherical, Gaussian, linear, and power (Goovaerts,  1997).

The nugget effect (C0) represents the random variability present in a data set  at small distances. By
definition, the value of the variogram at a distance of zero is zero, however, data values can display a
discontinuity at very small distances. This apparent discontinuity at the origin could reflect the
unaccounted for spatial variability at distances smaller than the sampling distance or could be an artifact
of the error associated with measurement.

The range (A0) is the distance over which the samples are spatially correlated. The sill (C0 + C) is the
point of maximum variance, and is the  sum of the structural variance (C, variance attributed purely to the
process) and the nugget effect (Royle, 1980). It is the plateau that the model variogram reaches at the
range, and is estimated by the sample variance only in the case of a model showing a pure nugget effect.
The model is fit to the empirical variogram visually, and is optimized by calculating the residual sum of
squares (RSS). The values of the three main parameters are changed iteratively to reduce the RSS value
and fit the model.

Ordinary kriging was performed on the fuzzy set reference data.  The model  and parameters were selected
to produce a regularly spaced lattice of points representing accuracy ranks. Kriging predicted continuous
(rather than ordinal) accuracy ranks ranging from one to five. The grid of predictions covered northern
Arizona, was located at the center of 1.0 km2 cells,  and was taken to be the average of the cell. The
resulting tabular file of coordinate locations and predicted accuracy ranks was converted  to a grid format,
with predicted accuracy rank as the value of each cell. The result is the fuzzy spatial view of accuracy, a
map of predicted accuracy ranks for northern Arizona. The continuous accuracy  rank estimates were
                                                                                   Page 229 of 339

-------
rounded into ordinal ranks for ease of interpretation and display. A frequency histogram was produced
from the predicted accuracy ranks.
        Sill (C ••• Co)
    Nugget effect (Co)
                                                                o     o
                                                           Structural Variance (C)
                                                Lag Distance (h)     g>

         Figure 14-1. Generic variogram including empirical data (circles) and model (heavy line).
14.5  Results

74.5.7   Binary Analysis

User's and Producer's accuracies for each cover type, and overall accuracy were low (see Table 14-2).
The highest Producer's accuracies were for anthropogenic defined cover types industrial (60%) and
mixed agriculture/urban/industrial (80%). Producer's accuracies for natural cover types ranged between
zero and 50% with the best performers being Encinal mixed oak/mixed chaparral/semidesert grassland -
mixed scrub (50%) and Mohave blackbush - Yucca scrub (50%). Likewise, the highest User's accuracies
were also for anthropogenic defined cover types urban (91%) and industrial (86%).  Natural cover types
ranged between zero and 48.3%, with the best performer being Engelmann spruce - mixed conifer
(48.3%).  The standard error was <5.0% for almost all sampled vegetation types, and overall map
accuracy was 14.8%.
Page 230 of 339

-------
                *-2. Producer's and user's accuracies by land-cover type
TJ
Ui
CO
m
to
to
laciei
Code
3
4
5
6
7
8
9
10
11
12
T3
14
15
16
17
18
20
21
22
23
24
25
26
27
__28__
29
30 _
32
33 1
• - •*

Cover Type
Engelmann Spruce-Mixed Conifer
Rocky Mountain Lichen-Moss
Rocky Mountain Bristlecone-Limber Pine
Pinyon-Juniper-ShruWPorcderDsa Pme-Gambel Oak-Juniper
Pinyon-Juniper/Sagebrush/Mixed-Grass-Scrub 	 	 	
Pinyon-Juniper-Shrub Live Oak-Mixed Scrub
Pinyon-Juniper (Mixed}/Chapanal-Scrub
Pinyon-Juniper-Mixed Shrub 	 	
Pinyon- Juniper-Mixed Grass-Scrub 	 1
Pinyon-Juniper (Mixed) 	 —
Douglas Fir-Mixed Conifer 	
Arizona Cypress . _ 	 	 	 . 	 _ 	
Ponderosa Pine 	
Ponderosa Pine-Mixed Conifer 	
hpcnd'erosa Pine-Gambel Oak-Juniper/Pinyon-Jumper Complex
Ponderosa Pine-Pinyon-Juniper 	 . 	
Ponderosa Pine-Mixed Oak-Juniper 	
Encinal Mixed Oak 	 	 	 — 	 — —
Encircal Mixed Oak-Pmyon-Jumper 	 	 	
Encinal Mixed Oak-Mexican Pine-Juniper 	 _ 	
Encinal Mixed Oak-Mexican Mixed Pine
Fnrinal Mixed Oak-Mesqurte 	 . 	 _ 	
Enrirtal M"^ Oak/Mixed onaDarral/semideseil za-iita 	
Interior Criacarral Mtxed^Evergreen SchleropViyH 	 ,
intprior Chaparral (MixedJ/Sonoran-Paloverde-Mixed Cacti
— t r nr Chacarral (Mixed)/M xed Grass-Mixed Scrub Complex
Rocky Mountain/Great Basin Dry Meadow_ 	 . 	 	 	 __
KJoHrean flrv Meadow 	 _J

#of
Sites
29
— . • •— — ~<*
2
21
34
13
33
ia
34
41
35
8
45
23
36
39
3
1
5
2
1
1
10
2
18
1
10
22
.^ —

Producer's
Accuracy (%)
41.2
0.0
0.0
o.o
18.2 "1
8,0
8.3
0.0
6.3
6.7
38.5
25,0
12.5
115
11.8
16.7
10.0
0.0
16.7
0.0
0.0
0.0
50.0
0.0
?n n
33.3
OO "
— 	 • • 	 •
0.0
200
00
.

Standard
Error
7.0
0.0
0.0
0.0
6.5
' .3
4.6
0.0
3.7
3.9
7.2
12.8
4.8
5.4 ^
__5J_j
c ft
0.0
18.1
0.0
0.0
0.0
15.0
0.0
11.0
0.0
0.0
64
00

User's
Accuracy
W
48.3
0.0
0.0
0.0
11.8
3.0
0.0
2.9
2.4
28.6
12.5
133
13.0
16.7
ft 1
0.0
40.0
0.0
0.0
0.0
10.0
0.0
278
0,0
0.0
27 8
00

Standard
Error
7.2
0.0
0.0
0.0
5.5
2.9
0.0
2.9
2.1
6.7
9.9
4.6
5.6
5.9
^ ^
0.0
23.7
0,0
0.0
0.0
9.0
0.0
10.5
0.0
0.0
7 2
o.c

-------
         Table 14-2. Continued
CQ

-------
14.5.2   Fuzzy Set Analysis

The Max statistic for the fuzzy set reference data yields the same information as user's accuracy for the
binary accuracy assessment (see Table 14-3).  However, the R function provided a different view.
Accuracy improves across the table for all cover types because the R function was more inclusive than the
M function. For example, in cover class  18 (ponderosa pine - pinyon -juniper), the M statistic indicates
this type has very low accuracy (5.0%). The R statistic indicated that when assessed at the life-form level
it was 74% correct. The range for R statistics was large, between zero and 100%. However, the cover
types were more often correct to the life form (mean 52.7% ± 33.4%) compared to the M  statistic (mean
13.8% ± 18.8%). The mean increase in accuracy when viewed at the life form level was 38.8% ± 31.5%.


14.5.3   Spatial Analysis

The accuracy ranks had a mean and median near 3.0 with a large standard deviation, however, the mode
did not correspond to the mean and median (see Figure 14-2). The distribution had a fairly broad shape,
but is mostly symmetrical.  The fuzzy set reference data (see Figure 14-3) illustrated classic signs of being
positively spatially autocorrelated at shorter distance separations (see Figures 14-4 and 14-5). This was
substantiated by the  lower variance values at shorter lag distances.  Also, the variance values seem to
plateau at a lag distance where they become uncorrelated.  The empirical variogram was best fit with  a
spherical model (see Figure 14-4).  The parameters were iteratively changed to achieve a low residual
sum of squares, and  resulted in a nugget of 0.6638, sill of 1.4081, and range of 22.6 km.
    e spherical model and parameters were used to determine the weights in the kriging equations. The
 predicted accuracy ranks produced from kriging do not reach the extremes of "wrong" and "absolutely
 right." Instead, they range from a minimum of 1.039 to a maximum of 4.934, and mean and median are
 very close to 3.0 (see Figure 14-5).
The
pre
 The fuzzy spatial view of accuracy displays the predicted accuracy ranks reclassified as an ordinal
 variable (see Figure 14-6). High accuracy is lighter in color than low accuracy. The frequency histogram
 of accuracy ranks shows that approximately 85% of the fuzzy spatial view of accuracy had a rank of 3. 4.
 or 5 (see F'8ure  14-5).  In ecological terms, the LC map was accurate to the life form level or better for a
 majority of the study area.
                                                                                     Page 233 of 339

-------
ID
re
             Table 14-3.  Fuzzy set accuracy by land-cover type.
 oo
 UJ
•™^^— «^—
Code
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Cover Type
Engelmann Spruce-Mixed Conifer
Rocky Mountain Lichen-Moss
Rocky Mountain Bristlecone-Limber Pine
Pinyon-Juniper-Shrub/Ponderosa Pme-Gambel Oak-Juniper
Pinyon-Juniper/Sagebrush/Mixed-Grass-Scrub
Pinyon-Juniper-Shrub Live Oak-Mixed Scrub
Pinyon-Juniper(Mixed)/Chaparral-Scrub
Pinyon-Juniper-Mixed Shrub
Pinyon-Juniper-Mixed Grass-Scrub
Pinyon-Juniper (Mixed)
Douglas Fir-Mixed Conifer
Arizona Cypress
Ponderosa Pine
Ponderosa Pine-Mixed Conifer
Ponderosa Pine-Gambel Oak-Juniper/Pinyon-Juniper Complex
Ponderosa Pine-Pinyon-Juniper
Ponderosa Pine-Mixed Oak-Juniper
Encinal Mixed Oak
Encinal Mixed Oak-Pinyon-Juniper
Encinal Mixed Oak-Mexican Pine-Juniper
Encinal Mixed Oak-Mexican Mixed Pine
Encinal Mixed Oak-Mesquite
Encinal Mixed Oak/Mixed Chaparral/Semidesert Grassland-Mixed Scrub
Great Basin Juniper
Interior Chaparral Shrub Live Oak-Pointleaf Manzanita
Interior Chaparral Mixed Evergreen Schlerophyll
Interior Chaparral (Mixed)/Sonoran-Paloverde-Mixed Cacti
Interior Chaparral (Mixed)/Mixed Grass-Mixed Scrub Complex
Rocky Mountain/Great Basin Dry Meadow
Madrean Dry Meadow
#of
Sites
29
1
2
2
34
13
33
18
34
41
35
8
45
23
36
39
3
1
5
2
1
1
10
2
14
18
1
10
18
22
Max (M)
Best Answer
#
14
0
0
0
4
2
1
0
1
1
10
1
6
3
6
2
1
0
2
0
0
0
1
0
5
5
0
0
5
0
%
48.3
00
0 0
0.0
11.8
15.4
3.0
0.0
2.9
2.4
28.6
12.5
13.3
13.0
16.7
5.1
33.3
0.0
40.0
0.0
0.0
0.0
10.0
0.0
35.7
27.8
0.0
0.0
27.8
0.0
Right (R)
Correct
#
25
0
2
10
19
11
12
7
18
21
28
1
28
16
20
29
2
0
3
0
0
1
2
0
6
7
1
0
5
6
%
86.2
0.0
100.0
47.6
55.9
846
36.4
389
529
51.2
80.0
12.5
62.2
69.6
55.6
74.4
66.7
0.0
60.0
0.0
0.0
100.0
20.0
0.0
42.9
38.9
100.0
0.0
27.8
27.3
Increase
(R-M)
#
11
0
2
10
15
9
11
7
17
20
18
0
22
13
14
27
1
0
1
0
0
1
1
0
1
2
1
0
0
6
%
37 9
0.0
1000
47.6
44.1
692
33.3
389
50.0
48.8
51.4
00
48.9
565
389
692
33 3
0.0
20.0
0.0
0.0
100.0
10.0
0.0
7.1
11.1
100.0
0.0
0.0
27.3

-------
                     , Continued
\
Code
34
35
36
37
38
42
43
44
45
46
47
48
49
SO
51
52
53
55
58
59
61
63
64
75
82
83
84
35
87

I
Cover Type
Great Basin (or Plains) Mixed Grass
Great Basin (or Plains) Mixed Grass-Mixed Scrub
Great Basin (or Plains) Mixed Grass-Sagebrush
Great Basin (or Plains) Mixed Grass-Saltbush
Great Basin (or Plains) Mixed Grass-Mormon Tea
Semidesert Mixed Grass-Mixed Scrub
Great Basin Sagebrush Scrub
Great Basin Big Sagebrush-Juniper-Pinyon
Great Basin Sagebrush-Mixed Grass-Mixed Scrub
Great Basin Shadscale-Mixed Grass-Mixed Scrub
Greal Basin Greasewood Scrub
Great Basin Saltbush Scrub
Great Basin Blackbrush-Mixed Scrub
Great Basin Mormon Tea-Mixed Scrub
Great Basin Winterfat-Mixed Scrub
Great Basin Mixed Scrub
Great Basin Mormon Tea/Pinyon-Juniper
Mohave Creosotebusri-Bursage Mixed Scrub
Mohave Blackbush- Yucca Scrub
Mohave Saltbush Yucca Scrub
Mohave Creosotebush-Brittlebush Mohave Globemallow Scrub
Mohave Joshua Tree
Mohave Mixed Scrub
Soncran Paloverde-Mixed Cacti-Mixec Scrub
Agriculture
Urban
Industrial
Mixed Agricullure/Urban/l ndustrial
Water
#of
Sites
20
40
4
24
20
2
12
30
27
24
11
7
36
18
11
26
16
7
13
5
5
1
9
1
1
11
7
20
2
MaxM \
Best Answer
#
2
6
1
5
1
0
0
4
6
0
3
1
4
6
0
3
0
2
3
0
0
0
1
0
c
10
6
4
0
%
10.0
15.0
25.0
20.8
5.0
0.0
0.0
13.3
22.2
0.0
27.3
14.3
11.1
33.3
0.0
11.5
0.0
28.6
23.1
0.0
0.0
0.0
11.1
0.0
0.0
90.9
85.7
20.0
0.0
mgjttlKt
Correct
n
3
15
2
10
B
0
7
13
17
13
10
4
24
9
5
18
10
%
15.0
37.5
50.0
41.7
45.0
0.0
58.3
43.3
63.0
54.2
90.9
57.1
66.7
50.0
45.5
69.2
62.5
6 85.7
11 i 846
5 100.0
5 j 100.0
1 ! 100.0
9
0
0
11
7
19
0
100.0
0.0
0.0
100.0
100.0
95.0
0.0
Increase
(R-M)
#
1
9
1
5
8
0
7
9
11
13
7
3
20
3
5
15
10
4
8
5
5
1
8
0
0
1
1
15
0
%
5.0
22.5
25.0
20.8
40.0
0.0
58.3
30.0
40.7
54.2
63.6
42.9
55.6
16.7
45.5
57.7
62.5
57.1
61.5
100.0
100.0
100.0
88.9
0.0
0.0
9.1
14.3
75.0
0.0
	 Sum: 930 138 523 385
en
o
                                                        Accuracy of the Whole Map:
                                                                                                     14.8
                                                                                                                    56.2
                                                                                                                                  41.4

-------
                   10000
                   07500
                   05000
                  02500

                            02032
                                           02323
                                                         0 1677
                                                                             Mrcmum         1
                                                                                            5
                                                                             '
                                                                             Median          3
                                                                             Skewness    004100
                                                                                         •1 286
                                                                       02484
                                             2                            4

                                                     Accuracy Rank

                             Figure 14-2. Frequency histogram of accuracy ranks.
                                                                                        .
                         •

                                Figure 14-3.  Map of fuzzy set reference data.
Page 236 of 339

-------
    1.5-
     0.0
         81 = 1.4081
                         nugget effect = 0 6638
                                                                    RSS = 0.8759
!
I
                                                               T
                                                              150
 1
200
                                        Distance (km)
    Figure 14-4.  Semivariogram and spherical model of the fuzzy set reference data (930
                points).
    1 0000
    0.7500
    05000
    02500
01486
0.7676


Accuracy Standard
Rank Error
Minimum 1 039 0 1785
Maximum 4 934 1 1880
Maan 2 901 1 1070
Madian 2900 11220
00807
00004
    0.0000
                 1                            345
                                        Accuracy Rank
Figure 14-5. Frequency histogram of accuracy ranks in the fuzzy spatial view of accuracy.
                                                                              Page 237 of 339

-------
                                          ACCURACY RANK
                                          m 1 - vrong
                                          |^B 2* indentanojc* but wrong
                                          •• 3 • M»anaa» or acceptac*
                                              4-0000
                             Figure 14-6. Fuzzy spatial view of accuracy.
 14.6   Discussion

 A hmaiA analvsis using an error matrix provides limited information about thematic accuracy of a LC
 map  In fact, .in overall accuracy of 14.8% for the map was dismal and discouraged use of the map for
 anv application.  However, this was not unexpected given the preliminar> nature of the map. high number
     . n i\pcs. small rcleii-nce data sample si/e (//) compared to number of cover types, and lack of
 documentation ol the draham vegetation t\pes.  In fact, a binan anaKsis is conservatively biased against
 a classification ssstcm that is [HHtrl) defined and numerous in classes (Verb)la and Hammond,  1995).
 I he lack ol descriptions m the ciraham classification s\stem made labeling the cover type of each
 iclciciuc point difficult.  In addition, division of the cov er types of Arizona into 105 classes made
 distinguishing between tvpes problematic.  I herefore, a binary anahsis likelv assigned a wrong ansxver to
 locations with paitiallv  correct I  (  classification.
Page 238 of 339

-------
A fuzzy set analysis provided more information about the agreement between the reference data and the
map, and was less biased against a small sample size compared to number of cover types. The M
statistics were disturbing, but less so when the R statistics were considered. The R function indicated that
many cover types were more accurately classified to the life form level.  Yet, even for this statistic,
accuracies did not achieve the target 80% in most instances. This added information allowed the user and
producer to judge the value of the LC map for different applications. For example, for certain cover
types, the map performed adequately to the life form level, and could be used in applications where this
determination is all that is required. Fuzzy set theory was particularly appropriate for LC classification
systems that must be discrete but represent a continuum.

Adding the spatial location of accuracy to the accuracy ranks contributed additional accuracy information
to the LC map. Thematic map accuracy may vary spatially across a landscape in a manner partially or
totally unrelated to cover type.  In other words, a cover type may be misclassified more often when it
occurs in certain contexts, such as on  steep slopes.  Also, cover types that were located near ground
control data used in the map development tended to be more correct than remote areas for which only
imagery was used to develop the map.

The fuzzy spatial view of accuracy built upon the information produced by the fuzzy set analysis, and
created a map of accuracy of the preliminary AZ-GAP LC map. Not only was accuracy displayed as it
varied across the northern Arizona landscape, but the degree of accuracy was conveyed by accuracy
ranks.  Overall, the fuzzy spatial view of accuracy indicated that the LC map was accurate to the life form
level, with locations of higher and lower accuracy.  The histogram of accuracy ranks for northern Arizona
indicated that the interpolated accuracy was 85% at the life form level for all cover types. However,
where classification required identification of the dominant and, in some cases associate, species,
accuracy remained low (8.0%).

The fuzzy spatial view of accuracy facilitated the identification of areas with low accuracy that  needed
focused attention to refine the map and allowed users to assess the accuracy of the map for their specific
area of interest.
 14.7   Summary

 Using the same reference data and LC map, three methods of thematic accuracy assessments were
 conducted.  First, a traditional thematic accuracy assessment using a binary rule (right/wrong) was used to
 compare mapped and reference data. Results were summarized in an error matrix and presented in tabular
 form by thematic class.  Second, a fuzzy set assessment was used to rank and express the degree of
 agreement between the mapped and reference data. This allowed for the expression of accuracy to reflect
 the fuzzy nature of the classes. Results were also displayed in tabular form by class, but included several
 estimates of accuracy based on the degree of agreement defined.  Lastly, a spatial analysis using the
 accuracy rank of the reference data was interpolated across the study area and displayed in map form.
 Fuzzy set theory and spatial visualization help portray the accuracy of the LC  map more effectively, to the
 user than a traditional binary  accuracy assessment. The approach provided a substantially greater level of
 information about map accuracy, which allows the map users to thoroughly evaluate its utility for specific
 project applications.
                                                                                     Page 239 of 339

-------
14.8  References

Brown, D.E., C.H. Lowe, and C.P. Pase. A digitized classification system for the biotic communities of
    North America, with community (series) and association examples for the Southwest. Journal of the
    Arizona-Nevada Academy of Science, 14(1), 1-16, 1979.

Canters, F. Evaluating uncertainty of area estimates derived from fuzzy land-cover classification.
    Photogrammetric Engineering and Remote Sensing, 63(4), 403-414, 1997.

Cochran, W.G. Sampling Techniques,  3rd Edition.  Wiley & Sons, New York, 1977.

Congalton, R.G.  Accuracy assessment: A critical component of land cover mapping.  In Gap Analysis:
    A Landscape Approach to Biodiversity Planning (J.M. Scott, T.H. Tear, and F.W. Davis, Editors).
    American Society for Photogrammetry and Remote Sensing, Bethesda, MD, 1996, pp. 119-131.

Congalton, R. Using spatial autocorrelation analysis to explore the errors in maps generated from
    remotely sensed data. Photogrammetric Engineering and Remote Sensing, 54(5), 587-592, 1988.

Congalton, R.G. and K.. Green. Assessing the Accuracy of Remotely Sensed Data:  Principles and
    Practices.  Lewis Publishers, Boca Raton, FL, 1999.

Cressie, N. Statistics for Spatial Data, Revised Edition. Wiley & Sons, New York, 1993.

Cressie, N. and D.M. Hawkins.  Robust estimation of the variogram:  I. Mathematical Geology, 12, 115-
    125,  1980.

Edwards, T.C., Jr., G.G. Moisen, and D.R. Cutler. Assessing map accuracy in a remotely sensed,
    ecoregion-scale cover map.  Remote Sens. Environ., 63, 73-83, 1998.

Fisher, P.P.  Visualization of the reliability in classified remotely sensed images. Photogrammetric
    Engineering and Remote Sensing,  60(7), 905-910, 1994.

Goovaerts, P. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York,
    1997.

Gopal, S. and C. Woodcock. Theory and methods for accuracy assessment of thematic maps using fuzzy
    sets.  Photogrammetric Engineering and Remote Sensing, 60(2), 181-88, 1994.

Graham, L.A. Preliminary  Arizona Gap Analysis Program Land Cover Map. University of Arizona,
    Tucson, 1995.

Lunetta, R.S., R.G. Congalton, L.F. Fenstermaker, J.R. Jensen, K.C. McGwire, and L.R. Tinney.  Remote
    sensing and geographic information system data integration: error sources and research issues.
    Photogrammetric Engineering and Remote Sensing, 57(6), 677-687, 1991.

Royle, A.G.  Why geostatistics?  In:  Geostatistics (A.G. Royle, I. Clark, P.I. Brooker, H. Parker,
    A. Journel, J.M. Endu,  R. Sandefur, D.C. Grant, and P. Mousset-Jones, Editors),  McGraw Hill, New
    York, 1980.

Steele, B.M., J.C. Winne, and R.L. Redmond.  Estimation and mapping of local misclassification
    probabilities for thematic land cover maps.  Remote Sens. Environ., 66(2), 192-202, 1998.
Page 240 of 339

-------
Verbyla, D.L. and T.O. Hammond. Conservative bias in classification accuracy assessment due to
   pixel-by-pixel comparison of classified images with reference grids. Int. J. Remote Sensing, 16(3),
   581-587, 1995.

Zadeh, L.A.  Fuzzy sets.  Information and Control, 8, 338-353, 1965.
                                                                                   Page 241 of 339

-------
Page 242 of 339

-------
                                  Appendix A



          Arizona  Gap Analysis Classification System
Formation
Land-Cover Class
Tundra
Rocky Mountain Lichen-Moss
Forest
Engelmann Spruce-Mixed Conifer
Forest
Rocky Mountain Bristlecone-Limber Pine
                         Douglas Fir-Mixed Conifer
                         Arizona Cypress
Forest
—	

Forest
                         Ponderosa Pine
Ponderosa Pine-Mixed Conifer
Forest
Ponderosa Pine-Gambel Oak-Juniper/Pinyon-Juniper Complex
Forest
Ponderosa Pine/Pinyon-Juniper
Forest
•   	

Forest
Ponderosa Pine-Aspen
Ponderosa Pine-Mixed Oak-Juniper
 Forest
 —•	

 Forest
 _	

 Woodland
Douglas Fir-Mixed Conifer (Madrean)
Ponderosa Pine (Madrean)
Pinyon-Juniper-Shrub/Ponderosa Pine-Gambel Oak-Juniper
 Woodland
 Pinyon-Juniper/Sagebrush/Mixed Grass-Scrub
 Woodland^
 Pinyon-Juniper-Shrub Live Oak-Mixed Shrub
 Woodland
 Pinyon-Juniper (Mixed)/Mixed Chaparral-Scrub
 Woodland
 Pinyon-Juniper-Mixed Shrub
 Woodland

 Woodland
 	•	
 Woodland

 Woodland

 Woodland
 Pinyon-Juniper-Mixed Grass Scrub
 Pinyon-Juniper (Mixed)
 Encinal Mixed Oak
 Encinal Mixed Oak-Pinyon-Juniper
 Encinal Mixed Oak-Mexican Pine-Juniper
 Woodland

 Woodland

 Woodland
 Encinal Mixed Oak-Mexican Mixed Pine
 Encinal Mixed Oak-Mesquite
 Encinal Mixed Oak/Mixed Chaparral/Semidesert Grassland-Mixed Scrub
 Woodland

 Chaparral

 Chaparral
 Great Basin Juniper
 Interior Chaparral-Shrub Live Oak-Pointleaf Manzanita
 Interior Chaparral-Mixed Evergreen Sclerophyll
                          Interior Chaparral (Mixed)/Son. Paloverde-Mixed Cacti
                          Interior Chaparral (Mixed)/Mixed Grass-Scrub Complex
 Grassland
 Rocky Mountain/Great Basin Dry Meadow
 Grassland
 Madrean Dry Meadow
 Grassland
 Great Basin Mixed Grass
 Grassland
 Great Basin Mixed Grass-Mixed Scrub
 Grassland
 Great Basin Mixed Grass-Sagebrush
 Grassland
 Great Basin Mixed Grass-Saltbush
                                                                               Page 243 of 339

-------
     Appendix A, Continued
Formation
Grassland
Grassland
Grassland
Grassland
Grassland
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Desert Scrub
Riparian Forest/Woodland
Riparian Forest/Woodland
Land-Cover Class
Great Basin Mixed Grass-Mormon Tea
Semidesert Tobosa Grass-Scrub
Semidesert Mixed Grass- Yucca-Agave
Semidesert Mixed Grass-Mesquite
Semidesert Mixed Grass-Mixed Scrub
Great Basin Sagebrush
Great Basin Big Sagebrush-Juniper-Pinyon
Great Basin Sagebrush-Mixed Grass-Mixed Scrub
Great Basin Shadscale-Mixed Grass-Mixed Scrub
Great Basin Greasewood Scrub
Great Basin Saltbush Scrub
Great Basin Blackbrush-Mixed Scrub
Great Basin Mormon Tea-Mixed Scrub
Great Basin Winterfat-Mixed Scrub
Great Basin Mixed Scrub
Great Basin Mormon Tea/Pinyon-Juniper
Mohave Creosotebush Scrub
Mohave Creosotebush-Bursage-Mixed Scrub
Mohave Creosotebush-Yucca spp. (incl. Joshua tree)
Mohave Blackbrush-Mixed Scrub
Mohave Blackbrush- Yucca spp. (incl. Joshua tree)
Mohave Saltbush-Mixed Scrub
Mohave Brittlebush-Creosotebush Scrub
Mohave Creosotebush-Brittlebush/Mohave Globemallow Scrub
Mohave Catclaw Acacia-Mixed Scrub
Mohave Joshuatree
Mohave Mixed Scrub
Chihuahuan Creosotebush-Tarbush Scrub
Chihuahuan Mesquite Shrub Hummock
Chihuahuan Whitethorn Scrub
Chihuahuan Mixed Scrub
Sonoran Creosotebush Scrub
Sonoran Creosotebush-Bursage Scrub
Sonoran Creosotebush-Mesquite Scrub
Sonoran Creosotebush-Bursage-Paloverde-Mixed Cacti (wash)
Sonoran Brittlebush-Mixed Scrub
Sonoran Saltbush-Creosote Bursage Scrub
Sonoran Paloverde-Mixed Cacti-Mixed Scrub
Sonoran Paloverde-Mixed Cacti/Sonoran Creosote-Bursage
Sonoran Paloverde-Mixed Cacti/Semidesert Grassland-Mixed Scrub
Sonoran Crucifixion Thorn
Sonoran Smoketree
Sonoran Catclaw Acacia
Great Basin Riparian/Cottonwood-Willow Forest
Interior Riparian/Cottonwood-Willow Forest
Page 244 of 339

-------
Appendix A, Continued
Formation
Riparian Forest/Woodland
Riparian Forest/Woodland
Riparian Forest/Woodland
Riparian Forest/Woodland
Riparian Forest/Woodland
Riparian Forest/Woodland
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Riparian Scrub
Water
Developed
Developed
Developed
Developed
Land-Cover Class
Interior Riparian/Mixed Broadleaf Forest
Interior Riparian/Mesquite Forest
Sonoran Riparian/Cottonwood-Willow Forest
Sonoran Riparian/Cottonwood-Mesquite Forest
Sonoran Riparian/Mixed Broadleaf Forest
Sonoran Riparian/Mesquite Forest
Madrean Riparian/Wet Meadow
Playa/Semi-Permanent Water
Great Basin Riparian Forest/Mixed Riparian Scrub
Great Basin Riparian/Sacaton Grass Scrub
Great Basin Riparian/Reed-Cattail Marsh
Great Basin Riparian/Wet Mountain Meadow
Interior Riparian/Mixed Riparian Scrub
Sonoran Riparian/Leguminous Short-Tree Forest/Scrub
Sonoran Riparian/Mixed Riparian Scrub
Sonoran Riparian, Sacaton Grass Scrub
Sonoran Riparian/Low-lying Riparian Scrub
Sonoran/Chihuahuan Riparian/Reed-Cattail Marsh
Riparian/Flood-damaged 1993
Water
Agriculture
Urban
Industrial
Mixed Agriculture/Urban/lndustrial
                                                                   Page 245 of 339

-------
Page 246 of 339

-------
                                      Chapter 15

   The Effects of Classification Accuracy on Landscape Indices

                                            by

                                      Guofan Shao*
                                      Wenchun Wu

                       Department of Forestry and Natural Resources
                                   1159 Forestry Building
                                     Purdue University
                                 West Lafayette, IN 47907

                              *Corresponding Author Contact:

                               Telephone: (765) 494-3630
                                Facsimile: (765) 496-2422
                                   E-mail: shao@purdue.edu
15.1   Introduction

Remote sensing technology has advanced markedly during the past decades. Accordingly, remote sensor
data formats have evolved from image (pre-1970s) to digital formats subsequent to the launch of
Landsat 1 (1972), resulting in a proliferation of derivative map products. The accuracy of derived data
products has become an integral analysis step essential to evaluate appropriate applications (Congalton
and Green, 1999). During the past three decades, accuracy assessment has become widely applied and
accepted. Although methodologies have improved, little attention has been given to the effects of
classification accuracy on the development of landscape metrics or indices.

Thematic maps derived from image classification are not always the final product from the user's
perspective (Stehman and Czaplewski, 1998). Because all image processing or classification inevitably
introduces errors into the resultant thematic maps, any subsequent quantitative analyses will reflect.these
errors (Lunetta et al.,  1991). Landscape metrics are commonly derived from remote sensing derived LC
maps (O'Neill et al., 1988; McGarigal and Marks, 1994; Frohn, 1998). Metrics are commonly used to
compare landscape configurations through time or across space, or as independent variables in modeling
linking spatial pattern and process (Gustafson, 1998). Therefore, conclusions drawn directly or indirectly
from analyzing landscape metrics contain uncertainties.  The relationships between the accuracy of LC
maps and specific derived landscape metrics are quite variable (i.e., metric dependent), which complicates
assessment efforts (Hess, 1994; Shao et al, 2001).
                                                                              Page 247 of 339

-------
A major obstacle to assessing the accuracy of LC maps is the high cost of generating reference data or
multiple thematic maps for subsequent comparative analysis. Commonly employed solutions include
(1) selecting sub-sectional maps from a region (Riitters et al., 1995), (2) sub-dividing regional maps into
smaller maps (Cain et al., 1997), or (3) creating multiple maps using computer simulations (Wickham et
al.,  1997; Yuan 1997).  Maps created using the first or second method are spatially incompatible or
incomparable,  while maps created using the third method contain errors that do not necessarily represent
those found in  actual LC maps. Therefore, it is necessary to create multiple maps for a specific
geographic area using different analysts or different classification methods (Shao et al., 2001). The
approach presented here represents an actual image data analysis and, therefore, conclusions drawn from
the analysis should be broadly applicable.

Past studies have focused on only a few indices. Hess and Bay (1997) made a breakthrough in
quantifying the uncertainties of adjusted diversity indices.  Various statistical models have also been
developed to assess the accuracy of total area (%LAND) for individual cover types (Bauer et al., 1878;
Card, 1982; Hay, 1988; Czaplewski, 1992; Dymond, 1992; Woodcock, 1996). However, few have used
modeling to perform area calibrations (Congalton and Green, 1999). Shao et al. (2003) derived the
Relative Area Error (REA) index, which has causal relationships with area estimates of LC categories.
This study employed multiple  classifications and reference maps to demonstrate how classification
accuracy affects landscape metrics. Here the overall accuracy and REA were compared and a simple
method was demonstrated to revise %LAND values using corresponding REA index values.
15.2  Methods

Multiple thematic maps were derived from sub-scenes of Landsat Thematic Mapper (TM) data for two
sites (A, B) located in central Indiana and the temperate forest zone in eastern Eurasian continent (at the
border of China and North Korean). LC mapping was performed to approximate a Level I classification
product (Anderson et al., 1976). Site A thematic maps included the following classes: (1) agriculture
(including grassland); (2) forest (including shrubs); (3) urban; and (4) water.  The second site included
only forest and non-forest (clear cuts and other open areas) cover types. A total of 23 independent
thematic maps were developed for site A. Analysts (n=23) were allowed to use any method to classify
the TM imagery acquired on October 5, 1992. LC maps were evaluated based on the overall accuracy.
All the accuracies were comparable because all assessments were performed using the same reference
data set.  Students performed the image analysis, thus representing work performed by non-professionals
(Shao etal., 2001).

Eighteen thematic maps were created for site B using a single TM data set acquired on September 4, 1993
and a stack data set combining the 1993 data with another TM data acquired on September 21, 1987.
Training samples were acquired using three methods including (1) computer image interpretation,
(2) field observations, and  (3) a combination of the two above. Three classification algorithms were used,
including (1) the minimum distance (MD), (2) maximum likelihood (ML), and (3) extraction and
classification of homogeneous objects (ECHO).  Our goal was to make the classification process
repeatable and, therefore, represent a professional work process (Wu and Shao, 2002). Two additional
maps with 94.0% and 94.5% of overall accuracy, which were created with alternative approaches, were
also incorporated into this  study.  The overall accuracy of these maps ranged from 82.6% to 94.5% (Wu
and Shao, 2002).  More importantly, a reference map was manually digitized for site B.  The errors of
landscape metrics of each map were computed as:


                                                                                             (1)
.Page 248 of 339

-------
 where, E^^ = relative errors (in percent) of a given landscape index for a given thematic map; /map =
 landscape index value derived from a thematic map; 7ref = landscape index value derived from a reference
 map.

 Thematic maps were assigned into three accuracy groups based on the overall accuracy maps at site A
 (n=23).  Landscape metrics were computed for each map with the FRAGSTATS for site A (McGarigal
 and Marks, 1994) and with patch analyst (PA) for site B (Elkie et al., 1999). Nine landscape indices were
 used for site A.  They were largest patch index (LPI), patch density (PD), mean patch size (MPS), edge
 density (ED), area-weighted mean shape index (AWMSI), mean nearest neighbor distance (MNN),
 Shannon's diversity index (SHDI), Simpson's diversity index (SDI), and contagion index (CONTAG).
 Thirteen landscape  indices were used for site B.  They were PD, MPS, patch size coefficient of variance
 (PSCOV), patch site standard deviation (PSSD), ED, mean shape index (MSI), AWMSI, mean patch
 fractal dimension (MPFD), area-weighted mean patch fractal dimension (AWMPFD), MNN, mean
 proximity index (MPI), SDI and %LAND. These landscape indices had broad representation within the
 different cover categories (McGarigal and Marks, 1994).

 15.2.1  Relative Errors of Area (REA)

 If a thematic map contains n classes or types, its accuracy can be assessed with an error matrix
 (Table 15-1).


       Table 15-1. A general presentation of an error matrix adapted from Congalton and
                   Green (1999).
Classified
Cover Type
1

/

n
Total:

1
'11

fn

f*
f«
Reference Data
...






J
fv

f*

fnj
f+i
...






n
'in

f,n

L
f.n
Total
'i*

f»

k
N
                        n = the total number of land cover types;
                        N = the total number of sampling points;
       f,j (i and/ = 1, 2...., n) = the joint frequency of observations assigned to type / by
                            classification and to type/ by reference data;
                        k = the total frequency of type / as derived from the classification; and
                        f*j = the total frequency of type/as derived from the reference data.

For a given patch type k(\  ± k 
-------
The classification value of %LAND (LCJ is derived as:
                                               LCk =
                                                       f    *-}" "J   j=\                      {3)
                                                      Jk+   >=i      ;**
                                                       N     N          N


Thus, the difference between LCk and LRk is:


                                                                    n        n

                                                                   / . Jkj ~ / , Jik

                                              _ J=\	i=l     _ jfk
                                     ,  _ f    ^ J kj   J_, J ,k   J=i      ,=|                  (4)
                            - LR  =          = —
                                "       N             N               N

If LCk - LRk = 0, there are two possibilities:  classification errors are zero, or commission errors (CE) and
omission errors (OE) are the same for patch type k.  The first possibility is normally untrue in reality. In
many situations, the second possibility is untrue also. If CEk > OE& LCk - LRk > 0, the value of %LAND
of type k is overestimated; if CEk < OEk, LCk - LRk < 0, the value of %LAND of type k is underestimated.
Therefore, the components of CEk and OEk in  Eq. 4 determine the accuracy of %LAND for patch type k.

Mathematically, CEk is just as follows:


                                                                     CEk =



OEk is just expressed as:
                                                                           /=!
                                                                           ifk
The balance between CEk and OEk indicates the absolute errors of area estimate for patch type k.  The
relative errors of area (REA) are then defined as:
                                                                   fk
                                                                      1=1                       (7)
                                                                      — - xlOO
                                                                    kk
where, f^ is an element of the Ath row and Ath column in a error matrix. It represents the frequency of
sample points that are correctly classified.
Page 250 of 339

-------
According to Congalton and Green (1999), User's Accuracy of type k (UAk) can be expressed as:


                                              UAk=-
Jkk    Jkk         Jkk
                                                    fk+    I/,   /*+!/„             <8>
                                                           y-i           y-i
and Producer's Accuracy of type k (PAk) can be expressed as:
                                                 '"/,
       n
  \-k
          ' ik
       i=l            i=l
By substituting Equations 8 and 9 into 7, it is easily derived that:
                                                    REAk=\—	—1x100            (10)
                                                             UAk   PAt\
Thus, REA can be obtained using information on the error matrix or the user and producer's accuracy.

Under the assumption that the distribution of errors in the error matrix is representative of the types
misclassification made in the entire area classified, it is easy to calibrate area estimates with REA or UA
and PA as follows:


                                  —        =ADck-£*-*{—	—1x100
                                                 Pc,k    N  I              i
                                                           \   K      K /
where, Act = calibrated area in percent for a given land cover type k, and A^ = pre-calibrated area in
percent for a given land cover type k.
                                                                                Page 251 of 339

-------
15.3  Results

Figure 15-1 shows the means and standard deviations of nine landscape indices for three accuracy groups.
Except for PD and MRS, landscape indices had <10% differences in their means among three accuracy
groups. The standard deviations of the landscape indices in the lowest accuracy group are much higher
than those in the higher accuracy groups. The differences in standard deviations between the lowest
accuracy group and other two accuracy groups exceeded 100%, indicating that the uncertainties are
higher when classification accuracy was lower.
        Mean
Std. Dev.
Mean
Std. Dev.
Mean
Std. Dev.
30 -
25 --


s
0 - -






















1 2 -
10 -
6 -
2 -










—





123 123
Largest Patch Index
Mean 
-------
The statistics of classification accuracy, including the overall accuracy, producer's accuracy and user's
accuracy, all have differences of <20% among the three accuracy groups (see Figure 15-2a).  The
standard deviation values for overall accuracy are also about the same among the three accuracy groups
but those for producer's accuracy and user's accuracy are clearly different (see Figure 15-2b). Maps in
the lowest accuracy group have much higher variations in producer's accuracy and user's accuracy than
those in the other two accuracy groups.
        Landscape  Urban Agriculture  Forest    Water    Urban Agriculture  Forest    Water
   25
                                            (b)
         Overall
        Accuracy
Producer's
Accuracy
 User's
Accuracy
     \  \
       Landscape  Urban  Agriculture  Forest    Water   Urban  Agriculture  Forest   Water
    Figure 15-2.  The mean (a) and standard deviation (b) values for overall and individual classification
                accuracies.  LA = Lowest Accuracy; IA = Intermediate Accuracy; HA = Highest Accuracy.
                                                                                      Page 253 of 339

-------
For a few indices, such as MPDF, AWMPFD, and SDI at the landscape level, no matter what the
classification accuracy was, the errors of landscape indices were within a range of 10% (see Figure 15-3).
If classification accuracy was poor, the errors of some other landscape indices exceed 100%. They
include PD, PSCOV, ED, AWMSI, and MPI for entire landscapes or forest patches (see Figures 15-3 and
15-4).  Although no constant relationships were found between the overall accuracy and landscape
indices, maps with higher classification accuracy resulted in lower errors for the most of landscape
indices (see Figures 15-3 and 15-4). However, the overall accuracy did not have a good control on the
variations of landscape index errors, and therefore was not a reliable predictor for the errors of landscape
indices. This was particularly true when the overall accuracy is relatively low.
lann

i?nn
1000
Ann

Ann

?nn

-10
-40
_en



-70

-80
-90
-100
£
14f> -r
i?n .


100 •
80 •
60 •
40
on .

8
0_
-in

-?n

-'VI

-sn
* PD
•

$ «,
*****
• •

*" X A ++

t
32 84 86 88 90 92 94 9
PSSD ,
•

• *
+

* * * ^ *
• * * *
** *
(2 84 86 88 90 92 94 9
*AWMSI
* *
*
•
* * .
*.*
*t* %
^*
*• •
2 84 86 88 90 92 94 9
MSIN *

* * *.
* * * *
* * *
* ".«*


-fin



-80

-Qn

-inn
6
1*1(1

120

90



30

6 8
•)

-1 . -

-4 -



-6 --
7
6 82
°nn
1 ^n

mn

 • '«•'-»•
*

2 84 86 88 90 92 94 91
AWMFTD
™

^
• *
•
* *
« **
^
* * 4*

! 84 86 88 90 92 94 9(
SO
* •» * * *

* *


•

    82  84  86  88  90  92  94  96    82  84  86  88  90  92  94  96     82  84  86  88  90   92  94  96

Figure 15-3. The relative errors of selected 12 landscape indices for the landscape (y-axis) against the
            overall accuracy (x-axis).
Page 254 of 339

-------
1UUU
ftnn
• CAA
Ann
Oftn

(

1UO
Crt



-50
mn
e
cnn
4on-
300 •

9nn

100 •

8

ZQ
in



-10
_9n
^* PD

* • *
•
* * * **•
32 84 86 88 90 92 94 9
PSSD
• * * %
* +

.f.


»» •
2 84 86 88 90 92 94 9
AWMSI
» # •
» » » »
« *
•

* *«*
2 84 86 88 90 92 94 9
IVMN »
* * * %

+ ^
f • **

%••

u
on
_^n
-fin
on
mn
6 f
•ten


on

fin

30

6 8

-2 --


-6--

-8 . -
6 82
mnn
800
firm

Af)f\

9nn

0
-?nn
IVPS
* /
A *
* ^
•
•
* 4 *
52 84 86 88 90 92 94 9
.* K


A «
** • »
»
*!***.
* ^*»
284868890 92 949
NFFD
•
» «• 4» •
^^
> ** 4»
^

84 86 88 90 92 94 9
• IVFI
• * *
* *
% «

• «
* ***
	 »* 	 * 	 — —
3UU
4no
?nn
?nn
mn

6 t
•1C
-20 -
-9«i -

-V\ -

-T5 -

-40 -
-41 -
6 8
9fl

15 -
in .

5.

0 •
6 8,
^n
OA

m



-10
-20
• ^PSCOV
•
*
* ••*.••.
" t
«..•#
12 84 86 88 90 92 94 9e
MSI • •
»*
9
* ** * *
* * * *
• A A A *

* »

2 84 86 88 90 92 94 9f
AWMFFD
•
^ A A A A
* 4» % $ \
• * *
• * *
*
4r

2 84 86 88 90 92 94 9f
« %LAND

g »
^ •
A ^ A ^A

•
**
    82   84   86   88   90   92   94   96     82  84  86  88  90  92  94  96    82  84  86  88  90  92  94   96

Figure 15-4. The relative errors of selected 12 landscape indices for forest class (y-axls) against the overall
            accuracy (x-axis).
                                                                                          Page 255 of 339

-------
The errors of %LAND have a perfect linear relationship with REA (R2 = 0.98), but the errors of all other
landscape indices did not show simple relationships with REA (see Figure 15-5). The REA seems still to
have a better control on landscape indices errors than the overall accuracy did as the variations of
landscape index errors corresponding to REA were smaller than those corresponding to the overall
accuracy (see Figures 15-4 and 15-5).  Also, the lowest errors of landscape indices normally occurred
when REA reaches to zero (see Figure 15-5). Both the overall accuracy and REA were not reliable
indicators for explaining variations of spatially sophisticated landscape indices, such as  MNN and MPI.
800 •
finn .
400 •
200 •
0
\ PD

'
*
< *•/» ' *
-20 •
Af\
-fin -
-80 •
-inn -
MRS


^ *
* *
400
300
200
100
n
f PSCOV
* ^
* /. '*
* »*
*
    -20.00 -10.00  0.00  10.00  20.00
                                      -20.00 -10.00  0.00   10.00  20.00
                                                                        -20.00 -10.00  0.00  10.00  20.00

50 •
0 •
-50 •
mn .
PSSD
^ A ^ *
•«** % •

* \
% *
120
90
60
30
n
• B3

* »*
4 * *
~ *****
-20

-30 •


-40 •
-A* .
„ MSI
* •
* * *
»
• *

AWMSI

*****
»
* *
-2 -
-4 -
-fi -
-8 -
MFFD
4* 4» » »
» **» »
**
  -20.00 -10.00   0.00  10.00  20.00

500


300


100

  0
  -20.00 -10.00  0.00   10.00  20.00

 30

 20

 10

  0

-10

-20
  -20.00 -10.00  0.00   10.00  20.00
                                     -20.00 -10.00  0.00   10.00   20.00
 -20.00 -10.00  0.00   10.00  20.00

20

15
                                                                                    AVWFFD
                                                                                         -«-
                                    -20.00  -10.00  0.00   10.00  20.00
 -20.00  -10.00   0.00  10.00  20.00
* NNN
• *^»
* * * +
^ «»

800
finn .

200 •

0 •
-?nn .
WR «

»•*
^
J^
*
20
10
0
-10
%LAND «

***
J
^
                                      -20.00 -10.00  0.00  10.00  20.00
                                                                        -20.00 -10.00  0.00  10.00  20.00
Figure 15-5.  The relative errors of selected 12 landscape indices for forest class (y-axis) against the REA
             (x-axis).
Page 256 of 339

-------
 The relative errors of %LAND for the forest from the 20 maps range from 12% to 25% before calibration
 (Figure 15-6a). Based on Equation 11, the values of %LAND for the forest were calibrated and resulting
 errors of %LAND for the forest were between 2% to 5% (Figure 15-6b), which were much lower than the
 errors before calibration.
                     30
                     20
                     10
                    -10
                    -20
                    Figure 15-6. A comparison of %LAND errors for forest class
                               among thematic maps (n=20) before calibrations (a)
                               and after calibrations (b).
        Discussion

jyjethods used for image classified determine thematic maps classification content and quality. Although
there are different statistics that are used for assessing the accuracy of image data classifications, most are
Derived directly or indirectly from error matrices. Indices of thematic map accuracy indicate how well
image data are classified but do not tell how thematic maps correspond with landscapes structure and
function. This was partly because there is no effective approach to quantify classification errors that have
causal relationships with landscape function. The overall accuracy is the most frequently used accuracy
statistics but it was among statistics that have limited controls on the errors of landscape indices. In
                                                                                    Page 257 of 339

-------
 practice, greater overall accuracy resulted in more controllable errors associated with landscape indices.
 Only an unrealistic 100% accurate map represents perfect source data for computing indices.  For
 example, the overall accuracy of LC and LU map derived from TM data for the eastern United States was
 81% for Anderson Level I (i.e., water, urban, barren land, forest, agricultural  land, wetland, rangeland),
 and was 60% for Anderson Level II (Vogelmann et al., 2001). Such classification accuracies are not high
 enough for assuring reliable landscape  index calculations.

 The overall accuracy did not have a causal control on the variability of landscape  index accuracies. When
 overall accuracy is relatively low, it also lost a control on the difference between user's and producer's
 accuracies. It also appeared that the uncertainties of landscape indices were more sensitive to the
 variations  in user and producer accuracies, than to overall accuracy values along.  RLA values reflected
 the differences between user's and producer's accuracies, and therefore, had a better control on the errors
 of landscape indices than the overall accuracy, particularly when overall accuracy was relatively low.

 Because REA is derived for assessing the accuracy of %LAND, this index alone can be used to predict
 the errors of %LAND. The linear relationship with REA and  the area of forested  land verifies the
 reliability of such predictions with  REA.  While the overall accuracy is approximately the average of
 user's and  producer's accuracy, REA reveals the differences between user's and producer's accuracy.
 Therefore, the overall accuracy and REA explained different aspects of classification accuracy. Although
 the lowest  errors of landscape indices often occur when REA is near zero, variations in the errors of
 landscape  indices still existed. When REA and the overall accuracy were used together, the errors of
 landscape  indices were better predicted (greater overall accuracy the smaller REA). However, the overall
 accuracy and REA explain some aspects of classification errors, but did not explain other possible sources
 of classification errors (e.g., the spatial  distributions of misclassifications).  Therefore, these accuracy
 measures alone were not adequate to assess the accuracy of the MNN and MPI, which have particularly
 strong spatial features.

 The variations of landscape index errors were different among different landscape indices. For example
 the errors of MPDF, AWMPFD, and SDI at the landscape level were within a range of 10% whereas the
 errors of PD, PSCOV, ED, AWMSI, and MPI at for entire landscapes or forest patches exceed 100%.
 The former group of landscape indices was not as sensitive to  image data classification and the errors of
 these landscape indices were not controlled by classification accuracy measures. Landscape indices in
 this group  were unreliable despite the image classification accuracy values. The later group of landscape
 indices was sensitive to image data classifications, and therefore a small difference in classification
 accuracy resulted in a large difference in landscape index values. In this case, higher classification
 accuracy always performed superior when accuracy-sensitive landscape indices were used. Intermediate
 indices exhibited intermediate sensitivity to image data classifications.  The rule of higher overall
 accuracy and smaller absolute values of REA were particularly effective to this intermediate group.
 Further systematic studies are needed to determine which landscape index belongs to these sensitive
 groups.


 15.5  Conclusions

 The uncertainties or errors associated with landscape indices vary in their responses to image data
 classifications.  Also, the existing statistics methods for assessing classification accuracy have different
 controls relative to the uncertainties or errors of landscape indices.  Assessing accuracy of landscape
 indices requires combination knowledge of the overall accuracy (means of user's accuracy and producer's
 accuracy) and the REA (differences between user's accuracy and producer's accuracy). To reliability
Page 258 of 339

-------
characterize landscape condition using landscape indices, our results indicate the necessity to use maps
with high overall accuracy and low absolute REA.  The selections of landscape indices are also important
because different landscape indices have different sensitivities to image data classifications.  Based on
commonly achievable levels of classification accuracy, the magnitudes of errors associated with
landscape indices can be higher than the values of landscape indices.  Comparisons between different
thematic maps should consider these errors. Assuming that the distribution of errors identified by the
error matrix, are representative of the misclassifications across the area of interest, the total land area of
different class categories can be revised with REA and the errors of this landscape index can be lowered.
Revised values of %LAND should be used when quantifying landscape conditions.


15.6   Summary

A total of 43 LC maps from two study sites were used to demonstrate the effects of classification
accuracy on the uncertainties or errors of selected I 5 landscape indices.  The measures of classification
accuracy used in this study were the overall accuracy and REA.  The REA was defined as the difference
between the reciprocals of user's accuracy and producer's accuracy. Under variable levels of
classification accuracy, different landscape indices had different uncertainties or errors. These variations
or errors were explained by both the overall accuracy and REA. Thematic maps with relatively high
overall accuracy and low absolute REA, assured lower uncertainties or errors of at least several landscape
indices.  For landscape indices that were sensitive to classification accuracy, a small increase in
classification accuracy resulted in a large increase in their accuracy. Assuming that the error matrix truly
represents misclassification errors, the total areas of different class categories can  be calibrated using the
      index and the accuracy of quantifying or comparing relative landscape characteristics increased.
 15.7  Acknowledgments

 Thematic LC maps used in this sludy were partially provided by 23 students from a remote sensing class
  ffered at Purdue University in 1 999.  The Cooperative Ecological Research Program in cooperation with
 the China Academy of Sciences and the German Department of Science and Technology provided the
 TM data used in this study.  The authors would like to thank the book editors and anonymous reviewers
 for their insightful comments and suggestions on the manuscript.


 \ 5.8  References

 Anderson, J.R., E.E. I lardy, J.T. Toach, and R.E. Witmer. A land use and land cover classification
     system for use with remote sensor data.  U.S. Geological Survey Professional Paper 964, U.S.
     Government Printing Office, Washington, D.C. 28pp., 1 976.

 Bauer, ME., M.M. Hixson, B.J. Davis, and J.B. Etheridgs. Area  estimation of crops by digital analysis of
     Landsat data. Photogrammetrie Engineering and Remote Sensing, 44, 1 033- 1 043, 1 978.

 Card, D.H.  Using map categorical marginal frequencies to improve estimates of thematic map accuracy.
     photogrammetric Engineering and Remote Sensing, 48, 43 I -439, 1 982.

 Cain  D.H., K.. Riitters, and K. Orvis.  A multi-scale analysis of landscape statistics.  Landscape Ecology,
     12, 199-212, 1997.
                                                                                     Page 259 of 339

-------
Congalton, R.G. and K. Green. Assessing the Accuracy of Remotely Sensed Data: Principles and
    Practices.  New York, Lewis Publishers,  1999.

Czaplewski, R.L. Missclassification bias in areal estimates. Photogrammetric Engineering and Remote
    Sensing, 58, 189-192, 1992.

Dymond, J.R.  How accurately do image classifier estimate area? InternationalJournal of Remote
    Sensing, 13, 1735-1342, 1992.

Elkie, P., R. Rempel, and A. Carr. 1999. Patch Analyst User's Manual. Ont. Min. Natur. Resour.
    Northwest Sci.  & Technol. Thunder Bay, Ont. Tm-002.  16 p.

Frohn, R.C. Remote Sensing for Landscape Ecology: New metric indicators for monitoring, modeling,
    and assessment of ecosystems.  Boca Raton, Lewis Publications, 1998.

Gustafson, E.J. Quantifying landscape spatial pattern: What is the state of the art? Ecosystems, 1, 143-
    156, 1998.

Hay, A.M. The derivation of global estimates from a confusion matrix. Int. J. Remote Sensing, 9, 1395-
    1398, 1988.

Hess, G.R. Pattern  and error in landscape ecology: A commentary. Landscape Ecology, 9, 3-5,  1994.

Hess, G.R. and J.M. Bay. Generating confidence intervals for composition-based landscape indexes.
    Landscape Ecology, 12, 309-320, 1997.

Lunetta, R.S., R.G.  Congalton, L.F. Fenstermaker, J.R. Jensen, K.C. McGwire, and L.R. Tinney. Remote
    sensing and geographic information system data integration:  Error sources and research  issues.
    Photogrammetric Engineering and Remote Sensing, 57(6), 677-687, 1991.

McGarigal, K. and B.J. Marks. FRAGSTATS: Spatial patterns analysis program for quantifying
    landscape structure.  Unpublished software, USDA Forest Science, Oregon State University, 1994.

O'Neill, R.V., J.R. Krummel, R.H. Gardner, G. Sugihara, B. Jackson,  D.L. DeAngelis, B.T. Milne,
    M.G. Turner, B. Zygnut, S.W. Christensen, V.H. Dale, and R.L. Graham. Indices of landscape
    pattern. Landscape Ecology, 1, 152-162, 1988.

Riitters, K.H., R.V. O'Neill, C.T. Hunsaker, J.D. Wickham, D.H. Yankee, S.P. Timmins, K.B. Jones, and
    B.L.Jackson. A factor analysis of landscape pattern and structure metrics. Landscape Ecology, 10,
    23-39, 1995.

Shao, G., D. Liu, and G. Zhao. Relationships of image classification accuracy and variation of landscape
    statistics. Canadian Journal of Remote Sensing, 27, 33-43, 2001.

Shao, G., W. Wu, G. Wu, X. Zhou, and J. Wu. An explicit index for assessing the accuracy of cover class
    areas. Photogrammetric Engineering and Remote Sensing, 69(8), 907-913, 2003.

Stehman, S.V. and R.L. Czaplewski.  Design and analysis for thematic map accuracy assessment:
    Fundamental principles. Remote Sens.  Environ., 64, 331-344, 1998.

Vogelmann, J.E., M.H. Stephen, L. Yang, C.R. Clarson, B.K. Wylie, and N. Van Driel.  Completion of
    the  1990s national land cover data set for the conterminous United States from Landsat Thematic
    Mapper data and ancillary data sources. Photogrammetric Engineering and Remote Sensing, 67, 650-
    662,2001.
Page 260 of 339

-------
Wickham, J.D., R.V. O'Neill, K.H. Riitters, T.G. Wade, and K.B. Jones.  Sensitivity of selected landscape
   pattern metrics to landcover misclassification and differences in landcover composition.
   Photogrammetric Engineering and Remote Sensing, 63, 397-402, 1997.

Wookcock, C.E. On the roles and goals for map accuracy assessment: A remote sensing perspective.  In:
   Proceedings of 2nd International Symposium on Spatial Accuracy Assessment in Natural Resources
   and Environmental Sciences, Fort Collins, CO, 1996, USDA Forest Service Rocky Mountain Forest
   and Range Experiment Station, Technical Report RM-GTR-277, pp. 535-540, 1996.

Wu, W. and G. Shao. Optimal Combinations of Data, Classifiers, and Sampling Methods for Accurate
   Characterizations of Deforestation. Canadian Journal of Remote Sensing, 28(4), 601-609, 2002.

Yuan, D.  A simulation of three marginal area estimates for image classification. Photogrammetric
   Engineering and Remote Sensing, 63(4), 385-392, 1997.
                                                                                 Page 261 of 339

-------
Page 262 of 339

-------
                                     Chapter 16

         Assessing Uncertainty in Spatial Landscape Metrics
                    Derived from Remote Sensing Data

                                           by

                                    Daniel G. Brown1'
                                   Elisabeth A. Addink1
                                     Jiunn-Der Dun1
                                   Mark A. Bowersox2
  1
    School of Natural Resources and
    Environment
    University of Michigan
    Ann Arbor, Ml  48109-1115

    *Corresponding Author Contact:

     Telephone: (734) 763-5803
      Facsimile: (734)936-2195
         E-mail: danbrown@umich.edu
                                                2
Town of Pittsford
Spiegel Community Center
35 Lincoln Avenue
Pittsford, NY 14543
16.1   Introduction

Recent advances in the field of landscape ecology have included the development and application of
quantitative approaches to characterize landscape condition and processes based on landscape patterns
(Turner et al., 2001). Central to these approaches is the increasing availability of spatial data
characterizing landscape constituents and patterns, which are commonly derived using various remote
sensor data (i.e., aerial photography or multi-spectral imagery). Spatial pattern metrics provide
quantitative descriptions of the spatial composition and configurations of habitat or land-cover (LC) types
that can be applied to provide useful indicators of the habitat quality, ecosystem function, and the flow of
energy and materials within a landscape.  Landscape metrics have been used to compare ecological
quality across landscapes (Riitters et al., 1995), across scales (Frohn, 1997), and to track changes in
landscape pattern through time (Henebry and Goodin, 2002).  These comparisons can often provide
quantitative statements of the relative quality of landscapes with respect to some spatial pattern concept
(e.g., habitat fragmentation).

Uncertainty associated with landscape metrics has several components including (1) accuracy (how well
the calculated values match the actual values), (2) precision (how closely repeated measurements get to,
                                                                            Page 263 of 339

-------
 the same value), and (3) meaning (how comparisons between metric values should be interpreted).  In
 practical terms, accuracy, precision and the meaning of metric values are affected by several factors that
 include the definitions of categories on the landscape map, map accuracy, and validity and uniqueness of
 metric of interest.  Standard methods for assessing LC map accuracy provide useful information, but are
 inadequate as indicators of the spatial metric accuracy because they lack information concerning spatial
 patterns of uncertainty and the correspondence between the map category definitions and landscape
 concepts of interest. Further, direct estimation of the accuracy of landscape metric values is problematic.
 Unlike LC maps, standard procedures are currently not available to support landscape metric accuracy
 assessment. Also, the scale dependence of landscape metric values complicates comparisons between
 field observations and map-based calculations.

 As a transformation process, in which mapped landscape classes are transformed into landscape
 measurements describing the composition and configuration of that landscape, landscape metrics can be
 evaluated using precision and meaning diagnostics (see Figure 16-1). The primary objective is to acquire
 a metric with a known and relatively high degrees of accuracy and precision  that is interpretable with
 respect to the  landscape character!stic(s) of interest.  The research presented  in this chapter addresses the
 following issues: (1) precision  estimates associated with various landscape metrics derived from satellite
 images; (2) sensitivity of landscape metrics relative to differences in landscape class definitions; and
 (3) sensitivities of landscape metrics to landscape pattern concepts of interest (e.g., ecotone abruptness or
 forest fragmentation) versus potential confounding concepts (e.g., patchiness or amount of forest).
                       lnputMap   ^_    Landscape
                    ((Precision,         ,         A Precision)       =     Precisiono
                    /{Meaning,         ,         A Meaning)       =     Meaningo
                Figure 16-1.  Illustration of the issues affecting the quality and utility
                            affecting landscape pattern metric values derived from
                            landscape class maps. The precision and meaning of output
                            values from landscape metrics are functions of the precision
                            and meaning of the input landscape maps and the effect of
                            the metric transformation.


This chapter presents results from recent research that seeks to evaluate uncertainty in landscape metrics
as defined above. To calculate the precision of landscape metrics, repeated estimates of metric values are
used to observe the variation in the estimates.  Because measures of precision are based on multiple
calculations, they are more practical for landscape metric applications than are measures of accuracy.
Here we discuss two different approaches to performing multiple calculations of landscape metric values.
First, redundant mapping of landscapes was used to calculate the variation in metric values resulting from
the redundant maps.  Second, spatial simulation  was used to evaluate the response of landscape metric
values to repeated landscape mapping under a neutral model (Gustafson and Parker, 1992).

Following a general discussion of alternative types of landscape metrics, we compare past research and
our results to illustrate how landscape metric values vary using redundant mapping and  simulation
methods. First, the precision of estimates of change in metric values between two images was
investigated using redundant mapping of sample areas that were defined by the overlap  of adjacent
satellite scenes (Brown et al. 2000a;  Brown et al. 2000b). Next, the variations in  metrics were calculated
using landscape maps derived from the same remote sensing source, but classified using different
definitions and class. Comparisons illustrating the effects of alternative definitions of "forest," and the
Page 264 of 339

-------
application of LC versus land-use (LU) classes for calculating metrics are presented.  Finally, we evaluate
the use of simulation to investigate the interpretability of the construct being measured, the degree of
similarity between several landscape metrics and the concept of ecotone abruptness, and present
simulations to illustrate the problem of interpreting the degree of fragmentation from  landscape metrics
(Bowersox and Brown, 2001).


16.2  Background

A. variety of approaches to characterizing landscape pattern are available, each with their own
implications for the accuracy, precision, and meaning of a landscape pattern analysis.  With the goal of
quantitatively describing the landscape structure, landscape metrics provide information both about
landscape composition and configuration (McGarigal and Marks, 1995). The most common approach to
quantifying these characteristics has been to map defined landscape classes (e.g., habitat types) and
delineate patches of representative landscape classes. Patches are then defined as contiguous areas of
homogenous landscape condition. Landscape composition metrics describe the presence, relative
abundance, and diversity of various cover types. Landscape configuration refers to the "physical
distribution or spatial character of patches within the landscape" (McGarigal and Marks, 1995).
Summaries of pattern can be made at the level of the individual patch (e.g., size, shape, and relative
location), averaged across individual landscape classes (e.g., average size, shape, and location), or
averaged across all patches in the landscape (e.g., average size, shape, and location of all patches).

j\n alternative to patch-based metrics are metrics focused on identifying transition zone boundaries that
ofC present in continuous data. This approach has not been used as extensively as the patch approach in
landscape ecology (Johnston and Bonde, 1989; Fortin and Drapeau, 1995).  One approach to using
boundaries is to define "boundary elements," defined as cells that exhibit the most rapid spatial rates of
change, and "sub-graphs," which are strings of connected boundary elements that share a common
orientation (direction) of change (Jacquez et al., 2000). The landscape metrics characterize the numbers
of boundary elements and sub-graphs and the length of sub-graphs, which is defined by the number of
boundary elements in a sub-graph. An important advantage is that boundary-based statistics can be
calculated from images directly, skipping the classification step through which errors can propagate.
'Throughout this chapter, we refer to patch-based metrics, which were calculated using FRAGSTATS
/jvlcGarigal and Marks, 1995), and boundary-based metrics calculated using the methods described by
jacquez et al. (2000).


16.3   Methods

    .3-1  Precision of Landscape Change Metrics
 -TO measure imprecision in metric values, overlapping Landsat Multi-Spectral Scanner (MSS) path/row
 images were redundantly processed for two different study areas in the Upper Midwest to create
 classifications representing forest, non-forested, water, and other, and maps of the normalized difference
 vegetation index (NDVI).  Images on row 28 and paths 24-25 overlapped in the Northern Lower
 peninsula of Michigan and on row 29 and paths 21-22 overlapped on the border between Northern
 Wisconsin and the western edge of Michigan's Upper Peninsula (Brown et al., 2000a).

 The georeferenced MSS images at 60 m resolution were acquired from the North American Landscape
 Characterization (NALC) project during the growing seasons corresponding to three periods 1973-1975,
                                                                                    Page 265 of 339

-------
1985-1986, and 1990-1991 (Lunette et al., 1998).  Subsequent LC classifications of the four images
resulted in accuracies ranging from 72.5% to 91.2% (average 80.5%), based on comparison with aerial
photograph interpretations.

For landscape pattern analysis, the two study areas were partitioned into 5x5 km2 cells.  A total of 325
cells in the Michigan site and 250 in the Wisconsin-Michigan site were used in the analysis. The
partitions were treated as separate landscapes for calculating the landscape metric values. The values of
eight pattern metrics, four patch-based and four boundary-based, were calculated for each partition using
each of two overlapping images at each of three time periods in both sites.

The precision of landscape metric values was calculated using the difference between metric values
calculated for the same landscape partition within the same time period. For each metric, these
differences were summarized across all landscape partitions using the root mean squared difference
(RMSD).  To standardize the measure of error for comparison between landscape metrics, the relative
difference (RD) was calculated as the  RMSD divided by the mean of the metric values obtained in both
images of a pair.

76.3.2   Comparing Class Definitions

16.3.2.1  Landsat Classifications

To evaluate the sensitivity of maps to differences in class definitions, we calculated landscape metric
values from two independent LC classifications derived from Landsat Thematic Mapper (TM) imagery of
for the Huron River watershed located in Southeastern  Michigan. The only significant difference
between the two LC maps was the class definitions. Accuracy assessments were not performed for either
map. Therefore, the analysis serves only as an  illustration for evaluating the importance of class
definitions.

For the first map, Level I LU/LC classes were mapped  for the early-1990's using the National Land
Cover Data (NLCD) classification for the region. We developed the second date set using TM imagery
from July 24, 1988. It was classified to identify all areas of forest, defined as pixels with >40% canopy
cover, versus non-forest. Spectral clusters, derived through unsupervised classification (using the
ISODATA technique), were labeled through visual interpretation of the image and reclassified.
Landscape metrics were computed using FRAGSTATS applied to the forest class from both data sets
across the entire watershed. Also, the two date sets were overlaid to evaluate their spatial
correspondence.

16.3.2.2  Aerial  Photography Interpretations

We also compared two classifications  of aerial  photography over a portion of Livingston County in
Southeastern Michigan. The first date set consisted of a manual interpretation of LU and LC using color
infrared (CIR) aerial photographs (l:24,000-scale) collected in 1995 (SEMCOG, 1995).  The classes were
based on a modified Anderson et al. (1976) system, which we reclassified to high-density residential,
low-density residential, other urban, and other.  The second was a LC classification created through
unsupervised clustering, and subsequent cluster labeling of scanned color-infrared photography
(l:58,000-scale) collected in 1998.  The LC classes were forest, herbaceous, impervious, bare soil,
wetland, and open water.  The two maps were overlaid to identify the correspondence between the LC
classes and the urban LU classes. The percentages of forest and impervious cover were calculated within
each of the urban LU types.
Page 266 of 339

-------
16.3.3   Landscape Simulations

16.3.3.1   Ecotone Abruptness

An experiment was designed in which 25 different landscape types were defined, each representing a
combination of among five different levels of abruptness and five levels of patchiness (Bowersox and
Brown, 2001). Ecotone abruptness (i.e., how quickly an ecotone transitioned from forest to non-forest)
was controlled by altering the parameters of a mathematical function to model the change from high to
low values along the gradient representing forested cover.  Patchiness was introduced by combining the
mathematical surface with a randomized surface that was smoothed to introduce varying degrees of
spatial autocorrelation. Once the combined gradient was created, all cells with a value above a set
threshold were classified as forest, and below as non-forest. The threshold was set so that each simulated
landscape was 50% forested and 50% non-forested.

For each type  of landscape, 50 different simulations were conducted. The ability of each landscape
metric to  detect abruptness was then tested by comparing the values of the 50 simulations among the
different cover types.  The landscape metric values were compared among the abruptness and patchiness
levels, using analysis of variance (ANOVA).  The ANOVA results were analyzed to identify the most
suitable metrics for measuring abruptness (i.e., those exhibiting a high degree of variation between
landscape types with variable abruptness levels but a low degree of variation between landscape types
with variable patchiness).

In addition to  several patch-based metrics (including area-weight patch fractal dimension, area-weighted
mean  shape index, contagion, and total edge), boundary-based metrics were used, including (1) number of
boundary elements, (2) number of sub-graphs, and (3) maximum sub-graph length.  The analysis
compared the  ability of two new boundary-based metrics, designed specifically to measure ecotone
abruptness and distinguish different levels of abruptness. These new metrics characterize the dispersion
of boundary elements around an "average ecotone position," calculated as the centroid of all boundary
elements, and the area under curve of number of boundary elements versus slope threshold level.

16.3.3.2  Fragmentation

The sensitivity of several potential measures of forest fragmentation to the amount of forest was also
investigated through simulation. The simulation included:  (1) the generation of a random map for a 100
* 100 grid cells with pixel values randomly drawn from a normal distribution (mean = 0.0, standard
deviation = 1.0); (2) smoothing with a 5 x 5 averaging filter to introduce spatial autocorrelation; and
(3) maps  (n=10) were created by classifying cells as  forest or non-forest based on different threshold
levels. The threshold levels were defined so that the different maps had a uniformly increasing amount of
forest from about 9% to about 91% (see Figure 16-2). By extracting the maps with different proportions
of forest from the same simulated surface, patterns were controlled and the dominant difference among
maps  was the  amount forested. The simulation process  was repeated 10 times to produce a range of
output values  at each  landscape proportion level.
                                                                                 Page 267 of 339

-------
                     90%
                     81 %
                     72%
                     63%
                                                          rV'fc-'lKlit^"...*
                     53%
18%
                                                                      -•«..%
                                                                 •

                                                                  in    *  I
                                                                 '^n  . '  If
                                                                      m  +v

                                                                ,  '-*=f.  "...
                   Figure 16-2. One of 10 realizations of landscape simulations
                              created to illustrate the influence of the proportion of
                              the landscape covered by a class on the values of
                              landscape pattern metrics. The number indicates the
                              percentage of the landscape in forest (shown in
                              black).
Page 268 of 339

-------
16.4  Results
16.4.1   Precision of Landscape Metrics

Comparison among the patch-based metrics indicated that the number and size of patches were much less
precise than the area of forest and the edge density (see Table 16-1). A likely explanation is that the
number of patches and means patch size metrics required that the pixel classification and patch
aggregation processes be consistent.  Both of which can be sensitive to spatially patterned classification
error, thus suggesting that there are differences among metrics in the A precision described in
Figure 16-1.

                Table 16-1. The average relative error for eight different landscape
                           metrics, four based on identifying landscape patches
                           and four based on description of boundaries in a
                           continuous image.
Metric
Percent forest
Edge density
# patches
Mean patch size
Average
Relative
Error
0.23
0.35
0.75
1.52
Metric
# boundary elements
# sub-graphs
# singletons
Max sub-graph length
Average
Relative
Error
0.02
0.11
0.24
0.40
Comparing the patch- versus boundary-based metrics indicated that the majority of boundary metrics had
greater precision than the patch-based statistics (see Table 16-1). This can best be explained by the way
in which changes in precision were affected by the procedures used to calculate the metric values. All of
the patch-based metrics involved an image classification step, and two of them added a patch
identification step. Both of these steps are sensitive to spatial variations in image quality and to the
specific procedures used. Because the boundary-based metrics were calculated directly from the NDVI
images, there was less opportunity for propagation of the spatial pattern of error. Further, the boundary-
based metrics used only local information to characterize pattern, but the patch-based metrics use global
information (i.e., spectral signatures from throughout the image.  This use of global information
introduced more opportunities for error in metric calculation.

Additionally, we evaluated the effects of various processing choices on the precision of metrics (Brown et
al., 2000a). The results of this work suggest that haze in the images and differences in seasonal timing
were important determinants of metric variability.  Specifically, with less precision resulting from hazier
images and image pairs that are separated by more Julian days, irrespective of the year. Also,
summarizing landscape metrics over larger areas (i.e., using larger landscape partitions) increased the
precision of the estimates though it reduces the spatial resolution. Further, post-classification processing,
like sieving and filtering, did not consistently increase the precision, and can actually reduce the
precision.

The obvious cost associated with obtaining precise estimates through the empirical approach of redundant
mapping is that the areas need to be mapped twice.  However, the costs may be lower than the costs of
obtaining reference data for accuracy assessment, and can provide reasonable estimates of precision in a
                                                                                    Page 269 of 339

-------
pattern analysis context, where comparison with a reference data set is much more problematic. Guindon
et al. (2003) used a similar approach to dealing with the precision of LC maps.


16.4.2   Comparing Class Definitions


16.4.2.1   Comparing TM Classifications

Across all landscape metrics tested, our forest cover classification of the Huron River watershed
suggested that the landscape was much  less fragmented than did the NLCD forest class (i.e., that there
was more forest, in fewer but larger patches, with less forested/non-forested edge and more core area)
(see Table 16-2). Comparisons of forested cells indicated that forest cover occurred in several of the non-
forest NLCD classes.  The definitions of NLCD classes allowed for substantial amounts of forest cover in
non-forest classes. For example, in the  low-density residential class "vegetation" could account for 20%
to 70% of the cover (USGS, 2001). Also, the NLCD forest classes were not 100% forested. Although
65% of the forested cover in the region  (by our definition) was contained within forest classes as defined
by NLCD, 25% was located in agricultural areas and <6% in urban areas (see Table 16-3).
           Table 16-2.  Patch-based landscape metrics describing forest in Michigan's
                       Huron River watershed, as mapped in the NLCD data compared
                       with a separate classification of forested cover derived from
                       Landsat TM. All metrics are summarized at the class level for
                       the forest patches.
Metric
Percent forest
# patches
Mean patch size (ha)
Edge density (m/ha)
Mean shape index
Total core area index
NLCD Forest Data
28.1
28857
2.17
106.05
1.37
37.33
Forest Cover
Classification
31.1
19137
3.62
85.33
1.37
50.53
           Table 16-3. Percentage of generalized NLCD forest classes based on the
                      classification of Landsat TM data, and the percent of the total
                      forested cover within each NLCD class. The first column
                      indicates how much forested cover contained within each
                      NLCD class. The second indicates the amount of the forested
                      cover within each class.
Generalized NLCD Class
Urban
Forest
Agriculture and Herbaceous
Other
% Forest Cover
5.6
57.1
13.2
17.5

% Forest Cover
Total
5.6
65.1
25.8
3.5
100.0
 Page 270 of 339

-------
These findings indicate that landscape metrics are sensitive to the definitions of the input classes. This
sensitivity is a result of differences in the meaning of the classes themselves rather than the lack of
classification detail or because of inaccuracy in the classification. For some landscape analysis purposes
(e.g., habitat of a wildlife species), accounting for forested urban areas may be important.  Therefore,
some LC classifications,  while not necessarily inaccurate, may be inadequate for some purposes.

16.4.2.2  Comparing Photographic Classifications

Urban LU classes, as identified from aerial photographs, all had some amount of forest and impervious
cover (see Table 16-4). This comparison again illustrated the importance of class definitions, but raised
the additional issue of class definitions based on LC versus LU.  In the case of LU, the diversity of cover
types that made up residential areas was lumped together to map the LU type termed residential. Cover
types contained within urban LU regions included impervious surfaces, forest, and others (e.g.,
grasslands).
           Table 16-4. The percent of impervious surface and forested cover within
                       three urban land-use classes.
:••::•..--: •- \
High Density Residential
Low Density Residential
Other Urban
% Impervious
36.1
23.1
45.4
% Forest
15.4
16.8
19.1
16.4.3   Landscape Simulations

16.4.3.1   Ecotone Abruptness

The results of the analysis of metric sensitivity to the abruptness of ecotones suggested that existing
landscape metrics were not as useful for quantifying ecotone abruptness as were the new metrics
specifically designed for that purpose (Bowersox and Brown, 2001).  Some metrics (e.g., total edge,
maximum sub-graph length) were not sensitive to abruptness.  Those sensitive to abruptness were also
sensitive to landscape patchiness, which confounded their interpretation (i.e., numbers of boundary
elements and sub-graphs, area-weight mean shape index). The new metrics, dispersion of boundary
elements and cumulative boundary elements, were most consistently related to abruptness while not
exhibiting the confounding effects of sensitivity to patchiness. There was not a clear indication that patch-
or boundary-based metrics were more or less sensitive to abruptness.

16.4.3.2  Forest Fragmentation

Using simulated landscapes, each of several patch-based metrics exhibited a significant degree of
variation when calculated at different levels of percent forested (see Figure 16-3 (A)). Edge density was
clearly highest when the landscape was 50% forest and lowest when the landscape was either 100% or
zero percent forest.  The largest patch index and the total core area index both increased with increasing
percent forested. The number of patches decreased with increasing forest percentage, after an initial
increase.
                                                                                   Page 271 of 339

-------
The number of patches exhibited the highest degree of variation across different simulation runs (see
Figure 16-3 (A)). The coefficient of variation across simulation runs varied at different levels of percent
forest, depending on the average value of the metric and its variance (see Figure 16-3 (B)). Largest patch
index and the number of patches exhibit the highest coefficient of variation across the runs, indicating a
higher degree of relative error, and lower precision. This finding was consistent with the redundant
mapping work described above, and it highlights the relative instability of metrics that require patch
delineation. Both the empirical and simulation work show that slight changes in the maps of a landscape,
as the result of remote sensing image quality issues or just random perturbations, can result in relatively
large variations in the number of patches and identified and, as a result, in the mean patch size.
             I
                 300
                                                                 -Largest Patch Index
                                                                 •# Patches
                                                                 - Edge Density
                                                                 -Total Core Area Index
                            20     40     60     80    100

                                 Percent Forest
                                                                                    B
                                                                   Largest Patch Index
                                                                   # Patches
                                                                   Edge Density
                                                                   Total Core Area Index
                             20
40
                                  Percent Forest
            Figure 16-3:  (A) The relationships between mean landscape pattern metric values
                        across 10 simulations and the proportion of the landscape covered by
                        forest  The error bars show the two times the standard deviation
                        across the 10 runs. (B) The coefficient of variation of the metric
                        values across simulations, indicating their relative errors.
Page 272 of 339

-------
16.5  Discussion

The results indicated the difficulty involved in distinguishing the effects of changes in the amount of
forest from changes in the pattern of forest. The question is relevant in attempts to understand the effects
of landscape structure on ecological processes.  Some have argued that the concept of fragmentation is
meant to include both the amount of forest and its spatial configuration (Forman, 1997). Others define
fragmentation to mean a spatial pattern characteristic of the forest, independent of the effects of how
much forest there is (Trzcinski et al., 1999). If the latter definition is used, then a measure of forest
fragmentation that is not sensitive to the amount of forest is required.  For example, do changes in pattern
of forest have impacts on ecological processes beyond the effects resulting from changes in the amount of
forest? Trzcinski et al (1999) dealt with this question by, first, evaluating the correlation between bird
populations and forest amount, then correlating bird populations with the residuals that resulted from the
regression of forest amount versus forest pattern. The results indicated that there was little effect of forest
pattern on bird populations independent of forest amount.  However, more work is needed to understand
the interactions of land cover amount and  pattern from both the perspective of how to measure pattern
independently and how to understand its independent effects.
16.6  Conclusions

This chapter summarizes work on the precision and meaning of landscape pattern metrics derived from
remote sensing. The transformations involved in calculating landscape metrics are complex, and
analytical approaches to estimating their uncertainty are likely not to be practical. For that reason, this
study has focused on two approaches to evaluating this propagation. First, we used redundant mapping of
areas and evaluation of the variation in metric values derived from different imagery acquired near to
each other in time. Second, simulation was used to explore the sensitivity of various metrics to
differences in landscapes by controlling certain landscape characteristics.

We determined that uncertainty in input data propagates throughout the calculations and ultimately affects
of landscape metric precision.  The precision of landscape metric values calculated to measure forest
fragmentation is affected by the similarity in seasonal date of the imagery, atmospheric disturbances in
the imagery (clouds and haze), and the amount of forest in the landscape.  Metrics calculated for larger
landscapes tend to exhibit less variation, but post-processing of imagery (e.g., through sieving to remove
small patches) did not result in increased precision. Landscape metrics whose calculation required more
steps (e.g., image classification and patch delineation) were more likely to be susceptible to slight
variations in the input data. Therefore, patch-based metrics (e.g., number of patches and mean patch size)
tend to be less precise than boundary-based metrics.

Landscape class definitions, whether intentionally different or different because of the mapping method
used, are important determinants of landscape pattern.  It is possible to achieve significantly different
landscape pattern metrics values based on different class definitions. This suggests extreme caution
should be used when attempting to compare pattern metric values for landscape maps derived from
different sources and methods. For example, urban areas typically contain significant forest cover that is
not represented in LU class definitions. Also, landscape metric values describing fragmentation of forest
calculated from a LU/LC map were found to be different from those calculated from a classification
specifically designed to map forest and non-forest.  Applications that target the habitat quality for specific
animal species, especially small animals, may not be well served by aggregated LU classes.
                                                                                    Page 273 of 339

-------
Landscape pattern metrics transform spatial data in complex ways and users need to exercise caution
when interpreting the calculated values. Spatial simulation was a valuable tool for evaluating the
behavior of landscape metrics and their sensitivity to various inputs.  Ecotone abruptness can be detected
using existing landscape metrics, but simulation illustrates that new metrics that measure the variation in
boundary element locations are more sensitive to abruptness than existing metrics.  Most measures of
spatial pattern are also sensitive to both the composition and configuration of the landscape.  More work
is needed to evaluate the various influences of landscape configuration and composition on metric values.

The results presented here raise several issues for both the users and producers of remote sensing based
LU and LC products for landscape ecology investigation. First, in addition to the issue of data accuracy,
the user is well advised to consider the appropriateness of class definitions for a specific application. For
example, a general LU/LC classification may not be appropriate for calculation of landscape metrics in a
study of habitat quality for a specific species. For this type of application, landscape maps may need to
be developed specifically for the intended application.  Second, the nature of the spatial transformations
taken to compute pattern metrics can have dramatic implications for the precision of the estimated values.
Metrics that require image classification and patch delineation are subject to greater imprecision than
those based on local characterizations of pattern. Third, the meaning of metric values can be confounded
and difficult to interpret.  Applications of landscape metrics that seek empirical relationships between
metric values and ecosystem characteristics may be able to by-pass concerns about meaning and instead
focus on correlations with a ecosystem outcomes of interest (e.g., based on independent measurements of
ecosystem characteristics).  However, when directed toward spatial land management goals (e.g., a less  J
fragmented forest), understanding the meaning of metrics is important to improve the probability of
achieving the desired objectives.
16.7  Summary

Landscape pattern metrics have been increasingly applied in support of environmental and ecological
assessment for characterizing the spatial composition and configuration of landscapes to relate and
evaluate ecological function. This chapter summarizes a combination of previously published and new
work that investigates the precision and meaning of spatial landscape pattern metrics.  The work is
conducted on landscapes of the Upper Midwest, USA, using satellite images, aerial photographs, and
simulated landscapes. By applying a redundant mapping approach, we assessed and compared the degree
of precision in the values of landscape metrics calculated over landscape subsets.  While increasing
landscape size had the effect of increasing precision in the  landscape metric estimates, by giving up
spatial resolution, post-processing methods like filtering and sieving did not have a consistent effect.
Comparing multiple classifications  of the  same area that use different class definitions, we demonstrate
that conclusions about landscape composition and configuration are affected by how the landscape classes
are defined. Finally, using landscape simulation experiments, we demonstrate that metric sensitivity to a
pattern characteristic of interest (e.g., ecotone abruptness of forest fragmentation) can be confounded by
sensitivity to other landscape characteristics (e.g., landscape patchiness or amount of forest), making
direct measurement of the desired characteristic difficult.
16.8  Acknowledgments

NASA's Land Cover and Land Use Change Program, the National Science Foundation, the USGS's
Global Change Program, and the USDA's Forest Service North Central Forest Experiment Station funded
the research described in this chapter.
Page 274 of 339

-------
16.9  References

Anderson, J.R., E.E. Hardy, J.T. Roach, and R.E. Witmer. A Land Use and Land Cover Classification
    System for Use with Remote Sensing Data.  U.S. Geological Survey, Professional Paper 964,
    U.S. Government Printing Office, Washington, DC, 1976.

Bowersox, M.A. and D.G. Brown.  Measuring the abruptness of patchy ecotones: a simulation-based
    comparison of patch and edge metrics.  Plant Ecology, 156(1), 89-103, 2001.

Brown, D.G., J.D. Duh, and S. Drzyzga.  Estimating error in an analysis of forest fragmentation change
    using North American Landscape Characterization (NALC) Data.  Remote Sens. Environ., 71, 106-
    117,2000a.

Brown, D.G., G.M. Jacquez, J.-D. Duh, and S. Maruca. Accuracy of remotely sensed estimates of
    landscape change using patch- and edge-based pattern statistics. In Spatial Accuracy 2000
    (M.J. Lemmens, et al., Editors), Delft University Press, Amsterdam, 75-82, 2000b.

Forman, R.T.T. Land Mosaics: The Ecology of Landscapes and Regions. Cambridge University Press,
    New York, 1997.

Fortin, M.J. and P. Drapeau. Delineation of ecological boundaries: comparison of approaches and
    significance tests. Oikos, 72:323-332, 1995.

Frohn, R.C., Remote Sensing for Landscape Ecology: New Metric Indicators for Monitoring, Modeling,
    and Assessment of Ecosystems. Lewis Publishers, Boca Raton, FL, 1997.

Guindon, B. and C.M. Edmonds. Using Classification consistency in inter-scene overlap regions to
    model spatial variation in land-cover accuracy over large geographic regions. In: Geospatial Data
    Accuracy Assessment (R.S. Lunetta andJ.G. Lyon, Editors), U.S. Environmental Protection Agency,
    Report No. EPA/600/R-03/064, 335 p., 2003.

Gustafson, E.J. and G.R.  Parker.  Relationships between land cover proportion and  indices of landscape
    spatial pattern. Landscape Ecology, 7(2), 101-110, 1992.

Hargis, C.D., J.A.  Bissonette, and J.L. David. The behavior of landscape metrics commonly used in the
    study of habitat fragmentation. Landscape Ecology,  13(3), 167-186,  1998.

Henebry, G.M., and D.G. Goodin. Landscape trajectory analysis:  toward  spatio-temporal models of
    biogeophysical fields for ecological forecasting. Workshop on Spatio-temporal Data Models for
    Biogeophysical Fields, La Jolla, CA, April 8-10, 2002.
    http://www.calmit.unl.edu/BDEI/papers/henebry_goodin_position.pdf

Jacquez, G.M., S.L. Maruca, and M.J. Fortin. From fields to objects: a review of geographic boundary
    analysis. J. Geographical Systems, 2, 221-241, 2000.

Jensen, J.R.  Introductory Digital Image Processing:  A Remote Sensing Perspective, Prentice Hall, Upper
    Saddle River, NJ, 1996.

Johnston, C.A. and J. Bonde.  Quantitative analysis of ecotones using a geographic  information system.
    Photogrammetric Engineering and Remote Sensing,  55(11), 1643-1647, 1989.
                                                                                 Page 275 of 339

-------
Lunetta, R.S., J.G. Lyon, B. Guindon, and C.D. Elvidge. North American Landscape Characterization
    dataset development and data fusion issues. Photogrammetric Engineering and Remote Sensing,
    64(8): 821-829, 1998.

Malanson, G.P., N. Xiao, and K.J. Alftine.  A simulation test of the resource-averaging hypothesis of
    ecotone formation. J. Vegetation Science, 12,743-748,2001.

McGarigal, K. and B.J. Marks. FRAGSTATS: Spatial Pattern Analysis Program for Quantifying
    Landscape Structure. Technical Report PNW-GTR-351, USDA Forest Service, Pacific Northwest
    Research Station, Portland, OR, 1995.

Petit, C.C. and E.F. Lambin. Integration of multi-source remote sensing data for land cover change
    detection. Int. J. Geographical Information Science, 15(8), 785-803, 2001.

Riitters, K.H., R.V. O'NeiN, C.T. Hunsaker, J.D. Wickham, D.H. Yankee, S.P. Timmins, K..B. Jones, and
    B.L. Jackson. A factor analysis of landscape pattern and structure metrics. Landscape Ecology,
    10,23-39, 1995.

SEMCOG (Southeastern Michigan Council of Governments).  Land Use/Land Cover, Southeast
    Michigan. Digital data product from SEMCOG, Detroit, MI, 1995.

Trzcinski, M.K., L. Fahrig, and G. Merriam. Independent effects of forest cover and fragmentation on the
    distribution of forest breeding birds. Ecological Applications, 9(2), 586-593, 1999.

Turner, M.G., R.H. Gardner, and R.V. O'Neill. Landscape Ecology in Theory and Practice.  Springer-
    Verlag, New York, 2001.

Turner, M.G., R.V. O'Neill, R.H. Gardner, and B.T. Milne.  Effects of changing spatial scale on the
    analysis of landscape pattern. Landscape Ecology, 3, 153-162,  1989.

USGS (United States Geological Survey). National Land Cover Data. Product Description, 2001.
    http://landcover.usgs.gov/prodescription.html.
 Page 276 of 339

-------
                                    Chapter 17


                   Components of Agreement Between
                Categorical Maps at Multiple  Resolutions

                                          by

                                   R. Gil Pontius, Jr.'
                                    Beth Suedmeyer

                                    Clark University
                                    950 Main Street
                                 Worcester, MA 01610

                             * Corresponding Author Contact:

                              Telephone: (508) 793-7761
                               Facsimile: (508) 793-8881
                                  E-mail: rpontius@clarku.edu
17.1   Introduction


17.1.1  Map Comparison

Map comparisons are fundamental in remote sensing and geospatial data analysis for a wide range of
applications including accuracy assessment, change detection, and simulation modeling. Common
applications include the comparison of a reference map to one derived from a satellite image or a map of
a real landscape to a simulation model output. In either case, the map that is considered to have the
highest accuracy is used to evaluate the map of questionable accuracy. Throughout this chapter, the term
"reference" map refers to the map that is considered to have the highest accuracy and the term
"comparison" map refers to the map that is compared to the reference map. Typically, one wants to
identify similarities and differences between the reference map and the comparison map.

There are a variety of levels of sophistication by which to compare maps when they share a common
categorical variable (Congalton, 1991; Congalton and Green, 1999).  The simplest method is to compute
the proportion of the landscape classified correctly. This method  is an obvious first step, however, the
proportion correct fails to inform the scientist of the most important ways in which the maps differ, and
                                                                           Page 277 of 339

-------
 hence it fails to give the scientist information necessary to improve the comparison map.  Thus, it would
 be helpful to have an analytical technique that budgets the sources of agreement and disagreement to
 know in what respects the comparison map is strong and weak. This chapter introduces map comparison
 techniques to determine agreement and disagreement between any two categorical maps based on the
 quantity and location of the cells in each category; moreover the techniques apply to both hard and soft
 (i.e., fuzzy) classifications (Foody, 2002).

 This chapter builds on recently published methods of map comparison and extends the concept to
 multiple resolutions (Pontius, 2000, 2002). A substantial additional contribution beyond  previous
 methods is that the methods described in this chapter support stratified analysis. In general, these new
 techniques serve to facilitate the computation of several types of useful information  from  a generalized
 confusion matrix (Lewis and Brown, 2001). The following puzzle example illustrates the fundamental
 concepts of comparison of quantity and location.


 17.1.2   Puzzle  Example

 Figure 17-1 shows a pair of maps containing two  categories (i.e., light and dark). At the simplest level of
 analysis, we compute the proportion of cells that agree between the two maps.  The agreement is 12/16
 and the disagreement is 4/16. At a more sophisticated level, we can compute the disagreement in terms of
 two components: (a) disagreement due to quantity; and (b) disagreement due to location.  A disagreement
 of quantity is defined as a disagreement between the maps in terms of the quantity of a category. For
 example, the proportion of cells in the dark category in the comparison map is 10/16 and  in the reference
 map is 12/16, therefore there is a disagreement due to quantity of 2/16. A disagreement of location is
 defined as a disagreement such that a swap of the location of a pair of cells within the comparison  map
 increases overall agreement with the reference map. The disagreement of location is determined by the
 amount of spatial rearrangement possible in the comparison map, so that its agreement  with the reference
 map is maximized. In this example, it would be possible to swap the #9 cell with the #3,  #10, or #13 cell
 within the comparison map to increase its agreement with the reference map (see Figure 17-1). Either of
 these is the only swap we can make to improve the agreement, given the quantity of the comparison map
 Therefore the disagreement of location is 2/16.  The distinction between information of quantity and
 information of location is the foundation of this chapter's philosophy of map comparison.
1
5
9
13
2
6
10
14
3
7
11
15
4
8
12
16
1
5
9
13
2
6
10
14
3
7
11
15
4
8
12
16
                          Comparison
Reference
                   Figure 17-1.  Demonstration puzzle to illustrate agreement of
                               location versus agreement of quantity.  Each map
                               shows a categorical variable, with two categories:
                               dark and light.  Numbers identify the individual grid
                               cells.
Page 278 of 339

-------
17.2   Methods


17.2.1  Example Data

Categorical variables consisting of "forest" and "non-forest"
are represented in three maps of example data (see Figure 17-
2). Each map is a grid of 12 x  12 cells.  The 100 non-white
cells represent the study area and the remaining 44 white cells
are located out of the study area. We have purposely made a
non-square study area to demonstrate the generalized
properties of the methods.  The methods apply to a collection
of any cells within a grid, even if those cells are non-
contiguous, as is typically the case in accuracy assessment.
Each map has the same nested  stratification structure. The
coarser stratification consists of two strata (i.e., north and
south halves) separated by the thick solid line.  The finer
stratification consists of four substrata quadrates of 25 cells
each, defined as the northeast (NE), northwest  (N W),
southeast (SE), and southwest (SW). The set of three maps
illustrates the common characteristics encountered when
comparing map classification rules. Imagine that Figure 17-2
represents the output maps from a standard classification rule
(COM1), alternative classification rule (COM2), and the
reference data (REF). Typically, a statistical test would be
applied to assess the relative performance of the two
classification approaches and to determine important
differences with respect to the reference data.  However, it
vvould also be helpful if such a comparison would offer
additional insights concerning the sources of agreement and
disagreement.

Table  17-1 (A & B) represents the standard confusion matrix
for the comparison of COM 1 and COM2 versus REF.  The
agreement in Table  17-1 (A &  B) is 70% and 78%,
respectively.  Note that the classification in COM2 is
identical to the reference data in the south stratum.  In the
north stratum, COM2 is the mirror image of REF reflected
through the central vertical axis. Therefore, the proportion
forest  in COM2 is identical to REF in both the north stratum
and the south stratum.  For the entire study area, REF is 45%
forest, as is COM2.  COM1 is 47% forest. A standard
accuracy assessment ends with the confusion matrices of
Table  17-1.
  10ut of
  JStudyArea  I	iForest  I •Non-Forest
              Com 1
I   lOut of
I	I Study Area
Forest  I • Non-Forest
              Com 2
                                                               ' Out of
                                                               .StudyArea  I	IForest I • Non-Forest
                                                                            Ref

                                                          Figure 17-2. Three maps of example data.
                                                                                    Page 279 of 339

-------
               Table 17-1. A:  Confusion matrix for COM1 versus REFERENCE.
                          B:  Confusion matrix for COM2 versus REFERENCE.

Comparison Map

Forest
Non-Forest
Total
Table "A" Reference Map
Forest
31
14
45
Non-Forest
16
39
55

Total
47
53
100

Comparison Map

Forest
Non-Forest
Total
Table "B" Reference Map
Forest
34
11
45
Non-Forest
11
44
55

Total
45
55
100
 17.2.2  Data Requirements and Notation

 We have designed COM1, COM2 and REF to illustrate important statistical concepts. However, this
 chapter's statistical techniques apply to cases that are more general than the sample data of Figure 17-2.
 In fact, the techniques can compare any two maps of grid cells that are classified as any combination of
 soft or hard categories.

 This means that each grid cell can have some membership in each category, where the membership can
 range from no membership (0.0) to complete membership (1.0). The membership is the proportion of the
 cell that belongs to a particular category; therefore the sum of the membership values over all categories
 is 1.0. In addition, each grid cell  has a weight to denote its membership to any particular stratum, where
 the stratum weight can also range from 0.0 to 1.0. The weights do not necessarily need to sum to  1.  For
 example, if a cell's weights are 0  for all strata, then that cell is eliminated from the analysis.  These ideas
 are expressed mathematically in Equations 1-4, where j is the category index, J is the number of
 categories, Rdnj is the membership of category j in cell n of stratum d of the reference map, Sdllj is the
 membership of category] in cell n of stratum d of the comparison map, and Wdn is the weight for the
 membership of cell n in stratum d.
                                                              0 ^ R
                                                                   dnt
                                                               0 <. SJn] < 1
d)

(2)
                                                      j=\
                                                                                           (3)
                                                              0 * W,n<  \
                                                                                           (4)
Page 280 of 339

-------
Just as each cell has some proportional membership to each category, each stratum has some proportional
membership to each category. We define the membership of each stratum to each category as the
proportion of the stratum that is covered by that category. For each stratum, we compute this membership
to each category as the weighted proportion of the cells that belong to that category. Similarly, the entire
landscape has membership to each particular category, where the membership is the proportion of the
landscape that is covered by that category. We compute the landscape-level membership by taking the
weighted proportion over all  grid cells. Equations 5-9 show how to compute these levels of membership
for every category at both the stratum scale and the landscape scale. Equations 5-9 utilize standard dot
notation to denote summations, where Nd denotes the number of cells that have some positive
membership in stratum d of the map and D denotes the number of strata.  Equation 5 shows that Wd.
denotes the sum of the cell weights for stratum d. Equation 6 shows that  Rd.j denotes the proportion of
category j in stratum d of the reference map.  Equation  7 shows that R.j denotes the proportion of category
j in the entire reference map. Equation 8 shows that Sd.j denotes the proportion of category j in stratum d
of the comparison map.  Equation 9 shows that S..j denotes the proportion of category j in the entire
comparison map.

                                                                           Afc
                                                                                             (5)
                                                                   Nd
                                                               D  Nt
                                                                    rf-1


                                                                   Afo

                                                                      Wj~ * Sd*'             (8)
                                                               D
                                                               £
                                                                                             (9,
                                                                                  Page 281 of 339

-------
  17.2.3   Minimum Function

 The Minimum function gives the agreement between a cell of the reference map and a cell of the
 comparison map. Specifically, Equation 10 gives the agreement in terms of proportion correct between
 the reference map and the comparison map for cell n of stratum d.  Equation 11 gives the landscape-scale
 agreement weighted appropriately with grid cell weights, where M(m) denotes the proportion correct
 between the reference map and the comparison map.
                                     agreement in cell n of stratum d = £ MIN(R^S^)           /1 Qi
... ,   rf
M(m)
                                                   D   m     ( j               ]
                                                      JX* \LMIN(RdnJ' "W
                                                   -1 *-l     [j-l _    ]
                                                              -  - '
                                                            
-------
        t
,
 J



                                                     Jl

                                                ££»•.
                                                        *,j
                                                                    Yw
                                                        H
                                        Information of Quantity
    Figure 17-3. Expressions for fifteen points defined by a combination of the information of quantity
               and location. The vertical axis shows information of location and the horizontal axis
               shows information of quantity. The text defines the variables.
Equations 12 and 13 give the necessary adjustment to each grid cell in order to scale the comparison map
to express no information of quantity.
                               %*),        if\IJ  efce
                                                                                           (12)
                                                                      if\/J
-------
 The logic of the scaling is as follows, where the word "paint" can be substituted for the word "category"
 to continue the painting analogy. If the quantity of category j in the comparison map is less than 1/J, then
 more of category j must be added to the comparison map.  In this case, category j is increased in cells that
 are not already 100% members of category j. If the quantity of category j in the comparison map is more
 than I/J, then some of category j must be removed from the comparison map. In that case, category j is
 decreased in cells that have some of category j.

 For expressions in the medium information column of Figure 17-3, the "other" maps have the same
 quantities as the comparison map. For the expressions  in the perfect information column, the "other"
 maps are derived such that the proportion of membership for each of the J categories matches perfectly
 with the proportions in the reference map. This adjustment is necessary to answer the question, "What
 would be the agreement between the reference map and the comparison map, if the scientist would have
 had perfect information of quantity during the production of the comparison map?"  The adjustment holds
 the level of information of location constant, while adjusting each grid cell such that the quantity of each
 of the J categories  in the landscape matches the quantities in the reference map. The logic of the
 adjustment is similar to the scaling procedure described for the "other" maps in the no information of
 quantity column of Figure 17-3.

 Equations 14 and 15 give the necessary mathematical adjustments to scale the comparison map to express
 perfect information of quantity.
                                                                                              (14)
                                                    = !-(! -£/,„)      , else
                                                         (R n
                                              Fd j = Sd j (j$~~\>        if R J ^ S -j


Equation 14 performs this scaling at the grid cell level, hence creates an "other" map, denoted B^.
Equation 15 performs this scaling at the stratum level, hence creates an "other" map, denoted Fd..

There are five levels of information of location: no, stratum, medium, perfect within stratum, and perfect
denoted respectively as N(x), H(x), M(x), K(x) and P(x). Figure 3 shows the differences in the fifteen
mathematical expressions among these various levels of information of location. In N(x), H(x) and M(x)
rows, the mathematical expressions of Figure 17-3 consider the reference map at the grid cell level, as
indicated by the use of all three subscripts:  d, n, and j.  In the K(x) row, the mathematical expressions
consider the reference map at the stratum level, as indicated by the use of two subscripts: d and j.  In the
P(x) row, the expressions consider the reference map at the study area level, as indicated by the use of one
subscript: j. In the M(x) row, the expressions consider the other maps at the grid cell level, as indicated
by the use of all three subscripts:  d, n, and j.  In the H(x) and K(x) rows, the expressions consider the
other maps at the stratum level, as indicated by the use of two subscripts:  d and j.  In the N(x) and P(x)
rows, the expressions consider the other maps at the study area level, as indicated by the use of one
subscript: j.
Page 284 of 339

-------
The concepts behind these combinations of components of information of location are as follows.  In row
N(x), the categories of the other maps are spread evenly across the landscape, such that every grid cell has
an identical multinomial distribution of categories. In row H(x), the categories of the other maps are
spread evenly within each stratum, such that every grid cell in each stratum has an identical multinomial
distribution of categories. In row M(x), the grid cell level information of location in the other maps is the
same as in the comparison map. In row K(x), the other maps derive from the comparison map, whereby
the locations of the categories in the comparison map are swapped within each stratum in order to match
as best as possible the reference map, however this swapping of grid cell locations does not occur across
stratum boundaries.  In row P(x), the other maps derive from the comparison map, whereby the locations
of the categories in the comparison map are swapped in order to match as best as possible the reference
map and this swapping of grid cell locations can occur across stratum boundaries.

Each of the fifteen mathematical expressions of Figure  17-3 is denoted by its location in the table.  The x
denotes the level of information of quantity.  For example, the overall  agreement between the reference
map and the comparison map is denoted M(m), since the comparison map has a medium level of
information of quantity and a medium level of information of location, by definition. The expression
p(p) is in the upper right of Figure 17-3 and is always equal to 1.0, because P(p) is the agreement
between the reference map and the other map that has perfect information of quantity and perfect
information of location.

There are seven mathematical expressions that are especially interesting and helpful. They are N(n),
N(m), H(m), M(m), K(m), P(m), and P(p).  For N(n), each cell of the other map is the same and has a
membership in each category equal to 1/J. For N(m), each cell of the  other map is the same and has a
membership in each category equal to the proportion of that category in the comparison map. For H(m),
each cell within each stratum of the other map is the same, and has a membership in each category equal
to the proportion of that category in each stratum of the comparison map.  For M(m), the other map is the
comparison map.  For K(m), the other map is the comparison map with the locations of the grid cells
swapped within each stratum, so as to have the maximum possible agreement with the reference map
vvithin each stratum. For P(m), the other map is the comparison map with the locations of the grid cells
swapped anywhere within the map, so as to have  the maximum possible agreement with the reference
map. For P(P)>tne other map is the reference map, therefore the agreement is perfect.


 17.2.5  Agreement and Disagreement
                                                   \
The seven mathematical expressions N(n), N(m), H(m), M(m), K(m), P(m), and P(p) constitute a
sequence of measures of agreement between the reference map and other maps that have increasingly
accurate information. Therefore, usually 0.0 < N(n) < N(m) < H(m) < M(m) < K(m) < P(m) < P(p) =
 1.0. This sequence partitions the interval [0.0,1.0] into components of the agreement between the
reference map and the comparison map.  M(m) is the total proportion correct, and  l-M(m) is the total
proportion error between the  reference map and the comparison map.  Hence, the sequence of N(n),
N(m), H(m) and M(m) defines components of agreement, and the sequence of M(m), K(m), P(m) and
p(p) defines components of disagreement.

Table 17-2 defines these components mathematically.  Beginning at the bottom of the table  and working
up, the first component is agreement due to chance, which is usually N(n).  However, if the agreement
between the reference map and the comparison map is less than would be expected by chance, then the
component of agreement due to chance may be less than N(n). Therefore, Table 17-2 defines the
                                                                                 Page 285 of 339

-------
 component of agreement due to chance as the minimum of N(n), N(m), H(m), and M(m).  The
 component of agreement due to quantity is usually N(m)-N(n); Table 17-2 gives a more general
 definition to account for the possibility that the comparison map's information of quantity can be worse
 than no information of quantity.  The component of agreement at the stratum level is usually H(m) -
 N(m); Table  17-2 gives a more general definition to restrict this component of agreement to be non-
 negative.  Similarly, the component of agreement at the grid cell level is usually M(m) - H(m);
 Table 17-2 restricts this component of agreement to be non-negative.  Table 17-2 also defines the
 components of disagreement. It is a mathematical fact that M(m) < K(m) < P(m) < P(p), therefore the
 components of disagreement are the simple definitions of Table  17-2.
  Table 17-2.  Definition and values of seven components of agreement for COM1 versus
              REFERENCE derived from the mathematical expressions of Figure 17-3.
Name of Component
Disagreement due to quantity
Disagreement at stratum level
Disagreement at grid cell level
Agreement at grid cell level
Agreement at stratum level
Agreement due to quantity
Aqreement due to chance
Definition
P(p)-P(m)
P(m)-K(m)
K(m)-M(m)
MAX [M(m)-H(m), 0]
MAX [H(m)-N(m), 0]
If MIN [N(n), N(m), H(m), M(m)] = N(n), then
MIN [N(m)-N(n), H(m)-N(n), M(m)-N(n)],
elseO
MIN [N(n), N(m), H(m), M(m)]
Percent of Each
Component
Stratum
2.0
8.0
20.0
12.2
7.5
0.3
50.0
Substratum
2.0
8.0
20.0
11.5
8.2
0.3
50.0
 The partition of the components of agreement can be performed for any stratification structure.
 Table 17-2 shows the results for the comparison of REF and COM 1 at both the stratum level and the
 substratum level.  Figure 17-4 shows this information in graphical form. The stratum bar shows the
 components at the stratum level and the substratum bar shows the components at the substratum level
 Since the substrata are nested within the strata, it makes sense to overlay the Stratum bar on top of the
 Substratum bar to produce the Nested bar. Depending on the nature of the maps, the Nested bar could
 show nine possible components listed in the legend.  In the comparison of RHF and COM 1, the bar shows
 eight nested components.
Page 286 of 339

-------
            Stratum
Substratum
                                                    Nested
                                                                    • Disagreement due to quantity

                                                                    • Disagreement at stratum level

                                                                    • Disagreement at substratum level

                                                                    D Disagreement at grid cell level

                                                                    •Agreement at grid cell level

                                                                    •Agreement at substratum level

                                                                    n Agreement at stratum level

                                                                    DAgreement due to quantity

                                                                    a Agreement due to chance
    Figure 17-4. Stacked bars showing components of agreement between COM1 and REF.  The vertical
                axis shows the cumulative percent of cells in the study area. The nested bar overlays
                the stratum bar overlaid on top of the substratum bar to show agreement at both the
                stratum and substratum levels. Table 17-2 gives the numerical values for the
                components in the stratum and substratum bars.
17.2.6   Multiple Resolutions

Up to this point, our analysis of the maps of Figure 1 7-2 has been based on a cell-by-cell analysis with
hard classification. The advantage of cell-by-cell analysis with hard classification is its simplicity.  The
disadvantage of cell-by-cell analysis with hard classification is that if a specific cell fails to have the
correct category, then it is counted as complete error, even when the correct category is found in a
neighboring cell. Therefore, cell-by-cell analysis can fail to indicate general agreement of pattern because
it fails to consider spatial proximity to agreement. In order to remedy this problem we perform  multiple
resolution analysis.

fhe multiple resolution analysis requires a new set of maps for each new resolution. Figure 1 7-2 shows
maps that are hard classified, whereas Figure 1 7-5 shows the COM I map at four coarser resolutions.
Each cell of each map of Figure 1 7-5 is an average of neighboring cells of the original COM I map of
figure 1 7-2. For example, for resolution 2, four neighboring cells become a single coarse cell,  therefore
the 1 2 x 12 map of original cells yields a 6 x 6 map. At resolution three, we obtain a 4 x 4 map of coarse
cells, in which the length of the side of each coarse  cell is three times the  length of the side of each
original fine resolution cell.  At resolution four, we  obtain a 2 x 2 map of coarse cells, where each coarse
cell is its own substratum.  At resolution 12, the entire  map is in one cell. For each coarse cell, the
membership in each category is the average of the memberships of the contributing cells.  When using
this aggregation technique, the lack of a square study area can result in an unequal number of fine
resolution cells in each of the coarser cells. This is  taken into consideration by the weights that give each
cell's membership in the study area.  This characteristic of the technique allows the method to apply to
accuracy assessment where the grid cells of interest are not contiguous.
                                                                                      Page 287 of 339

-------
                                           —iQ.OO
                                            0.06
                                            0.13
                                            0.19
                                            0.25
                                            0.31
                                            0.38
                                            0.44
                                            0.50
                                            0.56
                                            0.63
                                            0.69
                                            0.75
                                            0.81
                                            0.88
                                            094
                                           :d 1.00
                           .1
0.00
0.06
0.13
0.19
0.25
0.31
0.38
0.44
0.50
0.56
0.63
0.69
0.75
0.81
088
0.94
1.00
                Resolution 2
Resolution 3
                                            -0.32
                                             0.35
                                             0.37
                                             0.40
                                             0.43
                                             0.46
                                             0.48
                                             0.51
                                             0.54
                                             0.57
                                             0.59
                                             0.62
                                            ^0.65
                                            :0.68
                                            ^0.70
                                            ,0.73
                                           I—10.76
                             0.53
                Resolution 6
Resolution 12
    Figure 17-5. Map COM1 at four different resolutions. On the legend, zero means completely forest,
                one means completely non-forest, and white is outside of the study area.



Figure 17-5 shows the cell configuration.  The darker shading shows stronger membership in the non-
forested category. Figure 17-6 shows this same type of aggregation for the REF map. For each
resolution, we are able to generate a bar similar to the nested bar of Figure 17-4 because the equations of
Figure 17-3 allow for any cell to have partial membership in any category.
Page 288 of 339

-------
               Resolution 2
Resolution 3
               Resolution 6
                                                             Resolution 12
    Figure 17-6. Map REF at four different resolutions. On the legend, zero means completely forest,
               one means completely non-forest, and white is outside of the study area.
17.3  Results

Figure 17-7 shows the components of agreement and disagreement between REF and COM I  at all
resolutions.  Figure 17-8 shows analogous results for the comparison between REF and COM2. The
overall proportion correct is the top of the component of agreement at the grid cell level and the overall
proportion correct at the coarser resolutions is the top of the component of agreement due to quantity.
Proportion correct tends to rise as resolution becomes coarser, however the rise is not monotonic.
Proportion correct rises for each resolution that is nested within finer resolutions.  That is, the proportion
correct for resolution one < proportion correct for resolution two < proportion correct for resolution six <
proportion correct for resolution 12.  In addition, the proportion correct for resolution one < proportion
correct for resolution three < proportion correct for resolution six < proportion correct for resolution 12.
                                                                                   Page 289 of 339


-------
  However, the proportion correct for resolution two > proportion correct for resolution three. Note tha
  resolution two is not nested within resolution three.
                                                                     • Disagreement due to quantity
                                                                     D Disagreement at stratum level
                                                                     D Disagreement at substratum level
                                                                     D Disagreement at grid cell level
                                                                     • Agreement at grid cell level
                                                                     • Agreement at substratum level
                                                                     O Agreement at stratum level
                                                                     • Agreement due to quantity
                                                                     a Agreement due to chance
              1236
     Figure 17-7.  Stacked bars showing agreement between COM1 and REF. The vertical axis shows the
                 cumulative percent of total study area. The numbers on the horizontal axis give the
                 resolutions.
                                                                     • Disagreement due to quantity
                                                                     n Disagreement at stratum level
                                                                     D Disagreement at substratum level
                                                                     D Disagreement at grid cell level
                                                                     • Agreement at grid cell level
                                                                     • Agreement at substratum level
                                                                     n Agreement at stratum level
                                                                     • Agreement due to quantity
                                                                     • Agreement due to chance
              12           3           6          12
          (17-8. Stacked bars showing agreement between COM2 and REF. The vertical axis shows the
                cumulative percent of total study area. The numbers on the horizontal axis give the
                resolutions.
 The largest component is agreement due to chance, which is 50% at the finest resolution since there are
 two categories. Agreement due to chance rises as resolution becomes coarser.  Besides the component
 due to chance, the largest components at the finest resolution are agreement at the grid cell level and
 disagreement at the grid cell level. As resolution becomes coarser, the grid cell level information
 becomes less important, relative to information of quantity. At the coarsest resolution, where the entire
 study area is in one cell, the concept of location has no meaning; hence the only components are
 agreement due to chance, agreement due to quantity and disagreement due to quantity. COM1 has a
 component of disagreement due to quantity, which does not change as resolution changes, since quantity
 is a concept independent of resolution.  COM2 has no disagreement in quantity.
Page 290 of 339

-------
pigure 17-9 shows that at a fine resolution, the agreement between COM2 and REF is greater than the
agreement between COM 1  and REF. The components that account for the greater agreement are the
agreement at the stratum level and at the grid cell level.
                   100
                                                   • Disagreement due to quantity

                                                   a Disagreement at stratum level

                                                   D Disagreement at substratum level

                                                   n Disagreement at grid cell level

                                                   • Agreement at grid cell level

                                                   • Agreement at substratum level

                                                   a Agreement at stratum level

                                                   • Agreement due to quantity

                                                   a Agreement due to chance
                           COM1
COM2
                   Figure 17-9. Stacked bars showing comparison of COM1 and
                               COM2 with the REF map, at the finest resolution. The
                               vertical axis shows the cumulative percent of total
                               study area.
fables 17-3 and 17-4 display contingency tables that show the nested stratification structure of strata and
substrata.  These tables are another helpful way to present results. The information on the diagonal
indicates the number of cells for each substratum that are in agreement. Therefore the number of correct
cells may be calculated for each  substratum by summing the diagonal for each subset of the table.
furthermore, the row and column totals indicate stratum level agreement. For example, Table 17-3 shows
disagreement at the stratum level, since there are 31 forested cells in COM1 versus 35 in REF for the
Oorth stratum, and there are 16 forested cells in COM1 versus 10 in REF for the south stratum. This
disagreement at the stratum level is reflected in the component of disagreement at stratum level in  Figure
 \7-7. In contrast, Table 17-4 shows perfect agreement at the stratum level; hence Figure 17-8 shows no
component of disagreement between COM2 and REF at the stratum level.
                                                                                    Page 291 of 339

-------
      Table 17-3.  Confusion matrix for COM1 versus REF by strata and substrata. F denotes
                 forest cells and N denotes non-forest cells.

o
0
1
I
CO
NW
NE
SW
SE
F
N
F
N
F
N
F
N
Total:





REF
Substratum
NW
F
13
3






16
N
1
8






9
35
F
NE
F


12
7




19
N


5
1




6
15
N
North
SW
F




5
1


6
N




5
14


19
10
F
SE
F






1
3
4
N






5
16
21
40
N
South
Stratum
Total
14
11
17
8
10
15
6
19
100

31
19
16
34
F
N
F
N
£
0
•*-!
O
(0

I


Stratum

     Table 17-4.  Confusion matrix for COM2 versus REF by strata and substrata. F denotes
                 forest cells and N denotes non-forest cells.

«M
O
0

Substratum
NW
NE
SW
SE
F
N
F
N
F
N
F
N
Total:





REF
Substratum
NW
F
12
4






16
N
7
2






9
35
F
NE
F


12
7




19
N


4
2




6
15
N
North
SW
F




6
0


6
N




0
19


19
10
F
SE
F






4
0
4
N






0
21
21
40
N
South
Stratum
Total
19
6
16
9
6
19
4
21
100

35
15
10
40
F
N
F
N
f
0
o
W

I


E

Page 292 of 339

-------
117.4

17.4.1   Common Applications

The three maps in Figure 17-2 represent a common situation in map comparison analysis.  There are
many applications where a scientist wants to know which of two maps is more similar to a reference map.
Three likely applications are in remote sensing, simulation modeling, and land change analysis.

In remote sensing, when a scientist develops a new classification rule, the scientist needs to compare the
map generated by the new rule to the map generated by a standard rule.  Two fundamental questions are
(1) did the new method perform better than the standard method concerning its estimate of the quantity of
each category, and (2) did the new method perform better than the standard method concerning its
specification of the location of each category? The format of Figure 17-9 is an effective way to display
the results because it conveys the answer to both of these questions quickly. Specifically, COM1 makes
some error of quantity while COM2 does not. COM2 shows a better specification of location than does
COM1. When the analysis is stratified, the scientist can see whether errors of location exist at the stratum
level, substratum level or grid cell level.  For example, Figure 17-9 shows that COM1 has errors at the
stratum level and not the substratum level, while COM2 has errors at the substratum level and not the
stratum level.  In another application from remote sensing, we could examine the influence of a hardening
rule since the techniques work for both hard and soft classification.  For example, COM1 could show the
soft category membership before a hardening rule is applied, and COM2 could show the hard category
membership after the  hardening rule is applied. The format of Figure 17-9 would then summarize the
influence of the hardening rule.

In simulation  modeling, a scientist commonly builds a model to predict how land changes over time
(Veldkamp and Lambin, 2001).  The scientist performs validation to see how the model performs and to
obtain  ideas for how to improve the model.  When the model is run from T, to T2, the scientist validates
the model by comparing the simulated landscape of T2 with a reference map T2. A null model would
predict no change between T, and T2.  In other words, if the scientist had no simulation model, then the
best guess at the T2 map would be the T, map.  Therefore, to see whether the simulation model is
performing better than a null model, the scientist needs to compare (1) the agreement between the T2
simulation map and the T2 reference map, versus (2) the agreement between the T, map and the T2
reference map. In this situation,  the format of Figure 17-9 is perfectly suited to address this question
because the analogy is that COM1 is the T, map, COM2 is the T2 simulation map and REF is the T2
reference map.

The methods described here are particularly helpful in this case since land-cover and land-use (LCLU)
change models are typically stratified according to political units because data are typically available by
political unit and because the process of land change often happens by political unit. For example, land-
use activities in Brazil are planned at the regional and household scales, where the household stratification
is nested within the regional stratification. Researchers are dedicating substantial effort to collect data at
a relevant scale in order to calibrate and to improve change models. Therefore, it is essential that
statistical methods budget the components of agreement and disagreement at relevant scales because
researchers want to collect new data at the scale at which the most uncertainty exists.

In land change analysis, the scientist wants to know the manner in which land categories change and
persist over time. For this application, the methods of this chapter would use COM 1 as the T, map and
REF as the T2 map. Figure 17-7 would supply a multiple resolution analysis of LCLU change, where
                                                                                  Page 293 of 339

-------
 agreement means persistence and disagreement means change. A disagreement in quantity indicates that
 a category has experienced either a net gain or a net loss.  Disagreement at the stratum level means that a
 loss of a category in one stratum is accompanied by gain in that category in another stratum.
 Disagreement at the grid cell level means that a loss of a category at one location is accompanied by gain
 of that category at another location within the same stratum.  Therefore, Figure 17-7 would show at what
 scales LCLU change occurs.


 17.4.2   Quantity Information

 We focus primarily on the center column of mathematical expressions of Figure 17-3 because those
 expressions give the components of agreement. However, the other two columns can be particularly
 helpful depending on  the purpose of map comparison. In the case of remote sensing, guidance is needed
 to improve the classification rules. For simulation modeling, guidance is required to improve simulation
 model's rules. It would be helpful to know the expected improvement if the rule's specification of
 quantity changes, given a specific  level of information of location.  The mathematical expressions in the
 right most column of Figure 17-3 show the expected results, when the rule specifies the quantity of each
 category perfectly with respect to the reference quantities. At the other end of the spectrum, the
 mathematical equations of the left  most column show the expected results, when the rule uses random
 chance to specify the quantities of each category.

 For example, M(n)  expresses the agreement that a scientist would expect between the reference map and
 the other map, when the other map is the adjusted comparison map that is scaled to show the quantity in
 each category as l/J. M(p) expresses the agreement that a scientist would expect between the reference
 map and the other map, when the other map is the adjusted comparison map that is scaled to show the
 quantity in each category as matching perfectly the reference map.

 The definitions of M(n) and M(p)  in Table 17-2 are slightly different than the definitions of M(n) and
 M(p) in  Pontius (2000, 2002). Table  17-2 gives expressions for  M(p) and M(p) that depend on the
 scaling given by Equations 12 -  15. The method of scaling simulates the change in quantity spread
 evenly across the grid  cells as one moves from M(m) to M(p) or  from M(m) to M(n).  In contrast, Pontius
 (2000, 2002) does not scale the comparison map, and does not represent an even spread of the change in
 quantity across the cells. The methods of Pontius (2000,  2002) define M(n) and M(p) in a manner that
 makes sense for applications of land-cover change simulation modeling and slightly confounds
 information of quantity with information of location.  Table 17-2 defines M(n) and M(p) in a manner that
 is appropriate for a wider variety of applications since it maintains complete separation of information of
 quantity  from information of location.


 17.4.3    Stratification and Multiple Resolutions

 If we think of grid cells as tiny strata, then the maps of Figure 17-2 show a three tiered nested
 stratification structure. The cells are 100 tiny strata that are nested within the four substrata that are
 nested within the two broadest strata. The multiple resolution procedure grows the cells such that at the
 resolution of six, the four coarse grid cells constitute the four quadrants of the substrata. Another
 similarity between strata and cells is that they both can indicate information of location; hence they both
 appear on the vertical axis of (Figure 17-3).
Page 294 of 339

-------
However, there are three major conceptual differences between grid cells and strata. First, the concept of
location within a grid cell does not exist because category membership within a grid cell is completely
homogenous. By definition, we cannot say that a particular category is concentrated at a particular
location within a cell.  In contrast, the concept of location within a stratum does exist because we can say
that a particular category is concentrated at a particular location within a stratum, since strata usually
contain numerous cells. Second, the multiple resolution procedure increases the lengths of the sides of
the grid cells, thus reduces the number of coarse grid cells within each stratum, but the multiple resolution
analysis does not change the number of strata. Third, each cell is a square patch, whereas a stratum can
be non-square and non-contiguous. As a consequence of these differences, analysis of multiple
resolutions of cells  shows how the landscape is organized in geographic space, whereas analysis of
multiple strata shows how the landscape is organized with respect to the strata definitions.
17.5   Conclusions

The profession of accuracy assessment is advancing past the point where assessment consists of only a
calculation of percent correct or Kappa index of agreement (Foody 2002).  Now, measures of agreement
are needed that indicate how to create more accurate maps. Here we presented novel methods of accuracy
assessment to budget the components of agreement and disagreement between any two maps that show a
categorical variable. The techniques incorporate stratification, examine multiple resolutions, apply to
both hard & soft classifications, and compare maps in terms of quantity & location. Perhaps most
importantly, this paper shows how to present the results of a complex analysis in a simple graphical form.
We hope that this technique of accuracy assessment will soon become as common as today's use of
percent correct.
17.6   Summary

This chapter presented novel methods of accuracy assessment to budget the components of agreement and
disagreement between a reference map and a comparison map, where each map shows a categorical
variable. The measurements of agreement can take into consideration soft classification and can analyze
multiple resolutions.  Ultimately, the techniques express the agreement between any two maps in terms of
various components that sum to one. The components may be agreement due to chance, agreement due to
quantity, agreement due to location at one of the stratified levels, agreement due to location at the grid
cell level, disagreement due to location at the grid cell level, disagreement due to location at one of the
stratified levels, and/or disagreement due to quantity. These techniques can be used to compute
components of agreement at all resolutions and to present the results of a complex analysis in  a simple
graphical form.
 17.7  Acknowledgments

 We thank Clark University's Master of Arts program in Geographic Information Science for
 Development and Environment, and the people of Clark Labs, especially Hao Chen, who has
 programmed these methods into the GIS software Idrisi32** (*registered trademark of Clark Labs,
 Worcester, Massachusetts). We also thank the Center for Integrated Study of the Human Dimensions of
                                                                                  Page 295 of 339

-------
Global Change at Carnegie Mellon University with which this work is increasingly tied intellectually and
programmatically through the George Perkins Marsh Institute of Clark University.
17.8  References

Congalton, R., A review of assessing the accuracy of classification of remotely sensed data, Remote Sens.
    Environ., 37, 35-46,  1991.

Congalton, R. and K. Green, Assessing the accuracy of classification of remotely sensed data: principals
    and practices, Lewis Publishers, New York, NY, 1999.

Foody, G., On the Comparison of chance agreement in image classification accuracy assessment,
    Photogrammetric Engineering and Remote Sensing, 58(10), 1459-1460, 1992.

Foody, G., Status of land cover classification accuracy assessment, Remote Sens. Environ., 80, 185-201,
    2002.

Lewis, H. and M. Brown, A generalized confusion matrix for assessing area estimates from remotely
    sensed data, Int. J. Remote Sensing, 22(16), 3223-3235, 2001.

Pontius, R., Quantification error versus location error in comparison of categorical maps,
    Photogrammetric Engineering and Remote Sensing, 66(8), 1011-1016,  2000.

Pontius, R., Statistical methods to partition effects of quantity and location during comparison of
    categorical maps at multiple resolutions, Photogrammetric Engineering and Remote Sensing, 68(10),
    1041-1049,2002.

Veldkamp, A. and E. Lambin, Predicting land-use change, Agriculture, Ecosystems and Environment,
    85(1), 1-6,2001.
 Page 296 of 339

-------
                             Chapter 18


Accuracy Assessments of Airborne Hyperspectral Techniques
            for Mapping Opportunistic Plant Species
                 in Freshwater Coastal Wetlands

                                  by

                            Ricardo D. Lopez1"
                            Curtis M. Edmonds1
                             Anne C. Neale1
                            Terrence Slonecker2
                             K. Bruce Jones1
                            Daniel T. Heggem1
                              John G. Lyon1
                             Eugene Jaworski3
                             Donald Garofalo2
                              David Williams2
 U.S. Environmental Protection Agency
 National Exposure Research Laboratory
 944 East Harmon Avenue
 Las Vegas, NV 89119

 'Corresponding Author Contact:

  Telephone: (702) 798-2394
    Facsimile: (702) 798-2692
      E-mail: lopez.ricardo@epa.gov
U.S. Environmental Protection Agency
National Exposure Research Laboratory
12201 Sunrise Valley Drive
Reston, VA20192
                      3 Eastern Michigan University
                       Department of Geography and Geology
                       Ypsilanti,Ml48197
                                                               Page 297 of 339

-------
18.1  Introduction

The aquatic plant communities within the coastal wetlands of the Laurentian Great Lakes (LGL) are
among the most biologically diverse and productive ecosystems of the world (Mitsch and Gosselink,
1993). Coastal wetland ecosystems are also among the most fragmented and disturbed, as a result of
impacts from land-use mediated conversions (Dahl, 1990; Dahl and Johnson, 1991). Many LGL coastal
wetlands have undergone a steady decline in biological diversity during the 1900s, most notably within
wetland plant communities (Hcrdcndorfetal., 1986; Herdendorf,  I987; and Stuckcy, 1989). Losses in
biological diversity can often coincide with an increase in the presence and dominance of invasive (non-
native and aggressive native) plant species (Bazzaz, 1986; Noble,  1989).  Research also suggests that the
establishment and expansion of such opportunistic plant species may be the result of general ecosystem
stress (Elton, !958;Odum, 1985).

Reduced biological diversity in LGL coastal  wetland communities is frequently associated with
disturbances such as land-cover (LC) conversion within or along wetland boundaries (Miller and Egler,
1950; Nieringand Warren, 1980). Disturbance  stressors may include fragmentation from road
construction, urban development, agriculture, or alterations in wetland hydrology (Jones et al., 2000;
Jones et al., 2001; Lopez et al., 2002).  Although, specific ecological  relationships between  landscape
disturbance and plant community composition are not well understood. Remote sensing technologies
offer unique capabilities to measure the presence, extent, and composition of plant communities over
large geographic regions. However,  the accuracy of remote sensor derived products can be difficult to
assess, owing to  both species complexity and the inaccessibility of many wetland areas.  Thus, coastal
wetland field data, contemporaneous with remote sensor data collections, is essential to improve our
ability to map and assess the accuracy of remote sensor derived wetland classifications.

The purpose of this study was to assess the utility and accuracy of using airborne hyperspectral imagery
to improve the capability of determining the  location and composition of opportunistic wetland plant
communities.  Here we specifically focused on the results of detecting and mapping dense patches of the
common reed (Phragmites australis).
18.2  Background

Phragmites typically spreads as mono-specific "stands" that predominates throughout a wetland,
supplanting other plant taxa as the stand expands in area and density (Marks et al., 1994).  It's a
facultative-wetland plant, which implies that it usually occurs in wetlands, but occasionally can be found
in non-wetland environments (Reed, 1988).  Thus, Phragmilcs can grow in a variety of wetland soil
types, in a variety of hydrologic conditions (i.e., in both moist and dry substrate conditions).  Compared
to most heterogeneous plant communities, stands tend to provide low quality habitat or forage for some
animals and thus reduce the overall  biological diversity of wetlands. The establishment and expansion of
Phragmites is difficult to control because the species is persistent, produces a large amount of biomass,
propagates easily, and is very difficult to eliminate with mechanical, chemical, or biological control
techniques.
Page 298 of 339

-------
The differences in spectral characteristics between the common reed and cattail (Typha sp.) are thought to
result from differences between their biological and structural characteristics. Phragmites has a fibrous
main stem, branching leaves, and a large seed head that varies in color from a reddish-brown to brownish-
black; while Typha are primarily comprised of photosynthetic "shoots" that emerge from the base of the
plant (at the soil surface) with a relatively small, dense, cylindrical seed head (see Figure 18-1).
Distinguishing between the two in mixed stands can be difficult using automated remote sensing
techniques. This confusion can reduce the accuracy of vegetation maps produced using standard
broadband remote sensor data.
       Phragmlto* mstrtal*
       Common reed
                                                             \~ffi      I
                                                                        i
 Figure 18-1.  lllustration(s) of Phragmites australis and Typha.  With permission from the Institute of Food
             and Agricultural Sciences, Center for Aquatic Plants, University of Florida, Gainesville, FL.
                                                                                       Page 299 of 339

-------
 This chapter explores the
 implications of the biological and
 structural differences, in
 combination with differing soil and
 understory conditions, on observed
 spectral differences within
 Phragmites stands and between
 Phragmites and Typha using
 hyperspectral data.  We applied
 detailed ground-based wetland
 sampling to developed spectral
 signatures for the calibration of
 airborne hyperspectral data, and to
 assess the accuracy of semi-
 automated remote sensor mapping
 procedures.  Particular emphasis
 was placed on linkages between
 field-based data sampling and
 remote sensing analyses to support
 semi-automated mapping. Field
 data provided a linkage to
 extrapolate between airborne sensor
 data and the physical structure of
 Phragmites stands, soil type, soil
 moisture content, and the presence
 and extent of associated plant taxa.
 This chapter presents the wetland
 mapping techniques and results
 from one of the thirteen coastal
 wetland sites currently undergoing
 long-term assessment by the EPA at
 the Pointe Mouillee wetland
 complex (see Figure  18-2).
 18.3   Methods
                            Field-sampled Site Location Legend
                               Pa = Phragmites australla
                               Ts = Typha sp.
                               Nt * non-target plant species
                               Gc = ground control point
Figure 18-2.  Thirteen wetland study sites in Ohio and Michigan
            coastal zone, lettered A-M. Sites were initially
            sampled during July-August, 2001.  Inset image is
            magnified view of Pointe Mouillee wetland complex
            (Site E). White arrows indicate general location of
            both field sampling sites for Phragmites australis
            (i.e., the northernmost stand and the southernmost
            stand). Inset image is a grayscale reproduction of
            false color infrared IKONOS data acquired in August
            2001.
Thirteen coastal wetland sites were selected from a group of 65 potential coastal locations to support the
EPA's wetland assessment efforts in western Lake Erie, Lake St. Clair, Lake Huron, and Lake Michigan
(Lopez and Edmonds, 2001).  These sites were selected after visual inspection of aerial photographs,
topographic and National Wetland Inventory (NWI) maps, National Land Cover Data (NLCD) data,'input
from local wetland experts, and review of published accounts at each wetland (Lyon, 1979; Herdendorf et
al., 1986; Herdendorf, 1987; Stuckey, 1989; Lyon and Greene, 1992).  The study objectives required that
each site (1) generally spanned the gradient of current LGL landscape conditions, (2) consisted of
emergent wetlands, and (3) included both open lake and protected wetland systems.  LC adjacent to the
thirteen selected study sites included active agriculture, old-field agriculture, urban areas, and forest in
varying amounts (Vogelmann et al., 2001).
Page 300 of 339

-------
18.3.1   Remote Sensor Data Acquisition and Processing

Airborne imagery data were collected over the Pointe Mouillee study area using both the PROBE-1
hyperspectral data and the Airborne Data Acquisition and Registration system 5500 (ADAR). The
ADAR sensor enabled remote sensing of materials at the site of < 5.0  m, which is the nominal spatial
resolution of the PROBE-l sensor. The ADAR system is a four camera, multi-spectral airborne sensor
that acquired digital images in three visible and a single near infrared  band. ADAR data acquisition
occurred on August 14, 2001  at an altitude of 1,900 m above ground level (AGL) providing an average
pixel resolution of 75 x 75 cm.  Using EN VI software, a single ADAR scene in the vicinity of the initial
PhrugmitL's sampling location was georeferenced corresponding to a root mean square (RMS) error of
< 0.06 using digital orthorectified quarter quadrangles (DOQQs) and ground control points from field
surveys.

The PROBE-l scanner system has a rotating axe-head scan mirror that sequentially generated cross-track
scan-lines on both sides of nadir to form a raster image cube.  Incident radiation was dispersed onto four
32-channel detector arrays. The PROBE-l data were calibrated to reflectance by means of a National
Institute of Standards (NIS) laboratory radiometric calibration procedure, providing 128 channels of
reflectance data from the visible through the short wave infrared wavelengths (440 - 2,490 nm). The
instrument carried an on-board lamp for recording in-flight radiometric stability along with shutter-closed
(dark current) measurements on alternate scan lines.  Geometric integrity of recorded images was
improved by mounting the PROBE-l on a three axis, gyro-stabilized mount, thus minimizing the effects
in the imagery of changes in aircraft pitch, roll, and yaw resulting from flight instability, turbulence, and
aircraft vibration. Aircraft position was assigned using a non-differential Global Positioning System
(GPS), tagging each scan line with the time, which was cross-referenced with the time interrupts from the
GPS receiver. An inertial measurement unit added the instrument attitude data required for spatial geo-
correction.

During the Pointe Mouillee overflight, the PROBE-l sensor had a 57° instantaneous field of view (IFOV)
for the required mapping of vertical and sub-vertical surfaces within the wetland.  The typical IFOV of
2.5 mrad along track and 2.0 mrad across track results in an optimal ground IFOV of 5.0-10.0 m,
depending on altitude and ground speed. PROBE-l data at Pointe Mouillee were collected on August 29,
2001  at an altitude of 2,170 m AGL, resulting in an average pixel size of 5.0 m x 5.0 m. The data
collection rate was 14 scanlines per second  (i.e., pixel dwell time of 0.14 ms) and the 6.1 km flight line
resulted in total ground coverage of 13 km2. The PROBE-l scene covering Pointe Mouillee was then
georeferenced (RMS error < 0.6 pixel) using the vendor-supplied onboard GPS data, available DOQQs,
and field-based GPS ground control points  provided from August, 2001 field surveys. Georeferencing
was completed using ENV1 image processing software.

The single scene of PROBE-l data covering Pointe Mouillee was initially visually examined to remove
missing or noisy bands.  The resulting 104 bands of PROBE-l data were then subjected to a minimum
noise fraction (MNF) transformation to first determine the inherent dimensionality of the image data,
segregate noise in the data, and reduce the computational  requirements for subsequent processing
(Boardman and Kruse, 1994). MNF transformations were applied as modified from Green et al. (1988).
The first transformation, based on an estimated noise covariance matrix, decorrelated and rescaled the
noise in the data. The second MTF step was a standard principal components transformation of the
"noise-whitened" data. Subsequently, the inherent dimensionality of the data at Pointe Mouillee was
determined by examining the final eigen values and the associated images from the MNF transformations.
The data space was then divided into that associated with large eigen values and coherent eigen images.
                                                                                   Page 301 of 339

-------
 and that with near-unity eigen values and noise-dominated images. By using solely the coherent portions
 the noise was separated from the original PROBE-1 data, thus improving the spectral processing results of
 image classification (RSI, 2001).


 A supervised classification of the PROBE-1 scene was performed using the EN VI Spectral Angle Mapper
 (SAM) algorithm. Because the PROBE-1 flights occurred three weeks after field sampling, there was a
 possibility that trampling from the field crew could have altered the physical structure of the vegetation
 stands.  For this reason, and due to the inherent georeferencing inaccuracies, spectra were collected over a
 3x3 pixel area centered on the single pixel with the greatest percent aerial cover and stem density within
 the vegetation stand (see Figures 18-3 &  18-4). The SAM algorithm was then used to determine the
 similarity between the spectra of homogeneous Phragmites and other pixels in the PROBE-1 scene by
 calculating the spectral angle between them (spectral angle threshold = 0.07 rad). SAM treats the spectra
 as vectors in an ^-dimensional space equal to the number of bands.
Figure 18-3. Field sampling activities were an important part of calibrating the hyperspectral data and
            assessing map accuracy. (A) dense Phragmites canopy and (B) dense Phragmitcs understory
            layer in the northernmost stand. The edges of the stand and the internal transects were
            mapped using a real-time differential global positioning system.
                                                        Figure 18-4

                                                        Magnified view of northernmost field-
                                                        sampled vegetation stands to the east and
                                                        west of Pointe Mouillee Road. Two methods
                                                        were used to quadrat-sample vegetation
                                                        stands: (1) edge and interior was sampled if
                                                        the stand was small enough  to be completely
                                                        traversed (left, Phragmites); or (2) solely the
                                                        interior was sampled if the stand was too
                                                        large to be completely traversed (right,
                                                        Typha). This example shows a Typha stand
                                                        that extended approximately 0.75 km east of
                                                        Pointe Mouillee Road. Thus, the field crew
                                                        penetrated into the stand, but did not
                                                        completely traverse the stand. Black
                                                        squares = nested quadrat sample locations.
                                                        Image is a  grayscale reproduction of a
                                                        natural color spatial subset of an  airborne
                                                        ADAR data acquired August  14, 2001.
Page 302 of 339

-------
 The SAM classification resulted in the detection of 18 image end-members, each with different areas
 mapped as potentially homogeneous regions of dense Phragmites. The accuracy of the  18 end members
 was determined based on reference data derived from the interpretation of 1999 panchromatic aerial
 photography and field observation data collected in 2001.  Additional accuracy checking of mapped areas
 of Phragmites was accomplished using ENVJ Mixture Tuned Matched Filtering (MTMF) algorithms.
 Visual interpretation of the MTMF "infeasibility values" (noise sigma units) versus "matched filtering
 values" (relative match to spectrum), further aided  in the elimination of potential end members.  The
 rnatched filtering values provided a means of estimating the relative degree of match to the Phragmites
 patch reference spectrum and the approximate sub-pixel abundance.  Correctly mapped pixels had a
 matched filter score above the background distribution and a low infeasibility value. Pixels with a high
 matched filter result and high infeasibility were "false positive" pixels that did not match the Phragmites
target.


 18.3.2   Field Reference Data Collection

 To minimize ambiguous site identifications, specific definitions of wetland features were provided to field
 investigators (see Table 18-1).  Vegetation was sampled on August 7-8, 2001 to provide training data for
 the semi-automated vegetation  mapping (see Table 18-2) and subsequent accuracy assessment effort.
 Prior to field deployment, aerial photographs were  used along with on-site assessments to locate six large
 stands of vegetation at the site. They included (I) two stands of Phragmites, (2) two stands of Typha, and
 (3) two non-target vegetation stands for comparison to the target species (see Figure 18-2).  Digital video
 of each vegetation stand was recorded to fully characterize the site for reference during image processing
 and accuracy assessment.  Additional field data used to support accuracy assessment efforts included
 vegetation stand sketches, notes of the general location and shape of vegetation stand, notes of landmarks
 that might be recognizable in the imagery, and miscellaneous site characterization information.
           Table 18-1.  Definition(s) of terms used during field sampling protocol at
                        Pointe Mouillee.
Term
Wetland
Target plant species
Non-target plant species
Vegetation stand
Edge of vegetation stand
Definition(s)
Transitional land between terrestrial and aquatic
ecosystems where the water table is usually at or near
the surface, land that is covered by shallow water, or
an area that supports hydrophytes, hydric soil, or
shallow water at some time during the growing season
(after Cowardinetal., 1979)
Phragmites australis or Typha spp. (per Voss, 1972;
Voss, 1985)
Any herbaceous vegetation other than target plant
species
A relatively homogeneous area of target plant species
with a minimum approximate size of 0.8 ha
Transition point where the percent canopy cover ratio
of target: non-target species is 50:50
                                                                                    Page 303 of 339

-------
    Table 18-2.  Non-spectral data parameters collected {/) along vegetation sampling transects
                 at Pointe Mouillee.
Parameter Description
Number of live target species stems
Number of senescent target species stems
Number of flowering target species stems
Water depth
Litter depth
Mean stem diameter (n=5)
Percent cover live target species in canopy
Percent cover senescent target species in canopy
Percent cover live non-target species in canopy
Percent cover senescent non-target species in canopy
Percent cover live non-target species in understory
Percent cover senescent non-target species in understory
Percent cover senescent target species in understory
(i.e., senescent material that is not litter)
Percent cover exposed moist soil
Percent cover exposed dry soil
Percent cover litter
Percent cover water
General dominant substrate type (i.e., sand, silt, or clay)
Distance to woody shrubs or trees within 15m
Direction to woody shrubs or trees within 15 m
Total canopy cover (area) of woody shrubs or trees within 15m
1.0m2 quadrat
V
/
V"
/
/
/















3.0 m2 quadrat






/
V
/
/
/
/
/
/
/
/
/
/
/
/
/
 I ransects along the edges of target-species stands were recorded using a real-time differential GPS for
 sampled target species (see Figure 18-3).  Each of the two non-target stands of vegetation were delineated
 with a minimum of four GPS points, evenly spaced around the perimeter.  Five GPS ground control
 points (GCPs) were collected at Pointe Mouillee, generally triangulating on the sampled areas of the
 wetland (see Figure 18-2). GPS location points were recorded along with multiple digital photographs, as
 necessary to provide multiple angle views of each sample location.  The edge polygons, GPS points,
 GCPs, field notes, and field-based images (camera) were used to provide  details about ground  data for
 imagery georeferencing, classification, and accuracy assessments.

 A quadrat sampling method was used within each target-species stand to sample herbaceous plants,
 shrubs, trees, and other characteristics of the stand (Mueller-Dombois and Ellenherg,  1974; Barbour,
 1987). Depending on stand size, twelve to twenty (nested) 1.0 m2 and 3.0 m2 quadrats were evenly
 spaced along intersecting transects (see Figure 18-4). The approximate percent cover and taxonomic
 identity of trees and shrubs within a 15.0 m radius were also recorded at each quadrat. Where
Page 304 of 339

-------
appropriate, the terminal quadrat was placed outside of the target-species stand perimeter to characterize
the immediately adjacent area. This placement convention improved the accurate determination of
vegetation patch edge locations.  The location of SAM classification output was accomplished partly by
identifying a uniform corner of each quadrat with the real-time differential GPS to provide a nominal
spatial accuracy of 1.0 m.  Field data were collected to characterize both canopy and understory in
targeted wetland plant communities (see Table 18-2).


Reflectance spectra were measured in the field for each of the target species at four selected wetland sites
(Site A, Site B, Site F, and Site J; see Figure 18-2) on August 14-17, 2001 using a field
spectroradiometer (see Figure 18-5).  Field spectra collected from 1.0 m above the top of the Phragmites
canopy were compared to PROBE-1 to confirm target species spectra at Pointe Mouillee and were
archived in a wetland plant spectral library.
                                                                    Figure 18-5
                                                                    Field spectroradiometry
                                                                    sampling conducted
                                                                    August 14-17, 2001 at
                                                                    four of thirteen wetland
                                                                    sites for comparison to
                                                                    the PROBE-1
                                                                    reflectance spectra.
                                                                    The procedure involved
                                                                    recording (A) reference
                                                                    spectra and
                                                                    (B) vegetation
                                                                    reflectance spectra
                                                                    during mid-day solar
                                                                    illumination.
                                                                    Vegetation spectra
                                                                    were recorded from
                                                                    1.0 m above the
                                                                    vegetation canopy.
                                                                                      Page 305 of 339

-------
 18.3.3   Accuracy Assessment of Vegetation Maps

 A three-tiered approach was used to assess the accuracy of PROBE-1  vegetation maps. This approach
 included unit area comparisons with (I) photointerpreted stereo panchromatic (1999) aerial photography
 (I :l 5,840-scale), (2) GPS vector overlays and field transect data from 2001 (Congalton and Mead, 1983),
 and (3) field measurement data (2002).

 Pointe Mouillee 2002 sampling locations were based on a stratified random sampling grid and was
 provided to a field sampling team as a list of latitude and longitude coordinates along with a site
 orientation image, which included a digital grayscale image of the site with the listed coordinate points
 displayed as an ArcView point coverage.  Stratification of samples was based on Universal Transverse
 Mercator (1,000 m) grid cells (n=17), from which the total number of potential sampling  points were
 selected (n=86). The supplied points represented the center point of mapped areas of dense Phragmites
 (>25 stems/m2 and >75% cover). Accordingly, the 86 sampling points selected to support the validation
 and accuracy assessment effort contained no "false positive'' control locations. At each field validation
 sampling location, both 1.0  m2 and 3.0 m2 quadrats were used. Five differentially corrected GPS ground
 control points were collected to verify the spatial accuracy of field validation locations.
 18.4  Results


 18.4.1   Field Reference Data Measurements

 The northernmost Phragmites stand sampled at Pointe Mouillee was bounded on the eastern edge by an
 unpaved road with two small patches of dogwood and willow in the north, and a single small patch of
 willow in the south (see Figure I 8-4). A mixture of purple loostrife (Lythrum salicaria) and Typha
 bounded the eastern edge of the stand.  Soil in the Phragmites stand was dry and varied across the stand
 from clayey-sand, to sandy-clay, to a mixture of gravel and sandy-clay near the road. Litter cover was a
 constant 100% across the sampled stand, with non-target plants in the understory including smartweed
 (Polygonum spp.), jewel weed (Impatiem spp.), mint (Mentha spp.), Canada thistle (Cirsium arvense),
 and an unidentifiable grass. Cattail was the sole additional plant species in the Phragmites canopy.

 The southernmost Pointe Mouillee Phragmites stand was completely bounded by manicured grass or
 herbaceous vegetation, with dry and clayey soil throughout. Litter cover was 100% and non-target plants
 in the understory included smartweed, mint, purple loosestrife, and an unidentifiable grass. Non-target
 plants were not observed in the canopy. Comparisons of the two field-sampled stands indicated that
 quadrat-10 region of the northernmost stand was the most homogeneous of all sampled quadrats.
 Accordingly,  field transect data were used to determine which pixel(s) in the PROBE-1 data had the
 greatest percent cover of non-flowering Phragmites and the greatest stem density (see Figure 18-6).
Page 306 of 339

-------
                                               10  11    12
                                               10  11   12
                                               10  11   12
   i •,
    -
   u
    i
   1.5
    i
    I
    I
                                                10  11   12
                               __^
     Q—o—o—o—o       Q—o—a
-d—a—a—n—a—a   d—a—a—D—a—a
                                                                Canopy Characteristics
                                                                O Percent cover lolal largel species
                                                                tV Percenl cover total non-target species

                                                                •Q Percent cover live target species
                                                                  • Percenl cover live non-target species

                                                                -B Percent cover dead target species
                                                                 7- Percenl cover dead non-target species
                                                                Target Species Stem Characteristics
                                                                O Mean height (m)
                                                                -0 Mean stem diameter (mm)

                                                                -V- Total number ol storm
                                                                •Q- Numberoflive stems
                                                                -A- Number ol'dcad steins
                                                                -O- Number of flow cring stems
                                                                Understory Characteristics
                                                                O Percent cover total nun-target species
                                                                O- Percent cover live non-target species
                                                                •O- Percent cover dead non-target species

                                                                -A- Percent cover dead target species
                                                                Water, Litter, and Soil Characteristics
                                                                 -O Percent cover open water
                                                                 -£r Percent cover litter
                                                                 O Percenl cover exposed moist soil
                                                                 O Percent cover exposed dry soil
Q Mean depth of open water (cm)
O Mean dcplh of litter (cm)
                                                10  11   12
Figure 18-6. The heterogeneity of canopy, stem, understory, water, litter, and soil characteristics in the
             northernmost Phragmites stand was used to calibrate the PROBE-1 data for the purpose of
             detecting relatively homogeneous areas of Phragmites throughout the Pointe Mouillee wetland
             complex. The most homogeneous area of Phragmites in the northernmost stand was in the
             vicinity of quadrat-10. These pixels were  used in the Spectral Angle Mapper (supervised)
             classification of PROBE-1 reflectance data.
                                                                                               Page 307 of 339

-------
  18.4.2  Distinguishing Between Phragmites and Typha

  Phragmites and Typha are often interspersed within the same wetland, making it difficult to distinguish
  between the two species.  Because plant assemblage uniformity was measured in the field (see Figure
  18-6) we could compare the PROBE-1 reflectance spectra of Phragmites within a single stand of
  Phragmites (see Plate 18-1) and with Typha (see Figure 18-7).  There was substantial spectral variability
  among pixels within the northernmost stand of Phragmites (see Plate 18-1). The greatest variability for
  Phragmites corresponded to the spectral range associated with plant pigments (470 nm - 850 nm) and
  structure (740 nm - 840 nm).  Comparison of reflectance characteristics in the most homogeneous and
  dense regions of Phragmites (quadrat-10) and Typha (quadrat-8) (see Figure 18-4) indicated that
  Phragmites was reflecting substantially less energy than Typha in the near infrared (NIR) wavelengths
  and reflecting substantially more energy than Typha in the visible wavelengths (see Figure 18-7).
                  4000
                  £00
                  3000
                  2500
                  2000
                 1500
                 1000
                  500
                                                                       Approximate Quadrat
                                                                        Pixel Associations
                                                                           Quadrat 12
                    440
                          490
                               540
590    640   690

  Wavelength (nm)
                                                            790   840
           Plate 18-1.   Comparison of Phragmites australis among 10 field-sampled quadrats,
                       using spectral reflectance of PROBE-1 data (480 nm - 840 nm).  Pixel
                       locations were in the approximate location of quadrats in the
                       northernmost Phragmites stand at Pointe Mouillee.
Page 308 of 339

-------
    3000
    2500
     2000
     1500
     1000
      500
                                                                           -O- Phragmites australis
                                                                           -S*- Typha sp
        450
500
                        550
600      650      700
   Wavelength (nm)
                                        750
850
Figure 18-7.  Comparison of Phragmites australis and Typha sp. spectral reflectance in separate relatively
            homogeneous stands (5.0 m x 5.0 m). Pixel locations were in the northernmost Phragmites
            (quadrat-10) and Typha (quadrat-8) field sites.
 18.4.3  Semi-Automated Phragmites Mapping

 Based on the analyses of field measurement data, digital still photographs, digital video images, field
 sketches, and field notes, we selected nine relatively pure pixels of Phragmites centered on quadrat-10 in
 the northernmost stand (see Figure 18-4).  A supervised Spectral Angle Mapper (SAM) classification of
 the PROBE-1  imagery, using precision-located field characteristics, resulted in a vegetation map
 indicating the likely locations of homogeneous Phragmites stands (see Plate 18-2). Several of the
 mapped areas were within the drier areas of the Pointe Mouillee wetland complex, which was typical of
 Phragmites observed in other diked Lake Erie coastal wetlands.
                                                                                    Page 309 of 339

-------
            Plate 18-2.  Results of a Spectral Angle Mapper (supervised) classification,
                       indicating likely areas of relatively homogeneous stands of
                       Phragmites australis (solid blue) and field-based ecological data.
                       Black arrows show field-sampled patches of Phragmites. Areas of
                       mapped Phragmites are overlaid on a natural color PROBE-1 image of
                       Pointe Mouillee wetland complex (August 29, 2001). Yellow "P"
                       indicates location of generally known areas of Phragmites, as
                       determined from 1999 aerial photographs.
Page 31 Oof 339

-------
 18.4.4  Accuracy Assessment
    r- 1  accuracy assessments that compared Phragmites maps to photointerpreted reference data
supplemented with and field notes resulted in an estimated accuracy of 80% (n=l I ) for the presence or
absence of Phragmites.  Tier-2 assessments resulted in an approximate ± 1 .0 pixel accuracy relative to the
actual  location of Phragmites on the ground.  Tier-3 field-based accuracy assessments resulted in 91%
Accuracy (n=86).  Eight of the sampling points were located in vegetated areas other than Phragmites
/'i.e., either Typha or other mixed wetland species), resulting in an omission error rate of 9%. Because the
allalyses presented here solely pertain to locations of relatively dense Phragmifes (>25 stems/nr and
^75% cover), errors of commission were not calculated.

-------
archived aerial photography to assess and understand site history; and (6) collaborating with local wetland
experts to better understand the ecological processes at the site and the historical context of changes.
18.6   Conclusions

The use of hyperspectral data at Pointe Mouillee demonstrated the spectral differences between
Phragmites and Typha. Spectral differences between taxa are likely attributable to differences in
chlorophyll content, plant physical structure, and water relations of the two taxa.  The combined use of
detailed ecological field data, field spectrometry data, and multi-scalar accuracy assessment approaches
were instrumental to our ability to validate mapping results for Phragmilcs and provide important
information to assess the future coastal mapping efforts in the L(JL. Additional classification and
accuracy assessment procedures are ongoing at twelve other wetland study sites to determine the broader
applicability of these techniques and results (Lopez and Edmonds, 2001; Figure 18-2). Other important
ongoing research related to advanced hyperspectral wetland remote sensing include:  (1) improving
techniques for separating noise from signal in hyperspectral data; (2) determining the relevant
relationships between imagery data and field data for other plant species and assemblages; (3) calibrating
sensor data with field spectral data; (4) cross-platform data merging to improve detection of plant taxa;
and (5) additional assessment techniques using field reference data.

The results of this study describe the initial steps required to investigate the correlations between local
landscape disturbance and the presence of opportunistic plant species  in coastal wetlands. These results
support general goals to develop techniques for mapping vegetation in ecosystem types other than
wetlands, such as upland herbaceous plant communities. The results of this and other similar research
may help to better quantify the cost-effectiveness of semi-automated vegetation mapping  and accuracy
assessments so that local, state, federal, and tribal agencies in the IXJL can decide if such techniques are
useful for their monitoring programs.
 18.7  Summary

 The accuracy of airborne hyperspectral PROBE-1  data was assessed for detecting dense patches of
 Phragmites austmlis in LGL coastal wetlands.  This chapter presents initial research results from a
 wetland complex located at Pointe Mouillee, Michigan. This site is one of thirteen coastal wetland field
 sites currently undergoing long-term assessment by the EPA. Assessment results from wetland field
 sampling indicated that semi-automated mapping of dense stands of Phrcigniilcs were 91% accurate using
 a supervised classification approach.  Results at Pointe Mouillee are discussed in the larger context of the
 long-term goal of determining the ecological relationships between landscape disturbance in the vicinity
 of wetlands and the presence of Phragmites.
 18.8  Acknowledgments

 We thank Ross Lunetta and an anonymous reviewer for their comments regarding this manuscript. We
 thank Joe D'Lugosz, Arthur Lubin, John Schneider, and EPA's Great Lakes National Program Office for
 their support of this project.  We thank Marco Capodivacca, Karl Leavitt, Joe Robison, Matt I lamilton.
Page 312 of 339

-------
and Susan Braun for their help with the field sampling work.  The EPA's Office of Research and
Development (ORD) and Region 5 Office jointly funded this project. This publication has been subjected
to the EPA's programmatic review and has been approved for publication. Mention of any trade names
or commercial products does not constitute endorsement or recommendation for use.
18.9  References

Barbour, M.G., J.H. Burk, and W.D. Pitts. Terrestrial Plant Ecology, Benjamin/Cummings. Menlo Park,
    CA, 1987, p. 634.

Bazzaz, F.A.  Life history of colonizing plants. In:  Ecology of Biological Invasions of North America
    and Hawaii (H.A. Mooney and J.A. Drake, Editors), Springer-Verlag, NY, 1986.

Boardman, J.W. and F.A. Kruse. Automated spectral analysis: a geological example using AVIRIS data,
    north Grapevine Mountains, Nevada.  In: Proceedings, ERIM Tenth Thematic Conference on
    Geologic Remote Sensing. Environmental Research Institute of Michigan, Ann Arbor, MI, 1994.

Congalton, R. and R. Mead.  A quantitative method to test for consistency and accuracy in
    photointerpretation.  Photogrammetric Engineering and Remote Sensing, 49, 69-74, 1983.

Cowardin, L.M., V. Carter, F.C. Gollet, and E.T. LaRoe. Classification of Wetlands and Deep-water
    Habitats of the United States. FWS/OBS-79/31, U.S. Fish and Wildlife Service, Washington, DC,
    103 p., 1979.

Dahl, T.E.  Wetlands Losses in the United States, 1780s to 1980s. U.S. Fish and Wildlife Service,
    Washington, DC, 21  p., 1990.

Dahl, T.E. and C.E. Johnson.  Status and Trends of Wetlands in the Conterminous United States, Mid-
    1970s to Mid-1980s. U.S. Fish and Wildlife Service, Washington, DC, 28 p., 1991.

Elton, C.S. The Ecology of Invasions by Animals and Plants.  Methuen, London, UK, 181 p., 1958.

Green, A.A., M. Berman, P. Switzer, and M.D. Craig. A transformation for ordering multispectral data in
    terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and
    Remote Sensing, 26(1), 65-74, 1988.

Herdendorf, C.E.  The Ecology of the Coastal Marshes of Western Lake Erie: A Community Profile.
    Biological Report 85(7.9), U.S. Fish and Wildlife Service, Washington, DC, 171 p., 1987.

Herdendorf, C.E., C.N. Raphael, E. Jaworski, and W.G. Duffy.  The Ecology of Lake St. Clair Wetlands:
    A Community Profile. Biological Report 85(7.7), U.S. Fish and Wildlife Service, Washington, DC,
    187 p., 1986.

Jones, K.B., A.C. Neale,  M.S. Nash, K.H. Riitters, J.D.  Wickham, R.V. O'Neill, and R.D. Van Remortel.
    Landscape correlates of breeding bird richness across the United States Mid-Atlantic Region.
    Environmental Monitoring and Assessment, 63, 159-174, 2000.

Jones, K.B., A.C. Neale,  M.S. Nash, R.D. Van Remortel, J.D. Wickham, K.H. Riitters, and R.V. O'Neill.
    Predicting nutrient and sediment loadings to streams from landscape metrics: A multiple watershed
    study from the United States Mid-Atlantic Region.  Landscape Ecology, 16(4), 301-312, 2001.
                                                                                 Page 313 of 339

-------
Lopez, R.D., C.B. Davis, and M.S. Fennessy. Ecological relationships between landscape change and
    plant guilds in depressional wetlands. Landscape Ecology, 17, 43-56, 2002.

Lopez, R.D. and C.M. Edmonds. An Ecological Assessment of Invasive and Aggressive Plant Species in
    Coastal Wetlands of the Laurentian Great Lakes: A Combined Field Based and Remote Sensing
    Approach. EPA/600/R-01/018, U.S. Environmental Protection Agency, Washington, DC, 22 p.,
    2001.

Lyon, J.G.  Remote sensing analyses of coastal wetland characteristics: The St. Clair Flats, Michigan.
    Proceedings of the Thirteenth International Symposium on Remote Sensing of Environment,
    April 23-27, Ann Arbor, MI, 1979.

Lyon, J.G. and R.G. Greene.  Use of aerial photographs  to measure the historical areal extent of Lake Erie
    coastal wetlands.  Photogrammetric Engineering and Remote Sensing, 58, 1355-1360, 1992.

Marks, M., B. Lapin,  and J. Randall. Phragmites australis (P. communis): threats, management, and
    monitoring. Natural Areas Journal,  14,285-294, 1994.

Miller, W. and F. Egler. Vegetation of the Wequetequock-Pawcatuck tidal marshes, CT. Ecological
    Monographs, 20, 147-171, 1950.

Mitsch, W.J. and J.G. Gosselink.  Wetlands. Van Nostrand Reinhold, NY, 722 p., 1993.

Mueller-Dombois, D. and H. Ellenberg. Aims and Methods of Vegetation Ecology.  J. Wiley and Sons,
    London, UK, 547 p., 1974.

Niering, W.A. and R.S. Warren.  Vegetation patterns and processes in New England salt marshes.
    BioScience, 30, 301-307, 1980.

Noble, I.R. Attributes of invaders and  the invading process: terrestrial and vascular plants.  In:
    Biological Invasions:  A Global Perspective (J.A. Drake, H.A.  Mooney, F. di Castri, R.H. Groves,
    F.J. Kruger, M. Rejmanek, and M. Williamson, Editors).  John Wiley, Chichester, UK, 1989.

Odum, E.P. Trends expected in stressed ecosystems. Bioscience, 35(7), 419-422, 1985.

Reed, P.B., Jr.  National List of Plant Species that Occur in Wetlands.  U.S. Fish and Wildlife Service,
    Biological Report 88(26.3), Washington, DC, 99 p., 1988.

RSI (Research  Systems, Inc.) ENVI User's Guide. ENVIversion 3.5, RSI, Boulder, CO, 2001.

Stuckey, R.L. Western Lake Erie Aquatic and Wetland  Vascular-Plant Flora: Its Origin and Change. In:
    Lake Erie Estuarine Systems: Issues, Resources, Status, and Management (K.A. Krieger, Editor).
    National Oceanographic and Atmospheric Administration, Washington, DC, 1989.

Vogelmann, J.E., S.E. Howard, L. Yang, C.R. Larson, B.K. Wylie, and N. Van Driel. Completion of the
    1990s National Land Cover Data Set for the Conterminous United States from Landsat Thematic
    Mapper Data and Ancillary Data Sources. Photogrammetric Engineering and Remote Sensing, 67,
    650-662,2001.

Voss, E.G. Michigan Flora (Part I, Gymnosperms and Monocots). Cranbrook Institute of Science and
    University of Michigan Herbarium, Ann Arbor, MI, 488 p., 1972.

Voss, E.G. Michigan Flora (Part II, Dicots, Saururaceae - Cornaceae).  Cranbrook Institute of Science
    and University of Michigan Herbarium, Ann Arbor, MI, 724 p., 1985.
Page 314 of 339

-------
                                   Chapter 19

              A Technique for Assessing the Accuracy of
                Sub-Pixel Impervious Surface Estimates
                    Derived from  Landsat TM Imagery

                                         by

                                 S. Taylor Jarnagin*1
                                 David B. Jennings1
                                  Donald W. Ebeif
   U.S. Environmental Protection Agency
   National Exposure Research Laboratory
   12201 Sunrise Valley Drive
   Reston, VA 20192

   Corresponding Author Contact:
    Telephone:  (703) 648-4797
      Facsimile:  (703) 648-4290
        E-mail:  iarnagin.tavlor(S)epa.aov
U.S. Environmental Protection Agency
National Exposure Research Laboratory
944 East Harmon Avenue
Las Vegas, NV 89193-3478
19.1  Introduction

An emerging area in remote sensing science is sub-pixel image processing (Ichoku and Karnieli, 1996).
Sub-pixel algorithms allow the characterization of spatial components at resolutions smaller than the size
of the pixel.  Recent studies have shown the general effectiveness of these techniques (Huguenin, 1994;
Huguenin, 1997). The importance of sub-pixel methods is particularly relevant to the field of impervious
surface mapping where the predominance of the "mixed-pixel" in medium-resolution imagery forces the
aggregation of urban features such as roadways and rooftops into general "developed" categories (Civco,
1997; Ji and Jensen, 1999; Smith, 2001).  The amount of impervious surface in a watershed is a landscape
indicator integrating a number of concurrent interactions that influence a watershed's hydrology, stream
chemical quality, and ecology and has emerged as an important landscape element in the study of non-
point source pollution (NFS) (USEPA, 1994).  As such, Schuler (1994) proposed that impervious surfaces
                                                                         Page 315 of 339

-------
should be the single unifying environmental theme for the analysis of urbanizing watersheds.  Effectively
extracting the percent impervious surface from medium-resolution imagery would provide a time and cost
savings as well as allowing the assessment of these landscape features over extensive geographic area
such as the Chesapeake Bay. As part of the Multi-Resolution Landscape Characterization 2000 program
(MRLC 2000), the United States Geological Survey (USGS) has embarked on an effort to map
impervious surfaces across the conterminous United States utilizing sub-pixel techniques. This study
proposes to produce a spatial and statistical framework from within which we can investigate sub-pixel
derived estimates of a inaterial-of-interest (MOI) utilizing multiple accuracy assessment strategies.

Traditional map accuracy assessment has utilized a contingency table approach for assessing the per-pixel
accuracy of classified maps.  The contingency table is referred to as a confusion matrix or error matrix
(Story and Congalton, 1986). This type of assessment is a "hit or miss" technique and produces a binary
output in that a pixel  is either "correct" or "not correct.'' The  generally accepted overall accuracy level
for land-use (LU) maps has been 85% with approximately equal accuracy for most categories (Jensen,
1986). While alternative techniques to assess the accuracy of land-cover (LC) maps using measurement
statistics  such as the Kappa coefficient of agreement have been proposed, most methods still rely on the
contingency table and use  per-pixel assessments of the thematic map class compared to "truth" sample
points (Congalton and Green, 1999).  However, as noted by Ji and Jensen (1999), this classical "hit or
miss" approach is problematic with respect to assessing the accuracy of a sub-pixel derived classification.
A sub-pixel algorithm allows the pixel to be classified based on the percent of a given material of interest
(MOI) such that for any given pixel the "fit" to truth can be assessed. A level of accuracy can be still be
obtained from a pixel that  "misses" the truth. The derivation of a percent of a MOI per-pixel allows for
alternative accuracy assessment approaches such as aggregate whole-area assessments (i.e., watershed)
and correlations (Ji and Jensen, 1999). These alternative  approaches may produce adequate accuracies
despite a lower per-pixel accuracy being derived from the standard error matrix.

An accuracy assessment of sub-pixel data is largely dependent upon high-resolution planimetric maps or
images to provide reference data. Concurrent with the emergence of sub-pixel techniques has been a
trend in the production of high-resolution datasets including high-resolution multi-spectral satellite
imagery, CIS planimetric data, and USGS Digital Ortho Quarter Quads (DOQQ's).  All these data sources
can be readily processed within standard G1S software packages and used to assess the accuracy of sub-
pixel estimates, as derived from Landsat data, over large geographic regions.

In this study we compared classified sub-pixel impervious surface data derived from Landsat TM imagery
and planimetric impervious surfaces maps produced from photogrammetric mapping processes.
Comparisons were performed on the classified sub-pixel (30 m) using planimetric reference data, in a
raster G1S overlay environment.  Our goal was to produce a spatial framework in which to test the
accuracy of sub-pixel derived estimates of impervious surface coverage.  In addition to a traditional per-
pixel assessment of accuracy, our technique allowed for a correlation assessment, and an assessment of
the whole-area accuracy of the impervious surface estimate per unit-area (i.e., watershed). The latter is
important for ecological and water quality models that have percent impervious surface as a variable
input.
Page 316 of 339

-------
19.2  Methods

19.21   Study Area

Our study area was the Dead Run watershed - a small 14.0 km2 sub-watershed located 9.0 km west of
Baltimore, Maryland (see Figure 19-1). The Dead Run sub-watershed is a portion of the greater
Baltimore Long Term Ecological Research (LTER) area located in Baltimore County, Maryland and
resides within the coastal plain and piedmont geologic areas of the mid-Atlantic physiographic region.
previously produced planimetric and sub-pixel data sets were available for the area.
  Figure 19-1.  Location of the 14 km2 Dead Run sub-watershed approximately 9.0 km west of Baltimore, MD.


 19.2.2   Data

 Sub-pixel impervious surface cover data, derived from TM imagery, was provided by the University of
 Maryland's mid-Atlantic Regional Earth Sciences Application Center (RESAC) impervious surface
 mapping effort (Mid-Atlantic RESAC, 2002). The mid-Atlantic RESAC process utilized a decision tree
 classification system to map eleven different levels  of impervious surface percent per 30 m pixel
 (Table 19-1) (Smith, 2001). Reference data was obtained using photogrammetrically derived GIS
 planimetric vector data provided by Baltimore County, Maryland.  The vector data included
 anthropogenic features such as roads, parking lots and rooftops but did not include driveways associated
 with single-family homes. The lack of compiled driveways was a limitation of the truth set and has the
 potential to be a source of error.

 The Dead Run sub-watershed was delineated using USGS Digital Raster Graphics (DRG) and  "heads-
 up" digital collection methods.  The compiled Dead Run sub-watershed was subsequently utilized to clip
 both the mid-Atlantic RESAC  raster data and the Baltimore county impervious surface planimetric data.
 This produced a spatially coincident Dead Run 30 m sub-pixel estimate GRID and a Dead Run
 impervious surface truth vector file (see Figure 19-2).  All data were processed  in the UTM Zone 18,
 NAD83 projection.  The respective datasets were independently registered (prior to our study) and  no
 attempt was made to co-register the data via image-to-image methods.
                                                                                   Page 317 of 339

-------
       Table 19-1. The University of Maryland Mid-Atlantic RESAC impervious surface
                   percentage per pixel classes.  Classes are represented in the raster data as
                   10, 20, etc., such that class 1 -10 = 10,11 - 20 = 20, etc.
                                            Impervious
                                              Class
                                                0
                                               1 - 10
                                             11-20
                                             21 -30
                                             31 -40
                                             41 -50
                                             51 -60
                                             61 -70
                                             71 -80
                                             81 -90
                                             91 - 100
                                                                           lmp*rviout Surface
                                                                           Am Dtfined from
                                                                            High -Resolution
                                                                              Source*
 Figure 19-2.  Two graphics of the Dead Run sub-watershed showing the separate data tvoe  of (a) s
            denved estates of imperious surface percent and (b) truth imperlioS^surface"vector file"
Page 318 of 339

-------
 19.2.3   Spatial Processing
<3IS raster overlay techniques were utilized to compute the reference values for percent impervious
surface for each 30 m grid cell within the Dead Run sub-watershed.  The process was a modified form of
2:onal analysis. Here however, the zones are the individual 30 m classified pixels as opposed to individual
land  LU/LC zones.  This method was a variation of the overlay processes reported by Prisloe et al. (2000)
     Smith (200 1) and included the following analysis procedures.
      A vector-to-raster conversion of the Dead Run impervious surface reference data was performed to
      produce a high-resolution (3.0 m) impervious surface grid cell (0.0 = non-impervious, 1 .0 =
      impervious).

      A comparison of the classified 30 m Dead Run data with the 3.0 m impervious surface reference
      data was performed using an overlay process, which calculated the number of reference data cells
      spatially coincident with the classified data (see Plate 1 9- 1).  The count of coincident reference data
      cells percent for each Dead Run grid cell was tallied.
                         Dead Run Roadt
                         Dead Run Buildings
Sub-Pixel Estimates of impervious
    Surface % per Pixel
3 meter Truth Impervious
    Surface Data
  plate 19-1.   An approximate 15 ha portion of the Dead Run sub-watershed showing (a) truth vector file
              (roads and rooftops) overlain on a USGS DOQQ and (b) rasterized 3.0 m reference GRID
              overlain on the 30 m sub-pixel estimate grid.
                                                                                       Page 319 of 339

-------
       A vector point file point file was created based on the Dead Run cell centroids. Table 19-2
       summarizes the percent impervious reference data (REFERENCE_1S%) and associated sub-pixel
       impervious surface estimate (SUB-PIXEL  IS%) for each Dead Run cell record.  Reference data
       were "rounded-up" to coincide with the sub-pixel estimate class structure implemented by the mid-
       Atlantic RESAC (1-10=10, 1 1-20=20, etc.). Table 19-2 was used to derive per pixel or aggregate
       watershed error assessments statistics.
               Table 19-2.  Attribute table produced from the overlay of the truth and
                           sub-pixel classified data sets (Plate 19-1). The
                           continuous REFERENCE_IS% field data are subsequently
                           "rounded-up" to coincide with the mid-Atlantic RESAC
                           classification structure (Table 19-1). Data from the
                           SUB-PIXEL_IS% and REFERENCE_IS% fields are utilized
                           in the accuracy assessment.
POINTJD
24944
24945
24946
24947
24948
24949
24950
24951
24952
24953
24954
24955
25093
25094
25095
25096
25097
25098
25099
25100
25101
25102
25103
25104
SUB-PIXEL_IS%
100
20
20
20
40
40
40
40
40
30
90
90
40
90
40
50
100
50
30
100
90
50
90
90
REFERENCE_IS%
7
11
0
0
39
30
8
43
27
0
63
82
28
36
16
0
45
31
4
34
31
5
56
93
Page 320 of 339

-------
 19.2.4   Statistical Processing

\Ve tested the overall classification accuracy of the sub-pixel derived impervious surface estimates by
comparing "per-pixel" measures of accuracy with whole-area measures of accuracy for a series of simple
random samples (with replacement). A range of sample sizes corresponding to various unit-areas were
utilized to determine if the calculated accuracies were dependent on sample size. Sample sizes ranged
from the entire Dead Run watershed (1 5,65 1 pixels) to simple random samples of 225 pixels.  We wanted
to achieve an absolute ± 95% C.I. of < 5% and found that six replicates per unit-area provided that level
of accuracy for every sample size except the smallest. Given the incomplete nature of the planimetric
truth dataset, a more rigorous sampling scheme was considered to be unnecessary.  To explore the issue
of spatial autocorrelation of sub-pixel classified imagery, we sampled a series of discrete pixel blocks
without replacement and compared "per-pixel" measures of accuracy with whole-area measures of
accuracy. Six replicates per block sizes 3  x 3, 5 x 5, 9 x 9, 1 5 x 1 5, and 25  x 25 were used.
    assess pixel accuracy, we processed the reference and sub-pixel classified data within an eleven-
category contingency table and calculated a per-pixel overall accuracy value and a Kappa coefficient of
agreement (Khat) value.  To assess the whole-area accuracy, we compared the sub-pixel derived
jrnpervious estimates with the reference data estimates per unit-area and calculated the absolute value of
the relative error ([abs(REFERENCE_IS% - SUB-PIXEL _IS%)]/ (REFERENCE_IS%)) for each of the
six sample replicates. From the six replicates we computed the mean and coefficient of variation (defined
as  the standard deviation divided by the mean, expressed as a percent) at both the pixel and whole-area
^easures of accuracy for each of the unit areas.


Additionally, per Ji and Jensen (1 999), we performed a per-pixel rank Spearman correlation test between
the sub-pixel estimates and the reference data for all cells in the Dead Run watershed (no smaller unit area
were processed for the rank test).
49.3  Results and Discussion

The authors wish to stress that this study is not an accuracy assessment of the mid-Atlantic RESAC sub-
  jxel classification, but rather a d.scussion of alternative assessment methodologies that may be more
Compatible with the characteristics of sub-pixel classified data. The classified data utilized for this study
^ere in preliminary form and were not meant for external distribution for use in watershed assessments
Although we did not quantitatively assess the registration displacement of the two datasets, a manual
review showed an approximate  and non-systematic 1.0 m difference throughout the Dead Run sub-
w/atershed.

 Our results indicated that the pixel-based methods of determining the accuracy of the sub-pixel estimates
 yielded results that were consistently lower than the whole-area method of determining accuracy (see
 Tables 19-3 and 19-4). The whole-area estimate of impervious surface percent for the entire Dead Run
 watershed was approximately 71% accurate. Also, the whole-area estimates were robust with respect to
 the size of the sample  subset, although the variability of the estimate increased with smaller sample sizes.
 The per-pixel assessments of accuracy for the same unit-area datasets were approximately 28% (Kappa
 0.19) for the Error Matrix Overall Accuracy measurement.  For simple random sampling, the per-pixel
 assessments of accuracy showed less variability with smaller sample sizes than the whole-area method,
 with the error matrix overall accuracy measurement being particularly stable in this regard (Table 19-3).
                                                                                    Page 321 of 339

-------
For pixel block sampling, measured accuracy declined with smaller block sizes when considering both the
whole area and per-pixel methods of accuracy assessment (Table 19-4).
 Table 19-3.  A comparison of accuracy assessment statistics derived at different spatial scales
             of analysis using per-pixel and whole-area assessment comparisons of the
             classified and planimetric reference data sets based on simple random samples of
             the data.  The relative percent correct column lists the whole-area accuracy and the
             error matrix overall accuracy and Khat columns list pixel-based accuracy estimate
             values.
Portion of
Watershed
Sampled
(w/replacement)
Full
Half
Quarter
Eighth
Sixteenth
1/25th
1/40th
1/70th
Pixels
Sampled
per Run
15651
7825
3913
1956
978
625
400
225
Area
Analyzed
(Km2)
14.086
7.0425
3.5217
1 .7604
0.8802
0.5625
0.36
0.2025
Relative
Percent
Correct
(± 95 %
C.I.)
70.85 (-)
70.61
(0.39)
71.18
(0.63)
72.33
(2.47)
71.67
(2.33)
67.20
(4.38)
73.92
(3.06)
71.43
(5.74)
Relative
Percent
Correct
C.V. (%)
-
1.65
2.74
11.16
10.26
16.67
14.65
25.13
Error
Matrix
Overall
Accuracy
(± 95 %
C.I.)
28.41 (-)
28.31
(0.30)
28.72
(0.71)
29.12
(0.44)
r 27.74
(091)
2779
(081)
2850
(2.13)
27.31
(2.43)
Error
Matrix
Overall
Accuracy
C.V. (%)
-
1.31
3.08
1.87
4.12
3.65
9.35
11.13
Khat
(± 95 %
C.I.)
0.1853(-)
0.1846
(0.0027)
0.1871
(0.0060)
0.1928
(0.0033)
0.1774
(0.0078)
0.1787
(0.0085)
0.1899
(0.0197)
0.1780
(0.0244)
Khat
C.V.
(%)
—
1.85
3.98
2.11
5.48
5.93
12.99
17.15
  Table 19-4.  A comparison of accuracy assessment statistics derived at different spatial scales
             of analysis using per-pixel and whole-area assessment comparisons of the
             classified and planimetric reference data sets based on pixel blocks sampled
             without replacement. The relative percent correct column lists the whole-area
             accuracy and the error matrix overall accuracy and Khat columns list pixel-based
             accuracy estimate values.

Block Size
Sampled
(w/o replacement)
25 x 25 blocks
15x15 blocks
9x9 blocks
5x5 blocks
3x3 blocks

Pixels
Sampled
per Run
625
225
81
25
9
Area
Analyzed
(Km2)
per block
0.5625
0.2025
0.0729
[ 0.0225
0.0081
Relative
Percent
Correct
(± 95 % C.I.)
63.68(13.43)
54.67(13.33)
51.47(17.21)
48.05 (26.33)
-17.81 (114.25)
Error Matrix
Overall
Accuracy
(± 95 % C.I.)
26.75 (4.48)
23.78 (8.42)
21.81 (8.04)
17.33(7.75)
18.52(7.26)

Error
Matrix
C.V.
20.91
44.24
I" 46.08
55.90
48.99


Khat
(± 95 % C.I.)
0 1593 (00439)
0.1214(0.0751)
0.1018(0.0730)
0.0455 (0.0576)
0.0732 (0.0742)


KM
C.V.
34.47
77.30
89.66
158.06
126.71
Page 322 of 339

-------
The Spearman correlation results were 0.609, suggesting an increased estimate of accuracy compared to
the result from the contingency table assessment. Ji and Jensen (1999) also noted an increase in accuracy
when utilizing the rank correlation test. Of particular benefit would be sub-pixel classifications that yield
continuous data estimates as opposed to rank order data.  These data would allow for regression modeling
that could be applied to the individual per-pixel errors.

The results presented here would have greatly benefitted from a more accurate reference data set. The
lack of driveways in the planimetric dataset affected a large proportion of the pixels and probably served
to under-report the actual truth for any given pixel. Intuitively, we feel that this "lack of truth" probably
had a greater effect on the per-pixel assessments (error matrix) than on either the Spearman correlation or
the whole-area approaches.  This probably explains a portion of the low per-pixel accuracy.  However, all
three approaches have been affected by the inaccuracies  in the truth set. For example, using non-random
techniques, we sampled 50 driveway areas to derive a total driveway area in the sub-watershed.
Summing the driveway area to the previously compiled planimetric impervious surface area increased the
accuracy of the whole-area approach to approximately 85%. This underscores the need for high-quality
reference data when assessing sub-pixel estimates. However, reference data sets in  the "real-world" will
always contain a certain proportion  of error.  The GIS overlay framework effectively extrapolated the
reference impervious surface to correspond to the classified 30 m Dead Run cell. These spatial overlay
methods provided here may be repeated over any region to assess the accuracy of any pixel-based
product.

The overlay framework also allows  for the analysis of the
spatial distribution of errors.  Figure 19-3 is  an error grid
showing the absolute error per 30 m cell for  the entire Dead
Run sub-watershed. A cursory review of the error grid
reveals that approximately 66% of the errors exist within the
 1.0-20% (absolute error) range signifying that a majority of
cells (within the two datasets) are in close agreement. This
explains why the correlation assessment outperformed
the assessment from the contingency matrix. We can
also discern that the contiguous blocks of error in the
90-100% range are primarily due to areas not
compiled in the truth data but present in the  Landsat
data. This would include anthropogenic areas-of-
 interest such as parking lots as well as bare soil areas
 not included in the truth  data. Generally, in  these
 contiguous areas of large error, the sub-pixel classification
 outperformed the truth data.  We feel that this is in part due
 to a temporal disconnection between the two datasets.
 Areas that were not included in the truth dataset as of the
 date of the imagery acquisition were actually present and
 imaged by the sensor.  Misregistration  between the two
 data sets can also be observed in the error grid. Linear patterns
 appear to be associated with the large roadways that traverse the
 area and are generally associated  with the middle
 range of error (30-80%). Spatial aggregation of error
 also contributes to decreased measures of accuracy
 when using sample blocks of pixels compared to a
 simple random sampling scheme.
                            Absolute error (in %) of
                             tub-pixel estimate*
                             at compared to truth
                                     0
                                   10-20
                                   30-50
                                   60-80
                                   90-100
Figure 19-3.  The error grid shows the absolute
            per pixel error between the truth and
            sub-pixel estimates.
                                                                                      Page 323 of 339

-------
 19.4  Conclusions

 Results indicate that accuracy assessments of sub-pixel derived estimates based on per-pixel sampling
 strategies may underestimate the overall accuracy of the map product.  We believe this is because per-
 pixel assessments of sub-pixel estimates are sensitive to registration accuracies, the accuracy of the truth
 data and classification variability at the pixel scale.  A more robust sub-pixel assessment may be achieved
 by applying a whole-area (aggregate) or correlation based approaches.  These approaches are less
 sensitive to differences in  image registrations as well as errors in the truth set and, in certain large-area
 applications such as watersheds, are probably a more realistic indicator of the sub-pixel classification map
 accuracy.

 With respect to impervious surfaces, we believe this technique has considerable merit when considering
 water quality and watershed runoff models that require, as an input, the percentage of impervious surface
 area above a given gage or "pour point." Our analysis shows that whole-area scale estimates of a sub-
 pixel derived MOI can be relatively accurate even when the per-pixel measurement of accuracy, derived
 from the contingency table, is very low.  Furthermore, the accuracy assessment of sub-pixel estimates
 over large areas by utilizing a sampling scheme based on sampled unit-areas (i.e., 5 x 5  or 9 x 9 windows)
 may not be as accurate as one based on the simple random sampling of individual pixels to derive a
 whole-area estimation of an MOI.  For application over large geographic regions, high-resolution multi-
 spectral satellite data could provide an optimal  data source for these sampling situations. Further
 investigation is necessary to corroborate these results over multiple watershed areas using more accurate
 reference data.


 The spatial and statistical techniques reported here provide an analytical tool that can be used to  easily
 make per-pixel, unit-area, or correlation based accuracy assessments of sub-pixel derived classification
 estimates. In addition, these techniques allow the spatial relationship of the per-pixel error to be explored.
 The raster overlay technique easily extracted the data necessary to derive these assessments. The
 ArcView Avenue format script used here, although primarily suited for the assessment of data at the sub-
 pixel level, can be utilized (or altered) to derive the accuracy of any classified dataset where higher
 resolution truth data is available.
 19.5  Summary


 I his chapter presents a technique for assessing the accuracy of sub-pixel derived estimates of impervious
 surface extracted from Landsat TM imagery. We utilized spatially coincident sub-pixel derived
 impervious surface estimates, high-resolution planimetric CIS data, vector-to-raster conversion methods
 and raster GIS overlay methods to derive a level of agreement between the sub-pixel classified estimates
 and the planimetric truth in the Dead Run watershed, a small (14 km2) sub-watershed in the mid-Atlantic
 physiographic region. From the planimetric data we produced a per-pixel reference data estimate of
 impervious surface percent as a means for assessing the accuracy of preliminary sub-pixel estimates of
 impervious surface cover derived  from TM imagery. The spatial technique allows for multiple accuracy
 assessment approaches.  Results indicated that even though per-pixel based estimates of the accuracy of
 the sub-pixel data were poor (28.4%, Kappa = 0.19), the accuracy of the impervious surface percentage
 estimated using whole-area and rank correlation approaches were much improved (70.9%, Spearman
 correlation = 0.608). Our findings suggest that per-pixel based approaches to the accuracy assessment of
 sub-pixel classified data need to be approached with  some caution. Per-pixel based approaches may
 underestimate the actual whole-area accuracy of the MOI map, as derived from sub-pixel methods, when
Page 324 of 339

-------
 applied over large geographic areas. The raster overlay technique easily extracted the data necessary to
 derive these assessments.  Although the in ArcView Avenue script used here was primarily suited for the
 assessment of data at the sub-pixel level, it can be utilized to derive the accuracy of any classified dataset
 where higher resolution digital truth data is available.
  19.6  Acknowledgments

  We gratefully acknowledge the United States Geological Survey and the mid-Atlantic RESAC for
  providing the preliminary sub-pixel derived impervious surfaces estimates and Baltimore County,
  Maryland for providing the Planimetric GIS data.  The U.S. Environmental Protection Agency (EPA),
  through its Office of Research and Development (ORD), funded and performed the research described
  This manuscript has been subjected to the EPA's peer and programmatic review and has been approved
  for publication. Mention of any trade names or commercial products does not constitute endorsement or
  recommendation for use.
  19.7   References

 Civco, D.L. and J.D. Hurd. Impervious surface mapping for the state of Connecticut. In: Proceedings,
     American Society of Photogrammetry and Remote Sensing (ASPRS) 1997, Seattle, 3, American
     Society of Photogrammetry and Remote Sensing, Bethesda, MD, 1997.

 Congalton, R.G. and K. Green.  Assessing the accuracy of remotely sensed data: principles and
     practices. Lewis Publications, Boca Raton, FL, 1999.

 Huguenin, R.L. Subpixel analysis process improves accuracy of multispectral classifications. Earth
     Observation Magazine, 3(7), 37-40, 1994.

 Huguenin, R.L., M.A. Karaska, D.V. Blaricom, and J.R. Jensen.  Classification of bald cypress and tupelo
     gum trees in Thematic Mapper imagery. Photogrammetric Engineering and Remote Sensing, 63(6),
     717-727, 1997.

\ Ichoku, C. and A. Karnieli. A review of mixture modeling techniques for sub-pixel land cover estimation.
     Remote Sensing Reviews, 13, 161-186, 1996.

 Jensen, J.R. Introductory Digital Image Processing: A Remote Sensing Perspective. Prentice-Hall,
     Englewood Cliffs, NJ, 1986.

 Ji, M. and J.R. Jensen. Effectiveness of subpixel analysis in detecting and quantifying urban
     imperviousness from Landsat Thematic Mapper.  Geocarto International,  14(4), 31-39, 1999.

 Mid-Atlantic RESAC, Impervious Surface Mapping using Multi-Resolution Imagery,
     http://www.geog.umd.edu/resac/impervious/htm. 2002.

 (MRLC 2000) Multi-Resolution Landscape Characteristics 2000,
     http://landcover.usgs.gov/natlandcover 2000.html. 2002.
                                                                                  Page 325 of 339

-------
Prisloc, M., L. Gianotti, and W. Sleavin.  Determining impervious surfaces for watershed modeling
    applications.  The 8th National Non-point Monitoring Workshop, Hartford, and
    http://\v\vw.canr.ucorin.edii/ces/nemo/gis/pdfs/nps_paper.pdf, 2000.

Sehueler, T.R. The Importance of imperviousness.  Watershed Protection Techniques, 1(3), 100-111,
    1994.

Smith, A.  Subpixel estimates of impervious cover from Landsat TM image.
    hllp://vv\vw.geog.iimd.cHJii/rcsac/iiripcrvious2.litrn. 2001.

Story M. and  R.G. Congalton. Accuracy assessment:  A user's perspective. Photogrammetric
    Engineering and Remote Sensing, 52(3), 397-399, 1986.

U.S. Environmental Protection Agency (U.S. EPA).  The Quality of Our Nation's Water: 1992.  United
    States Environmental Protection Agency, USEPA Office of Water, Report # EPA-841-S-94-002,
    Washington DC, 1994.
Page 326 of 339

-------
                                   Chapter 20


  Area and Positional Accuracy of DMSP Nighttime Lights Data

                                         by

                               Christopher D.  Elvidge1'
                                   Jeffrey Safran2
                                   Ingrid L. Nelson2
                                 Benjamin T. Turtle2
                                 Vinita Ruth Hobson2
                                 Kimberly E. Baugh2
                                   John B. Dietz3
                                  Edward H. Erwin1
   NOAA
   National Geophysical Data Center
   325 Broadway
   Boulder, CO 80305

   Corresponding Author Contact:

    Telephone: (303)497-6121
      Facsimile: (303)497-6513
        E-mail: chris elvidge@noaa.gov
Cooperative Institute for Research in
   Environmental Sciences
University of Colorado
Boulder, CO 80303
                  3 Cooperative Institute for Research on the Atmosphere
                    Colorado State University
                    Fort Collins, CO  80523
20.1  Introduction

The Operational Linescan System (OLS) is an oscillating scan radiometer designed for cloud imaging
with two spectral bands (visible and thermal) and a swath of approximately 3,000 km. The OLS is the
primary imager flown on the polar orbiting Defense Meteorological Satellite Program (DMSP) satellites.
The OLS nighttime visible band straddles the visible and near-infrared (NIR) portion of the spectrum
from 0.5-0.9 urn and had 6-bit quantization, with digital numbers (DNs) ranging from zero to 63. The
                                                                         Page 327 of 339

-------
thermal band has 8-bit quantization and a broad band-pass from 10-12 urn. The wide swath widths
provide for global coverage four times a day: dawn, day, dusk, and night.  DMSP platforms are stabilized
using four gyroscopes (three axis stabilization) and platform orientation is adjusted using a star mapper,
Earth limb sensor and a solar detector.

 At night the OLS visible band is intensified using a photomultipliertube (PMT), enabling the detection
of clouds illuminated by moonlight.  With sunlight eliminated, the light intensification results in a unique
data  set in which city lights, gas flares, lightning illuminated clouds, and fires can be observed (see Figure
20-1). The OLS visible band sensor system is designed to produce visually consistent imagery of clouds
at all scan angles for use by U.S. Air Force meteorologists with a minimal amount of ground processing.
The  visible band base gain is computed onboard based on scene source illumination predicted from solar
elevation and lunar phase and elevation. This automatic gain  setting can be overridden or modified by
commands transmitted from the ground. The automatic gain is lowest when lunar illuminance is high.
As lunar illuminance wanes, the gain gradually rises.  The highest visible gain settings occur when lunar
illumination is absent. The combination of high gain settings and low lunar illuminance provides for the
best  detection of faint light sources present at the earth's surface.  The drawback of these high gain setting
observations is that the visible band data of city centers are typically saturated.  Data acquired under a full
moon when the gain is turned to a lower level are generally not as useful for nighttime lights product
generation since they exhibit fewer lights and have the added  complication of bright clouds and terrain
features.
                                                                        Visible Band
            Thermal Band
           Figure 20-1. Visible and thermal NIR nighttime OLS images over California. With
                       sunlight eliminated, the OLS' light intensification results in the
                       detection of lights present at the Earth's surface.
Page 328 of 339

-------
In addition to tracking lunar illuminance, gain changes occur within scan-lines with the objective of
making visually consistent cloud imagery, regardless of scan angle. The base gain is modified every 0.4
ms by an on-board along scan gain algorithm. A bi-directional reflectance distribution function (BRDF)
algorithm further adjusts the gain to reduce the appearance of specular reflectance in the scan segment
where the solar or lunar illumination angle approaches the observation angle.

f he OLS design provides imagery with a constant ground sample distance (GSD) both along- and across-
track. The along track GSD is kept constant through a sinusoidal scan motion, which keeps the track of
the scan-lines on the ground parallel. The analog to digital conversion within individual scan-lines is
timed to keep the GSD constant from the nadir to the edge of scan. OLS data can be acquired in two
spatial resolution modes corresponding to fine resolution  data (0.5 km GSD) and smoothed data (2.7 km
GSD). All data are acquired in fine resolution mode, but  in most cases the recorded data are converted to
the smoothed resolution by averaging of 5 x 5 pixel blocks.
       the GSD of OLS data is kept constant, the instantaneous field of view (IFOV) gradually expands
from the nadir to the edge of scan (see Figure 20-2).  At nadir the low light imaging IFOV of the fine
resolution data is 2.2 km and expands to 4.3 km 800 km out from nadir. At this point in the scan the
electron beam within the OLS PMT automatically shifts to constrain the enlargement of pixel dimensions,
which normally occur as a result of cross track scanning (Lieske, 1981). This reduces the IFOV to 3.0
^m.  The IFOV then expands to 5.4 km at the edge of scan, 1,500 km out from the nadir.  Thus, the IFOV
js substantially larger than the GSD in both the along track and along scan directions. At nadir the
grnoothed OLS low-light imaging pixel has an IFOV of 5.0 km and at the edge of scan the IFOV is
Approximately 7.0 km.
                 Nadir
                                                                   3.0 km pixel
              IFOV switching position
              766 km from nadir
                                                                       Edge of scan
                                                                       5.4 km pixel
            Figure 20-2.  The OLS fine resolution nighttime visible band Instantaneous Field of
                        View (IFOV) data starts at 2.2 km at the nadir and expands to 4.5 km at
                        766 km out from the nadir. After the PMT electron beam is switched,
                        the IFOV is reduced to 3.0 km and expands to 4.8 km at the far edges
                        of the scan.
                                                                                    Page 329 of 339

-------
\n order xobu\\d c\oud-free g\oba\ maps of nighttime lights and to separate ephemeral lights (e.g., fires)
from persistent lights from cities, towns and villages a compositing procedure is used to aggregate lights
from cloud-free portions of large numbers of orbits, spanning months or even multiple years (Elvidge et
al., 1997, 1999, 2001). To avoid the inclusion of moonlit clouds in the products, only data from the dark
half of the lunar cycle are composited.  The lights in the resulting composites are known to overestimate
the actual size of lighting on the ground.

The objective of this chapter is to document the area and positional accuracy of OLS nighttime lights and
to examine the causes for the area overestimation of OLS lighting. We have done this using light from
isolated sources located in southern California. The analyses were conducted using data from four OLS
sensors spanning a 10-year time period.
20.2   Methods
 20.2.1   Modeling a Smoothed OLS Pixel Footprint
 A scaled model of an OLS PMT smoothed pixel IFOV at nadir was built by placing 25 fine resolution
 pixel footprints onto a five by five grid, each displaced by a 0.5 km GSD. The number of times a light
 would get averaged into a smoothed pixel was tallied for each of the resulting polygon outlines (see
 Figure 20-3). A similar model was built to show the IFOV overlap between adjacent PMT smoothed
 pixels. This model was constructed by placing nine of the smoothed pixel footprints from Figure 20-5
 onto a three by three grid using a scaled GSD of 2.7 km. The number of smoothed pixel detection
 opportunities was then tallied for each polygon zone (see Figure 20-4).

                                                        Figure 20-3
                                                        Scaled model of a PMT smoothed pixel at
                                                        nadir composed of 25 fine resolution pixel
                                                        footprints. The overlap between IFOV of
                                                        adjacent Tine resolution pixels results in the
                                                        possibility that lights present on the Earth's
                                                        surface will be averaged into the smoothed
                                                        pixels multiple times. The number labels
                                                        marked on the polygons indicate the number
                                                        of times lights present in the polygons would
                                                        be averaged into the resulting smoothed
                                                        pixel.
1
2
3
4
5
6
                         5Km
                              1011 1213 14  15
 Page 330 of 339

-------
                                                         Figure 20-4
                                                         Scaled model of a three by three block
                                                         of smoothed OLS PMT pixels. The
                                                         dashed line indicates the boundary of a
                                                         single smoothed pixel IFOV at nadir, as
                                                         modeled in Figure 20-4. Because of the
                                                         substantial overlap between adjacent
                                                         smoothed pixel IFOVs, it is possible for
                                                         point sources of light to show up in
                                                         more than one smoothed pixel. The
                                                         number of overlapping smoothed OLS
                                                         pixel IFOVs for each polygon is
                                                         indicated using the grayscale.
                                                         Additional levels of overlap are
                                                         encountered in actual OLS imagery as
                                                         the pattern is extended beyond this
                                                         three by three example and as the IFOV
                                                         expands at off nadir scan angle
                                                         conditions.
                 1234
               Number of Overlapping Smooth Pixels

'N*    Xv:>
Pacific      Chan
                          0    45    90         180 Kilometers
                          I          i          I
                                                         Figure 20-5
                                                         Image indicating the number of time
                                                         lights were detected for each 30 arc
                                                         second grid cell.
                                                                              Page 331 of 339

-------
20.2.2   OLS Data Preparation

Nighttime DMSP-OLS data from 2,210 orbits acquired between April 26, 1992 and April 4, 2001 were
processed to produce georeferenced images of lights and clouds of the southern California region.  The
data were initially processed for the NOAA National Marine Fisheries Service to determine the locations
and temporal patterns of squid fishing activities conducted using heavily lit boats offshore from the
Channel Islands. Data were included from four day-night DMSP satellites:  F-10, FT 12, F-14, and F-15.
DMSP data deliveries to the archive were irregular during 1992, resulting in gaps in the early part of our
time series.

Orbits were selected from the archive based on their acquisition time to include nighttime data over
California. The orbits were automatically sub-orbited based on the nadir track to 32°-42° north latitude.
Lights and clouds were identified using the basic algorithms described in Elvidge et al. (1997). The next
step in the processing was to geographically locate (geolocate) the sub-orbits. The geolocated images
covered the area from 32°-36° north latitude and 117°-122° west longitude.  The OLS geolocation
algorithm uses satellite ephemeris (latitude, longitude, and altitude at nadir) generated by the SPEPH
(Special Ephemeris) orbital model developed by the U.S. Air Force specifically for the DMSP platforms.
The orbital model was parameterized by bevel vectors derived from daily RADAR sightings of each
DMSP satellite. Ephemeris data were calculated for each scan line. The geolocation algorithm calculates
the position of each OLS pixel center using the satellite using the ephemeris, a calculation of the scan
angle, an earth geode model, and a terrain correction using GTOPO30.  The pixel center positions were
used to locate the corresponding 30 arc second grid cells, which are filled with the OLS DN values. This
generates sparse grid, having DN data only in cells containing OLS pixel centers. The complete 30 arc
second grids were then filled to form a continuous image using nearest neighbor resampling of the sparse
grids.


20.2.3   Target Selection and Measurement

A composite of cloud-free light detections was produced using data from the entire time series. The
composite values indicated the number of times lights were detected for each 30 arc second grid cells.
These were then filtered to remove single pixel light detections, a set which contains most of the system
noise  (see Figure 20-5). The cloud-free composites were then used to identify persistent light  sources
(present through the entire time series) for potential use in the study.  Two types of persistent lights were
selected:   (1) isolated point sources with lighting ground areas much smaller than the OLS pixel, such as
oil and gas platforms in the Santa Barbara Channel (see  Figure 20-6); and (2) isolated lights with moire
extensive areas of ground lighting.  We identified five point sources:  four oil and gas platforms (Channel
Islands 1-3, and Gaviota-1) and a solitary light present at an airfield on San Nicolas Island.  Calibration
targets with more extensive area of lighting included a series of cities, towns and facilities found on land
(see Table 20-1).

The area of each of the lighting sources was estimated using year 2000 Landsat Enhanced Thematic
Mapper plus (ETM+) data by manually drawing a polygon around each of the targets using ENVI
software.  The number of ETM+ pixels in the polygon was then multiplied by pixel area to estimate the
total target size (km2). For the point sources lights, locations and area estimates were performed using the
15 m  panchromatic data.  Area extraction for the onshore targets was  based on visual interpretation of an
ETM+ color composites formed using bands 2, 4, and 5 as blue, green, and red. The ETM+ color
composite was individually contrast enhanced for each target prior to  the manual polygon generation.
Page 332 of 339

-------
                               Figure 20-6
                               Oil and gas platforms detected by the
                               OLS are approximately 0.01 km2. This
                               represented approximately 0.2% of the
                               IFOV of a smoothed PMT pixel from the
                               OLS.
Table 20-1.  List of calibration target characteristics.
Name
Gaviota 1
Channel Island 3
Gaviota 2 West
Gaviota 2 East
Structure East of CA City
Gaviota Plant
Avalon
San Nicolas
California Correctional Institute
Johannesburg
Helendale
Lake Los Angeles
California City
Edwards AFB
Ridgecrest
Santa Barbara
Santa Clarita
Bakersfield
Lancaster/Palmdale
Victorville
Latitude
34.3506
34.1253
34.3768
34.3907
35.1583
34.4751
33.3501
33.2584
35.1169
35.3667
34.7584
34.6224
35.1333
34.925
35.6418
34.4334
34.4521
35.3667
34.625
34.4917
Longitude
-120.2806
-119,401
-120.1688
-120.1218
-117.8585
-120.2084
-118.3251
-119.4918
-118.5718
-117.6502
-117.3419
-117.8329
-117.9668
-117.9002
-117.6751
-119.7084
-118.5418
-119.0418
-118.1252
-117.3085
ETIVT Area
(km2)
0.00765
0.008325
0.00855
0.009
0.318375
0.348075
1.133
1.417
1.834
2.13
8.614
20.961
36.126
37.597
73.644
147.188
197.802
333.866
357.979
369.411
OLS Area
(km2)
20.445
15.51
18.33
17.625
41.595
39.48
40.185
21.15
38.775
23.265
34.545
51.465
67.68
174.135
169.2
425.82
457.545
763.515
661.995
719.1
                                                            Page 333 of 339

-------
20.3   Results

20.3.1   Geolocation Accuracy

Light detections of the point sources from individual sub-orbits in the time series were examined to
determine the geolocation accuracy of OLS nighttime visible band data.  The latitude/longitude locations
of the five point sources of light were extracted for the center of each feature using the 15 m
panchromatic ETM1 data. Vector shorelines were overlain on the panchromatic data to confirm that the
geolocation accuracy of the ETM' data  was in the range of+/- one or two pixels. This was deemed fully
adequate for use as a geolocation accuracy reference source for the 2.7 km GSD OLS data.

We followed the geolocation accuracy assessment procedures outlined by the U.S. Federal Geographic
Data Committee (FGDC, 1998). This procedure used the root-mean-square error (RMSE) to estimate
positional accuracy.  RMSE was calculated as the square root of the average of the set of squared
differences between dataset coordinate values and coordinate values from an independent source of
higher accuracy for identical points.

For the OLS geolocation  accuracy assessment, we compared the latitude/longitude position of the
centroid of OLS detected lights against  the latitude/longitude position extracted from the ETM'
panchromatic band.  We tested the geolocation accuracy of lights detected for the five point sources.  The
analysis was performed for the data from the individual satellites.

For each image in the time series, an automated process searches for a light near the specified
latitude/longitude position from the ETM' data.  The algorithm  looks for the presence of cloud-free lights
in an  11  x 11 box of 30 arc second grid cells centered on the ETM' latitude/longitude. When a light was
found the algorithm identifies the full extent of the light (extending beyond the initial 11x11 box as
needed). Valid lights for the analysis were limited to those no larger than 50 grid cells in extent. A
bounding rectangle of 30 arc second grid cells was established for each of the valid lights. The 30 arc
second grid cell representing to the centroid of the light was  identified through a separate analysis of the
DN values in both the* andy directions inside the bounding rectangles.  Two arrays are generated
containing the average DN of the grid cells for the lines and  columns.  The centroid x,y was identified
based on the average DN peak found in the two arrays. The  centroid x,y were then converted to a
latitude/longitude for the  center of the identified 30 arc second grid cell.  The algorithm then calculated
the positional offset between the centroid and the ETM1 derived latitude/longitude of the light.  The
process is repeated for each of five point sources and each of the images  in the time series.  The resulting
lists of offsets were used to calculate RMSE.r, RMSEy and accuracy in accordance with the FGDC
procedure.

 I he geolocation accuracy assessment results for lights detected in  F-10, F-12, F-14, and F-15 satellite
data are shown in Figure  20-7.  The white triangles indicate the 30 arc second grid cell containing the
ETM1 latitude/longitude of the light sources. The RMSEx and RMSEy values ranged from 0.74 to
1.13 km. RMSEx was lower than RMSEy for each satellite. This indicates that there is more dispersion
in the along-track geolocation accuracy  than in the cross-track direction.  The satellite F-14 data yielded
the highest geolocation accuracy (1.55 km).  The satellite F-12 and F-15  data had nearly identical
RMSEx, RMSEy, and geolocation accuracy results.  Satellite F-10 data had the lowest geolocation
accuracy (2.36 km).  Data from all four  satellites produced geolocation accuracies of less than one pixel.
Page 334 of 339

-------
F10

















1

1

1









I


1











1
1
2
I







I
1
1

3
4
2
1






!
1
1
2

6
3
1





1

1
2

III
r9*
i






i
i

i
2
2
1
1





1



1
t










1
















1






















                                               F12
    RMSF. x = 0.8km     RMSEy = 1.13km
    Accuracy = 2.36km            N=245
                          RMSE x = 0.86km
                          Accuracy = 1.85km
                   RMSE y = 0.94km
                            N=545
    RMSE x = 0.74km
    Accuracy = 1.55km
RMSE y = 0.76km
          N=601
                                               FI5
RMSE x = 0.85km
Accuracy = 1.83km
RMSE y = 0.93km
         N=201
Figure 20-7.  Geolocation accuracy of the centroid positions of OLS lights from DMSP satellites F-
            10, F-12, F-14, and F-1S. The numbers printed on the grid cells indicate the percentage
            of observations in which the OLS light centroids were found in that grid cell position
            relative to the actual location of the light.
                                                                                Page 335 of 339

-------
20.3.2  Comparison of OLS Lighting Areas to E771/T /Areas

The area of OLS lighting for each of the targets was extracted from the year 2000 F-14 cloud-free
composite. This composite was selected because it had large numbers of cloud-free observations and was
most contemporaneous with the year 2000 ETM' data. The composite was filtered to remove light
detections that occurred only once.  Figure 20-8  shows the area of OLS lighting versus ETM' area for
twenty light sources, indicating that the OLS overestimated the area of lighting. However, the OLS
lighting area was highly correlated to the area of lighting estimated  from the daytime ETM* data.
Regression analysis indicated that the OLS lighting areas were approximately twice the size of the area of
ground  lighting for lights ranging from 20-400 km2. This overestimation was substantially higher for
lighting sources that are smaller than the OLS IFOV.  The OLS was able to detect lights as small as 0.01
kin2, representing approximately 0.01% of an OLS smoothed pixel  IFOV (see Table 20-1).

                               OLS Area of Lighting vs. TM Ground Area
   900 ,		-		 —	 	
                                                                        y = -0 0025x2 + 2 8302X + 20 248
                                                                                R2 = 0 984
                50         100         150         200         250         300         350         400
                                    Area of Lighting From Lands*t ETM (km2)

     Figure 20-8.  Area of OLS lighting versus area of lighting estimated from Landsat 7 ETM* imagery.
20.3.3   Multiplicity of OLS Light Detections

The multiplicity of OLS light detections was examined by tallying the number of OLS pixels detected as
lights for the point sources. For this test, we pooled the data from all four sensors and five point sources
of light. For each point source we tallied the number of OLS light pixels present (1,2,3, etc.) on nights
with light detection and zero lunar illuminance. From this we calculated the percentage of observations
resulting in single OLS light detections, double, triple, and higher. The results show that point sources of
Page 336 of 339

-------
light are detected in solitary OLS pixels 38% of the time, two OLS pixels in 28% of the detections, and
three OLS pixels for 13% of the observations (see Figure 20-9). This phenomenon is caused by the
substantial overlap in the footprints of adjacent OLS pixels (see Figures 20-3 and 20-4).
                                        456
                                         Number of OLS Light Pix.ls
           Figure 20-9. Multiplicity of OLS light detections from point sources of surface lighting.
 20.4   Conclusions

 The DMSP-OLS provides a global capability to detect lights present at the earth's surface. This chapter
 provides the first quantitative assessment of the area and positional accuracy of DMSP-OLS observed
 nighttime lights.

 Light sources from isolated oil and gas platforms with areas as small as 0.1 km2 were detected in this
 study. Since these platforms are heavily lit, the 0.1  km2 area approximates the detection limits of the OLS
 for other heavily lit sources. For detection, the aggregated radiances within an OLS pixel must produce a
 DN value that exceeds the background noise present in the PMT data. A larger area of lighting would be
 required for OLS detection of more dimly lit features than the oil and gas platforms.  Based on pre-flight
 calibrations of the OLS, the oil and gas platforms produce a top-of-atmosphere brightness of
 approximately 10"9 watts/cm2/sr.

  Using isolated point sources of light, we have tested the geolocation accuracy of nighttime lights data
  from the four day/night DMSP satellites for which  there is a digital archive.  This includes satellites F-10,
  F-12, F-14, and F-15.  The OLS lights from single  orbits have geolocation accuracies ranging from 1.55
                                                                                      Page 337 of 339

-------
to 2.36 km.  This was less than the GSD of the raw data (2.7 km). This sub-pixel geolocation accuracy is
achieved without use of ground control points. Being able to position lights with comparable geolocation
accuracy from the multiple satellites will be crucial to the analysis of changes in the extent of
development from the DMSP-OLS time series. While further testing will be required, the geolocation
accuracy results reported  here are encouraging in terms of the prospects for using nighttime OLS data to
analyze changes in the extent of lighting over time.

In examining the relation between the area of OLS lighting and the area of lighting present on the ground,
our study con firms that cloud-free composited DMSP nighttime lights overestimated the area of lighting
on the ground.  This overestimation was the result of a combination of factors including (1) the large OLS
pixel size, (2) the OLS' capability to detect sub-pixel light sources, (3) overlap in the IFOV footprints of
adjacent pixels resulting in multiple pixel detections from sub-pixel sized lights,  and (4) geolocation
errors. These effects, present in data from single observations, were accumulated during the time series
analysis.

There were three other mechanisms, which may enlarge OLS lights beyond the extent of surface lighting
under certain conditions that have  not been explicitly explored in the current study.  One was the
scattering of light in the atmosphere as it is transmitted from  the earth surface to space.  The second being
the reflection of lights off of surface waves in cases where bright city lights are adjacent to water bodies.
The third possible mechanism was the  detection of terrain illuminated by downward scattered light arising
from very bright urban centers or gas flares.

Imhoff et al. (1997) developed thresholding techniques to accurately map urban areas.  The disadvantage
of these techniques is that they eliminate lights from small towns owing to their low frequency of
detection. We believe that it would be possible to reduce the overestimation of the area of lighting based
on an empirical calibration to the extent of surface lighting or via the modulation transfer function (MTF)
of the OLS nighttime visible band imagery.
20.5  Summary

The Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) has a unique
low-light imaging capability developed for the detection of moonlit clouds. In addition to moonlit clouds,
the OLS also detects lights from human settlements, fires, gas flares, heavily lit fishing boats, lightning
and the aurora.  Because all these lights are detected in single spectral band and to remove the effects of
cloud cover, time series compositing is used to make stable lights products, which depict the location and
area of persistent light sources. This compositing is done using data collected on nights with low lunar
illumination to avoid the detection of moonlit clouds and the lower number of lights detected due to the
OLS gain settings during periods of high lunar illumination.  A number of studies have found that these
stable lights products overestimate the size of light sources present on the earth's surface. This
overestimation is due to a combination of factors: the large OLS pixel size, the OLS' capability to detect
sub-pixel light sources, and geolocation errors.  These effects, present in data from single observations,
are accumulated  during the compositing process.
Page 338 of 339

-------
20.6  Acknowledgments

The NOAA NESDIS, Ocean Remote Sensing Research Program, funded this project. Information
Integration and Imaging LLC, Fort Collins, Colorado provided the Landsat imagery used in this study
imagery.
20.7  References

Croft, T.A. Night-time images of the earth from space. Scientific American, 239, 68-79, 1978.

Elvidge, C.D., K.E. Baugh, E.A. Kihn, H.W. Kroehl, E.R. Davis. Mapping of city lights using DMSP
    Operational Linescan System data.  Photogrammetric Engineering and Remote Sensing, 63, 727-734,
    1997.

Elvidge, C.D., K.E. Baugh, J.B. Dietz, T. Bland, P.C. Sutton, H.W. Kroehl, Radiance calibration of
    DMSP-OLS low-light imaging data of human settlements.  Remote Sen. Environ., 68, 77-88, 1999.

Elvidge, C.D., M.L. Imhoff, K.E. Baugh, V.R. Hobson, I. Nelson, J. Safran, J.B. Dietz, B.T. Turtle.
    Nighttime lights of the world:  1994-95. ISPRS Journal ofPhotogrammetry and Remote Sensing, 56,
    81-99,2001.

Imhoff, M.L., W.T. Lawrence, D.C. Stutzer, C.D. Elvidge.  A Technique for Using Composite
    DMSP/OLS "City Lights" Satellite Data to Accurately Map Urban Areas. Remote Sens. Environ.,
    61(3), 361-370, 1997.

Lieske, R. W. DMSP  primary sensor data acquisition.  Proceedings of the International Telemetering
    Conference, 17,1013-1020,1981.

U.S. Federal Geographic Data Committee. Geospatial Positioning Accuracy Standards, Part 3: National
    Standards for Spatial Data Accuracy (NSSDA). FGDC-STD-007.3-1998, 25 p., 1998.
                                                                                 Page 339 of 339

-------