United States
Environmental Protection
Agency
Office of
Research and
Development
Environmental Monitoring EPA-600/7-77-100
and Support Laboratory
Las Vegas, Nevada 89114 September 1977
GUIDE TO PRESELECTION OF
TRAINING SAMPLES AND
GROUND TRUTH COLLECTION
Interagency
Energy-Environment
Research and Development
Program Report
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the INTERAGENCY ENERGY-ENVIRONMENT
RESEARCH AND DEVELOPMENT series. Reports in this series result from the
effort funded under the 17-agency Federal Energy/Environment Research and
Development Program. These studies relate to EPA's mission to protect the public
health and welfare from adverse effects of pollutants associated with energy sys-
tems. The goal of the Program is to assure the rapid development of domestic
energy supplies in an environmentally-compatible manner by providing the nec-
essary environmental data and control technology. Investigations include analy-
ses of the transport of energy-related pollutants and their health and ecological
effects; assessments of, and development of, control technologies for energy
systems; and integrated assessments of a wide range of energy-related environ-
mental issues.
This document is available to the public through the National Technical Informa-
tion Service, Springfield, Virginia 22161.
-------
EPA-600/7-77-100
September 1977
GUIDE TO PRESELECTION OF TRAINING SAMPLES
AND
GROUND TRUTH COLLECTION
by
Charles E. Tanner
Lockheed Electronics Company, Inc.
Las Vegas, Nevada 89114
Contract 68-03-2153
Project Officer
Robert W. Landers
Remote Sensing Division
Environmental Monitoring and Support Laboratory
Las Vegas, Nevada 89114
ENVIRONMENTAL MONITORING AND SUPPORT LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
LAS VEGAS, NEVADA 89114
-------
DISCLAIMER
This guide has been reviewed by the Environmental Monitoring and Support
Laboratory/Las Vegas, Nevada, U.S. Environmental Protection Agency, and ap-
proved for publication. Approval does not signify that the contents nec-
essarily reflect the views and policies of the U.S. Environmental Protection
Agency, nor does mention of trade names or commercial products constitute
endorsement or recommendation for use.
-------
FOREWORD
Protection of the environment requires effective regulatory actions
which are based on sound technical and scientific information. This infor-
mation must include the quantitative description and linking of pollutant
sources, transport mechanisms, interactions, and resulting effects on man
and his environment. Because of the complexities involved, assessment of
specific pollutants in the environment requires a total systems approach
which transcends the media of air, water, and land. The Environmental
Monitoring and Support Laboratory-Las Vegas contributes to the formation and
enhancement of a sound integrated monitoring data base through multidiscip-
linary, multimedia programs designed to:
develop and optimize systems and strategies for
monitoring pollutants and their impact on the
environment
demonstrate new monitoring systems and technologies
by applying them to fulfill special monitoring needs
of the Agency's operating programs
This report describes and outlines procedures for the preselection of train-
ing samples used in computer processing of multispectral scanner data.
These data are then used to assess reclamation efforts and monitor changes
on active strip mines in the Western United States.
George B. Morgan
Director
Environmental Monitoring and Support
Laboratory
Las Vegas, Nevada
-------
CONTENTS
Foreword
Figures vi
Tables vi
Acknowledgment vii
1. Introduction 1
2. Summary 3
3. Conclusions and Recommendations 4
4. Geography 5
Geography and History of the Area 5
Mid-latitude Grasslands 5
Desert Vegetation 6
Mountain Vegetation 6
Vegetative Groupings 7
5. Sampling Scheme 9
Landsat Scheme 9
Aircraft Scheme 14
6. Training Sample Selection 17
7. Ground Truth 19
Information Requirements 19
Field Operations 19
Data Collection Equipment and Materials 20
The Ground Truth Form 20
Additional Equipment 20
References 23
-------
FIGURES
Number Page
1 One of many possible line-scanning methods utilized
in airborne infrared sensing 10
2 Black and white Landsat image of eastern Montana 12
3 Low-cost Data System 13
4 Image processing system of the Data Analysis Station .... 15
TABLES
Number Page
1 Simulated Classification Hierarchy 11
2 Modified ground truth form 21
-------
ACKNOWLEDGMENT
The scale and scope of this report were made possible through the coop-
eration of the U.S. Environmental Protection Agency and the National Aero-
nautics and Space Administration/Earth Resources Laboratory (NASA/ERL) in
Slidell, Louisiana.
Sincere thanks are extended to Mr. Larry Erickson, Lockheed Electronics
Company at NASA/ERL, for allowing us to cite sections of his unpublished
manuscript and also for acting as major advisor on this report.
Appreciation is extended to Mr. Sidney Whitley for providing the nec-
essary documentation for generating the photography of the Varian-75 system
and peripheral hardware.
-------
SECTION 1
INTRODUCTION
Quite often data processing analysts are forced to process multispectral
scanner data with a minimum of background information and personal knowledge
of the scene under investigation. This situation is compounded by the col-
lection of inadequate data about the site. Under ideal situations a data
sampling scheme should be developed by the project statistician with inputs
from the data processing analyst.
The classification hierarchy (a categorical ranking of the natural and/
or manmade features for use in processing digital multispectral scanner data)
and ground truth form (a ground truth form is merely an out-of-doors exten-
sion of the classification hierarchy) should be developed by the person(s)
responsible for the data processing in conjunction with the field mensuration
leader. This would assure the collection of adequate ground data.
Obviously there are certain things that cannot be done in the field
because of time and resource constraints. On the other hand, there are
certain operations that can only be accomplished by a knowledgeable field
person with very little expenditure of energy, time, and/or resources.
These points must be discussed at meetings that are open to free discourse
of ideas and suggestions. An explanation of the data processing objectives
should also be presented to the field personnel to make them cognizant that
a thorough job on their part will greatly enhance the analyst's chances of
generating an accurate classification map and that they are also responsible
for the results and any conclusions drawn from such analysis.
Field ground truth operations are usually expensive and time consuming
and rarely have been done properly. New efficient methods must be developed
to make them much more cost-effective. Hopefully, statistical sampling methods
will be developed in order to reduce the total number of ground truth sam-
ples. In essence, ground truth data must be collected by statistically sound
methods and must be efficiently collected and sufficiently accurate for the
given application and objectives.
This report addresses the problem of how to go about collecting ground
truth for the purpose of processing digital data. Also, this report outlines
some of the procedures used by the author in the actual data reduction phase
of automatic data processing. Because of the immediate need of such a report
by personnel engaged in ground truth collection at the Environmental Monitor-
ing and Support Laboratory in Las Vegas, Nevada, no attempt was made to famil-
iarize the reader with the basic principles of pattern recognition analysis.
-------
However, those who are interested in pursuing this type of analysis may refer
to Landgrebe's "Systems Approach to the Use of Remote Sensing" published by
Purdue University in 1971.
-------
SECTION 2
SUMMARY
In summary, pattern recognition using the multispectral approach has
been described as an analysis procedure that has proved useful in coping with
the vast amount of digital data being collected daily by conventional air-
craft and space satellites (Landsat). This type of analysis is heavily
dependent upon accurate ancillary data, e.g., topographic maps, vegetation
maps, geologic survey, land-use maps and ground truth. Current ground truth
that has been accurately and inexpensively gathered is a "dream come true"
for the automatic data processing analyst.
This guide addresses the criteria and procedures that must be used in
the gathering of ground truth data for use in processing digital remote
sensing data. Regardless of the choice of pattern recognition analysis, this
report should have application to the ground truthing activities.
-------
SECTION 3
CONCLUSIONS AND RECOMMENDATIONS
Ground truth operations are usually expensive and time consuming and
are rarely performed properly. The pattern recognition community in the
world of remote sensing is indeed dependent upon ground truth data and can
do very little, if anything, without this vital information. Work must con-
tinue to develop new efficient methods to make ground truth collection more
cost-effective. Also work must continue on the development of statistical
sampling methods that will reduce the total number of ground truth samples.
The procedures outlined in this guide were based on specific agency con-
straints, systems designs and manpower availability; therefore, it is recom-
mended that modifications be made to the procedure to fit the needs of the
analyst. It is also recommended that an explanation of the data processing
objectives be presented to the field mensuration personnel to make them
aware of the need for the collection of accurate ground truth. Too, it is
recommended that preselection of training samples be performed as a joint
venture between the analyst and the field team. Such an effort would foster
cooperation as well as providing a learning experience for the novice field
personnel.
-------
SECTION 4
GEOGRAPHY OF THE AREAS
Before a classification hierarchy or sampling scheme can be developed,
one must obtain basic information concerning the area. The following in-
formation is intended to illustrate the broad ranges of problems associated
with developing inventory techniques for strip mining operations via remote
sensing techniques in the west and Northern Great Plains areas of the United
States.
The Western Energy Project/Strip Mining (WEP/SM) is a rather unusual
undertaking. It is the analyst's intent to map active mining features,
revegetative types within the mined area and natural vegetation, and also to
determine the condition and density classes associated with each vegetative
species via automatic data processing techniques.
GEOGRAPHY AND HISTORY OF THE AREA
The Northern Great Plains are expansive prairie lands bounded in the
west by the Rocky Mountains in Montana and Wyoming. The whole Northern
Great Plains Coal Province includes all coal in western North Dakota, coal
occurring in the Missouri River drainage and east of the Rocky Mountains of
Montana and Wyoming, coal in western South Dakota, coal in the Denver Basin
of Colorado and Raton Mesa of Colorado and eastern New Mexico. A good deal
of the Federal Coal Reserve lies in this province.
The geologic formation containing these coal deposits consists of from
1,700 to 3,200 feet of sandstone, shale and coal. In general, coalbeds are
thickest in Wyoming. Some important coalbeds are the Badger and School
seams in Glenrock, the Monarch seam near Sheridan, the Healey bed near Lake
DeSmet, and the famous Wyodak bed near Gillette. The coal seams in
this region were formed from thick and extensive accumulations of biological,
principally vegetative, matter buried through geologic time. Development
of the thick coal seams of the west required very large flooded areas
(swamps) which slowly subsided while growth of vegetation was optimal.
MID-LATITUDE GRASSLANDS
In the text written by Glenn Trewartha, 1961 (3), page 322, it is stated
that, "In the midcontinent grasslands of the interior United States east of
the Rockies, two, or sometimes three, subdivisions are recognized. The tall-
grass prairie, or true prairie, which has been replaced by farmland, used
to occupy the more humid eastern parts, and the short-grass steppe, or
-------
plains, still exists in many of the drier western parts. Between them is a
transition zone called the mixed-grass prairie, which consists both of mid-
grasses, two to four feet tall, and of shorter grasses, which together form
an upper and a lower story. There are assumptions that the two-story mixed-
grass prairie originally also prevailed on the plains area farther west, and
that the midgrass species were largely killed by overgrazing. West of the
Rockies extensive vegetation areas of the original short bunchgrass still
exist, mainly in Washington, Oregon, Idaho and California." (3)
DESERT VEGETATION
In the same text (3), page 323, Trewartha said, "Desert type vegetation
poses still another problem. Some soilless, windswept, rocky deserts and
some areas of moving sand dunes, may be devoid of all vegetation and these
are the exception. Most desert regions have some plant life, although almost
invariably it is sparse. Desert plants are of several types and each has a
different way of surmounting the handicaps of its arid environment."
"Desert shrub is the most widespread form of arid-land vegetation. The
deciduous species make use of leaf shedding in order to withstand drought.
The evergreen varieties have protective structures such as small, thick and
leathery leaves with shiny, waxy surfaces. The two most widespread desert
shrubs are sagebrush and creosote bush, the second of which grows chiefly
in the hotter and more arid southwest."
"Arid-land vegetation also includes leafless, thorny succulents such as
cacti, certain salt-tolerant plants, and a variety of short-lived transients.
The so-called transients, usually small in size, include many flowering
annuals, but also grasses and tuberous plants."
MOUNTAIN VEGETATION
According to Trewartha (3), "Mountain areas are characterized by an
unusual variety of local environments existing in close juxtaposition. On
the lower slopes of highlands, the vegetation cover may resemble that of the
surrounding lowlands. With increasing altitude, temperature decreases rapidly,
while solar energy, rainfall and wind speeds increase. Changes of
vegetation fall into a rough vertical zonation of plant life, but with many
interrupting local variations."
"Conifers, with their strong tolerance for the vicissitudes of mountain
climates and soils, comprise the largest element of mountain forests in
middle latitudes and they are also found at high elevations. Within the
coniferous zone, pines usually predominate at lower elevation, but fir and
spruce take over near the upper climatic limits of forest. Above these
limits, which vary in altitude, trees will not grow because of low temper-
atures, a short growing season, diurnal freeze and thaw, strong winds and
thin soils that alternate between saturation and aridity."(3)
-------
VEGETATIVE GROUPINGS
Paul Packer, in his technical report entitled "Rehabilitation Potentials
and Limitations of Surface-Mined Land in the Northern Great Plains" (2),
identified 16 broad, but recognizable, vegetation types. Nine of these 16
vegetation types occur on surface mineable areas. These nine vegetation types,
with an explanation of their range and suitability for rehabilitation,
are given below:
Floodplain -- This vegetation type occurs in the bottom lands
and banks of major rivers and on broad floodplain terraces
that have alluvial soils. It occupies sites, some highly
saline, that have high water tables. The floodplain type is
characterized predominantly by hardwood tree and shrub species
that are poorly suitable and poorly available for rehabilitation.
This vegetation type occupies breaks along rivers
and streams and steep south slopes of exposed shales, sandstones
and clays. Dominant plant species are arid-land shrubs and
grasses associated locally with scabby ponderosa pine forests.
The species that characterize this type are poorly available
for rehabilitation.
Short-Grass Prairie -- This vegetation type occupies dry
prairies on shallow soils in southeastern Montana and
northeastern Wyoming. Dominant species are blue grama grass,
western wheatgrass, and various needlegrasses. The species
that characterize this type have moderately poor suitability
and fair availability for rehabilitation.
Mid-Short Grass Prairie -- This type occupies rolling prairies
on loam to clay-loam soils in eastern Montana. It is characterized
by western wheatgrass, needle and thread grass and
blue grama grass. These species have moderately poor suitability
and fair availability for rehabilitation.
Mid-Grass Prairie -- This type, which occurs on loamy soils in
extreme eastern Montana, southwestern North Dakota, and north-
western South Dakota, contains no dominant short grass. Prin-
cipal species are needlegrasses, wheatgrasses, and blue stem
grasses. Most species that comprise this type are fairly
suitable and have good availability for rehabilitation.
Grassland-Sagebrush -- This vegetation type occurs on open grassland
of mid and short grasses, with scattered sagebrush, and occurs
on silty clay-loam soils in southeastern Montana and northeastern
Wyoming. Most of the species that comprise this type have good
suitability and moderately good availability for rehabilitation.
Sagebrush-Steppe -- This type is dominated by sagebrush in open
grassland containing wheatgrasses, needlegrasses and threadgrasses
on silt to silty clay-loam soils. It occurs chiefly in north-
-------
eastern Wyoming and most of the major species have moderately
good suitability, but are relatively unavailable for reha-
bilitation.
-- This vegetation type occurs on gently
rolling prairie northeast of the Missouri River in North Dakota,
and is characterized by wheatgrasses, big and little blue stem
grasses, and needlegrasses on loam soils of glacial till origin.
Nearly all species in this type have good suitability and
availability for rehabilitation.
Ponderosa Pine Forest -- This vegetation type occurs mainly in
eastern Montana and northeastern Wyoming on uplands, ridges
and north slopes that have shallow loam soils. Prominent
species are ponderosa pine, snowberry, blue grasses, fescues,
and June grass. These species are only fairly suitable but
have good availability for rehabilitation.
These types have been modified and placed in the Western Energy Project/
Strip Mining classification hierarchy for use in the actual data reduction
phase of the study.
-------
SECTION 5
SAMPLING SCHEME
Before a scene (any segment of digital data) can be classified, distinct
signatures must be developed for each class in the hierarchy. This process
of developing signatures poses the problem of determining the number of
training fields that must be selected for such an undertaking. Too many
training fields would require an excessive expenditure of resources in the
field as well as increasing processing time in the Laboratory. Conversely,
an insufficient number of training fields that are poorly distributed over
the imagery could be the cause of the generation of a classification run
with little, if any, confidence attached to the results. Also, scanner
anomalies will require that training fields be selected at nadir and to the
left and right of this point as well.
After all is said and done, it must be realized that there is, at this
time, no way to systematically establish training fields for a given number
of classes over a large area simply because classes will occur as nature
dictates, e.g., certain tree species can only be found where the soil types,
elevation and water regime meet their requirements. Training fields have to
be established wherever the class of interest occurs along the entire length
of the flight line or across the scanner's full field of view (Figure 1).
LANDSAT SCHEME
Assume that, based on the information in the previous section, Table 1
has been developed as the initial step in attempting to use Landsat data to
classify native vegetation in eastern Montana. This region is characterized
by a short summer season of scanty precipitation and low humidities with a
high percentage of clear, warm, sunny days and winters with heavy snowfall
and fairly low temperatures. Annual precipitation ranges from 28 to 60
inches with the fall season receiving the largest percentage of rainfall.
Growing season (May through August) precipitation varies from 5 to somewhat
more than 7 inches. (4) Vegetation types are in the mid-grass and mid-short
grass prairie varieties as described earlier.
A reduction of a Landsat image of this area may be seen in Figure 2.
This imagery is formed by merging the four tapes that are required to com-
plete a scene into a single 9-track computer compatible tape. Following this
merging step, the tape is viewed and filmed at the Data Analysis Station
(Figure 3). Finally, the 9" x 9" film format is processed according to
standard operational procedures to produce transparencies or hard copies.
Training samples should be selected from this imagery in conjunction with
-------
[Figure 1 labels: scanner geometry; nadir track; nadir ground coverage of instantaneous field of view; ground coverage of instantaneous field of view increases with scan angle]
Figure 1. One of many possible line-scanning methods utilized in airborne
infrared sensing. A rotating mirror in the multispectral scanner
scans the terrain perpendicular to the line of flight.
-------
TABLE 1. SIMULATED CLASSIFICATION HIERARCHY
LEVEL I
Agriculture
Forestland
Grasslands
LEVEL II
Cultivated Fields
Fallow Fields
Pine
Hardwood
Short-Grass Prairie
Mid-Grass/Mid-Short Grass
Grassland - Sagebrush
LEVEL III
White Pine
Lodgepole Pine
Ponderosa Pine
Blue Grama
Needlegrass
LEVEL IV
0-20% Canopy Density
25-50% Canopy Density
50-75% Canopy Density
>75% Canopy Density
0-25% Ground Density
25-50% Ground Density
50-75% Ground Density
>75% Ground Density
LEVEL V
Seedling/Saplings
Pole Timber
Immature Sawtimber
Mature Sawtimber
Seedlings
Immature
Mature
Mixed
-------
Figure 2. NASA black and white Landsat imagery of eastern Montana,
-------
[Figure 3 labels: low-cost data system; operator's terminal and card reader; 9 track magnetic tape drives; color film recorder; ACK system and Varian V 75 computer; Comtal 8100 color display]
Figure 3. The Low-Cost Data Analysis Station adapted from NASA/ERL
report number 157.
-------
low altitude color-infrared photography of the same area. As a general prac-
tice, at least three training samples per class per tape should be selected,
outlined and coded on clear or frosted overlay material. Based on the
classification hierarchy (Table 1) it is necessary to select at least 12
training samples per species per tape. This means that 36 training samples
for the general class "grassland" would have to be selected. Three training
samples per class per tape is by no means the maximum nor the minimum number
required for sampling purposes. In areas of intensive land use considerably
more samples may be necessary to accurately identify the class because of
the varying nature of land use practices between individual land owners.
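The counts quoted above can be reproduced with a short calculation. The sketch below is illustrative only. It assumes that each species is split into the four density classes of Level IV in Table 1 and that the general class "grassland" contains three species; the report states the totals (12 and 36) but not this breakdown, and the function and parameter names are hypothetical.

```python
def landsat_training_samples(samples_per_class=3, density_classes=4, species=3):
    """Return (samples per species, samples per general class) for one tape.

    Assumption: one set of `samples_per_class` samples is needed for each
    density class of each species.
    """
    per_species = samples_per_class * density_classes
    per_general_class = per_species * species
    return per_species, per_general_class


per_species, per_grassland = landsat_training_samples()
print(per_species, per_grassland)
```

Raising `samples_per_class` shows how quickly the field workload grows in areas of intensive land use.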
Because of the difference in line/pixel (instantaneous field of view)
count, which is a function of altitude and detector size, a new sampling
scheme must be devised to facilitate processing of the aircraft multispectral
scanner data.
AIRCRAFT SCHEME
To begin, remember the constraints of the system on which the multispectral
scanner data are to be processed. For all practical purposes we
will confine all of our calculations to the low-cost Data Analysis Station
(DAS). The image processing system (IPS) on the DAS (Figure 4) is capable
of displaying 508 pixels of the total 628 pixels from the EPA scanner and
508 lines of the data at one time. The sampling strategy for aircraft
scanner data will be based on these figures and the assumption that all data
have been obtained at an altitude of 12,000 feet with a multispectral scanner
having a 2.5 milliradian (mrad) detector size. Also, a 6 x 6 pixel (slightly less
than 1 acre, or 0.301 hectare) cursor will be used as a standard sample size.
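The pixel and cursor sizes used in this scheme follow from the altitude and detector size by the small-angle relation (ground size = altitude x angular field of view). The sketch below is a minimal illustration, not the software on the Data Analysis Station. The off-nadir function uses the standard flat-terrain approximation for a line scanner, which the report does not state, and the 40-degree scan angle is a hypothetical value chosen only to show the effect pictured in Figure 1.

```python
import math

FEET_TO_METERS = 0.3048
SQFT_PER_ACRE = 43_560
SQM_PER_HECTARE = 10_000


def nadir_pixel_size_ft(altitude_ft, ifov_mrad):
    """Ground size of one pixel at nadir (small-angle approximation)."""
    return altitude_ft * ifov_mrad / 1000.0


def off_nadir_pixel_size_ft(altitude_ft, ifov_mrad, scan_angle_deg):
    """(across-track, along-track) pixel size over flat terrain.

    Assumption: constant angular field of view, flat ground, stable aircraft.
    """
    theta = math.radians(scan_angle_deg)
    nadir = nadir_pixel_size_ft(altitude_ft, ifov_mrad)
    return nadir / math.cos(theta) ** 2, nadir / math.cos(theta)


def cursor_area(pixel_ft, cursor_pixels=6):
    """Area of a square cursor as (acres, hectares)."""
    area_sqft = (pixel_ft * cursor_pixels) ** 2
    acres = area_sqft / SQFT_PER_ACRE
    hectares = area_sqft * FEET_TO_METERS ** 2 / SQM_PER_HECTARE
    return acres, hectares


pixel_ft = nadir_pixel_size_ft(12_000, 2.5)
print(pixel_ft, pixel_ft * FEET_TO_METERS)
print(cursor_area(pixel_ft))
print(off_nadir_pixel_size_ft(12_000, 2.5, 40.0))  # hypothetical scan angle
```

For the stated flight parameters this gives a 30-foot pixel at nadir and a 6 x 6 pixel cursor of about three-quarters of an acre.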
Data obtained at the aforementioned altitude with the 2.5 mrad spot size
will have a pixel size of approximately 30 feet (9.14 meters) on a side at
nadir. Due to scanner sweep angle and instability of the aircraft, pixel
sizes will vary on both sides of nadir. This problem can, of course,
be corrected by software programs currently available on the system. A one
percent sample of approximately 258,000 pixels (508 pixels x 508 lines of
data), or roughly 5,160 acres (2,085 hectares), is sufficient to demonstrate
the design potential of the automatic processing unit. This is what automatic
data processing is all about: performing an accurate inventory with less
expenditure of time, money and natural energy sources than conventional
methods. A one percent sample would be:
Total number of pixels 258,064 x 0.01 sample rate =
2580.64 pixels (total number of sampled pixels)
The standard sample size is 36 pixels per training field (roughly 3/4 of an
acre or 0.30 hectare), therefore:
2580.64 / 36 = 71 total training samples/scene (71.68, truncated)
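The same calculation can be written as a small function so that it may be repeated for other display sizes, sample rates or cursor sizes. This is a sketch under the assumptions stated in this section (508 x 508 display, one percent sample, 36 pixels per training field); the function name and its parameters are hypothetical. The result is truncated rather than rounded, which reproduces the figure of 71 given above.

```python
def training_fields_per_scene(pixels_across=508, lines=508,
                              sample_rate=0.01, pixels_per_field=36):
    """Return (total pixels, sampled pixels, whole training fields)."""
    total_pixels = pixels_across * lines
    sampled_pixels = total_pixels * sample_rate
    # Truncate: a partial training field cannot be selected.
    fields = int(sampled_pixels // pixels_per_field)
    return total_pixels, sampled_pixels, fields


total, sampled, fields = training_fields_per_scene()
print(total, round(sampled, 2), fields)
```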
-------
Figure 4. Image Processing System (IPS) of the Data Analysis Station.
-------
It is important to remember that sampling schemes are theoretical and
serve as an initial step in most investigations. Modifications of the
schemes are and should be expected because of the very nature of remote
sensing targets. This derived figure (71) is the total number of training
samples that should be selected from the scene. In order to determine the
number of training samples per class, divide the total number of samples by
the number of classes in the hierarchy (in this example the number is 12).
This sampling scheme is designed for classification hierarchies with 5 or more
classes and/or subclasses. Training samples for hierarchies with fewer than
5 classes and/or subclasses should be located and coded as often as possible
and over the entire scene.
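Dividing 71 samples among 12 classes does not come out evenly (71 / 12 is about 5.9). The sketch below shows one way to allocate whole samples. The report only says to divide the total by the number of classes, so spreading the remainder over the first classes is an assumption made here for illustration, and the function name is hypothetical.

```python
def samples_per_class(total_samples, n_classes, min_classes=5):
    """Allocate whole training samples to classes.

    Returns None when the hierarchy has fewer than `min_classes` classes,
    because the scheme does not apply and samples should instead be
    located as often as possible over the entire scene.
    """
    if n_classes < min_classes:
        return None
    base, extra = divmod(total_samples, n_classes)
    # Assumption: the first `extra` classes each receive one more sample.
    return [base + 1 if i < extra else base for i in range(n_classes)]


allocation = samples_per_class(71, 12)
print(allocation, sum(allocation))
```

In practice the analyst would give the extra samples to the classes expected to be most variable rather than to the first ones in the list.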
-------
SECTION 6
TRAINING SAMPLE SELECTION
The previous material provided an analyst with a number to work with:
a starting point. It is now the responsibility of the photo interpreter to
locate areas on the ground, via photographs or transparencies, that are in
the classification hierarchy.
Aircraft color, color-infrared, or black and white imagery should be
used in the selection of training fields; the optimum condition being that
the aerial photography is obtained simultaneously with the aircraft multi-
spectral scanner coverage (all flightlines and other flight parameters should
be calculated on scanner specifications since it is the primary sensor).
Selecting training fields in this manner, prior to field observations, is at
best a guess. The ground training samples may exhibit conditions consider-
ably different from those prognosticated by the image interpreter. There
are several criteria that must be satisfied before the training field is
accepted, its spectral characteristics examined and ultimately merged to
form a single representative spectral signature by which all other areas
of the same category or classification may be recognized. The first and
most important criterion is that the training field be homogeneous; the
second is that its material be uniformly distributed. This is, of course,
determined by the spatial attributes of the field in question. Obviously,
homogeneity and uniformity may be affected by a variety of natural or
anomalous influences that can occur in a brief period of time. Size is the
third criterion used in selecting training fields. Sample size can be
measured directly from the photography when the scale of imagery and/or
ground distance is known. However, subsequent field visits may reveal
dimensional changes, caused by manmade factors or natural phenomena,
invalidating the sample(s).
In the selection of training samples, the image analyst should attempt
to distribute them evenly throughout the geographic area covered by the
scanner. A relatively even distribution of samples will in most cases:
1. Account for variations in ground cover conditions over a
fairly large geographic area.
2. Force field teams to observe a larger geographic area in which
additional samples may be selected to supplement existing ones
or replace invalidated samples. In the process of traveling
between preselected training samples, field personnel frequently
-------
observe features or conditions unique or common enough to include
as additional training samples; by covering a considerable
geographic area in the field, the potential for additional
data and representative samples is increased immeasurably.
Whether identified by photo interpretation or field observation, every
training sample must be precisely located. There should never be any ques-
tion as to the exact location of a training sample. In fact, whenever
possible, training fields should be located so that geographic features can
help in locating these samples on the photography, in the field and on the
screen of a cathode ray tube. These training samples may be delin-
eated on translucent overlays and annotated with preestablished coding for
the various categories in the hierarchy. Fiducial marks and distinctive
geographic features such as roadways, pipelines, water bodies, etc., should
be traced for assistance in overlay orientation.
The point of all this is that field personnel should never expect to
find perfect training samples at every location indicated on the photography.
Because of this, programs must be planned with sufficient flexibility to
contend with these problems and, yet, gather the quality and quantity of
data required to develop an accurate and useful classification.
In addition to surface conditions such as size, homogeneity, and
uniformity, anomalies in scanner data may prevent using one or more training
samples. It must be recognized that no single sample can be expected to
adequately represent every other area of similar surface material within
the scanner field of view.
Training sample selection is by no means infallible. When visited in
the field, even the most likely training samples may fall short of expectations.
Field verification of any group of preselected samples may result
in an attrition rate of as much as 20%. Personnel in the field are in a much
better position than the laboratory photo interpreter to evaluate the
qualifications of a given site as a potential training sample. All samples
identified in this manner should be plotted and coded on maps, imagery
and overlays with the same exacting attitude and considerations as in the
preselection process.
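The attrition figure can be used as a planning aid when deciding how many samples to preselect. The sketch below is an assumption-based illustration and not a procedure prescribed by this guide: it simply inflates the required number of valid samples so that the expected number surviving field verification still meets the requirement. The function name is hypothetical.

```python
import math


def preselect_count(required, attrition=0.20):
    """Number of samples to preselect so that `required` are expected to
    remain after a fraction `attrition` is rejected in the field."""
    if not 0.0 <= attrition < 1.0:
        raise ValueError("attrition must be in the range [0, 1)")
    return math.ceil(required / (1.0 - attrition))


print(preselect_count(71))
```

Samples added by the field team during travel between sites reduce the amount of oversampling actually needed.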
-------
SECTION 7
GROUND TRUTH
Surface investigations provide indispensable information for developing
a valid ground cover classification, whether the remote sensing system
employs traditional image analysis or automated data processing techniques.
The term "ground truth" used in the remote sensing community is actually
a misnomer. Ground observations are really only valid when they are made at
the time of overflight. Traditionally, ground truth is acquired a week or two
after the overflight and is, therefore, subject to speculation. Because of
its wide acceptance and usage by the remote sensing community, we will
continue to refer to these visits (regardless of time) as ground truth.
INFORMATION REQUIREMENTS
A comprehensive land use project must accurately identify as many land
uses or ground cover types/conditions as the state of the art allows.
For this to occur, determine what conditions (sun angle, climatic or
atmospheric) will have the most significant influence on developing a valid
classification for a given project, in a given geographic area at a partic-
ular time of the year.
Also, it must be remembered that the Earth's surface is being studied
and that although the major physiographic features may seem to remain static,
the ground cover, whether manmade or natural, is extremely dynamic. Since
the individual or combined influences of the conditions may vary considerably
from one ecosystem to the next, it is vitally important that the area se-
lected as the focus of surface investigations be representative of the entire
geographic area to be classified.
FIELD OPERATIONS
In the actual field operations, it is important that the investigators
be reasonably well versed in geography, botany, forestry and agronomy, and
have basic photo interpretation skills. Each member of a ground truth team
should have at least a good understanding of how the remote sensor will
respond to the various surface conditions being identified. Field work
must be carried out efficiently as well.
-------
DATA COLLECTION EQUIPMENT AND MATERIALS
Regardless of attempted simplicity, ground truth activities are often
encumbered by numerous checklists, forms, maps, photographs and reference
material that may rapidly become a confusing mass of paper. To minimize
this possibility, all vital materials should be filed in an envelope or
folder large enough to contain the following items:
Ground truth forms
Training sample check list
Instructions
Aerial photography (paper prints and/or transparencies)
Maps
Photographic logs (if required)
The Ground Truth Forms
A well researched, well human-engineered ground truth form is essential
in the automatic data processing area of remote sensing. The data that are
obtained on the ground and transferred via the ground truth form serve as
the base on which decisions that affect data processing procedures are made.
A site observation form should be easy to complete and can serve as a
motivating force to the observer to note and assess those features that ap-
pear to be anomalies. The form should be straightforward and designed to
eliminate excessive writing which could eventually cause the observer to
become sloppy and careless. In constructing the form, attempt to avoid
redundant entries and try to maintain a logical ordering of the data. Keep
the form to a maximum of two pages (reverse printing is acceptable) and
concentrate on those categories that will satisfy your hierarchical
requirements. Not only will this reduce the cost of printing the form, it will
also require less storage space when filing completed forms.
Table 2 is an example of how to organize the form for quick examination
of an area based on the classification hierarchy. Note that the form re-
flects the concern of the data processing analyst with reclaimed areas, nat-
ural vegetation and activities associated with a strip mining operation. A
word or couple of words, check marks or circles are the methods employed to
facilitate the completion of this form.
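For projects that later key completed forms into a computer, the check-box layout described above maps naturally onto a small validated record. The sketch below is illustrative only. The class name, field names, and the choice of which Table 2 items to include are assumptions, and the rule that a boundary value such as 10% falls in the higher class is also an assumption, since the form does not say how boundaries are handled.

```python
from dataclasses import dataclass, field
from typing import List

# Check-box choices copied from Table 2.
SITE_TYPES = ("Forested Area", "Woodland", "Savannah",
              "Reclaimed Area", "Mining Activities")
WATER_REGIMES = ("Dry", "Moist", "Wet", "Boggy")


def density_class(percent: float) -> str:
    """Map a ground/canopy density estimate to a Table 2 check box.

    Assumption: each class includes its lower bound, so 10 maps to
    "10-20%" and 90 or more maps to ">90%".
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return ">90%"
    lower = int(percent // 10) * 10
    return f"{lower}-{lower + 10}%"


@dataclass
class GroundTruthRecord:
    """Hypothetical record holding a subset of the Table 2 entries."""
    mine_name: str
    site_no: str
    date: str
    time: str
    crew: str
    site_type: str
    water_regime: str
    density_percent: float
    major_species: List[str] = field(default_factory=list)
    comments: str = ""

    def __post_init__(self) -> None:
        # Reject entries that do not match a check box on the form.
        if self.site_type not in SITE_TYPES:
            raise ValueError(f"unknown site type: {self.site_type}")
        if self.water_regime not in WATER_REGIMES:
            raise ValueError(f"unknown water regime: {self.water_regime}")

    @property
    def density(self) -> str:
        """Density class label as it would be checked on the form."""
        return density_class(self.density_percent)
```

Restricting entries to the printed choices serves the same purpose as the check boxes themselves: it keeps observers from inventing free-text categories that the data processing analyst cannot match to the hierarchy.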
Additional Equipment
The following is a listing and brief explanation, where necessary for
clarification, of various items essential or useful in field studies.
Clipboard
Ball point pens
Pencils
Driftline pen and ink: For annotations on photography and maps.
-------
TABLE 2. MODIFIED GROUND TRUTH FORM.
I.   Mine Name ________  Site No. ____  Date ____  Time ____  Crew ____
II.  Film Type ____  Roll No. ____  Azimuth ____
     Filter ____  Exposure No. ____  Subject ____
III. Site Description (check appropriate site)
     Forested Area ( ), Woodland ( ), Savannah ( ), Reclaimed Area ( ),
     Mining Activities ( )
     Major Plant/Tree Species ____, ____, ____
     * Understory vegetation types if area is forested or a woodland
     Average Plant Height ____
     Ground/Canopy Density: 0-10% ( ) 10-20% ( ) 20-30% ( ) 30-40% ( )
                            40-50% ( ) 50-60% ( ) 60-70% ( ) 70-80% ( )
                            80-90% ( ) >90% ( )
     Plant Condition:
          Good-Excellent ( )   Vigorous, Healthy, Above Avg. Growth
          Fair-Good ( )        Vigorous, Healthy, Avg. Growth
          Poor ( )             Chlorotic, Wilted
          Diseased ( )         Necrosis
          Insect-Damaged ( )   ____ % of Site
     Stage of Growth: Immature ( ) Mature ( ) Flowering ( ) Heading ( )
     Soil Type ____  Soil Color ____
     Soil Texture ____
     Water Regime: Dry ( ) Moist ( ) Wet ( ) Boggy ( )
     Slope ____
     Aspect ____
     Exposed Rocks, Boulders, Etc. ____
     Comments:
     N
     Approximate Size ____
-------
Scale (1/10 inch increments): Imagery at a scale of 1:24000 or 1:62500
provides good surface detail on the photography and ease of detail
matching to a standard scale map. At 1:62500 it also allows direct
measurements of approximately one inch to a mile. One-tenth inch
increments are advantageous in relating vehicle odometer indications
to the maps and photographs.
Field identification guides: Not everyone can hope to be a proficient
botanist or forester, at least with respect to making rapid species
identifications in the field. If project requirements call for
species differentiation, the average field worker will find a good
identification guide an invaluable source of information.
Camera and film: Photographic records are often valuable in later
analysis of specific areas or individual training samples.
Plant press: If field identifications are not possible, a plant press
may be used to preserve samples for later identification and to
retain specimens for study by future crews.
Transparent tape: It is often advisable, when making annotations on
photography, to cover them with a protective piece of tape. Vinyl
transparent tape eliminates glare and readily accepts ink for
additional notations.
Collection bags: Transparent plastic bags are good for collecting
plant species. If the samples must be retained for more than 2
or 3 days, a plant press is more practical.
Adhesive labels: Labels may be used to identify training samples on
photography or to mark collected vegetation samples.
Compass: The importance of precise orientation in the field cannot
be overemphasized. Not only is it embarrassing to be lost, but
one may be hard pressed to locate a number of training samples
under such conditions.
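The scale entry in the list above rests on simple arithmetic that is easy to get wrong in the field. The following minimal sketch converts between a distance measured on the map or photograph and the corresponding ground distance, for example when comparing a measurement against an odometer reading. The function names are hypothetical, and the only facts assumed are the standard conversions of 12 inches per foot and 63,360 inches per statute mile.

```python
INCHES_PER_FOOT = 12
INCHES_PER_MILE = 63360  # 5,280 feet x 12 inches


def ground_feet(map_inches: float, scale_denominator: int) -> float:
    """Ground distance in feet for a length measured on the map."""
    return map_inches * scale_denominator / INCHES_PER_FOOT


def ground_miles(map_inches: float, scale_denominator: int) -> float:
    """Ground distance in statute miles for a length measured on the map."""
    return map_inches * scale_denominator / INCHES_PER_MILE


def map_inches_for_miles(miles: float, scale_denominator: int) -> float:
    """Map length in inches that corresponds to an odometer distance."""
    return miles * INCHES_PER_MILE / scale_denominator
```

At 1:24000, one inch on the map is 2,000 feet on the ground, so a one-tenth inch increment is about 200 feet. At 1:62500, one inch is slightly less than one mile, which is why that scale can be read at roughly one inch to a mile.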
-------
-------
TECHNICAL REPORT DATA
1. REPORT NO.: EPA-600/7-77-100
2.
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: GUIDE TO PRESELECTION OF TRAINING SAMPLES AND GROUND
TRUTH COLLECTION
5. REPORT DATE: September 1977
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Charles E. Tanner
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
Lockheed Electronics Company, Inc.
Remote Sensing Laboratory
Las Vegas, Nevada 89114
10. PROGRAM ELEMENT NO.: EHE 625
11. CONTRACT/GRANT NO.: EPA 68-03-2153
12. SPONSORING AGENCY NAME AND ADDRESS:
U.S. Environmental Protection Agency/Las Vegas, NV
Office of Research and Development
Environmental Monitoring and Support Laboratory
Las Vegas, Nevada 89114
13. TYPE OF REPORT AND PERIOD COVERED:
14. SPONSORING AGENCY CODE: EPA/600/07
15. SUPPLEMENTARY NOTES:
16. ABSTRACT
This report was prepared to provide the novice data processing analyst and field
personnel with the tools and basic concepts used in the processing of multispectral
scanner data via an interactive or conventional processing system.
Included in the guide is an explanation of the need for the collection of
accurate/inexpensive "ground truth" and brief descriptions of the various ecosystems
that will be encountered in this study. Also, a detailed list of the actual
parameters that should be included in a well-designed ground truth form is provided.
Sampling schemes from Landsat and aircraft multispectral scanner data are also
discussed at length along with procedures and recommendations for selecting training
samples from photography for use in automatic data processing.
17. KEY WORDS AND DOCUMENT ANALYSIS
a. DESCRIPTORS
Aerial Photography
Land Use
Photographic Reconnaissance
Photo Interpretation
Space Born Photography
Stereo Photography
b. IDENTIFIERS/OPEN ENDED TERMS
Automatic Data Processing
Multispectral Scanner
Ground Truth
Field Observations
c. COSATI Field/Group
10B
14E
18. DISTRIBUTION STATEMENT: Release to Public
19. SECURITY CLASS (This Report): Unclassified
20. SECURITY CLASS (This page): Unclassified
21. NO. OF PAGES: 32
22. PRICE:
EPA Form 2220-1 (9-73)
U.S. GOVERNMENT PRINTING OFFICE: 1977-785-007/1007 9-1