United States Environmental Protection Agency
Office of Research and Development
Environmental Monitoring and Support Laboratory
Las Vegas, Nevada 89114
EPA-600/7-77-100
September 1977

GUIDE TO PRESELECTION OF TRAINING SAMPLES AND GROUND TRUTH COLLECTION

Interagency Energy-Environment Research and Development Program Report

RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development, U.S. Environmental Protection Agency, have been grouped into nine series. These nine broad categories were established to facilitate further development and application of environmental technology. Elimination of traditional grouping was consciously planned to foster technology transfer and a maximum interface in related fields. The nine series are:

1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports

This report has been assigned to the INTERAGENCY ENERGY-ENVIRONMENT RESEARCH AND DEVELOPMENT series. Reports in this series result from the effort funded under the 17-agency Federal Energy/Environment Research and Development Program. These studies relate to EPA's mission to protect the public health and welfare from adverse effects of pollutants associated with energy systems. The goal of the Program is to assure the rapid development of domestic energy supplies in an environmentally compatible manner by providing the necessary environmental data and control technology. Investigations include analyses of the transport of energy-related pollutants and their health and ecological effects; assessments of, and development of, control technologies for energy systems; and integrated assessments of a wide range of energy-related environmental issues.
This document is available to the public through the National Technical Information Service, Springfield, Virginia 22161.

EPA-600/7-77-100
September 1977

GUIDE TO PRESELECTION OF TRAINING SAMPLES AND GROUND TRUTH COLLECTION

by

Charles E. Tanner
Lockheed Electronics Company, Inc.
Las Vegas, Nevada 89114

Contract 68-03-2153

Project Officer
Robert W. Landers
Remote Sensing Division
Environmental Monitoring and Support Laboratory
Las Vegas, Nevada 89114

ENVIRONMENTAL MONITORING AND SUPPORT LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
LAS VEGAS, NEVADA 89114

DISCLAIMER

This guide has been reviewed by the Environmental Monitoring and Support Laboratory/Las Vegas, Nevada, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the U.S. Environmental Protection Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

FOREWORD

Protection of the environment requires effective regulatory actions which are based on sound technical and scientific information. This information must include the quantitative description and linking of pollutant sources, transport mechanisms, interactions, and resulting effects on man and his environment. Because of the complexities involved, assessment of specific pollutants in the environment requires a total systems approach which transcends the media of air, water, and land.
The Environmental Monitoring and Support Laboratory-Las Vegas contributes to the formation and enhancement of a sound, integrated monitoring data base through multidisciplinary, multimedia programs designed to: (1) develop and optimize systems and strategies for monitoring pollutants and their impact on the environment; and (2) demonstrate new monitoring systems and technologies by applying them to fulfill special monitoring needs of the Agency's operating programs.

This report describes and outlines procedures for the preselection of training samples used in computer processing of multispectral scanner data. These data are then used to assess reclamation efforts and monitor changes on active strip mines in the Western United States.

George B. Morgan
Director
Environmental Monitoring and Support Laboratory
Las Vegas, Nevada

CONTENTS

Foreword
Figures
Tables
Acknowledgment
1. Introduction
2. Summary
3. Conclusions and Recommendations
4. Geography
   Geography and History of the Area
   Mid-latitude Grasslands
   Desert Vegetation
   Mountain Vegetation
   Vegetative Groupings
5. Sampling Scheme
   Landsat Scheme
   Aircraft Scheme
6. Training Sample Selection
7. Ground Truth
   Information Requirements
   Field Operations
   Data Collection Equipment and Materials
   The Ground Truth Form
   Additional Equipment
References

FIGURES

1 One of many possible line-scanning methods utilized in airborne infrared sensing
2 Black and white Landsat image of eastern Montana
3 Low-cost Data System
4 Image processing system of the Data Analysis Station

TABLES

1 Simulated Classification Hierarchy
2 Modified ground truth form

ACKNOWLEDGMENT

The scale and scope of this report were made possible through the cooperation of the U.S.
Environmental Protection Agency and the National Aeronautics and Space Administration/Earth Resources Laboratory (NASA/ERL) in Slidell, Louisiana. Sincere thanks are extended to Mr. Larry Erickson, Lockheed Electronics Company at NASA/ERL, for allowing us to cite sections of his unpublished manuscript and also for acting as major advisor on this report. Appreciation is extended to Mr. Sidney Whitley for providing the necessary documentation for generating the photography of the Varian-75 system and peripheral hardware.

SECTION 1

INTRODUCTION

Quite often data processing analysts are forced to process multispectral scanner data with a minimum of background information and personal knowledge of the scene under investigation. This situation is compounded by the collection of inadequate data about the site. Ideally, a data sampling scheme should be developed by the project statistician with input from the data processing analyst. The classification hierarchy (a categorical ranking of the natural and/or manmade features for use in processing digital multispectral scanner data) and the ground truth form (merely an out-of-doors extension of the classification hierarchy) should be developed by the person(s) responsible for the data processing in conjunction with the field mensuration leader. This would help assure the collection of adequate ground data.

Obviously there are certain things that cannot be done in the field because of time and resource constraints. On the other hand, there are certain operations that can only be accomplished by a knowledgeable field person, and with very little expenditure of energy, time, and/or resources. These points must be discussed at meetings that are open to free discourse of ideas and suggestions.
An explanation of the data processing objectives should also be presented to the field personnel, to make them aware that a thorough job on their part will greatly enhance the analyst's chances of generating an accurate classification map and that they share responsibility for the results and any conclusions drawn from such analysis.

Field ground truth operations are usually expensive and time consuming and have rarely been done properly. New, efficient methods must be developed to make them more cost-effective. Hopefully, statistical sampling methods will be developed to reduce the total number of ground truth samples. In essence, ground truth data must be collected by statistically sound methods, and must be efficiently collected and sufficiently accurate for the given application and objectives.

This report addresses the problem of how to go about collecting ground truth for the purpose of processing digital data. It also outlines some of the procedures used by the author in the actual data reduction phase of automatic data processing. Because of the immediate need for such a report by personnel engaged in ground truth collection at the Environmental Monitoring and Support Laboratory in Las Vegas, Nevada, no attempt was made to familiarize the reader with the basic principles of pattern recognition analysis. However, those who are interested in pursuing this type of analysis may refer to Landgrebe's "Systems Approach to the Use of Remote Sensing," published by Purdue University in 1971.

SECTION 2

SUMMARY

Pattern recognition using the multispectral approach has been described as an analysis procedure that has proved useful in coping with the vast amount of digital data being collected daily by conventional aircraft and space satellites (Landsat). This type of analysis is heavily dependent upon accurate ancillary data, e.g., topographic maps, vegetation maps, geologic surveys, land-use maps and ground truth.
Current ground truth that has been accurately and inexpensively gathered is a "dream come true" for the automatic data processing analyst. This guide addresses the criteria and procedures that must be used in gathering ground truth data for use in processing digital remote sensing data. Regardless of the choice of pattern recognition analysis, this report should have application to ground truthing activities.

SECTION 3

CONCLUSIONS AND RECOMMENDATIONS

Ground truth operations are usually expensive and time consuming and are rarely performed properly. The pattern recognition community in the world of remote sensing is indeed dependent upon ground truth data and can do very little, if anything, without this vital information. Work must continue to develop new, efficient methods to make ground truth collection more cost-effective. Work must also continue on the development of statistical sampling methods that will reduce the total number of ground truth samples.

The procedures outlined in this guide were based on specific agency constraints, system designs and manpower availability; therefore, it is recommended that the procedures be modified to fit the needs of the analyst. It is also recommended that an explanation of the data processing objectives be presented to the field mensuration personnel to make them aware of the need for the collection of accurate ground truth. Finally, it is recommended that preselection of training samples be performed as a joint venture between the analyst and the field team. Such an effort would foster cooperation as well as provide a learning experience for novice field personnel.

SECTION 4

GEOGRAPHY OF THE AREAS

Before a classification hierarchy or sampling scheme can be developed, one must obtain basic information concerning the area.
The following information is intended to illustrate the broad range of problems associated with developing inventory techniques for strip mining operations via remote sensing in the western United States and the Northern Great Plains.

The Western Energy Project/Strip Mining (WEP/SM) is a rather unusual undertaking. The analyst's intent is to map active mining features, revegetation types within the mined area, and natural vegetation, and also to determine the condition and density classes associated with each vegetative species via automatic data processing techniques.

GEOGRAPHY AND HISTORY OF THE AREA

The Northern Great Plains are expansive prairie lands bounded on the west by the Rocky Mountains in Montana and Wyoming. The Northern Great Plains Coal Province as a whole includes all coal in western North Dakota, coal occurring in the Missouri River drainage and east of the Rocky Mountains of Montana and Wyoming, coal in western South Dakota, and coal in the Denver Basin of Colorado and the Raton Mesa of Colorado and eastern New Mexico. A good deal of the Federal Coal Reserve lies in this province. The geologic formation containing these coal deposits consists of 1,700 to 3,200 feet of sandstone, shale and coal. In general, coalbeds are thickest in Wyoming. Some important coalbeds are the Badges and School seams in Glenrock, the Monarch seam near Sheridan, the Healey bed near Lake DeSmet, and the famous Wyodak bed near Gillette.

The coal seams in this region were formed from thick and extensive accumulations of biological, principally vegetative, matter buried through geologic time. Development of the thick coal seams of the West required very large flooded areas (swamps) which slowly subsided while growth of vegetation was optimal.
MID-LATITUDE GRASSLANDS

In his 1961 text (3), page 322, Glen Trewartha states that, "In the midcontinent grasslands of the interior United States east of the Rockies, two, or sometimes three, subdivisions are recognized. The tall-grass prairie, or true prairie, which has been replaced by farmland, used to occupy the more humid eastern parts, and the short-grass steppe, or plains, still exists in many of the drier western parts. Between them is a transition zone called the mixed-grass prairie, which consists both of midgrasses, two to four feet tall, and of shorter grasses, which together form an upper and a lower story. There are assumptions that the two-story mixed-grass prairie originally also prevailed on the plains area farther west, and that the midgrass species were largely killed by overgrazing. West of the Rockies extensive vegetation areas of the original short bunchgrass still exist, mainly in Washington, Oregon, Idaho and California." (3)

DESERT VEGETATION

In the same text (3), page 323, Trewartha says, "Desert type vegetation poses still another problem. Some soilless, windswept, rocky deserts and some areas of moving sand dunes, may be devoid of all vegetation and these are the exception. Most desert regions have some plant life, although almost invariably it is sparse. Desert plants are of several types and each has a different way of surmounting the handicaps of its arid environment."

"Desert shrub is the most widespread form of arid-land vegetation. The deciduous species make use of leaf shedding in order to withstand drought. The evergreen varieties have protective structures such as small, thick and leathery leaves with shiny, waxy surfaces. The two most widespread desert shrubs are sagebrush and creosote bush, the second of which grows chiefly in the hotter and more arid southwest."
"Arid-land vegetation also includes leafless, thorny succulents such as cacti, certain salt-tolerant plants, and a variety of short-lived transients. The so-called transients, usually small in size, include many flowering annuals, but also grasses and tuberous plants."

MOUNTAIN VEGETATION

According to Trewartha (3), "Mountain areas are characterized by an unusual variety of local environments existing in close juxtaposition. On the lower slopes of highlands, the vegetation cover may resemble that of the surrounding lowlands. With increasing altitude, temperature decreases rapidly, while solar energy, rainfall and wind speeds increase. Changes in vegetation fall into a rough vertical zonation of plant life, but with many interrupting local variations."

"Conifers, with their strong tolerance for the vicissitudes of mountain climates and soils, comprise the largest element of mountain forests in middle latitudes and they are also found at high elevations. Within the coniferous zone, pines usually predominate at lower elevation, but fir and spruce take over near the upper climatic limits of forest. Above these limits, which vary in altitude, trees will not grow because of low temperatures, a short growing season, diurnal freeze and thaw, strong winds and thin soils that alternate between saturation and aridity." (3)

VEGETATIVE GROUPINGS

Paul Packer, in his technical report entitled "Rehabilitation Potentials and Limitations of Surface-Mined Land in the Northern Great Plains" (2), identified 16 broad, but recognizable, vegetation types. Nine of these 16 vegetation types occur on surface-mineable areas. These vegetation types, with an explanation of their range and suitability as rehabilitation types, are given below:

Floodplain. This vegetation type occurs in the bottomlands and banks of major rivers and on broad floodplain terraces that have alluvial soils. It occupies sites, some highly saline, that have high water tables.
The floodplain type is characterized predominantly by hardwood tree and shrub species that are poorly suitable and poorly available for rehabilitation.

This vegetation type occupies breaks along rivers and streams and steep south slopes of exposed shales, sandstones and clays. Dominant plant species are arid-land shrubs and grasses associated locally with scabby ponderosa pine forests. The species that characterize this type are poorly available for rehabilitation.

Short-Grass Prairie. This vegetation type occupies dry prairies on shallow soils in southeastern Montana and northeastern Wyoming. Dominant species are blue grama grass, western wheatgrass, and various needlegrasses. The species that characterize this type have moderately poor suitability and fair availability for rehabilitation.

Mid-Short Grass Prairie. This type occupies rolling prairies on loam to clay-loam soils in eastern Montana. It is characterized by western wheatgrass, needle-and-thread grass and blue grama grass. These species have moderately poor suitability and fair availability for rehabilitation.

This type, which occurs on loamy soils in extreme eastern Montana, southwestern North Dakota, and northwestern South Dakota, contains no dominant short grass. Principal species are needlegrasses, wheatgrasses, and bluestem grasses. Most species that comprise this type are fairly suitable and have good availability for rehabilitation.

This vegetation type occurs on open grassland of mid and short grasses, with scattered sagebrush, on silty clay-loam soils in southeastern Montana and northeastern Wyoming. Most of the species that comprise this type have good suitability and moderately good availability for rehabilitation.

Sagebrush-Steppe. This type is dominated by sagebrush in open grassland containing wheatgrasses, needlegrasses and threadgrasses on silt to silty clay-loam soils. It occurs chiefly in northeastern Wyoming, and most of the major species have moderately good suitability but are relatively unavailable for rehabilitation.

This vegetation type occurs on gently rolling prairie northeast of the Missouri River in North Dakota, and is characterized by wheatgrasses, big and little bluestem grasses, and needlegrasses on loam soils of glacial till origin. Nearly all species in this type have good suitability and availability for rehabilitation.

Ponderosa Pine Forest. This vegetation type occurs mainly in eastern Montana and northeastern Wyoming on uplands, ridges and north slopes that have shallow loam soils. Prominent species are ponderosa pine, snowberry, bluegrasses, fescues, and June grass. These species are only fairly suitable but have good availability for rehabilitation.

These types have been modified and placed in the Western Energy Project/Strip Mining classification hierarchy for use in the actual data reduction phase of the study.

SECTION 5

SAMPLING SCHEME

Before a scene (any segment of digital data) can be classified, distinct signatures must be developed for each class in the hierarchy. This process of developing signatures poses the problem of determining the number of training fields that must be selected. Too many training fields would require an excessive expenditure of resources in the field as well as increased processing time in the laboratory. Conversely, an insufficient number of training fields, or fields that are poorly distributed over the imagery, could produce a classification run with little, if any, confidence attached to the results.
Also, scanner anomalies will require that training fields be selected at nadir and to the left and right of this point as well. After all is said and done, it must be realized that there is, at this time, no way to systematically establish training fields for a given number of classes over a large area, simply because classes will occur as nature dictates; e.g., certain tree species can only be found where the soil types, elevation and water regime meet their requirements. Training fields have to be established wherever the class of interest occurs along the entire length of the flight line or across the scanner's full field of view (Figure 1).

LANDSAT SCHEME

Assume that, based on the information in the previous section, Table 1 has been developed as the initial step in attempting to use Landsat data to classify native vegetation in eastern Montana. This region is characterized by a short summer season of scanty precipitation and low humidity, with a high percentage of clear, warm, sunny days, and by winters with heavy snowfall and fairly low temperatures. Annual precipitation ranges from 28 to 60 inches, with the fall season receiving the largest percentage of rainfall. Growing season (May through August) precipitation varies from 5 to somewhat more than 7 inches (4). Vegetation types are of the mid-grass and mid-short grass prairie varieties described earlier.

A reduction of a Landsat image of this area may be seen in Figure 2. This imagery is formed by merging the four tapes that are required to complete a scene into a single 9-track computer-compatible tape. Following this merging step, the tape is viewed and filmed at the Data Analysis Station (Figure 3). Finally, the 9- by 9-inch film is processed according to standard operational procedures to produce transparencies or hard copies.
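The requirement that training fields fall at nadir as well as to its left and right can be checked mechanically once candidate fields have been located on the imagery. The sketch below assumes each candidate field is recorded as a class code plus the pixel column of its center within a scan line. The 628-pixel line width is taken from the EPA scanner figure given under the aircraft scheme, and splitting the line into three equal zones is an assumption made for illustration, not a rule from this guide; the class codes are hypothetical.

```python
from collections import defaultdict

LINE_WIDTH = 628  # pixels per scan line (EPA scanner figure; assumed here)
ZONES = {"left", "nadir", "right"}


def scan_zone(column, line_width=LINE_WIDTH):
    """Return 'left', 'nadir', or 'right' for a 0-based pixel column.

    The scan line is split into three equal-width zones; this split is
    an illustrative assumption, not a requirement stated in the guide.
    """
    if not 0 <= column < line_width:
        raise ValueError(f"column {column} outside 0..{line_width - 1}")
    third = line_width / 3
    if column < third:
        return "left"
    if column < 2 * third:
        return "nadir"
    return "right"


def missing_zones(fields, line_width=LINE_WIDTH):
    """Map each class code to the scan zones that still lack a training field.

    `fields` is an iterable of (class_code, center_column) pairs. Classes
    with no candidate fields at all do not appear in the result.
    """
    covered = defaultdict(set)
    for class_code, column in fields:
        covered[class_code].add(scan_zone(column, line_width))
    return {code: sorted(ZONES - zones)
            for code, zones in covered.items() if ZONES - zones}


candidates = [("blue_grama", 40), ("blue_grama", 310), ("blue_grama", 600),
              ("needlegrass", 300)]
print(missing_zones(candidates))  # prints {'needlegrass': ['left', 'right']}
```

A report like this only flags gaps; whether a class actually occurs off nadir is still decided by what is on the ground.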
(Figure 1 diagram labels: scanner geometry; nadir track; nadir ground coverage of instantaneous field of view; ground coverage of instantaneous field of view increases with scan angle.)
Figure 1. One of many possible line-scanning methods utilized in airborne infrared sensing. A rotating mirror in the multispectral scanner scans the terrain perpendicular to the line of flight.

TABLE 1. SIMULATED CLASSIFICATION HIERARCHY

LEVEL I:   Agriculture; Forestland; Grasslands
LEVEL II:  Cultivated Fields; Fallow Fields; Pine; Hardwood; Short-Grass Prairie; Mid-Grass/Mid-Short Grass; Grassland-Sagebrush
LEVEL III: White Pine; Lodgepole Pine; Ponderosa Pine; Blue Grama; Needlegrass
LEVEL IV:  Canopy Density 0-25%, 25-50%, 50-75%, 75% and over; Ground Density 0-25%, 25-50%, 50-75%, 75% and over
LEVEL V:   Seedlings/Saplings; Pole Timber; Immature Sawtimber; Mature Sawtimber; Seedlings; Immature; Mature; Mixed

Figure 2. NASA black and white Landsat imagery of eastern Montana.

(Figure 3 labeled components: operator's terminal and card reader; 9-track magnetic tape drives; color film recorder; rack system and Varian V-75 computer; Comtal 8100 color display.)
Figure 3. The Low-Cost Data Analysis Station, adapted from NASA/ERL report number 157.

Training samples should be selected from this imagery in conjunction with low-altitude color-infrared photography of the same area. As a general practice, at least three training samples per class per tape should be selected, outlined and coded on clear or frosted overlay material. Based on the classification hierarchy (Table 1), it is necessary to select at least 12 training samples per species per tape. This means that 36 training samples would have to be selected for the general class "grassland." Three training samples per class per tape is neither the maximum nor the minimum number required for sampling purposes.
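The guide does not show how the figures of 12 and 36 are reached. One reading consistent with Table 1, offered here as an assumption rather than the author's stated arithmetic, is: four Level IV density ranges per species at three samples each gives 12, and three Level II categories under "Grasslands" at 12 each gives 36. The sketch below encodes that reading so the counts can be recomputed for a different hierarchy.

```python
# Minimum number of training samples per class per tape (from the text).
SAMPLES_PER_CLASS = 3

# Assumed reading of Table 1: four Level IV density ranges per species,
# and three Level II categories under "Grasslands". Adjust to your hierarchy.
DENSITY_CLASSES_PER_SPECIES = 4
GRASSLAND_CATEGORIES = ["Short-Grass Prairie", "Mid-Grass/Mid-Short Grass",
                        "Grassland-Sagebrush"]


def samples_per_species(density_classes=DENSITY_CLASSES_PER_SPECIES,
                        per_class=SAMPLES_PER_CLASS):
    """Training samples needed for one species across its density classes."""
    return density_classes * per_class


def samples_for_grassland(categories=GRASSLAND_CATEGORIES):
    """Training samples needed for the general class 'grassland'."""
    return len(categories) * samples_per_species()


print(samples_per_species())    # prints 12
print(samples_for_grassland())  # prints 36
```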
In areas of intensive land use, considerably more samples may be necessary to accurately identify the class, because land use practices vary between individual landowners. Because the line/pixel (instantaneous field of view) count differs, being a function of altitude and detector size, a new sampling scheme must be devised to facilitate processing of the aircraft multispectral scanner data.

AIRCRAFT SCHEME

To begin, remember the constraints of the system on which the multispectral scanner data are to be processed. For all practical purposes we will confine all of our calculations to the low-cost Data Analysis Station (DAS). The image processing system (IPS) on the DAS (Figure 4) is capable of displaying 508 pixels of the total 628 pixels from the EPA scanner, and 508 lines of the data, at one time. The sampling strategy for aircraft scanner data will be based on these figures and on the assumption that all data have been obtained at an altitude of 12,000 feet with a multispectral scanner having a 2.5-mrad detector size. Also, a 6 x 6 pixel cursor (slightly less than 1 acre, or 0.301 hectare) will be used as the standard sample size.

Data obtained at the aforementioned altitude with the 2.5-mrad spot size will have a pixel size of approximately 30 feet (9.14 meters) on a side at nadir. Because of the scanner sweep angle and instability of the aircraft, pixel sizes will vary on both sides of nadir. This problem can, of course, be corrected by software programs currently available on the system.

A one percent sample of 258,064 pixels (508 pixels x 508 lines of data), covering approximately 5,160 acres (2,085 hectares), is sufficient to demonstrate the design potential of the automatic processing unit. This is what automatic data processing is all about: performing an accurate inventory with less expenditure of time, money and natural energy sources than conventional methods.
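The pixel dimensions quoted above follow from the small-angle relation: ground size at nadir is approximately altitude times the instantaneous field of view (12,000 ft x 0.0025 rad = 30 ft). The sketch below reproduces the nadir figure and the area of the 6 x 6 cursor. It also adds a standard flat-terrain approximation, not taken from this guide, for how the ground cell grows with scan angle θ (across-track about Hβ/cos²θ, along-track about Hβ/cosθ), which is the effect drawn in Figure 1; it ignores terrain relief and aircraft roll and pitch.

```python
import math

FT_TO_M = 0.3048
SQFT_PER_ACRE = 43_560
SQM_PER_HECTARE = 10_000


def nadir_pixel_ft(altitude_ft, ifov_mrad):
    """Ground size of one pixel at nadir (small-angle approximation)."""
    return altitude_ft * ifov_mrad / 1000.0


def off_nadir_pixel_ft(altitude_ft, ifov_mrad, scan_angle_deg):
    """(across-track, along-track) ground size at a given scan angle.

    Flat-terrain approximation; ignores relief and aircraft attitude.
    """
    theta = math.radians(scan_angle_deg)
    d = nadir_pixel_ft(altitude_ft, ifov_mrad)
    return d / math.cos(theta) ** 2, d / math.cos(theta)


def cursor_area(pixel_ft, cursor=6):
    """Area of a square cursor of cursor x cursor pixels, in (acres, hectares)."""
    area_sqft = (cursor * pixel_ft) ** 2
    area_sqm = area_sqft * FT_TO_M ** 2
    return area_sqft / SQFT_PER_ACRE, area_sqm / SQM_PER_HECTARE


side = nadir_pixel_ft(12_000, 2.5)
print(side, round(side * FT_TO_M, 2))  # prints: 30.0 9.14
acres, ha = cursor_area(side)
print(round(acres, 2), round(ha, 3))   # prints: 0.74 0.301
```

At a 45-degree scan angle the same approximation gives a cell about 60 feet across-track, which is why the software corrections mentioned above matter for fields selected near the edge of the swath.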
A one percent sample would be:

$$258{,}064 \times 0.01 = 2{,}580.64 \ \text{pixels (total number of sample pixels)}$$

The standard sample size is 36 pixels per training field (roughly 3/4 of an acre, or 0.3 hectare); therefore:

$$\frac{2{,}580.64}{36} \approx 71.7 \;\rightarrow\; 71 \ \text{training samples per scene}$$

Figure 4. Image Processing System (IPS) of the Data Analysis Station.

It is important to remember that sampling schemes are theoretical and serve as an initial step in most investigations. Modifications of the schemes are, and should be, expected because of the very nature of remote sensing targets.

This derived figure (71) is the total number of training samples that should be selected from the scene. To determine the number of training samples per class, divide the total number of samples by the number of classes in the hierarchy (in this example the number is 12, giving roughly 6 training samples per class). This sampling scheme is designed for classification hierarchies with 5 or more classes and/or subclasses. Training samples for hierarchies with fewer than 5 classes and/or subclasses should be located and coded as often as possible and over the entire scene.

SECTION 6

TRAINING SAMPLE SELECTION

The previous material provided the analyst with a number to work with: a starting point. It is now the responsibility of the photo interpreter to locate areas on the ground, via photographs or transparencies, that represent the classes in the classification hierarchy. Aircraft color, color-infrared, or black and white imagery should be used in the selection of training fields, the optimum condition being that the aerial photography is obtained simultaneously with the aircraft multispectral scanner coverage (all flight lines and other flight parameters should be calculated from the scanner specifications, since the scanner is the primary sensor). Selecting training fields in this manner, prior to field observations, is at best a guess.
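The aircraft sampling arithmetic can be wrapped in a small function so it can be rerun for a different display size, sample rate, or cursor. This is a sketch under the assumptions stated in the text (508 x 508 displayed pixels, a one percent sample, 36-pixel training fields); keeping only whole training fields (rounding 71.7 down) matches the figure of 71 reported above. The guard for fewer than 5 classes mirrors the text's statement that the scheme is meant for hierarchies of 5 or more classes.

```python
import math

MIN_CLASSES = 5  # the scheme is stated to apply to 5 or more classes


def training_samples_per_scene(pixels_per_line=508, lines=508,
                               sample_rate=0.01, field_pixels=36):
    """Whole training fields (6 x 6 = 36 pixels each) in a fractional sample."""
    sample_pixels = pixels_per_line * lines * sample_rate  # 2,580.64 by default
    # 2,580.64 / 36 = 71.68; the guide reports 71, i.e. whole fields only.
    return math.floor(sample_pixels / field_pixels)


def samples_per_class(total_samples, n_classes):
    """Average training samples per class in the hierarchy."""
    if n_classes < MIN_CLASSES:
        raise ValueError("fewer than 5 classes: locate samples as often as "
                         "possible over the entire scene instead")
    return total_samples / n_classes


total = training_samples_per_scene()
print(total)                                   # prints 71
print(round(samples_per_class(total, 12), 1))  # prints 5.9
```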
The ground training samples may exhibit conditions considerably different from those predicted by the image interpreter.

There are several criteria that must be satisfied before a training field is accepted, its spectral characteristics examined, and ultimately merged with those of other fields to form a single representative spectral signature by which all other areas of the same category may be recognized. The first and most important criterion is that the training field be homogeneous; the second is that its material be uniformly distributed. These are, of course, determined by the spatial attributes of the field in question. Obviously, homogeneity and uniformity may be affected by a variety of natural or anomalous influences that can occur in a brief period of time. Size is the third criterion used in selecting training fields. Sample size can be measured directly from the photography when the scale of the imagery and/or the ground distance is known. However, subsequent field visits may reveal dimensional changes, caused by manmade factors or natural phenomena, that invalidate the sample(s).

In the selection of training samples, the image analyst should attempt to distribute them evenly throughout the geographic area covered by the scanner. A relatively even distribution of samples will in most cases:

1. Account for variations in ground cover conditions over a fairly large geographic area.

2. Force field teams to observe a larger geographic area in which additional samples may be selected to supplement existing ones or replace invalidated samples. In the process of traveling between preselected training samples, field personnel frequently
Whether identified by photo interpretation or field observation, every training sample must be precisely located. There should never be any ques- tion as to the exact location of a training sample. In fact, whenever possible, training fields should be located so that geographic features can help in locating these samples on the photography, in the field and on the screen of a cathode ray tube. These training samples may be delin- eated on translucent overlays and annotated with preestablished coding for the various categories in the hierarchy. Fiducial marks and distinctive geographic features such as roadways, pipelines, water bodies, etc., should be traced for assistance in overlay orientation. The point of all this is that field personnel should never expect to find perfect training samples at every location indicated on the photography. Because of this, programs must be planned with sufficient flexibility to contend with these problems and, yet, gather the quality and quantity of data required to develop an accurate and useful classification. In addition to surface conditions such as size, homogeneity, and uniformity, anomalies in scanner data may prevent using one or more training samples. It must be recognized that no single sample can be expected to adequately represent every other area of similar surface material within the scanner field of view. Training sample selection is by no means infallible. When visited in the field, even the most likely training samples may fall short of expecta- tions. Field verification of any group of preselected samples may result in as much as 20% attrition rate. Personnel in the field are in a much better position than the laboratory photo interpreter to evaluate the qualifications of a given site as a potential training sample. All samples identified in this manner should be plotted and coded on maps, imagery and overlays with the same exacting attitude and considerations as in the preselection process. 
SECTION 7

GROUND TRUTH

Surface investigations provide indispensable information for developing a valid ground cover classification, whether the remote sensing system employs traditional image analysis or automated data processing techniques. The term "ground truth" as used in the remote sensing community is actually a misnomer. Ground observations are strictly valid only when they are made at the time of overflight. Traditionally, ground truth is acquired a week or two after the overflight and is therefore subject to speculation. Because of its wide acceptance and usage by the remote sensing community, we will continue to refer to these visits (regardless of time) as ground truth.

INFORMATION REQUIREMENTS

A comprehensive land use project must accurately identify as many land uses or ground cover types/conditions as the state of the art allows. For this to occur, determine which conditions (sun angle, climatic or atmospheric) will have the most significant influence on developing a valid classification for a given project, in a given geographic area, at a particular time of the year. Also, it must be remembered that the Earth's surface is being studied and that, although the major physiographic features may seem to remain static, the ground cover, whether manmade or natural, is extremely dynamic. Since the individual or combined influences of these conditions may vary considerably from one ecosystem to the next, it is vitally important that the area selected as the focus of surface investigations be representative of the entire geographic area to be classified.

FIELD OPERATIONS

In the actual field operations, it is important that the investigators be reasonably well versed in geography, botany, forestry and agronomy, and have basic photo interpretation skills. Each member of a ground truth team should have at least a good understanding of how the remote sensor will respond to the various surface conditions being identified.
Field work must also be carried out efficiently.

DATA COLLECTION EQUIPMENT AND MATERIALS

However simple they are meant to be, ground truth activities are often encumbered by numerous checklists, forms, maps, photographs, and reference materials that can rapidly become a confusing mass of paper. To minimize this, all vital materials should be filed in an envelope or folder large enough to contain the following items:

  Ground truth forms
  Training sample checklist
  Instructions
  Aerial photography (paper prints and/or transparencies)
  Maps
  Photographic logs (if required)

The Ground Truth Forms

A well-researched, well human-engineered ground truth form is essential in the automatic data processing area of remote sensing. The data obtained on the ground and transferred via the ground truth form serve as the basis for decisions that affect data processing procedures. A site observation form should be easy to complete, and it can motivate the observer to note and assess features that appear to be anomalies. The form should be straightforward and designed to eliminate excessive writing, which could eventually cause the observer to become sloppy and careless. In constructing the form, avoid redundant entries and maintain a logical ordering of the data. Keep the form to a maximum of two pages (printing on both sides is acceptable) and concentrate on the categories that satisfy your hierarchical requirements. This not only reduces the cost of printing the form but also requires less storage space when filing completed forms.

Table 2 is an example of how to organize the form for quick examination of an area based on the classification hierarchy. Note that the form reflects the data processing analyst's concern with reclaimed areas, natural vegetation, and activities associated with a strip mining operation.
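Completed forms are eventually transcribed for the data processing analyst. One way to preserve the form's guidelines (check-box entries, no redundant fields, categories tied to the hierarchy) is to encode each form as a small validated record. The Python sketch below covers only a few fields from the Table 2 form. The class, function and property names are hypothetical. The printed form does not say which class a boundary value such as 10% belongs to, so the rule that each class includes its upper bound is an assumption.

```python
import math
from dataclasses import dataclass

# Check-box choices taken from Table 2 (a subset of the form's fields).
SITE_TYPES = {"Forested Area", "Woodland", "Savannah", "Reclaimed Area", "Mining Activities"}
WATER_REGIMES = {"Dry", "Moist", "Wet", "Boggy"}
# "0-10%", "10-20%", ..., "80-90%", ">90%"
DENSITY_CLASSES = [f"{lo}-{lo + 10}%" for lo in range(0, 90, 10)] + [">90%"]


def density_class(percent: float) -> str:
    """Map a ground/canopy density estimate to a Table 2 class.

    Assumed convention: each class includes its upper bound, so 10% -> "0-10%"
    and 90% -> "80-90%"; anything above 90% -> ">90%".
    """
    if not 0 <= percent <= 100:
        raise ValueError("density must be between 0 and 100 percent")
    index = max(0, min(9, math.ceil(percent / 10) - 1))
    return DENSITY_CLASSES[index]


@dataclass
class SiteObservation:
    """Hypothetical record for one completed ground truth form."""
    site_no: str
    site_type: str
    canopy_density_pct: float
    water_regime: str
    comments: str = ""

    def __post_init__(self) -> None:
        # Reject entries that are not one of the form's check-box options.
        if self.site_type not in SITE_TYPES:
            raise ValueError(f"unknown site type: {self.site_type!r}")
        if self.water_regime not in WATER_REGIMES:
            raise ValueError(f"unknown water regime: {self.water_regime!r}")
        density_class(self.canopy_density_pct)  # raises if out of range

    @property
    def density_label(self) -> str:
        return density_class(self.canopy_density_pct)


if __name__ == "__main__":
    obs = SiteObservation("12", "Reclaimed Area", 35, "Moist")
    print(obs.site_no, obs.site_type, obs.density_label)
```

Restricting entries to the form's own options mirrors the check-mark design. A misspelled or free-text category is caught at transcription rather than during classification.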
Entries consist of a word or two, a check mark, or a circle, which makes the form quick to complete.

TABLE 2. MODIFIED GROUND TRUTH FORM.

I.   Mine Name ________  Site No. ____  Date ____  Time ____  Crew ________
II.  Film Type ____  Roll No. ____  Azimuth ____  Filter ____  Exposure No. ____  Subject ________
III. Site Description (check appropriate site)
     Forested Area ( )  Woodland ( )  Savannah ( )  Reclaimed Area ( )  Mining Activities ( )
     Major Plant/Tree Species ________, ________, ________
     Understory vegetation types (if area is forested or a woodland) ________
     Average Plant Height ________
     Ground/Canopy Density:
       0-10% ( )   10-20% ( )  20-30% ( )  30-40% ( )  40-50% ( )
       50-60% ( )  60-70% ( )  70-80% ( )  80-90% ( )  >90% ( )
     Plant Condition:
       Good-Excellent ( ): vigorous, healthy, above avg. growth
       Fair-Good ( ):      vigorous, healthy, avg. growth
       Poor ( ):           Diseased ( ): necrosis, chlorotic, wilted
                           Insect-Damaged ( )   ____ % of Site
     Stage of Growth: Immature ( )  Mature ( )  Flowering ( )  Heading ( )
     Soil Type ________  Soil Color ________  Soil Texture ________
     Water Regime: Dry ( )  Moist ( )  Wet ( )  Boggy ( )
     Slope ____  Aspect ____
     Exposed Rocks, Boulders, Etc. ________
     Comments: ________
     N (north arrow)   Approximate Size ________

Additional Equipment

The following is a list of items essential or useful in field studies, with brief explanations where needed for clarification.

  Clipboard
  Ball point pens
  Pencils
  Driftline pen and ink: For annotations on photography and maps.
  Scale (1/10-inch increments): Imagery at a scale of 1:24,000 or 1:62,500 provides good surface detail and is easy to match against a standard-scale map. At 1:62,500, one inch corresponds to approximately one mile, so distances can be measured directly. One-tenth-inch increments are advantageous in relating vehicle odometer readings to the maps and photographs.
  Field identification guides: Not everyone can hope to be a proficient botanist or forester, at least with respect to making rapid species identifications in the field.
    If project requirements call for species differentiation, the average field worker will find a good identification guide an invaluable source of information.
  Camera and film: Photographic records are often valuable in later analysis of specific areas or individual training samples.
  Plant press: If field identifications are not possible, a plant press may be used to preserve samples for later identification and as specimens for study by future crews.
  Transparent tape: When making annotations on photography, it is often advisable to cover them with a protective piece of tape. Vinyl transparent tape eliminates glare and readily accepts ink for additional notations.
  Collection bags: Transparent plastic bags are good for collecting plant specimens. If the samples must be retained for more than 2 or 3 days, a plant press is more practical.
  Adhesive labels: Labels may be used to identify training samples on photography or to mark collected vegetation samples.
  Compass: The importance of precise orientation in the field cannot be overemphasized. Not only is it embarrassing to be lost, but one may also be hard pressed to locate a number of training samples under such conditions.
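The scale arithmetic behind the 1/10-inch recommendation can be checked directly. A mile is 63,360 inches, so at 1:62,500 one map inch represents about 0.99 mile. One-tenth inch then represents about 0.099 mile, close to one odometer tenth. The Python sketch below converts odometer miles to map inches and back; the function names are hypothetical.

```python
INCHES_PER_MILE = 63_360  # 5,280 ft x 12 in


def map_inches(ground_miles: float, scale_denominator: int) -> float:
    """Map distance (inches) for a ground distance (miles) at scale 1:scale_denominator."""
    return ground_miles * INCHES_PER_MILE / scale_denominator


def ground_miles(map_distance_in: float, scale_denominator: int) -> float:
    """Ground distance (miles) for a distance measured on the map or photo (inches)."""
    return map_distance_in * scale_denominator / INCHES_PER_MILE


def to_tenths(inches: float) -> float:
    """Round a map distance to the nearest 1/10 inch, as read off the scale.

    Python's round() sends exact ties to the even number of tenths.
    """
    return round(inches * 10) / 10


if __name__ == "__main__":
    for scale in (24_000, 62_500):
        tick = ground_miles(0.1, scale)
        print(f"1:{scale:,}: 0.1 in = {tick:.3f} mi; 1.5 mi on odometer = "
              f"{to_tenths(map_inches(1.5, scale))} in on the map")
```

At 1:24,000 a tenth of an inch covers only about 0.038 mile, so odometer tenths map less neatly onto the scale there than at 1:62,500.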
TECHNICAL REPORT DATA

1. REPORT NO.: EPA-600/7-77-100
2.
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: GUIDE TO PRESELECTION OF TRAINING SAMPLES AND GROUND TRUTH COLLECTION
5. REPORT DATE: September 1977
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Charles E. Tanner
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Lockheed Electronics Company, Inc., Remote Sensing Laboratory, Las Vegas, Nevada 89114
10. PROGRAM ELEMENT NO.: EHE 625
11. CONTRACT/GRANT NO.: EPA 68-03-2153
12. SPONSORING AGENCY NAME AND ADDRESS: U.S. Environmental Protection Agency/Las Vegas, NV, Office of Research and Development, Environmental Monitoring and Support Laboratory, Las Vegas, Nevada 89114
13. TYPE OF REPORT AND PERIOD COVERED:
14. SPONSORING AGENCY CODE: EPA/600/07
15. SUPPLEMENTARY NOTES:
16. ABSTRACT: This report was prepared to provide the novice data processing analyst and field personnel with the tools and basic concepts used in processing multispectral scanner data on an interactive or conventional processing system. The guide explains the need for collecting accurate, inexpensive "ground truth" and briefly describes the various ecosystems that will be encountered in this study. It also provides a detailed list of the parameters that should be included in a well-designed ground truth form. Sampling schemes for Landsat and aircraft multispectral scanner data are discussed at length, along with procedures and recommendations for selecting training samples from photography for use in automatic data processing.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. Descriptors: Aerial Photography; Land Use; Photographic Reconnaissance; Photo Interpretation; Spaceborne Photography; Stereo Photography
    b. Identifiers/Open Ended Terms: Automatic Data Processing; Multispectral Scanner; Ground Truth; Field Observations
    c. COSATI Field/Group: 10B, 14E
18. DISTRIBUTION STATEMENT: Release to Public
19. SECURITY CLASS (This Report): Unclassified
20. SECURITY CLASS (This Page): Unclassified
21. NO. OF PAGES: 32
22. PRICE:

EPA Form 2220-1 (9-73)
U.S. GOVERNMENT PRINTING OFFICE: 1977-785-007/1007 9-1