Design Report for EMAP


                                                                         EPA/600/3-91/053
                                                                               May, 1990
                            Design Report for EMAP

           Environmental  Monitoring and Assessment Program


                    W. Scott Overton1, Denis White2, and Don L. Stevens, Jr.2

               1Department of Statistics, Oregon State University, Corvallis, Oregon
       2ManTech Environmental Technology, Inc., U.S. EPA Environmental Research Laboratory
                            200 SW 35th Street, Corvallis, Oregon


                                   Contract 68-C8-0006
                                      Project Officer
                                    Anthony R. Olsen
                            U.S. Environmental Protection Agency
                             Environmental Research Laboratory
                            200 SW 35th Street, Corvallis, Oregon
Research sponsored by the U.S. Environmental Protection Agency under cooperative agreement CR-816721
with Oregon State University at Corvallis, and under Contract No. 68-C8-0006 to ManTech Environmental
Technology, Inc.

-------
The research described in this report has been funded by the U.S. Environmental Protection Agency.
This  document  has been  prepared  at the EPA Environmental  Research Laboratory in  Corvallis,
Oregon, through Contract No. 68-C8-0006 to ManTech Environmental Technology, Inc., and through
cooperative agreement CR 816721 with Oregon State University at Corvallis.  It has been subjected to
the Agency's peer and administrative review and approved for publication.  Mention of trade names or
commercial products does not constitute endorsement or recommendation for use.

-------
                                           CONTENTS
1  INTRODUCTION

2  DESIGN STRATEGY AND CHARACTERISTICS
    2.1  Strategic Approach
    2.2  EMAP Bywords
    2.3  EMAP Capability
    2.4  Tier Structure
    2.5  An Emerging Perspective of EMAP

3  SAMPLING DESIGN
    3.1  Design Overview
    3.2  The Sampling Grid
       3.2.1 The Grid
       3.2.2 Geometric Design Criteria
       3.2.3 The Global Structure of the Design
       3.2.4 Baseline Grid Density and Augmentation
       3.2.5 Randomization
    3.3  The Sample
       3.3.1 Tier 1 Landscape Description
       3.3.2 Discrete and Extensive  Resources
       3.3.3 Sampling Considerations
       3.3.4 Tier 1 Sample of Extensive Resources
       3.3.5 Tier 1 Sample of Discrete Resources
       3.3.6 Augmenting the Grid
       3.3.7 Subpopulations of Interest
       3.3.8 Tier 2 Characterizations
    3.4  Interpenetrating Samples
       3.4.1 Interpenetrating Subsamples
       3.4.2 Ramping Up in the First Cycle
    3.5  Existing Inventories and Monitoring Programs
       3.5.1 Examples of Existing Programs

4  ESTIMATION  AND ANALYSIS
    4.1  Estimation and Description
       4.1.1 Formulae
       4.1.2 Verification
    4.2 Tier 1 Estimates
       4.2.1 Resource Inventory
       4.2.2 Discrete Resources
       4.2.3 Domains
       4.2.4 Domains and Strata
       4.2.5 Spatial Control of Tier  2 Selection
    4.3 Tier 2 Estimates and Descriptions
       4.3.1 Estimates
       4.3.2 Option 1
       4.3.3 Option 2
       4.3.4 Distribution Functions
       4.3.5 Distributions of Continuous Resources

-------
       4.3.6 Deconvolution
       4.3.7 Trends and the Interpenetrating Design
       4.3.8 Spatial Pattern
    4.4  Changes in Classification and Structure
       4.4.1 Reclassification
       4.4.2 Subpopulation Estimation
       4.4.3 Restratification
       4.4.4 Restructuring the Interpenetrating Sample
    4.5  Analysis of Associations
       4.5.1 Weighting
       4.5.2 Observational Data
       4.5.3 Structures
       4.5.4 Multivariate Analyses
       4.5.5 Subpopulations Revisited
5  REFERENCES

-------
                                           List of Figures
Figure 3-1  The base grid density placed advantageously on the United States
Figure 3-2  The truncated icosahedron model projects the familiar soccer ball tessellation pattern
            onto the earth
Figure 3-3  The structure of the EMAP grid, illustrating the four-fold decomposition that will be
            the basis of the interpenetrating design
Figure 4-1  When a resource is spatially restricted to a subregion, then an approximation to
            the boundary will identify the effective sample size for that resource
Figure 4-2  Overlapping domains make it difficult to impose a common spatial stratification
Figure 4-3  Partitioning a resource domain into compact clusters of points, with each cluster
            having the same size, Ew, provides structure that maintains spatial distribution of
            the Tier 2 sample
Figure 4-4  Examples of descriptive statistics used in the National Lake Survey that could also
            be used in EMAP
Figure 4-5  Examples of estimated distributions
Figure 4-6  An unrestricted random sample will give a binomial random variable for the number
            of points in the enclosed area in each figure
                                                  List of Tables
Table 3-1   Schematic of the rotating interpenetrating design prescribed for EMAP
Table 4-1   Tier 1 inventory  estimates and estimated variance of estimates are provided
            for all resources and classes of resources

-------
                                    ACKNOWLEDGEMENTS
The authors sincerely thank the many scientists who  have contributed, directly or indirectly, to this
document  through workshops,  informal  discussions, and reviews.   We appreciate the editorial and
technical assistance provided by Penny Kellar of Killkelly Environmental Associates and by Perry Suk,
who have edited several previous drafts.

-------
                                           ABSTRACT
The EMAP design was developed with the following considerations:
    •  Consistent representation of environmental reality by use of a probability sample
    •  Potential representation of all resources and environmental entities
    •  Capacity for quick response to a new question or issue
    •  Spatial distribution of the sample according to the distribution of the resource

These considerations have been met by prescribing a triangular sampling grid with approximately 27-km
spacing, with a 40-km² hexagon (40-hex) centered on each grid point to supply the sample
representation of resource space. Inventory of each 40-hex provides the Tier 1 sample. The sample grid
thus provides a one-sixteenth probability sample of the resource area. The Tier 2 sample is a subsample
of resource sites in the sample hexagons; these provide the detailed monitoring data. This double
sample provides the monitoring data for characterization of status and trends of the various resources.
This document provides an overview of the EMAP sampling design and grid framework, along with a
discussion of the statistical estimation and analysis procedures.

-------
FOREWORD
The Environmental Monitoring and Assessment Program (EMAP) is an Office of Research and
Development program whose goal is to monitor the condition of the nation's ecological resources. The
goal poses a challenge that cannot be met without a long-term commitment to environmental
monitoring on national and regional scales. One component essential to the design of EMAP is the
statistical design and evaluation of integrated statistical monitoring frameworks and protocols for
collecting data on indicators of ecological condition. A series of technical reports is being prepared on
the development of the EMAP statistical sampling design.

The first report on the sampling design, "Design Report for EMAP," gives the conceptual basis, or
framework, for the development and implementation of the EMAP statistical sampling design.
Ecological researchers associated with the EMAP collaborated with statistical researchers to identify
and clarify the monitoring program's objectives and complete the problem identification. Based on
their extensive experience with ecological monitoring and statistical sampling design, the authors
developed criteria that the design had to satisfy to meet the EMAP objectives. The report gives the
long-term design perspective, or prescription, for EMAP. That is, it gives the vision for the statistical
sampling design, with the vision based on the experience of the EMAP researchers and preliminary
assessments of what might be implementable. These researchers are the primary audience for the
report. The report documents the design concept as it existed in May, 1990.

Since implementation of the design is a continuing process, the "Design Report for EMAP" is not a
description of the final sampling design as implemented by the EMAP. The ecological resource groups
are actively implementing the design concept. To aid the implementation process, a companion report,
"EMAP Sampling Design Implementation Perspective and Issues," gives additional insight into the
EMAP statistical sampling design process. The companion report gives further details on
implementation issues and translates the general design perspective into the context of specific
ecological resources.
Anthony R. Olsen, Project Officer
U. S. Environmental Protection Agency
Environmental Research Laboratory
200 SW 35th Street, Corvallis, Oregon

-------
                                            SECTION 1
                                         INTRODUCTION
The  need  to establish  baseline environmental conditions  against  which future  changes can be
documented with confidence has grown more acute with the increasing  complexity,  scale,  and social
importance of issues  such as global atmospheric change, acidic deposition, and the destruction and
alteration  of wetlands.  It is  therefore critical for monitoring  programs  to  be in place to provide
quantitative,  scientific assessments of the complex effects of pollutants on ecosystems.

In  1988,  the  U.S. Environmental Protection Agency's (EPA) Science Advisory  Board recommended
implementation  of a  program within  EPA  to monitor  ecological  status  and trends,  as well as
development  of  innovative methods for anticipating emerging  environmental problems before they
reach crisis proportions.  In response to the need for better assessments of the condition of the nation's
ecological  resources,  EPA's Office of Research and Development began  planning the Environmental
Monitoring and  Assessment Program  (EMAP). When  fully  implemented,  EMAP  will  be able to
confirm  that the nation's efforts to protect the environment are producing  the expected results in
maintaining and improving environmental quality.

EMAP provides  a strategic approach for meeting the growing need to identify and bound the extent,
magnitude, and  location of degradation or  improvement in environmental condition.   When fully
implemented, EMAP will answer critical  questions for  policy- and  decision-makers  and  the public:
What  is the  current extent of  our ecological resources (e.g.,  estuaries, lakes, streams,  forests, deserts,
wetlands, grasslands), and  how are they distributed  geographically? What percentages of the resources
appear to be adversely affected  by  pollutants  or other  human-induced  environmental stress, and in
which  regions are the problems most severe or widespread?  Which resources are degrading, where, and
at what rate? What are the relative patterns and magnitudes of the possible causes of adverse effects?
Do adversely  affected ecosystems show an overall improvement?

In order to answer these questions, an integrated monitoring network will be implemented within
EMAP with the following objectives:

     • Estimate current status, extent, changes, and trends in indicators of the
         condition of the nation's ecological resources on a regional basis with known
         confidence.

     • Monitor  indicators of pollutant exposure and habitat condition  and seek
         associations between human-induced stresses and ecological condition.

     • Provide periodic statistical summaries and interpretive reports on ecological
         status and trends to resource managers and the public.

Assessments of whether  the condition of the nation's ecological resources is improving  or  degrading
require data on large geographic  scales  and over long time frames.  For  this reason, comparability of
data among geographic regions and over extended time periods is critical  for EMAP, and meeting this
need by  simply  aggregating data from many individual, local, and  short-term networks has proven
difficult if not impossible.  EMAP networks therefore will be designed  to  provide statistically unbiased
estimates of  status,  trends, and relationships  with quantifiable confidence limits on national  and
regional scales over periods of years  to  decades.  This characteristic, along with its statistically  based
design, distinguishes EMAP from  most current monitoring efforts.

-------
EMAP is being designed around six primary activities:

     •   Strategic evaluation, testing, and development of indicators of ecological
         condition and pollutant exposure.

     •   Design and evaluation of an integrated monitoring framework.

     •   Design and testing of protocols for collecting data on indicators.

     •   Nationwide characterization of the extent and location of ecological resources.

     •   Demonstration studies and implementation of integrated sampling designs.

     •   Development of data handling, quality assurance, and statistical analytical
         procedures for efficient analysis and reporting of status and trends data.


This  design report focuses on the second  activity, design and  evaluation of  integrated statistical
monitoring frameworks and protocols.   The document provides a description of  the design objectives
and characteristics (Section 2), an overview of the sampling design and grid framework (Section 3), and
a discussion of the statistical estimation and analysis procedures (Section 4).

-------
SECTION 2
DESIGN OBJECTIVES AND STRATEGY

The overall EMAP design strategy is to implement a permanent national sampling framework that will
enable EMAP/EPA to meet its program objectives. To guide the development of the design, specific
design objectives have been formulated that will allow the resulting monitoring program to fulfill this
goal.

EMAP Design Objectives

* The EMAP design will establish a monitoring program that is capable of
• providing rigorous answers regarding any explicit question on the status and
condition of any regionally defined resource;

• providing baseline data leading to rigorous detection and description of
trends in status and condition of regionally defined resources;

• providing assessment of association among attributes, both within and
between resources;

• quickly responding to new issues and questions.

* This design will be implemented with respect to currently identified resources.

2.1 STRATEGIC APPROACH

The design strategy chosen to meet these objectives is based on the grid of points at which each
ecological resource will be sampled. The area around each point will be characterized by ecological and
land use criteria. The areal extent and numbers of units of the various resource types will be described
in a carefully defined area. The collection of these descriptions at the grid points will constitute a Tier
1 sample and will be used to estimate the structural properties of the regional and national populations
of the resource types. The structural properties include the numbers of resource units, their surface
area, and other geometric or geophysical measures obtainable from remotely sensed imagery.

A subsample of this Tier 1 sample will be used for field sampling of other attributes of the resources.
This double sample will constitute Tier 2 of the design. Field measurements of many attributes will
be taken on the resource units selected in the Tier 2 sample, for example, chemical analyses of water
samples, visual symptoms of foliar damage to forests, species composition of wetlands, and other
indicators of environmental well-being.

The efficient statistical properties of estimates made from the double sample derive in part from the
extensiveness of the Tier 1 sample and the relevance of the information in the Tier 1 sample to the
data in the Tier 2 sample. But a major advantage of this design strategy is that the Tier 1
characterization of areas around the grid points is not restricted to the resources identified a priori.
Other resources that can be identified in the characterization may be sampled at a later time, in
response to a new issue regarding their condition. The primary design device for implementing this
adaptable capability is the sampling grid.

-------
2.2 EMAP BYWORDS

In recognition of these demands, it is accepted that the EMAP design must be:

• Adaptive. The program must be able to adapt to changing circumstances,
perspectives, objectives, and knowledge, and to correct past mistakes.

• Simple. Simplicity minimizes the difficulties and consequences of adaptive
change.

• Rigorous. Rigor with respect to important considerations will be
established by explicit protocols, and by a program of quality management.

• Robust. Robust methods will get priority over those dependent on
assumptions, and effort will be made to ensure robustness of results
relative to necessary assumptions.

• Unambiguous. The resources described by EMAP will be explicitly defined,
either conceptually or by identification of an explicit frame.

• Flexible. The protocols and design structures will be flexible and capable of
molding to a wide variety of resource and time/space patterns, and accommodating
existing resource sample designs.

• Adequate. The need to address multiple and changing objectives rules out
optimization; adequacy for all objectives must be a guiding criterion.

Other programmatic components must also fulfill these general criteria and uphold the objectives.
Specifically, identification of indicators, activities of field data collection, data management, data
analysis, quality assessment, and quality management must all uphold the standards set for EMAP.
The best design will be useless if it is not translated into worthy operational protocols.

2.3 EMAP CAPABILITY

EMAP is oriented to regional phenomena. The capacity to describe regional status and trends is
dependent on the capacity to measure status and trends at sites or on resource units, but the emphasis
on regional phenomena focuses EMAP descriptors on estimates for explicitly defined populations and
on representation of pattern for spatial regions and temporal frame. The use of probability samples on
well-defined sampling frames will permit rigorous, unambiguous estimation and assessment of
population attributes. Statistical methods based on explicit models will provide other rigorous
inferences, including spatial and temporal patterns and association among attributes and resources.

The focus on regional phenomena requires that the EMAP design strategy emphasize accommodation
of a wide range of resources and a wide range of explicit questions. The grid-sampling process provides
the capability of sampling any spatially distributed and well-defined resource. Application of the design
to the currently identified resources ensures that the currently identified rules for selection will be broad
and versatile. Randomization of the grid provides the protocol that generates a probability sample and
ensures the desired rigor of population characterization. The triangular structure of the grid provides
minimum distance between grid points and an additional degree of freedom from alignment with
regular anthropogenic forms. The hierarchically structured grid provides the capability of describing
resources at coarser resolution than the base grid, or locally enhancing the grid to sample other
resources requiring higher resolution. Thus there will be a general capability for providing information
on multiple spatial scales: global, national, regional, and local.


-------
EMAP cannot be designed to optimize the answer to any single question or the representation of any
single resource. EMAP must provide adequate answers for all questions and resources within the basic
framework and adequate flexibility to accommodate increased resolution for specific issues. It is
anticipated that the greatest interest in ecological resources is at the subpopulation level, e.g., in
specific types of wetlands rather than all wetlands. The design strategy should therefore incorporate a
classification scheme that identifies these subpopulations and a sampling structure that provides
estimates of status and quality for them. The flexibility to adequately characterize subpopulations
spanning a wide range of densities and distributions is an essential property of the EMAP design.

The probability sample will provide the basis for rigorous estimation and population characterization.
However, good scientific descriptions will also require good indicator characterization of sites and
resource units. In general, indicator measurement on sample resource units will not be based on
probability samples, so it will be necessary to provide standards of objectivity and representation.
Further, inferences of population characteristics will be accompanied by quantified statements of
statistical uncertainty that acknowledge all relevant sources of statistical variability.

The establishment of cause-and-effect relationships among stressor, exposure, and response indicators is
not a design objective for EMAP. However, the EMAP design must accommodate the planned
collection of information to investigate associations between known or suspected cause-and-response
ecological indicators. The design must be flexible enough to enable the evaluation of such associations
both within and among resources. The EMAP design should also provide for limited diagnostic
capability via correlations between condition and potential causative factors, thereby identifying areas
for further examination or research.

It is anticipated that there will be changes in resource definition and classification as EMAP progresses.
The EMAP design must be adaptable to accommodate changes or refinements in resource definition or
classification. In addition, implementation of the changes should be without substantial alteration of
the sample or its ability to meet the other EMAP objectives. As an example, the design should
accommodate the identification and correction of errors in the initial classification that will accrue
through the accumulation of improved information developed by the EMAP monitoring tasks.

Substantial amounts of information similar to that collected by EMAP exist in data sets generated by
other monitoring networks, and the EMAP design should be capable of integrating and using such
information. It is anticipated that integration can best be achieved when the other network is based on
a rigorous statistical foundation. However, it will also be necessary for the EMAP design to establish
linkages to networks that use nonprobability sampling designs. The primary advantage of such
associations will be to expand the temporal and spatial resolution of EMAP-derived information and to
provide a common framework for reporting environmental monitoring data.

2.4 TIER STRUCTURE

The EMAP monitoring strategy is based on a hierarchical structure, in which several distinct tiers are
identified.

1. Tier 1.
The grid (Section 3.2) provides the structure for the Tier 1 sample, and the Landscape
Descriptions (Section 3.3.1) provide the data for that sample. Because the individual resources
are not necessarily represented by explicit frames, and because the design is intended to be
capable of sampling any resource, including unspecified ones, a device like the grid is essential.
Even in sampling an explicit frame, the grid is a good way to select a sample to fulfill the
EMAP objectives. The data generated at this tier provide the estimates of regional structural
parameters for the defined resource frames.

-------
2. Tier 2.
This represents the level of higher resolution characterization of specific resources. The sample at
this level provides resource units selected explicitly for field visitation in a manner to best
respond to questions regarding status and trends of the specific resources.

3. Tiers 3 and 4.
These levels are not distinct in the thoughts of planners at this time. Collection of field data on
individual units, high resolution study of trends, process studies and other functions are all
lumped together. As the EMAP planning process proceeds to the later stages, attention will be
given to organizing these topics and making the roles of Tiers 3 and 4 explicit. The original
interpretation of Tier 3 as the level of annual measurements for trend detection has been greatly
modified by development of the interpenetrating design (Section 3.4), which has moved trend
detection into Tier 2, and by investigation into the required frequency of resampling.

2.5 AN EMERGING PERSPECTIVE OF EMAP

As we have progressed in EMAP design, and as the basic design issues are resolved and more subtle
issues emerge, a clearer perspective of the capability and potential of EMAP monitoring has begun to
take form. One aspect of this perception derives from our investigation of the need for annual
observations on some systems. The fundamental tradeoff is between annual observations on n systems,
or observations every k years on kn systems. As we investigated this tradeoff, the perception grew
increasingly strong that there are many advantages to a broader and more comprehensive coverage of
the populations that we want to characterize. The perspective that remains is that EMAP, in the
design proposed, has the capability of discovering trends and insult in subtly defined and spatially
diffuse populations that are not likely to be recognized without a design like that proposed for EMAP.
This capability is enhanced by increasing the number of systems periodically monitored by EMAP.
Less frequent visitation is one way of gaining increased numbers of systems.
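
The tradeoff described above can be illustrated with a small sketch. It assumes a fixed budget of n site visits per year and a simple rotation in which each system is revisited every k years, so that kn distinct systems are covered. The function name and the values n = 50 and k = 4 are hypothetical; the rotation actually prescribed for EMAP is the interpenetrating design of Section 3.4, which may differ in detail.

```python
def revisit_schedule(n_per_year, cycle_years, n_years):
    """Map each year to the system ids visited under a simple k-year rotation.

    Illustrative assumption: systems are numbered 0..k*n-1 and system s
    belongs to panel s % k; panel p is visited in years where year % k == p.
    """
    n_systems = n_per_year * cycle_years
    schedule = {}
    for year in range(n_years):
        panel = year % cycle_years
        schedule[year] = [s for s in range(n_systems) if s % cycle_years == panel]
    return schedule


# Hypothetical budget: 50 visits per year, each system revisited every 4 years.
schedule = revisit_schedule(n_per_year=50, cycle_years=4, n_years=8)
distinct_systems = {s for visited in schedule.values() for s in visited}
print(len(schedule[0]), "visits per year,", len(distinct_systems), "systems covered")
```

With the same annual effort, annual observation (k = 1) would cover only n systems, whereas the rotation covers kn systems at the cost of k years between visits to any one system.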

Another aspect is that EMAP is capable of providing information not available in other monitoring
programs, and of putting that information into a more useful and integrated form. EMAP design
objectives should be oriented toward these novel capabilities, and not toward capabilities that already
exist elsewhere. Strong trends and flagrant insult will be detected without EMAP, so their detection is
not a necessary design objective.

Finally, we perceive that we have just begun to plumb the potential and general capability of the
EMAP concept. It is important to think of EMAP as an evolutionary system that will evolve as EMAP
steersmen exploit its adaptability in responding to new and newly perceived issues. In order to remain
vital, and not succumb to the aging process, EMAP requires a structure, which might justifiably be
thought of in Wienerian terms as cybernetic, that uses that adaptability to fullest advantage. At this
point it will suffice to focus on the EMAP capability that does not reside elsewhere, and think deeply
about those things that need EMAP in order to be discovered and described. These are the things that
should establish EMAP directions.

-------
SECTION 3
SAMPLING DESIGN
3.1 DESIGN OVERVIEW
The resources of the United States are sampled via a triangular point grid. This grid identifies
approximately 12,600 locations at which ecological resources will be cataloged, classified, and studied
with respect to condition and status and to trends in condition and status. Most of this descriptive
effort will be focused on the areas immediately surrounding the grid points. These activities are divided
into two tiers, the first directed toward information that essentially can be collected using remote
sensing (e.g., aerial photography, Landsat images, maps, and other existing information) and the
second directed largely toward more intensive data collection at sites selected to represent specific
resources.

At Tier 1, the area surrounding the grid points will be characterized. It is convenient to think of these
characterizations as consisting of two separate data sets, although one is really a subset of the other.
Specifically, landscape descriptions of the 40-km² hexagon centered on each grid point provide the base
Tier 1 dataset. These hexagon (40-hex) descriptions constitute a probability area sample of the United
States. From these descriptions will be generated regional estimates of areal extent of all ecological and
landscape classifications and regional estimates of numbers of entities of all discrete ecological objects
of interest, such as lakes, stream reaches, or prairie potholes. Any classification available at this level
carries into the structure of these estimates.
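
As a rough illustration of how a regional estimate of areal extent could be formed from the 40-hex descriptions, the sketch below applies a simple expansion (Horvitz-Thompson type) estimator. It assumes that every location has the one-sixteenth inclusion probability given in the Abstract. The function name and the wetland areas are hypothetical, and the estimators actually proposed for EMAP are those discussed in Section 4, which may differ.

```python
def expansion_estimate(sample_values, inclusion_probability):
    """Estimate a regional total by weighting each observation by 1/probability."""
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return sum(sample_values) / inclusion_probability


# Hypothetical wetland area (km2) recorded in five sampled 40-hexes.
wetland_km2 = [3.2, 0.0, 11.5, 0.7, 6.1]

# Assumption: each 40-hex represents 16 times its own area.
estimated_total_km2 = expansion_estimate(wetland_km2, 1 / 16)
print(f"estimated regional wetland area: {estimated_total_km2:.1f} km2")
```

Hexagons that contain none of the resource contribute zero but remain part of the sample; dropping them would bias the estimate upward.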

This Tier 1, 40-hex description also provides a sample from any explicitly defined resource frame, such
as the frame of lakes between 10 and 2,000 ha in the United States, or the frame of primary sampling
units in the National Agricultural Statistics Service (NASS) area sample. In this capacity, the
landscape descriptions supply the Phase I sample for each of the resources identified to be sampled in
the current implementation of EMAP. This function requires only a partial completion of the full
landscape descriptions. A feasible implementation strategy generates the Tier 1 sample for the various
resources and then completes the rest of the landscape descriptions at a later time. But if the landscape
descriptions are completed before any of the Tier 1 resource samples are identified, the latter will be
included in the descriptions.

This perspective makes clear the most powerful property of this design: the generation of Tier 1
samples of yet unidentified resource frames. Just as the landscape descriptions provide a sample for any
identified frame, they also provide samples for frames that have not yet been defined, as long as the
specification of landscape attributes is general enough to include the frame identifying attributes. This
is justification for the great generality of the landscape descriptions, providing the capacity for rapid
response to novel questions.

A Tier 2 resource sample will be a subset of the Tier 1 sample for that resource, selected by probability
methods for more intensive data collection, and typically involving site visits. The data collected at
Tier 2 will be the basis for reporting on regional status and trends in indicators of ecological response
and pollutant exposure, as well as other attributes recorded at this level. Unless there is a specific
question regarding co-response of two resources, the Tier 2 resource samples will be independent.

Rare resources will provide a small Tier 1 sample that is incapable of supporting a sufficiently large
Tier 2 sample. For such cases, the grid will be enhanced by one of several available enhancement
factors, solely for this resource, and solely in the necessary area to cover this resource. The enhanced
grid will also be regular and triangular.

The characterization of one-sixteenth of the total area of the United States will provide baseline
descriptions of the ecological resources of interest, and a general landscape basis for characterizing
landscape change over time. Specific resource samples extracted from the general descriptions provide
the basis for describing status and detecting trends relative to that resource. The universality of the
grid and the generality of the descriptions ensure that any resource is sampled by the design, and that
a Tier 2 sample can be constructed for any newly identified resource with a minimum of effort and
disruption.

3.2 THE SAMPLING GRID

3.2.1 The Grid

The point grid established for the United States is triangular, with approximately 27 km between
points in each direction. A fixed position that represents a permanent location for the base grid is
established (Figure 3-1), and the sampling points to be used by EMAP are generated by a slight
random shift of the entire grid from this base location. The full descriptions and rationale for the
cartography and geometry of the grid are given in White et al. (1991).

3.2.2 Geometric Design Criteria

The design objectives were translated into a set of geometric specifications required for the grid: (1)
equal area sampling structure using a regular placement of sampling locations, (2) compact
arrangement of sampling locations, (3) hierarchical structure, and (4) a realization of the grid on a
single planar surface for the entire domain of application.

Regularity in the planar projection is required for the randomization step prescribed for EMAP. A
regular placement of points on the plane can be achieved either with a square or a triangular lattice.
Hierarchical decompositions are available in both square and triangular spacing.

The triangular structure is slightly preferable to the square in compactness, evenness of spatial
coverage, and precision of spatial estimates (cf. Olea, 1984). The square structure has the advantage
of being implemented in many computer hardware devices and software data structures. The triangular
arrangement has the advantage of an extra degree of freedom from coincidence with anthropogenic
structures such as public land survey lines. The balance is slightly in favor of the triangular
arrangement, and EMAP will use a regular triangular grid extending without interruption across the
entire domain.

3.2.3 The Global Structure of the Design

Although the initial EMAP objective was the conterminous United States, with potential extension to
Alaska and other areas, there was from the beginning a perspective that the design for the United
States should be considered in the context of whatever global designs were in existence. Some global
models do exist, with varying properties and purposes, but none fit the design criteria set for EMAP.

These criteria for the United States alone are met by the standard map projection for the conterminous
United States, the Albers projection. This projection has the equivalent, or equal area, property (the
ratio of two areas on the sphere to the corresponding areas on the projection is constant) and a
relatively low degree of scale distortion (resulting in low distortion of shape and distance). Scale
distortion is within 1% for most of the United States; it exceeds 1% only in the southern parts of
Florida and Texas. However, the distortion increases sharply in neighboring areas of Canada and
Mexico. The search was therefore extended in the hope of finding a projection that was better for
neighboring areas and that also extended to Alaska.

Figure 3-1.  The base grid placed advantageously on the United States.

Other projection systems in common use are also unsuitable for various reasons. The Universal
Transverse Mercator projection, for example, is neither equal area nor continuous across the
conterminous United States. It consists of 6-degree meridional zones. Consideration of a number of
proposed global tessellations¹, or other systematic treatments of areas on the globe, has resulted in
identification of one that provides good representation for the United States and neighboring regions,
including Alaska, and can be used adaptively for other parts of the world. This tessellation is based on
the semi-regular geometric solid named the truncated icosahedron, which is a modification of the
regular icosahedron, the 20-sided regular platonic solid.

The truncated icosahedron starts with the 20 equilateral triangular faces of the icosahedron and replaces each
vertex of this model with a regular pentagon. The resulting solid has 20 hexagonal faces and 12
pentagonal faces. This solid is the basis for the most common construction of the surface of a soccer
ball (Figure 3-2). Each hexagonal face of the truncated icosahedron may be considered a domain for
implementing the EMAP sampling grid. Neighboring plates can be contiguous along one edge, but will
not join with all adjacent plates in the plane. The regular grid on a hexagon will lose some regularity
when extended onto pentagons. Additionally, there is a minor sampling irregularity at the edges of two
plates.

One hexagonal face, when appropriately centered on the United States, covers the land area and part of
the adjacent continental shelf of the conterminous United States, southern Canada, and northern
Mexico. On this hexagonal face, an equal area map projection (Lambert's azimuthal equal area) was
constructed. This projection has a maximum scale distortion of approximately 1.1% at the center and
at the vertices of the hexagon.
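As an illustration of the projection named above, the following is a minimal sketch of the forward Lambert azimuthal equal-area projection on a sphere, using the standard textbook formulas. The sphere radius of 6,371 km and the projection center used in the example (40° N, 96° W) are assumptions chosen only for illustration; the report does not give the center of the EMAP hexagon here, and the function name is hypothetical.

```python
import math


def lambert_azimuthal_equal_area(lat_deg, lon_deg, lat0_deg, lon0_deg,
                                 radius_km=6371.0):
    """Forward spherical Lambert azimuthal equal-area projection.

    Returns planar (x, y) in km relative to the projection center
    (lat0_deg, lon0_deg). The radius is an assumed mean Earth radius.
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi0, lam0 = math.radians(lat0_deg), math.radians(lon0_deg)
    # Cosine of the angular distance c between the point and the center.
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    # Scaling term; undefined only at the point antipodal to the center.
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = radius_km * k * math.cos(phi) * math.sin(lam - lam0)
    y = radius_km * k * (math.cos(phi0) * math.sin(phi)
                         - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y


# Hypothetical center, for illustration only.
CENTER = (40.0, -96.0)
x, y = lambert_azimuthal_equal_area(50.0, -96.0, *CENTER)
print(round(x, 3), round(y, 1))
```

A point at angular distance c from the center is placed at planar distance 2R sin(c/2), which is what makes the projection equal area; the example point lies 10° due north of the assumed center.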

As a global design, this configuration has some disadvantages. A polar orientation does not exist that
is satisfactory for the primary objectives of EMAP, and the orientation chosen for EMAP provides poor
representation for a number of areas on earth. However, it is possible to generate plates for the several
areas, more or less independently of each other, using the general approach adopted for EMAP. An
adaptive design based on this model will provide the needed representation in any area.

3.2.4 Baseline Grid Density and Augmentation

The hexagon covering the United States has been chosen as a plate from the truncated icosahedron and
measures approximately 2,599 km on each side (Figure 3-1). Each side of this hexagon and each radius
connecting the center with a vertex is divided into 96 equal parts. A triangular grid is then
constructed by connecting the endpoints of the 96 parts along each of the three axes of the six
equilateral triangles making up the hexagon. The endpoints plus the intersections of the constructed
grid form a triangular grid having 27,937 points in the full hexagon. The distance between the points
on this grid is approximately 27.1 km.
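The point count and spacing quoted above can be checked with a short calculation. The sketch below assumes only the figures given in the text (a side of 2,599 km divided into 96 parts) and counts the lattice points ring by ring around the center of the hexagon; the function name is illustrative.

```python
def hexagon_grid_point_count(divisions):
    """Points of a triangular lattice inside a regular hexagon whose sides
    are each cut into `divisions` equal parts.

    Ring k around the center (k = 1..divisions) holds 6*k points,
    and the center itself adds one more.
    """
    return 1 + sum(6 * k for k in range(1, divisions + 1))


SIDE_KM = 2599.0    # hexagon side length quoted in the text
DIVISIONS = 96      # equal parts per side

points = hexagon_grid_point_count(DIVISIONS)
spacing_km = SIDE_KM / DIVISIONS
print(points, round(spacing_km, 1))   # prints: 27937 27.1
```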

The Dirichlet (Thiessen, Voronoi) polygons constructed on the grid points are hexagons. These
hexagons tile the plane and have an area of approximately 634.5 km². Characterization hexagons of
1/16th this size (approximately 39.7 km²) will be centered on the grid points. There will be
approximately 12,600 grid points in the conterminous United States, thereby establishing the grid
density for baseline sampling in EMAP.
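The areas quoted above follow from the grid spacing. As a rough check, the hexagonal cell around each point of a triangular lattice with spacing d has area (√3/2)d². The sketch below uses the unrounded spacing 2,599/96 km, so it reproduces the quoted 634.5 km² only approximately; the small difference presumably reflects rounding of the side length.

```python
import math


def voronoi_hexagon_area(spacing_km):
    """Area of the hexagonal cell around each point of a triangular lattice."""
    return math.sqrt(3.0) / 2.0 * spacing_km ** 2


spacing_km = 2599.0 / 96                 # about 27.1 km
cell_area_km2 = voronoi_hexagon_area(spacing_km)
characterization_area_km2 = cell_area_km2 / 16

print(round(cell_area_km2, 1), round(characterization_area_km2, 1))
```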

Figure 3-2.  The truncated icosahedron model projects the familiar soccer ball tessellation pattern onto
the earth.

¹ A tessellation is a pavement or tiling by a mosaic pattern.

Augmentation of the baseline grid to increase the density of sampling for certain rare resources can be
achieved through the hierarchical geometrical properties of triangular grids. In any specified region,
the grid can be enhanced by the insertion of additional locations in a systematic pattern that is a factor
of 3, 4, or 7 times that of the baseline density. Products and quotients of combinations of powers of
these factors will also provide enhancement factors. For the factor of 3, a new point is placed in the
center of each triangle defined by three adjacent points. For the factor of 4, a new point is placed in the
center of the line connecting each pair of adjacent points. Additional enhancements are described in
White et al. (1991).
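The factor-3 and factor-4 insertions can be verified by counting points in one repeating cell of the lattice. The sketch below is an illustration only: it works in lattice coordinates (u, v) along two basis vectors that meet at 60 degrees, where each repeating cell owns one grid point, two triangles, and three edges between adjacent points. The names are hypothetical, and the factor of 7 is not covered.

```python
from fractions import Fraction

# One repeating cell of the triangular lattice in lattice coordinates (u, v).
TRIANGLES = [((0, 0), (1, 0), (0, 1)), ((1, 0), (1, 1), (0, 1))]
EDGES = [((0, 0), (1, 0)), ((0, 0), (0, 1)), ((1, 0), (0, 1))]


def mean_point(points):
    """Mean of lattice points, wrapped back into the repeating cell."""
    n = len(points)
    u = Fraction(sum(p[0] for p in points), n) % 1
    v = Fraction(sum(p[1] for p in points), n) % 1
    return (u, v)


def enhanced_cell(factor):
    """Distinct points per repeating cell after enhancement."""
    cell = {(Fraction(0), Fraction(0))}               # the original grid point
    if factor == 3:
        cell |= {mean_point(t) for t in TRIANGLES}    # triangle centers
    elif factor == 4:
        cell |= {mean_point(e) for e in EDGES}        # edge midpoints
    else:
        raise ValueError("only factors 3 and 4 are sketched here")
    return cell


for factor in (3, 4):
    print(factor, len(enhanced_cell(factor)))
```

Each cell ends up with 3 or 4 points in place of 1, which is the stated increase in density.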

The hierarchical structure is also useful for identifying interpenetrating subsamples (Section 3.4). The
factor of 4 is prescribed, leading to 4 interpenetrating subsamples that are visited in successive years,
with a repeating cycle length of 4 years. However, 3, 7, and 9 are also feasible factors.

3.2.5 Randomization

The randomization of the grid will be achieved by a random shift in the plane of the entire grid of
points. As the design is systematic, a single random translation will be chosen and all points will
translate systematically. The restriction that randomization be a feature of EMAP is important, in
that this is the basis for the strict requirement of grid regularity. This restriction is also important in
that it provides the protocol that establishes the EMAP systematic sample as a probability sample.
Other ways could have been used to get a probability sample; this probability sample also provides the
advantages of a systematic sample.
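A single random translation of a regular grid might be sketched as follows. The text does not specify the distribution of the shift; this sketch assumes the shift is drawn uniformly over one repeating cell of the lattice, which is the usual way to randomize a systematic sample. The function name, the seed, and the small 3 by 3 patch of grid are illustrative assumptions.

```python
import math
import random


def shifted_triangular_grid(spacing_km, n_rows, n_cols, rng):
    """Return (shift, points) for a small patch of a triangular grid
    after one random translation applied to every point."""
    a1 = (spacing_km, 0.0)                                      # first basis vector
    a2 = (spacing_km / 2.0, spacing_km * math.sqrt(3.0) / 2.0)  # second, at 60 degrees
    # Assumed: one uniform draw over the repeating cell spanned by a1 and a2.
    u, v = rng.random(), rng.random()
    shift = (u * a1[0] + v * a2[0], u * a1[1] + v * a2[1])
    points = [
        (i * a1[0] + j * a2[0] + shift[0], i * a1[1] + j * a2[1] + shift[1])
        for i in range(n_cols)
        for j in range(n_rows)
    ]
    return shift, points


rng = random.Random(1990)   # fixed seed so the sketch is repeatable
shift, points = shifted_triangular_grid(27.1, 3, 3, rng)
print(len(points))
```

Because every point receives the same shift, the spacing and regularity of the grid are unchanged.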
3.3 THE SAMPLE

The Tiers (Section 2.4) impose the dominant sample structure, and many of the implementation tasks
are organized by this structure. The interpenetrating design structure is separable from the general
sampling considerations, and is discussed in Section 3.4.

3.3.1 Tier 1. Landscape Descriptions

One-sixteenth of the area of the United States will be characterized, primarily on the basis of remote
sensing data (e.g., aerial photography, Landsat images, maps, and other existing information). These
characterizations provide baseline descriptions of all ecological resources, identify resources and
populations of concern, provide a base for assessment of landuse change, and provide a Tier 1 sample of
all resources, from which a subsample will be taken for assessment of resource status and trends.

Landscape descriptions (LDs) will be made on hexagons (40-hexes) constructed and centered on the grid
points so that each has an area (39.7 km²) exactly one-sixteenth of the area per point in the base grid. Physical,
biological, organizational, and anthropogenic attributes will be described as part of the characterization
(for examples, see Section 5). These descriptions will be in the form of (1) maps of the areal extent of
the several landscape, landuse, and resource classes prescribed, (2) numbers, location, and classification
of discrete resource units, and (3) identification and measurement of a variety of attributes of landscape
and resource structure and organization.

The LDs have several functions with respect to fulfillment of EMAP objectives. As area samples, they
provide the data for estimation of certain regional resource and landscape parameters. They also
characterize the Tier 1 (or first phase or first stage) sample that will be subsampled for further
characterization of resource sampling units or sampling sites, leading to regional estimates of other
parameters of those specific resources. Additionally, these LDs must provide sufficiently general
descriptions of the ecological resources that it will be possible to identify, at a future date, other
resources for which new questions have arisen.

3.3.2 Discrete and Extensive Resources

Resources will be divided into two categories, depending on the size of the natural unit. If the natural
unit is less than 2,000 ha (roughly half of a characterization hexagon), it will be categorized as a
discrete resource. Most wetlands, lakes, and stream segments are discrete resources. If the natural unit
is 2,000 hectares or larger, it will be called an extensive resource. Many estuaries, forests, large lakes,
and large wetlands (e.g., the Everglades) are extensive resources. Large rivers will be a special class of
extensive resource. It is necessary to recognize these differences because the methods of sampling and
characterization are often different for the several forms. Sets (populations) of discrete units can be
characterized in terms of unit characteristics, whereas extensive resources will be represented on an
areal or other basis.

The class of extensive resources also contains a special subclass of interest, being the class of continuous
resources. The major examples of continuous resources are water and air, although many others may
have similar sampling properties.

These different classes of resources require somewhat different sampling strategies and different
descriptors, and these features enter into design considerations. We also note that some populations on,
say, Chesapeake Bay are continuous and others extensive, so that the designs must be very flexible.
The consolidating feature is that each resource is sampled in some manner in the neighborhood of each
point in the grid, unless the resource is not found in that neighborhood.

3.3.3 Sampling Considerations

The intent to use the landscape descriptions for estimation and further sampling puts certain
requirements on the nature of the records made. Each 40-hex will be partitioned into the areas
occupied by the various resource and land use classes. This requires digitization of the boundaries and
involves a certain degree of subjective interpretation of the exact location of many of the boundaries.
All uses of these classes must accommodate their fuzziness.

Identification of the numbers of resource units in a 40-hex requires that all resource units be
represented as "being" at a point. This location point can be arbitrary, if it was set without knowledge
of the grid location, as in the digital line graphs (DLGs). Otherwise it must be objective. One such
objective definition would be the centroid of the digitized unit boundary; another would be the centroid
of the tangential NSEW rectangle.
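The two objective definitions of a unit's location point can give different answers for the same boundary. The sketch below is an illustration only: it computes the area centroid of a digitized boundary with the standard shoelace formula, and the center of the bounding rectangle aligned north-south and east-west, which is how the "tangential NSEW rectangle" is read here. The L-shaped boundary is made up for the example.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon given as (x, y) vertices in order."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]     # wrap around to close the boundary
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def bounding_rectangle_center(vertices):
    """Center of the smallest axis-aligned rectangle touching the boundary."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


# A made-up L-shaped unit boundary, coordinates in km.
boundary = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
print(polygon_centroid(boundary))            # prints: (1.5, 1.0)
print(bounding_rectangle_center(boundary))   # prints: (2.0, 1.5)
```

Either rule is objective in the sense that it depends only on the digitized boundary and not on the position of the grid.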

The ways in which these records will be used in the sampling process require that certain care be taken
in generating these data in the beginning, and a certain reluctance in modifying them in the future. If
the digitized boundary of a unit is changed, it will not be necessary to change the original identity of
the point position of the unit. However, other changes will have greater impact on sample properties.
These are all considerations of the long-term adaptability of the design.

3.3.4 Tier 1 Sample of Extensive Resources

Extensive resources, e.g., lakes over 2,000 ha in area, can generally be sampled by a list frame.
However, these are also sampled in a meaningful collective manner by the 40-hexes in the area-
sampling mode, and otherwise by the point grid, in a point/spatial mode. Specifically, the area sample
that derives from the LDs will contain sample areas of large lakes, simply as a landscape classification.
These samples lead to estimates of areal extent of the class. Field observations to determine higher
resolution characteristics will lead to estimates of the population parameters for those characteristics.

Sample areas included in the 40-hexes can be the basis for location of quadrats, transects, or other
surrogates for discrete sampling units on which to make field records of indicators. Via the area-
sampling mode, samples selected in this manner can be carried into Tier 2 samples in the exact manner
of that prescribed for discrete resources.

For the special case of continuous resources, such as estuaries, the samples may be point samples. Point
samples will also arise in "plotless" methods, as are often used in forestry. Thus, for certain
noncontinuous, extensive resources, the locations making up the Tier 1 samples will be points in space
determining where observations will occur, in a manner similar to the point designs for continuous
resources.

It will often be of interest to characterize entire specific extensive resources, such as the Everglades or
Chesapeake Bay. In such cases, grid enhancement (Section 3.3.6) will often be a desirable option in
order to gain greater precision for spatial representation of pattern. The sampling plan for Near
Coastal, currently in review, contains explicit plans for monitoring such extensive resources, with
illustration of grid enhancement and other features of interest.

3.3.5 Tier 1 Sample of Discrete Units

Discrete resource units (e.g., entire lakes, entire wetlands) identified in the LD data will make up the
(full) Tier 1 sample for the respective resources. The set of materials derived from the LDs regarding
those resource units constitutes a significant part of the landscape descriptions for the various resources.
It is important to identify this as a subset of the LD data, but also important to recognize that this
particular dataset could be generated, in its entirety, before any of the rest of the LD data are
generated. These data are a subset of the whole, but not dependent on the whole except through
protocols of generation.

As a Tier 1 activity, each of the Resource Groups (Wetlands, Surface Waters, Great Lakes, Forests,
Agro-Ecosystems, Near Coastal, and Arid Ecosystems) will determine the specific classes of resources to
be distinguished in the LDs. There will be from few to many classes for each of the resources
represented by the several Groups. It is necessary to recognize that some of these classifications will
have significant misclassification error; such a classification would not make good stratification.

3.3.5.1 Tier 1 Resources

Also as a Tier 1 activity, each of the Groups will designate certain classes as Tier 1 Resources. The
decision regarding these designations does not have to be made prior to landscape description, but
rather can be based on the LD data. Any resource that is of particular interest, or disinterest; any
resource that is too rare or too spatially limited to get adequate representation; any resource that has
the potential of screening others of greater interest; any resource that requires special treatment for one
or another reason can be designated as a Tier 1 Resource. Tier 2 samples will be taken from the Tier 1
Resources in a manner that establishes them as strata.

3.3.5.2 The Process of Selecting the Tier 2 Sample

There are several steps between the generation of the full Tier 1 sample, as derived from the LDs, and
the final selection of the Tier 2 sample. For each Tier 1 Resource, identify the Tier 1 40-hexes in which
the resource is present and perform these two steps:

1. Select a single resource sampling unit at each grid point (40-hex).

2. Select a subset of 40-hexes.

This two-stage selection process can be performed in either order. As the rules of selection at each step
can be varied, there are a number of possible designs for the process. In particular, several different
rules (rules of association) have been proposed for associating a resource unit with a grid point:

1. Select the unit in the 40-hex closest to the grid point,
$\pi_i = a_i/634.5$, where $a_i$ is the area of selection of the unit.

2. Select the unit that contains the grid point,
$\pi_i = a_i/634.5$, where $a_i$ is the area of the unit.

3. Select a unit with equal probability from the set in the 40-hex,
$\pi_i = 1/(16 n_i)$, where $n_i$ is the number of resource units in the 40-hex.

(See Section 4 for statistical notation.)
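The inclusion probabilities attached to the three rules of association can be computed directly. The sketch below is illustrative: the constant 634.5 is the area per grid point in km² from Section 3.2.4, the factor 1/16 is the share of that area covered by a 40-hex, and the function names and example values are assumptions.

```python
AREA_PER_GRID_POINT_KM2 = 634.5   # hexagon area per grid point (Section 3.2.4)
HEX_FRACTION = 1.0 / 16.0         # a 40-hex covers 1/16 of that area


def inclusion_probability_by_area(area_km2):
    """Rules 1 and 2: probability proportional to an area, in km2.

    For rule 1 the area is the unit's area of selection;
    for rule 2 it is the area of the unit itself.
    """
    prob = area_km2 / AREA_PER_GRID_POINT_KM2
    if not 0.0 < prob <= 1.0:
        raise ValueError("area must be positive and no larger than 634.5 km2")
    return prob


def inclusion_probability_equal(n_units_in_hex):
    """Rule 3: one unit drawn at random from the n units in the 40-hex."""
    if n_units_in_hex < 1:
        raise ValueError("the 40-hex must contain at least one unit")
    return HEX_FRACTION / n_units_in_hex


print(round(inclusion_probability_by_area(20.0), 4))   # a 20 km2 unit
print(inclusion_probability_equal(5))                  # prints: 0.0125
```

Under rule 3 the probability depends only on how many units share the 40-hex, not on the size of the unit.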

3.3.5.3 Design Option 1

The first design considered did not depend on the LDs. Specifically, a rule of association was prescribed
that would select a resource sampling unit (RSU) at each grid point, and then this sample of RSUs
would be subsampled to yield the Tier 2 sample. As the utility of the LDs became more apparent, it
became clear that the third rule of association, above, was not best utilized with this option.

For each resource of interest (e.g., a specific type of wetland) and for each 40-hex within which the
resource occurs, one resource unit will be selected. This collection of sampling units was called the Tier
1 sample for that specific resource. (This usage is out of date; now the entire set of units in a 40-hex is
considered to constitute the Tier 1 sample.) The Tier 1 inclusion probability for the selected unit is
carried as part of the Tier 1 data record for that unit.

Recall that the Tier 1 sample for this resource now contains no more than one unit for each 40-hex,
and still contains all the 40-hexes that had the resource. The next step selects a subset of this sample of
units (or of the 40-hexes) to obtain the Tier 2 sample. Two criteria are specified here:

• Sample with conditional inclusion probability inversely proportional to
the Tier 1 inclusion probability.

• Sample in a manner that retains the spatial distribution of the Tier 1
sample.

Each of these criteria imposes constraints on the selection. The first imposes an upper limit to Tier 2
sample size, and must occasionally be relaxed in order to obtain the desired sample size. The second
imposes minimum sample size.
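The upper limit imposed by the first criterion can be seen in a small worked example. The sketch below assumes that "inversely proportional" means the conditional Tier 2 probability is c divided by the Tier 1 probability, with c set by the desired sample size. Because no conditional probability can exceed 1, the sample size is bounded. The function names and the four Tier 1 probabilities are made up for illustration.

```python
def max_tier2_size(tier1_probs):
    """Largest expected Tier 2 sample size that keeps every conditional
    inclusion probability at or below 1."""
    return min(tier1_probs) * sum(1.0 / p for p in tier1_probs)


def tier2_conditional_probabilities(tier1_probs, tier2_size):
    """Conditional probabilities proportional to 1 / (Tier 1 probability),
    scaled so that they sum to the desired Tier 2 sample size."""
    weights = [1.0 / p for p in tier1_probs]
    scale = tier2_size / sum(weights)
    conditional = [scale * w for w in weights]
    if max(conditional) > 1.0 + 1e-12:
        raise ValueError("requested Tier 2 sample size exceeds the upper limit")
    return conditional


tier1_probs = [0.01, 0.02, 0.04, 0.04]     # made-up Tier 1 probabilities
print(max_tier2_size(tier1_probs))
conditional = tier2_conditional_probabilities(tier1_probs, 2)
overall = [p1 * p2 for p1, p2 in zip(tier1_probs, conditional)]
print(conditional, overall)
```

The product of the two probabilities is the same for every unit, so the Tier 2 sample has equal overall inclusion probabilities whenever the criterion can be met without relaxation.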

Other options may differ statistically from this one, and there are other sequences of steps that will
produce an equivalent sample. The following alternative is of particular interest because it offers
adaptive advantages.

3.3.5.4 Design Option 2

First, employ the step reducing the number of 40-hexes, and then select an RSU from each of the
selected 40-hexes according to one of the rules of association. This switches the order of the two steps in
Section 3.3.5.2 from the order used in the first option (Section 3.3.5.3). There appears to be no
advantage to this arrangement when the first or second rule of association is used. However, it does
seem advantageous to use this order if the third rule is used, and the following example is developed
with the third rule.

Given the set of 40-hexes containing units of a particular Tier 1 resource, select a subsample of these
40-hexes with probability proportional to the number of units. Employ spatial constraints in the usual
manner. Then select a single unit from each of the selected 40-hexes with equal probability. The
resultant sample is identical to the sample derived from the first option, when that option uses the
association rule of a single random unit from each 40-hex. The difference is that the full set of units in
each 40-hex has been carried into the second stage of this selection process, and so will be available for
modification of the sample at a future time. (The comparison of the possibilities is not complete, but
recognition of this option puts a new light on the choice of selection rule.)

(A method of sampling that retains spatial distribution is outlined in Section 4.2.5.)
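A small calculation shows why this order of steps still gives every unit the same chance of selection. The sketch below is illustrative and ignores the spatial constraints: a 40-hex is selected with probability proportional to its number of units, and one unit is then drawn with equal probability, so the two factors cancel. All probabilities are conditional on the Tier 1 sample; the function name and unit counts are made up.

```python
from fractions import Fraction


def option2_probabilities(units_per_hex, n_hexes_selected):
    """Return (hex probability, unit probability) for each 40-hex."""
    total_units = sum(units_per_hex)
    results = []
    for n_units in units_per_hex:
        # Step 1: 40-hex selected with probability proportional to its units.
        p_hex = Fraction(n_hexes_selected * n_units, total_units)
        if p_hex > 1:
            raise ValueError("too many 40-hexes requested for these counts")
        # Step 2: one unit drawn with equal probability within the 40-hex.
        p_unit = p_hex * Fraction(1, n_units)
        results.append((p_hex, p_unit))
    return results


units_per_hex = [1, 2, 5, 12]      # made-up counts of units in four 40-hexes
for n_units, (p_hex, p_unit) in zip(units_per_hex,
                                    option2_probabilities(units_per_hex, 1)):
    print(n_units, p_hex, p_unit)
```

Every unit ends up with probability 1/20 in this example, regardless of how many units share its 40-hex.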

3.3.5.5 Wrap-up

Because of the way that these options relate to each other, and the way they compare, it is reasonable
to take the view that Option 1 should be used only with association rules 1 and 2, and Option 2 should
be used only with association rule 3. Subsequent treatment in this report will follow this pairing.

3.3.6 Augmenting the Grid

Certain resources may occur in relatively few of the 40-hexes. This will happen when the resource is
highly localized (e.g., redwood forest) or rare (e.g., some types of wetlands). One way to obtain a
larger Tier 1 sample with good spatial coverage is to increase the grid density over the area occupied by
the resource. The increased grid density will be accomplished by augmenting the original grid (keeping
the original grid points and adding others; see Section 3.2.4). The enhanced grid points will receive
landscape description only for the resource for which enhancement was made.

3.3.7 Subpopulations of Interest

The Tier 1 resources have been sampled as Tier 2 strata. Given a Tier 1 sample for a discrete resource,
generated in the manner of the previous section, certain subclasses may also be sufficiently important
to be considered as strata. Although it is possible to impose another level of stratification below the
Tier 1 resources, the current perception is that classes needing special consideration should be made
Tier 1 resources.

In considering this, it is necessary to be aware that any subclasses of a Tier 1 resource can be
characterized by subpopulation estimation. Further, the fewer strata, the better, unless there is clear
conflict among objectives that requires stratification for resolution. When the Tier 2 sample is visited,
additional classes will be identified, so that the resultant estimates can also reflect populations
constructed on any of these Tier 2 classes.

There is a natural hierarchical structure in these three levels of classification that is reinforced by the
fact that Tier 1 resources have been treated as strata. Thus all subclasses within Tier 1 resources are, in
a sense, nested within those strata. Further, if the subclasses identified at Tier 1 are used as sub-strata,
this imposes a second hierarchical level on the sample.

However, there is no need for the subpopulations identified by the classifications to be conceptually
hierarchical or mutually exclusive; a subpopulation of interest may interpenetrate two strata. In
technical jargon, the union of any set of classes, constructed as intersections of any classification, can
be a subpopulation of interest. But the disadvantage that accrues from combining population estimates
over strata is one of the major reasons for minimizing stratification.

3.3.8 Tier 2 Characterizations

Tier 2 is oriented toward collecting information at the Tier 2 RSUs or sites. Physical, chemical, and
biological measurements made at Tier 2 will form the basis for estimating regional and national
estimates of status, change, and trends, and for identifying additional subpopulations of interest. For
Tier 2 characterization, each resource group is evaluating, selecting, and developing measurements of
overall resource condition (response indicators), measurements related to pollutant or other exposure
(exposure indicators), and measurements of possible sources of exposure (stressor indicators).
Associations between these indicators will be of particular interest for studying possible causes of
change. Details of the EMAP approach to indicators are provided in Hunsaker and Carpenter (1990).

3.4 INTERPENETRATING SAMPLES

To complete the design specifications, it is necessary to address the issues of numbers of units and sites
to be included in the Tier 2 sample, and also the schedule of revisits, for repeating measurements on
the same units.

EMAP is designed both to describe current and ongoing status and to detect trends in a suite of
indicators. These two objectives have somewhat conflicting design criteria; status is ordinarily best
assessed by including as many population units as possible in the sample, while trend is ordinarily best
detected by repeatedly observing the same units over time. Meeting both objectives may require some
trade-off between the designs that are best suited for each objective. One mitigating factor is the simple
fact that for faint trends there is little value in observations on an annual basis; annual observations
are better suited to strong trends.

Other considerations also are involved in this design decision regarding the temporal/spatial pattern of
the monitoring activity. An early proposed option was the regional rotation. In this option, all Tier 2
sites in a region covering approximately one quarter of the United States would be visited in one year.
Sampling efforts would shift to other regions during successive years, with regions and sites being
revisited in a four-year rotation cycle. Regional blocking satisfies certain logistic concerns.

3.4.1 Interpenetrating Subsamples

The interpenetrating design was devised as an alternative to the regional rotation. This option proposes
to block the sample according to the fourfold decomposition of the grid into four interpenetrating
subsamples (Figure 3-3). This decomposition would apply to both the Tier 1 and the Tier 2 sample. In
the first year, the first interpenetrating Tier 2 sample would be implemented over the entire continental
United States. The second year, Tier 2 sites from the second interpenetrating subsample would be
visited, and so on (Table 3-1). In this manner, all of the Tier 2 sample sites would be visited during a
four-year period. A second cycle could begin in the fifth year with revisits to the first interpenetrating
subsample, and this pattern would continue indefinitely.
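The rotation described above can be written as a simple schedule. The sketch below is illustrative: it assigns each grid point to one of four subsamples by the parity of its two lattice indices, which is one way to realize a fourfold decomposition in which each subsample is itself a regular triangular grid with doubled spacing. The numbering of the subsamples is an arbitrary assumption, as are the function names.

```python
CYCLE_LENGTH = 4


def subsample_for_year(year):
    """Subsample (1..4) visited in a given program year (year 1 is the first)."""
    return (year - 1) % CYCLE_LENGTH + 1


def subsample_of_point(i, j):
    """Subsample (1..4) of the grid point with lattice indices (i, j).

    Points sharing the parity of both indices form a regular triangular
    grid with twice the baseline spacing. The numbering is arbitrary.
    """
    return 1 + (i % 2) + 2 * (j % 2)


def years_visited(subsample, last_year):
    """Program years up to last_year in which a subsample is visited."""
    return [y for y in range(1, last_year + 1)
            if subsample_for_year(y) == subsample]


for s in range(1, CYCLE_LENGTH + 1):
    print(s, years_visited(s, 13))
```

Over 13 years, the first subsample is visited four times (years 1, 5, 9, and 13) and each of the others three times.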

This approach ensures nearly uniform spatial coverage for each annual subsample. The four subsamples
"interpenetrate" the spatial structure, and whatever other structure exists, and provide annual
estimates of population parameters over every geographical region and over every identifiable
population, no matter how dispersed it might be. The Tier 1 subsamples are highly related. The degree
of relation between the Tier 2 subsamples depends in part on the manner of their selection and
especially on the degree of spatial control in that selection.

Some other rotating panel and partial replacement designs have been considered; our conclusion, based
on the preliminary results, is that the interpenetrating design is by far the most favorable of those
considered. However, there is an appreciable advantage to a low rate (~10%) of annual repeats during
the first cycle of the interpenetrating design. This advantage accrues from site pairing that eliminates
many sources of variation, but it is very interesting that a greater percentage of repeats, or carrying the
repeats beyond the first cycle, has little benefit. Once some sites have been revisited at the period of the
cycle, there appears to be negligible advantage to annual revisits.

The interpenetrating alternative is proposed for EMAP because of its estimation and reporting
advantages. During the first year, the interpenetrating sampling alternative provides national and
regional estimates of condition, with higher resolution estimates available as more sites are visited in
successive years. An important advantage of the interpenetrating sampling design is in the estimation
of regional and national trends through time (see Section 4). Generally, EMAP will focus on faint
trends that do not result in immediate catastrophic changes. Such trends require some time before the
3-11

-------
        o    •
     A— -o-
                            A    n
  --O—-•    O    •
-A    n
n
        o   m--—o—-m    o   m
     A    n    A    n    An
        o   m    o    m    o   m
     A    a    A    n    A    n
        o   •    o    •     o   •
4—-n—-4    o    A    n
  \        f
  b    *'-—G-—p    o    m
n   X   ti    A    n    A
            \   /
             \   f
   m    o   ¥   o    •    o
A   n    A    n    A    n
   o    •    o   m    o    m
n   A    n    A   n    A
   •    o    m   o    m  .  o
            Incorrect alignment
                                             Correct alignment
                                           O
Figure 3-3.  The structure of the EMAP grid, illustrating the fourfold decomposition that will be the
basis of the interpenetrating sample design.

Table 3-1.  Schematic of the rotating interpenetrating design prescribed for EMAP. The subsamples
are identified by the fourfold decomposition of the triangular grid, as in Figure 3-2.

             Year
Subsample    1    2    3    4    5    6    7    8    9    10   11   12   13
    1        X                   X                   X                   X
    2             X                   X                   X
    3                  X                   X                   X
    4                       X                   X                   X

cumulative change is detectable, and as great a population coverage as possible is needed in order to
isolate subpopulations that may respond differently than others. The interpenetrating design is well-
adapted to detecting persistent, gradual change of diffuse subpopulations and to accurately representing
the trajectory of status variables.
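The rotation shown schematically in Table 3-1 can be sketched in a few lines of Python. This is an illustration only, not the EMAP scheduling procedure: it assumes a four-year cycle in which subsample k is first visited in year k and revisited every fourth year thereafter, and the function name `rotation_schedule` is hypothetical.

```python
def rotation_schedule(n_subsamples=4, n_years=13):
    """Return {subsample: [years visited]} for a simple rotating design.

    Assumption (not taken from the report): subsample k is visited in
    year k and then every n_subsamples years.
    """
    return {
        k: list(range(k, n_years + 1, n_subsamples))
        for k in range(1, n_subsamples + 1)
    }


schedule = rotation_schedule()
for k, years in schedule.items():
    # One row per subsample, "X" in each visited year
    row = " ".join("X" if year in years else "." for year in range(1, 14))
    print(k, row)
```

Under this assumption every year has exactly one subsample in the field, and each site is revisited at the period of the cycle.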

3.4.2 Ramping Up in the First Cycle

The logistics of implementing a new monitoring program apparently dictate that some form of "ramp-
up" strategy be employed. This issue is still being investigated, and several options for ramping up to
full EMAP monitoring effort are being discussed. Ramping-up involves the scheduling of the landscape
description part of the Tier 1 sample selection, as well as training and deployment of field crews and
the logistics of field data collection. Additionally, the various options involve statistical issues that
have significant bearing on the choice. It seems clear that some kind of ramping is required, but just
how this will be accomplished is yet to be determined.

The design-preferred (as opposed to logistics-preferred) implementation of the interpenetrating design is
completion of characterization and selection of Tier 1 sites for the first interpenetrating subsample
before initiating Tier 2 field work. The same restriction would apply to the second grid subsample, and
to the third, etc. The option of regional ramping at the Tier 2 field level creates a number of design
difficulties and should not be entered into without full awareness of the statistical consequences.

One ramped option that has not been given sufficient consideration is based on resources and
indicators. Any of the ramped options involve ramping on resources and indicators, and this is one of
the attractions. It might be feasible to implement the full interpenetrating sample, but with a reduced
field load, by judicious selection of indicators. This option would bring some populations and estimates
on line later than others, but would not disrupt the selection and estimation protocols.

Although supplementing the interpenetrating design with a few annual repeats during the first cycle
appears to be desirable, the situation does not look favorable for such a plan. If the basic effort cannot
be mustered to jump into the full interpenetrating design, supplements appear to be out of the
question.

3.5 EXISTING INVENTORIES AND MONITORING PROGRAMS

Existing inventories and monitoring programs can be useful to EMAP in several ways. They can
provide background and historical information of interest to EMAP. They can provide estimates that
fulfill the objectives of EMAP, either completely, or for specific subpopulations. They can provide an
explicit frame that covers EMAP resources, either completely, or in part.

In some cases, EMAP objectives will be fulfilled by extracting information from the reports generated
by existing networks. In others, cooperative agreements may provide for additional data collection so
that the modified network better represents EMAP objectives. In still others, fully cooperative
monitoring programs may be devised to fulfill the objectives of each program.

The EMAP grid is capable of sampling the frame of an existing network or of sampling the monitoring
sites of that network. This capability provides for a variety of supplementary or verification designs in
interfacing the existing network with the EMAP monitoring programs. At the extreme, EMAP could
have an independent monitoring net on the existing frame. The possibilities are many.

The EMAP sampling grid provides both remotely sensed descriptions of resources and a convenient
(conceptual) frame for sampling at Tier 2. For certain EMAP resource groups, rigorous probability
monitoring programs may already be in existence that, with some supplementation, could replace the
need for Tier 2 sampling that is based on the EMAP sampling grid.

Use of existing frames, or nets, requires certification of coverage of the existing frame for the EMAP
resource. If the existing frame incompletely covers the resource, then it is necessary to supplement that
frame to provide coverage of the complementary subset of the resource and to implement a sample on
this frame supplement. Such a supplementary sample may be required in association with the use of
any existing frame, in whatever capacity.

3.5.1 Examples of Existing Programs

The National Surface Water Survey (NSWS) provides valuable background information on the
susceptibility of lakes and streams to acidic deposition for specific target populations. These surveys
covered only parts of the general surface water resource, but the methodology and frames are of
interest. Many of the sampling and estimating protocols to be used in EMAP were initiated in the
NSWS. Probability samples were used throughout.

The USDA Forest Service maintains an ongoing national forest monitoring program, the Forest
Inventory and Analysis (FIA) program. Additionally, that agency is in the process of initiating
collection of forest health data. Coordination between EMAP and the Forest Service presents a good
opportunity to develop an integrated design that fulfills the objectives of both agencies. The design of
FIA is not strictly probability based, and the utility of the network in EMAP has not yet been
established.

The USDA Agricultural Research Service maintains an annual monitoring program, through the
National Agricultural Statistics Service (NASS), based on a probability area sample that uses a well-
defined area sampling frame. EMAP is exploring several options for integrating the AGRO\EMAP
design with the NASS frame and sample. Additionally, the USDA Soil Conservation Service (SCS)
maintains a monitoring program on soils, designated the National Resources Inventory, which also has
potential for coupling with EMAP.

The USDI Fish and Wildlife Service National Wetlands Inventory (NWI) provides periodic estimates of
wetland extent by wetland type. NWI assistance and material is being used to obtain frame material
and population descriptions for the EMAP Wetlands Resource Group.

Many other ongoing programs being conducted by governmental agencies provide potential EMAP
interfaces. For the most part, these are nonprobability designs, and must be used in a supporting role.
There may be a few cases in which information from existing programs can be incorporated even
though the samples were not collected using probability sampling, or else the essential details of the
sampling design are no longer available. The term "found data" has been used for such data that
either did not come from a rigorous design or have in some sense lost their "identity." A strategy has
been proposed for incorporating such found data into a rigorous monitoring program, but the process
requires extensive data on the sampling frame, considerable effort, and explicit assumptions (Overton,
1990). Although this strategy may be an attractive option only in a few circumstances, it is important
to have the strategy available for use in a long-term monitoring program.

SECTION 4
ESTIMATION AND ANALYSIS

The kinds of analytic output that will be generated by EMAP are dictated by the specific objectives of
the various EMAP programs, with some constraint imposed by the design. The algorithms of
estimation and analysis are specific for the design and oriented to the nature of the needed output. It is
appropriate for these algorithms to be made explicit at the time design decisions are made, in order to
allow any constraints to be considered in the decision process.

4.1 ESTIMATION AND DESCRIPTION

Generation of descriptive statistics is simplified by use of Horvitz-Thompson (HT) formulae, which
reduce all design features to specification of the inclusion probabilities. The inclusion probabilities are
knowable for any probability sample, and the strict requirement that EMAP Tier 1 and Tier 2 samples
be probability samples has ensured that these formulae can be used. Horvitz-Thompson estimation is
provided for all basic population parameters, but it is necessary in many cases to make approximations
in estimating variances. These approximations will also be kept in the form of HT algorithms, so that
no change in the computing algorithms will be necessary. Several documents address the nature of
these approximations and their adequacy in the context of EMAP-like surveys (Stehman and Overton,
1987a,b; Overton and Stehman, 1987).

Inclusion probabilities are of two kinds, first order and second order. First-order inclusion probabilities
are simply the probabilities with which the individual sampling units are included in the sample. These
must be known for each selected unit, and the data record for each unit must include the value or the
information necessary to determine the value. This information is generated as a product of the process
of sample selection and will be archived at that time. The symbol $\pi_u$ (referring to the sampling unit u)
or $\pi_i$ (referring to the ith sampling unit) will designate the first-order inclusion probability. When
dealing only with sample notation, the subscript "i" is unambiguous and will be used.

Second-order, or pairwise, inclusion probabilities are the probabilities with which two specific sampling
units are included in the sample. These are designated as $\pi_{ij}$, with obvious extension of the notation,
referring to the probability of simultaneously including units i and j in the sample. In this document
and the supporting documents, $\pi_{ij}$ usually will be calculated as a specific function of $\pi_i$ and $\pi_j$.
Certain design features may be required to make this calculation, for example, stratum or cluster
identification and sample size. That information will also be retained as part of the individual data
record, so that storage and use of the data are kept uncomplicated. In some circumstances, it may be
necessary to record and archive the explicit $\pi_{ij}$'s, and that specification must be a component of the
design established for these circumstances.

4.1.1 Formulae

Estimation formulae are simplified by use of weights (w) rather than inclusion probabilities, where

$$w_i = 1/\pi_i \quad \text{and} \quad w_{ij} = 1/\pi_{ij} .$$

In practice it is appropriate to archive the weights rather than the inclusion probabilities as part of the
data set. The HT estimating formulae are then expressed as:
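
$$\hat{T}_y = \sum_{i \in S} w_i y_i \qquad (1)$$

$$\hat{V}(\hat{T}_y) = \sum_{i \in S} y_i^2 w_i (w_i - 1) + \sum_{i \in S} \sum_{j \in S,\, j \neq i} y_i y_j (w_i w_j - w_{ij}) \qquad (2)$$

where y is any attribute and $T_y$ is the total of that attribute over any specific identified population.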
The summation is restricted to the specific set of units, S, in the sample or any subset of the sample
defined by a specific population. Estimates over subpopulations are thus provided by subsetting the
sample. Estimates of numbers of units in populations are provided by setting y=l.
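A minimal sketch of formulae (1) and (2) in Python with NumPy follows. It assumes the variance estimator takes the usual Horvitz-Thompson form written in weights: the sum of y_i^2 w_i (w_i - 1) over the sample plus the sum over pairs i != j of y_i y_j (w_i w_j - w_ij). The function names and the small data set are hypothetical and are not part of the EMAP software.

```python
import numpy as np


def ht_total(y, w):
    """Formula (1): weighted sum of the attribute over the sample."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y))


def ht_variance(y, w, w_pair):
    """Formula (2) in weight form; w_pair[i, j] = 1 / pi_ij.

    Diagonal entries of w_pair are ignored.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w_pair = np.asarray(w_pair, dtype=float)
    first = np.sum(y**2 * w * (w - 1.0))
    cross = np.outer(y, y) * (np.outer(w, w) - w_pair)
    np.fill_diagonal(cross, 0.0)  # the double sum excludes i == j
    return float(first + cross.sum())


# Hypothetical data: n = 4 units drawn by simple random sampling from N = 10,
# so pi_i = 4/10 and pi_ij = (4 * 3) / (10 * 9).
y = [1.0, 2.0, 3.0, 4.0]
w = np.full(4, 10 / 4)
w_pair = np.full((4, 4), (10 * 9) / (4 * 3))

total = ht_total(y, w)                # estimated population total
variance = ht_variance(y, w, w_pair)  # estimated variance of that total
count = ht_total(np.ones(4), w)       # y = 1 estimates the number of units
```

Subpopulation estimates follow by passing only the sample units that belong to the subpopulation.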

The variance formula (2) provides unbiased variance estimates if all pairwise inclusion probabilities are
positive. Systematic sampling, when interpreted in the fixed configuration perspective, has a large
number of zero pairwise inclusion probabilities. Thus, when systematic sampling has been used, it is
usually necessary to use an approximation to the variance estimator. The randomized systematic
sample is a commonly used model for this purpose, and several approximations to the pairwise
inclusion probabilities are available. The following one has been shown (Stehman and Overton,
1987a,b) to perform satisfactorily with the form of the variance equation used here. It has also been
demonstrated that the model is satisfactory in EMAP-like sampling circumstances (Overton and
Stehman, 1987).

For Tier 1, an appropriate formula for the second-order inclusion probabilities will usually be given by

$$\pi_{ij} = \frac{2(n-1)\,\pi_i \pi_j}{2n - \pi_i - \pi_j} \qquad (3)$$

from which,

$$w_{ij} = \frac{2n\,w_i w_j - w_i - w_j}{2(n-1)} .$$

This approximation will usually be used in subsequent special formulae. Strata and clusters pose
exceptions, as specified in Section 4.2.2.
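As an illustration, the pairwise weights implied by this approximation can be computed for a whole sample at once. The sketch below assumes the weight form w_ij = (2n w_i w_j - w_i - w_j) / (2(n - 1)); the function name `approx_pair_weights` and the example sizes are hypothetical.

```python
import numpy as np


def approx_pair_weights(w, n):
    """Matrix of approximate pairwise weights w_ij = 1 / pi_ij.

    w : first-order weights of the sampled units
    n : sample size used in the approximation
    """
    w = np.asarray(w, dtype=float)
    wi = w[:, None]
    wj = w[None, :]
    return (2.0 * n * wi * wj - wi - wj) / (2.0 * (n - 1))


# Equal-probability check with hypothetical sizes N = 10, n = 4 (w_i = N / n).
w_pair = approx_pair_weights(np.full(4, 10 / 4), n=4)
exact_srs = (10 * 9) / (4 * 3)  # 1 / pi_ij under simple random sampling
```

With equal weights w_i = N/n the expression reduces algebraically to N(N - 1) / (n(n - 1)), the exact pairwise weight for simple random sampling, which is one way to sanity-check an implementation.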

4.1.2 Verification

EMAP will continue to develop estimates in all general areas of inquiry for many years; anticipating
the variance estimation restrictions in a particular case is much less important than appropriately
responding to the evidence that is generated. For many purposes, retrospective assessment of variance
will serve EMAP needs.

It is proposed that a general assessment capability be maintained that routinely investigates the
behavior of the EMAP design under the conditions being experienced and under a variety of
hypothetical circumstances. This is a part of planned ongoing support for EMAP. Specifically, the
capacity will be developed to explore the precision of estimates and the adequacy of variance estimates
in any particular part of EMAP by basic simulation experiments.
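A basic experiment of this kind can be sketched as follows. The example is hypothetical: it enumerates every possible start of a 1-in-k systematic sample from a one-dimensional synthetic population, then compares the true variance of the estimated total with the average of the variance estimates obtained from the randomized-model approximation with equal weights. The population, the interval k, and the function name are assumptions made for illustration.

```python
import numpy as np


def assess_systematic(pop_y, k):
    """Compare true and estimated variance for 1-in-k systematic sampling.

    Returns (true variance of the estimated total over all k starts,
             mean of the estimated variances over all k starts).
    """
    pop_y = np.asarray(pop_y, dtype=float)
    if len(pop_y) % k != 0:
        raise ValueError("population size must be a multiple of k")
    n = len(pop_y) // k   # sample size for every start
    w = float(k)          # equal weights, w_i = 1 / pi_i = k
    totals, v_hats = [], []
    for start in range(k):  # each start is equally likely
        y = pop_y[start::k]
        totals.append(w * y.sum())
        # Randomized-model variance estimate for equal weights
        v_hats.append(
            w * (w - 1.0) / (n - 1) * (n * np.sum(y**2) - y.sum() ** 2)
        )
    return float(np.var(totals)), float(np.mean(v_hats))


# Hypothetical population with a smooth linear trend along the sampling order.
true_var, mean_v_hat = assess_systematic(np.arange(120), k=12)
```

For this trending population the estimated variance is much larger than the true variance, so the approximation errs on the conservative side; other populations (periodic ones in particular) can behave differently, which is the reason for exploring a range of hypothetical circumstances.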
4.2 TIER 1 ESTIMATES

Tier 1 estimates are of several kinds, depending on the data on which they are based and the nature of
the populations and the selection process (see Section 3.3).

4.2.1 Resource Inventory

Resource inventory is provided by the data from the landscape descriptions (3.3.1) on the 40-hexes
established on the grid points. These Tier 1 estimates will typically be of the areal extent of surveyed
resources, by whatever classes have been identified at Tier 1. Also, estimates of the numbers of resource
units (e.g., lakes) in specific populations of discrete units will be made at Tier 1. The landscape
description units represent 1/16th of the area per grid point, so that w = 16, and y is either the area of
the resource or the number of resource units in the 40-hex. Other attributes of landscapes will
ultimately be identified for similar population estimation.

Let the areal extent, $a_r$, of a particular resource be identified in the 40-hexes. Then, with i now
indexing the grid points,

$$\hat{A}_r = 16 \sum_{i \in S} a_{ri} \qquad (1a)$$

estimates the total area of this resource in the defined class representing any spatial or other
subpopulation, and where the summation is consistent with the identity of that subpopulation. The
variance of this estimate is estimated by a special case approximation of (2),

$$\hat{V}(\hat{A}_r) = \frac{240}{n-1} \left[ n \sum_{i \in S} a_{ri}^2 - \Big( \sum_{i \in S} a_{ri} \Big)^2 \right] \qquad (2a)$$

One way to derive this result is to apply (3) to (2) with $w_i = 16$ for all i.
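The derivation can be written out as a short worked calculation. It assumes that (2) has the weight form with terms y_i^2 w_i (w_i - 1) and y_i y_j (w_i w_j - w_ij), and that (3) gives w_ij = (2n w_i w_j - w_i - w_j) / (2(n - 1)); the steps are a sketch under those assumptions.

```latex
% Assumed forms: (2) in weights, w_{ij} from (3), and w_i = w_j = 16.
\begin{align*}
w_i w_j - w_{ij}
  &= w_i w_j - \frac{2n\,w_i w_j - w_i - w_j}{2(n-1)}
   = -\frac{2 w_i w_j - w_i - w_j}{2(n-1)}
   = -\frac{240}{n-1}, \\
\hat{V}(\hat{A}_r)
  &= 240 \sum_{i \in S} a_{ri}^2
     - \frac{240}{n-1} \sum_{i \in S} \sum_{j \in S,\, j \neq i} a_{ri}\, a_{rj} \\
  &= \frac{240}{n-1} \Big[\, n \sum_{i \in S} a_{ri}^2
     - \Big( \sum_{i \in S} a_{ri} \Big)^2 \Big].
\end{align*}
```

The last step uses the identity that the sum of a_ri a_rj over pairs i != j equals the square of the sum minus the sum of squares.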

Estimation of numbers of any population (class) of resource units (e.g., lakes) is possible for any class
for which units are uniquely represented by points. Point representation (3.3.2.1) is necessary in order
to unambiguously count the number of such units in the 40-hexes. Given such a count, $n_{ri}$, over the n
grid points, the total number of units in the defined population is estimated by

$$\hat{N}_r = 16 \sum_{i \in S} n_{ri} \qquad (1b)$$

with variance similarly estimated by

$$\hat{V}(\hat{N}_r) = \frac{240}{n-1} \left[ n \sum_{i \in S} n_{ri}^2 - \Big( \sum_{i \in S} n_{ri} \Big)^2 \right] \qquad (2b)$$

Estimates representing resource inventory will then be summarized in a table similar to Table 4-1.
This inventory can be repeated for any geographic subdivision, or other spatial partitioning of the
United States, or for any population defined in any manner. Some resources (extensive) will not have
estimates of numbers of units, but otherwise all of these estimates are generated for all populations of
interest.
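The Tier 1 inventory calculation is simple enough to sketch directly. The function below is hypothetical and assumes the equal-weight case described above (w = 16 for every 40-hex) together with the variance form obtained by applying approximation (3) to (2); `values` holds either the resource area or the unit count observed in each 40-hex of the subpopulation, and `n` is the number of grid points.

```python
def tier1_inventory(values, n, w=16.0):
    """Estimated total and variance for equal-weight Tier 1 data.

    values : area or count of the resource in each 40-hex in S
    n      : number of grid points used in the variance approximation
    w      : common weight (16 when each 40-hex is 1/16 of the area per point)
    """
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    total = w * s1
    variance = w * (w - 1.0) / (n - 1) * (n * s2 - s1**2)
    return total, variance


# Hypothetical lake counts in five 40-hexes.
lake_counts = [0, 2, 1, 0, 3]
n_hat, v_hat = tier1_inventory(lake_counts, n=len(lake_counts))
```

The same function serves for areal extent by passing areas instead of counts.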

Table 4-1. Tier 1 Inventory Estimates are provided from the Tier 1 Sample for All Resources
and Classes of Resources.

                Estimated Area, by Class        Estimated Number of Units
Resource        1    2    3    ....    k        1    2    3    ....    k
A
B
C

4.2.2 Discrete Resources

An alternate Tier 1 analysis derives from Design Option 1 (Section 3.3.5.3) when used with an
association rule that is not based on the list of resource units in the 40-hexes. Specifically, one example
is the rule of taking the nearest lake to the sample point. Such a rule provides a single resource unit in
association with each grid point. There must be some limitation of distance to allow misses, and the
one prescribed is that no unit will be selected by this rule if it is located outside the 40-hex.

An especially useful application of this option is selection of the National Agricultural Statistics Service
(NASS) primary sampling units (PSUs) that contain the EMAP grid points. This selection does not
involve the LDs in any way, and any incompleteness in the NASS coverage will show up as misses,
there being no PSU containing the grid point. This selection will necessarily involve variable
probabilities; in this case, $\pi_i = a_i/634.5$, where $a_i$ is the area of the PSU containing the ith grid
point.

Extensive resources may also have a sampling unit chosen in this manner, with the unit an arbitrary
area or quadrat, and so be represented at the set of grid points by this specific structure. There are
several specific variations involved in the EMAP structure. The Tier 1 sample of units is then
subsampled to obtain a useful Tier 2 sample. The subject is best discussed in terms of discrete resource
units, such as lakes, stream reaches, or NASS sampling units.

Estimating formulae revert to the general forms (1, 2), with general representation of the inclusion
probabilities. When using the specific approximation (3), with no complicating design features, it is
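convenient to rewrite the variance estimator as

$$\hat{V}(\hat{T}_y) = \sum_{i \in S} y_i^2 w_i (w_i - 1) - \sum_{i \in S} \sum_{j \in S,\, j \neq i} y_i y_j \, \frac{2 w_i w_j - w_i - w_j}{2(n-1)} \qquad (2c)$$

Estimates from this Tier 1 sample provide the base of variance estimation at Tier 2, via the double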
sampling protocol.
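Formula (2c) can be evaluated without forming the full matrix of pairs, because the double sum factors into single sums. The sketch below assumes (2c) has the form of the first-order term minus the pairwise term divided by 2(n - 1); the function name `variance_2c` and the example data are hypothetical.

```python
import numpy as np


def variance_2c(y, w, n):
    """Variance estimate under approximation (3), using single sums only.

    y : attribute values for the units in S
    w : first-order weights for the same units
    n : sample size entering the approximation (may exceed len(y))
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    wy = w * y
    first = np.sum(y**2 * w * (w - 1.0))
    # Sum over i != j of y_i y_j (2 w_i w_j - w_i - w_j), written with
    # squares of sums minus sums of squares.
    cross = 2.0 * (wy.sum() ** 2 - np.sum(wy**2)) - 2.0 * (
        wy.sum() * y.sum() - np.sum(w * y**2)
    )
    return float(first - cross / (2.0 * (n - 1)))


# Hypothetical sample of three units with unequal weights.
v_small_n = variance_2c([1.0, 2.0, 3.0], [2.0, 4.0, 8.0], n=3)
v_large_n = variance_2c([1.0, 2.0, 3.0], [2.0, 4.0, 8.0], n=30)
```

Keeping `n` as a separate argument, rather than taking it from the length of `y`, lets the same code be used when S is a subset of the sample.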

Design features that create exceptions in formula (2c) are:

• Stratification: $\pi_{ij} = \pi_i \pi_j$, for i, j in different strata

• Clustering: $\pi_{ij} = \pi_i = \pi_j$, for i, j in the same cluster

• Units entering with probability 1

4.2.3 Domains

It is easy to generate circumstances in which the randomly ordered model will be inadequate for
variance assessment, and therefore it must be recognized that consideration of spatial matters must be
made in certain circumstances. One such circumstance is of particular interest in the discussion of
strata, and can be identified in terms of formula (2c) and the identity of n in that formula.

The argument is simple. If the grid is laid out over the domain of a resource, and data collected over
that domain, then extension of the grid over the rest of the country does not change the estimate.
Thus, extension over the rest of the country does not change the variance of that estimate. In effect,
the randomization model has assumed that the resource is randomly distributed over the entire
country, and prescribed the variance in terms of (2c) on that premise. But if we recognize that the
resource is distributed over no greater an area than a designated "domain", then we can re-express the
assumption in terms of randomization over the domain, and let the effective sample size be determined
by the domain boundaries. For the Tier 1 sample, this will change n from 12,000 to, perhaps, 1,000 for
a particular resource.

Inspection of (2c) reveals that the term in brackets in the second term,

$$(2 w_i w_j - w_i - w_j)$$

is necessarily non-negative, since $w_i \geq 1$ for all i. Therefore the second term is non-negative for
any positive attribute, y. As a consequence, increase in n without an accompanying increase in S
implies a decrease in the second term and an increase in V. Similarly, reduction in n without an
accompanying reduction in S produces a reduction in V. This is exactly the effect of restricting the
sample size, n, for a particular resource to the domain of that resource (Figure 4-1).
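The two steps of this argument can be made explicit. The shorthand A and B below is introduced here for illustration only, and the form of (2c) is assumed to be the first-order term minus the pairwise term divided by 2(n - 1).

```latex
\begin{align*}
2 w_i w_j - w_i - w_j &= w_i (w_j - 1) + w_j (w_i - 1) \;\ge\; 0
  \quad \text{when } w_i, w_j \ge 1, \\
\hat{V}(n) &= A - \frac{B}{2(n-1)}, \qquad
  A = \sum_{i \in S} y_i^2 w_i (w_i - 1), \quad
  B = \sum_{i \in S} \sum_{j \in S,\, j \neq i} y_i y_j \,(2 w_i w_j - w_i - w_j) \;\ge\; 0 .
\end{align*}
```

With S held fixed, neither A nor B depends on n, so the estimated variance can only grow as n grows and can only shrink as n is reduced toward the domain sample size.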

Under this restriction, the effective assumption in invoking the randomization model is that the
resource is randomized over its identified domain, rather than over the entire United States. Not only is
this a satisfying change in the assumption, it provides greatly reduced variance estimates in crucial
circumstances. Specification of the domain will remain an issue as we learn to apply this concept. The
domain of many, if not most, resources will be imperfectly known, a priori, and it will be necessary to
modify our precept in accordance with the data. On the other hand, it is not allowable to reduce this
"envelope" to the concave set, or to some other "shrunken skin." It is realistic to think of
determination of domain boundaries as a spatial issue for investigation, and meantime adopt a strategy
of conservative approximation.

A number of other issues are also raised by this development. When should a subpopulation be assessed
in terms of its own domain, and when should the parent domain remain effective? At first
consideration, the general answer to this seems to be in terms of the identified base of reference.

What is the effect of greatly varying density of resource units over the domain? This too represents a
clear deviation from the assumed randomization model, capable of description by the data collected at
Tier 1, and possibly leading to loss of perceived precision. This is an obvious circumstance for
consideration of stratification. The same is recognized with respect to spatial patterns in responses, as
treated elsewhere. There are a number of aspects of the analyses that require attention to spatial
matters.

4.2.4 Domains and Strata

It is reasonable to think of domains as resource-specific strata, and this is compatible with the EMAP
design in terms of Tier 1 Resources. For Tier 1 estimates, the domain-specific n is then appropriate for
variance estimates, and this all is carried into Tier 2 sampling. The Tier 2 sample sizes will be specific
for these Tier 1 Resources. A resource-specific, spatial division of this domain into two strata will be
generally tolerable, and may have certain advantages. However, an arbitrary spatial stratum will cross
the boundaries of many resource classes and domains (Figure 4-2), forcing an awkward and undesirable
substratification on them all, with no real redeeming benefits. It is possible that some ecoregion
structures could minimize these undesirable features, but the entire issue seems worthy of a firm
position. Resource-specific spatial substrata are tolerable, but general spatial strata imposed on all the
resources are intolerable. However, subresources should be seen in the subpopulation/domain
perspective. If the classes of a Tier 1 resource are addressed as subpopulations, and if these are confined
to well-defined subdomains, then the statistics of these subpopulations should also reflect this domain
constraint. These perspectives of EMAP strata must be explored in great depth in elaborating the
spatial analyses of the sample.

Figure 4-1. When a resource is spatially restricted to a subregion, then an approximation to the domain
boundary will identify the effective sample size for that resource.  If the resource is restricted, say to
New England (sugar bush) or to California and Southern Oregon (redwoods), then sample points taken
outside the domain have no bearing on the precision  of the sample taken within the domain. But just
what is the appropriate sample size to use? It is possible to generate a suitable "effective sample size",
just by delineating the approximate domain of the resource; slight inflation of the domain will lead to
slight inflation of the estimated variance.

[Graphic not reproduced. Label: "Arbitrary Boundary".]
Figure 4-2. Overlapping domains make it difficult to impose a common spatial stratification.

4.2.5 Spatial Control of Tier 2 Selection

These considerations lead directly to the issue of maintaining spatial distribution of the Tier 2 sample.
If the Tier 1 sample has been carefully generated to contain great spatial distribution, then it seems
imperative that this distribution be retained in the Tier 2 sample. The device that was used for this
purpose in the National Lake Survey (Overton, 1987a) appears well suited for this purpose in the
EMAP design. Spatial stratification is the basis for the device.

Let the domain (of a particular resource) be partitioned into a number of compact clusters of grid
points, such that the sum of weights for the Tier 1 sample in a cluster is "equal" for all clusters. The
purpose is to select a Tier 2 sample in such a manner that "equal" representation is obtained from
these clusters (Figure 4-3). This effectively distributes the Tier 2 sample in proportion to the
population distribution for all subspaces of the domain, subject to the resolution of the Tier 1 sample
information.
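One way to picture this device is the following sketch, which is an assumption-laden illustration rather than the procedure used in the National Lake Survey. It assumes the Tier 1 units have already been placed in a one-dimensional order that keeps spatial neighbors together, cuts that order into clusters of approximately equal total weight, and draws one unit per cluster with probability proportional to its Tier 1 weight. The function names and the weights are hypothetical.

```python
import numpy as np


def equal_weight_clusters(w, n_clusters):
    """Label spatially ordered units so clusters have similar total weight."""
    w = np.asarray(w, dtype=float)
    cum = np.cumsum(w)
    midpoints = cum - w / 2.0  # centre of each unit's weight interval
    labels = (midpoints / cum[-1] * n_clusters).astype(int)
    return np.minimum(labels, n_clusters - 1)


def select_one_per_cluster(w, labels, rng):
    """Pick one unit per cluster with probability proportional to weight."""
    w = np.asarray(w, dtype=float)
    picks = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        picks.append(int(rng.choice(idx, p=w[idx] / w[idx].sum())))
    return picks


# Hypothetical Tier 1 weights for 12 units listed in a spatial order.
w1 = np.array([16, 16, 8, 8, 16, 4, 4, 8, 16, 16, 8, 8], dtype=float)
labels = equal_weight_clusters(w1, n_clusters=4)
tier2 = select_one_per_cluster(w1, labels, np.random.default_rng(0))
```

Real grid points lie in two dimensions, so forming compact clusters would need a spatial grouping step that this sketch does not attempt.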

The explicit subspaces represented by the clusters are effective strata, but will not be treated as explicit
strata in selection of the Tier 2 sample. However, it is a simple step to combine clusters to make up
coarser strata that represent meaningful spatial subdomains. Such subdomains may provide statistical
advantages, as well as conceptual meaningfulness, and the statistics of these structures will be
investigated.

The ready utility of spatial strata in the context of specific resources imposes certain constraints on the
use of other design features that will inhibit that utility. Artificial strata, like political regions, are
particularly to be avoided. Strata based on ecoregions probably should also be avoided, but in part this
rests on the compatibility of ecoregion boundaries and domain boundaries. The only nonspatial
stratification criterion that seems to be warranted is that of resource identity.

4.3 TIER 2 ESTIMATES AND DESCRIPTIONS

4.3.1 Estimates

The Tier 2 samples will provide additional variables representing higher resolution classification of
resources and the suite of indicators. From these data, estimates will be generated of the areal extent of
resource classes that were not identified on the Tier 1 sample and of the numbers of resource units in
those populations. A table similar to Table 4-1 can be generated for Tier 2 estimates. In addition, the
Tier 2 sample lends itself to more complete population description in terms of the indicator variables
that have been measured on the samples of population units, or on the quadrats or points used to
characterize extensive or continuous resources. Additional descriptive statistics will also be used in
reporting the Tier 2 analyses.

Estimating formulae for Tier 2 will follow the general forms (1, 2), but the inclusion probabilities are
somewhat more complex than in Tier 1, representing the sampling process at both tiers. Actually this
is a three-stage process, although the middle stage is not identified in Option 2. The process is an
extension of that used in the National Lake Surveys (NLS) (Linthurst, et al., 1986; Landers, et al.,
1987) and the National Stream Surveys (NSS) (Messer, et al., 1986; Kaufmann, et al., 1988). For
discrete resources, selection at Tier 2 will be made in a manner to cancel out the variable inclusion
probabilities that were generated at Tier 1. Another source of variable probability will be stratification
at Tier 2 selection. There will be some restriction on the ability to remove Tier 1 variable probability,
and the general formulae (1, 2) will be used throughout. Further, there are possible situations in
extensive resources in which one might wish to retain variable probabilities.

Figure 4-3. Partitioning a Resource Domain into compact clusters of points, with each cluster having
the same size, $\sum w$, provides structure that maintains spatial distribution of the Tier 2 sample.

4.3.2  Option 1

For Design Option 1, the Tier 1 sample will consist of one resource unit per grid point, for each discrete
resource at that point. The Tier 2 sample for a particular resource will then be a subsample of the Tier
1 sample for that resource, such that the product of the Tier 1 inclusion probability and the conditional
Tier 2 inclusion probability yields the "total" probability of selecting the ith unit in the Tier 2 sample,

$$\pi_{2i} = \pi_{1i}\,\pi_{2 \cdot 1 i},$$

and thus

$$w_{2i} = w_{1i}\,w_{2 \cdot 1 i} . \qquad (4)$$

The recommended protocol for Tier 2 selection again involves systematic sampling on an ordered list,
though there is local randomization. Again it is necessary to approximate the second-order inclusion
probabilities for this process, and to compute $w_{2ij}$. Two solutions have been used (Messer, et al., 1987;
Overton, 1987a). In certain circumstances, Equation (5) is appropriate, with approximation either at
the second term or the first,

$$w_{2ij} = w_{1ij}\,w_{2 \cdot 1 ij} \qquad (5)$$

But when both terms must be approximated, as by Equation (3), it seems no more arbitrary to
approximate $w_{2ij}$ directly by Equation (3), as was done in the Stream Survey (Overton, 1987b). Minor
complications arise when certain units are selected with certainty.

Further investigation will be made into the behavior of this approximation. For the recommended
selection procedure at Tier 2, $\pi_{2i} = n_2/\hat{N}_1$, where $n_2$ is the Tier 2 sample size and $\hat{N}_1$ is the Tier 1
estimate of the size of that resource. It follows that $\hat{N}_2 = \sum_{i \in S_2} 1/\pi_{2i} = \hat{N}_1$, so that the variance of the
population total estimated from the Tier 2 sample is identical to the variance of the estimate from the
Tier 1 sample.
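The cancellation of Tier 1 variable probabilities can be checked numerically. The sketch below assumes that the conditional Tier 2 inclusion probability is made proportional to the Tier 1 weight, which is one way to obtain a constant total probability n_2 divided by the Tier 1 estimate of population size; the function name and the weights are hypothetical.

```python
def tier2_probabilities(w1, n2):
    """Conditional and total Tier 2 inclusion probabilities.

    w1 : Tier 1 weights (1 / pi_1i) of the units in the Tier 1 sample
    n2 : Tier 2 sample size
    """
    n_hat_1 = sum(w1)                           # Tier 1 estimate of size
    cond = [n2 * w / n_hat_1 for w in w1]       # pi_{2.1i}, proportional to w_1i
    if max(cond) > 1.0:
        # Units that would enter with certainty need separate handling.
        raise ValueError("a conditional probability exceeds 1")
    total = [c / w for c, w in zip(cond, w1)]   # pi_2i = pi_1i * pi_{2.1i}
    return n_hat_1, cond, total


# Hypothetical Tier 1 sample of six units with unequal weights.
n_hat_1, cond, total = tier2_probabilities([16, 8, 4, 4, 16, 16], n2=2)
n_hat_2 = 2 * (1.0 / total[0])  # any two selected units carry the same weight
```

Because every unit ends up with the same total probability, any Tier 2 sample of size n_2 reproduces the Tier 1 estimate of population size.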


4.3.3 Option 2

4.3.3.1 Discrete Resources

For Design Option 2, the Tier 1 sample for a particular discrete resource consists of all resource units in
each hexagon, and the Tier 2 sample is selected by a two-step process:  (1) select a set of hexagons with
probability proportional  to the number of units,  and (2) then select a single unit at random from each
hexagon selected at the first step. For the second step, equal probability selection from the number in
the  hexagon  results in  selection with probability  inverse  to that in the first  step.  Therefore, the
inclusion probability for each unit in this Tier 2 sample is also given by $\pi_{2i} = n_2/N$, but in this
option, N  was estimated from the full hexagon  list of sampling units, rather than from a single unit
selected by an arbitrary rule.
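The two-step selection can be traced with a small calculation. This sketch assumes a common Tier 1 weight of 16 for every hexagon and a first-step probability of n_2 times the hexagon's share of listed units; the function name and the counts are hypothetical.

```python
def option2_unit_probabilities(units_per_hex, n2, tier1_weight=16.0):
    """Total inclusion probability of a unit in each Tier 1 hexagon.

    units_per_hex : number of resource units listed in each hexagon
    n2            : Tier 2 sample size (number of hexagons selected)
    """
    listed = sum(units_per_hex)
    n_hat = tier1_weight * listed         # estimate from the full hexagon lists
    probs = []
    for m in units_per_hex:
        if m == 0:
            probs.append(None)            # hexagon holds no units of the resource
            continue
        p_hex = n2 * m / listed           # step 1: proportional to number of units
        if p_hex > 1.0:
            raise ValueError("a hexagon would be selected with certainty")
        p_unit = 1.0 / m                  # step 2: one unit at random
        probs.append(p_hex * p_unit / tier1_weight)
    return n_hat, probs


# Hypothetical counts of lakes in four Tier 1 hexagons.
n_hat, probs = option2_unit_probabilities([3, 1, 0, 4], n2=2)
```

The unit count cancels between the two steps, which is why every unit has the same total inclusion probability.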

This design leads to the identical sample that would have  been obtained if the sequence of Option 1
had been used, but it creates a different perspective  of the sample, allows perception of the basis for the
apparent advantage  of  this association  rule over the first,  and provides greater flexibility for future
changes in sample structure due to shifting objectives.

Again,  Equation (5)  can be used  in the manner of  Phase II of the  Eastern  Lake Survey (Overton,
1987a) if either of the stage designs will yield exact HT variance estimation. The second stage of this
option is a cluster subsample with one unit per cluster, and this produces a large number of zero
second-order $\pi$'s. Thus both components must be approximated, so again the indicated method is to
estimate the $w_{2ij}$'s directly by Equation (3). This has an additional reinforcement in that the variance
estimated  would be  identical  to  that estimated  from this association  rule used in Option 1. Still, the
behavior of this prescription has not been investigated in the context of Option 2, and this will be done
in the coming months.

4.3.3.2 Extensive Resources

Continuous resources can be sampled in the manner of general extensive resources (following
paragraph), but the preferred design will be based on point samples of one or another type. Several
have been investigated, and further investigation is necessary before a definitive assessment will be
possible. The strict triangular point grid, enhanced from the base EMAP grid, is the currently favored
design. When the surface signals are strong relative to noise, population estimation methods for
variances are weak and approximations based on spatial components are needed. In this circumstance,
spatial models for surface fitting are more greatly needed than elsewhere in EMAP. The Near Coastal
Demonstration Project is a pilot study for this approach (see Section 3.3.5).

Extensive resources can also be sampled by Option 2, but there are so many different ways to represent
the resource contained in a 40-hex that it is appropriate to restrict the general treatment and leave
considerable latitude to the resource group for field methods. We will treat only the first step of Option
2, and identify two ways to select the subset of 40-hexes for a particular resource.

1. Choose a subset of the Tier 1 hexagons containing the resource with inclusion probability
proportional to the area, $a_i$, of the resource in the $i$th hexagon. Then these elements are
selected with "total" probability
$\pi_{2i}$ =
so their variances are also equal.
Again,
A2 =
The analogy to a Tier 2 sample for discrete resources is completed if a single quadrat of fixed
size is randomly selected in the resource of each selected 40-hex. Let the quadrat be of size $m$,
so that $\pi_{3 \cdot 2i} = m/a_i$ if m
-------
certain resources than for others. Extensive homogeneous resources will be represented nicely by
index samples using quadrats, transects, or other simple devices. It is the heterogeneous
extensive resources that will pose a real challenge. This approach is clearly the preferred
option for general extensive resources.

4.3.4 Distribution Functions

These populations also will be characterized by the estimated distribution function for the variable, x,
of interest. Indicator variables will play such a role. In this, it is necessary only to consider a particular
range of the variable x as defining a subpopulation, and estimate the number of, say, lakes in the
population that are in this subpopulation. This process is repeated for all values of x in the sample, and
the estimated distribution plotted as in Figure 4-4. Note that such distributions are generated for a
variety of y-variables, such as number (frequency distributions), area (areal distributions), length, or
other attribute. For example, it is useful to generate the distribution of stream miles on the variable, x,
indicating the miles of stream that have a specific attribute.
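
The accumulation described above can be sketched in a few lines. This is a minimal illustration, assuming each sampled unit carries a known inclusion probability and that the estimate at each value of x is the HT-weighted sum over sample units at or below that value. The function name, argument names, and the four-lake data set are hypothetical and are not taken from the survey software.

```python
def estimated_distribution(x, y, pi):
    """Return [(value, cumulative total, cumulative proportion), ...] sorted by value.

    x  : variable of interest for each sampled unit
    y  : attribute being accumulated (1 for numbers, area for areal distributions, ...)
    pi : inclusion probability of each sampled unit
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    grand_total = sum(y[i] / pi[i] for i in order)  # HT estimate of the population total
    points = []
    running = 0.0
    for i in order:
        running += y[i] / pi[i]  # add this unit's weighted contribution
        if points and points[-1][0] == x[i]:
            # tied x values share one step of the distribution
            points[-1] = (x[i], running, running / grand_total)
        else:
            points.append((x[i], running, running / grand_total))
    return points


# Hypothetical sample of four lakes: x is the variable, y = 1 gives a frequency distribution.
lake_x = [30.0, 10.0, 20.0, 20.0]
ones = [1.0, 1.0, 1.0, 1.0]
incl_prob = [0.5, 0.25, 0.5, 0.5]
frequency_distribution = estimated_distribution(lake_x, ones, incl_prob)
```

Passing lake area (or stream length) as `y` instead of ones gives the corresponding areal (or length) distribution from the same routine.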

The page of descriptive statistics that was prescribed (Linthurst et al., 1986) for the NLS, and also used
for the NSS, will also serve EMAP descriptions (Figure 4-4). Quantiles are interpolated from the fitted
distribution function. In the NLS, confidence bounds were defined for the number of units in the class
in question and presented in scaled form. Extension to EMAP demands greater versatility, and it is
recommended that the distributions be generated both as numbers and as proportions, and that the
confidence bounds be generated accordingly (Figure 4-5). Then a single page of description will report
only one kind of distribution, either of numbers or area, but in both forms. To keep the two forms
distinct, it is recommended that the scale of the distribution of numbers be labeled as numbers, but
that the plots be of fixed dimension to facilitate comparison. Confidence bounds on the distribution of
numbers should continue to be one-sided, as in the NLS (Overton, 1985), and ascending and descending
analyses should be provided, depending on the variable. The ascending analyses provide upper bounds
on the numbers of resource units having value of the variable below a particular value, and descending
analyses provide upper bounds on the numbers of units having value above a particular value.

Confidence bounds on proportions of numbers of units will usually be appropriately provided by exact
binomial bounds, and these are to be two-sided, with descending analyses unnecessary. Binomial
bounds can be used only when the variable probabilities have been eliminated by the selection process
at Tier 2; otherwise it may be necessary to use ratio variances for these confidence bounds. Combined
strata will involve means of binomials, and will also not be suitable to exact binomial bounds.
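
For the equal-probability case, exact two-sided binomial bounds can be computed as sketched below. This is an illustration only: it assumes the Clopper-Pearson construction is what is meant by "exact binomial bounds", and the function names are ours. It uses bisection on the binomial distribution function so that nothing beyond the Python standard library is required.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def _solve_decreasing(f, target):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Two-sided exact bounds on a proportion, given k of n sample units in the class."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Example: 3 of 10 equal-probability sample units fall below the value of interest.
bounds_3_of_10 = clopper_pearson(3, 10)
```

As the text notes, these bounds are not appropriate when inclusion probabilities vary within the sample or when strata are combined.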

Distributions of area or length can be estimated directly using the HT formulae, but proportions of
areas and lengths must be treated as ratios and therefore will require special treatment in generating
confidence bounds of the second type.
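
One common way to treat such a ratio is the usual linearization of a ratio of two HT estimators, shown here as a sketch under our own notation rather than as the prescribed EMAP method. Here $a_i$ is the area (or length) of sample unit $i$, $\pi_i$ its inclusion probability, $I(\cdot)$ the indicator function, and $d_i$ the linearized residual; the symbols $\hat{A}(x)$, $\hat{A}$, $\hat{R}(x)$, and $d_i$ are introduced for this illustration.

```latex
\hat{R}(x) = \frac{\hat{A}(x)}{\hat{A}}
           = \frac{\sum_{S} a_i \, I(x_i \le x) / \pi_i}{\sum_{S} a_i / \pi_i},
\qquad
\widehat{\mathrm{Var}}\big[\hat{R}(x)\big] \approx
  \frac{1}{\hat{A}^{2}} \, \widehat{\mathrm{Var}}\Big[\sum_{S} d_i / \pi_i\Big],
\qquad
d_i = a_i \big[ I(x_i \le x) - \hat{R}(x) \big].
```

The variance on the right is the ordinary HT variance estimate applied to the residuals $d_i$, so the same approximations used for totals carry over.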

4.3.5 Distributions of Continuous Resources

Continuous resources are also represented in the form of distributions in terms of random points over
the spatial extent of the resource. A temporal component may also be represented in the distribution.
These representations do not have the finite population corrections of the discrete resources, but
otherwise the representations are identical. However, there is a class of continuous resource,
characterized by strong surface signal-to-noise ratio, in which the proposed binomial confidence bounds
will be too conservative. Effort will be expended to identify these resources and to construct better
confidence bounds. It seems likely that a technique from spatial statistics will be necessary. The
signal-to-noise ratio deserves a brief mention. If the spatial pattern, as seen by the measurements, is very
regular, then this is evidence that noise is relatively low and that the surface signal is strong, relative to
noise. The noise in question can either be true surface, high-resolution irregularity or measurement
error. In some circumstances, discrete resources can also exhibit such patterns of strong surface signal
to noise, but the phenomenon is more expected in continuous resources.

[Figure 4-4: plot of the cumulative distributions F(x) and G(x) of ANC, with accompanying summary
statistics (population size, lake area, sample size, quantiles, and proportions and numbers below
reference values).]
Figure 4-4. Example of descriptive statistics used in the National Lake Survey that could also be used
in EMAP: F(x) and G(x) are frequency and areal distributions of ANC (acid neutralizing capacity) for
the target population of lakes < 2000 ha in the northeastern United States, Eastern Lake Survey.
(Source: Linthurst et al., 1986)

[Figure 4-5: two plots of estimated distributions with confidence bounds.]
Figure 4-5. Examples of estimated distributions, of numbers of resource units (upper plot), and of
proportions of resource units (lower plot). The confidence bound in the upper plot is based on
estimated variance of the estimated numbers; in the lower plot, the confidence bound is based on the
binomial distribution.

An illuminating example of spatial effect can be identified in this context. Consider estimation of the
proportion of Chesapeake Bay for which the value of x is below a specified nominal value (Figure 4-6).

Draw the contours in the bay for this nominal value. The variance of the estimated proportion derives
from the variance of the number of grid points that fall in the designated region under randomization
of the grid. If the region is entire (a single contiguous area), a systematic sample will have much smaller
variance than a random sample, which generates the binomial distribution. But if the region is highly
fragmented, then the systematic and random samples will have similar variances. The key to adequate
estimation of the variance of the estimated proportion, when using the systematic sample, is thus
identification of the degree of fragmentation of the region sampled.
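
The contrast between an entire and a fragmented region can be checked by a small simulation. The sketch below is illustrative only and rests on several assumptions of ours: the study area is a unit square rather than Chesapeake Bay, the systematic sample is a randomly shifted square grid rather than the EMAP triangular grid, the "entire" region is a disk covering about 30% of the area, and the "fragmented" region is a random scatter of small cells covering about 30% of the area. All function names are hypothetical.

```python
import random
import statistics


def make_fragmented_region(cells, fraction, rng):
    """Mark each small cell of a cells x cells partition as 'in' with the given probability."""
    return [[rng.random() < fraction for _ in range(cells)] for _ in range(cells)]


def in_disk(x, y, radius=0.309):
    """A single contiguous region: a disk of area ~0.30 centered in the unit square."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 < radius ** 2


def in_fragments(x, y, region):
    """A highly fragmented region made of many small scattered cells."""
    cells = len(region)
    row = min(int(y * cells), cells - 1)
    col = min(int(x * cells), cells - 1)
    return region[row][col]


def systematic_points(k, rng):
    """k x k square grid with one random shift applied to the whole grid."""
    h = 1.0 / k
    u, v = rng.random() * h, rng.random() * h
    return [(u + i * h, v + j * h) for i in range(k) for j in range(k)]


def random_points(n, rng):
    """n independent uniform points (unrestricted random sample)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def variance_of_proportion(draw_points, inside, reps):
    """Variance, over repeated samples, of the proportion of points inside the region."""
    proportions = []
    for _ in range(reps):
        points = draw_points()
        proportions.append(sum(inside(x, y) for x, y in points) / len(points))
    return statistics.pvariance(proportions)


def compare_designs(k=10, reps=1000, seed=1):
    rng = random.Random(seed)
    region = make_fragmented_region(200, 0.3, rng)
    shapes = {
        "entire": in_disk,
        "fragmented": lambda x, y: in_fragments(x, y, region),
    }
    results = {}
    for name, inside in shapes.items():
        results[name] = {
            "systematic": variance_of_proportion(
                lambda: systematic_points(k, rng), inside, reps),
            "random": variance_of_proportion(
                lambda: random_points(k * k, rng), inside, reps),
        }
    return results
```

Under these assumptions, `compare_designs()` should show the systematic variance well below the random (binomial) variance for the entire region, and the two variances of similar size for the fragmented region.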

This example is presented to convey that the focus on population estimation, rather than spatial
analysis, does not imply that the spatial component of distributions is being overlooked. A principal
reason for use of the systematic grid is to provide uniform spatial coverage, and the restriction that
Tier 2 sample selection will retain that spatial coverage is consistent with this goal. However, the usual
variance estimation methodology does not take spatial components adequately into account, and
methodological development is required. But neither do current spatial methods take into account the
needed probability sampling considerations. Clearly a melding of the areas is needed.

4.3.6 Deconvolution

Estimated distribution functions can also contain an extraneous temporal component if the data are
taken in a temporal window that has appreciable variation in the variable. When such variation is
present, it is important to identify this component of the distribution, and when this component is not
consistent with the objectives of the survey, it is therefore extraneous and should be removed. We refer
to this removal as "deconvolution," and development of satisfactory methodology for deconvolution is
a prime objective. This topic has been addressed in specific circumstances (Church et al., 1989), but a
comprehensive treatment has not yet been established (Overton, 1989a). Attention will also be given to
deconvolution to remove the effect of other sources of extraneous variation.

4.3.7 Trends and the Interpenetrating Design

Characterization of trends is a key component of EMAP. We will use the term in the broad sense, to
mean the general temporal pattern of variation in an attribute, and will focus on trends in populations.
Trends in single systems are the subject of process studies, and not really part of the general
monitoring perspective. Trends in early warning systems may be an exception, but this concept is not
well developed. The interpenetrating design is effective in characterizing trends, and a linear model-
based paired t-test is available to determine if the change from one time to another is significant. It is
also simple to determine if the pattern from one time to another can be considered linear, or if there is
evidence that a nonlinear pattern holds.
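
As an illustration of the paired comparison, the sketch below computes an ordinary paired t statistic on the same units measured at two times. It assumes that the units come from one interpenetrating subsample revisited after a full cycle and that they share a uniform inclusion probability, so no weighting is needed. It is a simplification of, not a substitute for, the linear model-based test mentioned above, and the function name is hypothetical.

```python
import math
import statistics


def paired_t(first_visit, second_visit):
    """Return (t statistic, degrees of freedom) for the mean change on revisited units."""
    differences = [b - a for a, b in zip(first_visit, second_visit)]
    n = len(differences)
    mean_change = statistics.fmean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean_change / standard_error, n - 1


# Hypothetical values of one variable on four units, measured four years apart.
t_stat, df = paired_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
```

The statistic is referred to a t distribution with the returned degrees of freedom.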

The descriptive ability of the four-year moving average of the interpenetrating samples is great, and
this moving average is easily corrected for linear trend, so as to accurately reflect the resource
trajectory. Further, the same moving average analyses can be extended to distributions, for those
variables to be analyzed for population trend. Again, tests for population trend are straightforward for
these distributions at intervals of four years.

Figure 4-6. An unrestricted random sample will give a binomial random variable for the number of
points in the enclosed area in each figure. A systematic grid will have greatly less variation for the
"entire" region of (a), and will have close to the binomial variance for the highly fractured case, (c).

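The basic population moving average estimates are given below, either for Tier 1 or Tier 2 estimation.
$S_1$, $S_2$, $S_3$, and $S_4$ refer to the samples implemented in years 1, 2, 3, and 4.

$$
\begin{aligned}
\text{year 1:}\quad \hat{T}_y &= \sum_{S_1} yw \\
\text{year 2:}\quad \hat{T}_y &= \frac{1}{2}\Big[\sum_{S_1} yw + \sum_{S_2} yw\Big] \\
\text{year 3:}\quad \hat{T}_y &= \frac{1}{3}\Big[\sum_{S_1} yw + \sum_{S_2} yw + \sum_{S_3} yw\Big] \\
\text{year 4:}\quad \hat{T}_y &= \frac{1}{4}\Big[\sum_{S_1} yw + \sum_{S_2} yw + \sum_{S_3} yw + \sum_{S_4} yw\Big]
\end{aligned}
$$

The formula for year 4 can be expressed in a form

$$
\hat{T}_y = \frac{1}{4} \sum_{i=1}^{4} \sum_{S_i} yw
$$

that applies to all subsequent years, with the four samples now representing the last four years of
execution of the Tier 2 sample. For example, $S_4$ might represent 1995, $S_1$ 1996, $S_2$ 1997, and $S_3$ 1998.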
We can identify this estimator as a four-year moving average, with each of the four years represented
by a different subsample. The sequence of moving averages will therefore provide averaging on the
subsamples as well as on the years, and provide a better description of the resource trajectory than will
the other alternatives, with essentially the same effort.
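
The moving average can be sketched as follows. The sketch assumes that the weights within each annual subsample expand that subsample to the full population, so that each year's weighted sum is by itself an estimate of the population total and the moving average is the plain mean of the most recent annual estimates. The function names and the five-year data set are hypothetical.

```python
def annual_estimate(sample):
    """Weighted total for one annual subsample: the sum of y * w over its units."""
    return sum(y * w for y, w in sample)


def moving_average_estimates(annual_samples, window=4):
    """Estimate for each year: mean of the annual estimates from the last `window` years.

    In the first years, fewer than `window` samples exist, so all available years are averaged.
    """
    totals = [annual_estimate(sample) for sample in annual_samples]
    estimates = []
    for t in range(len(totals)):
        recent = totals[max(0, t - window + 1): t + 1]
        estimates.append(sum(recent) / len(recent))
    return estimates


# Hypothetical data: one (y, w) pair per year, giving annual totals 100, 120, 140, 160, 180.
annual_samples = [[(50.0, 2.0)], [(60.0, 2.0)], [(70.0, 2.0)], [(80.0, 2.0)], [(90.0, 2.0)]]
trajectory = moving_average_estimates(annual_samples)
```

In the fifth year the first subsample is revisited, and its new measurements replace the year 1 values in the window, which is why the fifth estimate averages years 2 through 5.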

Variance estimates for the moving average estimates follow standard HT formulae, using the
approximations adopted for EMAP and used throughout. Further, subpopulation estimates and
variances follow the standard subsetting protocol, including the generation of distributions and
confidence bounds for distributions. Then, comparison among distributions at intervals of four years is
direct and follows the general protocol developed for such comparison in Phase II of the Eastern Lake
Survey (Overton, 1987a; Overton, 1989a).

4.3.8 Spatial Pattern

The systematic grid design is capable of representing spatial patterns and processes that are apparent
at this or coarser resolution. In some instances, higher density systematic grids will be used, for
example in characterizing large bodies of water, and design sensitivity to pattern will change with the
grid density. This is an important consideration in design; choice of density, and the size of the
landscape description hexagons (the 40-hexes), is tied to the scale of the phenomena intended for
measurement. After these designs have been fixed, spatial characterizations are limited by the
resolution of the grid and the size of the description hexes.

Two general ways of representing spatial pattern are available. Regional differences are easily
accommodated by subpopulation analyses. Any subpopulation can be described by subsetting the
sample on that subpopulation, and regional or other spatial entities are simple examples of
subpopulations (Section 4.2.2). Then the population distributions and other statistics are capable of in-
depth characterization of those entities. This approach will be used extensively and will be the base for
a number of other analytic approaches. Regional subpopulation analyses are enhanced for most
resources by the uniform grid design.

Surface fitting, as by the methodologies of spatial statistics, will be another essential technique. Kriging
is a common method, and is clearly of use in some circumstances. However, we perceive the need for
greatly enhanced methods of spatial representation to account for irregularities created by topographic
and meteorological patterns, as well as by anthropogenic and ecological processes. Kriging will be used
in the beginning, but effort will be directed toward a more satisfactory spatial analysis. We cannot look
at the output of kriging without wanting something better.

One simple estimation that will be immediately available, without much development, derives from the
capacity to integrate sample statistics over spatial regions. Such methodologies are extensions of end
corrections developed for systematic sampling by Yates (1949) and others. This is closely tied to the
issues of spatial stratification. However, beyond the first simple applications, this leads back to spatial
statistics, and the need for extensive development in that area.

At another level, we identify several distinct modes of visual presentation of spatial pattern. Useful
representation of surfaces can be made by contour plots, but in certain circumstances, more
information will be carried by spatial mosaics. Mosaics and a variety of point representations also can
be used for spatial characterization of discrete resource units. Some effort will be expended in
developing and applying these methods.

4.4 CHANGES IN CLASSIFICATION AND STRUCTURE

4.4.1 Reclassification

The dominant consideration with respect to flexibility and adaptability of the design is the capacity to
accommodate reclassification. Some resource units will change from one type to another over the course
of years. Others will turn out to have been misclassified, either on ground inspection or on
photo-reinterpretation. More importantly, as more is learned about the systems that are being monitored, it
is certain that some changes in classification will be desired to bring the design more into alignment
with new perspectives. If classification has been the basis for stratification, then there will be interest in
restratifying to accommodate reclassification.

4.4.2 Subpopulation Estimation

Subpopulation estimation is always a way to deal with these problems. Summation over the sample
units (Section 4.1.1) belonging to a specific subpopulation provides estimates for that subpopulation. A
subpopulation being so characterized can be present in all strata, and one simply subsets the sample of
each stratum in generating the subpopulation estimates. Thus it is not necessary to restratify in order
to fulfill the basic characterization goals, but it might be desirable. For example, association analyses
are made more complex by mixing strata, due to the need to weight the analyses, as discussed in
Section 4.5.1. In general, data handling and analyses are simpler if the sample structure is simpler. The
general capacity to restratify, and otherwise to reorganize the sample structure, is seen as a necessary
feature of a long-term monitoring design.
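
A minimal sketch of subsetting across strata follows. It assumes each sample unit carries its own inclusion probability, so that units from different strata can be summed directly with weight 1/pi. The record layout, the "acidic" subpopulation, and the function name are hypothetical.

```python
def subpopulation_total(sample, belongs, attribute=lambda unit: 1.0):
    """HT estimate of a subpopulation total: sum attribute/pi over sample units in the subset.

    With the default attribute of 1, the result estimates the number of units.
    """
    return sum(attribute(unit) / unit["pi"] for unit in sample if belongs(unit))


# Hypothetical sample from two strata with inclusion probabilities 0.5 and 0.2.
sample = [
    {"stratum": "A", "pi": 0.5, "acidic": True, "area": 12.0},
    {"stratum": "A", "pi": 0.5, "acidic": True, "area": 8.0},
    {"stratum": "A", "pi": 0.5, "acidic": False, "area": 30.0},
    {"stratum": "B", "pi": 0.2, "acidic": True, "area": 5.0},
    {"stratum": "B", "pi": 0.2, "acidic": False, "area": 9.0},
]
number_acidic = subpopulation_total(sample, lambda u: u["acidic"])
area_acidic = subpopulation_total(sample, lambda u: u["acidic"], lambda u: u["area"])
```

No restratification is involved: the subpopulation is defined after the fact, and the original inclusion probabilities do the work.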

4.4.3 Restratification

Reclassification at Tier 1 will have two kinds of effect on the resource samples. If a unit is changed
from one resource to another, then the identification of the Tier 1 sample of each resource is subject to
change. This change will carry through into the Tier 2 sample, with attendant changes in the inclusion
probabilities.

The other kind of change involves reclassification of a single resource, ranging from a simple shift of a
few units from one class to another to a complete cross classification. The first again involves only a
few adjustments. The latter requires wholesale modification in the sample in order to obtain uniform
inclusion probabilities in the new classes. In some cases, one may wish to retain the original structure,
as well as the new one, and treat the cross classification cells as strata. However, this is little different
from subpopulation estimation.

The general process of restratification involves sample reduction and sample enhancement. Sample
reduction is straightforward; one simply subsamples the original sample and carries the selection
probabilities into the new inclusion probabilities. Sample enhancement is more difficult, and it is not
trivial to enhance a sample in such a manner as to obtain a prescribed uniform inclusion probability. A
general EMAP recommendation will be to oversample at the initial determination of the Tier 2 sample,
with reduction to the working level. The remainder will be archived as a source of future enhancement
units.
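
Sample reduction to a prescribed uniform inclusion probability can be sketched as below. The sketch assumes independent retention of each unit with probability equal to the target probability divided by the unit's current inclusion probability, which yields the target probability for every retained unit but a random working sample size. This is one simple way to carry the selection probabilities forward, not necessarily the procedure EMAP will adopt, and all names are hypothetical.

```python
import random


def reduce_sample(sample, target_pi, rng):
    """Split an oversample into a working sample with uniform inclusion probability
    target_pi and an archive of the remaining units."""
    working, archive = [], []
    for unit in sample:
        keep_prob = target_pi / unit["pi"]  # conditional probability of retention
        if keep_prob > 1.0:
            raise ValueError("target_pi exceeds the unit's current inclusion probability")
        if rng.random() < keep_prob:
            # new inclusion probability = old probability * retention probability
            working.append({**unit, "pi": unit["pi"] * keep_prob})
        else:
            archive.append(unit)  # kept as a source of future enhancement units
    return working, archive


# Hypothetical oversample of 200 units with two different initial inclusion probabilities.
oversample = [{"id": i, "pi": 0.5 if i % 2 else 0.25} for i in range(200)]
working, archive = reduce_sample(oversample, 0.125, random.Random(0))
```

The `ValueError` marks the case the text calls enhancement: a unit whose current probability is below the target cannot be brought up to it by subsampling.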

4.4.4 Restructuring the Interpenetrating Sample

Other procedures also involve these operations of reduction and enhancement. Sample selection for the
interpenetrating Tier 2 samples must be made, in the beginning, in ignorance of the structure of the
later Tier 1 interpenetrating samples. Thus, it seems a good design to modify these samples after all
the Tier 1 descriptions are completed, in order to obtain uniform inclusion probabilities over the four
samples. This modification involves reduction and enhancement; oversampling the first three
subsamples, so that only reduction might be required, again seems desirable. Here one would consider
implementing the oversample in the first cycle. If the Tier 1 resource samples are all identified before
any Tier 2 samples are selected, then these adjustments will not be needed, but this hardly seems
feasible.

4.5 ANALYSIS OF ASSOCIATIONS

Analysis of associations is an integral part of intended EMAP assessment, and the needs of this activity
also have played a large role in design consideration. The statistical analyses of data from a probability
sample pose problems that are not encountered in conventional statistics. Except for this, the analyses
are the same as in any other statistical context. Inferences, however, may be modified because of the
observational nature of these surveys.

4.5.1 Weighting

If the data are represented by variable inclusion probabilities, it is necessary to account for this in the
analyses, by weighting. For this reason, in large part, the EMAP plan eliminates variable probabilities,
except between resource strata, and these strata are usually based on resource classes that are of
particular interest. Simple unweighted analyses suffice for the individual strata, within which the
weights are all the same.

In conducting an association analysis for two strata, combined, suppose that inclusion probabilities are
uniformly 0.5 within one stratum and uniformly 0.2 within the second stratum. Analysis of the combined
data set is relative to the combined populations, and it is necessary to weight data from the first
stratum by 2 and data from the second stratum by 5 in order to get the appropriate analysis. This is
not really a heavy additional load, but it is something else to take care of.

One of the most perplexing issues involving weighted data is that of graphical representation. Scatter
plots are a basic tool for displaying an association, but they are difficult to produce when the weights
are different. Spatial plots are even more difficult; what is needed is some way to make the number of
points in each of the stratum samples proportional to the population numbers. Some preliminary work
done using Eastern Lake Survey data has met with some success, but in general, much attention must
be paid to this issue.

In regression analysis, if the two strata have different functional forms for the regression, say a different
slope, then there are grounds for the position that the combined analysis is meaningless. If the relation
is to be used to provide a prediction for a particular unit, then it is clear that the stratum of that unit
should be identified, and the individual stratum regression used. No problem arises if the strata are not
going to be combined.

Further, if the two strata have the same functional association, there is no problem in combining the
data, even if the two sets of inclusion probabilities are disparate. The Gauss-Markov theorem says
nothing about inclusion probabilities, but it does require the functional form of the relation to be the
same.

The EMAP strategy, then, is to identify resource strata that have specific meaning, and to construct
the design to adequately characterize these resource classes. Elimination of variable probabilities within
these strata allows simple analysis for association, within strata. If it turns out that association
relations are the same among several strata, then it is reasonable to combine strata for combined
representation of those relations, but separate stratum analyses are appropriate when relations are
different.

These considerations still do not prohibit combining strata with different association relations if there is
a compelling reason to analyze the mixture. It is only required that weighted association analyses be
made in such cases, but we would argue that such cases will not be common.

4.5.2 Observational Data

Testing in the context of EMAP association analyses is another issue. These will be observational data.
Tests based on observational data are no different from tests based on experimental data, and the
statements of significance are the same. However, inferences drawn from such tests may be very
different. Causality is proven by formal experiments, but not by observational studies. Observational
studies prove only association, and the nature of the cause must be discovered by other means. For this
reason, it will be appropriate, even after extensive exploration of association in the context of EMAP
assessment, to speak of "possible cause," rather than "probable cause."

4.5.3 Structures

Point by point, or sampling unit by sampling unit, analyses will be appropriate for any sets of
variables taken on points or on sampling units. Data sets prescribed by stressor/stress indicator pairs
will be generated on a sampling unit basis, and these will be a major source of association analysis. But
there is interest, also, in association analyses across resources. Cross-resource associations will not be
evident on a sampling unit basis unless there is a specific design that provides data on those resources
on the same sampling units. It will often be feasible to collect the needed cross-resource data on the
sampling units of one resource. In other cases, it may be necessary to have associated pairs of units
from the two resources. This is a feasible design option, but should be used only for matters of
particular concern. Precision of resource estimates will be lessened by use of designs that restrict the
individual samples.

Evidence of association and clues to meaningful relations will also derive from subpopulation
differences, and the search for subpopulations that exhibit insightful differences should be a prominent
activity. Subpopulations may be identified spatially, or by any other criterion, and the nature of the
criterion is the basis for investigation of cause. Any attribute that is associated with the population
criterion is thus a candidate for identification as causal.

At least in the beginning, association of deposition with resource response must be on a subpopulation
basis, because it will not be possible to project deposition on high spatial resolution, so as to adequately
associate with sampling unit variation. Spatial regions must suffice.

Comparisons among subpopulations will be made in terms of distributions, rather than simply relative
to means. Such comparisons prescribed for the NSWS provided satisfactory assessment (Overton,
1985). These were simple chi square analyses of the data, partitioned by quantiles of the estimated
distribution of the combined population. Comparisons among strata are simple, but again comparisons
of mixed stratum populations pose some difficulties, because of the variable probabilities. However,
concern with the testing protocol must be tempered by recognition that such tests are simply for the
purpose of screening, and that follow up investigation will be required to make any inference of
mechanistic association or causation.
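
A screening comparison of this kind can be sketched as follows. The sketch assumes equal inclusion probabilities across the subpopulations being compared, so that sample counts can be used directly, and it partitions the data at the quartiles of the combined sample. It returns the Pearson chi-square statistic and its degrees of freedom and leaves the lookup of a significance level to the analyst. The function names are hypothetical, and this is not the NSWS code.

```python
import bisect
import statistics


def quantile_class_counts(values, cuts):
    """Count how many values fall in each class defined by the sorted cut points."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect.bisect_left(cuts, v)] += 1
    return counts


def chi_square_by_quantiles(subpops, classes=4):
    """Pearson chi-square for subpopulation samples partitioned by quantiles
    of the combined sample. Returns (statistic, degrees of freedom)."""
    combined = [v for values in subpops for v in values]
    cuts = statistics.quantiles(combined, n=classes)
    table = [quantile_class_counts(values, cuts) for values in subpops]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (classes - 1)


# Two hypothetical subpopulation samples with completely separated distributions.
low = [float(v) for v in range(1, 21)]
high = [float(v) for v in range(101, 121)]
chi2_separated, df_separated = chi_square_by_quantiles([low, high])
```

For mixed-stratum populations with variable probabilities, the counts would have to be replaced by weighted estimates, which is the difficulty noted above.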

That is, given identified subpopulation associations, it will be of interest to establish higher resolution
relations on a sampling unit by sampling unit basis. At the most fundamental level, it is necessary to
make sure that the specific resource units that show stress have received the stressor stimulus. This
places requirements, for example, on analysis of deposition networks that cannot be met by currently
available spatial statistical techniques. However, such designs will not be needed in the beginning, but
rather only after the subpopulation associations have been established. It will be several years before
these capacities are needed.

4.5.4 Multivariate Analyses

A question has arisen on several occasions regarding multivariate analyses, and in particular, the use of
multivariate distribution functions. Unquestionably, any data exploration must admit multivariate
methods, and there is no intent to prohibit them. Further, there are specific forms of multivariate
description that have proven particularly useful, such as the trilinear plots used in the NSWS. When
the nature of the relation of certain variables can be identified, as in water chemistry, then this relation
can be exploited to great advantage in the manner of analysis.

Bivariate distribution functions will not generally be supported by data sets of the sizes used in EMAP.
However, when the categories of one variable, or of a combination of several variables, are coarse, then
these categories can define subpopulations that are described by the distribution functions of whatever
other variables are of interest in association with those that define the subpopulation. In a sense this
allows multivariate distribution functions, but within the subpopulation paradigm.

4.5.5 Subpopulations Revisited

Subpopulations are an extremely important facet of the EMAP design perspective. Their use permeates
most of the analytic topics, and their care and maintenance dominate the design issues. Recalling the
thought that EMAP programs are designed to discover and characterize subtle, diffuse, likely-to-be-
overlooked-without-EMAP subpopulations that are doing interesting things is a good way to end this
section.
SECTION 5
REFERENCES
Church, M.R., K.W. Thornton, P.W. Shaffer, D.L. Stevens, B.P. Rochelle, G.R. Holdren, M.G.
Johnson, J.J. Lee, R.S. Turner, D.L. Cassell, D.A. Lammers, W.G. Campbell, C.I. Liff, C.C. Brandt,
L.H. Liegel, G.D. Bishop, D.C. Mortenson, S.M. Pierson, and D.D. Schmoyer. 1989. Future effects of
long-term sulfur deposition on surface water chemistry in the Northeast and Southern Blue Ridge
Province. EPA-600/3-89/061. Washington, DC: U.S. Environmental Protection Agency.

Horvitz, D.G. and D.J. Thompson. 1952. A generalization of sampling without replacement from a
finite universe. J. Amer. Statist. Assoc. 47:663-685.

Kaufmann, P.R., A.T. Herlihy, J.W. Elwood, M.E. Mitch, W.S. Overton, M.J. Sale, K.A. Cougan,
D.V. Peck, K.H. Reckhow, A.J. Kinney, S.J. Christie, D.D. Brown, C.A. Hagley, and H.I. Jager. 1988.
Chemical Characteristics of Streams in the Mid-Atlantic and Southeastern United States. Volume I:
Population Descriptions and Physico-Chemical Relationships. EPA/600/3-88/021a. U.S.
Environmental Protection Agency, Washington, D.C.

Landers, D.H., J.M. Eilers, D.F. Brakke, W.S. Overton, P.E. Kellar, M.E. Silverstein, R.D. Schonbrod,
R.E. Crowe, R.A. Linthurst, J.M. Omernik, S.A. Teague, and E.P. Meier. 1987. Characteristics of
lakes in the western United States. Volume I: Population descriptions and physico-chemical
relationships. EPA-600/3-86/054a. Washington, DC: U.S. Environmental Protection Agency.

Linthurst, R.A., D.H. Landers, J.M. Eilers, D.F. Brakke, W.S. Overton, E.P. Meier, and R.E. Crowe.
1986. Characteristics of lakes in the eastern United States. Volume I: Population descriptions and
physico-chemical relationships. EPA-600/4-86/007a. Washington, DC: U.S. Environmental
Protection Agency.

Messer, J.J., C.W. Ariss, J.R. Baker, S.K. Drouse, K.N. Eshleman, P.R. Kaufmann, R.A. Linthurst,
J.M. Omernik, W.S. Overton, M.J. Sale, R.D. Schonbrod, S.M. Stambaugh, and J.R. Tuschall, Jr.
1986. National Surface Waters Survey: National Stream Survey, Phase I - Pilot Survey. EPA/600/4-
86/026, U.S. Environmental Protection Agency, Washington, D.C.

Messer, J.J., C.W. Ariss, D.H. Landers, and W.S. Overton. 1987. Critical design and interpretive
aspects of the National Surface Water Survey. Lake Reserv. Manage. 3:463-469.

Messer, J.J., R.A. Linthurst, and W.S. Overton. 1991. An EPA Program for Monitoring Ecological
Status and Trends. Environmental Monitoring and Assessment 17:67-78.

Olea, R.A. 1984. Sampling design optimization for spatial functions. Mathematical Geology
16(4):369-392.

Overton, W.S. 1985. Working draft, analysis plan for the Eastern Lake Survey. March 1985.
Technical Report 113, Dept. of Statistics, Oregon State University.

Overton, W.S. 1987a. Phase II analysis plan, National Lake Survey — working draft. April 1987.
Technical Report 115, Dept. of Statistics, Oregon State University.

Overton, W.S. 1987b. A sampling and analysis plan for streams, in the National Surface Water
Survey conducted by EPA. June 1987. Technical Report 117, Dept. of Statistics, Oregon State
University.

Overton, W.S. 1989a. Calibration methodology for the double sample structure of the National Lake
Survey Phase II Sample. Nov. 1989. Technical Report 130, Dept. of Statistics, Oregon State
University.

Overton, W.S. 1989b. Effects of measurement and other extraneous errors on estimated distribution
functions in the National Surface Water Surveys. Aug. 1989. Technical Report 129, Dept. of Statistics,
Oregon State University.

Overton, W.S. 1990. A strategy for use of found samples in a rigorous monitoring design. Technical
Report 139, Dept. of Statistics, Oregon State University.

Overton, W.S., and S.V. Stehman. 1987. An empirical investigation of sampling and other errors in
the National Stream Survey: Analysis of a replicated sample of streams. Oct. 1987. Technical Report
119, Dept. of Statistics, Oregon State University.

Palmer, C.J., Riitters, K.H., Strickland, T., Cassell, D.L., Byers, G.E., Papp, M.L., and Liff, C.I. 1991.
Monitoring and research strategy for forests — Environmental Monitoring and Assessment Program
(EMAP). EPA/600/4-91/XXXX. U.S. Environmental Protection Agency, Washington, D.C.

Stehman, S.V., and W.S. Overton. 1987a. Estimating the variance of the Horvitz-Thompson
estimator in variable probability, systematic samples. Proceedings of the Section on Survey Research
Methods of the American Statistical Association.

Stehman, S.V. and W.S. Overton. 1987b. An empirical investigation of the variance estimation
methodology prescribed for the National Stream Survey: Simulated sampling from stream data sets.
Oct. 1987. Technical Report 118, Dept. of Statistics, Oregon State University.

White, D., A. Jon Kimerling, and W. Scott Overton. 1991. Cartographic and Geometric
Components of a Global Sampling Design for Environmental Monitoring. Accepted by Cartography and
Geographic Information Systems.

Yates, F. 1949. Sampling Methods for Censuses and Surveys. London: Charles Griffin and Co.