EPA/600/3-91/053
                                                                               May, 1990
                            Design Report for EMAP

           Environmental  Monitoring and Assessment Program


                    W. Scott Overton1, Denis White2, and Don L. Stevens, Jr.2

               1Department of Statistics, Oregon State University, Corvallis, Oregon
       2ManTech Environmental Technology, Inc., U.S. EPA Environmental Research Laboratory
                            200 SW 35th Street, Corvallis, Oregon


                                   Contract 68-C8-0006
                                      Project Officer
                                    Anthony R. Olsen
                            U.S. Environmental Protection Agency
                             Environmental Research Laboratory
                            200 SW 35th Street, Corvallis, Oregon
Research sponsored by the U.S. Environmental Protection Agency under cooperative agreement CR-816721
with Oregon State University at Corvallis, and under Contract No. 68-C8-0006 to ManTech Environmental
Technology, Inc.

-------
The research described in this report has been funded by the U.S. Environmental Protection Agency.
This  document  has been  prepared  at the EPA Environmental  Research Laboratory in  Corvallis,
Oregon, through Contract No. 68-C8-0006 to ManTech Environmental Technology, Inc., and through
cooperative agreement CR 816721 with Oregon State University at Corvallis.  It has been subjected to
the Agency's peer and administrative review and approved for publication.  Mention of trade names or
commercial products does not constitute endorsement or recommendation for use.

-------
                                           CONTENTS
1  INTRODUCTION

2  DESIGN STRATEGY AND CHARACTERISTICS
    2.1  Strategic Approach
    2.2  EMAP Bywords
    2.3  EMAP Capability
    2.4  Tier Structure
    2.5  An Emerging Perspective of EMAP

3  SAMPLING DESIGN
    3.1  Design Overview
    3.2  The Sampling Grid
       3.2.1 The Grid
       3.2.2 Geometric Design Criteria
       3.2.3 The Global Structure of the Design
       3.2.4 Baseline Grid Density and Augmentation
       3.2.5 Randomization
    3.3  The Sample
       3.3.1 Tier 1 Landscape Description
       3.3.2 Discrete and Extensive Resources
       3.3.3 Sampling Considerations
       3.3.4 Tier 1 Sample of Extensive Resources
       3.3.5 Tier 1 Sample of Discrete Resources
       3.3.6 Augmenting the Grid
       3.3.7 Subpopulations of Interest
       3.3.8 Tier 2 Characterizations
    3.4  Interpenetrating Samples
       3.4.1 Interpenetrating Subsamples
       3.4.2 Ramping Up in the First Cycle
    3.5  Existing Inventories and Monitoring Programs
       3.5.1 Examples of Existing Programs

4  ESTIMATION AND ANALYSIS
    4.1  Estimation and Description
       4.1.1 Formulae
       4.1.2 Verification
    4.2  Tier 1 Estimates
       4.2.1 Resource Inventory
       4.2.2 Discrete Resources
       4.2.3 Domains
       4.2.4 Domains and Strata
       4.2.5 Spatial Control of Tier 2 Selection
    4.3  Tier 2 Estimates and Descriptions
       4.3.1 Estimates
       4.3.2 Option 1
       4.3.3 Option 2
       4.3.4 Distribution Functions
       4.3.5 Distributions of Continuous Resources
       4.3.6 Deconvolution
       4.3.7 Trends and the Interpenetrating Design
       4.3.8 Spatial Pattern
    4.4  Changes in Classification and Structure
       4.4.1 Reclassification
       4.4.2 Subpopulation Estimation
       4.4.3 Restratification
       4.4.4 Restructuring the Interpenetrating Sample
    4.5  Analysis of Associations
       4.5.1 Weighting
       4.5.2 Observational Data
       4.5.3 Structures
       4.5.4 Multivariate Analyses
       4.5.5 Subpopulations Revisited

5  REFERENCES

-------
                                           List of Figures
Figure 3-1  The base grid density placed advantageously on the United States
Figure 3-2  The truncated icosahedron model projects the familiar soccer ball tessellation pattern
            onto the earth
Figure 3-3  The structure of the EMAP grid, illustrating the four-fold decomposition that will be
            the basis of the interpenetrating design
Figure 4-1  When a resource is spatially restricted to a subregion, then an approximation to
            the boundary will identify the effective sample size for that resource
Figure 4-2  Overlapping domains make it difficult to impose a common spatial stratification
Figure 4-3  Partitioning a resource domain into compact clusters of points, with each cluster
            having the same size, Ew, provides structure that maintains spatial distribution of
            the Tier 2 sample
Figure 4-4  Examples of descriptive statistics used in the National Lake Survey that could also
            be used in EMAP
Figure 4-5  Examples of estimated distributions
Figure 4-6  An unrestricted random sample will give a binomial random variable for the number
            of points in the enclosed area in each figure

                                           List of Tables
Table 3-1   Schematic of the rotating interpenetrating design prescribed for EMAP
Table 4-1   Tier 1 inventory estimates and estimated variance of estimates are provided
            for all resources and classes of resources

-------
                                    ACKNOWLEDGEMENTS
The authors sincerely thank the many scientists who  have contributed, directly or indirectly, to this
document  through workshops,  informal  discussions, and reviews.   We appreciate the editorial and
technical assistance provided by Penny Kellar of Killkelly Environmental Associates and by Perry Suk,
who have edited several previous drafts.

-------
                                           ABSTRACT
The EMAP design was developed with the following considerations:
    •  Consistent representation of environmental reality by use of a probability sample
    •  Potential representation of all resources and environmental entities
    •  Capacity for quick response to a new question or issue
    •  Spatial distribution of the sample according to the distribution of the resource

These considerations have been met by prescribing a triangular sampling grid on approximately 27 km
spacing, with a 40-km² hexagon (40-hex) centered on each grid point to supply the sample
representation of resource space.  Inventory of each 40-hex  provides the Tier  1 sample. The sample grid
thus provides a one-sixteenth probability sample of the resource area. The Tier 2 sample is a subsample
of resource sites in the sample hexagons; these provide  the  detailed monitoring data. This double
sample provides the monitoring data for characterization of status and trends  of the various resources.
This document provides an overview of the EMAP sampling design and grid framework, along with a
discussion of the statistical estimation and analysis procedures.

-------
                                           FOREWORD
The Environmental  Monitoring and Assessment Program  (EMAP)  is  an Office of Research  and
Development program whose goal is to monitor the condition of the nation's ecological resources.  The
goal poses  a challenge that  cannot be  met  without a long-term commitment  to  environmental
monitoring  on national and regional scales.  One component essential to the design of EMAP is the
statistical design  and evaluation of integrated  statistical monitoring frameworks  and protocols for
collecting data on indicators of ecological condition.  A series of technical reports is being prepared on
the development of the EMAP statistical sampling design.

The first report on the sampling design,  "Design Report for EMAP,"  gives the conceptual basis, or
framework,  for the  development  and  implementation of the  EMAP  statistical sampling  design.
Ecological researchers associated with the  EMAP collaborated with statistical researchers to identify
and clarify  the monitoring program's objectives  and complete the problem identification.   Based on
their extensive experience  with ecological monitoring and statistical  sampling design, the authors
developed criteria that the design had to satisfy  to meet the EMAP objectives.  The report gives the
long-term design perspective, or prescription, for EMAP. That is, it gives the vision for the statistical
sampling design, a vision based on the experience of the EMAP researchers and preliminary
assessments of what might be implementable. These researchers are the primary audience for the
report.  The report documents the design concept as it existed in May, 1990.

Since implementation of the  design is a continuing  process,  the  "Design Report for EMAP" is not a
description  of the final sampling design as implemented by the EMAP.  The ecological resource groups
are actively implementing the design concept.  To aid the implementation process, a companion report,
"EMAP Sampling Design  Implementation Perspective and Issues," gives additional insight into the
EMAP   statistical sampling  design  process.    The  companion report  gives  further  details  on
implementation  issues and translates the general  design perspective  into  the  context of  specific
ecological resources.
Anthony R. Olsen, Project Officer
U. S. Environmental Protection Agency
Environmental Research Laboratory
200 SW 35th Street, Corvallis, Oregon

-------
                                            SECTION 1
                                         INTRODUCTION
The  need  to establish  baseline environmental conditions  against  which future  changes can be
documented with confidence has grown more acute with the increasing  complexity,  scale,  and social
importance of issues  such as global atmospheric change, acidic deposition, and the destruction and
alteration  of wetlands.  It is  therefore critical for monitoring  programs  to  be in place to provide
quantitative,  scientific assessments of the complex effects of pollutants on ecosystems.

In  1988,  the  U.S. Environmental Protection Agency's (EPA) Science Advisory  Board recommended
implementation  of a  program within  EPA  to monitor  ecological  status  and trends,  as well as
development  of  innovative methods for anticipating emerging  environmental problems before they
reach crisis proportions.  In response to the need for better assessments of the condition of the nation's
ecological  resources,  EPA's Office of Research and Development began  planning the Environmental
Monitoring and  Assessment Program  (EMAP). When  fully  implemented,  EMAP  will  be able to
confirm  that the nation's efforts to protect the environment are producing  the expected results in
maintaining and improving environmental quality.

EMAP provides  a strategic approach for meeting the growing need to identify and bound the extent,
magnitude, and  location of degradation or  improvement in environmental condition.   When fully
implemented, EMAP will answer critical  questions for  policy- and  decision-makers  and  the public:
What  is the  current extent of  our ecological resources (e.g.,  estuaries, lakes, streams,  forests, deserts,
wetlands, grasslands), and  how are they distributed  geographically? What percentages of the resources
appear to be adversely affected  by  pollutants  or other  human-induced  environmental stress, and in
which  regions are the problems most severe or widespread?  Which resources are degrading, where, and
at what rate? What are the relative patterns and magnitudes of the possible causes of adverse effects?
Do adversely  affected ecosystems show an overall improvement?

In order to answer these questions, an integrated monitoring network will be implemented within
EMAP with the following objectives:

     • Estimate current status, extent, changes, and trends in indicators of the
         condition of the nation's ecological resources on a regional basis with known
         confidence.

     • Monitor  indicators of pollutant exposure and habitat condition  and seek
         associations between human-induced stresses and ecological condition.

     • Provide periodic statistical summaries and interpretive reports on ecological
         status and trends to resource managers and the public.

Assessments of whether  the condition of the nation's ecological resources is improving  or  degrading
require data on large geographic  scales  and over long time frames.  For  this reason, comparability of
data among geographic regions and over extended time periods is critical  for EMAP, and meeting this
need by  simply  aggregating data from many individual, local, and  short-term networks has proven
difficult if not impossible.  EMAP networks therefore will be designed  to  provide statistically unbiased
estimates of  status,  trends, and relationships  with quantifiable confidence limits on national  and
regional scales over periods of years  to  decades.  This characteristic, along with its statistically  based
design, distinguishes EMAP from  most current monitoring efforts.

-------
EMAP is being designed around six primary activities:

     •   Strategic evaluation, testing, and development of indicators of ecological
         condition and pollutant exposure.

     •   Design and evaluation of an integrated monitoring framework.

     •   Design and testing of protocols for collecting data on indicators.

     •   Nationwide characterization of the extent and location of ecological resources.

     •   Demonstration studies and implementation of integrated sampling designs.

     •   Development of data handling, quality assurance, and statistical analytical
         procedures for efficient analysis and reporting of status and trends data.


This  design report focuses on the second  activity, design and  evaluation of  integrated statistical
monitoring frameworks and protocols.   The document provides a description of  the design objectives
and characteristics (Section 2), an overview of the sampling design and grid framework (Section 3), and
a discussion of the statistical estimation and analysis procedures (Section 4).

-------
                                            SECTION 2
                             DESIGN OBJECTIVES AND STRATEGY

The overall EMAP design strategy is to implement a permanent national sampling framework that will
enable  EMAP/EPA  to meet its  program objectives. To guide the development of the design, specific
design objectives have been formulated that will allow  the resulting  monitoring program to fulfill this
goal.

EMAP Design Objectives	

  *  The EMAP design will establish a monitoring program that is capable of
      •  providing rigorous answers regarding any explicit question on the status and
          condition of any regionally defined resource;

      •  providing baseline data leading to rigorous detection and description of
          trends in status and condition of regionally defined resources;

      •  providing assessment of association among attributes, both within and
          between resources;

      •  quickly responding  to new issues and questions.

  *  This design will be implemented with respect to currently identified resources.


2.1  STRATEGIC APPROACH

The design strategy  chosen  to  meet  these objectives  is based on  the grid  of points at which  each
ecological resource will be sampled. The area  around each point will be characterized by ecological and
land use criteria. The areal extent and numbers  of units of the various resource types will be described
in a carefully defined area. The collection of these descriptions at the grid points will constitute a Tier
1 sample and will be used to estimate the structural properties of the regional and national populations
of the resource  types. The  structural properties include the  numbers of  resource units, their surface
area, and other geometric or geophysical measures obtainable from remotely sensed imagery.

A subsample  of this  Tier 1  sample will be used  for field sampling of other attributes of the resources.
This  double sample  will constitute Tier 2 of the design. Field measurements will  be taken on many
attributes, for example, chemical  analyses of water samples, visual symptoms of foliar damage to
forests, species composition of wetlands, and other indicators of environmental well  being, on the
resource units selected in the Tier 2 sample.

The efficient statistical properties of estimates made from the double sample derive  in part from the
extensiveness  of the  Tier 1  sample and the relevance of the information in the Tier 1 sample to the
data  in the  Tier 2  sample.  But a  major  advantage of this design strategy is that  the Tier  1
characterization  of areas around  the grid points is not  restricted to the resources  identified a priori.
Other resources  that can be identified  in the characterization may  be sampled at  a  later time, in
response to a  new issue regarding their condition. The primary design device for  implementing  this
adaptable capability  is the sampling grid.

-------
2.2 EMAP BYWORDS

In recognition of these demands, it is accepted that the EMAP design must be:

     •   Adaptive. The program must be able to adapt to changing circumstances,
            perspectives, objectives, and knowledge, and to correct past mistakes.

     •   Simple. Simplicity minimizes the difficulties and consequences of adaptive
            change.

     •   Rigorous. Rigor with respect to important considerations will be
            established by explicit protocols, and by a program of quality management.

     •   Robust.  Robust methods will get priority over those dependent on
            assumptions, and effort will be made to ensure robustness of results
            relative to necessary assumptions.

     •   Unambiguous. The resources described by EMAP will be explicitly defined,
            either conceptually or by identification of an explicit frame.

     •   Flexible. The  protocols  and design structures will be  flexible and capable of
             molding to a wide variety of resource and time/space patterns, and accommodating
             existing resource sample designs.

     •   Adequate.  The need to address multiple and changing objectives rules out
            optimization;  adequacy for all objectives must be a guiding criterion.
Other programmatic  components must  also  fulfill  these  general criteria and  uphold the  objectives.
Specifically,  identification  of indicators,  activities of field  data collection,  data management, data
analysis, quality assessment, and quality management must all uphold the standards set for EMAP.
The best design will be useless if it is not translated into worthy operational protocols.

2.3  EMAP CAPABILITY

EMAP is  oriented to regional  phenomena. The capacity to describe  regional status and trends  is
dependent on the capacity  to measure status and trends at sites or on resource units, but the emphasis
on regional phenomena focuses EMAP descriptors on estimates for explicitly defined populations and
on representation of pattern for spatial regions and temporal frame.  The use of probability samples on
well-defined  sampling  frames will  permit  rigorous,  unambiguous  estimation  and  assessment  of
population attributes.  Statistical  methods  based  on explicit  models  will  provide other  rigorous
inferences, including spatial and temporal patterns and association among attributes and resources.
The focus on regional phenomena requires that the EMAP design strategy emphasize accommodation
of a wide range of resources and a wide range of explicit questions. The grid-sampling process provides
the capability of sampling any spatially distributed and well-defined resource. Application of the design
to the currently identified resources ensures that the currently identified rules for selection will be broad
and versatile. Randomization of the grid provides the protocol that generates a probability sample and
ensures the desired  rigor of population characterization. The triangular structure of the grid provides
minimum distance  between grid  points and an  additional degree of freedom from alignment with
regular anthropogenic forms. The hierarchically  structured grid provides the capability of describing
resources at  coarser resolution  than the base grid, or locally enhancing the grid  to sample  other
resources requiring higher resolution. Thus there  will be a general capability for providing information
on multiple spatial scales:  global, national, regional, and local.
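
The role of randomization in a triangular grid can be illustrated with a minimal Python sketch. It lays a triangular grid over a rectangular study area and shifts the whole grid by a single random offset drawn from one grid cell, so that every location has the same chance of lying near a grid point. The rectangular area, the function name triangular_grid, and the use of the approximately 27 km spacing from the abstract are assumptions made for this example only; the report's actual grid is built on a global geometric structure (Section 3.2) and may be randomized differently.

```python
import math
import random


def triangular_grid(width_km, height_km, spacing_km, rng):
    """Return (x, y) points of a randomly shifted triangular grid.

    Illustrative only: a flat rectangle stands in for the study region.
    """
    row_height = spacing_km * math.sqrt(3) / 2   # distance between rows
    dx = rng.uniform(0, spacing_km)              # random shift along x
    dy = rng.uniform(0, row_height)              # random shift along y
    n_rows = int(height_km / row_height) + 2
    n_cols = int(width_km / spacing_km) + 2
    points = []
    for j in range(n_rows):
        y = dy + j * row_height
        # Odd rows are offset by half a spacing to form triangles.
        shift = dx + (spacing_km / 2 if j % 2 else 0.0)
        for i in range(-1, n_cols):
            x = shift + i * spacing_km
            if 0 <= x < width_km and 0 <= y < height_km:
                points.append((x, y))
    return points


grid = triangular_grid(300.0, 300.0, 27.0, random.Random(1))
print(len(grid), "grid points in a 300 km x 300 km area")
```

Every point in the result has its nearest neighbors at exactly the grid spacing, which is the equal-spacing property that distinguishes a triangular grid from a square one.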


-------
EMAP cannot be designed to optimize the answer to any single question or the representation of any
single resource.  EMAP must provide adequate answers for all questions and resources within the basic
framework  and  adequate  flexibility to accommodate  increased resolution for specific  issues. It is
anticipated that the  greatest interest  in  ecological  resources  is at  the  subpopulation level,  e.g.,  in
specific types of wetlands rather  than all wetlands. The design strategy should therefore incorporate a
classification  scheme  that  identifies these subpopulations and a sampling  structure that  provides
estimates of status and  quality  for  them. The flexibility to  adequately characterize subpopulations
spanning a wide range of densities and distributions is an essential property of the EMAP design.

The probability sample will provide  the basis for rigorous estimation and population characterization.
However, good  scientific  descriptions will also require good  indicator  characterization  of sites and
resource units.  In general, indicator measurement  on sample  resource   units will not  be based  on
probability samples, so it will be necessary  to provide  standards of  objectivity  and  representation.
Further, inferences of population characteristics  will be accompanied  by quantified statements  of
statistical uncertainty that acknowledge all relevant sources of statistical variability.

The establishment of cause-and-effect relationships among stressor, exposure, and response indicators is
not  a design objective  for  EMAP. However,  the  EMAP  design   must accommodate  the  planned
collection of information  to investigate associations between known or  suspected cause-and-response
ecological indicators.  The design must be  flexible enough  to enable the evaluation of such associations
both within and  among resources.  The  EMAP  design  should  also  provide for limited diagnostic
capability  via correlations  between condition  and potential causative factors, thereby identifying areas
for further examination or research.

It is anticipated that there will be changes in  resource definition and classification as EMAP progresses.
The EMAP design must  be adaptable to accommodate changes or refinements in resource definition or
classification. In addition,  implementation of the changes should be without  substantial  alteration of
the  sample or  its ability  to meet  the other  EMAP  objectives. As an  example, the design should
accommodate the  identification  and correction of errors  in the initial classification that  will accrue
through the accumulation of improved information developed by the EMAP monitoring tasks.

Substantial amounts of information similar to that collected by EMAP exist in data sets  generated  by
other monitoring networks, and  the EMAP  design  should be  capable of integrating and  using  such
information. It  is anticipated that integration can best be  achieved when  the other network  is based  on
a rigorous statistical foundation. However, it will  also be necessary  for the EMAP design to establish
linkages to  networks  that use  nonprobability sampling designs.  The  primary  advantage  of  such
associations will be to expand the temporal and spatial resolution of EMAP-derived information and to
provide a common framework for reporting environmental  monitoring data.

2.4  TIER STRUCTURE

The EMAP monitoring strategy  is based on a hierarchical structure,  in which several distinct tiers are
identified.

1.   Tier 1.
       The grid (Section  3.2) provides  the  structure for  the Tier  1  sample,  and the  Landscape
       Descriptions (Section 3.3.1) provide the data for that sample. Because the  individual resources
       are  not necessarily  represented  by explicit frames, and because  the design is intended to  be
       capable of sampling any resource,  including unspecified ones,  a  device like  the grid is essential.
       Even in  sampling an explicit frame,  the grid is a good way to  select a sample  to fulfill the
       EMAP objectives. The data  generated at this tier provide the  estimates of regional structural
       parameters for  the defined resource frames.

-------
 2.    Tier 2.
       This represents the level of higher resolution characterization of specific resources. The sample at
       this level provides resource units selected explicitly for field visitation in a  manner to best
       respond to questions regarding status and trends of the specific resources.

 3.    Tiers 3 and 4.
       These levels are not distinct in the thoughts of planners at this time.  Collection of field data on
       individual units, high  resolution study of trends,  process  studies and other functions are  all
       lumped together. As  the  EMAP planning process proceeds to the later stages, attention will be
       given to organizing these topics and making the  roles of Tiers 3 and 4 explicit. The original
       interpretation of Tier 3 as the level of annual measurements for trend detection  has been greatly
       modified by  development of the interpenetrating design (Section 3.4), which has  moved  trend
       detection into Tier 2, and by investigation into the required frequency of resampling.

 2.5  AN EMERGING PERSPECTIVE OF EMAP

 As we have progressed in EMAP design, and as the basic design  issues are resolved and more subtle
 issues emerge, a clearer perspective of the capability and potential of EMAP monitoring has begun to
 take  form. One aspect of  this  perception  derives  from our  investigation of the need for annual
 observations on some systems. The fundamental tradeoff is between annual observations on  n systems,
 or observations every k years on kn systems. As we  investigated this  tradeoff, the  perception grew
 increasingly strong that there are many advantages to a broader and more  comprehensive coverage of
 the populations  that we want to characterize.  The perspective that remains is that EMAP, in  the
 design proposed, has the  capability of discovering  trends and insult in subtly defined  and  spatially
 diffuse populations that are not  likely to be recognized  without  a design like that proposed for EMAP.
 This  capability is enhanced by  increasing the number of systems  periodically monitored by EMAP.
 Less frequent visitation is one way of gaining increased numbers  of systems.
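
 The tradeoff between annual observations on n systems and observations every k years on kn systems can be made concrete with a short Python sketch. It assumes a fixed number of field visits per year and compares the number of distinct systems covered under the two schedules. The function names, the figure of 50 visits per year, and the choice of k = 4 are hypothetical values chosen for illustration, not parameters prescribed by this report.

```python
def visit_schedule(n_per_year, k, years):
    """Return {year: [system ids visited]} for a k-year rotation.

    Systems are numbered 0 .. k*n_per_year - 1 and split into k panels;
    panel (year % k) is visited in a given year. k = 1 means every
    system is visited annually.
    """
    schedule = {}
    for year in range(years):
        start = (year % k) * n_per_year
        schedule[year] = list(range(start, start + n_per_year))
    return schedule


def coverage(schedule):
    """Number of distinct systems visited at least once."""
    return len({s for visited in schedule.values() for s in visited})


# Same annual field effort (50 visits per year) over eight years.
annual = visit_schedule(n_per_year=50, k=1, years=8)
rotating = visit_schedule(n_per_year=50, k=4, years=8)
print(coverage(annual), coverage(rotating))  # 50 200
```

 With the same annual effort, the rotation covers four times as many systems, at the cost of revisiting each one only every fourth year.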

 Another aspect is that EMAP is capable of providing information not available in other monitoring
 programs, and of putting that  information into  a more  useful and integrated  form. EMAP design
 objectives should be oriented toward these novel capabilities, and not toward capabilities that already
 exist  elsewhere. Strong trends and flagrant insult will be detected without EMAP, so their detection is
 not a necessary design objective.

 Finally,  we perceive that  we  have just begun to plumb  the potential and general capability of  the
 EMAP concept. It is important to think of EMAP as an evolutionary system that will evolve as EMAP
steersmen exploit its adaptability in responding  to new and newly  perceived issues.  In  order to remain
vital, and not succumb to the aging process, EMAP requires a structure,  which might justifiably be
thought  of in Wienerian terms as cybernetic, that uses that adaptability to fullest  advantage. At this
point it  will suffice  to focus on the EMAP capability that does not reside elsewhere, and think deeply
about those things that need EMAP in  order to  be discovered and described. These  are the things that
should establish EMAP directions.

-------
                                             SECTION 3
                                        SAMPLING DESIGN
3.1  DESIGN OVERVIEW
The resources of  the United  States are sampled  via  a triangular point  grid. This grid identifies
approximately 12,600 locations at which ecological  resources will  be cataloged, classified,  and studied
with respect to condition and status and to  trends in condition and status. Most of this descriptive
effort will be focused on the areas immediately surrounding the grid points. These activities are divided
into two tiers, the first directed toward information that  essentially  can  be  collected using remote
sensing  (e.g.,  aerial photography, Landsat images,  maps,  and other  existing  information) and  the
second  directed largely toward  more intensive data collection at sites selected to represent specific
resources.

At Tier 1, the area surrounding  the grid points will  be characterized. It is convenient to think of these
characterizations as consisting of two separate data sets, although  one is really  a subset of the other.
Specifically, landscape descriptions of the 40-km² hexagon centered on each grid point provide the base
Tier 1 dataset. These hexagon (40-hex)  descriptions  constitute a probability  area sample of the United
States.  From these descriptions will be generated  regional estimates of areal extent of all ecological and
landscape classifications and regional estimates of numbers of entities of all  discrete ecological objects
of interest, such  as lakes, stream reaches, or prairie potholes.  Any classification  available  at this level
carries into the structure of these estimates.
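
The arithmetic behind the one-sixteenth area sample, and the way a regional estimate of areal extent follows from it, can be sketched in a few lines of Python. The sketch assumes the approximately 27 km spacing quoted in the abstract and a simple expansion estimator (sum of mapped area divided by the sampling fraction); the function name estimate_regional_area and the wetland areas are hypothetical, and the estimators actually proposed are those of Section 4.

```python
import math

SPACING_KM = 27.0     # approximate grid spacing quoted in the abstract
HEX_AREA_KM2 = 40.0   # area of each sampled hexagon (40-hex)

# Area of landscape associated with one point of a triangular grid.
area_per_point = SPACING_KM ** 2 * math.sqrt(3) / 2
# Share of that area falling inside the 40-hex: close to 1/16.
sampling_fraction = HEX_AREA_KM2 / area_per_point


def estimate_regional_area(hex_areas_km2, fraction=1 / 16):
    """Expand resource area mapped in sampled 40-hexes to a regional total."""
    return sum(hex_areas_km2) / fraction


# Hypothetical wetland area (km^2) mapped in five sampled hexagons.
observed = [3.2, 0.0, 7.5, 1.1, 0.0]
print(f"area per grid point: {area_per_point:.0f} km^2")
print(f"sampling fraction: 1/{1 / sampling_fraction:.1f}")
print(f"estimated regional wetland area: {estimate_regional_area(observed):.1f} km^2")
```

Because the spacing is only approximate, the computed fraction is close to, not exactly, one-sixteenth.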

This Tier 1, 40-hex description also provides a sample from any explicitly defined resource  frame, such
as the frame of lakes  between  10 and 2,000  ha in the United States, or the frame of primary sampling
units in  the   National  Agricultural Statistics Service  (NASS)  area sample.  In  this capacity,  the
landscape descriptions supply the Phase I sample for each of the resources identified to be sampled in
the current implementation of EMAP. This function requires only a partial completion of the full
landscape descriptions. A feasible implementation strategy generates the Tier 1  sample for the various
resources and then completes the rest of the  landscape descriptions  at a later  time. But if the landscape
descriptions are completed before any of the Tier 1 resource samples are identified, the latter will  be
included in the descriptions.

This perspective makes  clear  the most powerful property of  this design:   the generation  of Tier 1
samples of yet unidentified resource frames. Just as the landscape descriptions provide a sample for any
identified frame, they also provide samples  for frames that  have not yet been defined, as  long as  the
specification of landscape attributes is general  enough to include the frame identifying attributes.  This
is  justification for  the great  generality of the  landscape descriptions, providing the capacity for rapid
response to novel questions.

A  Tier 2 resource sample will be  a subset of the Tier 1 sample  for that resource, selected by probability
methods for  more  intensive data collecting, and  typically involving site visits. The data  collected  at
Tier 2 will be  the  basis for  reporting on regional status and trends in indicators of  ecological response
and pollutant  exposure,  as  well as other attributes  recorded  at this level.  Unless  there is  a specific
question regarding  co-response of two resources, the Tier 2 resource  samples will be independent.

Rare resources will provide  a small Tier 1 sample that  is incapable of supporting  a sufficiently large
Tier 2  sample. For such cases,  the  grid  will be enhanced  by one  of  several available enhancement
factors, solely  for this resource, and  solely in  the necessary area to cover this resource. The enhanced
grid will also be regular and triangular.

The characterization of one-sixteenth of the  total area of  the  United States will provide baseline
descriptions of the  ecological resources of interest, and  a general landscape basis for  characterizing
landscape change over  time.  Specific resource samples extracted from the general  descriptions provide
the basis for describing status and detecting trends relative  to that resource. The universality of the
grid and the generality of the descriptions ensure that any resource is sampled by  the design, and that
a Tier 2 sample can be constructed for any newly identified resource with a minimum of  effort and
disruption.

3.2 THE SAMPLING GRID

3.2.1 The Grid

The point grid established for the  United States is  triangular,  with approximately  27 km between
points in each direction.  A fixed position that represents a permanent location  for  the base  grid is
established  (Figure  3-1), and the sampling  points to be used by EMAP  are  generated by a slight
random  shift of the entire grid from this base location. The full descriptions and rationale for the
cartography and geometry of the grid are given in White et al. (1991).


3.2.2 Geometric Design Criteria

The design  objectives were translated into a set of geometric specifications required for the grid:  (1)
equal  area  sampling  structure  using  a regular  placement of  sampling  locations,   (2)  compact
arrangement of sampling  locations,  (3)  hierarchical structure, and (4) a realization of the  grid on a
single planar surface for the entire domain of application.

Regularity  in the planar  projection is required for the  randomization step prescribed  for  EMAP. A
regular placement of points on the plane can be achieved either with a square or a triangular  lattice.
Hierarchical decompositions are available in both square and triangular spacing.

The  triangular  structure  is  slightly  preferable to the  square in compactness,  evenness  of  spatial
coverage, and precision of spatial estimates (cf. Olea, 1984). The square structure has the advantage
of being implemented in many computer hardware devices and software data structures. The triangular
arrangement has the advantage  of  an extra degree of freedom from coincidence with  anthropogenic
structures such  as public  land survey  lines. The balance is  slightly in favor  of the  triangular
arrangement, and EMAP  will use a regular  triangular grid extending without interruption  across the
entire domain.

3.2.3 The Global Structure of the Design

Although the initial EMAP objective was the  conterminous United States, with potential extension to
Alaska and other areas, there was  from the  beginning  a perspective that the design for the  United
States should be considered  in the context of whatever global designs were in existence. Some  global
models do exist, with varying properties and  purposes, but none fit the design criteria set for EMAP.

These criteria for the United  States alone are met by the  standard map projection  for the conterminous
United  States, the Albers projection. This projection  has the equivalent, or equal area,  property (the
ratio of two areas  on  the sphere  to the corresponding areas on the projection is constant)  and a
relatively low degree of scale distortion  (resulting  in low  distortion of  shape  and distance). Scale
distortion is within 1% for  most of the United States and is exceeded  only in the southern parts of
Florida and Texas.  However, the distortion degrades sharply into neighboring areas of Canada and
Mexico. The search was extended in the hope of getting a projection that was better for neighboring
areas, and also extended to Alaska.

Figure 3-1.  The base grid placed advantageously on the United States.

Other projection systems in common use are also unsuitable for various reasons.  The Universal
Transverse Mercator projection,  for example, is  neither  equal area  nor  continuous across the
conterminous United States. It consists of 6-degree meridional zones.  Consideration of a number of
proposed global tessellations¹, or other systematic treatments of areas on the globe, has resulted in
identification of one that provides good representation for the United States and neighboring regions,
including Alaska,  and can be used adaptively for other parts of the world. This tessellation is based on
the  semi-regular  geometric  solid  named the  truncated icosahedron,  which  is a modification of the
regular icosahedron, the 20-sided regular  platonic solid.

The truncated icosahedron starts with the 20 equilateral triangular faces of the icosahedron and replaces each
vertex of  this model with  a  regular pentagon.  The  resulting solid has 20 hexagonal faces and 12
pentagonal faces.   This  solid is the basis for the most  common construction  of the surface of a soccer
ball (Figure 3-2).   Each hexagonal face  of the truncated icosahedron  may be considered a domain for
implementing the EMAP sampling grid.  Neighboring plates can be contiguous along one edge, but will
not join with all adjacent plates in the plane. The  regular grid  on a hexagon will lose some regularity
when extended onto pentagons. Additionally, there is a minor sampling irregularity at the edges of two
plates.

One hexagonal face, when appropriately centered on the United States, covers the land area and part  of
the  adjacent  continental shelf of the conterminous United  States, southern  Canada,  and northern
Mexico.  On this  hexagonal face, an  equal area map projection (Lambert's azimuthal equal area) was
constructed. This  projection has a maximum scale distortion of approximately  1.1% at  the center and
at the vertices of the hexagon.

As a global design, this  configuration has some disadvantages. A polar orientation does not exist that
is satisfactory for  the primary objectives  of EMAP,  and the orientation chosen for EMAP provides poor
representation for a number of areas on earth. However, it is possible to generate plates for the several
areas,  more or less independently of each other, using the general approach adopted for  EMAP. An
adaptive design based on this model will  provide the needed representation in any area.

3.2.4 Baseline Grid Density and Augmentation

The hexagon covering the United States has been chosen as a plate from the truncated icosahedron and
measures approximately 2,599 km  on  each side (Figure 3-1).  Each  side of this hexagon and each radius
connecting the center  with a vertex is divided into  96  equal  parts.   A  triangular grid is  then
constructed by  connecting  the endpoints of  the  96 parts along  each  of the three axes of the six
equilateral triangles making up the hexagon.  The endpoints plus the intersections of the constructed
grid form a triangular grid having 27,937 points in the full hexagon.  The distance between the points
on this grid is approximately 27.1 km.

The Dirichlet (Thiessen, Voronoi) polygons constructed on the grid points are hexagons. These
hexagons tile the plane and have an area of approximately 634.5 km².  Characterization hexagons of
1/16th this size (approximately 39.7 km²) will be centered on the grid points. There will be
approximately 12,600 grid  points in the conterminous United States,  thereby establishing the grid
density for baseline sampling in EMAP.
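
The grid figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes only the side length (2,599 km) and the 96-fold subdivision given in the text; the function name is hypothetical. A hexagon whose sides are divided into n parts carries 3n(n + 1) + 1 lattice points, and the Dirichlet hexagon of a triangular lattice with spacing d has area (√3/2)d².

```python
import math


def hex_grid_summary(side_km=2599.0, divisions=96):
    """Return spacing (km), point count, cell area (km^2), and 40-hex area (km^2)."""
    spacing = side_km / divisions
    # Centered hexagonal number: points on a hexagon with `divisions` steps per side.
    n_points = 3 * divisions * (divisions + 1) + 1
    # Area of the hexagonal Dirichlet cell around each point of a triangular lattice.
    cell_area = (math.sqrt(3) / 2.0) * spacing ** 2
    # Characterization hexagons are one-sixteenth of the cell area.
    char_area = cell_area / 16.0
    return spacing, n_points, cell_area, char_area


spacing, n_points, cell_area, char_area = hex_grid_summary()
```

With these inputs the point count is exactly 27,937, and the spacing and areas agree with the quoted 27.1 km, 634.5 km², and 39.7 km² to within the rounding of the side length.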

Augmentation of the baseline grid to  increase  the density of sampling for certain rare resources can be
achieved through  the hierarchical  geometrical properties of triangular grids.   In any specified region,
the grid can be enhanced by the insertion of additional locations in  a systematic pattern that is a factor
of 3, 4, or 7 times that of the baseline density. Products and quotients of combinations of powers of
these factors will also provide enhancement factors. For the factor of 3, a new point is placed in the
center of each triangle defined by three adjacent points. For the factor of 4, a new point is placed in the
center of the line connecting each pair of adjacent points.  Additional enhancements are described in
White et al. (1991).

         ¹ A tessellation is a pavement or tiling by a mosaic pattern.

Figure 3-2.  The truncated icosahedron model projects the familiar soccer ball tessellation pattern onto
the earth.
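
The factor-of-3 and factor-of-4 enhancements can be illustrated on a small patch of triangular lattice. This is a sketch only: it works in lattice coordinates on an n-by-n patch, each lattice cell being credited with the two triangles and three edges it owns, and the function name and patch layout are assumptions made for illustration. The constructions actually used are described in White et al. (1991).

```python
import math


def augment(n, spacing, factor):
    """Return (base_points, added_points) for an n-by-n patch of a triangular lattice."""
    e1 = (spacing, 0.0)                                  # first lattice direction
    e2 = (spacing / 2.0, spacing * math.sqrt(3) / 2.0)   # second direction, at 60 degrees

    def xy(u, v):
        # Convert lattice coordinates (u, v) to planar coordinates.
        return (u * e1[0] + v * e2[0], u * e1[1] + v * e2[1])

    base, added = [], []
    for i in range(n):
        for j in range(n):
            base.append(xy(i, j))
            if factor == 3:
                # Centers of the two triangles owned by this lattice cell.
                added.append(xy(i + 1.0 / 3.0, j + 1.0 / 3.0))
                added.append(xy(i + 2.0 / 3.0, j + 2.0 / 3.0))
            elif factor == 4:
                # Midpoints of the three edges owned by this lattice cell.
                added.append(xy(i + 0.5, j))
                added.append(xy(i, j + 0.5))
                added.append(xy(i + 0.5, j + 0.5))
            else:
                raise ValueError("only factors 3 and 4 are sketched here")
    return base, added


base3, added3 = augment(4, 27.1, 3)
base4, added4 = augment(4, 27.1, 4)
```

Each lattice point gains two new neighbors under the factor of 3 and three under the factor of 4, so the point density is multiplied by 3 and 4 respectively, and the point spacing shrinks by factors of √3 and 2.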

 The hierarchical structure is also useful for identifying interpenetrating subsamples (Section 3.4). The
 factor of 4 is prescribed, leading  to 4  interpenetrating subsamples that are visited in successive years,
 with a repeating cycle length of 4 years. However, 3, 7, and 9 are also feasible factors.

 3.2.5  Randomization

 The randomization of the grid will be achieved by a random  shift in the plane  of the entire grid of
 points.   As the design is systematic,  a single random translation will be chosen and all points will
 translate systematically.  The  restriction that randomization be a feature of EMAP  is important, in
 that this is the basis for the strict requirement of grid regularity. This restriction is also important in
 that it provides the protocol that establishes the EMAP systematic sample as a probability  sample.
 Other ways could have been used to get a probability sample; this probability sample also provides the
 advantages of a systematic sample.
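
A single random translation of a systematic grid can be sketched as follows. The text specifies only that one random shift is applied to the entire grid; the sketch assumes, purely for illustration, that the offset is drawn uniformly over one cell of the triangular lattice. The function names are hypothetical.

```python
import math
import random


def random_shift(spacing, rng):
    """Draw one offset uniformly over a lattice cell of a triangular grid."""
    u, v = rng.random(), rng.random()
    # The cell is spanned by (spacing, 0) and (spacing/2, spacing*sqrt(3)/2).
    dx = u * spacing + v * spacing / 2.0
    dy = v * spacing * math.sqrt(3) / 2.0
    return dx, dy


def shift_grid(points, offset):
    """Translate every grid point by the same offset, preserving regularity."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]


rng = random.Random(0)              # seeded only to make the example repeatable
offset = random_shift(27.1, rng)
moved = shift_grid([(0.0, 0.0), (27.1, 0.0)], offset)
```

Because every point receives the same offset, the shifted grid keeps its regular triangular structure while the randomization supplies the probability basis for the sample.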

 3.3  THE SAMPLE

 The Tiers (Section 2.4) impose the dominant sample structure,  and many of the implementation tasks
 are  organized by  this structure. The interpenetrating design structure is  separable from the general
 sampling considerations, and is discussed in Section 3.4.

 3.3.1  Tier 1. Landscape Descriptions

 One-sixteenth of the area of the United States will be characterized, primarily on the basis of remote
 sensing data  (e.g.,  aerial photography, Landsat images,  maps, and other existing  information). These
 characterizations provide  baseline descriptions  of all  ecological  resources, identify  resources  and
 populations of concern, provide a base for assessment of landuse  change, and provide a Tier 1 sample of
 all resources,  from which a subsample will be taken for assessment of resource status and trends.

 Landscape descriptions (LDs) will be made on hexagons (40-hexes) constructed and centered on the grid
 points to have area (39.7 km²) exactly one-sixteenth of the area per point in the base grid. Physical,
 biological, organizational, and anthropogenic attributes will be described as  part of the characterization
 (for examples, see Section 5).  These descriptions will be  in the form of (1)  maps of the areal extent of
 the several landscape, landuse, and resource classes prescribed, (2) numbers, location, and classification
 of discrete resource units, and (3) identification  and measurement of a variety of attributes of landscape
 and  resource structure and organization.

 The LDs have several  functions with respect to fulfillment of EMAP objectives. As area samples, they
 provide  the  data for estimation of certain  regional resource and landscape parameters.   They  also
 characterize  the Tier  1  (or first  phase or  first stage)  sample  that  will  be subsampled for  further
 characterization  of resource sampling units  or  sampling sites, leading to regional estimates of other
 parameters of those specific  resources.   Additionally,  these LDs  must provide  sufficiently general
 descriptions of the ecological resources  that it will be  possible to identify, at a future date, other
 resources for which new questions have arisen.

3.3.2 Discrete and Extensive Resources

Resources will be divided into two  categories, depending  on the  size of the  natural unit.  If the natural
unit is  less than 2,000 ha  (roughly  half of a  characterization  hexagon),  it will be categorized as a
discrete resource.  Most wetlands, lakes, and stream segments are discrete resources.  If the natural unit
is 2,000 hectares or larger, it will be called an extensive resource. Many estuaries, forests, large lakes,
 and large wetlands (e.g., the Everglades) are extensive resources. Large rivers will be a special class of
 extensive resource. It is necessary to recognize these differences because the methods  of sampling and
 characterization are often  different  for  the several forms. Sets  (populations) of discrete units can be
 characterized in terms  of  unit  characteristics, whereas extensive resources will  be represented on an
 areal or other basis.

 The class of extensive resources  also contains a special subclass of interest, being the class of continuous
 resources. The  major examples  of continuous resources are water and air, although many others may
 have similar sampling properties.

 These  different classes  of resources  require somewhat different  sampling  strategies and  different
 descriptors, and these features enter into design considerations. We also note that some populations on,
 say, Chesapeake  Bay are continuous and others extensive, so that the designs must  be very flexible.
 The consolidating feature is that each resource is sampled in some manner in the neighborhood of each
 point in the grid, unless the resource is not found in that neighborhood.

 3.3.3 Sampling  Considerations

 The  intent  to  use  the landscape  descriptions for  estimation and  further  sampling  puts certain
 requirements on  the nature of the records  made.   Each  40-hex will be  partitioned into the areas
 occupied by the various resource and land  use classes. This requires digitization of the boundaries and
 involves a certain degree of subjective interpretation of the exact location of many of the  boundaries.
 All uses of these classes  must accommodate their fuzziness.

 Identification of the numbers  of resource  units  in  a  40-hex  requires  that  all  resource  units  be
 represented as "being" at a point. This location  point can be arbitrary, if it was set without knowledge
 of the grid location, as in  the digital line graphs (DLGs).  Otherwise it must be objective. One such
 objective definition would be the centroid of the  digitized unit boundary;  another would be the centroid
 of the tangential NSEW rectangle.
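
The two objective definitions named above can be sketched as follows. The "tangential NSEW rectangle" is read here as the axis-aligned bounding rectangle of the digitized boundary, which is an interpretation rather than a definition given in this report. The function names and the example polygon are hypothetical.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon given as [(x, y), ...] (shoelace formula)."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area), and 6 * area = 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def nsew_rectangle_center(vertices):
    """Center of the north-south/east-west rectangle that just encloses the unit."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


# Hypothetical L-shaped lake boundary, in kilometers.
lake = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
centroid_point = polygon_centroid(lake)
rectangle_point = nsew_rectangle_center(lake)
```

For irregular units the two definitions give different location points, so whichever is adopted must be applied consistently and without reference to the grid location.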

 The ways in which these records will be used in  the sampling process require that certain care be taken
 in generating these data in the beginning, and a certain reluctance in modifying  them in the  future. If
 the digitized boundary of a unit is changed, it will not be necessary to change the original identity of
 the point position of the unit. However, other changes will have greater impact  on sample properties.
 These are all considerations of the long-term adaptability of the design.

 3.3.4 Tier 1 Sample of Extensive Resources

 Extensive resources, e.g., lakes over  2,000 ha  in  area,  can generally be  sampled by a list frame.
 However, these  are  also sampled in  a meaningful  collective manner by  the  40-hexes  in the area-
 sampling mode, and otherwise by the point grid, in a point/spatial mode. Specifically,  the area sample
 that derives from the LDs will contain sample areas of large lakes, simply as a landscape classification.
 These samples  lead  to  estimates of areal extent of the class.  Field  observations to determine higher
 resolution characteristics will lead to estimates of the population parameters for those characteristics.

 Sample areas included in the 40-hexes  can be the basis for location of quadrats,  transects, or other
surrogates  for  discrete  sampling units on  which  to make field records of indicators. Via the area-
sampling mode, samples selected in this  manner can be carried into Tier 2 samples in the exact manner
of that prescribed for discrete resources.

 For the special case of continuous resources, such as estuaries, the samples may be point samples. Point
samples  will also arise in  "plotless"   methods,  as  are often  used  in  forestry.  Thus,  for  certain
noncontinuous, extensive resources, the locations making up the Tier 1 samples will be points in space
determining where observations will occur, in a manner similar to the point designs for continuous
resources.

It will often  be of interest to characterize entire specific extensive resources, such as the Everglades or
Chesapeake Bay. In such cases, grid enhancement (Section 3.3.6) will often be a desirable option in
order to  gain greater precision  for spatial representation of pattern. The sampling plan  for  Near
Coastal, currently in review, contains explicit plans for monitoring such extensive resources, with
illustration of grid enhancement and other features of interest.

3.3.5 Tier 1  Sample of Discrete Units

Discrete resource units (e.g., entire  lakes, entire wetlands) identified in  the LD data will make up the
(full) Tier 1 sample for the respective resources. The set of materials derived from the LDs regarding
those resource units constitutes a significant part of the landscape descriptions for the various resources.
It is important to identify this as a subset of the LD data,  but also important to recognize that this
particular dataset could  be generated,  in  its entirety, before any of the rest of  the  LD  data are
generated. These data are  a subset of the whole,  but not dependent on the  whole  except through
protocols of generation.

As a Tier 1 activity, each of the Resource Groups  (Wetlands, Surface Waters, Great Lakes, Forests,
Agro-Ecosystems, Near Coastal, and Arid Ecosystems) will determine the specific classes of resources to
be distinguished in the  LDs. There will  be from few to many  classes for  each of  the  resources
represented by the several Groups.  It is necessary to recognize  that some of these classifications will
have significant misclassification error; such a classification would not make good stratification.

3.3.5.1  Tier 1 Resources

Also as a Tier 1 activity, each of the Groups will  designate certain classes as  Tier 1 Resources. The
decision regarding these designations does  not have to be  made prior to landscape description, but
rather  can be based on  the LD data. Any resource that is of  particular interest, or  disinterest; any
resource that is too rare or too spatially limited to  get adequate representation; any resource that has
the potential of screening others of greater interest; any resource  that requires special treatment for one
or another reason can be  designated as a Tier 1 Resource. Tier 2 samples will be taken from the Tier  1
Resources in a manner that establishes them as strata.

 3.3.5.2 The Process of Selecting the Tier 2 Sample

 There are several steps between the generation of the full Tier 1 sample, as derived from the LDs,  and
 the final selection of the Tier 2 sample. For each Tier 1 Resource, identify the Tier 1 40-hexes in which
 the resource is present and perform these two steps:

             1.  Select a single resource sampling unit at each grid point (40-hex).

             2.  Select a subset of 40-hexes.

 This two-stage selection process can be performed in either order. As the rules of selection at each step
 can be varied, there are a number of possible  designs for the process.  In particular,  several different
 rules (rules of association) have been proposed for associating a resource unit with a grid point:

             1.  Select the unit in the 40-hex closest to the grid point²,
                  $\pi_i = a_i/634.5$, where $a_i$ is the area of selection of the unit.

             2.  Select the unit that contains the grid point,
                  $\pi_i = a_i/634.5$, where $a_i$ is the area of the unit.

             3.  Select a unit with equal probability from the set in the 40-hex,
                  $\pi_i = 1/(16 n_i)$, where $n_i$ is the number of resource units in the 40-hex.

         ² See Section 4 for statistical notation.
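
The inclusion probabilities attached to the three rules can be evaluated directly. The sketch below assumes areas expressed in km², uses the 634.5 km² area per grid point quoted in Section 3.2.4, and introduces hypothetical function names and example values.

```python
GRID_CELL_KM2 = 634.5  # area per grid point in the base grid


def pi_area_rule(area_km2):
    """Rules 1 and 2: probability proportional to an area.

    For rule 1 the area is the unit's area of selection; for rule 2 it is
    the area of the unit itself.
    """
    return area_km2 / GRID_CELL_KM2


def pi_equal_rule(n_units):
    """Rule 3: one unit chosen at random from the n_units in the 40-hex."""
    return 1.0 / (16 * n_units)


pi_small_lake = pi_area_rule(0.5)   # hypothetical 50-ha lake under rule 2
pi_one_of_four = pi_equal_rule(4)   # hypothetical 40-hex holding four units
```

Under rules 1 and 2 the probability varies with the area attached to the unit, whereas under rule 3 it depends only on how many units share the 40-hex.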

3.3.5.3   Design Option 1

The first design considered did not depend on the LDs. Specifically, a rule of association was prescribed
that would select a  resource sampling unit  (RSU) at each grid  point, and then this sample of RSUs
would be subsampled to yield the Tier 2 sample.  As the utility  of the LDs became more apparent, it
became clear that the third rule of association, above, was not best utilized with this option.

For  each resource of interest  (e.g., a specific type of wetland)  and  for each 40-hex  within which  the
resource occurs, one resource unit  will be selected.  This collection of  sampling units was called the Tier
1 sample for that specific resource. (This usage is  out of date; now the entire set of units in a 40-hex is
considered to constitute the Tier  1 sample.) The  Tier  1 inclusion probability for  the selected unit is
carried  as part of the Tier 1  data record for that  unit.

Recall that the Tier 1 sample for this resource  now contains no more than one unit for each  40-hex,
and  still contains all the 40-hexes  that had the resource. The next step selects a subset of this sample of
units (or of the 40-hexes) to obtain the Tier 2 sample. Two criteria are specified here:

            •  Sample with  conditional inclusion probability inversely proportional to
               the Tier 1  inclusion probability.

            •  Sample in a manner that retains the spatial distribution of the Tier 1
               sample.

Each of these criteria imposes constraints on the selection. The  first imposes an upper limit  to Tier 2
sample size, and must occasionally be relaxed in order to obtain the desired sample size.  The second
imposes a minimum sample size.
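
The upper limit imposed by the first criterion can be seen in a small sketch. A conditional probability cannot exceed 1, so exact inverse proportionality is possible only while the unit with the smallest Tier 1 probability has a conditional probability of at most 1. The function names and the Tier 1 probabilities below are hypothetical.

```python
def tier2_conditional_probs(pi1, n_target):
    """Conditional Tier 2 probabilities proportional to 1/pi1, summing to n_target."""
    weights = [1.0 / p for p in pi1]
    scale = n_target / sum(weights)
    return [scale * w for w in weights]


def max_exact_sample_size(pi1):
    """Largest expected Tier 2 size for which no conditional probability exceeds 1."""
    weights = [1.0 / p for p in pi1]
    return sum(weights) / max(weights)


pi1 = [0.01, 0.02, 0.05, 0.05]                    # hypothetical Tier 1 probabilities
probs = tier2_conditional_probs(pi1, n_target=1)  # conditional Tier 2 probabilities
overall = [p1 * p2 for p1, p2 in zip(pi1, probs)] # overall inclusion probabilities
n_max = max_exact_sample_size(pi1)                # upper limit on expected sample size
```

When the criterion can be met exactly, the overall inclusion probabilities are equal for all units. Requesting a sample larger than `n_max` pushes at least one conditional probability above 1, which is the case in which the criterion must be relaxed.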

Other options  may  differ statistically from  this one, and there  are  other sequences  of steps that will
produce an  equivalent  sample.  The  following  alternative is of particular interest  because it offers
adaptive advantages.

3.3.5.4  Design Option 2

First, employ the step reducing the number of 40-hexes, and then select an RSU from each of the
selected 40-hexes according to one of the rules of association. This switches the order of the two steps in
Section 3.3.5.2 from the order used in the first option (Section 3.3.5.3). There appears to be no
advantage to this arrangement when the first or second rule of association is used.  However, it does
seem advantageous to use this order if the third rule is used, and the following example is developed
with the third rule.

Given the set of 40-hexes containing units of a  particular Tier  1 resource, select a subsample of these
40-hexes with probability proportional to the number of units. Employ spatial constraints in  the usual
manner. Then select a single unit from  each  of the  selected  40-hexes with equal  probability.  The
resultant sample is identical to the sample derived from the first option, when that option uses  the
association rule of a single random unit from each 40-hex. The difference is that the full set of units in
each 40-hex has been carried into the second stage of this selection process, and so will be available for
modification of the sample at a future time. (The comparison of the possibilities is not complete, but
recognition of this option puts a new light on the choice of selection rule.)

         A method of sampling that retains spatial distribution is outlined in Section 4.2.5.
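
The agreement between the two options under the third rule of association can be checked with exact fractions. This is a sketch under simplifying assumptions: spatial constraints are ignored, each 40-hex is taken to enter the Tier 1 sample with probability 1/16 (consistent with the rule 3 formula), and the unit counts and the expected number k of selected 40-hexes are hypothetical.

```python
from fractions import Fraction


def option1_unit_probs(counts, k):
    """Option 1: pick one unit per 40-hex, then subsample inversely to pi_1."""
    total = sum(counts)
    probs = []
    for n in counts:
        pi_tier1 = Fraction(1, 16 * n)       # rule 3 inclusion probability
        conditional = Fraction(k * n, total)  # proportional to 1/pi_tier1, sums to k
        probs.append(pi_tier1 * conditional)
    return probs


def option2_unit_probs(counts, k):
    """Option 2: subsample 40-hexes proportional to unit count, then pick one unit."""
    total = sum(counts)
    probs = []
    for n in counts:
        pi_hex = Fraction(1, 16)              # 40-hex enters the Tier 1 sample
        select_hex = Fraction(k * n, total)   # proportional to number of units
        pick_unit = Fraction(1, n)            # equal probability within the 40-hex
        probs.append(pi_hex * select_hex * pick_unit)
    return probs


counts = [1, 3, 6]  # hypothetical numbers of units in three 40-hexes
p1 = option1_unit_probs(counts, k=1)
p2 = option2_unit_probs(counts, k=1)
```

Both orders of selection give every unit the same overall inclusion probability, k/(16N) with N the total number of units, which is consistent with the statement that the resultant samples agree. The practical difference is that Option 2 retains the full set of units in each 40-hex.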

3.3.5.5  Wrap-up

Because of the way that these options relate to each other, and the way they compare,  it is reasonable
to take the view that Option 1 should be used only with association rules 1 and 2, and Option 2 should
be used only with association rule 3. Subsequent treatment in this report will follow this  identity.

3.3.6  Augmenting the Grid

Certain  resources may occur in relatively few of the 40-hexes. This will happen when  the resource is
highly localized (e.g., redwood forest)  or rare (e.g., some types of wetlands).  One way to  obtain a
larger Tier 1 sample with good spatial coverage is to increase the grid density over the area occupied by
the resource. The increased grid density will be accomplished by augmenting the original grid (keeping
the original  grid  points and adding  others; see Section 3.2.4). The enhanced grid points will receive
landscape description only for the resource for which enhancement was made.

3.3.7 Subpopulations of Interest

The Tier 1 resources have been sampled as Tier 2 strata. Given a Tier 1 sample for a discrete resource,
generated  in the manner of the previous section,  certain subclasses may also be sufficiently important
to be considered as strata. Although it is possible to  impose another level of stratification below  the
Tier  1  resources,  the current perception is that classes needing special consideration should  be made
Tier 1 resources.

In considering  this, it  is  necessary  to be  aware that any subclasses of a Tier  1 resource can be
characterized by subpopulation estimation.  Further, the fewer strata, the  better, unless there is clear
conflict among objectives that requires stratification for resolution. When  the Tier 2 sample is visited,
additional classes will  be identified,  so that the  resultant  estimates can also  reflect  populations
constructed on any of these Tier 2 classes.

There is a natural hierarchical structure in these three levels of classification that is  reinforced by  the
fact  that Tier 1 resources have  been treated as strata. Thus all subclasses within Tier 1 resources are, in
a sense,  nested within those strata. Further,  if the subclasses identified at Tier 1 are used as sub-strata,
this  imposes a second hierarchical level on the sample.

However, there is no need for the subpopulations identified by the classifications to be conceptually
hierarchical or mutually exclusive; a subpopulation of interest may interpenetrate two strata.  In
technical jargon, the union of any set of classes, constructed as intersections of any classification, can
be a subpopulation of interest.  But the disadvantage that accrues from combining population estimates
over strata is one of the major reasons for minimizing stratification.

3.3.8 Tier 2 Characterizations

Tier 2 is oriented toward collecting information at  the Tier  2  RSUs or sites.  Physical, chemical, and
biological measurements made at Tier 2 will form the basis for regional and national
estimates of status, change, and trends, and for identifying additional subpopulations of interest.   For
Tier 2 characterization, each resource group is evaluating, selecting,  and  developing  measurements of
overall  resource condition (response indicators),  measurements related  to  pollutant or  other  exposure
(exposure  indicators),   and  measurements  of  possible  sources   of  exposure  (stressor  indicators).
Associations between these indicators  will  be of particular interest  for  studying possible causes of
change.  Details of the EMAP approach to indicators are provided in Hunsaker and Carpenter (1990).

3.4 INTERPENETRATING SAMPLES

To complete the design specifications, it is necessary to address the issues of numbers of units and sites
to be included in the Tier 2 sample, and also the schedule of revisits, for repeating measurements on
the same units.

EMAP  is designed  both to  describe current and ongoing status and  to  detect  trends  in  a suite of
indicators. These two  objectives have  somewhat conflicting design criteria;  status  is ordinarily best
assessed by  including as many population units as possible in the sample, while trend is ordinarily best
detected by repeatedly observing the same units over  time. Meeting both objectives may require some
trade-off between the designs that are best suited for each objective. One mitigating factor is the simple
fact that  for faint trends  there is little value in observations on an annual basis; annual observations
are better suited to strong trends.

Other considerations also are involved in this design decision regarding the temporal/spatial pattern of
the monitoring  activity. An early proposed option was the regional rotation.  In this option, all Tier 2
sites in  a region covering approximately one quarter of the United States would be visited in one year.
Sampling efforts would shift to other  regions during successive  years, with  regions and  sites being
revisited in  a four-year rotation cycle. Regional blocking satisfies certain  logistic concerns.

3.4.1  Interpenetrating Subsamples

The interpenetrating design was devised as an alternative to the regional rotation.  This option proposes
to  block  the  sample  according  to  the  fourfold  decomposition  of the grid into four interpenetrating
subsamples  (Figure  3-3).  This decomposition would apply to both the Tier 1  and the Tier 2  sample. In
the first year, the first  interpenetrating  Tier 2 sample would be implemented over the entire continental
United  States.  The second year, Tier  2 sites from the second interpenetrating  subsample  would be
visited,  and so on (Table 3-1). In this manner, all of the Tier 2 sample sites  would be visited during a
four-year period.  A second cycle could  begin in the fifth year with revisits to the  first interpenetrating
subsample,  and this pattern would continue indefinitely.
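
A fourfold decomposition and its visit schedule can be sketched in lattice coordinates. The parity-based labeling below is an assumption chosen for illustration; the assignment actually used is the one shown in Figure 3-3 and Table 3-1. Each of the four classes is itself a triangular lattice with twice the base spacing.

```python
from collections import Counter


def subsample_index(i, j):
    """Map lattice indices (i, j) to one of four interpenetrating subsamples (0-3)."""
    return (i % 2) + 2 * (j % 2)


def subsample_for_year(year):
    """Year 1 visits subsample 0, year 2 subsample 1, and so on in a 4-year cycle."""
    return (year - 1) % 4


# Tally the subsample membership of an 8-by-8 patch of grid points.
counts = Counter(subsample_index(i, j) for i in range(8) for j in range(8))
first_cycle = [subsample_for_year(y) for y in range(1, 5)]
```

Every subsample holds one quarter of the points and is spread over the whole patch, and the schedule returns to the first subsample in the fifth year.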


This approach ensures  nearly uniform spatial coverage for each annual subsample.  The four subsamples
"interpenetrate"  the  spatial  structure, and  whatever other  structure  exists,  and provide  annual
estimates of  population  parameters  over  every geographical  region and  over  every identifiable
population,  no matter  how dispersed it  might be.  The Tier 1 subsamples are highly related. The degree
of  relation  between the  Tier 2  subsamples  depends  in part  on the  manner of their  selection  and
especially on the degree of spatial control in that selection.

Some other rotating panel  and partial replacement designs have been considered; our conclusion, based
on the  preliminary  results, is that the  interpenetrating design  is by far the most favorable of those
considered.  However, there is an appreciable advantage to  a low rate (~10%) of annual repeats during
the first cycle of the interpenetrating design. This advantage accrues from site pairing that  eliminates
many sources of variation,  but it is very interesting that a greater percentage  of repeats, or carrying the
repeats  beyond  the first cycle, has little benefit. Once some sites have been revisited at the period of the
cycle, there  appears to  be negligible advantage to annual revisits.

The  interpenetrating  alternative  is  proposed  for  EMAP because of  its estimation and  reporting
advantages. During the  first year, the interpenetrating sampling alternative provides  national  and
regional estimates of condition,  with higher resolution estimates available as more  sites  are visited in
successive years.  An important  advantage of the interpenetrating sampling design is in the  estimation
of regional  and  national trends through time (see Section 4).   Generally, EMAP will focus on faint
trends that do not result in immediate catastrophic changes. Such trends require some time before the
cumulative change is detectable, and as great a population coverage as possible is needed in order to
isolate subpopulations that may respond differently than others. The interpenetrating design is well-
adapted to detecting persistent, gradual change of diffuse subpopulations and to accurately representing
the trajectory of status variables.

[Figure 3-3 graphic not reproduced; panel labels: "Incorrect alignment", "Correct alignment".]
Figure 3-3.  The structure of the EMAP grid, illustrating the fourfold decomposition that will be the
basis of the interpenetrating sample design.

Table 3-1.  Schematic of the rotating interpenetrating design prescribed for EMAP. The subsamples
are identified by the fourfold decomposition of the triangular grid, as in Figure 3-2.

                                          Year
Subsample       1    2    3    4    5    6    7    8    9   10   11   12   13
    1           X                   X                   X                   X
    2                X                   X                   X
    3                     X                   X                   X
    4                          X                   X                   X

3.4.2 Ramping Up in the First Cycle

The logistics of implementing a new monitoring program apparently dictate that some form of "ramp-
up"  strategy be employed. This issue is still being investigated, and several options for ramping up to
full EMAP  monitoring effort are being discussed.  Ramping-up involves the scheduling of the landscape
description  part of the Tier 1 sample selection, as well  as training and deployment of field crews and
the logistics of field data collection. Additionally, the  various options involve statistical issues that
have significant bearing on  the  choice. It seems clear that  some kind of ramping is required,  but just
how this will be accomplished is yet to be determined.

The design-preferred (as opposed to logistics-preferred) implementation of the interpenetrating design is
completion  of characterization and  selection of Tier 1 sites for the first interpenetrating subsample
before initiating Tier 2 field work. The same restriction would apply to the second grid subsample, and
to the third, etc.  The  option of regional  ramping at the Tier 2 field level creates a number of design
difficulties and should not be entered into without full awareness of the statistical consequences.

One ramped  option  that  has  not been given  sufficient  consideration is based  on  resources  and
indicators. Any of the  ramped  options involve ramping on resources and indicators, and this is one of
the attractions. It might be feasible to implement the full interpenetrating sample, but with a reduced
field load, by judicious selection of indicators. This option would bring some populations  and estimates
on line later than  others, but would not disrupt the selection and estimation  protocols.

Although supplementing the interpenetrating  design with a few annual repeats during the first cycle
appears to be desirable, the  situation does not look favorable for such a plan. If the basic effort cannot
be  mustered  to jump into  the full interpenetrating  design,  supplements appear to be out  of the
question.


3.5  EXISTING INVENTORIES AND MONITORING PROGRAMS

Existing  inventories and  monitoring programs can be useful  to EMAP in  several  ways. They can
provide background and historical information of interest to EMAP. They  can provide estimates that
fulfill  the objectives of EMAP,  either completely, or for specific subpopulations. They can provide an
explicit frame that covers EMAP resources, either completely, or in part.

In some cases, EMAP  objectives will be fulfilled by extracting  information  from the reports generated
by  existing networks. In others, cooperative agreements may provide for additional data collection so
that  the  modified  network better represents EMAP  objectives. In  still others,  fully  cooperative
monitoring  programs may be devised to fulfill the objectives of each program.

The EMAP grid is capable of sampling the frame of an existing network or of sampling the monitoring
sites of that network. This capability provides for a variety of supplementary or verification designs in
interfacing  the existing network with the EMAP monitoring programs.  At the extreme, EMAP could
have an independent monitoring net on the existing frame. The  possibilities are many.

The EMAP sampling  grid  provides both remotely sensed descriptions of resources and a convenient
(conceptual) frame  for sampling at Tier 2.   For certain EMAP resource groups, rigorous probability
monitoring  programs may already be in  existence that, with some supplementation, could replace the
need for Tier 2 sampling that is based on the EMAP sampling grid.

Use of existing frames, or nets, requires certification of coverage of the existing frame for the EMAP
resource. If the existing frame incompletely covers the resource, then it is necessary to supplement that
frame to provide coverage of the complementary subset of the resource and to implement a sample on
this frame supplement. Such a supplementary sample may be required  in association with the use  of
any existing frame, in whatever capacity.

3.5.1  Examples of Existing Programs

The  National  Surface Water Survey  (NSWS) provides valuable background  information  on the
susceptibility of lakes  and streams to acidic deposition for specific target populations.  These surveys
covered only parts of the general surface  water resource, but  the methodology and frames are  of
interest.  Many of the sampling and estimating protocols to  be  used in EMAP were initiated in the
NSWS. Probability samples were used throughout.

The  USDA Forest Service maintains  an ongoing  national  forest monitoring program, the Forest
Inventory  and  Analysis (FIA)  program.  Additionally,  that agency  is in the  process of initiating
collection of forest health  data. Coordination between EMAP and the Forest Service presents a good
opportunity to develop an integrated design  that fulfills the objectives of both agencies. The design  of
FIA  is not strictly probability based,  and the utility  of the  network in EMAP has not yet been
established.

The  USDA Agricultural  Research Service  maintains  an annual  monitoring  program,  through the
National Agricultural  Statistics Service (NASS), based  on a probability area sample that uses a well-
defined area sampling  frame. EMAP is exploring several options for  integrating the  AGRO\EMAP
design with the NASS frame and sample.  Additionally,  the  USDA Soil Conservation Service (SCS)
maintains a monitoring program on soils, designated the National Resources Inventory, which also has
potential for coupling with EMAP.

The USDI Fish and Wildlife Service National Wetlands Inventory (NWI) provides periodic estimates of
wetland extent by wetland type.  NWI assistance and material are being used to obtain frame material
and population descriptions for the EMAP Wetlands Resource  Group.

Many  other ongoing programs being conducted by governmental agencies provide potential EMAP
interfaces.  For the most part, these are nonprobability  designs, and must be used in a supporting role.
There  may be a  few  cases in which information from existing programs can  be incorporated even
though the samples were not collected  using probability  sampling,  or else  the  essential details of the
sampling  design are no longer available.  The term "found  data"  has been used for  such data that
either did not come from a rigorous design or have in some sense lost their  "identity."  A strategy has
been proposed for incorporating such found  data into a rigorous monitoring program,  but the process
requires extensive data on the sampling frame,  considerable effort, and explicit  assumptions (Overton,
1990). Although this strategy may be an attractive option only in a few circumstances, it is important
to have the strategy available for use in a long-term monitoring program.

                                            SECTION 4
                                   ESTIMATION AND ANALYSIS

 The kinds of analytic output that will be generated by EMAP are dictated by the specific objectives of
 the  various  EMAP programs, with  some  constraint  imposed  by  the design.  The  algorithms of
 estimation and analysis are specific for the design and oriented to the nature of the needed output. It is
 appropriate for these algorithms to be made explicit at  the time design decisions are made,  in order to
 allow any constraints to be considered in the decision process.

 4.1  ESTIMATION AND  DESCRIPTION

 Generation of descriptive statistics is simplified  by use of Horvitz-Thompson  (HT) formulae, which
 reduce all design features to specification of the inclusion probabilities. The inclusion probabilities are
 knowable for any probability sample, and the strict requirement that EMAP Tier 1 and Tier 2 samples
 be probability samples has ensured that these formulae can  be used. Horvitz-Thompson estimation is
 provided for all basic population parameters, but  it is necessary in many cases to make approximations
 in estimating variances. These approximations will  also be kept in the form of HT algorithms, so that
 no  change in  the computing algorithms will  be  necessary. Several  documents address the nature of
 these approximations and their adequacy in  the context of EMAP-like surveys (Stehman and Overton,
 1987a,b; Overton and Stehman, 1987).

 Inclusion probabilities are of two kinds, first order and second order. First-order inclusion probabilities
 are simply the probabilities with which the individual sampling units are included in  the sample. These
 must be known for each selected  unit, and the data record for each unit must include the value or the
 information necessary to determine the value. This information is generated as a product of the process
 of sample selection and will be archived at that time. The symbol $\pi_u$ (referring to the sampling unit u)
 or $\pi_i$ (referring to the $i$th sampling unit) will designate the first-order inclusion probability. When
 dealing only with sample notation, the subscript "i" is unambiguous and will be used.

 Second-order, or pairwise, inclusion probabilities are the probabilities with which two specific sampling
 units are included in the sample. These are designated as $\pi_{ij}$, with obvious extension of the notation,
 referring to the probability of simultaneously including units i and j in the sample.  In this document
 and the supporting documents, $\pi_{ij}$ usually will be calculated as a specific function of $\pi_i$ and $\pi_j$.
 Certain  design features may  be  required  to make this calculation, for  example, stratum or cluster
 identification and sample size. That  information  will also be  retained as part of the individual data
 record, so that storage and use of the data are kept uncomplicated. In some circumstances, it may be
 necessary to record and archive the explicit $\pi_{ij}$'s, and that specification must be a component of the
 design established for these circumstances.

 4.1.1  Formulae

 Estimation formulae are simplified by use of weights (w) rather than inclusion probabilities,   where

            $w_i = 1/\pi_i$       and    $w_{ij} = 1/\pi_{ij}$ .

In practice it is appropriate to archive the weights rather than the inclusion probabilities as part of the
data set. The HT estimating formulae are then  expressed as:
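
            $\hat{T}_y = \sum_{i \in S} w_i y_i$                                                              (1)

            $\hat{V}(\hat{T}_y) = \sum_{i \in S} y_i^2 w_i (w_i - 1) + \sum_{i \in S} \sum_{j \in S,\, j \ne i} y_i y_j (w_i w_j - w_{ij})$       (2)

where y is any attribute and $T_y$ is the total of that attribute over any specific identified population.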
The summation is restricted to the specific set of units, S, in the sample or any subset of the sample
defined  by a specific population. Estimates over subpopulations are thus provided by subsetting the
sample. Estimates of numbers of units in populations are provided by setting y=l.
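
As an illustration only, the two estimating formulae can be coded directly from the weights. The sketch below assumes the standard weighted Horvitz-Thompson forms, T = sum of w_i y_i and V = sum of y_i^2 w_i (w_i - 1) plus the sum over i != j of y_i y_j (w_i w_j - w_ij); the function names and the small data set are hypothetical.

```python
def ht_total(y, w):
    """Horvitz-Thompson estimate of a total: sum of w_i * y_i over the sample."""
    return sum(wi * yi for wi, yi in zip(w, y))


def ht_variance(y, w, w_pair):
    """Horvitz-Thompson variance estimate in weight form.

    w_pair[i][j] holds the pairwise weight w_ij = 1 / pi_ij (diagonal entries are ignored).
    """
    n = len(y)
    first = sum(y[i] ** 2 * w[i] * (w[i] - 1.0) for i in range(n))
    second = sum(
        y[i] * y[j] * (w[i] * w[j] - w_pair[i][j])
        for i in range(n) for j in range(n) if i != j
    )
    return first + second


# Hypothetical check case: a simple random sample of 4 units from 10,
# so pi_i = 4/10 (w_i = 2.5) and pi_ij = (4*3)/(10*9) (w_ij = 7.5).
y = [1.0, 2.0, 3.0, 4.0]
w = [2.5] * 4
w_pair = [[7.5] * 4 for _ in range(4)]
total_estimate = ht_total(y, w)
variance_estimate = ht_variance(y, w, w_pair)
```

For this check case both results equal 25, which agrees with the familiar simple random sampling variance estimator. Subpopulation estimates follow by passing only the sample units that belong to the subpopulation, and counts follow by setting every y to 1.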

The variance formula (2) provides unbiased variance estimates if all pairwise inclusion probabilities are
positive. Systematic sampling, when interpreted  in  the fixed  configuration perspective, has a large
number  of zero pairwise inclusion probabilities. Thus,  when systematic sampling has been used, it  is
usually  necessary to use  an approximation  to the  variance estimator. The randomized systematic
sample  is  a commonly  used  model for  this  purpose, and  several approximations to the pairwise
inclusion probabilities  are available.  The following  one has  been shown  (Stehman  and  Overton,
1987a,b) to perform satisfactorily with the form of the variance equation used  here. It has also been
demonstrated that  the model is satisfactory in EMAP-like sampling  circumstances  (Overton  and
Stehman, 1987).

For Tier 1, an appropriate formula for the second-order inclusion probabilities will usually be given by

            $\pi_{ij} = \frac{2(n-1)\,\pi_i \pi_j}{2n - \pi_i - \pi_j}$ ,                                       (3)

from which,

            $w_{ij} = \frac{2n\,w_i w_j - w_i - w_j}{2(n-1)}$ .

This approximation will usually be used in subsequent special formulae. Strata and clusters pose
exceptions, as specified in Section 4.2.2.
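
A brief sketch of the approximation in both its weight form and its probability form follows. It assumes the weight form w_ij = (2 n w_i w_j - w_i - w_j) / (2 (n - 1)), with the probability form obtained by inverting it; the function names are hypothetical.

```python
def pairwise_weight(w_i: float, w_j: float, n: int) -> float:
    """Approximate pairwise weight w_ij = 1 / pi_ij for a sample of size n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return (2.0 * n * w_i * w_j - w_i - w_j) / (2.0 * (n - 1))


def pairwise_probability(pi_i: float, pi_j: float, n: int) -> float:
    """The same approximation expressed in inclusion probabilities."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 2.0 * (n - 1) * pi_i * pi_j / (2.0 * n - pi_i - pi_j)


# The two forms are reciprocals of one another.
w_example = pairwise_weight(16.0, 16.0, 100)
pi_example = pairwise_probability(1.0 / 16.0, 1.0 / 16.0, 100)
```

One reassuring property of this form: with equal probabilities n/N it reduces to n(n-1)/(N(N-1)), the exact pairwise inclusion probability of simple random sampling.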

4.1.2  Verification

EMAP will continue to develop estimates in all general areas of inquiry for many years; anticipating
the variance estimation restrictions in  a particular case is much less  important than  appropriately
responding to the evidence that is generated.  For many purposes, retrospective assessment of variance
will serve EMAP needs.

It is  proposed that a general assessment  capability be maintained that routinely investigates the
behavior  of the  EMAP  design  under  the conditions being  experienced  and  under  a variety of
hypothetical circumstances. This is  a part  of planned ongoing support for  EMAP.   Specifically, the
capacity will be developed to explore the precision of estimates and the adequacy  of variance estimates
in any particular part of EMAP by basic simulation experiments.
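
A basic simulation experiment of the kind proposed might look like the following. This is a sketch under simplifying assumptions, not the planned EMAP assessment software: the population values are invented, selection is an equal-probability systematic sample from a randomly ordered list, the population size is a multiple of the sample size, and pairwise weights use the approximation w_ij = (2 n w_i w_j - w_i - w_j) / (2 (n - 1)).

```python
import random


def variance_estimate(y, w, n):
    """Weighted HT variance estimate with approximate pairwise weights."""
    first = sum(yi * yi * wi * (wi - 1.0) for yi, wi in zip(y, w))
    second = 0.0
    for i in range(len(y)):
        for j in range(len(y)):
            if i != j:
                w_ij = (2.0 * n * w[i] * w[j] - w[i] - w[j]) / (2.0 * (n - 1))
                second += y[i] * y[j] * (w[i] * w[j] - w_ij)
    return first + second


def simulate(population, n, reps, seed=1):
    """Repeat the sampling and return (mean estimate, empirical variance, mean variance estimate)."""
    rng = random.Random(seed)
    step = len(population) // n          # assumes len(population) is a multiple of n
    w = [float(step)] * n                # equal weights N/n
    totals, variances = [], []
    for _ in range(reps):
        order = population[:]
        rng.shuffle(order)               # random ordering of the list
        start = rng.randrange(step)      # random start
        sample = order[start::step]      # systematic selection of n units
        totals.append(sum(wi * yi for wi, yi in zip(w, sample)))
        variances.append(variance_estimate(sample, w, n))
    mean_total = sum(totals) / reps
    empirical_var = sum((t - mean_total) ** 2 for t in totals) / (reps - 1)
    return mean_total, empirical_var, sum(variances) / reps


population = [float((37 * k) % 101) for k in range(200)]   # hypothetical attribute values
mean_total, empirical_var, mean_var_est = simulate(population, n=20, reps=2000)
```

Comparing the mean of the variance estimates with the empirical variance of the estimated totals indicates how adequate the variance estimator is for the conditions simulated. Spatially patterned populations and fixed orderings would be the more demanding cases to explore.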
 4.2 TIER 1 ESTIMATES

 Tier 1 estimates are of several kinds, depending on the data on which they are based and the nature of
 the populations and the selection process (see Section 3.3).

 4.2.1  Resource Inventory

 Resource inventory is provided by the data from the  landscape  descriptions (3.3.1)  on the 40-hexes
 established on the grid points.  These Tier 1 estimates will typically be of the areal extent of surveyed
 resources, by whatever classes have been identified at Tier 1. Also, estimates of the numbers of resource
 units (e.g.,  lakes)  in specific  populations of  discrete units  will  be  made at Tier 1. The landscape
 description units represent 1/16th of the area per grid point, so that w = 16, and y is either the area of
the resource  or  the number of resource units in  the 40-hex. Other attributes of landscapes  will
ultimately be identified for similar population estimation.

Let the areal extent, $a_r$, of a particular resource be identified in the 40-hexes.  Then, with i now
indexing the grid points,

            $\hat{A}_r = 16 \sum_{i \in S} a_{ri}$                                                          (1a)

estimates  the  total  area  of this  resource in the  defined  class  representing  any  spatial  or other
subpopulation, and where  the summation  is consistent with the identity of that subpopulation. The
variance of this estimate is estimated by a special case approximation of (2),

            $\hat{V}(\hat{A}_r) = 240 \left[ \sum_{i \in S} a_{ri}^2 - \frac{1}{n-1} \sum_{i \in S} \sum_{j \in S,\, j \ne i} a_{ri} a_{rj} \right]$         (2a)

One way to derive this result is to apply (3) to (2) with $w_i = 16$ for all i.
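
The substitution described in the last sentence can be checked numerically. The sketch below assumes the general weighted variance form with pairwise weights w_ij = (2 n w_i w_j - w_i - w_j) / (2 (n - 1)) and w_i = 16 throughout; the areas are invented and the function names are hypothetical. Grid points with none of the resource contribute zero to every sum, so they may be included or omitted from the list of areas.

```python
W = 16.0  # Tier 1 weight: each 40-hex represents 1/16 of the area per grid point


def tier1_area_total(areas):
    """Estimated total area of the resource: 16 times the sum of areas seen in the 40-hexes."""
    return W * sum(areas)


def tier1_area_variance(areas, n):
    """General variance form evaluated with w_i = 16 and approximate pairwise weights."""
    w_pair = (2.0 * n * W * W - 2.0 * W) / (2.0 * (n - 1))
    first = sum(a * a * W * (W - 1.0) for a in areas)
    cross = sum(areas) ** 2 - sum(a * a for a in areas)   # sum over i != j of a_i * a_j
    return first + cross * (W * W - w_pair)


def tier1_area_variance_closed(areas, n):
    """Equivalent closed form: 240 * n / (n - 1) * (sum a^2 - (sum a)^2 / n)."""
    sum_sq = sum(a * a for a in areas)
    return W * (W - 1.0) * n / (n - 1) * (sum_sq - sum(areas) ** 2 / n)


areas = [10.0, 0.0, 5.0, 25.0]   # hypothetical resource areas at n = 4 grid points
n = len(areas)
```

With these hypothetical values the estimated total is 640 and both variance functions return 112,000.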

Estimation of numbers of any population  (class) of resource units (e.g., lakes) is possible for any class
for which units are uniquely represented by points. Point representation  (3.3.2.1) is necessary in order
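to unambiguously count the number of such units in the 40-hexes.  Given such a count, $n_{ri}$, over the n
grid points, the total number of units in the defined population is estimated by,

            $\hat{N}_r = 16 \sum_{i \in S} n_{ri}$                                                          (1b)

with variance similarly estimated by

            $\hat{V}(\hat{N}_r) = 240 \left[ \sum_{i \in S} n_{ri}^2 - \frac{1}{n-1} \sum_{i \in S} \sum_{j \in S,\, j \ne i} n_{ri} n_{rj} \right]$         (2b)
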
Estimates representing  resource inventory  will then  be summarized in a  table similar to Table 4-1.
This  inventory can  be  repeated for  any geographic  subdivision, or other spatial  partitioning of the
United States, or for any population defined in any manner. Some resources (extensive) will not have
estimates of numbers of units, but otherwise all of these estimates are  generated for all populations of
interest.

Table 4-1.    Tier 1 Inventory Estimates are provided from the Tier 1 Sample for All Resources
  and Classes of Resources.

                   Estimated Area, by Class            Estimated Number of Units
Resource           1     2     3    ....    k          1     2     3    ....    k
   A
   B
   C

4.2.2 Discrete Resources

An alternate Tier 1  analysis derives from  Design Option 1  (Section 3.3.5.3)  when  used with  an
association rule that is not based on the list of resource units in the 40-hexes. Specifically, one example
is the rule of taking the nearest lake to the sample point. Such a rule provides a single resource unit in
association with each  grid point. There must be some limitation of distance to allow misses,  and the
one prescribed is that no unit will be selected by this rule if it is located outside the 40-hex.

An especially useful application of this option is selection of the National Agricultural Statistics Service
(NASS) primary sampling units (PSUs) that contain the EMAP grid points.  This selection does not
involve the LDs  in any way, and  any incompleteness in the NASS coverage will show up as misses,
there being  no  PSU  containing  the grid  point.  This  selection  will  necessarily involve  variable
probabilities;  in this case, $\pi_i = a_i/634.5$, where $a_i$ is the area of the PSU  containing  the  $i$th grid
point.

Extensive resources may also have a sampling unit chosen in this manner, with the unit an arbitrary
area or quadrat, and  so be represented at the set of grid points by this specific structure. There are
several  specific variations  involved  in the EMAP  structure. The  Tier  1 sample of units is  then
subsampled to obtain  a useful Tier 2 sample. The subject is best discussed in terms of discrete resource
units, such as lakes, stream reaches, or  NASS sampling units.

Estimating  formulae revert to the general forms (1, 2), with general  representation of the inclusion
probabilities.  When using the specific  approximation (3), with no complicating design features, it is
convenient to rewrite the variance estimator as
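
            $\hat{V}(\hat{T}_y) = \sum_{i \in S} y_i^2 w_i (w_i - 1) - \sum_{i \in S} \sum_{j \in S,\, j \ne i} y_i y_j \frac{(2 w_i w_j - w_i - w_j)}{2(n-1)}$       (2c)
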
Estimates from this Tier 1 sample provide the  base of variance estimation at Tier 2, via the double
sampling protocol.
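
The rewritten estimator can be checked against the general form. The sketch below is illustrative only: it assumes the general variance form sum y_i^2 w_i (w_i - 1) + sum over i != j of y_i y_j (w_i w_j - w_ij), the approximation w_ij = (2 n w_i w_j - w_i - w_j) / (2 (n - 1)), and invented data; the function names are hypothetical.

```python
def variance_general(y, w, n):
    """General variance form with pairwise weights taken from the approximation."""
    first = sum(yi * yi * wi * (wi - 1.0) for yi, wi in zip(y, w))
    second = 0.0
    for i in range(len(y)):
        for j in range(len(y)):
            if i != j:
                w_ij = (2.0 * n * w[i] * w[j] - w[i] - w[j]) / (2.0 * (n - 1))
                second += y[i] * y[j] * (w[i] * w[j] - w_ij)
    return first + second


def variance_2c(y, w, n):
    """Rewritten form: the pairwise term becomes a subtracted, non-negative quantity."""
    first = sum(yi * yi * wi * (wi - 1.0) for yi, wi in zip(y, w))
    second = sum(
        y[i] * y[j] * (2.0 * w[i] * w[j] - w[i] - w[j])
        for i in range(len(y)) for j in range(len(y)) if i != j
    ) / (2.0 * (n - 1))
    return first - second


y = [3.0, 1.0, 4.0]      # hypothetical attribute values
w = [2.0, 5.0, 10.0]     # hypothetical first-order weights
v_general = variance_general(y, w, n=3)
v_rewritten = variance_2c(y, w, n=3)
```

The two functions agree to rounding error, since w_i w_j - w_ij = -(2 w_i w_j - w_i - w_j) / (2 (n - 1)) under the approximation.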

Design features that create exceptions in formula (2c) are:

            •  Stratification:   $\pi_{ij} = \pi_i \pi_j$ ,  for i, j  in different strata

            •  Clustering:    $\pi_{ij} = \pi_i = \pi_j$ , for i, j  in the same cluster

            •  Units entering with probability 1
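
One way these exceptions might be handled when computing pairwise weights is sketched below. The record layout, the function name, and the order in which the rules are tested are assumptions made for illustration. The rule used for a unit entering with probability 1 (the pairwise probability equals the other unit's probability) follows from elementary probability rather than from a stated EMAP formula, and the choice of n for the default case is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    weight: float                    # first-order weight w_i = 1 / pi_i
    stratum: Optional[str] = None    # stratum identifier, if any
    cluster: Optional[str] = None    # cluster identifier, if any


def pairwise_weight(u: Unit, v: Unit, n: int) -> float:
    """Pairwise weight w_ij = 1 / pi_ij, honoring the listed design exceptions."""
    if u.weight == 1.0:              # u enters with certainty: pi_ij = pi_j
        return v.weight
    if v.weight == 1.0:              # v enters with certainty: pi_ij = pi_i
        return u.weight
    if u.cluster is not None and u.cluster == v.cluster:
        return u.weight              # same cluster: pi_ij = pi_i = pi_j
    if u.stratum != v.stratum:
        return u.weight * v.weight   # different strata: pi_ij = pi_i * pi_j
    # Default: approximation for units selected together with no complicating feature.
    return (2.0 * n * u.weight * v.weight - u.weight - v.weight) / (2.0 * (n - 1))
```

Storing the stratum and cluster identifiers on each data record, as in this sketch, is what allows the pairwise weights to be computed on demand instead of being archived explicitly.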

4.2.3  Domains

It is  easy to generate circumstances in which the randomly ordered model will  be inadequate for
variance assessment, and therefore it must be recognized that consideration of spatial matters must be
made in  certain circumstances.  One  such circumstance is  of particular  interest in the discussion of
strata, and can be identified in terms of formula (2c) and the identity of n in that formula.

The argument is simple.  If the grid is laid out over the domain of a resource, and data collected over
that domain, then extension  of  the  grid over the  rest of the country does not change the estimate.
Thus, extension over the rest of the country does not change the variance of that estimate. In effect,
the randomization model  has  assumed that  the  resource  is randomly  distributed over the entire
country, and prescribed the variance in terms of (2c)  on that premise. But  if we  recognize that the
resource is distributed over no greater an area than a designated "domain", then we can re-express the
assumption in terms of randomization over the domain, and let the effective sample size be determined
 by the domain boundaries. For the Tier 1 sample, this will change n from 12,000 to, perhaps, 1,000 for
 a particular resource.

 Inspection of (2c) reveals that the term in brackets in the second term,

            $(2 w_i w_j - w_i - w_j)$ ,

 is necessarily non-negative, since $w_i \ge 1$ for all i. Therefore the second term is strictly non-negative for
 any positive  attribute, y. As a consequence, increase in n  without an  accompanying increase in S
 implies a decrease in the second term and an increase in V. Similarly, reduction in n  without an
 accompanying reduction in S  produces a reduction in V. This is exactly the  effect of restricting  the
 sample size, n, for a particular resource to the domain of that resource (Figure 4-1).
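
The size of this effect can be illustrated numerically. The sketch below assumes the rewritten variance form, sum y_i^2 w_i (w_i - 1) minus the sum over i != j of y_i y_j (2 w_i w_j - w_i - w_j) / (2 (n - 1)), with every weight equal to 16. The resource is hypothetical: 500 sampled units, each counted with y = 1.

```python
def variance_2c(y, w, n):
    """Variance estimate with the pairwise term written as a subtracted quantity."""
    first = sum(yi * yi * wi * (wi - 1.0) for yi, wi in zip(y, w))
    second = sum(
        y[i] * y[j] * (2.0 * w[i] * w[j] - w[i] - w[j])
        for i in range(len(y)) for j in range(len(y)) if i != j
    ) / (2.0 * (n - 1))
    return first - second


y = [1.0] * 500      # hypothetical: 500 sampled units of the resource, counted with y = 1
w = [16.0] * 500     # Tier 1 weights

v_national = variance_2c(y, w, n=12000)   # grid extended over the whole country
v_domain = variance_2c(y, w, n=1000)      # grid points restricted to the resource domain
```

For this hypothetical resource the estimated variance is roughly 115,000 with n = 12,000 and roughly 60,000 with n = 1,000, even though the sample S and the estimate itself are unchanged.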

 Under this restriction, the effective  assumption  in invoking the randomization model  is that  the
 resource is randomized over its identified  domain, rather than over the entire United States. Not only is
 this a satisfying change in the assumption, it provides greatly  reduced variance estimates in crucial
 circumstances.  Specification of the domain  will remain an issue as we learn to  apply this concept. The
 domain of many, if not most,  resources will be imperfectly known, a priori, and it will be necessary to
 modify our precept in accordance with the data.  On the other hand, it is not allowable to reduce this
 "envelope"  to  the  concave  set, or  to  some  other  "shrunken  skin."  It is realistic to  think of
 determination of domain boundaries as a spatial issue for investigation, and meantime adopt a strategy
 of conservative approximation.

 A number of other issues are also raised by this development.  When should a subpopulation be assessed
 in  terms  of its  own  domain,  and  when should  the parent  domain  remain  effective? At first
 consideration, the general answer to this seems to  be in terms of the identified base of reference.

 What  is the effect of greatly varying  density of resource units over the domain? This too represents a
 clear deviation from the assumed randomization model, capable of description by the data  collected at
 Tier  1,  and  possibly leading to loss of perceived  precision.  This  is an obvious  circumstance  for
 consideration of stratification.  The same  is recognized with respect to spatial patterns in responses, as
 treated elsewhere. There  are  a number  of  aspects  of the analyses  that  require  attention to  spatial
 matters.

 4.2.4  Domains and Strata

 It is reasonable to think of domains as resource-specific strata, and this is compatible with the EMAP
 design in  terms of Tier 1 Resources.  For  Tier 1 estimates,  the domain-specific n is then appropriate for
 variance estimates, and all of this is carried into Tier 2 sampling. The Tier 2 sample sizes will be specific
 for these  Tier 1 Resources.  A  resource-specific, spatial division of this domain  into two strata  will be
 generally  tolerable, and may have certain advantages.  However, an arbitrary spatial stratum will cross
 the boundaries of many resource classes and  domains (Figure 4-2), forcing  an awkward and  undesirable
 substratification on  them  all, with  no  real redeeming benefits.  It is possible that some ecoregion
 structures could minimize these undesirable features,  but the entire issue seems  worthy of  a firm
 position. Resource-specific spatial substrata are tolerable, but  general spatial strata imposed on all the
 resources  are  intolerable.    However, subresources  should  be   seen  in   the  subpopulation/domain
 perspective. If the classes of a Tier 1 resource are addressed as subpopulations, and if these are confined
 to well-defined subdomains, then the statistics of these subpopulations should also reflect this domain
constraint. These perspectives  of EMAP  strata must be  explored in  great depth in elaborating the
spatial analyses of the sample.

Figure 4-1. When a resource is spatially restricted to a subregion, then an approximation to the domain
boundary will identify the effective sample size for that resource.  If the resource is restricted, say to
New England (sugar bush) or to California and Southern Oregon (redwoods), then sample points taken
outside the domain have no bearing on the precision  of the sample taken within the domain. But just
what is the appropriate sample size to use? It is possible to generate a suitable "effective sample size",
just by delineating the approximate domain of the resource; slight inflation of the domain will lead to
slight inflation of the estimated variance.

[Figure 4-2 graphic not reproduced; label: "Arbitrary Boundary".]
Figure 4-2. Overlapping domains make it difficult to impose a common spatial stratification.

4.2.5 Spatial Control of Tier 2 Selection

These considerations lead directly to the issue of maintaining spatial distribution of the Tier 2 sample.
If the Tier 1 sample has been carefully generated to contain great spatial distribution, then it seems
imperative that this distribution be retained in the Tier 2 sample. The device that was used for this
purpose in the National Lake Survey (Overton, 1987a) appears well suited to the EMAP design.
Spatial stratification is the basis for the device.

Let  the  domain  (of a particular resource) be  partitioned into a number of compact clusters  of grid
points, such that the sum of weights for the Tier 1 sample in a cluster  is "equal" for all clusters.  The
purpose  is to select a Tier  2 sample in such a  manner that "equal" representation is obtained from
these clusters (Figure  4-3).  This  effectively  distributes the  Tier 2  sample in  proportion  to the
population distribution  for all subspaces of the domain, subject to the  resolution of the Tier 1  sample
information.
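
The idea can be sketched in simplified form. The code below is not the National Lake Survey procedure: it assumes the Tier 1 units have already been placed in some spatial order (an ordering along a single dimension, which does not by itself guarantee compact clusters), cuts that order into k groups of roughly equal total weight, and then draws one unit per group with probability proportional to its Tier 1 weight. All names and data are hypothetical.

```python
import random


def equal_weight_clusters(weights, k):
    """Label spatially ordered units 0..k-1 so that cluster weight sums are roughly equal."""
    target = sum(weights) / k
    labels, cumulative = [], 0.0
    for w in weights:
        midpoint = cumulative + w / 2.0            # midpoint of this unit's weight interval
        labels.append(min(int(midpoint // target), k - 1))
        cumulative += w
    return labels


def select_one_per_cluster(weights, labels, rng):
    """Draw one unit from each cluster with probability proportional to its weight."""
    chosen = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        member_weights = [weights[i] for i in members]
        chosen.append(rng.choices(members, weights=member_weights, k=1)[0])
    return chosen


weights = [1.0] * 12                               # hypothetical Tier 1 weights in spatial order
labels = equal_weight_clusters(weights, k=4)
tier2_sample = select_one_per_cluster(weights, labels, random.Random(0))
```

Because every cluster carries about the same total weight and contributes the same number of Tier 2 units, the Tier 2 sample is spread across the domain in proportion to the estimated population distribution.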

The explicit subspaces represented by the clusters are effective strata, but will not be treated as explicit
strata in selection  of the Tier 2 sample. However, it is a simple step to combine clusters to make up
coarser strata that represent meaningful spatial  subdomains. Such subdomains may provide statistical
advantages,  as well  as conceptual  meaningfulness,  and  the statistics of these  structures will be
investigated.

The ready utility of spatial  strata in the context of specific resources imposes certain constraints on the
use  of other design features that will inhibit that utility.   Artificial strata, like political regions, are
particularly  to be avoided. Strata based on ecoregions probably should also be avoided, but in part this
rests on  the compatibility  of ecoregion  boundaries  and domain  boundaries.  The only  nonspatial
stratification criterion that seems to be  warranted is that of resource identity.


4.3  TIER 2  ESTIMATES AND DESCRIPTIONS

4.3.1 Estimates

The Tier 2 samples  will provide additional variables representing higher resolution classification  of
resources and the suite of indicators. From these data, estimates will be generated of the areal extent of
resource classes that were not identified on the Tier 1  sample and of the numbers of resource units in
those populations.   A table  similar to Table 4-1 can be generated for Tier 2 estimates. In addition, the
Tier 2 sample lends itself to more complete population description in terms of the indicator variables
that have been measured on the samples of population units, or  on  the quadrats or points used  to
characterize extensive or  continuous resources.   Additional descriptive statistics will also be used  in
reporting the Tier 2 analyses.

Estimating formulae for Tier 2 will follow the general forms (1, 2), but the inclusion probabilities are
somewhat more complex than in Tier  1, representing the sampling process  at  both tiers. Actually this
is a three-stage  process, although  the  middle stage is not identified in Option  2. The process is an
extension of that used in the National Lake Surveys  (NLS) (Linthurst, et al.,  1986; Landers, et al.,
1987) and  the National Stream Surveys  (NSS) (Messer,  et al., 1986;  Kaufmann,  et al., 1988).  For
discrete  resources,  selection at Tier 2  will be  made in a manner  to cancel out the variable inclusion
probabilities that were generated at Tier 1. Another source of variable probability will be stratification
at Tier 2 selection. There will be some restriction on the ability to remove Tier  1 variable probability,
and the  general formulae  (1, 2) will  be used  throughout.  Further,  there are possible situations  in
extensive resources in which one might  wish to retain variable probabilities.

-------
Figure 4-3. Partitioning a Resource Domain into compact clusters of points, with each cluster having
the same size, $\sum w$, provides structure that maintains spatial distribution of the Tier 2 sample.

-------
4.3.2  Option 1

For Design Option 1, the Tier 1 sample will consist of one resource unit per grid point, for each discrete
resource at that point. The Tier 2 sample for a particular resource will then be a subsample of the Tier
1 sample for that resource, such that the product of the Tier 1 inclusion probability and the conditional
Tier 2 inclusion  probability yields the "total" probability of selecting the ith unit in the Tier 2 sample,

    $\pi_{2i} = \pi_{1i}\,\pi_{2\cdot 1i}$,

and thus

    $w_{2i} = w_{1i}\,w_{2\cdot 1i}$.                                                  (4)

The recommended protocol for Tier 2 selection again involves systematic sampling on an ordered list,
though there  is local  randomization. Again it is necessary to approximate the second  order inclusion
probabilities for this process, and to compute $w_{2ij}$. Two solutions have been used (Messer, et al., 1987;
Overton, 1987a). In certain circumstances, Equation (5) is appropriate, with approximation either at
the second term or the first,

    $w_{2ij} = w_{1ij}\,w_{2\cdot 1ij}$.                                               (5)

But when both  terms must be  approximated,  as  by Equation (3),  it seems no  more arbitrary to
approximate $w_{2ij}$ directly by Equation (3), as was done in the Stream Survey (Overton, 1987b). Minor
complications arise when certain units are selected with certainty.

Further investigation will be made into the behavior of this approximation. For the recommended
selection procedure at Tier 2, $\pi_{2i} = n_2/\hat{N}_1$, where $n_2$ is the Tier 2 sample size and $\hat{N}_1$ is the Tier 1
estimate of the size of that resource. It follows that $\hat{N}_2 = \sum_{S_2} 1/\pi_{2i} = \hat{N}_1$, so that the variance of the
population total estimated from the Tier 2 sample is identical to the variance of the estimate from the
Tier 1 sample.
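
The cancellation of Tier 1 variable probabilities can be checked numerically. The following is a minimal
sketch, not the EMAP selection protocol. It assumes that the conditional Tier 2 probability is taken
proportional to the Tier 1 weight, so that the product of the two probabilities is constant. The function
name and the example probabilities are hypothetical.

```python
def tier2_inclusion(tier1_pi, n2):
    """Return (N1_hat, conditional Tier 2 probabilities, total Tier 2 probabilities).

    tier1_pi: Tier 1 inclusion probabilities of the units in the Tier 1 sample.
    n2:       intended Tier 2 sample size.
    Assumption: conditional Tier 2 probability is n2 * w1 / N1_hat, with w1 = 1 / pi1.
    """
    w1 = [1.0 / p for p in tier1_pi]          # Tier 1 weights
    n1_hat = sum(w1)                          # Tier 1 estimate of population size
    cond = [n2 * w / n1_hat for w in w1]      # conditional Tier 2 probabilities
    if max(cond) > 1.0:
        # Units that would be selected with certainty need separate handling.
        raise ValueError("a conditional probability exceeds 1")
    total = [p * c for p, c in zip(tier1_pi, cond)]
    return n1_hat, cond, total


# Hypothetical Tier 1 sample with unequal inclusion probabilities.
tier1_pi = [0.5, 0.25, 0.25, 0.125, 0.5, 0.25]
n1_hat, cond, total = tier2_inclusion(tier1_pi, n2=2)

# Every unit ends up with the same total probability n2 / N1_hat, so any
# Tier 2 sample of size n2 reproduces the Tier 1 estimate of population size.
n2_hat = 2 * (1.0 / total[0])
print(n1_hat, total, n2_hat)
```

Under these assumptions each total probability equals n2 divided by the Tier 1 estimate, which is the
condition under which the Tier 2 estimate of population size reproduces the Tier 1 estimate.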


4.3.3 Option 2

4.3.3.1 Discrete Resources

For Design Option 2, the Tier 1 sample for a particular discrete resource consists of all resource units in
each hexagon, and the Tier 2 sample is selected by a two-step process:  (1) select a set of hexagons with
probability proportional  to the number of units,  and (2) then select a single unit at random from each
hexagon selected at the first step. For the second step, equal probability selection from the number in
the  hexagon  results in  selection with probability  inverse  to that in the first  step.  Therefore, the
inclusion probability for each unit in this Tier 2 sample is also given by $\pi_{2i} = n_2/\hat{N}$, but in this
option, $\hat{N}$ was estimated from the full hexagon list of sampling units, rather than from a single unit
selected by an arbitrary rule.
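
The two-step selection can be illustrated with a small calculation. This is a sketch under stated
assumptions: the Tier 1 hexagons are taken to have a common inclusion probability (the value 1/16 used
below is only an illustrative choice), and the function name and the unit counts are hypothetical.

```python
def option2_unit_probability(hex_counts, n2, pi_hex):
    """Return (N_hat, list of total inclusion probabilities, one per hexagon).

    hex_counts: number of resource units in each Tier 1 hexagon (all > 0).
    n2:         number of hexagons (and hence units) to select at Tier 2.
    pi_hex:     assumed uniform Tier 1 inclusion probability of a hexagon.
    """
    total_units = sum(hex_counts)
    n_hat = total_units / pi_hex               # estimate from the full hexagon lists
    probs = []
    for count in hex_counts:
        step1 = n2 * count / total_units       # hexagon chosen proportional to its count
        if step1 > 1.0:
            raise ValueError("hexagon would be selected with certainty")
        step2 = 1.0 / count                    # one unit chosen at random in the hexagon
        probs.append(pi_hex * step1 * step2)   # count cancels between the two steps
    return n_hat, probs


n_hat, probs = option2_unit_probability([3, 1, 6, 2], n2=2, pi_hex=1.0 / 16.0)
print(n_hat, probs)
```

The unit count of the hexagon cancels between the two steps, so every unit has probability n2 divided by
the estimated population size, whatever hexagon it lies in.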

This design leads to the identical sample that would have  been obtained if the sequence of Option 1
had been used, but it creates a different perspective  of the sample, allows perception of the basis for the
apparent advantage  of  this association  rule over the first,  and provides greater flexibility for future
changes in sample structure due to shifting objectives.

Again, Equation (5) can be used in the manner of Phase II of the Eastern Lake Survey (Overton,
1987a) if either of the stage designs will yield exact HT variance estimation. The second stage of this
option is a cluster subsample with one unit per cluster, and this produces a large number of zero
second-order $\pi$'s. Thus both components must be approximated, so again the indicated method is to
estimate the $w_{2ij}$'s directly by Equation (3). This has an additional reinforcement in that the variance
estimated would be identical to that estimated from this association rule used in Option 1. Still, the

-------
 behavior of this prescription has not been investigated in the context of Option 2, and this will be done
 in the coming months.

4.3.3.2 Extensive Resources

 Continuous resources can  be  sampled  in  the  manner  of general  extensive resources  (following
 paragraph), but the preferred design  will be based on point samples of one or another type. Several
 have been  investigated,  and further investigation  is necessary before a  definitive  assessment will  be
 possible. The strict triangular point grid, enhanced from the  base EMAP grid, is the currently favored
 design.  When  the  surface signals  are  strong  relative  to  noise, population estimation  methods for
 variances are weak and approximations based on spatial components are needed. In this circumstance,
 spatial models for surface  fitting are more greatly needed than elsewhere in  EMAP. The Near Coastal
 Demonstration Project is a pilot study for this approach  (see section 3.3.5).

 Extensive resources can also be  sampled by Option  2, but there are so many  different ways to represent
 the  resource contained in  a 40-hex that it is appropriate to restrict the general treatment and leave
 considerable latitude to the resource group for field  methods. We will treat only the  first step of Option
 2, and identify two ways to select the subset of 40-hexes for  a  particular resource.

      1.  Choose a subset of the Tier 1 hexagons containing the resource with inclusion probability
         proportional to the area, $a_i$, of the resource in the ith hexagon. Then these elements are
         selected with "total" probability
             $\pi_{2i} =$ [...]
         so their variances are also equal. Again,
             $\hat{A}_2 =$ [...]
         The analogy to a Tier 2 sample for discrete resources is completed if a single quadrat of fixed
         size is randomly selected in the resource of each selected 40-hex. Let the quadrat be of size m,
         so that $\pi_{3\cdot 2i} = m/a_i$ if m [...]
-------
        certain resources than for others. Extensive homogeneous resources will be represented nicely by
        index samples using quadrats,  transects,  or other simple  devices. It  is the  heterogeneous
        extensive  resources that  will pose a  real challenge.   This approach is clearly the preferred
        option for general extensive resources.

4.3.4  Distribution Functions

These populations also will be characterized by the estimated distribution function for the variable, x,
of interest. Indicator variables will play such a role. In this, it is necessary only to consider a particular
range of the variable x as defining a subpopulation, and  estimate the  number of, say, lakes in  the
population that are  in this subpopulation. This process is repeated for all values of x in the sample, and
the estimated distribution plotted as in Figure 4-4.  Note that such distributions are generated for a
variety  of y-variables,  such as  number (frequency distributions), area (areal distributions), length,  or
other attribute.  For  example, it is useful to generate the distribution of stream miles on the variable, x,
indicating the miles  of stream that have a specific attribute.
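
The construction just described can be sketched in a few lines. This is an illustration only, with
hypothetical names and data: each sample unit carries a weight w (the reciprocal of its inclusion
probability), and the estimated distribution at a value of x is the weighted sum over sample units at or
below that value. An optional y attribute (for example area or stream miles) replaces the unit count.

```python
def estimated_distribution(x, w, y=None):
    """Return a list of (x value, estimated total at or below x, estimated proportion).

    x: values of the variable of interest on the sample units.
    w: sample weights (reciprocals of inclusion probabilities).
    y: attribute to accumulate; defaults to 1 per unit, giving numbers of units.
    """
    if y is None:
        y = [1.0] * len(x)
    units = sorted(zip(x, w, y))                       # order the sample by x
    grand_total = sum(wi * yi for _, wi, yi in units)
    result = []
    running = 0.0
    for i, (xi, wi, yi) in enumerate(units):
        running += wi * yi
        # Emit one point per distinct x value, after all tied units are included.
        if i + 1 == len(units) or units[i + 1][0] != xi:
            result.append((xi, running, running / grand_total))
    return result


# Hypothetical sample of four lakes with unequal weights.
for point in estimated_distribution([10, 50, 50, 200], [4, 2, 2, 8]):
    print(point)
```

The second element of each tuple gives the distribution in numbers and the third gives it as a
proportion, which are the two forms recommended for reporting.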

The page of descriptive statistics that was prescribed (Linthurst et al., 1986) for the NLS, and also used
for the  NSS, will also serve EMAP descriptions (Figure 4-4). Quantiles are interpolated from the fitted
distribution function. In the NLS, confidence bounds were defined for the number of units in the class
in question and presented in scaled form. Extension to EMAP  demands greater versatility, and it is
recommended that  the distributions be generated both as  numbers and as proportions,  and  that  the
confidence bounds be generated accordingly (Figure 4-5). Then a single page of description will report
only one kind of distribution, either of numbers or area,  but in both forms.  To keep the two forms
distinct, it is recommended  that the scale of the  distribution  of numbers  be labeled as numbers, but
that the plots be  of fixed dimension to facilitate comparison. Confidence bounds on the distribution of
numbers should continue to be one-sided, as in the NLS (Overton, 1985),  and ascending and descending
analyses should be provided,  depending on the variable. The ascending analyses provide upper bounds
on the numbers of resource units having value of the variable below a particular value, and descending
analyses provide upper bounds on the numbers of units having value above a particular value.

Confidence bounds on  proportions of numbers of units  will usually be appropriately provided  by exact
binomial  bounds, and these are to be  two-sided, with  descending analyses  unnecessary. Binomial
bounds can be used only when  the variable probabilities have  been eliminated by the selection process
at Tier 2; otherwise it may be  necessary to use ratio variances for these  confidence bounds. Combined
strata will involve means of binomials, and will also not be suitable to exact binomial bounds.
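
For the equal-probability case, two-sided exact binomial bounds can be computed as follows. This sketch
assumes that "exact binomial bounds" refers to the usual construction that inverts the binomial
distribution function (often called Clopper-Pearson bounds); the function names are hypothetical, and the
bisection solver is one simple way to invert the distribution function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X binomial with n trials and success probability p."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def exact_binomial_bounds(k, n, alpha=0.05):
    """Two-sided exact bounds on a proportion, given k of n sample units in the class."""

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: P(X >= k | p) = alpha / 2, i.e. cdf(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return lower, upper


print(exact_binomial_bounds(5, 10))
```

These bounds apply only when all units in the sample have the same inclusion probability; with variable
probabilities or combined strata, the count k is no longer binomial.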

Distributions of area or length can be estimated directly using the HT formulae, but proportions of
areas and lengths must be treated as ratios  and therefore will require special treatment  in generating
confidence bounds of the second type.

4.3.5  Distributions  of Continuous Resources

Continuous resources  are also represented in the form  of distributions in terms  of random points over
the spatial extent of the resource. A temporal component may also be represented in  the distribution.
These  representations  do  not  have  the finite population corrections  of the discrete  resources,  but
otherwise the representations are identical. However, there is a class of continuous resource,
characterized by strong surface signal-to-noise ratio, in which the proposed binomial confidence bounds will
be too conservative.  Effort  will be  expended to  identify these resources and to  construct  better
confidence bounds. It seems likely that a technique from spatial statistics will be necessary. The signal-
to-noise ratio deserves a brief  mention. If the spatial  pattern,  as seen by the  measurements,  is very
regular, then this is evidence that noise is relatively low and that the surface signal is strong, relative to
noise. The  noise  in question can either be true surface,  high-resolution irregularity  or  measurement
error.  In some circumstances, discrete resources can also exhibit such patterns of strong surface signal

-------
[Figure 4-4 graphic not reproduced. Recoverable labels: variable ANC (µeq/L) for lakes ≤ 2000 ha;
reference values of X: 0.0, 50.0, 200.0; population size N = 7096; lake area A = 427864, SE(A) = 36414;
sample size 763; min -45.60, median 158.11, mean 268.08, std. dev. 411.69, max 4046.60; cumulative
proportion axis from 0.0 to 1.0.]
Figure 4-4.  Example of descriptive statistics used in the National Lake Survey that could also be used
in EMAP: F(x) and  G(x) are frequency and areal distributions of ANC (acid neutralizing capacity) for
the target  population of lakes  < 2000 ha in the  northeastern  United  States,  Eastern Lake Survey.
(Source: Linthurst et al., 1986)

-------
Figure 4-5.  Examples of estimated distributions, of numbers of resource units  (upper plot), and of
proportions  of  resource units  (lower  plot).  The  confidence bound  in  the upper plot is  based on
estimated variance of the estimated numbers; in the lower plot, the confidence bound is based on the
binomial distribution.

-------
to noise, but  the phenomenon is more expected in continuous resources. An illuminating example of
spatial effect  can be identified in this context.  Consider estimation of the proportion of Chesapeake
Bay for which the value of x is below a specified nominal value (Figure 4-6).

Draw the contours in the bay for this nominal value. The variance of the estimated proportion derives
from the variance of the number of grid points that fall in the designated region under randomization
of the grid. If the  region is entire, a systematic  sample will  have  variance greatly less than will a
random sample,  which generates the  binomial distribution. But  if the region is greatly  fragmented,
then the systematic and random samples will have similar variances. The key to adequate estimation of
the variance  of the estimated proportion, when using  the systematic sample,  is  thus identification of
the degree of  fragmentation of the region sampled.
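
The contrast between an unfragmented and a fragmented region can be explored by simulation. The
following sketch makes several simplifying assumptions that are not part of the EMAP design: the domain
is the unit square rather than a bay, the systematic sample is a square grid with a random offset rather
than the triangular grid, and the two regions (a disc, and a fixed random pattern of small cells) are
hypothetical.

```python
import random


def in_disc(x, y):
    """Unfragmented region: a disc of radius 0.3 centered in the unit square."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.09


def make_fragmented_region(cells=37, fraction=0.28, seed=0):
    """Fragmented region: a fixed random subset of small square cells."""
    rng = random.Random(seed)
    on = [[rng.random() < fraction for _ in range(cells)] for _ in range(cells)]

    def region(x, y):
        i = min(int(x * cells), cells - 1)
        j = min(int(y * cells), cells - 1)
        return on[i][j]

    return region


def count_systematic(region, k, rng):
    """Number of points of a k-by-k grid with random offset that fall in the region."""
    step = 1.0 / k
    ox, oy = rng.random() * step, rng.random() * step
    return sum(region(ox + i * step, oy + j * step) for i in range(k) for j in range(k))


def count_random(region, k, rng):
    """Number of k*k independent uniform random points that fall in the region."""
    return sum(region(rng.random(), rng.random()) for _ in range(k * k))


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def compare(region, k=10, reps=2000, seed=1):
    """Return (variance under systematic sampling, variance under random sampling)."""
    rng = random.Random(seed)
    sys_var = variance([count_systematic(region, k, rng) for _ in range(reps)])
    rnd_var = variance([count_random(region, k, rng) for _ in range(reps)])
    return sys_var, rnd_var


if __name__ == "__main__":
    print("disc:      ", compare(in_disc))
    print("fragmented:", compare(make_fragmented_region()))
```

Comparing the two pairs of variances gives a rough indication of how much the binomial variance
overstates the variance of the systematic sample for each degree of fragmentation.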

This  example is  presented  to convey  that the focus on population estimation, rather  than spatial
analysis, does not imply  that the spatial  component of distributions is being overlooked. A  principal
reason for use of the systematic grid  is to provide uniform spatial coverage,  and the restriction that
Tier 2 sample selection will retain that spatial coverage is consistent with this goal. However, the usual
variance  estimation methodology  does not  take spatial components  adequately into account,  and
methodological development  is required. But neither do current spatial  methods  take into account the
needed probability sampling considerations. Clearly a melding of the areas is needed.


4.3.6  Deconvolution

Estimated distribution functions can also contain an  extraneous  temporal  component if the data are
taken in  a temporal window that has appreciable variation in the variable. When such variation is
present, it is  important to identify this component of the distribution, and  when  this component is not
consistent with the objectives of the survey, it is therefore extraneous and should be  removed. We refer
to this removal  as  "deconvolution," and development  of satisfactory methodology for deconvolution is
a prime objective. This topic has been addressed in specific  circumstances (Church, et al., 1989), but a
comprehensive treatment has not yet been established  (Overton 1989a). Attention will also be given to
deconvolution to remove  the effect of other sources of extraneous variation.

 4.3.7  Trends and the Interpenetrating Design

 Characterization of trends is a key component of EMAP. We will use the term  in the broad sense, to
 mean the general temporal pattern of variation in an attribute, and will focus on trends in populations.
 Trends  in single systems  are the subject of process studies,  and not  really part of the general
 monitoring perspective.  Trends in early warning systems may be  an exception, but  this concept is not
 well  developed.   The interpenetrating design  is effective  in  characterizing  trends, and a  linear model-
 based  paired t-test is available to determine if the change from one time to another is significant. It is
 also simple to determine if the pattern from one time to another  can be considered linear, or  if there is
 evidence that a nonlinear pattern holds.

 The  descriptive ability  of the four-year moving average of the interpenetrating samples  is great, and
 this  moving average is easily  corrected  for  linear  trend, so as to  accurately reflect  the resource
 trajectory. Further,  the same  moving  average analyses can  be extended to distributions,  for those
 variables  to  be analyzed for population trend. Again,  tests for population trend are  straightforward for
 these distributions  at  intervals of four years.

-------
Figure 4-6.  An unrestricted random sample will give a binomial random variable for the number of
points in the enclosed area in  each figure.  A systematic grid will have greatly less  variation for  the
"entire" region of (a), and will have close to the binomial variance for the highly fractured case, (c).

-------
The basic population moving average estimates are given below, either for Tier 1 or Tier 2 estimation.
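$S_1$, $S_2$, $S_3$, and $S_4$ refer to the samples implemented in years 1, 2, 3, and 4.

     year 1:   $\hat{T}_y = \sum_{S_1} yw$

     year 2:   $\hat{T}_y = \frac{1}{2}\left[\sum_{S_1} yw + \sum_{S_2} yw\right]$

     year 3:   $\hat{T}_y = \frac{1}{3}\left[\sum_{S_1} yw + \sum_{S_2} yw + \sum_{S_3} yw\right]$

     year 4:   $\hat{T}_y = \frac{1}{4}\left[\sum_{S_1} yw + \sum_{S_2} yw + \sum_{S_3} yw + \sum_{S_4} yw\right]$

The formula for year 4 can be expressed in a form

     $\hat{T}_y = \frac{1}{4}\sum_{k=1}^{4}\sum_{S_k} yw$
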
that applies to  all subsequent  years, with the four  samples now representing  the  last four years of
execution of the Tier 2 sample. For example, S4 might represent 1995, S1 1996, S2 1997, and S3 1998.
We can identify this estimator  as a four-year moving average, with each of the four years represented
by  a  different subsample. The sequence  of moving  averages will therefore provide averaging on  the
subsamples as well as on the years, and provide a better description of the resource trajectory than will
the other  alternatives, with essentially the same effort.
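
The moving average can be sketched as follows. The sketch assumes that each annual subsample yields its
own estimate of the population total (the weighted sum of y over that year's sample) and that the
reported value is the mean of the annual estimates available in the window, with fewer than four in the
start-up years. Function names and the example totals are hypothetical.

```python
def annual_total(y, w):
    """Estimate of the population total from one annual subsample."""
    return sum(yi * wi for yi, wi in zip(y, w))


def moving_average_totals(annual_totals, window=4):
    """Mean of the most recent annual estimates, one value per year.

    In years 1 to window - 1, only the estimates available so far are averaged.
    """
    averages = []
    for t in range(len(annual_totals)):
        recent = annual_totals[max(0, t - window + 1): t + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical annual estimates for five successive years.
print(moving_average_totals([100.0, 120.0, 110.0, 130.0, 150.0]))
# -> [100.0, 110.0, 110.0, 115.0, 127.5]
```

From year 4 onward each value averages over all four interpenetrating subsamples, so the sequence
smooths over subsamples as well as over years.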

Variance  estimates  for  the moving  average estimates  follow  standard  HT  formulae,  using  the
approximations  adopted for EMAP and used  throughout.  Further, subpopulation  estimates and
variances  follow  the standard  subsetting  protocol,  including  the  generation  of  distributions and
confidence bounds for distributions. Then, comparison among distributions at intervals of four years is
direct and follows the general protocol developed for  such  comparison in Phase II of the Eastern Lake
Survey (Overton,  1987a; Overton, 1989a).

4.3.8   Spatial Pattern

The systematic grid design is capable of representing spatial patterns and processes  that are apparent
at this or  coarser resolution. In some  instances,  higher  density systematic  grids  will be used,  for
example in  characterizing large bodies of water, and  design sensitivity to pattern will change with  the
grid density. This is an important consideration  in design;  choice  of density, and  the size of  the
landscape description hexagons  (the 40-hexes), is  tied to the scale  of the phenomena  intended  for
measurement. After  these  designs  have been fixed,  spatial  characterizations are limited by  the
resolution of the grid and the size of the description  hexes.

Two  general ways  of  representing spatial  pattern are  available.  Regional  differences  are  easily
accommodated by subpopulation  analyses. Any subpopulation can  be described  by  subsetting  the
sample  on   that  subpopulation, and  regional  or  other  spatial  entities are  simple examples  of
subpopulations (Section  4.2.2). Then the population distributions and  other statistics are capable of in-
depth characterization of those entities.  This approach will be used extensively and will be the base for
a number  of other  analytic approaches. Regional  subpopulation analyses are enhanced for  most
resources by the uniform grid design.

-------
Surface fitting, as by the methodologies of spatial statistics, will be another essential technique. Kriging
is a common method, and  is clearly of use in some circumstances. However, we perceive the need for
greatly enhanced methods of spatial representation to account  for irregularities created by topographic
and meteorological patterns, as well as by anthropogenic and ecological processes. Kriging will be used
in the beginning, but effort will be directed toward a more  satisfactory spatial analysis. We cannot look
at the output of kriging without wanting something better.

One simple estimation that will be immediately available, without much development, derives from the
capacity to integrate sample statistics over spatial regions. Such methodologies are extensions of end
corrections developed for systematic sampling by  Yates (1949) and others.  This is closely tied to the
issues of spatial stratification.  However, beyond  the first simple applications, this leads back to spatial
statistics, and the need for extensive development in that area.

At  another level, we identify  several distinct modes of visual presentation of spatial pattern. Useful
representation  of  surfaces  can be made  by  contour plots,  but  in  certain circumstances, more
information will  be carried by spatial mosaics. Mosaics and a  variety of point representations also can
be  used  for spatial characterization  of  discrete  resource  units. Some effort  will  be  expended  in
developing and applying these  methods.


4.4 CHANGES IN CLASSIFICATION  AND STRUCTURE

4.4.1   Reclassification

The dominant consideration with respect to flexibility and adaptability of the design is the capacity  to
accommodate reclassification. Some resource units  will change from one type to another over the course
of years. Others will turn out to have been misclassified, either on ground inspection or on
photo-reinterpretation. More importantly, as more is learned about the systems that are being monitored, it
is certain that some changes in classification will be desired to bring the design more into alignment
with new perspectives. If classification has been the basis for stratification, then there will be interest in
restratifying to accommodate reclassification.

4.4.2   Subpopulation Estimation

Subpopulation estimation is always a way  to deal with these problems. Summation over the sample
units (Section 4.1.1) belonging to a specific subpopulation provides estimates for that subpopulation. A
subpopulation being so characterized can be present in all strata, and one simply subsets the sample of
each stratum in generating the subpopulation estimates. Thus it is not necessary to restratify in order
to fulfill the basic characterization goals, but it might be desirable. For example, association analyses
are made more complex by mixing strata, due to the need to weight the analyses, as discussed in
Section 4.5.1. In general, data handling and analyses are simpler if the sample structure is simpler. The
general capacity to restratify, and otherwise to  reorganize the sample structure, is  seen  as a necessary
feature of a long-term monitoring design.
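
Subsetting across strata can be illustrated briefly. In this sketch the record layout, the field names,
and the data are hypothetical; the only requirement is that each sample unit carries its own weight, so
that units from different strata can be summed together.

```python
def subpopulation_total(sample, in_subpop, value=lambda unit: 1.0):
    """Weighted sum over the sample units that belong to the subpopulation.

    sample:    list of dict records, each with a weight under the key "w".
    in_subpop: function returning True for units in the subpopulation.
    value:     attribute to total; defaults to 1 per unit, giving a count.
    """
    return sum(unit["w"] * value(unit) for unit in sample if in_subpop(unit))


# Hypothetical sample drawn from two strata with different weights.
sample = [
    {"stratum": "A", "w": 2.0, "acidic": True, "area": 10.0},
    {"stratum": "A", "w": 2.0, "acidic": False, "area": 4.0},
    {"stratum": "B", "w": 5.0, "acidic": True, "area": 1.0},
    {"stratum": "B", "w": 5.0, "acidic": True, "area": 3.0},
]

number_acidic = subpopulation_total(sample, lambda u: u["acidic"])
area_acidic = subpopulation_total(sample, lambda u: u["acidic"], lambda u: u["area"])
print(number_acidic, area_acidic)
# -> 12.0 40.0
```

No restratification is involved: the subpopulation is defined after the fact, and the stratum enters
only through the weight attached to each unit.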

4.4.3   Restratification

Reclassification at Tier 1 will have two kinds of effect on the resource  samples. If a unit is changed
from  one resource to another,  then the  identification of the Tier 1 sample of each resource is subject to
change. This change will carry through into the Tier 2  sample, with attendant changes in the inclusion
probabilities.

The other kind of change involves reclassification  of a  single resource, ranging from a simple shift of a
few units from one class to another to a complete cross classification. The first again involves only a
few adjustments. The latter requires wholesale modification in the sample in order to obtain uniform

-------
inclusion probabilities in the new classes. In some cases, one may wish to retain the original structure,
as well as the new one, and treat the cross classification cells as strata.  However, this is little different
from subpopulation estimation.

The general process of restratification  involves  sample reduction and sample enhancement. Sample
reduction is straightforward; one  simply subsamples  the  original sample and carries the selection
probabilities into the new  inclusion probabilities. Sample enhancement  is more difficult, and it  is not
trivial to enhance a sample in such a manner as to obtain a prescribed uniform inclusion probability. A
general EMAP recommendation will be to oversample at the initial determination of the Tier 2 sample,
with reduction to the working level. The remainder will be archived as a source of future enhancement
units.
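
Sample reduction to a uniform inclusion probability can be sketched as follows. For simplicity the
sketch subsamples each unit independently, which makes the reduced sample size random; it is not the
systematic selection on an ordered list recommended elsewhere in this report. The function name and the
example units are hypothetical.

```python
import random


def reduce_sample(sample, target_pi, rng):
    """Subsample so that every retained unit has inclusion probability target_pi.

    sample:    list of (unit_id, inclusion_probability) pairs.
    target_pi: desired uniform inclusion probability after reduction.
    rng:       a random.Random instance.
    """
    if target_pi > min(pi for _, pi in sample):
        # Raising an inclusion probability requires enhancement, not reduction.
        raise ValueError("target exceeds an existing inclusion probability")
    reduced = []
    for unit_id, pi in sample:
        keep_prob = target_pi / pi             # conditional probability of retention
        if rng.random() < keep_prob:
            # New inclusion probability = original probability * retention probability.
            reduced.append((unit_id, pi * keep_prob))
    return reduced


rng = random.Random(0)
print(reduce_sample([("a", 0.5), ("b", 0.2), ("c", 0.2)], target_pi=0.2, rng=rng))
```

Units already at the target probability are always retained; units with higher original probability are
thinned, and those dropped could be archived as candidate enhancement units.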

4.4.4   Restructuring the Interpenetrating Sample

Other procedures also involve these operations of reduction and enhancement.  Sample selection for  the
interpenetrating Tier 2 samples must be made, in the beginning, in  ignorance of the structure  of  the
later  Tier  1 interpenetrating samples. Thus, it seems  a good design to modify these samples after all
the Tier 1  descriptions are completed,  in order to obtain uniform inclusion probabilities over the four
samples.  This   modification  involves  reduction  and  enhancement;  oversampling  the  first  three
subsamples, so  that only reduction might be required,  again seems desirable.  Here one would consider
implementing the oversample in the  first cycle.   If the Tier  1 resource samples are all identified before
any Tier 2 samples are selected,  then  these adjustments will not be  needed, but this hardly seems
feasible.

4.5 ANALYSIS OF ASSOCIATIONS

Analysis of associations is an integral part of intended EMAP assessment, and the needs of this activity
also have played a  large role in design consideration. The statistical analyses of data from a probability
sample pose problems that are  not encountered in conventional statistics. Except for this, the analyses
are the same as  in any other statistical context. Inferences, however, may  be modified because of  the
observational nature of these surveys.

4.5.1 Weighting

If the data are represented by variable  inclusion  probabilities, it is necessary to account for this in  the
analyses, by weighting. For this reason, in large part, the EMAP  plan eliminates variable probabilities,
except between resource strata, and these strata are  usually based on resource classes  that  are of
particular  interest.  Simple unweighted  analyses suffice for the individual strata,  within which  the
weights are all  the same.

In conducting an association analysis for two strata, combined, suppose that inclusion probabilities are
uniformly 0.5 within one stratum and uniformly 0.2 within the second stratum. Analysis of the combined
data set is relative to the combined populations, and it is necessary to weight data from the first
stratum by 2 and data from the second stratum by 5 (the reciprocals of the inclusion probabilities) in
order to get the appropriate analysis. This is not really a heavy additional load, but it is something
else to take care of.
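
A weighted analysis of the combined strata might look like the following sketch, which uses the weights
2 and 5 from the example above. The weighted correlation is shown as one simple association measure; the
function names and the data values are hypothetical.

```python
from math import sqrt


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_correlation(x, y, weights):
    """Correlation of x and y with each unit counted according to its weight."""
    mx, my = weighted_mean(x, weights), weighted_mean(y, weights)
    sxy = sum(w * (xi - mx) * (yi - my) for xi, yi, w in zip(x, y, weights))
    sxx = sum(w * (xi - mx) ** 2 for xi, w in zip(x, weights))
    syy = sum(w * (yi - my) ** 2 for yi, w in zip(y, weights))
    return sxy / sqrt(sxx * syy)


# Two units from the stratum with inclusion probability 0.5 (weight 2)
# and two from the stratum with inclusion probability 0.2 (weight 5).
weights = [2.0, 2.0, 5.0, 5.0]
x = [10.0, 10.0, 20.0, 20.0]
y = [1.0, 3.0, 2.0, 6.0]

print(weighted_mean(x, weights))
print(weighted_correlation(x, y, weights))
```

Within a single stratum the weights are equal and cancel, so these functions reduce to the ordinary
unweighted mean and correlation.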

One  of the most perplexing issues involving weighted data  is that of graphical representation. Scatter
plots  are a basic tool  for displaying  an association, but they are  difficult to produce when the weights
are different. Spatial plots are even more difficult; what is needed is some way to make the number of
points in each of the stratum samples proportional to the population numbers. Some preliminary work
done  using Eastern Lake Survey data has met with some success, but in general, much attention must
 be paid to this issue.

-------
In regression analysis, if the two strata have different functional forms for the regression, say a different
slope, then there are grounds for the position that the combined analysis is meaningless. If the relation
is to be used to provide a prediction for a particular unit, then it is clear that the stratum of that unit
should be identified, and the individual stratum regression used. No problem arises if the strata are not
going to be combined.

Further, if the two strata have the same functional association, there is no problem in combining the
data, even if the two sets of inclusion probabilities are disparate. The Gauss-Markov theorem says
nothing about inclusion probabilities, but it does require the functional form  of the relation  to be the
same.

The EMAP strategy, then,  is to identify resource strata  that  have specific meaning, and to  construct
the design to adequately characterize these  resource classes.  Elimination  of variable probabilities within
these  strata allows  simple analysis for association, within  strata. If it  turns out  that association
relations are the same among  several strata,  then it  is reasonable to combine  strata for combined
representation  of those relations, but separate stratum  analyses  are appropriate when relations are
different.

These considerations still do not prohibit combining strata with different association relations if there is
a compelling reason to analyze the mixture. It is only required that weighted association analyses be
made in such cases, but we would argue that such cases will  not be common.

4.5.2  Observational Data

Testing in the  context  of EMAP association analyses is another issue. These will be observational data.
Tests based on observational data are no different from tests based on experimental data, and the
statements  of  significance are  the  same.  However,  inferences drawn  from  such tests may be  very
different. Causality is  proven by formal experiments,  but not by observational studies. Observational
studies prove only association, and  the nature of the cause must be discovered  by other  means. For this
reason, it will  be appropriate, even after extensive  exploration of association in the context of EMAP
assessment, to  speak of "possible cause," rather  than "probable cause."

4.5.3  Structures

Point by point, or  sampling  unit  by sampling unit, analyses will  be appropriate  for  any sets of
variables taken on points or on sampling units. Data sets prescribed by stressor/stress indicator pairs
will  be generated on a sampling unit basis,  and these will  be a major source of association analysis. But
there is interest, also,  in association analyses across  resources. Cross-resource associations will not  be
evident on a sampling  unit basis unless there is a specific design that provides data on those  resources
on the same sampling  units. It will  often be feasible to collect the needed cross-resource  data on the
sampling units of one  resource. In other cases, it may be  necessary to have  associated pairs of units
from the two  resources. This is  a  feasible design option, but should be used  only  for  matters  of
particular concern.  Precision of resource estimates  will be lessened by use  of  designs that restrict the
individual samples.

Evidence of association  and  clues to meaningful  relations  will  also  derive  from  subpopulation
differences,  and the search for subpopulations that exhibit insightful differences should  be a prominent
activity. Subpopulations may be identified spatially,  or by any other criterion, and the nature of the
criterion is  the basis for investigation of cause. Any attribute  that is associated  with the population
criterion is thus a candidate for identification as causal.

At least in  the beginning, association of deposition  with resource response must be on a subpopulation
basis, because it will not be possible to project deposition on high spatial resolution, so as to adequately
associate with sampling unit variation. Spatial regions must  suffice.

-------
Comparisons among subpopulations will be made in terms of distributions, rather than simply relative
to means. Such comparisons prescribed for the  NSWS provided satisfactory assessment  (Overton,
1985). These  were simple chi-square analyses of the data, partitioned by quantiles of the estimated
distribution of the combined population. Comparisons among strata are simple, but again comparisons
of mixed stratum  populations pose some difficulties, because of the variable probabilities. However,
concern with  the testing protocol must be tempered by recognition that such tests are simply  for the
purpose  of screening, and  that  follow-up investigation will  be required to  make any inference of
mechanistic association or causation.
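
As a minimal sketch of the screening comparison described above, the following Python fragment classifies
observations from two or more subpopulations by the quantiles of the pooled sample and computes a Pearson
chi-square statistic for homogeneity. It assumes equal inclusion probabilities in the samples being
compared (the simple within-stratum case) and does not address the variable probability difficulty noted
for mixed stratum populations. The function names and the default of four quantile classes are
illustrative assumptions, not the protocol prescribed for the NSWS.

```python
from bisect import bisect_right


def quantile_cut_points(values, n_classes):
    """Return n_classes - 1 cut points taken from the sorted pooled sample."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def chi_square_by_quantile(groups, n_classes=4):
    """Pearson chi-square for homogeneity of subpopulations across quantile classes.

    groups: dict mapping a subpopulation label to a list of observed values.
    Assumes every unit carries the same weight (equal inclusion probabilities).
    Returns (statistic, degrees_of_freedom, counts_by_group).
    """
    pooled = [y for ys in groups.values() for y in ys]
    cuts = quantile_cut_points(pooled, n_classes)

    # Count the units of each subpopulation falling in each quantile class.
    counts = {name: [0] * n_classes for name in groups}
    for name, ys in groups.items():
        for y in ys:
            counts[name][bisect_right(cuts, y)] += 1

    class_totals = [sum(counts[name][k] for name in groups) for k in range(n_classes)]
    total = len(pooled)

    statistic = 0.0
    for name, ys in groups.items():
        for k in range(n_classes):
            expected = len(ys) * class_totals[k] / total
            if expected > 0:
                statistic += (counts[name][k] - expected) ** 2 / expected

    dof = (len(groups) - 1) * (n_classes - 1)
    return statistic, dof, counts
```

A call such as chi_square_by_quantile({"region_a": values_a, "region_b": values_b}) returns the statistic
and its degrees of freedom. In keeping with the screening role of such tests, a large statistic would
prompt follow-up investigation rather than support an inference of causation.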

That is,  given identified  subpopulation associations, it will be of interest to establish higher resolution
relations on a sampling unit by sampling unit basis. At the most fundamental level, it is necessary to
make sure that the  specific  resource units  that  show stress  have received the stressor stimulus. This
places requirements, for example, on analysis of deposition networks that cannot be met by currently
available spatial statistical techniques. However, such designs will not be  needed  in the beginning, but
rather only after the subpopulation associations  have been established.  It will be  several years before
these capacities are needed.

4.5.4 Multivariate Analyses

A question has arisen on several occasions regarding multivariate analyses, and in particular, the use of
multivariate distribution functions. Unquestionably, any data exploration must  admit multivariate
methods, and there is no  intent to prohibit them. Further, there are specific forms of multivariate
description that have proven particularly useful, such as the trilinear plots used in the NSWS. When
the nature of the relation of certain variables can be identified, as in water chemistry, then this relation
can be exploited to great advantage in the manner of analysis.

Bivariate distribution functions will not generally be supported by data sets of the sizes used  in EMAP.
However, when the categories of  one variable, or of a combination of several variables, are coarse, then
these categories can define subpopulations that are described by  the distribution functions of whatever
other variables  are of interest in association with  those that define the subpopulation. In a sense this
allows multivariate distribution functions, but within the subpopulation paradigm.
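
The subpopulation approach to multivariate description can be sketched as follows. The fragment groups
sampling units by a coarse category and estimates, for each category, the proportion of the subpopulation
with a response value at or below a chosen level. Units are weighted by the inverse of their inclusion
probabilities, in the spirit of the Horvitz-Thompson estimator cited in the references; the ratio form of
the estimate, the function names, and the input layout are assumptions made for illustration only.

```python
from collections import defaultdict


def estimated_cdf(values, inclusion_probs, y):
    """Weighted estimate of the proportion of a population with value <= y.

    Each unit is weighted by 1 / inclusion probability; the estimate is the
    weighted count at or below y divided by the total weight (ratio form).
    """
    weights = [1.0 / p for p in inclusion_probs]
    total = sum(weights)
    at_or_below = sum(w for v, w in zip(values, weights) if v <= y)
    return at_or_below / total


def cdf_by_subpopulation(units, y):
    """Estimate the distribution function at y within each coarse category.

    units: iterable of (category, value, inclusion_probability) tuples,
    a hypothetical layout chosen for this sketch.
    """
    grouped = defaultdict(lambda: ([], []))
    for category, value, prob in units:
        grouped[category][0].append(value)
        grouped[category][1].append(prob)
    return {
        category: estimated_cdf(values, probs, y)
        for category, (values, probs) in grouped.items()
    }
```

Evaluating cdf_by_subpopulation over a grid of y values gives one estimated distribution function per
category, which is the sense in which coarse categories of one variable admit a form of multivariate
description.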

4.5.5 Subpopulations Revisited

Subpopulations are an extremely important facet of the EMAP design perspective. Their use permeates
most of  the analytic topics,  and  their care and maintenance dominate the design issues.  Recalling the
thought  that EMAP programs are  designed  to  discover and characterize subtle, diffuse, likely-to-be-
overlooked-without-EMAP subpopulations that are doing interesting things is a good way to end this
section.

-------

                                           SECTION 5
                                         REFERENCES
Church,  M.R., K.W.  Thornton, P.W. Shaffer,  D.L. Stevens,  B.P. Rochelle,  G.R.  Holdren,  M.G.
Johnson, J.J. Lee, R.S. Turner,  D.L.  Cassell, D.A.  Lammers, W.G. Campbell, C.I. Liff,  C.C. Brandt,
L.H. Liegel, G.D. Bishop, D.C. Mortenson, S.M. Pierson, and D.D. Schmoyer. 1989.  Future effects of
long-term sulfur  deposition on  surface water chemistry in the Northeast and  Southern Blue Ridge
Province. EPA-600/3-89/061. Washington, DC:  U.S. Environmental Protection Agency.

Horvitz,  D.G. and D.J. Thompson.   1952.   A generalization of sampling without replacement from  a
finite universe.  J. Amer. Statist. Assoc. 47:663-685.

Kaufmann,  P.R., A.T. Herlihy, J.W. Elwood, M.E.  Mitch,  W.S. Overton,  M.J.  Sale,  K.A. Cougan,
D.V. Peck, K.H. Reckhow, A.J. Kinney, S.J. Christie, D.D. Brown, C.A. Hagley,  and H.I. Jager.  1988.
Chemical Characteristics of Streams in the Mid-Atlantic and Southeastern United States.  Volume I:
Population   Descriptions  and   Physico-Chemical   Relationships.     EPA/600/3-88/021a.    U.S.
Environmental Protection Agency, Washington, D.C.

Landers, D.H., J.M. Eilers, D.F. Brakke, W.S. Overton,  P.E. Kellar, M.E. Silverstein, R.D. Schonbrod,
R.E. Crowe, R.A. Linthurst, J.M. Omernik, S.A. Teague, and E.P. Meier.  1987.  Characteristics of
lakes in  the  western  United  States.   Volume I:  Population  descriptions  and physico-chemical
relationships.  EPA-600/3-86/054a.  Washington,  DC: U.S. Environmental Protection Agency.

Linthurst, R.A., D.H.  Landers, J.M. Eilers, D.F.  Brakke, W.S.  Overton, E.P. Meier, and R.E. Crowe.
1986.  Characteristics  of lakes in the eastern United  States.  Volume I:  Population descriptions and
physico-chemical  relationships.    EPA-600/4-86/007a.     Washington,  DC:   U.S.  Environmental
Protection Agency.

Messer, J.J., C.W. Ariss, J.R. Baker, S.K. Drouse, K.N. Eshleman, P.R. Kaufmann, R.A. Linthurst,
J.M. Omernik, W.S. Overton,  M.J. Sale,  R.D. Schonbrod, S.M. Stambaugh, and J.R.  Tuschall, Jr.
1986. National Surface Waters Survey: National Stream Survey, Phase I - Pilot  Survey.  EPA/600/4-
86/026, U.S. Environmental Protection Agency, Washington, D.C.

Messer, J.J.,  C.W. Ariss, D.H. Landers, and W.S. Overton.   1987.  Critical design and interpretive
aspects of the National Surface Water Survey.  Lake Reserv.  Manage. 3:463-469.

Messer, J.J., R.A. Linthurst, and W.S. Overton.  1991.  An  EPA Program for  Monitoring Ecological
Status and Trends.  Environmental Monitoring and Assessment 17:67-78.

Olea,  R.A.    1984.   Sampling  design optimization for  spatial functions.   Mathematical Geology
16(4):369-392.

Overton,  W.S. 1985.   Working draft, analysis plan for the Eastern Lake Survey. March   1985.
Technical Report 113, Dept. of Statistics, Oregon State University.

Overton, W.S.  1987a. Phase II analysis plan, National Lake  Survey — working draft.   April 1987.
Technical Report 115, Dept. of Statistics, Oregon State University.

-------
Overton, W.S.  1987b.   A sampling and analysis  plan for streams, in the National Surface Water
Survey  conducted by  EPA.   June  1987.    Technical  Report 117, Dept. of Statistics, Oregon State
University.

Overton, W.S. 1989a.   Calibration methodology for the double sample structure of the National Lake
Survey  Phase  II Sample.  Nov. 1989.   Technical Report  130, Dept. of  Statistics,  Oregon State
University.

Overton, W.S.  1989b.  Effects of measurement and other extraneous errors on estimated distribution
functions in the National Surface Water Surveys.  Aug. 1989. Technical Report 129, Dept. of Statistics,
Oregon State University.

Overton, W.S.  1990.  A strategy for use of found samples in a rigorous monitoring design.  Technical
Report 139, Dept. of Statistics, Oregon State University.

Overton, W. S., and S.V. Stehman.  1987.  An empirical investigation of sampling and other errors in
the National Stream Survey:  Analysis of a replicated sample of streams.  Oct. 1987.  Technical Report
119, Dept. of Statistics, Oregon State University.

Palmer, C.J., K.H. Riitters, T. Strickland, D.L. Cassell, G.E. Byers, M.L. Papp, and C.I. Liff.  1991.
Monitoring and research strategy for forests —  Environmental Monitoring and Assessment Program
(EMAP).  EPA/600/4-91/XXXX. U.S. Environmental Protection Agency, Washington, D.C.

Stehman,  S.V., and W.S. Overton.   1987a.   Estimating  the  variance of the  Horvitz-Thompson
estimator in variable probability, systematic samples.  Proceedings of the Section on Survey Research
Methods of the American Statistical Association.

Stehman,  S.V.  and  W.S.  Overton. 1987b.  An empirical  investigation  of the variance estimation
methodology prescribed  for the National Stream Survey:  Simulated  sampling from stream data sets.
Oct. 1987.  Technical Report  118, Dept. of Statistics, Oregon State University.

White, D.,  A. Jon Kimerling,  and  W.  Scott  Overton.   1991.   Cartographic  and  Geometric
components of a Global  Sampling Design for Environmental Monitoring.  Accepted by Cartography and
Geographic Information  Systems.

Yates, F.  1949. Sampling Methods for Censuses and Surveys.  London: Charles  Griffin and Co.
                                                             U.S. GOVERNMENT PRINTING OFFICE: 1992 - 648-003/41801

-------