RESEARCH     TRIANGLE    INSTITUTE
                                       1864/14/03 - Oil
                               National Soil Monitoring Program
                                              by
                                         Roy Whitmore
                                       Martin Rosenzweig
                                          John Hines
                                  Research Triangle Institute
                                    Research Triangle Park,
                                     North Carolina  27709
                                    Contract No.  68-01-5848
                                 Task Manager:   William Smith
                                  Project Officer:   Ann Carey
                                 Design and Development  Branch
                                 Exposure  Evaluation  Division
                           Office of Pesticides  and Toxic  Substances
                                Environmental  Protection Agency
                                    Washington,  D.C.   20460
                                                                       March 1981
                                                                       Draft of Final
RESEARCH  TRIANGLE  PARK,  NORTH   CAROLINA  27709

-------
                     Disclaimer

This document is a preliminary draft.   It has not been
released formally by the Office of Testing and Evaluation,
Office of Pesticides and Toxic Substances, U.S. Environmental
Protection Agency, and should not at this stage be construed
to represent Agency policy.   It is being circulated for
comments on its technical merit and policy implications.

-------
                        TABLE OF CONTENTS

                                                             Page

      NATIONAL SOIL MONITORING PROGRAM	      1

 1.1  General Description of the Program	      1
 1.2  The Rural Soils Network Survey Design 	      1

      1.2.1  General Considerations 	      1
      1.2.2  The Probability Sample Design	      1
      1.2.3  Limitations as a Monitoring Network	     18
      1.2.4  Uses in Regulatory Action	     19
      1.2.5  User Needs and Historical Uses of the Data .  .     19

 1.3  Alternate Survey Designs for the RSN	     20

      1.3.1  Design Option One	     20
      1.3.2  Design Option Two	     22
      1.3.3  Design Option Three	     24

 1.4  Present Network Operations	     34

 1.5  Alternate Operational Designs for the RSN	     35

 1.6  Recommended Modifications 	     36

 1.7  Statistical Findings and Charts for the RSN	     37

      1.7.1  Introduction	     37
      1.7.2  Sampling weights 	     37
      1.7.3  Stratification	     42
      1.7.4  Analysis	     44

 1.8  Capabilities for Performing Special Studies 	     53

 1.9  Toxic Substances Other Than Pesticides in Soils  ...     53

1.10  Implementation Plan for a New Survey Design
      of the Rural Soils Network	     54

EVALUATION OF CHEMICAL ANALYSIS	     90

2.1  Objective	     90
2.2  Discussion	     90

      2.2.1  Analytical Methodology 	     91
      2.2.2  QC/QA	     97
      2.2.3  Accuracy and Precision	     97
      2.2.4  Minimum Detectable Levels	     99

2.3  Fate of Pesticides in Soils	    102

2.4  Recommendations	    103

-------

REFERENCES	   105

APPENDIX A:  Questionnaire on Chemical Analysis of Soil ....   A-1

APPENDIX B:  National Soil Monitoring Program -
             Pesticide Analysis Report Form        B-1

APPENDIX C:  Analytical Methodology for Organochlorine and
             Organophosphorous Pesticides and Trifluralin .  .  .   C-1

APPENDIX D:  Sampling Weights for the Rural Soils Network (RSN)   D-1

APPENDIX E:  Construction of an Analysis Data File     E-1

-------
                             LIST OF TABLES

Table                                                            Page

 1.1      Sampling Rates (%) Which Provide Standard Relative
          Precision of County Level Estimates for 10 Size-
          classes and 3 Sizes of Unit	    5
 1.2      Dichotomization of the Land Use Code	    8

 1.3.3.1  Construction of the Cost Model	   30

 1.3.3.2  Cluster Effect for Selected Values of p and n2. .  .  .   32

 1.3.3.3  Minimum Cost Allocation Subject to the Constraint .  .   33

 1.7.1    Fiscal Years of Data Collection for the Rural Soils
          Network	   40

 1.7.2    RSN Sites in Counties Having Both Irrigated and
          Remainder Strata, but only 160-acre PSU's 	   47

 1.7.3    Compounds with No Detectable Levels in Cropland Soils   48

 1.7.4    Compounds with No Detectable Levels in Noncropland
          Soils	   49

 1.7.5    Statistics for Compounds with Few Detectable Levels
          in Cropland Soils for Round One	   52

 1.7.6    Statistics for Compounds with Few Detectable Levels
          in Noncropland Soils for Round One	   53

 1.7.7    Statistics for Compounds with Detectable Levels in
          Noncropland Soils for Round One 	   54

 1.7.8    Statistics for Compounds with Detectable Levels in
          Cropland Soils by Census Division for Round One ...   55

 1.7.9    Statistics for Compounds with Detectable Levels in
          Cropland Soils by Cropping Region for Round One ...   72

 2.1      Pesticides and Toxic Compounds Analyzed Under NSMP.  .   93

 2.2      Procedures for the GC Analysis of Pesticides for
          the NSMP	   95

 2.3      Average Recoveries for Some Organochlorine Pesticides
          from Soil	   99

 2.4      Precision for Some Organochlorine Pesticides in Soil.  101

 2.5      Detection Limits of Pesticides in Soils 	  102

-------
                             LIST OF FIGURES

Figures                                                          Page

 1.1      Typical Stratification of a Township	      3

 2        Sample Points on a 160-acre Sample Area 	      7

 2.1      Capillary GC/ECD Chromatogram of Arochlor 1242 and
          Arochlor 1260	     57

-------
                            EXECUTIVE SUMMARY
1.   Introduction

     The purpose of  the review of the National Soils Monitoring Program
(NSMP) is to:

     a)   Describe the network,
     b)   Assess its current effectiveness,
     c)   Provide design options.

The NSMP has two components, the Urban Soils Network (USN) and the Rural
Soils Network (RSN).  Its purpose has been to monitor pesticide residues
in soils in the conterminous United States.

     The USN will be reviewed in a later report.

     This report considers  the RSN review which  represents  a major and
time-consuming  effort.   It embraces  the assembly and  review of design
documents, the correspondence files and memoranda relating to operational
activities, and the computer data files including editing and correcting
data  entries  where necessary.   It also  includes  analyses of  the  data
using  the sampling  weights developed during  the establishment  of the
structure of the survey design.

     This report contains a brief but complete description of the
statistical design of the RSN and of its parent, the Conservation Needs
Inventory (CNI).  It is therefore a valuable asset in understanding,
analyzing or modifying the soil monitoring efforts of the federal
government.

1.1  General Description of the Program

     The National Soil Monitoring Program consists of two networks:   (1)
the  Urban Soils  Network and  (2)  the Rural  Soils Network.   The Rural
Soils Network is a probability subsample of the 1967 Conservation Needs
Inventory sample.  The  area sampled by the Rural Soils Network includes
all of the  conterminous United States except for areas considered to be
urban in  character.   These  urban areas are monitored by the Urban Soils
Network,  which consists of a stratified sample.

1.2.1  General Considerations

     The fact that the Rural Soils Network (RSN) is a probability sample
makes possible  valid statistical inferences to  the  population sampled,
namely all  rural  soils of the  conterminous United States.   Moreover,
inferences are possible for all reasonably large geographic areas within
the United States,  for example cropping regions and larger States.  Some
State exclusions must be noted in analyzing the data.

     The  operational  design of the RSN  makes  possible  some interesting
statistical  analyses.   Because  soil  and crop  specimens are  obtained
simultaneously  at  harvest  time  from matched  sites,   the  relationship
between pesticide  levels  in soils and harvested  crops  can be analyzed.
Also, since  some  sites were sampled at  a  four year interval, trends in
pesticide residue levels can be investigated.

1.2.2  The Sampling Design

     The Rural Soils Network (RSN) is a probability sample of 10-acre
sites from the population of all rural land areas in the conterminous
United States.  Each 10-acre site is located by a probability subsample
of the data points of the 1967 Conservation Needs Inventory (CNI).  The
CNI, in turn, is a probability sample of all rural land areas in the
conterminous United States.

     The CNI is a stratified random sample of primary sampling units
(PSU's) from each county of the conterminous United States, except for
those counties strictly metropolitan in character.  The standard size of
the PSU's was 160 acres, although 40-acre, 100-acre, and 640-acre PSU's
were not uncommon.  The standard sampling rate was two percent; however,
this rate was increased or decreased in order either to provide estimates
of nearly equal precision for all counties or to oversample areas of
special interest.  The sampling rates varied within strata from less
than one percent to approximately thirty-two percent.

     In the  CNI,  data were collected  for  each of a series of points at
every CNI sample site.  The land use data collected for each CNI sampling
point was  used to  classify the  point as either a  cropland  point or a
noncropland point.  The sampling design of the RSN specified that 0.025
percent of the cropland and 0.0025 percent of the noncropland of the
rural conterminous United States would be sampled.  A subsample of the
CNI  cropland sampling points  was selected and  used to  locate the RSN
cropland sample sites.  The RSN noncropland sample sites were located by
a subsample of the CNI noncropland sampling points.

     The operational  design of the Rural  Soils Network (RSN) specifies
that each cropland  site be randomly designated as a first-year, second-
year, third-year, or  fourth-year cropland site, such that one-fourth of
the  cropland sites  in  each  State  will be  sampled each  fiscal year.
Noncropland sites were handled in the same manner.  Specimens were to be
collected at  each site no less than once  every four years and not more
than once per year.   Soil specimens were  obtained  by compositing fifty
soil cores,  2-inches  in diameter by 3-inches in depth.  Cropland speci-
mens were to be obtained immediately before or at harvest time.
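
     The random designation of sites to collection years can be sketched
as follows.  This is an illustration only, not the procedure actually
used for the RSN; the function name, the site identifiers and the fixed
random seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def assign_rotation_years(sites, n_years=4, seed=1981):
    """Randomly assign each site to a collection year within its State.

    `sites` is a list of (site_id, state) pairs (hypothetical identifiers).
    Within each State the sites are shuffled and dealt out in turn, so each
    year receives as close to 1/n_years of the State's sites as possible.
    """
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for site_id, state in sites:
        by_state[state].append(site_id)

    assignment = {}
    for state in sorted(by_state):
        ids = sorted(by_state[state])
        rng.shuffle(ids)
        for position, site_id in enumerate(ids):
            assignment[site_id] = position % n_years + 1  # years 1..n_years
    return assignment


# Hypothetical example: eight cropland sites in each of two States.
example_sites = [(f"{st}-{i:02d}", st) for st in ("NC", "IA") for i in range(8)]
rotation = assign_rotation_years(example_sites)
```

Shuffling within each State, rather than over the whole list, is what
keeps one-fourth of every State's sites in each fiscal year.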

1.3  Alternate Survey Designs

1.3.1  Design Option One

     A minimal change alternative would be to subsample the current RSN.
This option mainly addresses the problem of the cost of the RSN although
the need for national and regional estimates is also considered.  Any
need to eliminate reliance upon the 1967 CNI is not addressed.

     This option  does offer  the advantage that  it can  be quickly and
easily implemented, possibly while other alternatives are under
development.

-------
     Replicate subsamples are recommended if this option is to be
implemented, even if it is only on a temporary basis.  For example, if 50
percent of the RSN sites are to be surveyed, five subsamples that each
comprise a 10 percent subsample can be used.  At least five replicate
subsamples should be selected.  The use of replicate subsamples makes it
possible to estimate sample variances easily by using the theory of
replicate subsamples.
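
     A minimal sketch of the replicate-subsample variance computation
follows.  It assumes that the k replicates are selected independently by
the same design and are of equal size, so that the overall estimate is
the simple mean of the replicate estimates; the finite population
correction is ignored.  The function name and the replicate values are
hypothetical.

```python
import math


def replicate_estimate(replicate_values):
    """Combine k independent replicate-subsample estimates.

    Returns (overall estimate, standard error).  The overall estimate is
    the mean of the replicate estimates, and the variance of that mean is
    estimated by sum((x_i - mean)^2) / (k * (k - 1)).
    """
    k = len(replicate_values)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    mean = sum(replicate_values) / k
    var_of_mean = sum((x - mean) ** 2 for x in replicate_values) / (k * (k - 1))
    return mean, math.sqrt(var_of_mean)


# Hypothetical proportions of sites with a detectable residue,
# one value per replicate subsample.
replicates = [0.20, 0.24, 0.18, 0.22, 0.21]
estimate, std_error = replicate_estimate(replicates)
```

With only five replicates the variance estimate has four degrees of
freedom, which is one reason to select at least five replicates.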

     It  may also  be useful to select  the  subsamples  at different rates
within domains of interest.  Identification of strata of special interest
within the domains just considered can be used to increase the possibil-
ity of finding toxic substance residues.

1.3.2  Design Option Two

     A  design analogous  to the design that  produced the  present  RSN
sample can be based upon the 1982 National Resources Inventory (NRI).
Use of the 1982 NRI will provide up-to-date land use information.  A
subsampling procedure  to obtain  adequate  precision at minimum cost is
proposed.   This  can  be accomplished  by  identifying  areas  where  toxic
residues are likely to be found and giving these areas a greater probabil-
ity of being selected for the RSN sample.

     It  is suggested that counties be used as primary sampling units for
the second  phase  sample.   The  data from the  present  RSN indicates that
counties are generally heterogeneous with respect to toxic residues.
Thus, it would  be advantageous  to select relatively few counties with a
larger number of sample sites.   The use of counties as PSU's will reduce
travel costs associated with data collection.  More importantly, smaller
areas like counties can be effectively stratified into areas where toxic
residues are likely to be found.

     The  RSN sample sites  are   to  be  located  at  NRI  sample  points.
Sample  counties  are selected from  the counties  in the  NRI sample, so
that counties where  toxic substance residues are likely  have a greater
chance  of selection.  Thus, it  is suggested that  counties  be selected
with probability proportional to size (PPS), where the size measure is a
measure of the likelihood of finding toxic residues.
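
     One common way to select units with probability proportional to
size is systematic PPS selection, sketched below.  The report does not
specify a selection method, so this choice is an assumption, as are the
function names, the county labels and the size measures.

```python
import random


def inclusion_probabilities(sizes, n):
    """Expected number of selections per unit: n * size / total size."""
    total = float(sum(sizes.values()))
    return {label: n * size / total for label, size in sizes.items()}


def pps_systematic(sizes, n, seed=0):
    """Select n units with probability proportional to size.

    Units are laid end to end on a line of length sum(sizes).  A random
    start and a fixed step of total/n give the selection points.  A unit
    larger than the step can be selected more than once.
    """
    rng = random.Random(seed)
    labels = list(sizes)
    total = float(sum(sizes.values()))
    step = total / n
    start = rng.uniform(0, step)
    points = [start + i * step for i in range(n)]

    selected, cumulative, j = [], 0.0, 0
    for label in labels:
        cumulative += sizes[label]
        while j < n and points[j] < cumulative:
            selected.append(label)
            j += 1
    while j < n:  # guard against floating-point rounding at the upper end
        selected.append(labels[-1])
        j += 1
    return selected


# Hypothetical size measures reflecting the likelihood of finding residues.
county_sizes = {"A": 10, "B": 40, "C": 25, "D": 5, "E": 20}
sample = pps_systematic(county_sizes, n=2, seed=7)
probs = inclusion_probabilities(county_sizes, 2)
```

The sampling weight of a selected county is the reciprocal of its
inclusion probability, so counties with large size measures carry
small weights.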

     Efficient  sampling within  the  selected  counties can  result  from
careful  stratification.   The NRI sampling points  within a  county  are
first stratified  into  cropland points and noncropland points, to insure
adequate representation of each of these land types and because agricul-
tural chemical  residues  are  more likely to be found in cropland.  Local
land use  characteristics  can be  used to further stratify both the crop-
land points and the noncropland points.

1.3.3  Design Option Three

     Review of the data indicates large numbers of zero-valued
observations and relatively few positive observations.  This analytic
challenge has been discussed elsewhere [See Lucas et al, Recommendations
for the National Surface Water Monitoring Program for Pesticides.  Report
No. RTI/1864/01-02I].  The conclusion of that analysis was that the
appropriate measures of "level" are:

     (1)  The proportion of positive detections, i.e., the relative
          frequency of last stage sampling units positive for the
          substance(s) under investigation, and

     (2)  The proportion of sampling units containing concentrations of
          substance above some specified level.  This level may signal
          the existence of an undesirable situation.

     The proposed design is a two-stage area probability sample with
stratification of the sampling units at each level.  The first stage or
primary sampling units (PSU's) are counties.  Geographic stratification
is provided by the four Census Regions.  Allocation of PSU's to these
regions is in proportion to the land area eligible for the study.  Using
additional variables to allocate the sample is unlikely to be useful at
this level due to the variety of land use within each Census Region.
The eligible land area is currently defined by the membership
requirements of the RSN and USN.  It may be advantageous from
administrative as well as fiscal and statistical grounds to combine the
activities of the soil networks, and consider SMSA counties as a stratum
within the survey.  This point requires further review; however, initial
investigation suggests savings are likely.
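
     Allocation of PSU's in proportion to eligible land area can be
sketched as below.  The largest-remainder rounding rule is an assumption
(the report does not say how fractional allocations are to be rounded),
and the land areas shown are invented for illustration, not actual
figures.

```python
def proportional_allocation(areas, n_total):
    """Allocate n_total PSU's to strata in proportion to eligible area.

    Each stratum first receives the whole-number part of its quota; any
    PSU's left over go to the strata with the largest fractional parts,
    so the allocations always sum to n_total.
    """
    total = sum(areas.values())
    quotas = {k: n_total * a / total for k, a in areas.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical eligible land areas (millions of acres) by Census Region.
areas = {"Northeast": 100, "North Central": 480, "South": 560, "West": 760}
allocation = proportional_allocation(areas, 100)
```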

     With the extension of monitoring responsibility from pesticides to
toxic substances in general, some revision of the approach is indicated.
The following stratification variables are therefore proposed for the
PSU's in addition to the geographic stratification above:

     (1)  Land area,
     (2)  Population density,
     (3)  Agricultural activity, and
     (4)  Industrial activity.

     Second stage sampling units (SSU's) are 10-acre plots.  These are
proposed as the final stage units or analysis units on the assumption that
they are sufficiently homogeneous that the effects of subsampling are
negligible.  This is a verifiable proposition.  The problem with SSU's
this small is the ability to locate them in the field.  The lack of
identifiable boundaries renders exactly locating them most difficult.
To ease this difficulty, enumeration districts (ED's) are proposed as
readily identified segments.  The problem is reduced to locating the SSU
within the ED, or any suitable subsegment chosen to facilitate the task.

     SSU's will be allocated equally to PSU's.  A detailed field-use
protocol will locate the specimen for collection, leaving the minimum of
discretion for the field personnel in the selection of these sites.  The
protocol will specify a grid locating multiple specimen collection
sites.  The soil collected in a given plot would be composited, unless
the homogeneity of the 10-acre plot is under investigation.

-------
1.4  Present Network Operations

1.5  Alternate Operational Design

     The  operational  design of  the  Rural Soils Network  (RSN)  was well
conceived   for   monitoring  agricultural  pesticides   and  herbicides.
However, much pesticide  and herbicide residue may often  be  leached out
of, or vaporized from, the cropland soil by harvest time.

1.6  Recommended Modifications

1.7  Statistical Findings

     Several types of analyses are of interest for the RSN data, notably:

     (1)  Estimation  of  base levels  for residues of  toxic  substances,
     (2)  Estimation  of  changes  in  mean  levels  of  toxic  substance
          residues  from  the  first  round  to the  second  round  of data
          collection, and
     (3)  Estimation  of   relationships   between  soil  and crop  residue
          levels.

The  reason  for analyzing  the  RSN data  in this  study  was  to  obtain a
measure of  the  degree of precision that could be  obtained for analysis
of residue data based upon the present data.   It was decided that estima-
tion of  base levels  of  residues would  be sufficient.   In  particular,
estimation of levels  was undertaken for the first round soil data only.

     It was  found  that the data values for most compounds were predomi-
nantly zero.  The predominance of zero values in the residue data results
in J-shaped  distributions for  the  amount of residue detected  for most
compounds.   This  type  of  data  presents  some  rather  unique  analysis
problems.  For  example,  the weighted mean of  the raw data values has
little meaning  if  most values  are zero and a few are very large.  Thus,
some type of  data transformation  is  generally required  in order  to
obtain a  meaningful  analysis  [See Lucas, et al,  Recommendations for the
National  Surface  Water Monitoring Program for  Pesticides.   Report No.
RTI/1864/01-02I].  Ideally, each compound  should be considered individ-
ually  to  determine an appropriate  transformation, if  any.   Ubiquitous
compounds like arsenic may not require transformation.

     For  analyses  on the  proportion scale,  all  data  values above the
minimum  detectable level  (MDL)  were  replaced  by  the  value one.   The
weighted mean on  this scale is a weighted estimate of the proportion of
the sampled land area with a residue level in excess of the MDL.  Since
this scale was felt to be generally the most appropriate for analysis of
the  residue  data,  the  standard error  and  the  design effect  for the
estimated proportion were also computed.
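
     The proportion-scale estimate described above can be sketched as
follows.  The residue values, sampling weights and MDL are hypothetical,
and the function name is introduced only for this example.

```python
def weighted_proportion_above(values, weights, mdl):
    """Weighted estimate of the proportion of sampled area above the MDL.

    Each value above the MDL is replaced by one and every other value by
    zero; the result is the weighted mean of these indicators.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    indicators = [1.0 if v > mdl else 0.0 for v in values]
    return sum(w * d for w, d in zip(weights, indicators)) / sum(weights)


# Hypothetical residue levels (ppm) and sampling weights (acres per site).
residues_ppm = [0.0, 0.0, 0.03, 0.0, 1.20, 0.0]
site_weights = [4000, 4000, 2000, 6000, 2000, 2000]
p_hat = weighted_proportion_above(residues_ppm, site_weights, mdl=0.01)
```

In this made-up example the two positive sites represent 4,000 of
20,000 acres, so the estimated proportion is 0.20, whereas the weighted
mean of the raw values (0.123 ppm) is dominated by the single large
value.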

     Estimation of standard errors and design effects required that some
strata  be combined.  Since it was not possible to account  for all dimen-
sions of the CNI stratification, the standard errors computed are undoubt-
edly conservative  estimates.   This   results  in  similarly  conservative
interval  estimates  of the  proportion of sampled areas where  levels  of
the compound exceed the minimum detectable level (MDL).


-------
     The design  effect  is  the ratio of the  sample  standard error to an
estimate of what the standard error would have  been if a simple random
sample of the same size had been used, i.e.,

     Design Effect  =  Estimated S.E. (For the design used)
                       -------------------------------------
                       Estimated S.E. (Simple Random Sample)


Alternatively, the design  effect can be thought of  as  the ratio of the
actual sample size to the  sample size that  would  be required to obtain
an  estimate  with  the same  standard error  based  upon a  simple random
sample.   Generally   stratification  decreases  the  design  effect,  while
clustering increases it.  Thus, since the CNI stratification can be used
and there is no clustering of sample sites in the RSN sample, design
effects less than one would be expected.  This would indicate that the design
produced  smaller standard  errors  than  would a  simple  random sample of
the same size.  Many of the design effects shown in Tables 1.7.7 through
1.7.9 are indeed less than one.  However, some design effects are substan-
tially greater than one.  It is not clear therefore that the CNI strati-
fication was particularly  advantageous  for estimation of proportions of
detections for toxic substance residues.
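
     The design effect computation can be sketched as below.  The sketch
assumes that the simple random sample standard error of a proportion p
from n sites is estimated by the square root of p(1-p)/n, with no finite
population correction; the formula actually used for the tables is not
stated here, and the numbers are hypothetical.  Because the standard
error under simple random sampling varies as one over the square root of
n, a ratio of standard errors d corresponds to a ratio of sample sizes
of d squared.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def design_effect(se_design, p, n):
    """Design effect as a ratio of standard errors (design / SRS)."""
    return se_design / srs_standard_error(p, n)


def equivalent_srs_size(n, deff_se):
    """Simple random sample size giving the same standard error."""
    return n / deff_se ** 2


# Hypothetical values: 400 sites, estimated proportion 0.20, and a
# design-based standard error of 0.018.
deff = design_effect(se_design=0.018, p=0.20, n=400)
n_equivalent = equivalent_srs_size(400, deff)
```

A value of deff below one, as in this example, means that a larger
simple random sample would be needed to match the precision of the
design used.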
1.8  Capabilities for Special Studies

1.9  Toxic Substances Other than Pesticides in Soils

1.10  Implementation Plan for a New Survey Design of the Rural Soils
      Network

2.0  Evaluation of Chemical Analysis

     Information  on  the quality  of the pesticide data compiled by the
NSMP is not  currently available to users of the program's computer data
file.  Some measure of this quality is necessary for meaningful statisti-
cal evaluation of the data and practical interpretation of the results.
To this purpose,  a  limited review of the current analytical methodology
was  conducted  and  information  compiled  on the accuracy (recoveries),
precision  (coefficient  of  variation) and  minimum detectable  levels of
each of the pesticides monitored under the program where such information
was available.
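
     The two quality measures named above can be computed from replicate
analyses of a fortified blank, as in the following sketch.  The
fortification level and the measured values are hypothetical, and the
function names are introduced only for this example.

```python
import statistics


def percent_recovery(measured, fortified):
    """Accuracy: amount found as a percentage of the amount added."""
    return 100.0 * measured / fortified


def coefficient_of_variation(values):
    """Precision: sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate results (ppm) for a blank fortified at 0.10 ppm.
fortification_ppm = 0.10
measured_ppm = [0.085, 0.090, 0.095, 0.088, 0.092]

recoveries = [percent_recovery(m, fortification_ppm) for m in measured_ppm]
mean_recovery = statistics.mean(recoveries)
cv_percent = coefficient_of_variation(measured_ppm)
```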

     Over thirty toxic substances have been monitored under the NSMP
including several chemical classes:  1) organochlorine pesticides; 2)
PCBs*; 3) trifluralin; 4) organophosphorous pesticides; and 5) heavy
metals.  All analyses (~ 450 soil specimens/year) are carried out at the
Toxicant Analysis Center, Bay St. Louis, Mississippi.  However, heavy
metals have not been analyzed in soil since 197[?].

     Nearly  all  procedures  applied to the  analysis of  pesticides  and
PCBs in  soil specimens  used an  initial  extraction followed by column
chromatography clean-up.   Final quantitation  of pesticides  was carried
out using external standard techniques with gas chromatography (GC).   In
general,  confirmation  of detected pesticides  was performed  by changing
the selectivity of the GC column or detector.  Each set of specimens was
accompanied by a blank and one or more controls (fortified blanks) to
check contamination and pesticide recoveries during the extraction,
clean-up and GC analysis procedures.

*  Polychlorinated Biphenyls

     Levels of  heavy  metals  in  soil  specimens  were  determined  using
atomic absorption spectroscopy (AA).  Flame AA was used for lead, cadmium
and arsenic and the cold vapor technique for mercury.  No information
was available on the current accuracy, precision and limits of detection.

     Relatively little information  was  readily available on the current
accuracy,  precision and  MDLs** for  pesticides and  PCBs in  soil.   Of
particular interest are individual values for accuracy and precision for
each pesticide in  each of the specimen matrices (crops, water and sedi-
ment).  An average of each of these values derived from replicate analy-
sis over a period of time would also provide an indication of the method
stability  for a particular  pesticide  in  a  specific  matrix.   Recovery
data  for  each pesticide  was judged a  reasonable indication  of method
accuracy  since  analytical   results  not  corrected  for  recoveries  and
losses during the  analysis  can represent a  significant contribution  to
error in the reported result where such recoveries are low.

     Relatively  little  recovery  and precision  data were available  at
levels near the  pesticide MDLs.   It is particularly important that such
data be provided to users of the computer data files since it represents
the "worst" case in terms of the data quality.

     Limited  review of  analytical  methodology used in the NSMP and  an
attempt to compile  data  for  the average accuracy, precision  and MDL in
soil  for  each toxic  substance monitored under  this program  provide  a
basis for the following recommendations:

1.   Accuracy (that is, recoveries) and precision data must be generated
     for  all  pesticides  monitored in  the  NSMP.   The  data   should  be
     generated at   two  different  levels  (e.g.,  at  the  MDL  and at  ten
     times the MDL).  The results  for controls analyzed with each set of
     specimens would be  the best means  of  providing  this  information
     since it is  necessary  that  control data  be  made  accessible  to
     computer data  file  users  in  any event.   Controls  must be run with
     each  set of specimens  and  should consist of a  blank (unfortified
     soil free from the  analytes  of interest) and two fortified blanks
     (one fortified at the MDL and another at ten times the MDL).  The
     analytical results for the controls should be reported on a separate
     form  (especially  designed for control data) and  encoded  such that
     there is a one-to-one association with the particular set of speci-
     mens  with  which  they   were  analyzed.    The  encoding should  allow
     later computer retrieval of control data for any particular specimen
     set or group of sets (for example, geographic area, over a specified
     period of time,  or  for a particular pesticide).   The availability
     of this  information in  a  retrievable form to data file  users would
     provide  the  means   for assessing  data  reliability now  lacking.
     Further, any duplicate specimen analyses must be reported in the
     computer data file as they provide the best means of assessing
     method precision on a continuous basis.  Duplicate results must be
     specifically encoded such that they are retrieved as a group (e.g.,
     all duplicates for a particular matrix and pesticide over a
     specified period of time) as well as with the initial analytical
     results for the specimen.  The need to make routine control data
     available to program data file users cannot be overemphasized.  This
     does not preclude the use of specialized controls (e.g., SPRMS);
     however, these results should also be included in the computer file
     encoded to allow facile retrieval both as a group and with their
     particular specimen set.

**  Minimum detection levels

2.   The  pesticides included  on  the  routine  monitoring list  must be
     reviewed on a  regular  basis and appropriate deletions or additions
     made.  Specifically,  the need for routine analysis of organophosphor-
     ous pesticides in soil  should be reviewed as this class of compounds
     is known to be unstable and has seldom been reported in either soil
     or  sediment.   Once  the  baseline  has  been  established  for  such
     compounds, three choices are possible:  1) cease to analyze for the
     compound(s)  except  under  special circumstances   (e.g.,  after  a
     chemical  spill or when contamination is  suspected from  a recent
     application);   2)  analyze  for the  compound(s)  on a  more  frequent
     basis; and  3) concentrate  efforts on the  analysis of degradation
     products of known toxicity where these exist.  Decisions concerning
     the analysis  of  toxic  substances  under the NSMP should be based on
     information generated in other agency data files (e.g., USDA, USGS,
     etc.) as well as data generated within EPA.

3.   Soil specimens should  be  characterized as to the percent carbon or
     percent inorganic residue.  This information must be included on
     the report form (along  with moisture content) as part of the speci-
     men characterization (source).  Significant trends may otherwise be
     missed with respect  to the soil type and  its effect on toxic sub-
     stance accumulation,  degradation and transport.

4.   Control specimens  (in  the  matrix  of interest)  should be included
     with any specimens either stored for extended periods or shipped to
     another  site   for  analysis.   This is  particularly  important  for
     toxic compounds  which  are known to be unstable;  i.e., organophos-
     phorous pesticides.  The results of  these  "storage controls" must
     also be included in the computer data file with appropriate encoding
     for specific retrieval.

5.   Analytical methodology  should be updated to include state-of-the-art
     capillary  GC   techniques.   This would provide  a higher  degree of
     confidence in  the resulting data  through  increased resolution and
     sensitivity.   The use of higher resolution analytical techniques is
     a move toward the quantitation of PCBs (and technical chlordane) as
     their individual  isomers.  This approach is  far  more useful than
     the present method of attempting to identify patterns and averaging
     components, since the toxicity and biodegradation of the individual
     isomers are not identical.

6.   The pesticide recoveries should be monitored for each specimen
     analyzed by initial fortification of the specimen with appropriate
     compound(s).  Subsequent analysis of the compound level should
     enable comparison of data between specimens with increased
     confidence that anomalous results will be detected.  The use of
     internal standard quantitation techniques would normalize recoveries
     between specimens and should be considered.

7.   Detailed information on all analytical procedures under the NSMP
     should be documented in one source.  The procedures must then be
     maintained current with ongoing improvements and modifications made
     by the analytical laboratories.  Such updating requires both
     flexibility and regular review by program management.
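
     The encoding of control data called for in the first recommendation
can be illustrated with a small sketch.  The record layout, field names,
set identifiers and values below are all hypothetical; they show only
one way of tying each control result to its specimen set while still
allowing retrieval across sets by compound or by type of control.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlResult:
    set_id: str           # specimen set the control was analyzed with
    compound: str         # compound name, e.g. "dieldrin"
    kind: str             # "blank", "fortified_mdl", "fortified_10x_mdl",
                          # "duplicate" or "storage"
    fortified_ppm: float  # amount added (0.0 for a blank)
    measured_ppm: float   # amount reported


class ControlFile:
    """In-memory index of control results, keyed by specimen set."""

    def __init__(self):
        self._by_set = defaultdict(list)

    def add(self, result):
        self._by_set[result.set_id].append(result)

    def for_set(self, set_id):
        """All controls analyzed with one specimen set."""
        return list(self._by_set.get(set_id, []))

    def select(self, compound=None, kind=None):
        """Controls across all sets, optionally filtered."""
        return [r for results in self._by_set.values() for r in results
                if (compound is None or r.compound == compound)
                and (kind is None or r.kind == kind)]


controls = ControlFile()
controls.add(ControlResult("SET-001", "dieldrin", "blank", 0.0, 0.0))
controls.add(ControlResult("SET-001", "dieldrin", "fortified_mdl", 0.01, 0.008))
controls.add(ControlResult("SET-001", "dieldrin", "fortified_10x_mdl", 0.10, 0.091))
controls.add(ControlResult("SET-002", "dieldrin", "fortified_10x_mdl", 0.10, 0.087))

high_level = controls.select(compound="dieldrin", kind="fortified_10x_mdl")
```

A production data file would add further keys, such as collection date
and geographic area, so that the other retrievals named in the
recommendations are possible.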

-------
                  1.  NATIONAL SOILS MONITORING PROGRAM
1.1       General Description of the Program

     The National Soils Monitoring Program consists of two networks:  1)
Urban Soils Network and 2) Rural Soils Network.  The Rural Soils Network
(RSN) is a two-phase probability sample.  The first-phase sample was the
1967 Conservation  Needs Inventory  (CNI)  sample.  The  RSN sample  is  a
probability subsample  from  the  ultimate sampling units of the 1967 CNI.
The  area sampled  by  the RSN includes all  of the  conterminous  United
States  except for  areas considered  to be  urban in  character.   These
urban areas are  monitored by the Urban Soils Network, which consists of
a sample of the urban areas.

1.2       The Rural Soils Network (RSN) Survey Design

1.2.1     General Considerations

     The fact that the Rural Soils Network (RSN) is a probability sample
makes possible  valid statistical inferences to  the  population sampled,
namely  all  rural  soils of  the  conterminous United  States.   Moreover,
inferences  are  available  for  all  reasonably  large  geographic  areas
within the United  States,  e.g.,  cropping regions and the larger States.
However, the  decision  not to collect data in  some States restricts the
population for which inferences are valid.

     The operational design of the RSN makes  possible  some interesting
statistical analyses.   Since soil and crop  samples  are obtained simul-
taneously at  harvest time,  the relationship between pesticide levels in
soils and harvested crops can be analyzed.   Also, since each sample site
is  sampled  at four-year  intervals,  trends  in pesticide  residue levels
can be investigated.

1.2.2     The Probability Sample Design

     The Rural  Soils Network  (RSN)  is a probability  sample  of 10-acre
sites from  the population  of all rural land areas  in  the conterminous
United States.  Each 10-acre site is located by a point determined by a
probability subsample  of  the data points of the 1967 Conservation Needs
Inventory (CNI) which  is,  in itself, a probability  sample of all rural
land areas in the  conterminous  United States.   Among the lands included
in  the  CNI  are  the following:  (a) privately owned  land,  both personal
and corporate; (b)  land owned by State and  local  governments; (c) land
owned by the  federal government;  and (d) Indian  land.   Among the areas
excluded are:  Ponds and  lakes  of more than two acres,  all streams, and
urban or built-up areas.

1.2.2.1   The CNI survey

     The 1967 CNI did not, however, map (that is, collect data for) federal
noncroplands.    This portion of  the  CNI  was indefinitely  postponed,
although all federally  owned rural land areas did receive their share of
CNI  primary   sampling  units.  Federally owned cropland  operated  under
lease or permit was, however, mapped by the 1967 CNI.

                                 -1-

-------
     Urban  or built-up  areas excluded  from  the  CNI  have a  specific
definition and not  all  areas inside city and village limits are consid-
ered  urban or  built-up, whereas  some areas  outside city  and village
limits are.  In particular,  urban or built-up areas are defined as areas
of 10 acres  or more,  consisting of  residential  sites,  industrial sites
(except  strip mines,  borrow and  gravel  pits),  railroads,  roadways,
cemeteries,  airports,  golf  courses, shooting  ranges,  institutional and
public administration sites,  and "similar kinds of areas."1  The exclu-
sion  of  urban  or  built-up  areas (of  10 acres or  more)  from  the CNI
resulted in  the exclusion  of some counties that were  strictly metropolitan
in character.

     The CNI  sample sites were selected by the Statistical Laboratories
at Cornell University and Iowa State University.  The sampling sites for
thirteen  States   in the northeastern  United  States  were  selected  at
Cornell.    All other  sampling sites  were  selected  at  Iowa State.   A
deeply stratified sampling design was  used for  the  CNI.   Counties were
treated  as  strata  within all States.   Little  more  is  known  about the
procedure  used at  Cornell,   except that  the standard  sampling  rate was
about 2  percent  and the standard  size  of a primary  sampling unit  (PSU)
was 100 acres.  The stratification used at Iowa State sometimes involved
large  scale  geographic  stratification  between  the  State and  county
levels,  e.g.,  a  sandhills stratum was designated  in Nebraska,  and  in
many States irrigated areas  were treated as a stratum.

     The sampling  procedure   followed  at  Iowa State  can best  be under-
stood by first considering  the procedure most  commonly  employed in the
States of  the  western  United States that are divided into townships.   A
township is  a 6-mile by 6-mile  square of  land  (see  Figure 1.1).  Each
regular  township contains  36 sections,  arranged  in  6  rows  of  6
sections each.  Three geographical strata were formed
from  this  township:  1) the  first stratum was the northern 2  rows;  2)
the  second stratum was the  middle 2 rows; and  3)  the  third stratum
consisted  of  the 2 southernmost rows.  Each stratum  then contained  48
quarter-sections  (160-acre  square  PSU's),  from  which  a  predetermined
number of  PSU's  were  randomly selected.  The standard sampling rate for
the 1967 CNI was the selection of one PSU from each stratum of 48 PSU's.
Thus,  the  standard sampling rate was  approximately 2 percent (1/48).

     Estimates of nearly equal precision were desired for all counties.
The sampling procedure just  described was believed to provide sufficient
precision  for a  county with 384 to  767  square miles of  inventory  area
(size-class 5 in Table 1.1).
Thus, a  sampling rate  of  less  than 2% was used in some  of the larger
counties, and  more  than  2%  in some of the smaller counties.  The sampling
rate was  also  generally increased  in irrigated strata and other
areas of special interest.

     In order  to  increase the sampling rate from 2% to 4%, two quarter-
sections  were  selected  from  each stratum,  rather than  one.  However,  a
1
 Basic Statistics  —  National Inventory of Soil  and  Water Conservation
Needs, 1967.
                                 -2-

-------
     [Figure 1.1 (diagram not reproduced):  a township, 6 mi on a side,
      divided into sections of one sq. mi. (640 acres) each and grouped
      into Stratum 1, Stratum 2, and Stratum 3.  The sampled location is
      0.25 sq. mi., i.e., a quarter-section, i.e., 160 acres.]

                Figure 1.1  Typical Stratification of  a Township

           (Source:  personal communication from  Iowa  State University,
                     Statistical Laboratory).
                                 -3-

-------
decrease in the sampling rate from 2% to 1% was accomplished by changing
the stratum size from 12 sections to 24 sections with one quarter-section
being selected  from  each of the 24-section strata.   Thus, a decrease in
the sampling rate  from  2% was accompanied by an increase in the stratum
size.
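
The arithmetic behind the 2%, 4%, and 1% rates can be sketched as follows.  This is an illustrative calculation only; the function and variable names are hypothetical and do not come from the CNI documentation.

```python
from fractions import Fraction

# A 640-acre section holds four 160-acre quarter-sections (PSUs).
QUARTER_SECTIONS_PER_SECTION = 4


def sampling_rate(sections_per_stratum, psus_selected=1):
    """Fraction of the quarter-section PSUs in one stratum that are selected."""
    psus_in_stratum = sections_per_stratum * QUARTER_SECTIONS_PER_SECTION
    return Fraction(psus_selected, psus_in_stratum)


standard = sampling_rate(12)      # one PSU from a 12-section stratum
doubled = sampling_rate(12, 2)    # two PSUs from a 12-section stratum
halved = sampling_rate(24)        # one PSU from a 24-section stratum

for label, rate in [("standard", standard), ("doubled", doubled), ("halved", halved)]:
    print(f"{label}: {rate} = {float(rate):.2%}")
```

The exact rates are 1/48, 1/24, and 1/96, which the text rounds to 2%, 4%, and 1%.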

     It was also desirable at times to change the size of the CNI sampl-
ing site from the usual 160 acres.  In some large counties in the western
United States with large tracts of relatively homogeneous soil type and
usage, CNI  sample  sites  consisted of one section or 640 acres.  In some
highly developed agricultural areas of special interest, sites consisting
of 40 acres, a sixteenth-section, were sometimes used because of consider-
able heterogeneity between fields.

     The above  considerations  led to the establishment of Table 1.1 for
the determination  of a  standard sampling rate based  upon the inventory
acreage  of a  county and  the size  of sampling unit  to be  used.   The
standard sampling  rates  shown in Table 1.1 were determined so that the
relative precision of county level estimates would be constant, i.e. not
dependent upon  either county or sampling unit size.  This table was not
strictly adhered to, however.

     The sampling procedure just described was used for all State samples
designed at Iowa State.   Township and section boundaries were artificially
imposed  upon  counties that  were not  already  surveyed into  such divi-
sions.  Whenever possible,  township and section boundaries were made to
follow lines of longitude and latitude in the same manner as in section-
ized States.

     Many  counties are  not  regular in shape  so that  there were often
partial townships, strata,  and sections around their borders.  Sections
around such borders were included in the sampling frame only if at least
part of the section was in the county being sampled.  Such sections were
then grouped into  strata for sample selection.  The strata were usually
composed of twelve  sections  each,  just  as  twelve  sections  form  one
stratum  in  the standard  sampling scheme depicted  in Figure  1.1.   Any
sampling units  that  fell outside the  county of  interest  as a result of
this procedure were subsequently ignored.

     For each  sampling  location, i.e. PSU, determined  by the procedure
just described,  the CNI  collected  data at  each of a  series of points
within that PSU.   In order to determine the positions of these sampling
points, an  aerial  photograph of the sampling  location  was  obtained.   A
spinner or template consisting of a grid of small holes was then centered
over the photograph and  spun.   A  deterministic procedure was  used  to
choose a hole  for  the location of the spinner in a fashion that allowed
some variety in  the  choice of the  spinner  location without introducing
personal bias.2   When the  template came  to rest,  the  location of each
hole was marked on  the  photograph.   The  first point in  the upper left
2
 The procedure for  selecting  the spinner hole is  described  in Appendix
#2 of the National Handbook for Updating the Conservation Needs Inventory
(U.S.D.A., Washington, D.C., August 1966).

                                 -4-

-------
      Table 1.1.  Sampling Rates (%) Which Provide Standard Relative
                  Precision of County Level Estimates for 10 Size-classes
                  and 3 Sizes of Unit

                                              Size of unit (PSU)
  County            County size
  size-class       (Square miles)        40 acres    160 acres    640 acres

       1             47 and less            16           32           64
       2             48 -     95             8           16           32
       3             96 -    191             4            8           16
       4            192 -    383             2            4            8
       5            384 -    767             1            2            4
       6            768 -  1,535            1/2           1            2
       7          1,536 -  3,071            1/4          1/2           1
       8          3,072 -  6,143            1/8          1/4          1/2
       9          6,144 - 12,287            1/16         1/8          1/4
      10         12,288 and over            1/32         1/16         1/8

*Source:  Taylor, Howard L.  Statistical Sampling for Soil Mapping Surveys,
June 1962, courtesy of the Iowa State University Statistical Laboratory.
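
The pattern in Table 1.1 (the rate halves with each larger county size-class and doubles with each fourfold increase in PSU size) can be expressed as a small lookup.  This is a sketch for illustration; the names are hypothetical, and the table itself, not this code, is the authority.

```python
from bisect import bisect_right
from fractions import Fraction

# Lower bounds (square miles) of county size-classes 2 through 10 in Table 1.1.
CLASS_LOWER_BOUNDS = [48, 96, 192, 384, 768, 1536, 3072, 6144, 12288]

# Sampling rate (%) for size-class 1, by PSU size in acres.
# Each successive size-class halves the rate.
CLASS_1_RATE = {40: Fraction(16), 160: Fraction(32), 640: Fraction(64)}


def size_class(square_miles):
    """County size-class (1 to 10) for a county of the given area."""
    return bisect_right(CLASS_LOWER_BOUNDS, square_miles) + 1


def standard_rate_percent(square_miles, psu_acres):
    """Standard sampling rate (%) from Table 1.1."""
    return CLASS_1_RATE[psu_acres] / 2 ** (size_class(square_miles) - 1)


print(standard_rate_percent(500, 160))    # size-class 5, 160-acre PSU
print(standard_rate_percent(20000, 640))  # size-class 10, 640-acre PSU
```

As the text notes, the table was not strictly adhered to, so the rate actually used in a county may differ from the tabulated value.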
                                 -5-

-------
corner of  the sample site was  point number one.  The points  were  then
numbered consecutively along a line proceeding from left  to right and/or
up.  The consecutive  numbering  of the sampling points then continued in
the same manner with the line of points just below the first line.  This
procedure continued until all points in the sample area had been numbered
as illustrated in Figure 2. These points constitute an aligned two-dimen-
sional systematic  sample within each selected PSU.3   Such an alignment
of  points  in  a strictly  North-South  and East-West  manner  should  be
avoided because  of the  tendency to develop land  use  in  such a pattern;
the spinning of the template alleviates this.

     Various  sampling  templates  were  prepared  so  that  template  and
aerial photograph  scales  could  be matched to obtain a constant sampling
density.   It was most convenient to assign the sampling  points in local
USDA  offices,  since local Soil Conservation Service  offices generally
had  the  needed aerial  photographs  in their  files.   However,  the local
USDA personnel  did not  always follow the sampling protocol specified by
the design.  For instance,  it appears that the  templates were not spun
for  Nevada,  and  the template  was  often not  properly  matched  to  the
photograph scale in New Mexico.

     Exhibit  1   is  a photocopy  of  an  aerial  photograph of  a specific
160-acre CNI sample site with 34 consecutively numbered sampling points.
The  point  density of the  template  used for this  site was the standard
point  density  intended for  all sites, except  for the  640-acre sites.
The point density of the templates used for 640-acre sites was one-fourth
that of the other sites, since 640-acre sites were used only in homogen-
eous  land  areas.   Thus,  160-acre  and  640-acre sites usually received
from 34 to 39 sampling points and 40-acre sites usually received between
9 and  11 points.

     Exhibit 2 is a photocopy of the data collection form used to record
the data for  the 34 sampling points shown in Exhibit 1.   The data items
that  were  used  in  determining  the Rural  Soils  Network  (RSN) subsample
were the Field  Mapping  Symbols  and the Land  Use Codes.   In particular,
this  information was used to classify each sampling point as either a
cropland point  or  a noncropland point as shown in Table  1.2.  It should
be noted that sampling points that inadvertently fell into areas outside
the  target population,  i.e.,  urban  areas,  water  areas,  and  federal
noncropland, were classified as noncropland points.

     The counts  of cropland  and noncropland points were accumulated as
shown  in Exhibit 3 for the purpose  of  selecting the  RSN subsample from
the  CNI  sample.   The data for  the  CNI  sites  shown in Exhibits  1 and 2
appear on the fourth line in Exhibit 3.  In particular, Exhibits 1 and 2
are  for  State  16, Kansas;  County  66, Nemaha;  site  number  5-2-2R.   A
total of 34 points were sampled at this site and 19 of these points were
designated as cropland  points.   Thus, the proportion of cropland points
at  this  site  was 19/34 = 0.55882.  However, the sampling rate in Nemaha
County was 2.257%;  i.e., the ratio of sampled acreage to  total inventory
acreage  in  Nemaha  County  was about 0.02257.   Thus,  in  order to adjust
the  cropland  proportion to  a standard 2%  sampling rate, the "cropland
ratio" was computed as
3
 See, for example, Cochran, W. G. [1977, pg 228].  Sampling Techniques.
Wiley, New York.
                                 -6-

-------
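     [Figure 2 (diagram not reproduced).  Point-number labels recovered
      from the figure:  1, 7, 13, 19, 25, 31  and
      1, 2, 4, 7, 11, 16, 21, 27, 32, 36, 38.   + is center point.]

              Figure 2:   Sample Points  on a 160-acre  Sample Area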

Note:  The numbers above are the point  numbers  for  the first  points on  each  line.

Source:  Appendix #2 of the National Handbook for Updating the Conservation  Needs
         Inventory (U.S.D.A.,  Washington, D.C., August 1966).
                                           -7-

-------
             Table 1.2.  Dichotomization of the Land Use Code

                           CROPLAND CATEGORIES

     Land Use Codes
Nonirrigated   Irrigated

     L10          L11     Corn and sorghums
     L20          L21     All other row crops
     L30          L31     Close grown field crops
     L40                  Cultivated summer fallow
     L50          L51     Rotation hay and pasture
     L60          L61     Hayland
     L90          L91     Orchards, vineyards, and bush fruits

                         NONCROPLAND CATEGORIES

     Land Use Codes
Nonirrigated   Irrigated

     L70                  Conservation use only
     L80                  Temporarily idle cropland
     L00                  Open land formerly used for crops
     P10          P11     Pasture
     P20                  Range
     F10                  Commercial forest
     F20                  Noncommercial forest
     H10                  Other land in farms
     H20                  Other land not in farms

     Field Mapping Symbol

          UB              Urban or built-up area
          FED             Federal noncropland
          W1              Water area of more than 40 acres
          W2              Water area of 40 acres or less
          W3              Intermittent water area
 Source:  Memorandum entitled "Soil Monitoring Program—Sampling Design"
from Leo G.K. Iverson to USDA PPC Inspectors.
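
The dichotomization in Table 1.2 amounts to a lookup on the Land Use Code and the Field Mapping Symbol.  The sketch below is illustrative only.  Several codes are partly illegible in the source copy, so the readings L31, L51, L80, and L00 are assumptions, as is the rule that a noncropland Field Mapping Symbol takes precedence over the Land Use Code.  The function name is hypothetical.

```python
# Land Use Codes treated as cropland (nonirrigated and irrigated variants).
CROPLAND_CODES = {
    "L10", "L11", "L20", "L21", "L30", "L31", "L40",
    "L50", "L51", "L60", "L61", "L90", "L91",
}

# Land Use Codes treated as noncropland.
NONCROPLAND_CODES = {
    "L70", "L80", "L00", "P10", "P11", "P20", "F10", "F20", "H10", "H20",
}

# Field Mapping Symbols for areas outside the target population;
# points in these areas were classified as noncropland.
NONCROPLAND_SYMBOLS = {"UB", "FED", "W1", "W2", "W3"}


def classify_point(land_use_code=None, field_mapping_symbol=None):
    """Return 'cropland' or 'noncropland' for one CNI sampling point."""
    # Assumption: the mapping symbol, when present, overrides the land use code.
    if field_mapping_symbol in NONCROPLAND_SYMBOLS:
        return "noncropland"
    if land_use_code in CROPLAND_CODES:
        return "cropland"
    if land_use_code in NONCROPLAND_CODES:
        return "noncropland"
    raise ValueError(f"unrecognized code: {land_use_code!r}")


print(classify_point("L10"))                              # cropland
print(classify_point("P10"))                              # noncropland
print(classify_point(field_mapping_symbol="FED"))         # noncropland
```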
                                 -8-

-------
     [Aerial photograph not reproduced.  Legible labels:  "CONS. NEEDS
      SAMPLE" and site number 5-2-2R.]

                  Exhibit 1:  Aerial Photograph of a CNI Sample Location
*Source:  Sampling files maintained by the EPA Field Studies Branch, Washington,
                               -9-

-------
     [Data collection form SCS-263 (Soil Conservation Service) not
      reproduced; the entries on the extracted copy are illegible.]

                  Exhibit  2:   CNI Data Collection Sheet

    *Source: Sampling files  maintained by the EPA Field  Studies Branch, Washington,  D.C.

-------
                         Exhibit 3:
Accumulation Used in Selecting the Pesticide Residue Network Subsample

     [Computer listing not reproduced; the extracted columns are not
      recoverable.  As described in the text, each line of the listing
      gives the State, county, and CNI site number, the total number of
      sampling points, the number of cropland points, the cropland ratio,
      and the running cropland accumulation, together with the
      corresponding noncropland ratio and accumulation.  The fourth line
      is for State 16, County 66, site 5-2-2R:  34 points, 19 cropland
      points, cropland ratio .49519, cropland accumulation 1989.48814.]

Source:  Sampling files maintained by the EPA Field Studies Branch,
Washington, DC

-------
               19        2
               --   x  -----
               34      2.257
This cropland ratio  of 0.49519 was then added to the cropland accumula-
tion, which was the sum of the cropland ratios for all previously listed
sites in the State.
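
The cropland ratio for site 5-2-2R and its contribution to the running accumulation can be checked with a short calculation.  This is a sketch using the figures quoted in the text; the function and variable names are hypothetical.

```python
STANDARD_RATE_PERCENT = 2.0


def adjusted_ratio(class_points, total_points, county_rate_percent):
    """Proportion of points in a land class, scaled to the standard 2% rate."""
    proportion = class_points / total_points
    return proportion * (STANDARD_RATE_PERCENT / county_rate_percent)


# Site 5-2-2R, Nemaha County, Kansas: 19 cropland points out of 34,
# in a county sampled at 2.257%.
cropland_ratio = round(adjusted_ratio(19, 34, 2.257), 5)
print(cropland_ratio)

# Accumulation through the preceding site (5-2-1R), as quoted in the text.
previous_accumulation = 1988.99295
new_accumulation = round(previous_accumulation + cropland_ratio, 5)
print(new_accumulation)
```

The two printed values correspond to the cropland ratio (0.49519) and the cropland accumulation for site 5-2-2R (1989.48814) cited in the text.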

     The  procedure  used  to  obtain  the  noncropland accumulation  was
identical  to  that just described  for the  cropland  accumulation.   How-
ever, it was considered desirable to include federal noncroplands in the
RSN  noncropland  sample.   Although  federal noncroplands  had not  been
mapped by the CNI, the CNI sampling procedure did assign PSU's in federal
noncropland areas.   That  is,  the CNI sample sites were selected without
regard to federal land status.  However, whenever a CNI sample site fell
entirely in a federal  noncropland  area, no CNI sampling  points  were assigned to
the  site.   In  order  to  obtain coverage  of these  federal noncropland
sites by the  RSN noncropland sample, the sampling staff obtained a list
of the  CNI sample  sites  in  federal  noncropland areas  for each State.
The sampling  staff  then inserted a "dummy" CNI record into the listings
of the  type  shown in Exhibit 3  for  each CNI site that fell entirely in
federal noncropland.  Each dummy record showed zero cropland points, and
a  total  number of points appropriate  for the size  of  the sample  site,
e.g., 36 points for a 160-acre PSU.

     The  grand  total  of  the  cropland  accumulation  from  Kansas  was
3426.67927, and  for  noncropland it was 3131.41689.   The  total of  these
accumulations, 6558.09616, was employed for estimation of the proportion
of  cropland  and noncropland acreage  in Kansas.   In particular,  the
estimate of the proportion of cropland acreage in Kansas was

               3426.67927
               ----------     =     52.25112878%.
               6558.09616

This procedure provides  a  direct estimate of the proportion of cropland
acreage in  the State.   This estimated proportion of cropland was multi-
plied by an estimate of the total land area in Kansas, namely 52,510,720
acres, to yield an estimated cropland acreage in Kansas of

     (.5225112878) (52,510,720)    =    27,437,444 acres.
This same procedure was used for all States.
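
The Kansas estimate can be reproduced directly from the two accumulation totals.  This is an illustrative check of the arithmetic quoted above; the variable names are hypothetical.

```python
# Grand totals of the Kansas accumulations, as quoted in the text.
cropland_accumulation = 3426.67927
noncropland_accumulation = 3131.41689

# Estimate of total land area in Kansas (acres), as quoted in the text.
kansas_land_acres = 52_510_720

total_accumulation = cropland_accumulation + noncropland_accumulation
cropland_proportion = cropland_accumulation / total_accumulation
cropland_acres = cropland_proportion * kansas_land_acres

print(f"total accumulation:  {total_accumulation:.5f}")
print(f"cropland proportion: {cropland_proportion:.10f}")
print(f"cropland acreage:    {cropland_acres:,.0f} acres")
```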

1.2.2.2  The RSN Survey

     The Rural Soils  Network (RSN) selected two subsamples from the CNI
sample  sites,  a  cropland  sample and a noncropland  sample.   The sample
design  of  the  RSN specified  that the  subsamples would  contain  0.025
percent  of  the cropland  acreage and 0.0025 percent  of  the noncropland
acreage in each State.  Thus the cropland sample in Kansas was to consist
of
                                 -12-

-------
     (0.00025) (27,437,444)   =    6,859.36 acres.

Each RSN  sample  site was to be  a  10-acre plot with an  equal  number of
plots  sampled in  each  of  four years.   Thus,  the number  of cropland
sample sites  to  be selected in Kansas in each of four years of sampling
was

                      6,859.36 acres
                 -------------------------    =    171 sites/year.
                 (10 acres/site) (4 years)
The number  of  10-acre noncropland sites to be sampled in each State was
determined  in  exactly the same manner.  Each RSN site was to be sampled
a second  time  four years after the  initial  sampling to determine rates
of change in pesticide residues.   Implementation of this design for all
States resulted in the sample sizes shown in Table 1.3.  This sample was
expected  to yield  reasonably precise estimates for cropping regions and
some of the larger States.
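
The sample size calculation can be sketched as follows.  The quotient for Kansas cropland is about 171.48; rounding to the nearest whole site is an assumption that reproduces the 171 sites per year quoted above, and the report does not state the rounding rule.  The function name is hypothetical.

```python
def sites_per_year(class_acreage, sampling_fraction, site_acres=10, years=4):
    """Number of sites of one land class to be sampled in each year."""
    sampled_acres = class_acreage * sampling_fraction
    # Assumption: round to the nearest whole number of sites.
    return round(sampled_acres / (site_acres * years))


# Kansas cropland: 0.025 percent of an estimated 27,437,444 acres.
kansas_cropland_sites = sites_per_year(27_437_444, 0.00025)
print(kansas_cropland_sites, "sites/year")
print(4 * kansas_cropland_sites, "sites over four years")
```

The four-year total of 684 sites agrees with the cropland entry for Kansas in Table 1.3.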

     Having determined  the  number of RSN cropland sites  to be selected
in a  State,  a  systematic subsample of  CNI  cropland  points was selected
from  the  cropland  accumulation for the State.  Each CNI  cropland point
selected was used  to locate a 10-acre  RSN  cropland  sample site.  It is
easiest to explain this procedure by example.  The total of the cropland
accumulation for Kansas  was 3426.67927, and  171  cropland  sites were to
be surveyed in each of 4 years.  Thus, the starting point for the sample
in Kansas was a random number between zero and

          3426.67927
          ----------     =     5.00976
           (4) (171)


The random  number  chosen was 0.27889, which determined the selection of
the  first RSN  cropland  site.  All  RSN  cropland sites in  Kansas  then
resulted from a sequence number of the form

          0.27889 + k (5.00976) for k = 0,1,2,..., [(4)(171)-1 = 683]

The RSN  cropland  site in Kansas that  was  considered previously in this
discussion resulted from the sequence number

          0.27889 + (397) (5.00976)     =    1989.15361,

as seen on the first line of Exhibit 4.
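
The systematic selection can be sketched as a list of equally spaced sequence numbers.  The code below is illustrative; the names are hypothetical, and truncating the sampling interval to five decimal places is an assumption inferred from the printed value 5.00976 (the unrounded quotient is about 5.009765).

```python
import math


def truncate(value, digits=5):
    """Drop all decimal places beyond `digits` (no rounding up)."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


def sequence_numbers(total_accumulation, n_sites, random_start):
    """Systematic sample: a random start followed by equal steps."""
    interval = truncate(total_accumulation / n_sites)
    return [round(random_start + k * interval, 5) for k in range(n_sites)]


# Kansas cropland: 171 sites in each of 4 years, random start 0.27889.
kansas = sequence_numbers(3426.67927, 4 * 171, 0.27889)

print(len(kansas))     # number of sequence numbers, k = 0 ... 683
print(kansas[397])     # sequence number for k = 397
```

Each sequence number selects the CNI site whose cropland accumulation is the first to reach or exceed it.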

     The  sequence  number 1989.15361  not  only determined  that CNI  site
5-2-2R  of Nemaha  County,  Kansas  was  to be  included in  the cropland
sample  of the RSN;  it also  specified that a particular  point at  this
site was  to be used to locate the  10-acre  RSN site.  Hence, one of the
19 cropland points at this site was  determined by interpolation.  From
Exhibit  3,  the  following  cropland  accumulations   were  obtained  for
interpolation:

     State     County    CNI Site       Cropland Accumulation
      16        66       5-2-1R              1988.99295
      16        66       5-2-2R              1989.48814
                                 -13-

-------
           Table 1.3:   Design Sample Sizes for the Rural Soils Network

Census Division                          Component
   State                      Cropland     Noncropland       Total

New England                       80            96            176
   Maine                          32            48             80
   New Hampshire                   8            12             20
   Vermont                        20            12             32
   Massachusetts                   8            12             20
   Rhode Island                    4             4              8
   Connecticut                     8             8             16
Middle Atlantic                  320           128            448
   New York                      152            60            212
   New Jersey                     20            12             32
   Pennsylvania                  148            56            204
East-North Central              1648           224           1872
   Ohio                          276            36            312
   Indiana                       312            28            340
   Illinois                      568            32            600
   Michigan                      220            68            288
   Wisconsin                     272            60            332
Pacific                          600           456           1056
   Washington                    180            92            272
   Oregon                        152           140            292
   California                    268           224            492
West-North Central              3596           456           4052
   Minnesota                     488            80            568
   Iowa                          608            28            636
   Missouri                      328            76            404
   N. Dakota                     636            48            684
   S. Dakota                     424            80            504
   Nebraska                      428            80            508
   Kansas                        684            64            748

*Source:
Wiersma, G.B., Sand, P.F., and Cox,  E.L.  (1971).   A Sampling Design to
Determine Pesticide Residue Levels in Soils of the Conterminous United
States.  Pesticides Monitoring Journal 5(1), pp.  63-66.

                                  -14-

-------
       Table 1.3:   Design Sample Sizes  for the  Rural Soils  Network
                                 (continued)

Census Division                          Component
   State                      Cropland     Noncropland       Total

South Atlantic                   556           376            932
   Delaware                       12             4             16
   Maryland                       52            12             64
   Virginia                       84            56            140
   W. Virginia                    24            36             60
   N. Carolina                   124            68            192
   S. Carolina                    68            40            108
   Georgia                       120            80            200
   Florida                        72            80            152
East-South Central               452           244            696
   Kentucky                      124            52            176
   Tennessee                     112            56            168
   Alabama                        92            72            164
   Mississippi                   124            64            188
West-South Central              1300           552           1852
   Arkansas                      188            64            252
   Louisiana                     108            60            168
   Oklahoma                      260            84            344
   Texas                         744           344           1088
Mountain                         916          1280           2196
   Montana                       340           200            540
   Idaho                         132           120            252
   Wyoming                        68           148            216
   Colorado                      240           140            380
   New Mexico                     40           192            232
   Arizona                        36           176            212
   Utah                           48           128            176
   Nevada                         12           176            188

Grand Total                     9468          3812          13280
                                        -15-

-------
                                          Exhibit 4:  KFRN Cropland Sampling
     
-------
The interpolation proceeded as follows:
     1989.15361 - 1988.99295
     -----------------------  (19)   =    6.16 -> 7.
     1989.48814 - 1988.99295
The  interpolation  figure was rounded  up  since an integer from  1  to 19
was  required.   In  this case,  the  seventh cropland  point at CNI  site
5-2-2R was to be used to locate the RSN cropland site, as is also speci-
fied in Exhibit 4.
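
The interpolation step can be sketched as below.  The function name is hypothetical, and the guard that returns at least point 1 is an added assumption for a sequence number that coincides exactly with the lower accumulation.

```python
import math


def interpolate_point(sequence_number, lower_accumulation,
                      upper_accumulation, n_class_points):
    """Which point (1..n_class_points) of the selected site locates the RSN site.

    lower_accumulation: accumulation through the preceding site in the listing
    upper_accumulation: accumulation through the selected site
    """
    fraction = ((sequence_number - lower_accumulation)
                / (upper_accumulation - lower_accumulation))
    # Round up so that the result is an integer from 1 to n_class_points.
    return max(1, math.ceil(fraction * n_class_points))


# Site 5-2-2R, Nemaha County, Kansas (19 cropland points).
point = interpolate_point(1989.15361, 1988.99295, 1989.48814, 19)
print(point)
```

With the Kansas figures the unrounded value is about 6.16, so the seventh cropland point is selected.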

     Once a defining point for an RSN site had been selected, an adjacent
second cropland (or noncropland) point was required in order to completely
determine the  location  of  the 10-acre RSN site.  If X is used to denote
the  defining  cropland point  selected  from the CNI  sample,  an adjacent
cropland point was  to be determined by considering the other CNI sample
points in the order indicated below :
                                   4
                         3         X         1
                                   2
If an acceptable second cropland point could not be located as indicated,
then the next  cropland  point in the listing  was  taken as a first point
and  the  routine  repeated.4   This procedure was implemented  in the USDA
offices  prior  to  field work,  and some  discretion  was  allowed.   The
intention was  clearly that  an RSN cropland site should not be placed at
an isolated cropland point.
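
The search for an adjacent second point can be sketched as follows.  This is a simplified illustration under several assumptions:  the sampling points are treated as a rectangular grid, the diagram is read as 1 = right, 2 = below, 3 = left, 4 = above, and a neighbor is taken to be "acceptable" when it has the same land class as the defining point.  All names are hypothetical.

```python
# (row offset, column offset) in the order 1, 2, 3, 4 of the diagram:
# right, below, left, above.  The orientation is an assumption.
NEIGHBOR_ORDER = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def find_second_point(grid, row, col):
    """Return (row, col) of the first adjacent point of the same land class.

    grid is a list of rows; each entry is "C" (cropland) or "N" (noncropland).
    Returns None if the defining point is isolated, in which case the text
    says the next point in the listing is taken as a new first point.
    """
    target = grid[row][col]
    for d_row, d_col in NEIGHBOR_ORDER:
        r, c = row + d_row, col + d_col
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[r])
        if inside and grid[r][c] == target:
            return (r, c)
    return None


example = [
    ["N", "C", "N"],
    ["C", "C", "N"],
    ["N", "C", "N"],
]
print(find_second_point(example, 1, 1))   # neighbor 1 is noncropland; neighbor 2 is used
```

In practice some discretion was allowed in the USDA offices, so this rule describes the intended order of consideration rather than every selection actually made.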

     After two  points had  been selected, a designation was  made  on an
aerial  photograph  or  other  map  of a 10-acre  site with  these points
centrally located.   Attention was given to making the boundaries conform
with natural physical features  as much as  possible.   Implementation of
this procedure can  be  illustrated  by Exhibits  1  and 2.   The design
specified that the  seventh  cropland point was  to be used to locate the
RSN  site.   From Exhibit 2,  it  can be seen  that the  seventh cropland
point is the eighth CNI sample point.   In Exhibit 1, it can be seen that
the  depicted RSN site was,  indeed, centered  about  the  eighth and ninth
CNI sample points,  both cropland points.

     The  field person  was  permitted  to adjust  the boundaries of the
designated 10-acre RSN  site and was expected to prepare records so that
the  site  could be  readily  relocated  for subsequent  sampling at 4-year
intervals.  The final  sample  location  was to be  not less than 8 acres.
If the  designated  site  should  prove  to  be totally  unacceptable,5 the
field  person  was  permitted  the  following  alternatives  in order  of
preference:

     1)   Try to find  10 acres  within the CNI site that are acceptable.

     2)   Try  to  find 10 acres within one-fourth mile  of the CNI site
          that are acceptable.
4
 Memorandum entitled  "Soil Monitoring Program —  sampling  design" from
Leo G.K. Iverson to USDA PPC Inspectors.
5
 The authors  were not  able  to find an explicit definition of "totally
unacceptable."
                                 -17-

-------
     3)   Try to locate two smaller sites within the CNI site that equal
          10 or  nearly  10 acres.   Sample as if they were a single site.

     4)   Request  the USDA  staff at Hyattsville  to re-select  the  CNI
          site.6
     Substitute  CNI sites  were  selected  in  a  number  of cases.   The
substitutes  were chosen  from  within  the same  county as  the  original
site.  An  effort was  made to choose a substitute CNI site with approxi-
mately the  same  proportion of cropland points as the original CNI site.
However, since  a random  sequence number was not used  to determine the
substitute  site,  it was necessary to randomly designate  a point within
the substitute CNI site to locate the 10-acre RSN site.  It is not clear
that this randomization was always performed.

     There  are  several reasons why substitute CNI  sites  were sometimes
required.  Re-selections were performed by the USDA staff at Hyattsville
before  the  sample  went  to  the  field  when  the  selected CNI  site was
already in use by the USDA.  For example, the selected CNI sites were
occasionally found to be in use by

     a)   the Soil Conservation Service for their crop estimates,
     b)   the Economic  Research Service  for  their Pesticide Use Survey,
     c)   the  June  Enumerative  Survey  of   the  Statistical  Reporting
          Service.

Re-selections  were  sometimes  necessary after  the  sample went  to the
field because  the  land  owner refused to  cooperate.   Some re-selection
was  necessary  because of  a  change of land  use  status.  Unfortunately,
substitute  sites are   not  designated  as such on the  computer  records.
This  is  especially problematic  if  a  substitute  was  selected  in the
second  round of data  collection.   First round  and second  round data
cannot be  compared  directly  for  a  site if a substitute  has  been used.

1.2.3     Limitations  as a Monitoring Network

     The Rural  Soils  Network (RSN) design specified  that 0.025 percent
of  the  cropland acreage  and  0.0025 percent  of the noncropland acreage
were to be sampled in each State.  This criterion resulted in sample
sizes that  vary considerably  from  one State to another.   Rhode Island
received the fewest sampling  units,  four each of cropland and  noncrop-
land.  Texas received  the most, 744 cropland sites  and 344 noncropland
sites.   Thus,  reliable  estimates of average  pesticide levels  are not
available for some geographic areas.   This is a minor limitation because
estimates  are  not generally  required  for small geographic  areas.  The
deletion of some States  when the design was  implemented restricts the
population to which inferences are valid, however.
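
     The arithmetic implied by these sampling rates can be sketched as
follows.  The sketch assumes that the sampled acreage is taken up in
10-acre sites, so that the number of sites is the sampled acreage divided
by 10.  The acreage figures are hypothetical and are not taken from the
RSN records.

```python
SITE_ACRES = 10.0                 # size of one RSN site (from the text)
CROPLAND_RATE = 0.025 / 100       # 0.025 percent, as a fraction
NONCROPLAND_RATE = 0.0025 / 100   # 0.0025 percent, as a fraction


def sites_for_state(cropland_acres, noncropland_acres):
    """Return (cropland sites, noncropland sites) for one State.

    Assumption: sampled acreage is converted to a site count by
    dividing by the 10-acre site size and rounding to the nearest site.
    """
    crop = round(cropland_acres * CROPLAND_RATE / SITE_ACRES)
    noncrop = round(noncropland_acres * NONCROPLAND_RATE / SITE_ACRES)
    return crop, noncrop


# Hypothetical acreages, for illustration only.
small_state = sites_for_state(160_000, 1_600_000)
large_state = sites_for_state(30_000_000, 140_000_000)
```

Under these assumed acreages the small State receives only a handful of
sites in each stratum, which illustrates why State-level estimates can be
unreliable under a fixed sampling rate.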
6
 Shepherd, D.R.  PPC Division Memorandum 804.3 concerning "Guidelines
for collecting samples for the National Soil Monitoring Program—1969."
                                 -18-

-------
     More significantly, the following factors must be noted:

     (1)  The current design was found to be too expensive to operate.

     (2)  The network as it stands was not designed to monitor
          non-pesticide toxic materials, and hence may be inadequate,
          particularly for non-agricultural areas and localized
          contaminants.

     (3)  The stratification is now 15 years out of date, which means
          losses in efficiency.

     (4)  The two-phase design renders estimating precision difficult.

1.2.4     Uses in Regulatory Action

     The Rural Soils  Network (RSN)  could be used to identify pesticides
and other widely  dispersed toxic  substances for which regulatory action
is desirable.  Each  sample site of the RSN was to be sampled every four
years.  Thus, significant increases in average levels of specific
substances could potentially be discovered.  Moreover, since residue
levels were determined for both soils and crops,  the relationship between
soil  and crop  residue   levels  could  be used  to  identify  potentially
dangerous levels of  soil residue.   For example,  if a pesticide level in
corn that is  dangerous  for humans has been identified, the relationship
between  soil  and  corn concentration of  that pesticide  could be  used to
determine a  corresponding  dangerous  level  of  the pesticide in soil.

     The RSN  could also  be used to monitor the effects of regulation of
specific toxic substances.   Because each RSN site is sampled every four
years, the network could monitor the effect of the regulation on levels
of the toxic substances in soils and crops.

     The RSN may be of limited use, however, in identifying specific
violators of regulations.  This situation results from the very
design of the RSN.  The RSN consists of sites selected by a random
process at a given sampling rate, with the locations of specific sites
kept confidential to protect the farm operator.  Specific localities of
interest may  not enter  the  RSN sample,  but the  design framework could
serve as the basis for special studies in suspected "hot spots."

1.2.5     User Needs  and Historical Uses  of the Data

     The historical objectives  of  the Rural Soils Network (RSN)  were as
follows:7

     (1)  Determine  levels of  pesticides and  other pollutants  in  the
          agricultural environment.

     (2)  Observe trends in pollutant levels through time.

     (3)  Determine the degree to which crops are contaminated.

     (4)  Determine  the  levels  of  various  pollutants  in agricultural
          waters.
7
 Shepherd, D. R.   PPC Division Memorandum  804.3  concerning "Guidelines
for collecting samples  for  the National Soil Monitoring Program—1969."
                                 -19-

-------
     (5)  Determine the  concentration of certain pollutants  at various
          depths in the soil profile.

     (6)  Review  program  findings  with  recommendation of  appropriate
          actions in mind.

The six objectives listed above comprise the major historical user needs
for the RSN  data.   The regulatory uses considered  in  section 1.2.4 are
included in objective (6) above.

     The implementation  of the  RSN  allows only partial  fulfillment of
the six objectives listed above.  It appears that objective (5) has been
abandoned  since soil  data has  been collected only  for  the  top  three
inches  of  soil.   Objective  (4)  has only  been partially addressed by
sampling pond water  and  sediment  during  a  single fiscal year.   Most
States have  follow-up  data with which to address objective (2) for only
one-fourth of the cropland sites and none of the noncropland sites.

1.3.     Alternate Survey Designs for the RSN

     The Rural  Soils Network  (RSN)  is a probability sample of the rural
areas of the conterminous United States.  A probability sample is essen-
tial as an  objective  basis for making inferences.   The RSN is, however,
a subsample  of  the 1967  Conservation Needs Inventory  (CNI).   It relies
upon  the  CNI  to  identify  the cropland  and  noncropland  strata,  as in
double  sampling schemes.  As  the  1967 CNI became  outdated,  sites were
found in the field to no longer belong to the intended stratum, cropland
or noncropland.   It has been the practice for the field personnel of the
RSN to use substitute sites in these cases.  The use of substitute sites
tends to destroy the probabilistic nature of the sample and is not
generally recommended, however.8  Resumption of RSN data collection is
likely to result in many sites being misclassified.

     Thus,  sampling  considerations  alone  suggest that  a new  RSN sample
is needed.   In addition,  a new sampling design should address the problem
of  monitoring toxic  substances other  than agricultural  chemicals  and
should attempt to reduce the cost of the monitoring network.   The expense
of the RSN led to purposive deletion of entire States in the  past, which
restricts the population to which valid inferences can be  made.  Various
alternative designs will now be considered.

1.3.1  Design Option One

     A minimal  change  alternative would be to subsample the  current RSN
on a probability basis.  This option mainly addresses the problem of the
cost of the RSN; however, it does also address the need for regional and
national estimates.  Any need to eliminate reliance upon the 1967 CNI is
not addressed.

     This option does have some advantages, however.  Its main advantage
is that it  can  be implemented quickly  and  easily,  possibly  while other
alternatives  are  under development.  Another advantage is  that direct
comparison could be made to the data collected from 1968 to 1975.
8
 See, e.g.,  page  386 of Kish, Leslie  (1965).   Survey Sampling.   Wiley.
                                 -20-

-------
     Careful  treatment  of the  sites  found to  no longer belong  to the
intended stratum would be necessary.  There are at least three ways that
these  sites  could be handled.   One possibility would be to  drop these
sample sites entirely.  There would be a loss in precision for estimates,
and  the  sampling weights would  have  to be adjusted to  reduce  the bias
that would result from deletion of these sites.  Alternatively,  substitute
sites could be selected, as has been done historically with this sample.
However,  the use  of substitute  sites  introduces  bias that cannot  be
measured or adjusted.  Finally, sites can be retained as selected.  This
keeps  the  initial  weight correct and provides unbiased estimates at the
cost of a decrease in precision.  The computerized data records would
need to indicate the resolution of each of these cases: whether the site
was dropped, replaced by a substitute, or retained in its original
stratum.  If as many as 10 percent of the sample sites require
either deletion or substitution, this design option may not be reasonably
efficient.

     Data  analysis  problems would  be  aggravated  by  subsampling  the
present RSN.  The deep stratification of the 1967 CNI results in strati-
fication benefits for sample variances for the RSN.  However, the sparse-
ness of the RSN sample in comparison to the CNI sample makes recovery of
the  stratification  effects  difficult   (See  section  1.7).  The  major
problem  is  that  many  counties  have  no more  than  one RSN  site.   The
magnitude of this problem would necessarily increase with a subsample of
the current RSN.

     Thus, replicate  subsamples  are recommended if this option is to be
implemented, even  if it is only  on a  temporary basis.   For example,  if
50 percent of the RSN  sites are to be surveyed,  five subsamples that
each  comprise a  10   percent subsample  could  be  used.   At  least five
replicate  subsamples should  be  selected.   A defensible procedure for
selecting the replicate subsamples would be to first order the RSN sites
by  States  and  CNI  strata  within  States,  then  independent  systematic
subsamples could be selected.  This procedure would insure representation
of all states  and as much CNI stratification as possible in each of the
replicate subsamples (or technically 'pseudo-replicate' subsamples).
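
     The selection procedure just described can be sketched as follows.
This is a minimal illustration, not the procedure actually used for the
RSN.  The site list is hypothetical, and the sketch assumes that the
random starts are drawn without replacement so that the five 10 percent
replicates do not overlap and together cover 50 percent of the sites.

```python
import random


def replicate_subsamples(sites, n_replicates=5, interval=10, seed=None):
    """Split an ordered site list into systematic replicate subsamples.

    `sites` is a list of (state, stratum, site_id) tuples.  Each
    replicate takes every `interval`-th site after sorting by State and
    stratum.  Assumption: the random starts are drawn without
    replacement so that the replicates do not overlap.
    """
    rng = random.Random(seed)
    ordered = sorted(sites)          # State, then stratum, then site id
    starts = rng.sample(range(interval), n_replicates)
    return [ordered[start::interval] for start in starts]


# Hypothetical frame: 3 States x 2 strata x 50 sites = 300 sites.
frame = [(state, stratum, i)
         for state in ("NC", "TX", "VA")
         for stratum in ("cropland", "noncropland")
         for i in range(50)]
replicates = replicate_subsamples(frame, seed=1)
```

Because the list is sorted before the systematic draw, every replicate in
this example contains sites from each State and each stratum.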

     The use of replicate subsamples would make it possible to estimate
sample variances easily by using the theory of replicate subsamples.9
The results of interest  would initially be tabulated separately for each
independent subsample.  The variance of these results treated as indepen-
dent measurements  provides a simple, unbiased  estimate.  The resulting
variance estimate  captures  all design  effects,  although stratification
effects and design effects are not separately estimable.  This is not of
major  consequence  for the present RSN sample, since only  one  stage  of
sampling is employed within CNI strata.
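
     A minimal sketch of this variance calculation is given below.  It
assumes that the overall estimate is the simple mean of the k replicate
estimates, so that its variance is estimated by the sample variance of
the replicate estimates divided by k.  The replicate values are
hypothetical.

```python
from statistics import mean, variance


def replicate_variance(estimates):
    """Return (overall estimate, estimated variance of that estimate).

    The overall estimate is the mean of the k replicate estimates; its
    variance is estimated by s^2 / k, where s^2 is the sample variance
    of the replicate estimates (divisor k - 1).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    return mean(estimates), variance(estimates) / k


# Hypothetical replicate estimates of a mean residue level (ppm).
overall, var_hat = replicate_variance([0.12, 0.15, 0.10, 0.14, 0.09])
std_error = var_hat ** 0.5
```

With only five replicates the variance estimate rests on four degrees of
freedom, which is one reason the text recommends at least five
subsamples.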

     It might also be useful to select the subsamples at different rates
within domains of interest.  The present RSN sample has widely different
sample sizes within the  Census Divisions, and within the cropping regions.
If cropping  regions  comprise the major domains  of  interest,  they could
be subsampled at differential rates so that each received about the same
9
 See, e.g., page 19 of Cochran, W.G., Mosteller, F., and Tukey, J.W.
[1975].    Principles  of Sampling.   Journal of the American Statistical
Association, 70: 13-35.
                                 -21-

-------
 number of RSN sites.  Alternatively, Census Divisions could be subsampled
 at  differential  rates,  which might considerably reduce  the  sample size
 in some of the larger States, like Texas.

       Finally, identification of strata of special interest within the
 domains just considered could be used to increase the possibility of
 finding  toxic substance  residues.   For example,  the  noncropland  RSN
 sites  could  be  stratified  into  industrial  and  nonindustrial  areas.
 Sites in nonindustrial  areas  could  then be sampled at a lower rate than
 sites in industrial areas.   Stratification according  to  whether  or not
 toxic residues  have  previously been  found  at the  site may be  useful
 also.  Widely different  sampling   rates  would not  be  used  for  these
 strata,  however, because  they  would form a far from  homogeneous  group.

 1.3.2  Design Option Two

      The  present  RSN sample  is  a  subsample  of  the  1967  Conservation
 Needs Inventory (CNI).  A design  analogous to the design that  produced
 the present RSN sample  could  be based upon the 1982  National Resources
 Inventory (NRI).   Use of  the  1982 NRI  would  provide  up-to-date  land use
 information.   The NRI was  designed by  the Statistical  Laboratory at Iowa
 State University, and is currently being conducted  by  the Soil Conserva-
 tion Service.  The  design of  the  NRI  is  similar to  that of  the  CNI,
 except that the standard  sampling practice is to collect  land  use data
 for exactly three  random  sampling  points  within each primary  sampling
 unit  (PSU)  of the  NRI.10  Also,  the NRI is based  upon a  more  dense
 sample than was the CNI.  Consequently, data collection for the NRI is
 spread over three years, 1980 to 1982.

      The procedure used  to select the RSN subsample from  the CNI sampling
 points  resulted in  a  sample  that  was  essentially self-weighting  within
 States  where  only  one  size  of PSU was  used  (See Appendix  D).  Equal
 weighting  was  an  important  consideration before  the  development  of
 computer  software  for the  analysis  of  unequal probability samples.  The
 unweighted  analysis  of  data  from   sample  sites selected with unequal
 probabilities  can well lead to  spurious  conclusions.

      Since  software  is now available for  the analysis  of  unequal probabil-
 ity  samples, an improved subsampling procedure  can be  devised.   The goal
 of  the  subsampling  procedure is to obtain adequate precision  at minimum
 cost.  This can be accomplished by identifying  areas where toxic residues
 are  likely  to be found  and  giving these areas a  higher probability  of
 selection.   It is,  of course, important  that  all  areas have  a positive
probability of being in  the sample  so  that statistical inferences will
be valid for the entire population.

     It is suggested that counties be used as primary sampling units for
the  second  phase sample.  The data  from the  present  RSN suggests that
counties  are  generally  rather  heterogeneous  with   respect  to  toxic
residues.    Thus,  it  would  be  advantageous   to  select  relatively  few
counties with a relatively large number of sample sites, say 5 to 10,
within each sample   county.   The use of  counties  as  PSU's will reduce
10
  The  NRI  sampling  design  also includes  pilot studies  of  alternative
sampling designs in California, Louisiana, and Maine.  In Louisiana (and
in 40-acre PSU's), there is  only one random sampling point within a PSU.
                                 -22-

-------
travel costs associated with data collection.  More importantly, however,
smaller  areas  like  counties  can  be  stratified more  effectively into
areas where toxic residues are likely to be found.

     The RSN sample sites are to be located at NRI sample points.  Thus,
sample counties are selected from the counties occurring in the NRI
sample, in such a way that counties where toxic substance residues are
likely to occur have a greater chance of selection.  It is therefore
suggested that
counties be  selected  with probability proportional to size (PPS), where
the  size  measure   is  a  measure  of  the  likelihood  for  finding  toxic
residues.  Selection  of  PSU's  with  PPS sampling  is  a  common technique
with  resulting  variances of  estimates reduced  to  the  extent  that  the
size measure is  correlated  with items of  interest.   Variables  that  can
be used to construct county size measures include:

     (1)  Proportion of county acreage in cropland.
     (2)  Proportion of county acreage in heavy industry.
     (3)  Intensity of agricultural activity.
     (4)  Degree of industrialization.
     (5)  Predominant crops.
     (6)  Predominant industries.
     (7)  Predominant soil types.
     (8)  Climate.

Counties  should  be selected with PPS  sampling  within Census Divisions,
cropping regions, or some other domains to insure adequate representation
of the major domains of interest.
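
     The PPS selection suggested above can be sketched as follows.  The
composite size measure, its weights, the county records, and the domain
names are all hypothetical.  The sketch also assumes selection with
replacement, which the text does not specify for this option.

```python
import random


def size_measure(county):
    """Hypothetical composite size measure: a weighted sum of the
    proportions of county acreage in cropland and in heavy industry."""
    return 1.0 * county["cropland"] + 2.0 * county["industry"]


def pps_sample(counties, n, rng):
    """Draw n counties with probability proportional to size,
    with replacement (an assumption of this sketch)."""
    sizes = [size_measure(c) for c in counties]
    if any(s <= 0 for s in sizes):
        raise ValueError("every county needs a positive size measure")
    return rng.choices(counties, weights=sizes, k=n)


def pps_by_domain(counties, n_per_domain, seed=None):
    """Apply PPS selection separately within each domain."""
    rng = random.Random(seed)
    domains = {}
    for c in counties:
        domains.setdefault(c["domain"], []).append(c)
    return {d: pps_sample(members, n_per_domain, rng)
            for d, members in domains.items()}


# Hypothetical county data.
counties = [
    {"name": "A", "domain": "South", "cropland": 0.60, "industry": 0.05},
    {"name": "B", "domain": "South", "cropland": 0.20, "industry": 0.01},
    {"name": "C", "domain": "West", "cropland": 0.10, "industry": 0.30},
    {"name": "D", "domain": "West", "cropland": 0.40, "industry": 0.02},
]
sample = pps_by_domain(counties, n_per_domain=2, seed=7)
```

The check for positive size measures reflects the requirement that every
area have a positive probability of selection.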

     After  sample  counties  have been  selected,  the  NRI  sampling points
can  be used  to  locate RSN  sample  sites.  The  procedure used  for the
current RSN cannot be used,  however,  since most PSU's of  the  NRI have
exactly three  sampling points  and  some  have  only one  sampling point.
Thus, it is no longer feasible to center an RSN cropland site about two
cropland sampling  points.   Instead,  if a cropland point is selected for
the location of a cropland sample site, it is suggested that the site be
a square  10-acre  site  centered at the  selected  RSN  cropland point.  If
such  a site  is  not all  cropland,  percent cropland  will be noted and
specimens taken and kept separately for each stratum.
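
     The dimensions of such a site follow directly from the acreage.  The
sketch below computes the side length and corner coordinates of a square
10-acre site centered at a sampling point.  It assumes a local coordinate
system in feet with the sides of the square aligned to the axes; neither
assumption comes from the NRI or RSN protocols.

```python
import math

SQ_FT_PER_ACRE = 43_560


def square_site_corners(x, y, acres=10.0):
    """Return the side length (feet) and corner coordinates of a square
    site of the given acreage centered at (x, y), in feet.
    Assumption: sides are aligned with the coordinate axes."""
    side = math.sqrt(acres * SQ_FT_PER_ACRE)
    h = side / 2.0
    corners = [(x - h, y - h), (x + h, y - h), (x + h, y + h), (x - h, y + h)]
    return side, corners


side, corners = square_site_corners(0.0, 0.0)
```

A square 10-acre site is 660 feet on a side, so it extends 330 feet from
the sampling point in each axis direction.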

     Efficient sampling  within the selected counties  could result from
careful  stratification within  the  sample  counties.   The  NRI  sampling
points within  a  county  could first be stratified into  cropland points
and  noncropland  points,  to  insure adequate  representation of  each of
these  land  types  and  because agricultural  chemical residues  are more
likely to be  found in cropland.  Local land use characteristics similar
to those suggested  for constructing county size measures  could be used
to further stratify both the cropland points and the  noncropland points.
Finally,  greater  selection  probabilities  would be used  in  strata where
toxic  substance  residues  are more  likely to  be  found.   Moreover,  at
least one cropland site and one noncropland site should be selected from
each  sample county  that contains  at  least  one  NRI cropland  and one
noncropland sample point.
                                 -23-

-------
1.3.3  Design Option Three

1.3.3.1  Background

     The  target population  for  the  National  Soil Monitoring  Program
(NSMP) was the  land in the conterminous United  States,  divided  between
the Rural Soils Network (RSN) and the Urban Soils Network (USN).   Descrip-
tions  of  these networks  are  given  elsewhere.11   Both networks  were
interested in "levels," i.e., the absolute amount of pesticide in the
soil, and "trends," the change in this amount with time.

     Review of the data indicates large numbers of zero-valued
observations and relatively few positive observations.  This analytical
challenge has been discussed elsewhere [See Lucas et al., Recommendations
for the National Surface Water Monitoring Program for Pesticides.
Report No. RTI/1864/01-02I].  The conclusion of that analysis was that
the appropriate measures of "level" are:

(1)  The proportion of positive detections, that is, the relative
frequency of last stage sampling units positive for the substance(s)
under investigation.

(2)  The proportion of sampling units containing concentrations of the
substance above some specified level.  This level may signal the
existence of an undesirable situation.

(3)  The geometric mean of the positive values, which is a useful
concomitant to the data, identifying situations where, for example, the
proportion of positive sampling units remains constant, but the level of
concentration of toxic substance increases or decreases.

(4)  Related to (3), measures based on a truncated, or censored,  lognor-
mal model may prove useful.12
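
     The first three measures can be computed as in the sketch below.
The sketch is unweighted, that is, it assumes an equal-probability
sample, and the concentrations and the threshold are hypothetical.  The
censored lognormal model of measure (4) is not attempted here.

```python
import math


def level_measures(values, threshold):
    """Summarize residue concentrations that contain many zeros.

    Returns the proportion of positive detections, the proportion of
    values above `threshold`, and the geometric mean of the positive
    values (None if there are no positives).
    """
    n = len(values)
    if n == 0:
        raise ValueError("no observations")
    positives = [v for v in values if v > 0]
    prop_positive = len(positives) / n
    prop_above = sum(1 for v in values if v > threshold) / n
    geo_mean = (math.exp(sum(math.log(v) for v in positives) / len(positives))
                if positives else None)
    return prop_positive, prop_above, geo_mean


# Hypothetical concentrations (ppm) at eight sampling units.
p_pos, p_above, gm = level_measures([0, 0, 0.01, 0, 0.04, 0, 0, 0.16], 0.05)
```

For an unequal probability design the proportions and the mean of the
logarithms would need to be weighted by the sampling weights.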

In  the  following  sections,   a  two-stage  design  is proposed, and  each
stage of sampling is described in some detail.  Simple cost and variance
models are included as a means of investigating the effect and expense
of various alternative sample allocations.

1.3.3.2  Overview of the Proposed Sample Design

     The  proposed  design  is a two-stage  area probability  sample  with
stratification  of  the  sampling units  at each level.  The first stage or
primary  sampling units  (PSU's)  are counties.  The  3141  counties  in the
United States in aggregate constitute the total land area of the country.
Geographic  stratification  is  provided  by  the four  Census  Regions.
Allocation of PSU's  to these regions is in  proportion  to the land area
eligible for the study.
11
 National  Soils  Monitoring  Program:   Preliminary  Report.   January,
1980.  Research Triangle Institute.  EPA Contract No.  68-01-5848.

12
 Owen and DeRouen.  Estimation of the mean for lognormal data containing
zeros and left-censored values, with application to the measurement of
worker exposure to air contaminants.  Biometrics 36:707 (1980).
                                 -24-

-------
     Land area eligibility is currently defined by the membership
requirements of the RSN and the USN.  It may be advantageous on
administrative as well as fiscal and statistical grounds to combine
the activities of the soil networks, and consider SMSA counties as a
stratum within the survey.  This point requires further review beyond
the scope of this study and is not addressed here.  Initial investigation does
suggest that  savings  may  reasonably be anticipated.  Further discussion
is limited to tasks assigned to the RSN.

     With the extension of  monitoring responsibility from pesticides to
toxic  substances  in  general,  some  revision  of  the  approach  seems
indicated.  The following stratification variables are therefore proposed
in addition to Census Regions for the PSU's:

     (1)  Land area,
     (2)  Population density,
     (3)  Agricultural activity, and
     (4)  Industrial activity.

     Second  stage  sampling  units (SSU's) are  10-acre  plots.   These are
proposed as  the final  stage units or analysis  units  on  the assumption
that they  are sufficiently homogeneous that  the effects  of subsampling
are negligible.  This is a verifiable proposition.  The problem with
SSU's this small is the difficulty of locating them in the field.  The
requirement for exactly locating plots is exacerbated by the absence of
identifiable boundaries, rendering the task most difficult.  To ease this
difficulty, Census enumeration districts (ED's)  are proposed as readily
identifiable  segments.    The  problem  is  reduced to  locating the  SSU
within the ED, or any suitable subsegment adopted to facilitate matters.

     SSU's will be allocated equally to PSU's.  A detailed field protocol
will locate  the  points  for specimen collection,  leaving  the  minimum of
discretion for the field personnel in the selection of these sites.   The
protocol  would  specify  a  grid locating  multiple  specimen  collection
sites.   The  soil  collected  in a given plot would be composited, unless
the homogeneity of the 10-acre plot is under investigation.

     Temporal effect is not considered.  It is assumed, for budgeting
purposes only, that one collection per site per year will be made.
However, it does not seem reasonable that all toxic substances persist in
soils at stable levels throughout the year.  This assumption may be
satisfactory for heavy metals, particularly at poorly drained sites, but
most pesticides dissipate through leaching, transpiration and degradation
following application, and volatiles in all likelihood leave the soil
almost immediately.  Thus, special studies of this phenomenon are
recommended over and above the monitoring effort.

1.3.3.2.1  The First Stage Sample

     The first stage  sampling  units are  counties, which  are  often  used
as sampling units in  national surveys.   They are  easily  identified and
are political units of  sufficient  size that a great deal of information
is available  about them.   Indeed,  in order to enhance the efficiency of
the  proposed  design,  it  is  recommended  that  extensive  collection of
                                 -25-

-------
information be  undertaken for each county in the U.S.  This information
should include:

      1)   Total land area
      2)   Cropland and non-cropland acreages, or their estimates
      3)   Soil maps, characteristics - pH, organic content, etc.
      4)   Drainage areas and water ways
      5)   Weather, climatic and meteorologic data
      6)   Location and size of urban areas
      7)   Cropping patterns, major crop(s)
      8)   Location and types of industrial activities, including
          storage sites
      9)   Location of dump sites
Moreover, the Master Area Frame maintained by the USDA should be consulted
for  design  information,   as  well  as  States  with mandatory  pesticide
reporting laws.

      The size measure  for the Census Region  is  its  eligible land area.
Other measures correlated with toxic substance use do not appear feasible
at  this  level  in view  of  the variability  in  land use.   The present
proposal uses the  definitions of the RSN to determine eligibility.  The
number of counties (PSU's) allocated to each Census Region is in propor-
tion  to  its  size,  with at least one PSU selected from each region.  The
allocation of PSU's to further strata is carried out in the same fashion,
with the limitation that there must be at least one PSU in each stratum.
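
     One way to carry out such an allocation is sketched below.  The
rounding rule is an assumption of the sketch: each stratum first receives
one PSU, and the remaining PSU's are distributed in proportion to
eligible land area by the largest-remainder method.  The land areas are
hypothetical.

```python
def allocate_psus(eligible_area, total_psus):
    """Allocate PSU's to strata in proportion to eligible land area,
    with at least one PSU per stratum.

    Assumption: each stratum gets one PSU, and the rest are distributed
    proportionally, with fractions resolved by largest remainder.
    """
    if total_psus < len(eligible_area):
        raise ValueError("need at least one PSU per stratum")
    total_area = sum(eligible_area.values())
    spare = total_psus - len(eligible_area)
    quotas = {s: spare * a / total_area for s, a in eligible_area.items()}
    alloc = {s: 1 + int(q) for s, q in quotas.items()}
    leftover = total_psus - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - int(quotas[s]),
                          reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


# Hypothetical eligible land areas (millions of acres) by Census Region.
areas = {"Northeast": 100, "North Central": 480, "South": 560, "West": 860}
allocation = allocate_psus(areas, total_psus=100)
```

The same function could be applied again within a region to allocate its
PSU's to further strata.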

     PSU's in each stratum will be selected with probability proportional
to size  (PPS) and  with replacement.  As before, the size measure is the
land area eligible for the RSN.

      It  is  anticipated  that  the  investment in  the collection  of the
county level information will provide substantial gains in precision
through effective stratification.  The purpose of this stratification
will be to locate regions of approximately equal risk of exposure to
toxic substances, and hence to permit the effective location of sample
sites.

     Two points can now be made:

      (1)  The most effective  variables  for stratification  will  change
          for different classes of toxic substances,  and may change from
          substance to substance, and

      (2)   It is not  possible  to anticipate which substances will  be of
          major interest in the future.

This leads to the conclusions:

      (a)   Information may  be profitably  collected  for every  county in
          the United States,  and

     (b)   Any proposed design should be  as flexible as possible.
                                 -26-

-------
Point  (a)  supports  point (b) above by simplifying the process of making
design  changes if  and when they  become necessary.   Additionally,  the
selection  of  stratification variables which appear  to  be  both general
and effective  offers  the possibility of achieving a  flexible and effi-
cient design over the near future.

     The approach is  to propose the selection of PSU's according to the
general stratification  scheme  which is found most effective at the time
of  the adoption of  the design.  These  PSU's would  then establish the
monitoring network  (RSN).   The selection of the  SSU's within the given
PSU's according to the procedure below would then determine the specific
soil  specimen  sites.   However,  it  is proposed  that  the stratification
variables within the PSU's, and hence the soil specimen sites, be allowed
to change in response to changing interest in toxic substances.  This
technique is intended to maximize the probability of positive
results from monitoring efforts.

1.3.3.2.2  The Second Stage Sample

     The secondary  sampling  units  (SSU's)  will be 10-acre plots.   Equal
numbers will be selected within each PSU.  It is possible that there
will  be  more  strata  within PSU's  than  sampling units.  This  suggests
that  stratified random sampling will not apply.  There  are a number of
related methods which can be used in this situation.

     One procedure is to use a composite index combining several strati-
fication variables.   In effect, two or  more strata  are  combined and a
'weight' is assigned to each observation in the new stratum based on the
relative sizes of  the original strata.  Observations are then selected
from the new strata by the usual probability methods.

     A second procedure is to consider the effect of combinations of the
strata and assure  at least one observation  from important  combinations
is  selected.   This  can  be  accomplished  by employing  the lay-out of an
experimental design as if the strata were treatment levels.  The Latin
square is  used in  this fashion.   For example,   consider the following
case with two stratification variables each at 3 "levels."
                Table A.  A Latin Square Selection Scheme

                                  Geographic Location
                         Type 1         Type 2         Type 3

               Type 1                     x
Soil Types     Type 2                                    x
               Type 3      x

                    x = selected plot.
Here with  a sample  of  3 plots  we have observations from  each  type of
location (possibly  classified by  potential  exposure) and  of  each soil
                                 -27-

-------
type.  This can be done by choosing a "cell" (Soil Type x Location
Type Combination) at random, then eliminating the remaining cells in the
same row  and column from further consideration.  A  second selection is
made at random  from  the cells in the remaining  rows  and columns.   The
row and column containing the second selection are  then eliminated and
the next  random choice is made.  This procedure  is  continued until all
the rows and columns are eliminated.
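
     The elimination procedure just described can be sketched as
follows.  Rows and columns are represented by hypothetical index numbers
(0, 1, 2 for three levels) rather than by actual soil or location types.

```python
import random


def latin_square_selection(n_levels, seed=None):
    """Select one cell per row and per column of an n x n table.

    A cell is drawn at random from the rows and columns still
    available; its row and column are then eliminated, and the draw is
    repeated until none remain.
    """
    rng = random.Random(seed)
    rows = list(range(n_levels))
    cols = list(range(n_levels))
    selected = []
    while rows:
        r = rng.choice(rows)
        c = rng.choice(cols)
        selected.append((r, c))
        rows.remove(r)
        cols.remove(c)
    return selected


cells = latin_square_selection(3, seed=42)
```

Each level of both stratification variables appears exactly once among
the selected cells, as in Table A.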

     A third procedure generalizes the approach above and is called
"controlled selection".  The typical use of this procedure is to
visualize the sample in a tabular array as:
                Table B.  Example of Controlled Selection

                                    Geographic Location
                         Site 1    Site 2    Site 3    Site 4    Total

               Type 1                                             n1.
Soil Type      Type 2                                             n2.
               Type 3                                             n3.

               Total      n.1       n.2       n.3       n.4       n
Here  the total  number  of  plots  assigned to  a PSU,  say, is  n.   The
constraint, or  "control",  imposed is that the margins of the table, the
row and  column  totals (or proportions if preferred), be satisfied.  So,
Site 1 must appear n.1 times and Soil Type 2 must appear n2. times, and
so on.  Any arrangement of the sample among the table cells which satis-
fies these constraints is acceptable.  And, at least conceptually, every
such arrangement, or a specified subset, is written down and a probabil-
ity  assigned to  it.   Then  one  of these  arrangements is  selected by
chance according to the assigned probability.
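
     For a small table the conceptual procedure can be carried out
literally, as in the sketch below.  The margins are hypothetical, and the
sketch assumes that every admissible arrangement is assigned the same
probability; the method itself permits any assigned probabilities.

```python
import random


def arrangements(row_totals, col_totals):
    """Yield every table of non-negative integer cell counts whose row
    and column sums equal the given margins."""
    def fill_row(total, caps):
        # All ways to split `total` across cells limited by `caps`.
        if not caps:
            if total == 0:
                yield ()
            return
        for x in range(min(total, caps[0]) + 1):
            for rest in fill_row(total - x, caps[1:]):
                yield (x,) + rest

    def build(rows_left, caps):
        if not rows_left:
            if all(c == 0 for c in caps):
                yield ()
            return
        for row in fill_row(rows_left[0], caps):
            new_caps = tuple(c - x for c, x in zip(caps, row))
            for rest in build(rows_left[1:], new_caps):
                yield (row,) + rest

    yield from build(tuple(row_totals), tuple(col_totals))


def controlled_selection(row_totals, col_totals, seed=None):
    """Pick one admissible arrangement at random.
    Assumption: all arrangements get equal probability."""
    tables = list(arrangements(row_totals, col_totals))
    return random.Random(seed).choice(tables), len(tables)


# Hypothetical margins: 4 plots in a PSU, 3 soil types x 4 locations.
table, n_tables = controlled_selection([2, 1, 1], [1, 1, 1, 1], seed=3)
```

Full enumeration is practical only for small tables, which is consistent
with the text's remark that a specified subset of arrangements may be
used instead.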

     The complication introduced by this method is the loss of the
ability to obtain simply an estimate of precision.  The level of control
requires either replication to obtain a variance estimate or the use of
some approximation.

     The methodology  adopted for  the  design will depend  on the actual
stratification  variables and  the  constraints  on selection which seem
most  effective.    An  important  statistical  consideration  is   that  the
procedure used  should  provide  an unbiased estimate of the PSU parameter
of  interest  (total,  mean or  proportion).  In  addition,  a measure of
precision of the estimate should be capable of reasonable approximation.

1.3.3.3  Size and Allocation of the Sample

     Sample  size   is  determined by the  level  of  precision needed  to
answer the question or questions which are the reason for undertaking a
survey.  The allocation of the sample is dependent upon locating sources
of variation entering  the  survey and the cost of controlling  them.  Of
course, these two considerations are interdependent and cannot be resolved
separately.  In order to examine this quantitatively, models approximating
cost and variability are constructed.  These models are intended only
to indicate values that depend upon circumstances which may change; they
permit more rational decision-making and are not an attempt at an exact
description of budget or variability.

1.3.3.3.1  A Cost Model

     The  total  cost of  a  survey  depends  upon both  fixed and variable
costs.  Fixed costs are overhead costs which are essentially independent
of  the sample size  -  materials,  rental of  quarters, preparatory work,
staff  salaries,  and so  on.   Variable  costs  are unit  costs - specimen
collection, travel,  shipping,  etc.  For our two stage sample, we assume
a simple linear cost model,

               C = C0 + C1 n1 + C2 n1 n2

where

          C is the total cost of the survey

          C0 is the fixed cost

          C1 is the variable cost for county-level data

          C2 is the variable cost for plots

          n1 is the number of counties in the sample

          n2 is the average number of plots per county.


The development of the costs is shown in Table 1.3.3.1.  These costs are
estimated  from related  efforts  and are  only  approximate.   Different
methods in contracting and operating the survey will significantly alter
these costs.   For example,  cooperative agreements with the Department of
Agriculture  or  other  interested  agencies  may  produce  substantially
different field costs.   Also  laboratory costs are included for "organo-
pesticides" and  heavy metals.  However, different  budgeting may appro-
priately exclude part or all of these costs.

     Under the assumptions  given we find

          C0   =    $367,800

          C1   =       3,280

          C2   =         926 .
Since the overhead cost, C0, includes the collection of preparatory data,
maps, etc., on all 3141 counties in the United States, this cost is not
included in the estimated costs of Table 1.3.3.3.  It may be preferable to:

-------
              Table 1.3.3.1  Construction of the Cost Model

                                      C0 - Overhead    C1 - per County    C2 - per Plot
           Item                           Costs             Costs             Costs

1.   Selection of counties - first stage units

     Construct Frame                    $300,000*
     Stratify Frame                        1,000
     Develop Size Measures                   300
     Select Sample Counties                5,000
     Develop Computerized Data Sets        2,500
     Administration                        3,000              100

2.   Selection of plots - second stage units

     Construct Frame                       3,500            1,000               50
     Stratify Frame                        4,000              150
     Form Segments                           500               20
     Select Sample Plots                   3,000               10                1

3.   Field Work and Analysis

     Collection of specimens              12,000            2,000              300
     Laboratory Analysis**                                                     560
     Data handling                         1,000                                 5
     Stat. Analysis, Reporting            20,000                                10
     General Administration               10,000

           Total:                       $367,800           $3,280             $926

 *  Includes preparing materials on 3141 counties.
**  Uses RTI costs; does not include analysis of toxic substances beyond
    pesticides and heavy metals.

-------
     (1)  Do only a subset of the counties, or
     (2)  Spread this cost over several years.

Ignoring this factor is equivalent to using the cost equation

     C - C0 = C1 n1 + C2 n1 n2

which clearly does not affect the relative allocation of the sample.
Using the first equation, the estimated cost of a survey of 57 counties
with an average of 18.73 plots per county is

          C = $367,800 + $3,280 (57) + $926 (57) (18.73)

            = $1,543,367.
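
The arithmetic of the linear cost model can be checked with a short Python sketch.  The function name survey_cost is hypothetical; the default unit costs are the values of C0, C1 and C2 given above.

```python
def survey_cost(n1, n2, c0=367_800, c1=3_280, c2=926):
    """Linear two-stage cost model: C = C0 + C1*n1 + C2*n1*n2.

    n1 -- number of counties in the sample
    n2 -- average number of plots per county
    c0 -- fixed (overhead) cost; pass c0=0 to ignore the fixed portion
    c1 -- variable cost per county
    c2 -- variable cost per plot
    """
    return c0 + c1 * n1 + c2 * n1 * n2


if __name__ == "__main__":
    # The example in the text: 57 counties, 18.73 plots per county.
    print(round(survey_cost(57, 18.73)))
    # The same survey with the fixed cost left out.
    print(round(survey_cost(57, 18.73, c0=0)))
```

Setting c0 to zero changes the total but not the comparison between alternative allocations, which depends only on the variable terms.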


1.3.3.3.2  Sample Size Calculations

     A minimum acceptable precision must be specified to ensure the
adequacy of the survey results.  A statement such as "I must know the amount
within 10 percent," or "The error in the proportion reported must not
exceed 20 percent," determines the sample size under a particular survey
design, provided the heterogeneity of the population under investigation
is known.

     For the purpose of discussion, the parameter of interest is taken
to be the proportion, P, of land (specifically of 10-acre plots) containing
detectable levels of toxic substance.  The variance model for the
estimator p of P is

               Var(p)  =  [P(1 - P) / (n1 n2)] {1 + ρ(n2 - 1)} ,

where ρ is the correlation among plots within a county,

      n1 is the number of counties

      n2 is the average number of plots per county.

The term in braces is called the "cluster effect", and it is convenient
to write

               dc = 1 + ρ(n2 - 1)


This model ignores  stratification and unequal weighting for simplicity.
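
A minimal Python sketch of this variance model follows.  It assumes the model exactly as written above (no stratification, equal weights); the function names are hypothetical, rho denotes the within-county correlation and P the true proportion.

```python
import math


def cluster_effect(rho, n2):
    """Cluster effect dc = 1 + rho * (n2 - 1)."""
    return 1.0 + rho * (n2 - 1.0)


def variance_of_proportion(P, rho, n1, n2):
    """Var(p) = [P(1 - P) / (n1 * n2)] * dc for the two-stage sample."""
    return P * (1.0 - P) / (n1 * n2) * cluster_effect(rho, n2)


def coefficient_of_variation(P, rho, n1, n2):
    """c.v. = sqrt(Var(p)) / P, the relative standard error of p."""
    return math.sqrt(variance_of_proportion(P, rho, n1, n2)) / P


if __name__ == "__main__":
    # Cluster effect for rho = 0.06 with 10 plots per county.
    print(round(cluster_effect(0.06, 10), 2))
    # Relative precision for 57 counties of 18.73 plots, P = 0.10, rho = 0.01.
    print(round(coefficient_of_variation(0.10, 0.01, 57, 18.73), 4))
```

With rho = 0 the cluster effect is 1 and the variance reduces to that of a simple random sample of n1 n2 plots; any positive correlation inflates it.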

     The sample allocation problem is to choose the number of counties, n1,
and the number of plots, n2, within counties.  For a given budget (which
fixes the total sample size), are we wiser to include many counties with
few plots per county, or fewer counties with more plots per county?  The
solution is to balance considerations of cost and variability, that is,

-------
            Table 1.3.3.2  Cluster Effect for Selected Values
                           of ρ and n2

                 Intracluster       Average Number of Plots per County, n2
Pesticide        Correlation ρ        5       10       15       20       25

                     0.01           1.04     1.09     1.14     1.19     1.24
                     0.06           1.24     1.54     1.84     2.14     2.44
Endrin               0.125          1.50     2.13     2.75     3.38     4.00
Chlordane            0.169          1.68     2.52     3.37     4.21     5.06
Aldrin               0.231          1.92     3.08     4.23     5.39     6.54
Dieldrin             0.298          2.19     3.62     5.17     6.66     8.15
P,P'-DDE             0.430          2.72     4.87     7.02     9.17    11.32

Cluster Effect dc = 1 + ρ(n2 - 1)

-------
     Table 1.3.3.3  Minimum Cost Allocation Subject to the Constraint:

                    c.v. = sqrt[V(p)] / P  <  0.10

        Average   Cluster
        Cluster   Effect       P = 0.0001           P = 0.001            P = 0.01            P = 0.10
  ρ     Size n2     dc       n1*   Est. Cost**    n1    Est. Cost      n1    Est. Cost     n1   Est. Cost

 .01     18.73     1.18     3141  $64,779,986    3141  $64,779,986     622  $12,828,060     57  $1,175,928
 .06      7.45     1.39     3141   31,970,880    3141   31,970,880    1843   18,759,020    168   1,710,392
 .125     4.98     1.50     3141   24,786,972    3141   24,786,972    3141   24,786,972    271   2,138,980
 .169     4.17     1.54     3141   22,431,228    3141   22,431,228    3141   22,431,228    332   2,370,544
 .214     3.61     1.56     3141   20,802,394    3141   20,802,394    3141   20,802,394    389   2,576,024
 .298     2.89     1.56     3141   18,707,782    3141   18,707,782    3141   18,707,782    487   2,900,242
 .430     2.17     1.50     3141   16,614,096    3141   16,614,096    3141   16,614,096    623   3,295,392

The entries in the table were calculated from the formulas:

     n2 = sqrt[C1 (1 - ρ) / (C2 ρ)]     dc = 1 + ρ(n2 - 1)     n1 = (1 - P) dc / [P n2 (c.v.)^2]

               Cost = C0 + C1 n1 + C2 n1 n2

and  n2   =    average number of plots per county
     C1   =    cost for first stage units = $3280
     C2   =    cost per second stage units = $926
     P    =    proportion of land area containing detectable levels of toxic substance.

 *n1, the number of counties in the sample, cannot exceed the total number in the United States.

**Estimated Cost does not include the fixed portion, C0, in the cost equation (see accompanying text).

-------
Table 1.3.3.3 (continued)  Minimum Cost Allocation Subject to the Constraint:

                    c.v. = sqrt[V(p)] / P  <  0.15

        Average   Cluster
        Cluster   Effect       P = 0.0001           P = 0.001            P = 0.01            P = 0.10
  ρ     Size n2     dc       n1*   Est. Cost**    n1    Est. Cost      n1    Est. Cost     n1   Est. Cost

 .01     18.73     1.18     3141  $64,779,921    2797  $56,685,272     277  $ 5,712,842     25  $  515,599
 .06      7.45     1.39     3141   31,971,296    3141   31,971,296     820    8,346,534     74     753,223
 .125     4.98     1.50     3141   24,787,138    3141   24,787,138    1325   10,456,311    120     946,977
 .169     4.17     1.54     3141   22,431,200    3141   22,431,200    1624   11,597,666    147   1,049,788
 .214     3.61     1.56     3141   20,802,403    3141   20,802,403    1901   12,590,056    172   1,139,181
 .298     2.89     1.56     3141   18,708,235    3141   18,708,235    2375   14,145,832    215   1,280,570
 .430     2.17     1.50     3141   16,614,068    3141   16,614,068    3041   16,085,126    276   1,459,879

-------
Table 1.3.3.3 (continued)  Minimum Cost Allocation Subject to the Constraint:

                    c.v. = sqrt[V(p)] / P  <  0.20

        Average   Cluster
        Cluster   Effect       P = 0.0001           P = 0.001            P = 0.01            P = 0.10
  ρ     Size n2     dc       n1*   Est. Cost**    n1    Est. Cost      n1    Est. Cost     n1   Est. Cost

 .01     18.73     1.18     3141  $64,779,921    1573  $32,441,520     155  $ 3,196,716     14  $  288,735
 .06      7.45     1.39     3141   31,971,296    3141   31,971,296     461    4,692,350     41     417,326
 .125     4.98     1.50     3141   24,787,138    3141   24,787,138     745    5,879,152     67     528,729
 .169     4.17     1.54     3141   22,431,200    3141   22,431,200     914    6,527,257     83     592,737
 .214     3.61     1.56     3141   20,802,403    3141   20,802,403    1067    7,079,837     97     642,417
 .298     2.89     1.56     3141   18,708,235    3141   18,708,235    1335    7,951,446    121     720,692
 .430     2.17     1.50     3141   16,614,068    3141   16,614,068    1710    9,004,908    155     819,860

-------
the budget goes further if we sample the less expensive units; however,
precision is improved if more of our observations come from the most
variable units (since, in the extreme case, if the units all have identically
the same value, one observation is sufficient to tell us everything
about these units).

     Using the cost and variance equations above, we find that the values
of n1 and n2 which optimize precision for a fixed cost are

               n2 = sqrt[ C1 (1 - ρ) / (C2 ρ) ]

and

               n1 = (1 - P) dc / [ P n2 (c.v.)^2 ] .

(c.v.)^2 is the square of the coefficient of variation, or the relative
variance.  It is the level of precision specified as necessary for this
survey, and is given by the equation

               c.v. = sqrt[ Var(p) ] / P .

     The optimal allocation and the associated cost are given for a range
of values of ρ, most of which represent national average values for some
of the common pesticides reported in the RSN.  These values of ρ are
indicated in Table 1.3.3.2 along with the effect of cluster size on dc,
the cluster effect, and the names of the pesticides involved.  Table
1.3.3.3 displays the minimum cost allocation and the estimated cost
corresponding to these values of ρ, the correlation of the pesticide
concentrations within counties.  Values of the coefficient of variation
(c.v.) on the order of 10 percent are commonly accepted.
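
The allocation formulas can be exercised with the Python sketch below.  It is an illustration only: the function name optimal_allocation is hypothetical, rounding n1 to the nearest whole county is an assumption, and the cap of 3141 counties follows the footnote to Table 1.3.3.3.  Because the unrounded value of n2 is carried through, the costs it gives may differ slightly from the tabulated ones.

```python
import math

TOTAL_COUNTIES = 3141  # counties in the United States, per the text


def optimal_allocation(rho, P, cv, c1=3280.0, c2=926.0):
    """Return (n2, dc, n1, variable_cost) for the two-stage design.

    rho -- correlation among plots within a county
    P   -- proportion of plots with detectable levels
    cv  -- required coefficient of variation of the estimate
    """
    # Plots per county that balance the two unit costs against rho.
    n2 = math.sqrt(c1 * (1.0 - rho) / (c2 * rho))
    # Cluster effect at that cluster size.
    dc = 1.0 + rho * (n2 - 1.0)
    # Counties needed to reach the stated precision.
    n1 = (1.0 - P) * dc / (P * n2 * cv ** 2)
    # Assumption: round to a whole county and cap at the national total.
    n1 = min(round(n1), TOTAL_COUNTIES)
    # Variable cost only; the fixed cost C0 is left out, as in the table.
    cost = c1 * n1 + c2 * n1 * n2
    return n2, dc, n1, cost


if __name__ == "__main__":
    for rho in (0.01, 0.06):
        n2, dc, n1, cost = optimal_allocation(rho, P=0.10, cv=0.10)
        print(rho, round(n2, 2), round(dc, 2), n1, round(cost))
```

Note that n2 depends only on the cost ratio C1/C2 and on the correlation, while n1 carries all of the dependence on P and on the required precision.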

1.4  Present RSN Operations

     The operational  design of the Rural Soils  Network (RSN)  specified
that each site would be randomly designated as a first-year, second-year,
third-year,  or fourth-year sample site,  so that  sample specimens  would
be obtained for one-fourth of the sites in each State during each fiscal
year.   Specimens  were to  be  obtained at  each  site  no less  than once
every four years  and  not more than once per  year.   Soil specimens were
obtained by compositing fifty soil cores, each 2 inches in diameter by 3
inches  in  depth.   The procedure  for  collecting and  compositing  these
cores and  for collecting  crop  specimens is described  in  detail  in the
PPC  Division  Memorandum  804.3,  which  is  dated April,  1969, and  is
entitled  "Guidelines  for  Collecting  Sample  for  the  National  Soil
Monitoring Program  —1969."   This memorandum  specifies that  soil  and
crop  specimens  are  to be  obtained simultaneously at  or shortly before
harvest time for the cropland sample.  It also specifies that water and
sediment specimens should be collected from the nearest pond within one
mile of each RSN site, four times at equal intervals during each sampling
year.
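
The random designation of sites to sampling years described above can be sketched as follows.  This is an assumption about how such a designation might be carried out, not a description of the procedure actually used for the RSN; the function name and the site identifiers are hypothetical.

```python
import random


def assign_rotation_years(site_ids, n_years=4, rng=random):
    """Randomly designate each site as a first-, second-, third- or
    fourth-year site so that about 1/n_years of the sites fall in each year.

    Returns a dict mapping site id -> year in the round (1..n_years)."""
    shuffled = list(site_ids)
    rng.shuffle(shuffled)  # random order within the State
    # Deal the shuffled sites out to the years in turn.
    return {site: (i % n_years) + 1 for i, site in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical State with 10 sites.
    years = assign_rotation_years(range(10), rng=random.Random(1969))
    for site in sorted(years):
        print(site, years[site])
```

Applied State by State, this gives each State its own one-fourth sample in every fiscal year of the round.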

     The  above  operational  design  appears  to  have been  implemented,
except  that  specimens  from ponds have been collected in only one fiscal
year, 1973.  Moreover, data collection ceased with fiscal year 1975, and
very little second round data for assessing trends are available.

1.5  Alternate Operational Design for the RSN

     The  operational  design of  the Rural Soils Network  (RSN)  was well
conceived for monitoring agricultural pesticides and herbicides in rural
soils,  harvested crops,  and  rural ponds.   Some  modifications appear,
however, to be warranted at this time.

     The  operational  design  of the RSN specified  that soil  and crop
specimens be  obtained  simultaneously at or shortly before harvest time.
These data were to be used to monitor levels of compounds in soils and
crops, as well as to establish relationships between soil and crop residues.
Crop specimens should be obtained at or shortly before harvest, since it
is the  harvested crop  that will be consumed.  However, harvest time may
be less than ideal  for obtaining  soil  specimens.  Much pesticide and
herbicide  residue may  often be leached out of  or vaporized  from the
cropland soil by harvest time.  This could  explain  in some measure the
preponderance  of less  than  detectable  residue  levels in  the  cropland
soil data collected thus far (See Section 1.7).
     Thus, it may be  preferable to obtain cropland soil specimens early
in the  growing  season.   It would then be necessary to carefully specify
where the soil cores were selected, e.g. on a map of the sample site, so
that crop  specimens  could be obtained near  harvest  time  at practically
identical locations.

     Noncropland  soil  specimens could  be obtained  whenever  convenient
during  the  sampling year, since there appears to be  no  major national
relationship  between annual  seasons  and toxic  substance  residues  in
noncropland soils.  Random points in time are preferable, but may not be
logistically feasible.  However, the purposive selection of a single point
in time opens up the opportunity for introducing serious bias.  Whatever
protocol  is   adopted,   it  is  important  that  the  protocol  be  applied
uniformly across  the nation  so that the  population  being sampled is as
well-defined  as  possible.   Sampling some  areas  when levels  of toxic
substances are  suspected to be  high,  but not doing so  in  other areas,
would  lead to  difficulties  when  making other  than  local inferences.

     Changes in the definition of an RSN sample site that would make its
boundaries more readily identifiable would be useful, so that the
selected sample site could be accurately identified and the identical
site could be revisited periodically to establish trends in residue
levels.  If the selected site is not precisely defined, the value of the
sampling design is lessened, and analyses of trends based upon paired
differences may lead to spurious results.

     The use of a sample site larger than 10 acres may make  it easier to
identify site boundaries.  However, compositing of the specimens collected
at a site is only justifiable if the site is homogeneous with respect
to data items.   Thus,  a fairly small  sample site  is required  if the
specimens  are  to be  composited.   The  alternative  would be to  report
multiple specimens individually.

     The use of less than fifty soil cores at a sample site could reduce
the expense  of  collecting specimens and should be  considered.   The use
of a large number of cores is advisable, however,  if the cores are to be
composited.  This  insures that  the composite is  representative of the
site by  reducing  the influence of individual cores.  If multiple speci-
mens were  to be  reported separately within a sample  site,  fewer cores
might be sufficient.  An experimental study could be designed to investi-
gate optimal size  of  sample  site  and optimal  number of  soil cores.

     Elimination of pond water and pond sediment  specimens  is  probably
necessary  to keep  the cost of the  RSN  data  collection reasonable.  The
operational  design  specified  that pond specimens  were to  be  obtained
four times at equal  intervals during  each  fiscal year  for RSN sample
sites with a pond  within one mile.  This procedure is commendable since
the  pesticide  level  in pond  specimens  would probably vary  greatly,
depending  upon  the  turbidity of  the  water,  the  water level  and the
season.  The four  equally  spaced  samples would  allow compensation for
this variability.   Unfortunately,  this  sampling  protocol would probably
require  a  field  crew  devoted entirely  to  sampling  pond  water.   Two
reasonably spaced  collections  of pond specimens  for each sample site in
some sampling years may be worth considering.  The pond specimens could
be collected early and late in the growing season, possibly simultaneous-
ly with the collection of soil specimens and crop specimens,  respectively.

     Finally, it  is  important  that tests for all  toxic substances for
which inferences are desired be performed on all  sample specimens.  This
may have  been the  intention in the  past,  but the data  in  Section 1.7
show clearly that  some classes of  compounds were  more regularly tested
than others.   All compounds for which statistical inferences  are desired
should be  tested  in all sample specimens.  This  requirement  may place a
practical  limit  on  the  number of classes  of  compounds that  can  be
monitored.

1.6       Recommended Modifications

     Since the most cost-effective strategy for modifying the RSN depends
to  some  extent  upon  information which is not available,  the following
are simply indications  of a way to enhance  program efficiency.   Design
Option 1 seems  to  have little to  recommend  it.   Its  importance lies in
its connection  with  the  historical series  reflecting the operation of
the RSN  from FY 1968 to FY  1973.   However,  given the inactivity of the
RSN in  the intervening  years, there is  reason  to believe  the network
would require  substantial up-dating  which  in itself  adversely affects
the relationship between the RSN and the historical  series.  Moreover,
it may be possible to safeguard the series by appropriately managing the
transition to a new network.

     Design Option 2 may be the most feasible economically.   If a coopera-
tive agreement  can be  reached with  the  officials responsible  for the
operation  of the  National Resources  Inventory   (NRI),  then the field
costs may be kept down.  Since the NRI is intended to produce national
estimates of various kinds, it is likely to do so for toxic substances
in an adequate fashion, and to provide a subsample satisfactory for
monitoring purposes.


     Design Option 3  represents  a monitoring effort geared toward toxic
substances specifically.   It is  expected  to perform well  in providing
the  desired  data.   Should  an  advantageous  cooperative agreement  with
USDA or others not be obtainable, then this would seem to be the option
of choice.  And, in fact, it is not impossible that conditions may
dictate that a combination of Design Options 2 and 3 be adopted.  An
economical national estimate may be provided by the NRI network, and may
be profitably supplemented by local or special studies based on Design
Option 3.

1.7       Statistical Findings and Charts for the RSN

1.7.1     Introduction

     Data collection  for  the Rural Soils Network (RSN) occurred between
fiscal year 1968  and fiscal year 1975.  The  design specified that one-
fourth  of all sites  in  each  State  would   be sampled in  each  year.
However, the first year  of sampling was regarded as a large scale pilot
study and only six States were sampled.  The RSN was never fully imple-
mented; the yearly data  collection effort is summarized in Table 1.7.1.
This table  indicates, for  example,  that the  random one-fourth  of  the
cropland sites  in Maine  that were designated to be first-year cropland
sites were sampled in fiscal years 1968 and  1973.   It is  apparent from
Table  1.7.1  that  only one-fourth  of  the noncropland sites  have  been
sampled in most States.   Also,  most States  have  a  follow-up sample at
approximately a four  year interval for only  one-fourth of  the cropland
sites.   Finally, it is apparent that very little data have  been collected
for  the Mountain  Census  Division of the United States, possibly because
of the expense of collecting data in this region.

     In preparation  for  data analysis,  the EPA computer records for the
RSN  were checked  for logical inconsistencies.  Twenty-three were found.
The  methods  of  identifying and  resolving   these  inconsistencies  are
discussed in  Appendix E.   Appendix E also describes the creation  of a
data set with a structure that more readily lends itself to data analysis
than do the EPA data  files.

1.7.2  Sampling weights

     Proper analysis  of the RSN data must account for the  characteristics
of the sampling design by the use of sampling weights.   Sampling weights
are adjustments attached to each observation  of a data set  which usually
reflect the probability of selection of the observation.   In the case of
simple random sampling, the use of weights is quite straightforward.  If
one individual in 1000 is randomly selected, i.e., the probability of
selection is 1/1000, then the selected individual "represents" 1000
individuals, and his income, say, is multiplied by 1000 to estimate the
total income of the 1000 individuals.  In more complex survey designs, the
same approach applies although the details become more complicated.
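
The role of the sampling weight as the reciprocal of the selection probability can be illustrated with a short Python sketch.  The function names and the income figures are hypothetical and serve only to show the calculation.

```python
def expansion_weight(selection_probability):
    """Sampling weight = 1 / probability of selection."""
    return 1.0 / selection_probability


def estimate_total(values, weights):
    """Weighted estimate of a population total: sum of weight * value."""
    return sum(w * y for y, w in zip(values, weights))


if __name__ == "__main__":
    # Hypothetical incomes of three sampled individuals, each selected
    # with probability 1/1000.
    incomes = [12_000.0, 15_000.0, 9_000.0]
    weights = [expansion_weight(1 / 1000)] * len(incomes)
    print(estimate_total(incomes, weights))
```

When the selection probabilities differ between observations, the same estimator applies with a different weight attached to each observation.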

     The weights  for  the  Rural Soils Network (RSN) depend  on two phases
of  sampling:    (1) The  selection  of  the sampling  points  for  the  1967
Conservation Needs Inventory (CNI),  and  (2) the  subsample of the  1967
CNI  points  selected  to   locate  the RSN  sample plots.  Therefore,  the



-------
Table 1.7.1:  Fiscal Years of Data Collection for the Rural Soils Network**

                                   Cropland Samples                Noncropland Samples
                          Year in Round 1      Year in Round 2     Round 1
Census Division            1    2    3    4        1    2          (fiscal years sampled)
  State

New England
  Maine                   68*  69   70   72       73   74          68*, 69, 70, 72*
  New Hampshire           69   70   72   73       74               72*
  Vermont                 69   70   72   73       74               72*
  Massachusetts           69   70   72   73       74               72*
  Rhode Island            69   70   72   73       74*              72*
  Connecticut             69   70   72   73       74               72*

Middle Atlantic
  New York                69   70   72   73       74               72*
  New Jersey              69   70   72   73*      74               72*
  Pennsylvania            69   70   72   73       74               72*

East-North Central
  Ohio                    69   70   72   73       74               72*
  Indiana                 69   70   72   73       74               72*
  Illinois                69   70   72   73       74               72*
  Michigan                69   70   72   73       74               72*
  Wisconsin               69   70   72   73       74               72*

Pacific
  Washington              68*  69   72   73       74               68*, 69
  Oregon                  72   73   74
  California              69   70   72   73       74

West-North Central
  Minnesota               70
  Iowa                    69   70   72   73       74               69
  Missouri                69   70   72   73       74
  N. Dakota               69
  S. Dakota               69   70   72   73       74
  Nebraska                68*  69   70   72       73   74          68*, 69
  Kansas                  75*

                                                       (continued)

-------
Table 1.7.1:  Fiscal Years of Data Collection for the Rural Soils Network
              (continued)

                                   Cropland Samples                Noncropland Samples
                          Year in Round 1      Year in Round 2     Round 1
Census Division            1    2    3    4        1    2          (fiscal years sampled)
  State

South Atlantic
  Delaware                69   70   72   73       74               72*
  Maryland                69   70   72   73       74               69, 70, 72*
  Virginia                68*  69   70   72       73   74          68*, 69, 70, 72*
  W. Virginia             69   70   72   73       74               69, 72*
  N. Carolina             69   70   72   73       74               72*
  S. Carolina             69   70   72   73       74
  Georgia                 68*  69   70   72       73   74          68*, 69
  Florida                 69   70   72   73       74

East-South Central
  Kentucky                69   70   72   73       74               72*
  Tennessee               69   70   72   73       74
  Alabama                 69   70   72   73       74
  Mississippi             69   70   72   73       74

West-South Central
  Arkansas                69   70   72   73       74
  Louisiana               69   70   72   73       74
  Oklahoma                69   70   72   73       74
  Texas                   75*

Mountain
  Montana                 75*
  Idaho                   68*  69   72   73       74               68*, 69
  Wyoming                 69
  Colorado                69
  New Mexico              69
  Arizona                 69                                       69
  Utah                    69
  Nevada                  69

 *  These data are not on the computer files supplied by EPA.
**  Source:  Personal communications with and computer files supplied by EPA
    Field Studies Branch, Washington, D.C.

-------
selection probabilities that accompany the sampling units in each phase
are discussed below.

1.7.2.1  Sample Selection for the CNI

     The  CNI  is a  highly stratified  area  probability sample,  and its
sampling  weights are  rather easily determined.    Since  stratification
requires  that units  be  selected  in  each  stratum (subdivision  of the
population),  there  is  no choosing among strata.   If  States are strata,
we must draw a sample in every State.  If we stratify by county, we
sample  in every county, and if townships and parts of townships are also
strata  then we must sample in every such stratum.   So there is no selec-
tion probability to calculate  for strata since each  stratum  has a 100
percent chance of being selected.  Within strata,  primary sampling units
(PSU's), usually 1 or 2, were selected purely by chance, i.e., at random
with equal probabilities.

     As discussed in Section 1.2.2, all counties of the conterminous
United States that were not entirely urban were divided into townships
and sections, or pseudo-townships and pseudo-sections.  The standard
sampling procedure used strata composed of 12-section blocks (1/3 of a
township), and one quarter-section (the PSU) was drawn at random from the
48 quarter-sections in each block.  Hence, the probability of selection
was 1/48, a sampling rate of approximately 2 percent.

     Within each PSU,  sample "points" were selected by use of a perfor-
ated template, which was spun to locate sampling  points  in an unbiased
manner.  The  perforations  formed a grid pattern which  was marked on an
aerial  photograph of the PSU.  The CNI sample collected data at each of
these  sampling  points.   Among  the  information  collected was  land use
data, which was used by the RSN to classify each point as  either cropland
or noncropland.  Due to differences in PSU sizes and shapes and the spin
of the sampling template, the number of cropland points, the number of
noncropland points, and their total vary in an unpredictable, or
random, manner.  These three quantities are then random variables that
can be used in standard statistical procedures.  The RSN used these
random  variables for estimation  of proportions of cropland and noncrop-
land acreage  in each  of the States of the conterminous  United States.

     If we use the notation

               U(i,j,k) = the total number of PSU's in stratum k of
                          county j in State i,

then the probability of selecting PSU ℓ when u(i,j,k) PSU's are selected
at random from stratum k is

                              u(i,j,k) / U(i,j,k) .

It is shown in Appendix D that the selection of sampling points within
the PSU can essentially be ignored.  The resulting sampling weight for
each of the n(i,j,k,ℓ) sampling points in PSU ℓ is then

          W(i,j,k,ℓ,m)¹³ = U(i,j,k) / u(i,j,k)    for m = 1, 2, ..., n(i,j,k,ℓ).
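
As a numerical illustration of the CNI selection probability and weight, the following Python sketch works through the standard stratum of 12 sections.  It assumes simple random selection of u of the U PSU's in a stratum, so that the selection probability is u/U and the weight is its reciprocal; the function names are hypothetical.

```python
from fractions import Fraction


def psu_selection_probability(u_selected, U_total):
    """Probability that a given PSU is chosen when u_selected of the
    U_total PSU's in a stratum are drawn at random."""
    return Fraction(u_selected, U_total)


def sampling_weight(u_selected, U_total):
    """Sampling weight = reciprocal of the selection probability."""
    return 1 / psu_selection_probability(u_selected, U_total)


# Standard CNI stratum: 12 sections, each with 4 quarter-sections (PSU's).
SECTIONS_PER_STRATUM = 12
QUARTER_SECTIONS_PER_SECTION = 4
U = SECTIONS_PER_STRATUM * QUARTER_SECTIONS_PER_SECTION

if __name__ == "__main__":
    p = psu_selection_probability(1, U)
    print(p, float(p))            # selection probability and sampling rate
    print(sampling_weight(1, U))  # weight carried by each sampling point
```

Exact fractions are used so that the probability and its reciprocal stay free of rounding error.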
1.7.2.2  Sample Selection for the RSN

     The RSN is based upon a subsample of the CNI sampling points.  It
is intended to provide valid estimates for cropping regions and some of
the larger States, rather than the county level estimates available from
the CNI.   The RSN is based upon systematic subsamples, one for cropland
points  and another  for  noncropland points,  selected  from the sampling
points  of  the CNI within each State.  Each sampling point selected for
an RSN  sample is used to locate a 10-acre sample plot.
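
A systematic subsample of the kind described above can be sketched in Python as follows.  The sketch assumes a random start and a fixed sampling interval applied to an ordered list of points; the ordering of the CNI points and the interval actually used for the RSN are not given here, and the function name is hypothetical.

```python
import random


def systematic_sample(units, n, rng=random):
    """Select n units from an ordered list by taking every (N/n)-th unit
    after a random start within the first interval."""
    N = len(units)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the number of units")
    step = N / n                    # sampling interval (may be fractional)
    start = rng.random() * step     # random start in [0, step)
    # min() guards against a floating-point result equal to N.
    return [units[min(int(start + i * step), N - 1)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical State with 100 cropland points, subsample of 10.
    cropland_points = list(range(100))
    print(systematic_sample(cropland_points, 10, rng=random.Random(7)))
```

Separate calls would be made for the cropland points and the noncropland points of each State, since the two subsamples are drawn independently.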

     The RSN  cropland sample is based upon a systematic subsample of the
CNI  sampling  points  that  have  been  classified  as cropland  points  as
detailed in Section  1.2.2.   This procedure results in a sample in which
the PSU's  of  the CNI occur essentially with probability proportional to
size  (PPS),   where  "size"  is  measured  by  the  proportion of cropland
points  within the PSU.   Thus,  PSU's  containing a  higher proportion of
cropland points  are more  likely to be  selected into  the RSN cropland
sample.

     The following notation is useful for expressing the RSN sampling
weights:

          v1(i,j,k,ℓ) = number of sample cropland points in PSU ℓ.

           v(i,j,k,ℓ) = total number of sample points in PSU ℓ.

          r1(i,j,k,ℓ) = the cropland ratio for PSU ℓ (adjusted as
                        detailed in Appendix D).

               N1(i)  =  Σ r1(i,j,k,ℓ)  =  sum of the cropland ratios over
                         all units in State i.

               n1(i)  =  number of RSN cropland sample sites in State i.


The probability  that a  PSU of  the  CNI  will  be  selected  into  the RSN
sample is then essentially proportional to

                         n1(i) r1(i,j,k,ℓ) / N1(i) .
13
  Since 640-acre PSU's were sampled at one-fourth the rate of all other
sizes of PSU's, the appropriate weight for these sites is 4W(i,j,k,ℓ,m).

It is well known¹⁴ that drawing equal sized samples within PSU's selected
with probability proportional to size results in a self-weighting sample,
i.e., all ultimate sampling units have the same sampling weight.
Essentially the same phenomenon occurs with the RSN samples.  Most PSU's
of the CNI that are selected into the RSN sample receive exactly one RSN
sample  plot.    Thus,  under  the  fairly broad  assumptions detailed  in
Appendix D, the  sampling weights for the RSN  cropland sample plots are
given by

     W1(i,j,k,ℓ,m)¹⁵ = v(i,j,k,ℓ) · N1(i) / n1(i)    for m = 1, 2, ..., n(i,j,k,ℓ).

Since the total number of points, v(i,j,k,ℓ), within a PSU is essentially
constant for most States, the sample is essentially self-weighting
for most States.
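
The self-weighting property can be checked numerically with the Python sketch below.  It rests on simplifying assumptions that are not taken from Appendix D: the cropland ratio is the unadjusted ratio of cropland points to total points in the PSU, a PSU enters the RSN sample with probability n1 r1 / N1, and exactly one cropland point is kept at random in each selected PSU.  The function name and the PSU counts are hypothetical.

```python
def rsn_cropland_weights(psus, n_sites):
    """Approximate RSN cropland weights for one State.

    psus    -- list of (v, v1) pairs: total points and cropland points
               in each PSU (every PSU must have v1 >= 1)
    n_sites -- number of RSN cropland sample sites in the State
    """
    ratios = [v1 / v for v, v1 in psus]   # unadjusted cropland ratios
    N1 = sum(ratios)                      # State total of the ratios
    weights = []
    for (v, v1), r1 in zip(psus, ratios):
        p_psu = n_sites * r1 / N1         # PPS selection of the PSU
        p_point = p_psu / v1              # one cropland point kept per PSU
        weights.append(1.0 / p_point)     # equals v * N1 / n_sites
    return weights


if __name__ == "__main__":
    # Hypothetical PSU's, all with 10 points but differing cropland counts.
    print(rsn_cropland_weights([(10, 2), (10, 5), (10, 10)], n_sites=1))
```

Under these assumptions the cropland count cancels, leaving a weight that depends on the PSU only through its total number of points, which is why a constant number of points per PSU gives a self-weighting sample.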

     Details of the derivation of the sampling weights and implementation
of approximate sampling weights are found in Appendix D.  The approximate
sampling weights were calculated and included in the data set constructed
for analysis purposes, which is discussed in Appendix E.

1.7.3  Stratification

     The  two phase  sampling design of  the RSN  necessarily introduces
complexities into  the  data analysis.   The first  phase  sample,  the 1967
Conservation Needs Inventory (CNI), was a deeply stratified design.  The
second phase sample was  the systematic selection  of ultimate sampling
units from the  CNI to locate RSN sample sites.  Exact variance formulas
for estimates based  upon the RSN would be very difficult to derive, and
would include components of variance from both phases of the design.  As
is common practice in this situation, approximate variance formulas were
used  that  capture most  of the design effects  and  provide conservative
estimates of variance.   The  major design effects to be accounted for in
the  RSN  design  are the  stratification effects  derived  from the  CNI
sampling design.

     The RSN sampling  design was described  in  detail  in Section 1.2.2.
The  dimensions  of  the  stratification  in this design are  reviewed  in
Exhibit  1.7.1.   The first dimension  of stratification  in  the  CNI, and
hence  the  RSN,  consists of  the 48 States  of the  conterminous  United
States.  Within some States,  large scale geographic strata were defined.
For example, the  sandhills of Nebraska were treated  as  a stratum.  The
irrigated  agricultural areas  of many  States  were treated  as  strata.
Desert areas were treated as strata in many States.

     The designation of  large  scale  geographic strata within States was
usually  accompanied  by the  use  of  different sizes of PSU's  in  the CNI
14
  See,  for  example, Kendall,  M.G.  and Stuart, A.  [1968,  pg  195].   The
Advanced Theory of Statistics, Vol. 3.  Hafner, New York.
15
  Or 4W (i,j,k,£) for 640-acre PSU's.  (Recall footnote 1).

-------
                                                             *
          Exhibit 1.7.1:  Dimensions of the RSN Sample Design

I.   Phase One Sample - 1967 Conservation Needs Inventory (CNI).

     A.   Dimensions of deep stratification.

          1.   States of the 48 conterminous United States.
          2.   Large scale geographic strata, e.g., sandhills, irrigated
               areas, etc.
          3.   Counties that are not entirely urban (crossed with the
               large scale geographic strata to form smaller sub-county
               strata).
          4.   Townships or pseudo-townships within counties or sub-county
               strata.
          5.   Strata generally composed of 48 PSU's each within townships
               or pseudo-townships.

     B.   Phase One Sample Selection

          1.   Usually one PSU was selected from each ultimate stratum.
          2.   A template was used to assign a randomly aligned two-
               dimensional sample of SSU's within each sampled PSU (the
               number of SSU's assigned was usually proportional to
               PSU size).

II.  Phase Two Sample - Rural Soils Network (RSN) subsamples

     A.   Systematic subsamples of the ultimate sampling units, SSU's,
          from the first phase sample were used to locate the 10-acre
          RSN sample sites.

     *
      Source:  Documents from and personal communications with both the
EPA Field Studies Branch at Washington, D.C. and the Statistical Laboratory
at Iowa State University.

-------
sample.  The irrigated strata were generally very heterogeneous and were
of special interest.  Thus, 40-acre PSU's were usually used in these
strata.  It appears that all 40-acre PSU's were assigned to irrigated
strata.  In addition, the CNI sometimes employed 160-acre PSU's in the
irrigated strata.  For analysis of the RSN data, a stratum was defined
within each State which consisted of all sites in 40-acre PSU's, as well
as all sites in 160-acre PSU's which fell within an irrigated stratum of
the CNI.  Sites within 40-acre PSU's are given by Tables D-4 and D-5 in
Appendix D.  The sites in 160-acre PSU's used in irrigated strata are
shown in Table 1.7.2.

     The sandhills  stratum in Nebraska  was a  homogeneous  stratum, and
640-acre PSU's were used throughout.  Geographically homogeneous strata,
such as  desert lands,  were also  defined  in the  States  of New Mexico,
South Dakota,  Utah, and Wyoming.  Apparently,  640-acre  PSU's were used
exclusively within  these  strata  as  well.  Moreover, a  geographically
homogeneous stratum was  also defined in Maine.  Both  200-acre and 400-
acre PSU's were  used  in this stratum for  Maine.   Thus,  for analysis of
the RSN data,  a stratum was defined within each State which consisted of
all sites  in the 200,  400, or 640 acre PSU's.  The  sites  within these
oversized PSU's are given by Tables D-4 and D-5 of Appendix D.

     All RSN  sites  of  a  State  that were not classified as  being in
either  of   the  two large  scale  geographic  strata  just  defined  were
considered to be  in the "remainder" stratum of that  State.  For States
that  contained PSU's  of only one size and  no irrigated  stratum,  all
sites were considered  to be in the "remainder" stratum,  which was then
identical to the State stratum itself.  All States in Tables D-2 and D-3
of Appendix D fell into this category, except for Oregon and Idaho (see
Table 1.7.2).
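
The stratum definitions above can be summarized as a small decision rule.  The following is a minimal sketch only, assuming each site record carries its PSU size in acres and a flag for membership in a CNI irrigated stratum; the function name, argument names, and stratum labels are hypothetical and do not come from the RSN data files.

```python
def analysis_stratum(psu_acres: int, in_cni_irrigated_stratum: bool) -> str:
    """Assign an RSN site to a within-State analysis stratum.

    Hypothetical helper: the argument names and the returned labels are
    illustrative assumptions, not fields of the actual RSN data set.
    """
    # All 40-acre PSU's were assigned to irrigated strata.
    if psu_acres == 40:
        return "irrigated"
    # 160-acre PSU's count as irrigated only if they fell within an
    # irrigated stratum of the CNI.
    if psu_acres == 160 and in_cni_irrigated_stratum:
        return "irrigated"
    # Oversized PSU's (200, 400, or 640 acres) mark the geographically
    # homogeneous strata, e.g., sandhills or desert lands.
    if psu_acres in (200, 400, 640):
        return "oversized"
    # Everything else falls in the "remainder" stratum of the State.
    return "remainder"


if __name__ == "__main__":
    for acres, irrigated in [(40, False), (160, True), (160, False), (640, False)]:
        print(acres, irrigated, analysis_stratum(acres, irrigated))
```

For a State with PSU's of only one size and no irrigated stratum, every site falls through to "remainder", which matches the statement that the remainder stratum is then identical to the State stratum.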

1.7.4   Analysis

     Several types of analyses  are of interest for the RSN data, notably:

     (1)  Estimation of  base  levels  for residues of  toxic substances,
     (2)  Estimation of changes in mean levels of toxic substance residues
          from the first round to the second round of data collection,
          and
     (3)  Estimation  of relationships   between soil  and  crop  residue
          levels.

The reason  for analyzing  the  RSN  data  in this  study was  to  obtain a
measure of precision of residue data based upon the present data collec-
tion effort.   It  was  decided  that estimation of base levels of residues
would be sufficient.  In particular, estimation of levels was undertaken
for the first round soil data only.

     It was found that the data values  for most compounds were predomi-
nantly zeros.  In fact,  Tables 1.7.3 and  1.7.4 list  numerous compounds
for which no detectable levels  were found in the cropland and noncropland
soils, respectively.

-------
             Table 1.7.2:
RSN Sites in Counties Having Both Irrigated
and Remainder Strata, but only 160-acre PSU's
State Name
(State Code)
Arizona (04)
New Mexico (35)
Oregon (41)
Idaho (16)
County Name
(County Code)
Apache (001)
Cochise (003)
Curry (009)
Hidalgo (023)
Roosevelt (041)
Torrance (057)
Crook (013)
Grant (023)
Lane (037)
Malheur (045)
Ada (001)
Adams (003)
Irrigated Stratum.
Site Numbers
1
3
5
8
10
78,150
81,154
16,17,90,91,162, 163
20-22,94,96,166,167,169
1,64
127
Remainder Stratum
Site Numbers
10-13
14,15
2
9
4
8
95,168
97
               Bannock (005)
               Bear Lake (007)
               Bingham (011)
               Blaine (013)
               Bonneville (019)
               Butte (023)
               Caribou (029)
               Cassia (031)
               Clark (033)
               Custer (037)
               Elmore (039)
               Franklin (041)
               Fremont (043)
               Gem (045)
               Kootenai (055)
               Lemhi (059)
               Lincoln (063)
               Madison (065)
               Oneida (071)
               Owyhee (073)
               Payette (075)
               Power (077)
               Teton (081)
               Twin Falls (083)
               Valley (085)
               Washington (087)
        4,5,193

        69,102,133,195,196
        134
        199
        13,75,76,105,202
        77
        79,80
        143
        147
        211
        24
        213
        28,153
        120,154
        217
        93,156
        31
        32,95,158,220,221
        96
        159
2,65,190
3,98,128,191
67,68,99,130
100,131
7,8,70,132
103
11,12,74,104,137,200
138,139,201
14
107-109,203
15,78,110
16,141,204
17,111,142,205

84
117,118

25,87,150
90,216
91,121,122

29,30,92,155,218
94,157
33
125,126
222
 Only sites that were surveyed by the RSN have been classified.  Classification of
all sites in these counties would require considerably more effort.
*
 Source:  CNI site numbers corresponding to the RSN site numbers were obtained from
the EPA Field Studies Branch, Washington, D.C.  The stratum classification for each
of these CNI sites was obtained from the Statistical Laboratory of Iowa State
University.

-------
  Table 1.7.3:  Compounds with No Detectable Levels in Cropland Soils*

               Compound                           Sample Size

               Alachlor                              6071
               Photodieldrin                         6071
               Benzene Heptachloride                 6071
               Mirex                                 6071
               Prolan                                2846
               Bulan                                 2846
               Gamma Chlordane                         37
               Folex                                 2341
*
 Source:  Computer files supplied by EPA Field Studies Branch,
Washington, D.C.

-------
            Table 1.7.4:
      Compounds with No Detectable Levels
      in Noncropland Soils*
Compound
Sample Size
Alachlor                 238
DCPA                     238
o,p'-TDE                 238
Photodieldrin            238
Endosulfan I             238
Endosulfan II            238
Endrin                   238
Endrin Aldehyde          238
Endrin Ketone            238
Heptachlor               238
Isodrin                  238
Lindane                  238
Benzene Heptachloride    238
Methoxychlor             238
PCNB                     238
Propachlor               238
Ronnel                   238
Trifluralin              238
Mirex                    238
Ovex                     238
PCB                      238
   Compound
Sample Size
   Bulan
   Gamma Chlordane
   Carbophenothion
op
   Diazinon
°P Ethion
   Folex
o-P Malathion
o? Methyl Parathion
op Ethyl Parathion
oPPhorate
   2,4-D
   Atrazine
                                             2
                                             0
                                             2
                                             2
                                             2
                                             2
                                             2
                                             2
                                             2
                                             2
                                             2
                                             1
                                             9
 Source:  Computer files supplied by EPA Field Studies Branch,
Washington, D.C.

  Rarely tested class of chemicals.

-------
     It is also evident from these and subsequent tables in this section
that  some classes  of  compounds  were  tested for  more regularly  than
others, which  raises questions  about what generalizations  can be  made
from this data.  It would be of interest to know what criteria were used
to determine whether or not a test would be performed.

     Moreover, the exclusion of some States from the sample restricts the
population to  which inferences  are valid.   It  can be  seen from Table
1.7.1  that nearly  complete data exists for some Census Divisions, while
there is very little data for others.

     The  predominance   of  zero values  in the  residue data  results  in
J-shaped  distributions  for  the  amount  of   residue  detected  for  most
compounds.   This  type  of  data  presents  some  analysis problems.   For
example,  the weighted  mean of the raw data values has little meaning if
most  values  are  zero  and  a few  are large.   Thus,  some  type  of  data
transformation  is  generally  required in  order  to obtain  a meaningful
analysis [see Lucas, et al., Recommendations for the National Surface
Water Monitoring Program for Pesticides.  Report No. RTI/1864/01-02I].
Ideally, each compound should be considered individually to determine an
appropriate transformation, if any.  A ubiquitous compound like arsenic
may not require a transformation.  The analysis of the first round soil
data was computed on three scales:  (1) the raw data, (2) a logarithmic
scale, and (3) a proportion scale.  The raw data values exceeding the
minimum detectable  level  (MDL)  were also analyzed  as a  separate  data
set.  The results are shown in Tables 1.7.5 through 1.7.9.

     Extensive analyses were not considered appropriate for compounds
for which there were few detections, i.e., few observations in excess of
the minimum detectable level (MDL).  The analyses for these compounds are
presented in Tables  1.7.5  and 1.7.6 for cropland and noncropland soils,
respectively.  Each  of these tables  contains the following information
for the compounds represented:

     (1)  The sample size, i.e., number of sites for which the
          presence of the compound was tested,
     (2)  The number of data values exceeding the minimum detectable
          level,
     (3)  The largest amount of the compound detected at any one site
          in parts per million (ppm), and
     (4)  The weighted average,

               \frac{\sum_i w_i x_i}{\sum_i w_i} ,

          of the detections in ppm, where the sampling weights are
          represented by w_i and the detections (amounts exceeding the
          MDL) are denoted by x_i.
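
As an illustration of items (2) and (4), the count of detections and their weighted average can be computed as follows.  This is a minimal sketch, assuming the residue amounts and sampling weights are available as parallel lists; the function name and the example values are hypothetical.

```python
def weighted_mean_of_detections(values, weights, mdl):
    """Return (number of detections, weighted average of the detections).

    A detection is a value strictly greater than the minimum detectable
    level (MDL).  Only detections enter the numerator and denominator.
    Returns (0, None) when no value exceeds the MDL.
    """
    numerator = 0.0
    denominator = 0.0
    n_detected = 0
    for x, w in zip(values, weights):
        if x > mdl:
            numerator += w * x
            denominator += w
            n_detected += 1
    if n_detected == 0:
        return 0, None
    return n_detected, numerator / denominator


if __name__ == "__main__":
    # Hypothetical amounts (ppm) and sampling weights for four sites.
    amounts = [0.0, 30.0, 0.0, 10.0]
    site_weights = [1.0, 1.0, 1.0, 3.0]
    print(weighted_mean_of_detections(amounts, site_weights, mdl=0.0))
```

When only one site exceeds the MDL, the weighted average equals the maximum, which is the pattern seen for compounds with a single detection in Tables 1.7.5 and 1.7.6.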
     For the analyses  on the logarithmic scale, the data values, say x,
were transformed to log  (x+1).  This is a transformation often found to
be useful for stabilizing the variances of data that consist of positive

-------
integers covering a  wide  range.16  The presence of many zero values for
most of  the compounds  makes  this transformation  of  questionable value
for such compounds.   For  presentation of the findings  on  this scale in
Tables 1.7.7 through 1.7.9,  the results have been transformed again to
the original scale.   In particular, if y represents the weighted mean of
the log-transformed data,  the value reported is given by

          \bar{x}_g  =  \operatorname{antilog}(y) - 1 ,


which  bears a  strong  analogy  to the  geometric  mean.  Actually,  the
geometric mean is identically zero when any of the data values are zero.
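
The back-transformed log-scale statistic can be sketched in a few lines.  This is an illustration only: the base of the logarithm is not stated above, so natural logarithms are assumed here (the result does not depend on the base as long as the antilog uses the same one), and the function name is hypothetical.

```python
import math


def back_transformed_log_mean(values, weights):
    """Antilog of the weighted mean of log(x + 1), minus 1.

    Assumes natural logarithms; any base gives the same result provided
    the antilog is taken in that same base.
    """
    total_weight = sum(weights)
    y = sum(w * math.log(x + 1.0) for x, w in zip(values, weights)) / total_weight
    return math.exp(y) - 1.0


if __name__ == "__main__":
    # Hypothetical data: three zeros and one large value, equal weights.
    amounts = [0.0, 0.0, 0.0, 99.0]
    site_weights = [1.0, 1.0, 1.0, 1.0]
    print(back_transformed_log_mean(amounts, site_weights))  # positive
    # The ordinary geometric mean of the same data is zero because it
    # contains zeros, whereas the shifted statistic is not.
```

With mostly zero values the statistic is pulled strongly toward zero, which illustrates why the transformation is of questionable value for compounds that are rarely detected.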

     For analyses  on the  proportion scale, all  data values  above the
minimum detectable level  (MDL)  were  replaced by the  value  one (so that
their  sum  is the number of positive values).  The weighted mean on this
scale  is a weighted  estimate of the proportion of the sampled land area
with a residue  level in excess of the MDL.  Since this scale is felt to
be the most appropriate for analysis of the  residue  data,  the standard
error  and  the  design effect for the  estimated  proportion  are also pre-
sented in Tables 1.7.7 through 1.7.9.  The statistical approach used for
computation of  the standard  errors and design effects was a first-order
Taylor series approximation as implemented in computer software developed
by  RTI  for  analysis  of  nested probability  samples  [See  SESUDAAN:
Standard Errors  Program for  Computing of Standardized Rates from Sample
Survey Data.  Report No. RTI/1789/00-01F].
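
To make the proportion-scale quantities concrete, the sketch below computes a weighted proportion, a first-order Taylor series (linearization) standard error, and a design effect.  It is not the SESUDAAN program: it is a generic textbook form that assumes sites are treated as sampling units drawn with replacement within strata, and that the design effect is taken relative to the simple random sampling variance p(1-p)/(n-1).  All names are hypothetical.

```python
from collections import defaultdict
import math


def proportion_with_deff(records):
    """records: iterable of (stratum, weight, detected), detected in {0, 1}.

    Returns (p, standard_error, design_effect).  Every stratum must hold
    at least two records, otherwise its variance contribution is undefined.
    """
    records = list(records)
    n = len(records)
    total_weight = sum(w for _, w, _ in records)
    p = sum(w * d for _, w, d in records) / total_weight

    # Linearized values of the ratio estimator, grouped by stratum.
    by_stratum = defaultdict(list)
    for stratum, w, d in records:
        by_stratum[stratum].append(w * (d - p) / total_weight)

    variance = 0.0
    for z in by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two records")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)

    srs_variance = p * (1.0 - p) / (n - 1)
    deff = variance / srs_variance if srs_variance > 0 else float("nan")
    return p, math.sqrt(variance), deff


if __name__ == "__main__":
    # Hypothetical data: one stratum, equal weights, one detection in four.
    data = [("A", 1.0, 1), ("A", 1.0, 0), ("A", 1.0, 0), ("A", 1.0, 0)]
    print(proportion_with_deff(data))
```

The ValueError for single-record strata reflects the reason that strata with only one sampling unit had to be combined before standard errors could be estimated.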

     Estimation of standard errors and design effects required that some
of  the strata  defined in Section  1.7.3  be combined.  In particular,
strata that received  only one  sampling unit  had to be  combined with
other strata to produce valid estimates of sampling variances.  In order
to determine where this was necessary, the RSN records were first sorted
by States,  by  large  scale geographic strata within States,  and finally
by counties within   large  scale geographic  strata (See Exhibit 1.7.1).
When  a  stratum defined  by these  three  levels   of  sorting  (i.e.,  an
individual county portion of a large scale geographic stratum) contained
only a single  round one   soil  record,  this stratum  was placed  into  a
"residual  county" stratum created  within the  large  scale  geographic
stratum.    Recall  that  the States having no large scale geographic stra-
tification can be thought of as a single large scale geographic stratum.
Finally,   whenever a "residual  county" stratum  within  a large  scale
geographic stratum consisted of only a single Round One soil record, the
stratum identification  of the  record in this "residual county" stratum
was changed to  that  of an arbitrary  county  within the same large scale
geographic stratum.   The goal of this strategy was to achieve the maximum
possible benefits from the CNI stratification for estimation of standard
errors and design effects.
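
The collapsing rule just described can be sketched as follows.  This is a minimal illustration, assuming one (State, large scale geographic stratum, county) tuple per Round One soil record, with States lacking geographic stratification coded as a single stratum.  The label "RESIDUAL" and the choice of the first county in sorted order as the "arbitrary" county are assumptions made for the example.

```python
from collections import Counter


def collapse_single_record_strata(records):
    """records: list of (state, geo_stratum, county) tuples, one per record.

    Returns a parallel list of variance-stratum labels in which county
    strata holding a single record are pooled into a residual stratum.
    """
    county_counts = Counter(records)
    labels = []
    for state, geo, county in records:
        if county_counts[(state, geo, county)] == 1:
            # Single-record county portion: place in the residual stratum.
            labels.append((state, geo, "RESIDUAL"))
        else:
            labels.append((state, geo, county))

    label_counts = Counter(labels)
    for i, (state, geo, county) in enumerate(labels):
        if county == "RESIDUAL" and label_counts[(state, geo, county)] == 1:
            # A residual stratum with one record is merged into an
            # arbitrary county stratum of the same geographic stratum.
            candidates = sorted(
                c for (s, g, c) in label_counts
                if s == state and g == geo and c != "RESIDUAL"
            )
            if candidates:
                labels[i] = (state, geo, candidates[0])
    return labels


if __name__ == "__main__":
    sample = [
        ("NE", "sandhills", "001"), ("NE", "sandhills", "001"),
        ("NE", "sandhills", "003"),
        ("NE", "remainder", "005"), ("NE", "remainder", "007"),
        ("NE", "remainder", "009"), ("NE", "remainder", "009"),
    ]
    for label in collapse_single_record_strata(sample):
        print(label)
```

A geographic stratum that contains only one record in total is left unchanged by this sketch, since there is no other county stratum to merge it with.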

     Since  it was not possible to account for all dimensions of the CNI
stratification  (See  Exhibit  1.7.1), the  standard errors  computed are
16
  See page 157 of Steel, R.G.D. and Torrie, J.H. [1960].  Principles
and Procedures of Statistics.  McGraw-Hill, New York.

-------
    Table 1.7.5:  Statistics for Compounds with Few Detectable Levels
                  in Cropland Soils for Round One*
Compound                      n 1/    n+ 2/     Max 3/        x̄+ 4/
DCPA                          6071        3       1190       632.92
Dicofol                       6071       16       2150       370.40
Endosulfan I                  6071        7        240        95.83
Endosulfan II                 6071       15       1240       172.10
Endosulfan Sulfate            6071       18       2070       343.85
Endrin Aldehyde               6071        1         30        30.00
Endrin Ketone                 6071       10        380        98.19
Lindane                       6071       21        350        51.92
Methoxychlor                  6071        1        280       280.00
PCNB                          6071        4       2610      1103.87
Propachlor                    6071        5        100        80.27
Ronnel                        6071        1        190       190.00
Ovex                          6071        1       1130      1130.00
PCB                           6071        2       1490      1130.98
Carbophenothion               2341        1        230       230.00
DEF                           2341        9        670       272.63
Diazinon                      2341        9        170        82.01
Ethion                        2341        3        240       107.95
Malathion                     2341        5        360       163.26
Methyl Parathion              2341        1         10        10.00
Ethyl Parathion               2341       18       3010       296.05
Phorate                       2341       10        400        76.16
2,4-D                          188        3         30        17.26
1/ Sample size.
2/ Number of occurrences above the MDL.
3/ Maximum amount detected (ppm).
4/ Weighted average of the data values in excess of the MDL (ppm).
*
 Source:  Computer files supplied by EPA Field Studies Branch,
Washington, D.C.

-------
       Table 1.7.6:  Statistics for Compounds with Few Detectable

                     Levels in Noncropland Soils for Round One*
Compound                      n 1/    n+ 2/     Max 3/        x̄+ 4/
Aldrin                         238        1         20        20.00
Chlordane                      238        5        500       200.34
o,p'-DDE                       238        2         30        24.57
o,p'-DDT                       238        8         50        20.43
o,p'-TDE                       238        7        180        45.47
Dicofol                        238        2        290       138.00
Dieldrin                       238       10         90        29.00
Endosulfan Sulfate             238        1         80        80.00
Heptachlor Epoxide             238        2         10        10.00
Toxaphene                      238        1        520       520.00

1/ Sample size.
2/ Number of occurrences above the MDL.
3/ Maximum amount detected (ppm).
4/ Weighted average of the data values in excess of the MDL (ppm).




*

 Source:  Computer files supplied by EPA Field Studies Branch,

Washington, D.C.

-------
            Table  1.7.7:   Statistics  for Compounds with Detectable Levels in Noncropland Soils  for Round One

Compound                 n 1/     Max 2/      x̄+ 3/       x̄ 4/      x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
p,p'-DDE                  238        310      37.02       3.51       0.35         0.09      0.02      0.92
p,p'-DDT                  238        230      54.12       3.49       0.26         0.06      0.02      1.11
Arsenic                   233     54,170   3,957.92   3,772.27   1,618.71         0.95      0.02      1.32
Atrazine
                                                                                                     (continued)

1/ Sample size.
2/ Maximum amount detected (ppm).
3/ Weighted average of the data values in excess of the MDL (ppm).
4/ Weighted average of the amount detected (ppm).
5/ Antilog (weighted average of log (amount + 1)) - 1; analogous to the geometric mean (ppm).
6/ Weighted proportion of cases with data values in excess of the MDL.
7/ Standard deviation of the estimated proportion.
8/ Design effect for the estimated proportion.


*

 Source:  Computer files supplied by the EPA Field Studies Branch, Washington, D.C.

-------
Table 1.7.8:  Statistics for Compounds with Detectable  Levels  in Cropland Soils by  Census  Division for Round One





                                        Aldrin

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071     13,280     219.65      23.06      .54         0.11      0.00      0.79
New England                72        280     280.00       4.02      .08         0.01      0.01      1.05
Middle Atlantic           296        150      90.89        .60      .03         0.01      0.00      1.01
East-North Central       1595     13,280     277.06      61.89     1.59         0.22      0.01      0.79
Pacific                   505        170      54.91        .99      .07         0.02      0.01      1.03
West-North Central       1943      4,250     166.47      17.56      .54         0.11      0.01      0.79
South Atlantic            482        570     123.23       4.10      .14         0.03      0.01      0.89
East-South Central        429        420     110.00       2.76      .10         0.03      0.01      1.10
West-South Central        546         60      20.72        .68      .10         0.03      0.01      0.96
Mountain                  203         20      20.00        .10      .01         0.00      0.00      0.99
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in Cropland Soils  by  Census  Division for Round One
                                                   (continued)
                                        Chlordane

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071     13,340     645.24      56.74      .63         0.09      0.00      0.93
New England                72      2,200     693.19      43.30      .45         0.06      0.03      1.07
Middle Atlantic           296      3,190     596.78      26.26      .28         0.04      0.01      0.93
East-North Central       1595      6,980     809.89     120.76     1.48         0.15      0.01      0.87
Pacific                   505      2,460     527.37      14.71      .15         0.03      0.01      0.97
West-North Central       1943      8,040     489.02      40.96      .55         0.08      0.01      0.86
South Atlantic            482     13,340     655.08      65.30      .68         0.10      0.01      0.89
East-South Central        429      7,890     753.98      35.67      .30         0.05      0.01      0.93
West-South Central        546        260     116.26       1.26      .05         0.01      0.00      1.19
Mountain                  203        480     164.88      11.32      .38         0.07      0.03      2.32
                                                                                                (continued)

-------
    Table 1.7.8:  Statistics for Compounds with Detectable Levels in Cropland Soils by Census Division for Round One
                                                       (continued)
                                        o,p'-DDE

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071        510      45.90        .98      .07         0.02      0.00      0.83
New England                72         30      30.00        .12      .01         0.00      0.00      0.29
Middle Atlantic           296        100      40.86       1.35      .12         0.03      0.01      1.11
East-North Central       1595        510     109.60        .62      .02         0.01      0.00      1.00
Pacific                   505        380      51.53       5.02      .40         0.10      0.01      0.86
West-North Central       1943         90      27.37        .06      .01         0.00      0.00      0.86
South Atlantic            482        140      29.23       2.00      .23         0.07      0.01      0.98
East-South Central        429         80      32.41       1.66      .19         0.05      0.01      0.93
West-South Central        546        250      67.24       1.42      .08         0.02      0.01      1.02
Mountain                  203         70      35.00       0.16      .02         0.00      0.00      0.12
                                                                                                    (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in  Cropland  Soils  by  Census  Division for Round One
                                                   (continued)
Census Division         P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                     0.20      0.00      0.56
New England                   0.34      0.05      0.72
Middle Atlantic               0.28      0.03      0.99
East-North Central            0.08      0.01      0.90
Pacific                       0.46      0.02      0.57
West-North Central            0.06      0.00      0.84
South Atlantic                0.59      0.02      0.79
East-South Central            0.51      0.02      0.55
West-South Central            0.27      0.02      0.64
Mountain                      0.19      0.02      0.66
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable  Levels  in  Cropland  Soils  by Census Division for Round One
                                                   (continued)
                                        p,p'-DDT

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071    245,180    1044.70     187.17     1.51         0.18      0.00      0.56
New England                72      4,650     850.87     253.33     4.81         0.30      0.04      0.56
Middle Atlantic           296    245,180    5890.52    1527.29     2.74         0.26      0.03      0.99
East-North Central       1595     35,920    1610.18      97.31      .34         0.06      0.01      0.93
Pacific                   505     19,750     783.42     297.64     6.24         0.38      0.02      0.62
West-North Central       1943      1,420     127.08       7.80      .30         0.06      0.00      0.83
South Atlantic            482     20,260     582.11     318.45    17.36         0.55      0.02      0.73
East-South Central        429     16,070     967.43     478.18    14.64         0.49      0.02      0.52
West-South Central        546     15,860    1002.76     252.82     2.90         0.25      0.01      0.56
Mountain                  203      3,230     226.55      38.88     1.05         0.17      0.02      0.76
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable  Levels  in  Cropland Soils by Census Division for Round One
                                                   (continued)
                                        o,p'-DDT

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071     32,750     307.91      35.94      .67         0.12      0.00      0.58
New England                72        860     169.43      38.13     1.80         0.23      0.03      0.45
Middle Atlantic           296     32,750    1552.14     262.43     1.14         0.17      0.02      0.93
East-North Central       1595      8,210     797.32      20.47      .14         0.03      0.00      0.87
Pacific                   505      4,510     205.01      56.99     2.35         0.28      0.02      0.74
West-North Central       1943        410      46.63       1.31      .09         0.03      0.00      1.00
South Atlantic            482      4,180     171.33      67.89     4.64         0.40      0.02      0.76
East-South Central        429      1,790     233.35      83.90     4.31         0.36      0.02      0.55
West-South Central        546      5,620     374.76      63.94     1.23         0.17      0.01      0.49
Mountain                  203        290      66.11       5.84      .38         0.09      0.02      0.69
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in  Cropland  Soils by Census  Division for Round One
                                                   (continued)
                                        p,p'-TDE

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071     38,460     349.24      31.78      .46         0.09      0.00      0.73
New England                72      8,200     616.34     156.40     2.35         0.25      0.04      0.70
Middle Atlantic           296     38,460    1978.77     255.99      .82         0.13      0.02      1.00
East-North Central       1595     31,430     859.24      25.67      .14         0.03      0.00      0.95
Pacific                   505     20,130     357.52      68.26     1.27         0.19      0.02      0.79
West-North Central       1943        500      32.42        .63      .06         0.02      0.00      1.09
South Atlantic            482      7,470     177.33      63.33     3.57         0.36      0.02      0.85
East-South Central        429      1,250     135.30      31.50     1.64         0.23      0.02      0.94
West-South Central        546      1,670     159.58      21.10      .72         0.13      0.01      0.68
Mountain                  203        150      38.19       1.86      .17         0.05      0.01      0.77
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels in Cropland Soils by Census Division for Round One'
                                                   (continued)
                                        o,p'-TDE

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071     16,790     387.39       5.71      .06         0.01      0.00      0.86
New England                72         50      50.00        .72      .06         0.01      0.01      1.06
Middle Atlantic           296     16,790    2156.69      91.14      .27         0.04      0.01      0.87
East-North Central       1595      1,300     206.48       1.01      .02         0.00      0.00      0.98
Pacific                   505      4,520     252.80      13.76      .27         0.05      0.01      0.93
West-North Central       1943        100     100.00        .04     0.00         0.00      0.00      0.81
South Atlantic            482      1,350     124.03       9.00      .36         0.07      0.01      1.03
East-South Central        429        490     138.89       2.56      .08         0.02      0.01      1.00
West-South Central        546        210     150.00        .20      .01         0.00      0.00      0.37
Mountain                  203         10      10.00        .14      .03         0.01      0.00      0.32
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in Cropland Soils  by Census  Division for Round One
                                                   (continued)
                                        Dieldrin

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071      9,830     150.35      41.14     2.22         0.27      0.01      0.84
New England                72      4,640   1,087.94     123.64      .79         0.11      0.04      1.06
Middle Atlantic           296      9,830     284.49      60.22     1.29         0.21      0.02      0.92
East-North Central       1595      6,180     196.21      72.36     4.58         0.37      0.01      0.71
Pacific                   505      2,150     126.37      20.69      .93         0.16      0.02      0.92
West-North Central       1943      1,620     113.45      32.92     2.35         0.29      0.01      0.84
South Atlantic            482      1,850     175.57      43.34     1.77         0.25      0.02      0.83
East-South Central        429        650      61.72      13.05     1.08         0.21      0.02      1.04
West-South Central        546        270      70.73       9.42      .68         0.13      0.01      0.66
Mountain                  203        610      61.70      11.69      .95         0.19      0.03      1.55
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable  Levels  in  Cropland  Soils by  Census  Division for Round One
                                                   (continued)
                                        Endrin

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071      2,130     142.72       1.73      .05         0.01      0.00      0.81
New England                72        150     150.00       2.17      .07         0.01      0.01      1.06
Middle Atlantic           296        560     313.43       3.32      .06         0.01      0.00      0.21
East-North Central       1595         20      14.89        .02     0.00         0.00      0.00      0.98
Pacific                   505        160      49.22       1.54      .12         0.03      0.01      0.86
West-North Central       1943         80      26.53        .15      .02         0.01      0.00      0.79
South Atlantic            482      2,130     347.11      12.28      .17         0.04      0.01      1.00
East-South Central        429        640     141.47       4.07      .13         0.03      0.01      0.73
West-South Central        546        480     101.57       2.21      .09         0.02      0.01      0.92
Mountain                  203        220      33.43        .57      .05         0.02      0.01      0.90
                                                                                                (continued)

-------
    Table 1.7.8:  Statistics for Compounds with Detectable Levels in Cropland Soils by Census Division for Round One
                                                       (continued)
                                        Heptachlor

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071      1,710     101.01       4.78      .20         0.05      0.00      0.81
New England                72         40      25.00        .72      .09         0.03      0.02      1.07
Middle Atlantic           296         10      10.00        .04      .01         0.00      0.00      1.17
East-North Central       1595      1,370     102.76      12.23      .57         0.12      0.01      0.84
Pacific                   505         20      20.00        .04      .01         0.00      0.00      0.98
West-North Central       1943      1,710     109.97       3.99      .14         0.04      0.00      0.74
South Atlantic            482        340      93.18       1.56      .06         0.02      0.01      1.02
East-South Central        429         70      18.30        .34      .05         0.02      0.01      1.03
West-South Central        546         10      10.00        .02      .01         0.00      0.00      1.30
Mountain                  203        260     140.00        .34      .01         0.00      0.00      0.21
                                                                                                    (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable  Levels  in  Cropland  Soils by  Census  Division for Round One
                                                   (continued)
                                        Heptachlor Epoxide

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071      1,080      54.59       4.24      .31         0.08      0.00      0.92
New England                72         60      32.64       1.91      .22         0.06      0.03      1.12
Middle Atlantic           296         60      24.75        .83      .11         0.03      0.01      0.86
East-North Central       1595      1,080      69.56       9.21      .65         0.13      0.01      0.84
Pacific                   505         70      18.46        .51      .08         0.03      0.01      1.00
West-North Central       1943        330      43.16       3.55      .31         0.08      0.01      0.85
South Atlantic            482        180      41.02       2.97      .27         0.07      0.01      0.94
East-South Central        429        720      96.30       2.70      .11         0.03      0.01      1.03
West-South Central        546         10      10.00        .02      .01         0.00      0.00      1.30
Mountain                  203         50      37.65       1.61      .16         0.04      0.02      2.42
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in  Cropland  Soils  by  Census  Division for Round One
                                                   (continued)
                                        Isodrin

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071        180      21.68        .16      .02         0.01      0.00      0.96
New England                72          0        0.0        0.0      0.0          0.0       0.0      1.00
Middle Atlantic           296          0        0.0        0.0      0.0          0.0       0.0      1.00
East-North Central       1595        180      23.17        .51      .06         0.02      0.00      0.99
Pacific                   505          0        0.0        0.0      0.0          0.0       0.0      1.00
West-North Central       1943         50      18.99        .08      .01         0.00      0.00      0.82
South Atlantic            482          0          0          0        0          0.0       0.0      1.00
East-South Central        429         10      10.00        .05      .01         0.01      0.00      1.08
West-South Central        546          0        0.0        0.0      0.0          0.0       0.0      1.00
Mountain                  203          0        0.0        0.0      0.0          0.0       0.0      1.00
                                                                                                (continued)

-------
Table 1.7.8:   Statistics for Compounds  with Detectable Levels  in  Cropland  Soils  by Census  Division for Round One
                                                   (continued)
                                        Toxaphene

Census Division          n 1/     Max 2/      x̄+ 3/       x̄ 4/    x̄g 5/   P(>MDL) 6/   S.D. 7/   DEFF 8/
Total RSN                6071     36,330   3,562.56     129.98      .32         0.04      0.00      0.64
New England                72          0        0.0        0.0      0.0          0.0       0.0      1.00
Middle Atlantic           296          0        0.0        0.0      0.0          0.0       0.0      1.00
East-North Central       1595          0        0.0        0.0      0.0          0.0       0.0      1.00
Pacific                   505      8,300   2,225.71     208.16      .99         0.09      0.01      0.76
West-North Central       1943      5,970   3,031.10       5.08      .01         0.00      0.00      0.81
South Atlantic            482     18,100   3,012.79     423.65     1.89         0.14      0.01      0.84
East-South Central        429     21,000   3,460.30     629.80     2.97         0.18      0.02      0.71
West-South Central        546     36,330   7,271.25     519.17      .80         0.07      0.01      0.66
Mountain                  203      4,960   3,398.33      47.19      .12         0.01      0.01      0.51
                                                                                                (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in  Cropland  Soils  by  Census  Division for Round One
                                                   (continued)
Trifluralin

Census Division           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071     1,860      99.33       3.20      0.14       0.03    0.00    0.80
New England                 72         0        0.0        0.0       0.0        0.0     0.0    1.00
Middle Atlantic            296       140      92.95       1.12      0.05       0.01    0.01    0.93
East-North Central        1595       600      90.40       2.11      0.11       0.02    0.00    0.99
Pacific                    505     1,290     159.72       4.05      0.11       0.03    0.01    0.97
West-North Central        1943       680      94.74       2.42      0.12       0.03    0.00    0.69
South Atlantic             482     1,860     122.55       6.67      0.23       0.05    0.01    0.85
East-South Central         429       270      76.00       7.45      0.48       0.10    0.01    0.73
West-South Central         546       370     118.86       4.57      0.19       0.04    0.01    0.81
Mountain                   203       240      97.07       1.90      0.08       0.02    0.01    0.91
                                                                                                (continued)

-------
     Table  1.7.8:  Statistics for Compounds with Detectable Levels in Cropland Soils by Census Division for Round One
                                                       (continued)
Arsenic

Census Division           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 4690   180,420   5,869.29   5,665.15  2,863.07       0.97    0.00    1.44
New England                 59    69,100  10,649.32  10,462.80  4,913.77       0.98    0.02    1.05
Middle Atlantic            222   180,420   9,211.71   9,034.18  5,270.13       0.98    0.01    1.11
East-North Central        1191    99,400   6,618.49   6,448.51  3,427.92       0.97    0.00    1.01
Pacific                    311    61,810   4,490.05   4,404.59  2,642.87       0.98    0.01    0.96
West-North Central        1598   107,450   5,948.02   5,667.13  2,778.43       0.95    0.01    1.62
South Atlantic             402    25,600   3,251.96   3,080.14  1,260.43       0.95    0.01    0.96
East-South Central         326    34,480   7,286.42   7,180.89  4,768.52       0.99    0.01    0.94
West-South Central         410    33,500   4,138.06   4,072.43  2,391.27       0.98    0.01    1.31
Mountain                   171    15,820   3,555.91   3,430.49  1,957.63       0.96    0.01    0.84
                                                                                                    (continued)

-------
Table 1.7.8:  Statistics for Compounds with Detectable Levels  in Cropland Soils by Census Division for Round One

                                                   (continued)

Atrazine

Census Division           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                  523    16,730     231.40     115.34      8.30       0.50    0.02    1.16
New England                  0         -          -          -         -          -       -       -
Middle Atlantic              0         -          -          -         -          -       -       -
East-North Central         235     1,380     137.22      70.21      8.12       0.51    0.03    0.99
Pacific                      0         -          -          -         -          -       -       -
West-North Central         288    16,730     303.75     148.45      8.40       0.49    0.03    1.27
South Atlantic               0         -          -          -         -          -       -       -
East-South Central           0         -          -          -         -          -       -       -
West-South Central           0         -          -          -         -          -       -       -
Mountain                     0         -          -          -         -          -       -       -

 1/ Sample size.
 2/ Maximum amount detected (ppm).
 3/ Weighted average of the data values in excess of the MDL (ppm).
 4/ Weighted average of the amount detected (ppm).
 5/ Antilog_e (weighted average of log_e (amount + 1) - 1); analogous to the geometric mean (ppm).
 6/ Weighted proportion of cases with data values in excess of the MDL.
 7/ Standard deviation of the estimated proportion.
 8/ Design effect for the estimated proportion.

  Source:  Computer files supplied by the EPA Field Studies Branch, Washington, D.C.

-------
Table 1.7.9:  Statistics for Compounds with Detectable  Levels in Cropland Soils by Cropping Region  for  Round  One
Aldrin

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    13,280     219.65      23.06      0.54       0.11    0.00    0.79
Corn                      1386     4,250     192.83      42.83      1.53       0.22    0.01    0.89
Wheat & Small Grains      1056       220      55.25       0.63      0.04       0.01    0.00    0.77
Cotton                     221         0        0.0        0.0       0.0        0.0     0.0    1.00
Soybeans                  1271    13,280     290.73      61.75      1.39       0.21    0.01    0.96
General Farming            699     1,220     167.63      10.75      0.31       0.06    0.01    0.87
Hay                        609       280      78.28       1.38      0.07       0.02    0.00    0.76
Vegetables                 557       350      80.69       2.23      0.11       0.03    0.01    1.02
Fruit or Nut Orchard       253       470     172.23       3.42      0.09       0.02    0.01    0.59
                                                                                                (continued)

-------
    Table 1.7.9:   Statistics  for  Compounds  with'Detectable  Levels  in  Cropland  Soils by  Cropping Region for Round One
                                                       (continued)
Chlordane

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    13,340     645.24      56.74      0.63       0.09    0.00    0.93
Corn                      1386     8,040     652.22     113.85      1.69       0.17    0.01    0.94
Wheat & Small Grains      1056       660     206.17       1.47      0.04       0.01    0.00    0.81
Cotton                     221       620     264.00       6.30      0.14       0.02    0.01    1.06
Soybeans                  1271     5,620     736.21      97.13      1.18       0.13    0.01    0.98
General Farming            699     1,190     321.79      15.29      0.27       0.05    0.01    0.97
Hay                        609     7,890     620.00      25.55      0.23       0.04    0.01    1.59
Vegetables                 557    13,340     764.19      61.76      0.55       0.08    0.01    1.26
Fruit or Nut Orchard       253     2,720     474.67      51.77      0.77       0.11    0.02    0.86
                                                                                                    (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable Levels  in Cropland Soils by Cropping Region for Round One
                                                   (continued)
o,p'-DDE

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071       510      45.90       0.98      0.07       0.02    0.00    0.83
Corn                      1386        90      28.65       0.26      0.03       0.01    0.00    0.99
Wheat & Small Grains      1056       380     121.31       0.35      0.01       0.00    0.00    0.70
Cotton                     221       250      41.70       6.27      0.70       0.15    0.02    1.01
Soybeans                  1271       200      31.97       0.52      0.05       0.02    0.00    0.87
General Farming            699        10      10.00       0.02      0.01       0.00    0.00    0.90
Hay                        609        30      20.00       0.09      0.01       0.00    0.00    0.94
Vegetables                 557       140      39.37       1.77      0.16       0.05    0.01    0.95
Fruit or Nut Orchard       253       510      63.95       9.57      0.70       0.15    0.02    0.97
                                                                                                (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable Levels  in Cropland  Soils  by  Cropping Region for Round One
                                                   (continued)
p,p'-DDE

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    54,980     303.39      59.68      1.34       0.20    0.00    0.56
Corn                      1386       550      68.21       7.21      0.48       0.11    0.01    0.93
Wheat & Small Grains      1056     2,270     127.47       9.62      0.32       0.08    0.01    0.61
Cotton                     221     6,210     344.08     272.61     52.52       0.79    0.03    0.87
Soybeans                  1271     4,760     226.36      53.45      1.83       0.24    0.01    0.56
General Farming            699     4,550     154.69      13.84      0.39       0.09    0.01    0.95
Hay                        609     8,090     272.45      28.48      0.49       0.10    0.01    0.80
Vegetables                 557     6,820     222.87     107.16      6.92       0.48    0.02    0.72
Fruit or Nut Orchard       253    54,980     974.93     611.57     21.20       0.63    0.03    0.88
                                                                                                (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable Levels  in Cropland  Soils  by Cropping Region for Round One
                                                   (continued)
p,p'-DDT

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071   245,180   1,044.70     187.17      1.51       0.18    0.00    0.56
Corn                      1386     3,080     179.52      19.55      0.60       0.11    0.01    0.90
Wheat & Small Grains      1056     5,160     218.00      13.67      0.31       0.06    0.01    0.66
Cotton                     221    15,860   1,144.63     890.55     98.48       0.78    0.03    0.81
Soybeans                  1271    16,070     793.83     174.66      2.25       0.22    0.01    0.53
General Farming            699    23,700     707.81      45.21      0.32       0.06    0.01    0.93
Hay                        609    38,550     847.73      74.62      0.48       0.09    0.01    0.75
Vegetables                 557    69,300   1,048.12     440.77      8.21       0.42    0.02    0.73
Fruit or Nut Orchard       253   245,180   3,131.82   1,753.51     20.54       0.56    0.03    0.89
                                                                                                (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable  Levels  in Cropland Soils by Cropping Region for  Round One
                                                   (continued)
o,p'-DDT

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    32,750     307.91      35.94      0.67       0.12    0.00    0.58
Corn                      1386       470      71.41       3.12      0.17       0.04    0.01    1.06
Wheat & Small Grains      1056       620      69.77       2.76      0.15       0.04    0.00    0.67
Cotton                     221     5,620     328.24     203.82     20.33       0.62    0.03    0.80
Soybeans                  1271     3,320     212.36      32.55      0.97       0.15    0.01    0.46
General Farming            699     3,790     225.16       7.62      0.14       0.03    0.01    0.99
Hay                        609    14,050     519.05      24.09      0.21       0.05    0.01    0.77
Vegetables                 557    11,700     279.46      82.86      2.67       0.30    0.02    0.84
Fruit or Nut Orchard       253    32,750     738.55     292.74      5.62       0.40    0.03    0.89
                                                                                                (continued)

-------
     Table 1.7.9:   Statistics for Compounds with Detectable Levels in Cropland Soils by Cropping Region for Round One

                                                        (continued)
p,p'-TDE

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    38,460     349.24      31.78      0.46       0.09    0.00    0.73
Corn                      1386     1,230      88.79       4.48      0.20       0.05    0.01    1.02
Wheat & Small Grains      1056       370      62.22       1.53      0.09       0.02    0.00    0.64
Cotton                     221     1,670     172.24      75.23      5.82       0.44    0.03    1.00
Soybeans                  1271     1,250     123.35      13.46      0.55       0.11    0.01    0.75
General Farming            699     2,070     195.38       4.15      0.08       0.02    0.01    0.94
Hay                        609     8,200     368.31      15.09      0.17       0.04    0.01    0.81
Vegetables                 557    31,430     494.21     125.08      1.91       0.25    0.02    0.86
Fruit or Nut Orchard       253    38,460   1,155.54     329.80      2.86       0.29    0.03    1.03
                                                                                                     (continued)

-------
     Table 1.7.9:   Statistics for Compounds  with Detectable  Levels  in  Cropland  Soils  by Cropping Region for Round One
                                                        (continued)
o,p'-TDE

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    16,790     387.39       5.71      0.06       0.01    0.00    0.86
Corn                      1386       340     112.20       0.70      0.03       0.01    0.00    0.97
Wheat & Small Grains      1056       150      46.58       0.23      0.02       0.01    0.00    0.46
Cotton                     221       490     161.17       7.10      0.22       0.04    0.01    1.11
Soybeans                  1271       210      49.52       0.41      0.03       0.01    0.00    1.08
General Farming            699       100      67.24       0.24      0.01       0.00    0.00    0.86
Hay                        609       230     100.38       0.69      0.03       0.00    0.01    0.87
Vegetables                 557     4,870     237.14      13.88      0.28       0.06    0.01    0.90
Fruit or Nut Orchard       253    16,790   1,265.99     104.07      0.52       0.08    0.02    1.04
                                                                                                     (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable Levels  in Cropland Soils by Cropping Region for Round One
                                                   (continued)




Dieldrin

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071     9,830     150.35      41.14      2.22       0.27    0.01    0.84
Corn                      1386     1,620     149.79      74.68      8.30       0.50    0.01    0.90
Wheat & Small Grains      1056       610      51.14       4.43      0.34       0.09    0.01    1.20
Cotton                     221     1,280      86.50      12.04      0.60       0.14    0.02    1.09
Soybeans                  1271     6,180     165.48      64.05      4.58       0.39    0.01    0.88
General Farming            699       710     109.40      19.19      1.03       0.18    0.01    0.91
Hay                        609     4,640     128.11      14.94      0.55       0.12    0.02    1.33
Vegetables                 557     1,850     132.74      36.92      2.03       0.28    0.02    0.93
Fruit or Nut Orchard       253     9,830     442.65      99.08      1.72       0.22    0.03    0.98
                                                                                                (continued)

-------
     Table 1.7.9:  Statistics for Compounds with Detectable  Levels  in Cropland Soils  by Cropping Region for Round One

                                                        (continued)
Endrin

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071     2,130     142.72       1.73      0.05       0.01    0.00    0.81
Corn                      1386        80      37.11       0.15      0.01       0.00    0.00    0.97
Wheat & Small Grains      1056        80      30.76       0.34      0.04       0.01    0.00    0.74
Cotton                     221       420     111.21       7.66      0.32       0.07    0.02    0.87
Soybeans                  1271       640      93.08       0.88      0.03       0.01    0.00    0.87
General Farming            699       480     234.28       0.72      0.01       0.00    0.00    1.08
Hay                        609       100      58.40       0.16      0.01       0.00    0.00    0.83
Vegetables                 557     1,000     187.72       6.91      0.17       0.04    0.01    0.77
Fruit or Nut Orchard       253     2,130     483.73      13.92      0.15       0.03    0.01    1.05
                                                                                                     (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable  Levels  in Cropland  Soils by  Cropping Region for Round One
                                                   (continued)
Heptachlor

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071     1,710     101.01       4.78      0.20       0.05    0.00    0.81
Corn                      1386     1,710     112.48      12.21      0.51       0.11    0.01    0.84
Wheat & Small Grains      1056        10      10.00       0.02      0.00       0.00    0.00    0.81
Cotton                     221        10      10.00       0.10      0.02       0.01    0.01    1.07
Soybeans                  1271       940     102.72       9.57      0.42       0.09    0.01    0.95
General Farming            699       290      47.69       0.99      0.07       0.02    0.01    1.04
Hay                        609       260      56.48       0.34      0.02       0.01    0.00    0.76
Vegetables                 557        30      16.74       0.18      0.03       0.01    0.00    0.92
Fruit or Nut Orchard       253       190      75.22       1.23      0.07       0.02    0.01    1.06
                                                                                                (continued)

-------
Table 1.7.9:  Statistics for Compounds with Detectable  Levels  in Cropland  Soils by  Cropping Region for Round One
                                                   (continued)
Heptachlor Epoxide

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071     1,080      54.59       4.24      0.31       0.08    0.00    0.92
Corn                      1386       350      54.75       9.28      0.82       0.17    0.01    0.93
Wheat & Small Grains      1056        70      24.69       0.17      0.02       0.01    0.00    0.82
Cotton                     221        40      21.50       0.62      0.09       0.03    0.01    1.10
Soybeans                  1271     1,080      64.00       7.56      0.54       0.12    0.01    0.98
General Farming            699       200      38.18       1.59      0.15       0.04    0.01    0.96
Hay                        609       720      68.05       2.17      0.12       0.03    0.01    1.32
Vegetables                 557       120      30.32       1.37      0.15       0.05    0.01    1.45
Fruit or Nut Orchard       253       180      44.25       2.88      0.23       0.07    0.02    1.06
                                                                                                (continued)

-------
      Table  1.7.9:   Statistics  for Compounds with Detectable Levels in Cropland Soils by Cropping Region for Round  One
                                                        (continued)
Isodrin

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071       180      21.68       0.16      0.02       0.01    0.00    0.96
Corn                      1386       180      21.46       0.47      0.06       0.02    0.00    0.96
Wheat & Small Grains      1056         0          0          0         0        0.0     0.0    1.00
Cotton                     221         0          0          0         0        0.0     0.0    1.00
Soybeans                  1271        90      24.23       0.27      0.03       0.01    0.00    1.10
General Farming            699        20      14.98       0.04      0.01       0.00    0.00    1.04
Hay                        609         0          0          0         0        0.0     0.0    1.00
Vegetables                 557        10      10.00       0.02         0       0.00    0.00    1.13
Fruit or Nut Orchard       253         0          0          0         0        0.0     0.0    1.00
                                                                                                      (continued)

-------
     Table 1.7.9:  Statistics for Compounds with Detectable Levels in Cropland Soils by Cropping Region for Round One
                                                        (continued)
Toxaphene

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071    36,330   3,562.56     129.98      0.32       0.04    0.00    0.64
Corn                      1386     8,800   2,761.23      18.74      0.05       0.01    0.00    0.77
Wheat & Small Grains      1056     1,600     810.18       1.78      0.01       0.00    0.00    0.78
Cotton                     221    36,330   4,190.03   1,394.85     11.81       0.33    0.03    0.88
Soybeans                  1271    21,000   3,932.77     261.63      0.67       0.07    0.01    0.65
General Farming            699     2,080   2,080.00       2.99      0.01       0.00    0.00    1.00
Hay                        609    11,030   5,174.00      22.66      0.04       0.00    0.00    0.89
Vegetables                 557    12,000   2,734.31     216.55      0.82       0.08    0.01    0.91
Fruit or Nut Orchard       253     8,300   2,601.99     267.90      1.16       0.10    0.02    0.89
                                                                                                     (continued)

-------
     Table 1.7.9:   Statistics  for  Compounds with Detectable Levels in Cropland Soils by Cropping Region  for  Round One

                                                        (continued)
Trifluralin

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 6071     1,860      99.33       3.20      0.14       0.03    0.00    0.80
Corn                      1386       600      88.20       2.79      0.14       0.03    0.00    0.93
Wheat & Small Grains      1056       290     126.32       0.36      0.01       0.00    0.00    0.76
Cotton                     221       310      70.07      11.12      0.86       0.16    0.02    0.82
Soybeans                  1271       680      87.60       6.41      0.35       0.07    0.01    0.88
General Farming            699       310     104.94       0.87      0.03       0.01    0.00    0.99
Hay                        609        10      10.00       0.01      0.00       0.00    0.00    0.86
Vegetables                 557     1,860     160.00       7.40      0.22       0.05    0.01    0.94
Fruit or Nut Orchard       253     1,290     328.65       5.33      0.06       0.02    0.01    1.02
                                                                                                     (continued)

-------
Table 1.7.9:   Statistics for Compounds with Detectable Levels  in Cropland  Soils  by  Cropping Region for Round One
                                                   (continued)
Arsenic

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                 4690   180,420   5,869.29   5,665.15  2,863.07       0.97    0.00    1.44
Corn                      1109    31,980   5,653.68   5,467.48  3,101.61       0.97    0.01    1.38
Wheat & Small Grains       826    37,530   5,292.43   5,091.67  2,616.57       0.96    0.01    0.81
Cotton                     177    38,900   5,932.09   5,827.79  3,497.19       0.98    0.01    1.05
Soybeans                   962   107,450   6,723.70   6,521.48  3,462.38       0.97    0.01    1.01
General Farming            516    64,940   6,376.99   6,234.96  3,360.02       0.98    0.01    1.05
Hay                        453    51,300   5,602.92   5,275.52  2,058.05       0.94    0.02    2.84
Vegetables                 448    69,100   4,997.11   4,851.67  2,367.47       0.97    0.01    1.01
Fruit or Nut Orchard       182   180,420   8,009.47   7,654.21  2,415.32       0.96    0.02    1.04
                                                                                                (continued)

-------
                                 Table  1 7.9:  Statistics for Compounds with Detectable Levels in Cropland Soils by Cropping Region for Round  One
                                                                                    (continued)
Atrazine

Cropping Region           n 1/    Max 2/      x̄+ 3/       x̄ 4/     x̄g 5/ P(>MDL) 6/ S.D. 7/ DEFF 8/
Total RSN                  523    16,730     231.40     115.34      8.30       0.50    0.02    1.16
Corn                       271     1,550     185.62      93.25      9.18       0.50    0.03    1.18
Wheat & Small Grains        26       120      94.17      17.73      1.32       0.19    0.12    2.46
Cotton                       0         -          -          -         -          -       -       -
Soybeans                   102    16,730     537.13     284.52     10.59       0.53    0.05    0.98
General Farming             89     1,380     113.52      62.55      8.68       0.55    0.05    1.04
Hay                         17       100      43.80      15.00      2.46       0.34    0.12    1.11
Vegetables                  16       340     110.04      82.47     20.33       0.75    0.10    0.87
Fruit or Nut Orchard         1        40      40.00      40.00     39.85       1.00     0.0    1.00

 1/ Sample size.
 2/ Maximum amount detected (ppm).
 3/ Weighted average of the data values in excess of the MDL (ppm).
 4/ Weighted average of the amount detected (ppm).
 5/ Antilog_e (weighted average of log_e (amount + 1) - 1); analogous to the geometric mean (ppm).
 6/ Weighted proportion of cases with data values in excess of the MDL.
 7/ Standard deviation of the estimated proportion.
 8/ Design effect for the estimated proportion.

  Source:  Computer files supplied by the EPA Field Studies Branch, Washington, D.C.
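
The column definitions in the footnotes to Tables 1.7.8 and 1.7.9 can be illustrated with a short Python sketch. It is not the program used to produce the tables. The function name, the sampling weights and the example data are hypothetical, and the geometric-mean analogue is computed with the subtraction of 1 applied after the antilog, an assumed reading of footnote 5 under which a domain with no detections gives 0.0, as tabulated.

```python
import math


def residue_statistics(amounts, weights, mdl):
    """Weighted summary statistics for one compound in one domain.

    amounts: detected amounts (0 where nothing was detected)
    weights: sampling weights, one per specimen (hypothetical inputs)
    mdl:     minimum detectable level, in the same units as amounts
    """
    pairs = list(zip(amounts, weights))
    total_weight = sum(weights)
    above = [(a, w) for a, w in pairs if a > mdl]
    weight_above = sum(w for _, w in above)

    mean_all = sum(a * w for a, w in pairs) / total_weight
    mean_above = sum(a * w for a, w in above) / weight_above if above else 0.0
    mean_log = sum(w * math.log(a + 1.0) for a, w in pairs) / total_weight

    return {
        "n": len(amounts),                       # column 1/
        "max": max(amounts),                     # column 2/
        "mean_above_mdl": mean_above,            # column 3/
        "mean": mean_all,                        # column 4/
        "geometric_analog": math.exp(mean_log) - 1.0,   # column 5/ (assumed reading)
        "proportion_above_mdl": weight_above / total_weight,  # column 6/
    }


# Hypothetical data: four equally weighted specimens, one detection of 30
stats = residue_statistics([0.0, 0.0, 0.0, 30.0], [1.0, 1.0, 1.0, 1.0], mdl=10.0)
print(stats)
```

Because most specimens contribute zeros, the geometric-mean analogue falls far below the arithmetic mean, which is the pattern seen throughout the tables.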

-------
undoubtedly conservative estimates.  Thus, the interval of values within
two standard errors of the estimated proportions will provide a conserva-
tive 95 percent confidence interval estimate of the proportion of sampled
area where  levels of  the compound exceed  the  minimum detectable level
(MDL).
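
As an illustration, the two-standard-error interval described above can be computed as in the following Python sketch. The function name is hypothetical, and clipping the bounds to the range 0 to 1 is an added assumption. The inputs are the P(>MDL) and S.D. entries for Dieldrin in the Corn cropping region of Table 1.7.9.

```python
def conservative_interval(p_hat, std_err):
    """Return the interval within two standard errors of an estimated proportion.

    Clipping to [0, 1] is an assumption added for this sketch; the report
    only describes the two-standard-error interval itself.
    """
    lower = max(0.0, p_hat - 2.0 * std_err)
    upper = min(1.0, p_hat + 2.0 * std_err)
    return lower, upper


# Dieldrin, Corn cropping region (Table 1.7.9): P(>MDL) = 0.50, S.D. = 0.01
low, high = conservative_interval(0.50, 0.01)
print(f"{low:.2f} to {high:.2f}")
```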

     The design effect is the ratio of the sample standard error to an
estimate of what  the standard error would  have  been if a simple random
sample of the same size had been used, i.e.

                Estimated S.E. (for the design used)
     DEFF  =  ---------------------------------------
                Estimated S.E. (simple random sample)


Alternatively, the design effect can be thought of as the ratio of the
actual sample size to  the sample  size that would  be required to obtain
an  estimate  with  the  same  standard  error  based  upon a  simple random
sample.   Generally  stratification decreases the  design  effect,  while
clustering increases it.  Thus, since the CNI stratification can be used
and there  is no  clustering  of  sample  sites in the RSN  sample, design
effects less than one would be expected.  This would indicate that the design
produced  smaller  standard errors  than  would a  simple  random sample of
the same size.  Many of the design effects shown in Tables 1.7.7 through
1.7.9 are indeed less than one.  However, some design effects are substan-
tially greater than one.  It is, hence,  not clear that the CNI stratifi-
cation was  particularly  advantageous for estimation  of  proportions of
detections for toxic substance residues.
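
The two descriptions of the design effect given above can be compared numerically with a short Python sketch. The simple-random-sample standard error of a proportion is taken to be sqrt(p(1 - p)/n); that formula, the function names and the example inputs are assumptions and do not come from the computer files used for the tables.

```python
import math


def srs_standard_error(p_hat, n):
    """Standard error of a proportion under simple random sampling (assumed formula)."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


def standard_error_ratio(design_se, p_hat, n):
    """Ratio of the design-based S.E. to the simple-random-sample S.E."""
    return design_se / srs_standard_error(p_hat, n)


def equivalent_srs_size(design_se, p_hat):
    """Simple-random-sample size that would give the same S.E. as the design used."""
    return p_hat * (1.0 - p_hat) / design_se ** 2


# Hypothetical inputs: 1,386 sites, estimated proportion 0.50, design S.E. 0.012
ratio = standard_error_ratio(0.012, 0.50, 1386)
n_equiv = equivalent_srs_size(0.012, 0.50)
print(round(ratio, 3), round(ratio ** 2, 3), round(1386 / n_equiv, 3))
```

Under the assumed formula, the ratio of the actual sample size to the equivalent simple-random-sample size equals the square of the ratio of standard errors, so the two descriptions differ by that squaring. Either way, a value below one indicates a design more precise than a simple random sample of the same size.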

1.8     Capabilities for Performing Special Studies

     If it were possible to  completely fulfill  the  design of the Rural
Soils Network (RSN),  it would serve as an excellent vehicle for perform-
ing special studies.   With one-fourth of all sites  in each State being
sampled in each year, baseline levels of pesticide residue would soon be
established  for  all moderate size geographic  areas.  Data  needed  for
special studies of specific  pesticides  or  specific  areas  would  then be
readily available.

1.9     Toxic Substances Other Than Pesticides in Soils

     The NSMP currently monitors three classes of pesticides in soil:
organochlorine pesticides and trifluralin, organophosphorus pesticides,
and heavy metals.  Each of these classes is analyzed using methodology
specifically designed to provide optimum selectivity and sensitivity for
that class to the exclusion of others.

     Expanding the capability of the  soil  networks  in  monitoring for a
wide range of toxic substances will require the development of analytical
methodology to deal with the special characteristics of these substances
as well as those  of  the matrix.   A wide  range  of new techniques (e.g.,
high performance liquid chromatography,  mass spectroscopy, electrochemis-
try and  capillary gas  chromatography)  may  need  to be incorporated to
accomplish this purpose.   However, the  design and application of effec-
tively administered  QC/QA programs must  be concurrent with the develop-
ment of appropriate analytical methodology.

                                 -89-

-------
     Much of  the  necessary methodology is already available in the open
literature or in EPA and contracting laboratories.  Some may be directly
applicable to the  perceived needs of the NSMP,  and  others will require
some degree  of modification  to  account  for  differences in  either  the
analyte  or  matrix.  All  aspects  of the  methodology must  be evaluated
(i.e., sample collection and storage, analyte isolations and instrumental
analysis) and the  method appropriately validated in order  for the NSMP
to meet the needs of those who are using the analytical data.

     The working definition of "toxic substance" at present must include
virtually  any substance  manufactured   in or  imported  into  the  United
States.  Great care  must be exercised in decisions regarding the choice
of substances to  be  monitored by NSMP.   The  complexity and cost of the
required methodology increase directly with the number of substances and
matrices to be analyzed.  Thus, misjudgement can quickly lead to unneces-
sary or nonproductive expenditures of time and funds.

     There  are  two  major  methodological approaches to  the concurrent
analysis of a number of different substances.  The first approach is the
development of a "survey" method in which specimen components are separated
only to the extent necessary to ensure the compatibility of each
component with the analytical technique.  The resulting subset of specimens
are all for  the  analysis of such specimens; however, the overall
number of  specimens  requiring analysis is minimized.  Two such "survey"
methods  (Master  Scheme  for  the  Analysis of Organic  Compounds in Water
and A  Comprehensive  Method  for  the Analysis  of Volatile  Organisms  on
Solids,  Sediments  and  Sludges)  are currently being developed under EPA.
Development  of  a  truly  all-inclusive  "survey"  method may  be neither
possible nor practical as the present methods are limited to analysis of
organic compounds which are or can be made sufficiently volatile to pass
through a capillary gas chromatograph/mass spectrometer.

     An  alternate approach is  the  development  of  analytical  methods
optimized for a  specific substance or class of substances.  Each method
necessarily excludes  all substances except those of similar chemical and
physical  characteristics.   Monitoring  of a  large number  of different
substances  would  therefore  require  the  use  of a  number  of specific
analytical methods.

     Neither approach is without its disadvantages, and these must be
weighed against the goals of the monitoring network.  A basic philosophy
must be established regarding these goals and the methodology
approach which will best serve them over the long term.

1.10  Implementation Plan for a New Survey Design of the Rural Soils
      Network

     A specific implementation cannot be recommended at this time, since
a specific design option has not been recommended.  One observation
which can be made is that the transition period should cover one cycle
in the old design, 4 years.  Since it is not likely to be feasible to
investigate the entire RSN, nor indeed is it necessary, a subset of the
old sites may be used.  An advantageous scheme may be to link old and
new sites on the basis of geographic proximity, and compare their
observations over the transition period.
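
One way such a linkage could be carried out is sketched below in Python: each new site is paired with the nearest old site by great-circle distance. The site identifiers and coordinates are hypothetical, and the haversine formula with a mean Earth radius of 6,371 km is an assumed choice of distance measure rather than part of any recommended design.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def link_sites(old_sites, new_sites):
    """Pair each new site with its nearest old site.

    Both arguments map a site identifier to a (latitude, longitude) tuple.
    Returns {new_id: (nearest_old_id, distance_km)}.
    """
    links = {}
    for new_id, (lat, lon) in new_sites.items():
        nearest = min(old_sites, key=lambda oid: haversine_km(lat, lon, *old_sites[oid]))
        links[new_id] = (nearest, haversine_km(lat, lon, *old_sites[nearest]))
    return links


# Hypothetical site coordinates
old_sites = {"A": (35.9, -78.9), "B": (30.3, -89.4)}
new_sites = {"N1": (35.8, -78.8), "N2": (30.4, -89.3)}
links = link_sites(old_sites, new_sites)
print(links)
```

In practice a maximum linking distance would probably be needed so that new sites with no nearby old site are left unpaired.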

                                 -90-

-------
                   2.  EVALUATION OF CHEMICAL ANALYSIS
2.1       Objective

     The objective of this section is to conduct a limited review of the
current  analytical methodology  used in  the National  Soils Monitoring
Program  (NSMP)  in order  to  assess  the  quality and  reliability of the
data with  respect to meaningful  statistical  evaluation and statistical
survey design.

2.2       Discussion

     Data  compiled by NSMP  is generated by  the use  of complex multi-
residue analytical methodology,  and the quality of such data  is deter-
mined primarily  by the  limitations of the  methodology.  These limita-
tions are normally defined in terms of the precision,  accuracy and
minimum detectable level (MDL) of the analytical method for each speci-
fic analyte and specimen matrix.   A knowledge  of these  limitations  is
especially  important  to  potential users of  the analytical data  since
reported substance levels are merely estimates of the "actual" levels in
the matrix.   As estimates,  individual  values in the NSMP  data  file in
fact represent ranges (of values) in which the "actual" substance levels
are reasonably (i.e.,  with some high probability) expected to fall.  The
size of the range can be adequately described by the accuracy and preci-
sion of the analytical method; therefore,  a knowledge of these parameters
is required for meaningful evaluation of the data.  In addition,  the MDL
for the method  defines  (or should define) the  lowest level that can be
estimated with reasonable confidence, no analytical method being capable
of absolute detection down to zero concentrations.   This limit  must be
considered  in evaluating the practical versus  the  statistical signifi-
cance of  trace  levels  and  zeros reported in  the  data  file  (Hartwell
et al.  1979).
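
The idea that a reported value stands for a range can be made concrete with a short Python sketch. It assumes, purely for illustration, that accuracy is expressed as a mean fractional recovery and precision as a relative standard deviation, and that a band of two standard deviations is wanted. None of these conventions, nor the function name and example figures, are taken from the NSMP data file.

```python
def plausible_range(reported, recovery, rel_std_dev, mdl, k=2.0):
    """Range in which the "actual" level is expected to fall.

    reported:    level as entered in the data file
    recovery:    mean fractional recovery of the method (e.g. 0.80), assumed known
    rel_std_dev: relative standard deviation of the method (e.g. 0.10), assumed known
    mdl:         minimum detectable level, same units as reported
    k:           half-width of the band in standard deviations (assumed 2)
    """
    if reported < mdl:
        # Zeros and traces below the MDL only say that the level is under the MDL.
        return 0.0, mdl
    corrected = reported / recovery
    half_width = k * rel_std_dev * corrected
    return max(0.0, corrected - half_width), corrected + half_width


# Hypothetical case: reported 100, recovery 80 percent, RSD 10 percent, MDL 10
low_level, high_level = plausible_range(100.0, 0.80, 0.10, 10.0)
print(low_level, high_level)
```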

     An extensive manual of recommended analytical methodology has been
published by EPA-RTP (USEPA 1977) for use in routine multiresidue
pesticide analysis.  The complexity of the sample matrices and pesticide
types  routinely  analyzed in  practice  requires  that  the  methodology
consist of a basic analytical procedure with a large number of modifica-
tions and ancillary techniques in order to cope with problems imposed by
widely divergent pesticide levels and interferences.   Each modification
or technique  produces a  specific effect on the  accuracy, precision and
MDLs of the overall analytical method, and,  hence, must be validated for
each pesticide  analyzed  by  the  method.   A  detailed knowledge of the
analytical  procedure is  therefore required in order  to  properly assess
the quality of the data generated by the procedure.

     An extensive set of recommended QC/QA procedures  has been published
by EPA-RTP (USEPA 1979) in an effort to control the quality of data
produced by analysts  and laboratories using  the multiresidue  pesticide
method.   Laboratories  adhering to these recommendations will necessarily
generate (through  controls,  blanks,  and SPRMs*) much of the information
needed to assess  accuracy, precision and MDLs for data  reported to the
NSMP.   Control and SPRM data  are not, however, compiled or summarized in
*Special Pesticide Reference Material.

                                  -91-

-------
a  single  document  (i.e.,  issued semiannually or  annually),  or entered
into the computer data file.  Thus, for all practical purposes, the data
are  lost  to the  potential data  file  users.   Reporting  of all control
data in the  computer file, along with results for soil specimens, would
allow the  data  quality to be determined according to the specific needs
of  the  individual  user  (e.g.,  for a particular  pesticide in  a given
geographic area or  over  a specified period  of time).   The  results of
duplicate  specimen  analysis  are apparently not reported in the computer
data file.   Again,  this  is  valuable information  lost  to computer file
users.
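
A minimal Python sketch of how control data might sit alongside soil results in one file is given below. The record layout, field names and example values are all hypothetical and are not the format of the NSMP computer file. The point is only that a user could then derive method recovery for a chosen pesticide from the same file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisRecord:
    specimen_id: str
    record_type: str                      # "soil", "blank", "control", "sprm" or "duplicate"
    analyte: str
    reported_level: float                 # in one agreed unit, e.g. ppm
    known_level: Optional[float] = None   # spiked or reference level, where one exists


def mean_recovery(records, analyte):
    """Average fractional recovery for one analyte over SPRM and control records."""
    ratios = [
        r.reported_level / r.known_level
        for r in records
        if r.analyte == analyte
        and r.record_type in ("sprm", "control")
        and r.known_level
    ]
    return sum(ratios) / len(ratios) if ratios else None


# Hypothetical file contents: one soil result and two SPRM results
records = [
    AnalysisRecord("S-001", "soil", "dieldrin", 0.12),
    AnalysisRecord("Q-001", "sprm", "dieldrin", 0.09, known_level=0.10),
    AnalysisRecord("Q-002", "sprm", "dieldrin", 0.11, known_level=0.10),
]
print(mean_recovery(records, "dieldrin"))
```

Duplicate-specimen results could be carried in the same way, with the record type distinguishing them from the primary analysis.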

     RTI  has attempted  to  review the  analytical methodology  used to
generate  data  under the  NSMP and  compile  existing information  on the
current quality of  the  data (accuracy,  precision and  MDL)  in order to
make this  information  available to Program data file users.  The review
is  necessarily  limited by  the provisions outlined in  the  revised work
plan for this task.

     In the interest of clarity and accuracy, the RTI request for detail-
ed  information  on  analytical  procedures and data quality was  made in
written form.   A  questionnaire was submitted to William  G.  Mitchell of
the Toxicant Analysis Center, Bay St. Louis, Mississippi, the laboratory
currently responsible for carrying out chemical analysis under the NSMP.
The  cover letter  to Mr.  Mitchell and  the  questionnaire are  given in
Appendix A.  The questions were designed to provide detailed information
on  all areas  of current analytical methodology pertinent to the quality
of  the data  generated  by the method.  It was anticipated that extensive
verbal  follow-up  (telephone)  would be  required  to  obtain  additional
information  and  clarify  details.   The  initial  information  from the
laboratory has  been received  by RTI and evaluation  of the information
carried out.  The results are presented below.

2.2.1     Analytical Methodology

     The  NSMP  currently  reports  levels  in soil for  over thirty pesti-
cides and toxic substances (Table 2.1)  including several chemical classes
(i.e., organochlorine and organophosphorus pesticides, trifluralin and
heavy metals).   All analyses  are carried out at  the  Toxicant Analysis
Center (TAC) in Bay St.  Louis, Mississippi.   The analytical results for
each soil specimen are reported on a single form (Appendix B) along with
the specific location and date at which the specimen was taken.  Indivi-
dual pesticides and metals  detected in the  specimen  are  listed along
with their levels in fourteen blank spaces on the form.   Reporting units
(i.e.,  ppm,  ppb &  ppt)  are  specified  using a value  code following the
particular result.  Although there are spaces on the form for individual
soil characteristics such as pH,  % sand,  % silt,  % clay and  % organic
matter, these characteristics are not currently determined for urban
soil specimens.  The % moisture content of each soil specimen is deter-
mined but  not  reported on the form.  The reported  results are, however,
corrected  for   %  moisture  (i.e.,  reported  on  the basis of  dry solid
weight).    An  important point  of confusion arises  from the use of the
term  chlordane  in  reporting  results.   The  term usually  corresponds
specifically to the level of γ-chlordane in a soil specimen as this is
the most commonly found isomer.  However, when the α-isomer and t-nonachlor

-------
       Table 2.1.  Pesticides and Toxic Compounds Analyzed Under NSMP

     Organochlorines        Organophosphates        Heavy Metals

     Alachlor                 DEF                      Mercury
     Aldrin                   Diazinon                 Cadmium
     BHC                      Ethion                   Lead
     Chlordane                Malathion                Arsenic
     DDTs                     Phorate
     Dieldrin                 Parathion, ethyl
     DCPA                     Parathion, methyl
     Dicofol                  Ronnel
     Endosulfan I             Trithion
     Endosulfan II
     Endosulfan Sulfate
     Endrin
     Endrin Ketone          Other
     Heptachlor               Trifluralin
     Heptachlor Epoxide
     Hexachlorobenzene
     Isodrin
     Lindane
     Methoxychlor
     PCBs
     Propachlor
     Toxaphene

Source:  Toxicant Analysis Center (TAC), USEPA, Bay St. Louis, Mississippi.

-------
(and presumably oxychlordane  since  it is not listed separately in Table
2.1) are also found, all levels are reported under the term chlordane as
their sum.   The term  "technical  chlordane" is inappropriate  as  one of
the major  components,  heptachlor,  is reported separately.   In order to
avoid confusion in the subsequent interpretation of the data, all individ-
ual  components  of  pesticide  mixtures  (e.g., chlordane,  BHC  and  PCB)
should  be  reported  as such.   Otherwise potentially valuable  data is
lost.

     Only urban soil specimens are currently collected and analyzed at
TAC, the last rural soil specimens having been analyzed in 1977.  The
soil specimens are collected by either EPA or a contracting laboratory
as a pattern of 6-8 core specimens composited in a one-quart, wide-mouth,
glass mason jar with a Teflon- or aluminum foil-lined cap.  Specimens
are subsequently shipped to TAC at ambient temperature.  Specimens
received at TAC are refrigerated until they can be analyzed.  The speci-
men collection date, the date of receipt at TAC, and the date of analysis
are all stated on the result form, and thus, these data are presumably
available to computer data file users.

     The analytical  methodology used in the  analysis of organochlorine
and organophosphorous  pesticides in  soil  specimens  is  essentially the
same as that  used  for the  analysis of  sediment specimens  under the
National Surface Water Monitoring Program (D. Lucas  et al.  1980).   The
analysis of pesticides in both matrices is performed at TAC.  The speci-
fic procedure for the extraction and Florisil clean-up of soil specimens
for analysis of organochlorine and organophosphorus pesticides is given
in Appendix C.  The procedure was furnished by TAC as a result of the
RTI questionnaire.   The levels  of  pesticides in  specimen extracts are
determined  using  essentially  the  same  gas  chromatographic techniques
applied to water and sediment (D. Lucas et al. 1980).   Although triflura-
lin is  a nitroaniline,  its chemical properties allow it  to  be analyzed
with the organochlorine pesticides.

     The  general   method   used  in  the  analysis  of  organochlorine  and
organophosphorous  pesticides  involves an initial  screening  of specimen
extracts on a primary GC system.  All positive results are then screened
on  a  secondary GC  system  that differs  from the primary  system in the
selectivity  of  either  the column  or the detector.   Continued positive
results may  then be confirmed through the use of  additional analytical
techniques depending on the degree of suspected difficulties from inter-
ference, contamination, or low levels (approaching the MDL).   The techni-
ques used in the application of this methodology to each pesticide group
are summarized in Table 2.2.

     Quantitation  of GC  results for  pesticides  is  carried  out using
external standard  procedures with single-point calibration.  Calibration
standard concentrations are adjusted  to give compound responses similar
to those in specimen extracts  in order to reduce the effects of detector
non-linearity.  The ECDs used in this program all possess a ⁶³Ni source.
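
     The single-point external standard calculation described above can
be sketched as follows.  This is only an illustration under stated
assumptions, not the TAC procedure itself: the function names, the peak
responses, the extract volume, the specimen weight and the wet-weight
basis for % moisture are all hypothetical.

```python
def extract_conc_ng_per_ml(sample_response, std_response, std_conc_ng_per_ml):
    """Single-point external standard quantitation.

    Assumes detector response is proportional to concentration near the
    standard level, which is why the standard is matched to the specimen.
    """
    return sample_response / std_response * std_conc_ng_per_ml


def soil_level_ppb_dry(extract_conc, extract_volume_ml, wet_weight_g, pct_moisture):
    """Convert an extract concentration to ng analyte per g dry soil (ppb).

    Assumption: % moisture is expressed on a wet-weight basis.
    """
    dry_weight_g = wet_weight_g * (1.0 - pct_moisture / 100.0)
    return extract_conc * extract_volume_ml / dry_weight_g


# Hypothetical numbers, chosen only to exercise the arithmetic.
conc = extract_conc_ng_per_ml(sample_response=45.0, std_response=50.0,
                              std_conc_ng_per_ml=20.0)
level = soil_level_ppb_dry(conc, extract_volume_ml=10.0,
                           wet_weight_g=5.0, pct_moisture=20.0)
print(round(conc, 2), round(level, 1))
```

Because the calculation uses a single calibration point, any detector
non-linearity between the standard and specimen responses passes directly
into the result, which is the reason for matching the two responses.
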
     Actual recoveries of pesticides from soil specimens are not monitored
except  via  the  corresponding  recoveries  for  controls.   It  would be
extremely useful to  fortify  each specimen with a particular compound(s)



-------
       Table 2.2.  Procedures for the GC Analysis of Pesticides for the NSMP

Compound Class     Primary      Secondary        Confirmation      Additional
                   analysis     analysis         techniques        comments

Organochlorine     GC/ECD on    GC/ECD on        GC/HECD           Every 10th soil
                   OV-1         OV-210           GC/ECD on         analysis
                                                 1.5% OV-1/        duplicated
                                                 1.95% OV-210

Organophosphorus   GC/FPD on    GC/FPD on        GC/NPD            Every 10th soil
                   OV-1         1.5% OV-1/       GC/FPD-S mode     analysis
                                1.95% OV-210     GC/ECD            duplicated

Source:  Toxicant Analysis Center (TAC), USEPA, Bay St. Louis, Mississippi.

-------
prior  to  extraction and  clean-up  and  thereby  monitor any  anomalous
behavior  in  the extraction,  clean-up and GC  injection  procedures which
may occur  from  time to  time for a  particular specimen.  This technique
was used in the National Human Monitoring Program for analysis of organo-
chlorine  pesticides in  adipose tissue.   Aldrin, which  is  seldom found,
was spiked into fat specimens and then analyzed as though it were endo-
genous.   The  internal  standard  quantitation  method  is  an  alternate
procedure for normalizing recoveries and was briefly examined at Versar,
Inc. for the analysis of s-triazines in water and sediment specimens (D.
Lucas  et  al.  1980).  Promazine was used as  the internal standard and
preliminary work showed promise (Bob Martin, Versar, Inc.).
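
     The two approaches discussed above, per-specimen fortification and
internal standard quantitation, can be sketched as follows.  This is a
hedged illustration only: the function names, the 70-130% acceptance
window, the relative response factor and all numeric values are
hypothetical and are not taken from the NSMP or Versar procedures.

```python
def surrogate_recovery_pct(found_ng, spiked_ng):
    """Percent recovery of a compound fortified into the specimen before extraction."""
    return 100.0 * found_ng / spiked_ng


def is_anomalous(recovery_pct, low=70.0, high=130.0):
    """Flag a specimen whose recovery falls outside an assumed acceptance window."""
    return not (low <= recovery_pct <= high)


def internal_standard_amount(analyte_response, is_response, is_amount_ng, rrf):
    """Analyte amount from the analyte/internal-standard response ratio.

    rrf is the relative response factor:
    (analyte response per ng) / (internal standard response per ng).
    Losses that affect both compounds equally cancel in the ratio.
    """
    return (analyte_response / is_response) * is_amount_ng / rrf


recovery = surrogate_recovery_pct(found_ng=42.0, spiked_ng=50.0)
amount = internal_standard_amount(analyte_response=30.0, is_response=60.0,
                                  is_amount_ng=100.0, rrf=0.5)
print(recovery, is_anomalous(recovery), amount)
```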

     All GC methodology used in the NSMP utilizes packed-column techni-
ques.  The improved resolution and sensitivity that could be obtained by
incorporating state-of-the-art capillary GC techniques would considerably
increase the utility of the method and reduce the need for confirmation.
This is particularly evident in the analysis of PCBs where each indivi-
dual designation (i.e., Aroclor 1242, Aroclor 1254 and Aroclor 1260)
actually represents a complex mixture of partially chlorinated biphenyls
(e.g., tri-, tetra-, penta- and hexachlorobiphenyl) and their respective
isomers.  Considerable overlap exists between the components present in
each PCB.  For instance, Aroclor 1242 contains di- to hexachlorinated
biphenyls whereas Aroclor 1254 contains tetra- to heptachlorinated
biphenyls.  There is currently no reason to expect the individual compo-
nents to possess the same degree of stability or toxicity.  Thus, the
original pattern of components and their relative amounts may not be
preserved in complex environmental and biological matrices.  Yet, it is
on the basis of the standard peak pattern that the presence of PCBs and
their levels are currently determined.  Further, as was shown in the
analysis of fat specimens under the National Human Monitoring Program
(R.M. Lucas et al. 1980), PCB components can interfere with the analysis
of some chlorinated pesticides (e.g., p,p'-DDT, t-nonachlor and hepta-
chlor epoxide) at sufficiently high levels.  The degree of resolution
that can now be achieved by capillary GC techniques is demonstrated by
the chromatogram of Aroclor 1242 and 1260 in Figure 2.1.  These Aroclors
cover nearly the entire range of PCB components (monochlorobiphenyls
to octachlorobiphenyls and their isomers) and yield over 80 individual
peaks by this method.  Typical packed-column GC techniques yield fewer
than 15 peaks for these mixtures.

     In general, the analysis of heavy metals in soil specimens was
carried out using flame atomic absorption (AA) techniques for cadmium,
lead and arsenic (T.J. Forehand et al. 1976), and the cold vapor AA
technique for mercury.  More specific information on the current methodol-
ogy was unavailable for two reasons.  First, soil specimens have not
been analyzed for heavy metals since 1979.  In view of recent instrumental
acquisitions (i.e., graphite furnace and Zeeman AA) coupled with the
continual refinement of AA procedures in the ongoing analysis of other
matrices (e.g., blood, urine, etc.) at TAC, it is likely that the original
procedures (for soil) will undergo substantial modification when the
analysis of soil specimens is resumed.  Second, the individual responsible
for the most recent analysis of soil specimens (1979) is no longer
employed at TAC, and thus detailed information concerning the methodology
and control data is not readily available.  It has been necessary to

-------
        [Figure 2.1: chromatogram trace; time axis in hrs.]

        Figure  2.1  Capillary GC/ECD Chromatogram of Aroclor 1242 and Aroclor 1260
                     (M.2 ng total): 48 m x 0.25 mm i.d. capillary with 0.1 µ Apiezon L
                     on persilylated Pyrex, 1.5 mL/min. helium, 150°-290° @ 1°/min.,
                     ECD @ 128 x 10.

Source:   Toxicant Analysis Center (TAC),  USEPA,  Bay  St.  Louis,  Mississippi.

-------
obtain  such  information directly  from the  analyst  for virtually every
analytical  method  used in  the  five National  Monitoring Programs,  a
situation  which  further  demonstrates  the  need  for  centrally  located
documentation of  all  methodology used in a monitoring program.  In view
of the lack of available data on the current analysis of metals in soil,
the  quality  of the  analytical  data cannot  be determined  at  this time.

2.2.2     QC/QA

     For organochlorine-organophosphorus pesticides  in soil,  individual
specimens  are analyzed  in  sets  of  10-15  with  each  set  containing a
method blank  (reagent blank) and a control.  The  control  consists of a
fortified  soil  specimen (SPRM), which  is  generated  internally.   Checks
are also run on the elution pattern of pesticides from Florisil columns.

     Information  regarding the  primary and secondary analytical techni-
ques and  confirmation techniques  that may  be  used in the analysis of
soil specimens  has been summarized in  Table 2.2.   Decisions  concerning
the  adequacy  of  the  clean-up procedure (if used),  the validity of the
standards and controls, and the confirmation techniques used are reviewed
by  the  supervisor  (i.e.,  William Mitchell) and  the TAC  QC/QA  officer
(Dr. Joe Yonan).

2.2.3     Accuracy and Precision

     Information  on the accuracy and precision of an  analytical method
is required in  order to define the relationship between the  analytical
result  (estimate)  and the  "actual"  analyte  level  in  the  specimen.
Although this information  may be produced as part of the initial method
development and validation,  it  is by itself insufficient as the charac-
teristics of the method can (and frequently do) change with time, analyte
level,  and matrix.   Environmental matrices are particularly complex and
variable.

     The replicate analysis  of  a specimen  containing  a known level of
analyte  (i.e.,  a  control)  over  a period  of  time  can provide useful
information about the  accuracy  and  precision of  the method,  and the
method stability.  RTI  has attempted  to compile such information, where
it is available,  for each  pesticide and toxic substance listed in Table
2.1.

     The accuracy of the analytical methodology is probably best reflect-
ed in  the  recovery of  the analyte.   This is particularly  true  for the
toxic substances  monitored in the NSMP since the analytical results are
not corrected for losses during workup (i.e., recoveries).   The available
recovery data for analysis  of toxic substances in soil is given in Table
2.3.  These values represent averages  over a period  of several months
and are therefore more  useful as general indications of method accuracy
than a  corresponding single value.  Unfortunately,  this  information is
far from complete with  respect  to the number of toxic substances listed
(Table  2.3) versus  the  number analyzed (Table 2.1).   Recovery data must
be generated  for  all  substances analyzed.   While a  single  value may be
found to hold for a number  of similar substances,

-------
                 Table 2.3.  Average Recoveries for Some
                   Organochlorine Pesticides from Soil

                        Fortification    Average       Average    Reported
Pesticide               level (ppb)      % recovery    error      MDL (ppb)

γ-Chlordane                  60              80          -20%        10
o,p'-DDE                     90              81          -19%        20
p,p'-DDE                     75              84          -16%        20
p,p'-DDD                    150              87          -13%        20
o,p'-DDT                    240              84          -16%        20
p,p'-DDT                    240              88          -12%        20
Dieldrin                     90              80          -20%        20
Aldrin                       30             127          +27%        10
Heptachlor                   30              80          -20%        10
Heptachlor Epoxide           60              80          -20%        10
Endrin                      120              89          -11%        20

Source:  Toxicant Analysis Center (TAC), USEPA, Bay St. Louis, Mississippi.

-------
the  similarity must  be demonstrated  (and  the substances  thus  grouped
must be specified) and does not obviate the need for subsequent monitor-
ing via controls.
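
     The relationship between the columns of Table 2.3 can be checked
with a short calculation.  The sketch below assumes that the average
error is simply the average % recovery minus 100, which is consistent
with every row of the table; the variable and function names are
illustrative only.

```python
# (pesticide, fortification level ppb, average % recovery, reported MDL ppb)
# Values transcribed from Table 2.3.
TABLE_2_3 = [
    ("gamma-Chlordane", 60, 80, 10),
    ("o,p'-DDE", 90, 81, 20),
    ("p,p'-DDE", 75, 84, 20),
    ("p,p'-DDD", 150, 87, 20),
    ("o,p'-DDT", 240, 84, 20),
    ("p,p'-DDT", 240, 88, 20),
    ("Dieldrin", 90, 80, 20),
    ("Aldrin", 30, 127, 10),
    ("Heptachlor", 30, 80, 10),
    ("Heptachlor Epoxide", 60, 80, 10),
    ("Endrin", 120, 89, 20),
]


def average_error_pct(recovery_pct):
    """Signed error of an uncorrected result relative to the fortified level."""
    return recovery_pct - 100


errors = [average_error_pct(rec) for _, _, rec, _ in TABLE_2_3]
ratios = [fort / mdl for _, fort, _, mdl in TABLE_2_3]

for (name, _, _, _), err, ratio in zip(TABLE_2_3, errors, ratios):
    print(f"{name:20s} error {err:+d}%   fortified at {ratio:.2f} x MDL")
```

Since NSMP results are not corrected for recovery, the signed error is
the bias a data file user should expect at the fortification level shown.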

     Available  information on the  analytical precision  for  toxic sub-
stances in soil is  given in Table  2.4 in terms of  the  coefficient of
variation  (CV).  The CV is calculated as follows:

     CV = (std. deviation / mean) x 100 = % relative standard deviation.

As with recoveries, data was available for only a small number of organo-
chlorine  pesticides and  the  fortification  levels were  variable  (3-12
times the MDL).
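
     A minimal sketch of the CV calculation defined above is given here.
The replicate values are hypothetical, and the use of the sample (n-1)
standard deviation is an assumption, since the report does not state
which form TAC used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) x 100, i.e. % relative standard deviation.

    Uses the sample (n-1) standard deviation; this choice is an assumption.
    """
    return stdev(values) / mean(values) * 100.0


# Hypothetical replicate results (ppb) for one control over time.
replicates_ppb = [48.0, 50.0, 52.0]
cv = coefficient_of_variation(replicates_ppb)
print(f"CV = {cv:.1f}%")
```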

     No indication of specific interferences between pesticides has been
given.  This is particularly interesting since PCB levels greater than 1
ppm were found to significantly interfere in the GC/ECD analysis of
p,p'-DDT, t-nonachlor and heptachlor epoxide in human fat (R.M. Lucas et al.
1980).   The  GC  methodology used  for the  analysis of  organochlorine
pesticides in water and sediment does not appear to differ significantly
from that  for  human adipose tissue.  Thus the ubiquitous nature of PCBs
would  be  expected  to  cause  interference  problems  regardless of  the
specimen  matrix.   The  use of high  resolution  capillary  GC  techniques
would contribute  significantly to  the elimination of such difficulties,
as well as increase the sensitivity of GC/MS as  a highly specific confir-
mation procedure.

2.2.4     Minimum Detectable Levels

     All analytical techniques are characterized by an inherent limit of
sensitivity  below  which  the  technique cannot   reliably  discern  the
presence or absence of a particular component.  Thus the procedures used
in the analysis of soil specimens must be similarly characterized by a
minimum  amount of  specific analytes  which  produce a  signal  response
statistically  discernible  from background.    This  analyte concentration
is defined as the minimum detectable level (MDL) and is important in the
assessment of  the analytical  data  since concentrations  reported  below
the MDLs lack validity and must be considered unreliable.

     The MDLs  associated with  the  GC analysis of specific pesticides in
soil specimens  are a function of instrumental  operating  conditions  and
the amount of  background  introduced by residual matrix material in  the
injected specimen extract.   Tentative detection  limits have been establ-
ished by TAC  and are  shown  in  Table 2.5  for the  organochlorine  and
organophosphorous  pesticides.

     The MDL  corresponds  to the  amount of  analyte producing  a  signal
equal to 5%  of full scale deflection with a  maximum of 1% noise (signal
to noise ratio =5:1).   In cases where the chromatographic background is
significant,  the  MDL  is  taken as  that amount  of analyte producing  a
signal equal  to  twice the  noise  level  in   the vicinity of the  peak.

-------
              Table 2.4.  Precision for Some Organochlorine
                           Pesticides in Soil

                               Fortification
Pesticide                      level (ppb)        CV

γ-Chlordane                         60             3%
o,p'-DDE                            90             2%
p,p'-DDE                            75             3%
p,p'-DDD                           150             3%
o,p'-DDT                           240             2%
p,p'-DDT                           240             4%
Dieldrin                            90             2%
Aldrin                              30            10%
Heptachlor                          30             3%
Heptachlor Epoxide (HE)             60             4%
Endrin                             120             5%

Source:  Toxicant Analysis Center (TAC), USEPA,  Bay St.  Louis, Mississippi.

-------
           Table 2.5.  Detection Limits of Pesticides in Soils



          Compound                           Detection limit

Organochlorine pesticides

     Early eluting                                10 ppb
       (BHC's, Aldrin, Heptachlor
        Epoxide, Chlordane)

     Late eluting                                 20 ppb
       (DDTs, Dieldrin, Endrin)

     All multicomponent pesticides                50 ppb

Organophosphorus pesticides                    10-50 ppb



Source:  Toxicant Analysis Center (TAC), USEPA, Bay St.  Louis, Mississippi.

-------
Any response lower than the MDL is reported as not detected (ND).  There
can  be  significant variation  in  analytical  sensitivity, even  among
specimens of the same type.  Consequently, the reported MDLs are typical
or  expected levels  realized for  the  majority (75-80%)  of  specimens.
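
     The MDL criteria and the ND reporting rule described in this section
can be sketched as follows.  This is an interpretation under stated
assumptions rather than the TAC procedure: the function names are
hypothetical, and treating noise above 1% of full scale as "significant"
background is an assumption made for the example.

```python
def mdl_signal_threshold(noise_pct_fsd):
    """Signal (% of full-scale deflection) taken to define the MDL.

    With noise at or below 1% of full scale, the MDL signal is 5% of full
    scale (signal-to-noise of at least 5:1).  With higher background
    (assumed here to mean noise above 1%), it is twice the local noise.
    """
    if noise_pct_fsd <= 1.0:
        return 5.0
    return 2.0 * noise_pct_fsd


def report_level(level_ppb, mdl_ppb):
    """Report 'ND' for any level below the MDL, otherwise the level itself."""
    return "ND" if level_ppb < mdl_ppb else level_ppb


print(mdl_signal_threshold(0.5), mdl_signal_threshold(3.0))
print(report_level(8, 10), report_level(25, 20))
```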

2.3  Fate of Pesticides in Soil

     After the application of pesticides to agricultural land, a number
of processes may occur which lead to their transport in the environment or
their removal by chemical or biological degradation.  Both of these mechan-
isms depend to a large extent on the chemical structure of the pesticide
and to a lesser extent on the soil type (clay, clay loam, sandy loam,
sand, etc.).

     Chlorinated hydrocarbons have a well-earned reputation for persist-
ence.  Kearney, Nash and Isensee (1) have compared the persistence of
pesticides within each general pesticide type and report that the
persistence of chlorinated hydrocarbons varies from 5 years for chlordane
to 2 years for heptachlor and aldrin.  By contrast, the persistence of
phosphate insecticides is measured in weeks.  Diazinon persists for some
12 weeks compared to Malathion and Parathion, which persist for only a
few weeks.  Trichloroacetic acid persists for 12 weeks compared to 2 weeks
for Barban.  Intermediate between the two extremes is a wide range of
herbicides.  The urea, triazine and picloram herbicides range from 3
months for Prometryne to 18 months for Picloram and Propazine.  The
benzoic acid and amide herbicides range from 2 to 12 months and the
phenoxy, toluidine and nitrile herbicides range from 1 to 6 months
persistence.

     The migration of pesticides in soils is again very closely related
to their chemical structure.  Such factors as water solubility and
adsorption on soil particles affect their migration.  Compounds such as
aldicarb have migrated through the soil overburden to shallow aquifers.
Halogenated hydrocarbon pesticides such as benzene hexachloride, which
has a low water solubility, remain entirely in the upper soil layer (2).

     The effect of soil type on the fate of pesticides has been studied
by a number of workers.  In very general terms, adsorption is greater on
clays than on sand.  The organic content of the soil also affects
adsorption (3).

     (1)  Kearney, P. C., R. G. Nash and A. R. Isensee 1979.  Persistence
          of Pesticide Residues in Soils.  In M. W. Miller and G. G.
          Berg (ed).  Chemical Fallout, Current research on persistent
          pesticides, Charles C. Thomas, Publisher, Springfield, Ill.

     (2)  Kawahara, T., M. Matsui and H. Nakamura, "BHC in Soil of Paddy
          Field" Bull. Agric. Chem. Inspec. Stn. 12:42-45 (1972).

     (3)  Bristow, P. R., J. Katan and J. L. Lockwood.  "Control of
          Rhizoctonia solani by Pentachloronitrobenzene Accumulated from
          Soil by Bean Plants," Phytopathology 63:808-813 (1973).

-------
2.4  Recommendations

     Limited review  of analytical methodology  used in the NSMP  and an
attempt to  compile  data  for the average accuracy, precision,  and MDL in
soil for  each toxic  substance  monitored under  this  program provide  a
basis for the following recommendations:

     1.   Accuracy  (that is,  recoveries)  and  precision  data  must  be
          generated for all  pesticides  monitored in the NSMP.   The data
          should be generated at  two different levels (e.g.,  at the MDL
          and at ten  times  the  MDL).  The results for controls  analyzed
          with each set of specimens  would be the best means of  providing
          this information  since  it  is necessary that control  data be
          made accessible  to computer data  file users  in any  event.
          Controls must  be  run with each  set  of specimens and  should
          consist of a blank (unfortified soil free from the analytes of
          interest) and  two  fortified blanks (one fortified at  the MDL
          and another at ten times the MDL).  The analytical results for
          the controls should be reported on a separate form (especially
          designed for  control  data)  and encoded such that there  is  a
          one-to-one  association  with  the  particular set  of  specimens
          with which they  were  analyzed.   The  encoding  should  allow
          later computer  retrieval  of  control  data for  any  particular
          specimen set or group of sets  (for example,  geographic area,
          over a  specified  period of time, or  for a  particular  pesti-
          cide).   The availability of this  information in a retrievable
          form to data file  users would provide the means for assessing
          data reliability now lacking.  Further, any duplicate  specimen
          analyses must  be  reported  in the computer data  file  as  they
          provide  the best  means of  assessing  method  precision  on  a
          continuous  basis.   Duplicate  results must  be  specifically
          encoded such  that  they are retrieved  as  a group (e.g.,  all
          duplicates for a particular matrix and pesticide over  a speci-
          fied period  of time)  as well as with the  initial  analytical
          results for  the  specimen.   The need  to make  routine  control
          data available to  program data file users cannot be  overempha-
          sized.   This does  not  preclude the use of specialized  controls
          (e.g., SPRMs); however, these results should also be  included
          in the  computer file encoded to allow facile retrieval both as
          a group and with their particular specimen set.

     2.   The pesticides  included  on  the routine monitoring list must be
          reviewed  on a  regular  basis  and  appropriate deletions  or
          additions made.  Specifically,  the need for  routine  analysis
          of organophosphorous pesticides in  soil should  be reviewed as
          this class of compounds  is  known to be unstable  and  has  seldom
          been reported  in either soil or sediment.  Once  the  baseline
          has been established  for   such compounds,  three  choices  are
          possible:    1)  cease  to analyze  for  the  compound(s)  except
          under  special  circumstances (e.g.,  after a chemical  spill or
          when contamination is  suspected from a recent application); 2)
          analyze for the compound(s) on a more infrequent basis;  and 3)
          concentrate efforts on the  analysis  of degradation products of
          known  toxicity  where  these  exist.   Decisions  concerning  the



-------
     analysis of toxic substances under the NSMP should be based on
     information generated in  other  agency data files (e.g.,  USDA,
     USGS, etc.) as well as data generated within EPA.

3.   Soil  specimens  should  be  characterized  as  to the  percent
     carbon or percent inorganic residue.   This information must be
     included on  the  report  form (along with  moisture  content)  as
     part  of  the specimen characterization (source).  Significant
     trends may otherwise  be  missed  with respect to  the  soil type
     and  its  effects  on toxic  substance  accumulation,  degradation
     and transport.

4.   Control  specimens  (in  the matrix   of   interest)  should  be
     included with any specimens either stored for extended periods
     or shipped to another site for  analysis.   This is particularly
     important for  toxic compounds which  are known to be unstable
     (e.g., organophosphorus pesticides).  The results of these
     "storage controls" must  also be included in the computer data
     file with appropriate encoding  for specific retrieval.

5.   Analytical methodology should be  updated to include state-of-
     the-art capillary GC  techniques.   This would provide a higher
     degree of  confidence  in  the  resulting data  through increased
     resolution  and  sensitivity.   The  use  of higher  resolution
     analytical  techniques  is  a move  toward the  quantitation  of
     PCBs  (and  technical chlordane)  as their  individual  isomers.
     This  approach  is far more  useful than the present  method  of
     attempting  to  identify   patterns  and  averaging  components,
     since the toxicity and biodegradation of the individual isomers
     are not identical.

6.   The pesticide recoveries  should  be monitored for each specimen
     analyzed by initial fortification  of the specimen with appro-
     priate compound(s).   Subsequent  analysis  of the compound  level
     should  enable  comparison  of   data   between  specimens   with
     increased confidence that  anomalous  results will be detected.
     The  use  of  internal  standard  quantitation techniques  would
     normalize   recoveries   between   specimens   and  should   be
     considered.

7.   Detailed information  on  all analytical procedures  under
     the NSMP should  be  documented  in one  source.  The  procedures
     must then be maintained  current  with ongoing improvements and
     modifications  made   by   the analytical   laboratories.   Such
     updating  requires  both   flexibility  and regular  review  by
     program management.
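
     The encoding of control data called for in recommendation 1 can be
sketched as a simple record structure.  This is a hypothetical
illustration, not an NSMP file format: the field names, the set
identifiers, the control type labels and the example values are all
assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ControlResult:
    set_id: str          # hypothetical key shared with the specimens in the set
    pesticide: str
    control_type: str    # e.g. "blank", "fortified_mdl", "fortified_10x_mdl"
    fortified_ppb: float
    found_ppb: float
    analyzed: date


def controls_for_set(records, set_id):
    """One-to-one retrieval: all controls analyzed with a given specimen set."""
    return [r for r in records if r.set_id == set_id]


def controls_for_pesticide(records, pesticide, start, end):
    """Group retrieval: controls for one pesticide over a specified period."""
    return [r for r in records
            if r.pesticide == pesticide and start <= r.analyzed <= end]


# Hypothetical records for illustration only.
records = [
    ControlResult("SET-001", "Dieldrin", "blank", 0.0, 0.0, date(1980, 5, 1)),
    ControlResult("SET-001", "Dieldrin", "fortified_mdl", 20.0, 16.0, date(1980, 5, 1)),
    ControlResult("SET-001", "Dieldrin", "fortified_10x_mdl", 200.0, 170.0, date(1980, 5, 1)),
    ControlResult("SET-002", "Aldrin", "fortified_mdl", 10.0, 9.0, date(1980, 6, 3)),
]

same_set = controls_for_set(records, "SET-001")
dieldrin_1980 = controls_for_pesticide(records, "Dieldrin",
                                       date(1980, 1, 1), date(1980, 12, 31))
print(len(same_set), len(dieldrin_1980))
```

Keying every control to its specimen set supports both kinds of retrieval
named in the recommendation: with a particular set, and as a group across
sets.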

-------
                               References
(Analytical Section)

Hartwell ID, Piserchia P, White SB et al.  1979.   Analysis of EPA
     Pesticides Monitoring Networks.  Washington, DC:   U.S.  Environmental
     Protection Agency.  EPA-560/13-79-014.

USEPA. 1977.  U.S. Environmental Protection Agency.   Manual  of Analytical
     Methods  for  the  Analysis  of  Pesticide   Residues  in  Human  and
     Environmental  Samples.   Revised June  1977  under EPA  Contract No.
     68-02-2474.

USEPA. 1979.  U.S. Environmental Protection Agency.   Manual  for Analytical
     Quality Control  for Pesticides and Related Compounds  in Human and
     Environmental Samples.  Revised January 1979 under EPA  Contract No.
     68-02-2474.

Lucas D,  Mason RE, Rosenzweig M et al. 1980.   Recommendations  for the
     National  Surface  Water Monitoring Program:  Report  Two.  Research
     Triangle     Park,     NC:      Research     Triangle     Institute.
     RTI/1864/14/01-02I.

Lucas RM,  Rosenzweig  MS, William  SR  et  al.  1980.   Evaluation of and
     Alternate  Designs  for National Human Monitoring  Program's  Adipose
     Tissue  Survey.   Research Triangle  Park,  NC:   Research  Triangle
     Institute.  RTI/1864/14/02-2I.

Forehand TJ, Dupuy  AE,  Tai H. 1976.  Determination of  arsenic in  sandy
     soils.  Analytical Chemistry 48(7): 999-1001.

-------
                APPENDIX A




Questionnaire on Chemical Analysis of Soil

-------
RESEARCH  TRIANGLE   INSTITUTE
POST  OFFICE  BOX  12194
RESEARCH  TRIANGLE  PARK,  NORTH  CAROLINA  27709

CHEMISTRY AND LIFE SCIENCES GROUP                     October  28,  1980


        Mr.  William Mitchell
        Toxicant Analysis  Center
        US Environmental Protection  Agency
        1105,  NSTL
        NSTL Station,  Miss.   39529

        Dear Mr.  Mitchell:

             The Research  Triangle Institute (RTI), under  contract with  the
        Environmental  Protection Agency (EPA),  is  conducting an assessment of
        the  five National  Pesticide  Monitoring  Programs.   The Statistical  Sciences
        Group at RTI has been analyzing the  data generated by the Network  Programs
        and  is responsible for  conducting  this  review.   The  Chemistry and  Life
        Sciences Group is  assuming a supportive role  in  this effort.

             We have been  asked to review  the current analytical methodology
        being used and to  evaluate the  quality  of  data being generated in  each
        Monitoring Program.   The main objective of this  review is not to criticize
        or find fault  with the  laboratories  involved  in  these programs but to
        identify the strengths  and limitations  inherent  in the analytical  methodo-
        logy.   It is important  to  define the state-of-the-art as it is practiced
        by participating laboratories and  to establish reliability factors for
        the  reported data.  The statisticians are  particularly interested  in
        assessing measurement error  and in developing the  best means for document-
        ing  estimates  of accuracy.

             We have prepared a list of questions  relating to different  aspects
        of the analytical  procedure.  Some questions  are concerned with  procedural
        matters and others  are directed toward  defining  the  scope of the methodo-
        logy.   We hope you will assist  us  by responding  to these queries and by
        suggesting possible approaches  or  solutions to the issues mentioned
        above.

             Since this evaluation must be based to some extent on your  experience
        and  view of the capabilities of the  method, your cooperation is  essential
        to the success of  this  evaluation.   Your prompt  response would be  greatly
        appreciated.

             Thank you.

                                                Sincerely,
                                                John W. Hines, Ph.D.
                                                Chemist

       JWH/lfo

  (919)  541-6000       FROM    RALEIGH,    DURHAM    AND    CHAPEL    HILL

-------
                 National Soil Monitoring Program

                 -Analytical Methodology Issues-


1.   It is presumed that current laboratory procedures follow a  written
     analytical protocol.  Please furnish a detailed copy of current
     laboratory protocol along with its source (e.g.,  EPA Manual of
     Analytical Methods).  Include information on sample storage conditions
     (i.e., time, temperature, compositing) prior to analysis.   Also
     include information on any procedural modifications required due  to
     individual matrix or sample characteristics (e.g.,  emulsions,
     interferences or specific analytical requirements which might
     preclude the necessity for performing certain operations).
                                  A-2

-------
2.   The following list represents compounds which have been monitored
     under the National Soil Monitoring Program.   Please indicate  which
     components are currently monitored on a routine basis,  and which
     are no longer monitored or are only monitored under special circum-
     stances (e.g., by request, in samples from particular geographic
     areas, in particular types of samples).
     Organochlorines        Organophosphates        Heavy Metals
     Alachlor                 DEF                      Mercury
     Aldrin                   Diazinon                 Cadmium
     BHC                      Ethion                   Lead
     Chlordane                Malathion                Arsenic
     DDTs                     Phorate
     Dieldrin                 Parathion, ethyl
     DCPA                     Parathion, methyl
     Dicofol                  Ronnel
     Endosulfan I             Trithion
     Endosulfan II
     Endosulfan Sulfate
     Endrin
     Endrin Ketone          Other
     Heptachlor               Trifluralin
     Heptachlor Epoxide
     Hexachlorobenzene
     Isodrin
     Lindane
     Methoxychlor
     PCBs
     Propachlor
     Toxaphene
                               A-3

-------
3.   Which of the above analytes are never or very seldom found (in <1%
     of analyses) in general soil samples?
                                  A-4

-------
4.   Do certain individuals perform specific aspects of the program
     (e.g., organophosphate assays, data interpretation, QA/QC
     assessments)?
                                  A-5

-------
5.   Are there "decision points" in your procedure  where  judgement  is
     used in selecting procedural alternatives  (e.g.,  column  cleanup,
     choice of GC conditions,  data interpretation)?
                                  A-6

-------
6.   Describe your daily calibration and QC procedures (standards,
     spiked samples, blanks, other).  Please indicate how many of each
     type of control sample are used with each sample set and their
     concentration levels (typical levels for standards, spiked samples).
                                  A-7

-------
7.   Describe any additional  QC/QA procedures which are part of your
     protocol (duplicate or split analysis,  confirmatory analysis, use
     of multiple GC columns,  interlaboratory programs, other).  Please
     indicate how often these procedures  are used.
                                  A-8

-------
8.   What is the sample concentration range analyzed by direct injection
     on the GC, AA, etc. (i.e., before further concentration or dilution
     becomes necessary)?  Please indicate the method of reporting
     results at various analyte levels (i.e., above and below limit for
     quantitation, below limit of detection).
                             A-9

-------
9.   What are the estimates of the minimum quantitatable level (MQL) of
     individual analytes in real samples?  How are they determined and
     to what extent does the sample matrix affect these values?  What is
     the criterion used in reporting a specific analyte as "not detected"
     and in what manner are these results reported (zero, not detected,
     less than a certain value, less than the MQL)?  Is the lower limit
     of quantitation different from the instrumental limit of detection?
     If so, what is their relationship?
                              A-10

-------
10.  What is your estimate of the analytical precision  associated with
     each component and the dependence  of  this parameter  on  the  analyte
     concentration in the sample?  How  is  precision  estimated?   If
     available,  please give the precision  for analysis  of replicate
     SPRMs or similar controls over a period of  time for  each analyte.
                                  A-11

-------
11.  What is your estimate of the analytical  accuracy associated with
     each component and the dependence of this parameter on the analyte
     concentration in the sample?  How is accuracy estimated?
                                  A-12

-------
12.  What is the analyte recovery during sample workup and is the
     reported concentration corrected for recovery?
                                   A-13

-------
13.  What method(s) do you use for qualitative analysis of the data?
                                  A-14

-------
14.  What method(s) do you use for quantitative analysis of the data?
                                 A-15

-------
15.  What suggestions do you have for quantitating measurement error and
     documenting this information?
                                  A-16

-------
16.  What suggestions do you have for making the  Monitoring Program more
     efficient and meaningful (e.g.,  analytical modifications,  choice of
     analytes for analysis,  cost effectiveness)?
                                  A-17

-------
17.  What are the number of person-hours  (and  costs, if possible) allocated
     for the sample workup, sample  analysis, and data  interpretation
     aspects of this program based  on  a set of samples?  How many samples
     per set?
                                  A-18

-------
           APPENDIX B

National Soil Monitoring Program
 Pesticide Analysis Report Form

-------
                                                      Table 1.3

[Form: PESTICIDE ANALYSIS WORKSHEET, EPA Form 8550-2 (Rev. 2-75),
previous editions obsolete.  The keypunch column layout (card columns
9 through 80) is not legible in this copy; the recoverable field labels
are listed below.]

SECTION 1. SAMPLE IDENTIFICATION DATA
     Date received at lab

SECTION 2. SAMPLING DATA
     Sampled by
     Date sampled
     Site
     Station/site number
     State
     County or region
     System:  NS = National Soils Monitoring
              NE = National Estuarine Monitoring
              NW = National Water Monitoring
     Material
     Crop number (if applicable)
     Pesticides used (check or specify):  2,4-D, Aldrin, Atrazine,
          Chlordane, DDT, Diazinon, Dieldrin, Endrin, Heptachlor,
          Malathion, Parathion, Toxaphene, Trifluralin
     Sampling remarks

SECTION 3. SPECIFIC SAMPLE CHARACTERISTICS
     (Code)
     Date analysis completed

SECTION 4. RESIDUES DETECTED
     Pesticide, Code, Amount (repeated for each residue reported)
     Amount units:  M = P.P.M., B = P.P.B., T = P.P.T.
     Remarks
     Date
     Analyst's [entry illegible]

                                                                  B-1

-------
                 APPENDIX C

Analytical Methodology for Organochlorine and
Organophosphorous Pesticides and Trifluralin

-------
                            Attached Methods








4.1  Extraction-Soil and Sediment



1.   Weigh a  100  g  specimen in a 500 ml  Erlenmeyer flask and add 25 ml




     of distilled water.



2.   Add 50  ml of nanograde  acetone  and  place a teflon  stopper  in the



     flask.  Shake specimen for ½ hour.   Add 150 ml of nanograde hexane




     and continue shaking for \\ hours more.



3.   Decant  specimen into  a  500 ml  separatory funnel through  hexane-




     washed glasswool that has been baked  at 350°C.



4.   Wash  the  specimen  3 times with separate 100 ml portions of hexane-



     washed water.  Discard the water (bottom layer) each time.



5.   Pour  the  extract through a  filter tube containing glasswool  and a



     1-inch layer of sodium sulfate that  has been  oven baked at 350°C.




     The filtrate is collected in a screw-capped test tube.



6.   Store specimens in refrigerator  until  ready for use.  The filtrate



     collected in step  5 is analyzed, without  cleanup,  for organophos-



     phorus pesticides.   Florisil  cleanup is necessary for detection of



     organochlorine  pesticides on the electron capture type of detector.



7.   The moisture content  of  each specimen  is determined by placing 100



     g of  soil sample in an oven at  125°C  for 24 hours and then noting




     the weight loss of  the sample.
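
The moisture determination in step 7 implies a simple calculation. The following Python sketch is an illustration only and is not part of the attached method: the function names, the convention of expressing moisture as a percentage of the wet weight, and the example weights and concentration are all assumptions.

```python
def moisture_percent(wet_weight_g: float, dry_weight_g: float) -> float:
    """Moisture as a percentage of the wet weight (assumed convention)."""
    if wet_weight_g <= 0 or not (0 <= dry_weight_g <= wet_weight_g):
        raise ValueError("need wet > 0 and 0 <= dry <= wet")
    return 100.0 * (wet_weight_g - dry_weight_g) / wet_weight_g


def to_dry_weight_basis(conc_wet_basis: float, moisture_pct: float) -> float:
    """Convert a residue concentration from a wet- to a dry-weight basis."""
    if not (0 <= moisture_pct < 100):
        raise ValueError("moisture must be in [0, 100)")
    return conc_wet_basis / (1.0 - moisture_pct / 100.0)


# Hypothetical example: a 100 g specimen weighs 88 g after oven drying,
# and a residue was measured at 0.44 ppm on the wet-weight basis.
moisture = moisture_percent(100.0, 88.0)
dry_basis_ppm = to_dry_weight_basis(0.44, moisture)
print(moisture, dry_basis_ppm)
```

Whether reported residues are actually converted to a dry-weight basis is not stated in this excerpt; the second function shows only how such a conversion would follow from the measured weight loss.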








Notes:



1.   Run a solvent check with each group  of  specimens.



2.   Run a fortified specimen with each group.   The fortification proce-



     dure  is   as  follows:   Pipet  1.0 ml of  the  organochlorine  "Soil




     Fortification Standard" A or 3.0 ml of a 1:3 dilution of "Soil

-------
     Fortification Standard" A into  100  g of soil  or sediment  specimen.




     Pipet 3  ml  of the  organophosphate  "Soil Fortification  Standard"




     into the  same  specimen.  Mix  the standards with the specimen  and



     allow to  stand overnight.  The  specimen is then extracted by  the




     above procedure.



3.   Dry weight = weight  of specimen after heating  overnight  at  125°C.
                                  C-2

-------
A.   Florisil Cleanup Procedure



     1.   Quantitatively transfer  the specimen extract onto  the  top of



          the column and collect  the elution from the column into a 250




          ml flask.



     2.   When  the  sample  extract  drains  down to the top  of the upper



          layer of Na2SO4, add 100 ml of a mixture consisting of 10%




          methylene chloride in hexane and continue collecting until the



          liquid level reaches the upper Na2SO4 layer.  This elution is



          labeled the "first fraction."



     3.   Replace the  first 250 ml flask with a second  flask and then



          add 100 ml of 100% nanograde methylene chloride to the Florisil



          column.    Continue  collecting  the  elution  until  the  column



          drains dry.  Label the eluted portion, "fraction two."



     4.   To each flask add 1.0 ml of 0.01% Nujol (in hexane) and 3 to 4



          glass beads.   Attach a 3-ball  Snyder column  and place  on  a



          steam bath  or hotplate.   Concentrate to  ca 5 ml.   Add 50 ml



          nanograde hexane  and concentrate  to  about  5 ml.   Repeat the



          last concentration step once more.  This will remove essential-



          ly all methylene  chloride.



     5.   Pour 5 ml  of  hexane through the top of the Snyder column (for




          rinsing) and collect in the flask.



     6.   Transfer specimens quantitatively into 15  ml graduated centri-



          fuge tubes and place  into a water bath that is maintained at



          40°C.



     7.   Direct a  purge  of air  into the centrifuge  tube  above  the



          liquid level  until  the  volume  of liquid is reduced to 2.5 ml.



     8.   Samples are now ready for GC determination.
                                  C-3

-------
F.   Concentration of Specimens on Hot Plate



     1.   Swirl the  flasks  containing  glass beads until boiling occurs.




     2.   Do not allow the flasks to evaporate to dryness.








G.   Pouring of Extracts Into Graduated Centrifuge Tubes



     1.   Use  a small  funnel to  avoid losses  due to  direct  pouring.








H.   Concentration of Samples In Centrifuge Tubes With A Stream of Dry Air



     1.   Water bath should remain at a constant temperature.



     2.   Stream of  air to  all  samples should  be about  the  same flow




          rate.



     3.   Concentrate  all  samples  to  approximately  the same  volume.








I.   Column Cleanup



     1.   It is important  that the adsorbent (Florisil) have consistent




          mesh size and moisture content.



     2.   Exactly the  same  weight  of adsorbent should be  used  for each




          sample.



     3.   Good  column  technique  is  essential  for adequate separations.
                                  C-4

-------
C.   Florisil Column Separation of Pesticides in Standards A and B




     1.   Components  eluting  in  the  first  fraction  (150  ml of  10%




          methylene chloride in hexane) are:




               aldrin




               heptachlor



               gamma chlordane



               OPDDE



               PPDDE



               OPDDT



               PPDDT



               PPTDE



     2.   Components eluting in the second fraction (100 ml of methylene



          chloride) are:



               endrin



               dieldrin



               *heptachlor epoxide








*Occasionally heptachlor epoxide may split between the two fractions.
                                  C-5

-------
D.   Florisil Column Separation of Other Common Pesticides




     1.   First fraction




     trifluralin



     toxaphene




     PCB's



     lindane (BHC)




     PCNB




     chlordane



     methoxychlor




     mirex



     2.   Second fraction




     endosulfan I



     endosulfan II



     endosulfan sulfate



     endrin, aldehyde form



     endrin, ketone form



Note:



     Most  organophosphorus  pesticides  elute  in  the  second  fraction.
                                  C-6

-------
B.   Each Batch of Florisil Should Be Checked As Follows:



     1.   Add  known  volume of  bench standard  to  Florisil column,  and




          take off fractions,  as in the above procedure.




     2.   Concentrate volumes  of fractions 1 and 2  to the same volume as




          that originally added to the Florisil column.



     3.   Compare recoveries  in each fraction with  the bench standard.



          This  allows  the chromatographer  to determine which  fraction




          contains each  component  and the percent loss on  the  Florisil



          column, if any.
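
The comparison in step 3 can be sketched as a small calculation. The Python below is a hypothetical illustration, not part of the method: the function name, the use of detector responses in arbitrary units, and the example values are assumptions. It relies on step 2, in which both fractions are brought to the volume originally added, so that responses are directly comparable with the bench standard.

```python
def fraction_recoveries(bench, fraction_1, fraction_2):
    """Return {compound: (pct in fraction 1, pct in fraction 2, pct loss)}.

    Each argument maps a compound name to a detector response measured at
    the same final volume; a compound absent from a fraction counts as zero.
    """
    results = {}
    for compound, bench_response in bench.items():
        if bench_response <= 0:
            raise ValueError(f"bench response for {compound} must be positive")
        pct_1 = 100.0 * fraction_1.get(compound, 0.0) / bench_response
        pct_2 = 100.0 * fraction_2.get(compound, 0.0) / bench_response
        results[compound] = (pct_1, pct_2, 100.0 - pct_1 - pct_2)
    return results


# Hypothetical responses (arbitrary units) for one batch of Florisil.
bench = {"aldrin": 200.0, "dieldrin": 100.0}
first = {"aldrin": 190.0}
second = {"dieldrin": 96.0}

for name, (f1, f2, loss) in fraction_recoveries(bench, first, second).items():
    print(f"{name}: fraction 1 = {f1:.1f}%, fraction 2 = {f2:.1f}%, loss = {loss:.1f}%")
```

In this made-up case aldrin appears only in the first fraction and dieldrin only in the second, which agrees with the elution lists given for Standards A and B.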
                                  C-7

-------
                    APPENDIX D




Sampling Weights for the Rural Soils Network (RSN)

-------
0.   NOTATION

                 1967 Conservation Needs Inventory (CNI)

                 National Soil Monitoring Program (NSMP)

                        Rural Soils Network (RSN)

               Rural Soils Network Cropland Sample (RSN1)

              Rural Soils Network Noncropland Sample (RSN2)



Let i = 1, . . . , 48          denote the States of the conterminous

                               United States

Let j = 1, . . . , s(i)        denote the counties of State i that are

                               not strictly metropolitan in character

Let k = 1, . . . , t(i,j)      denote the strata* in county j of State i

Let ℓ = 1, . . . , U(i,j,k)    denote the primary sampling units (PSU's),

                               typically 160-acre plots, in stratum k

                               of county j in State i

Let ℓ = 1, . . . , u(i,j,k)    denote the sample PSU's in stratum k of

                               county j in State i

     [There are uncountably many secondary sampling units (SSU's), i.e.,

possible sampling points, in each PSU, so it is not possible to index

the population of SSU's within any PSU.]



Let m = 1, . . . , v(i,j,k,ℓ)  denote the actual SSU's selected by

                               spinning the sampling template once for

                               PSU ℓ in stratum k of county j in State i.

* Although townships or their equivalent are used to stratify the sample
within counties, the township, within township, and other levels of
stratification are treated herein as a single level without loss of
generality.

                                 D-l

-------
Let V(i,j,k,ℓ) be the random variable representing the number of SSU's

selected by spinning the sampling template for PSU ℓ in stratum k.

Note that v(i,j,k,ℓ) is a realization of V(i,j,k,ℓ).



Let m1 = 1, . . . , v1(i,j,k,ℓ)   denote the realized cropland SSU's

                               for PSU ℓ in stratum k of county j in

                               State i

Let m2 = 1, . . . , v2(i,j,k,ℓ)   denote the realized noncropland SSU's

                               for PSU ℓ in stratum k of county j in

                               State i.



Of course, v1(i,j,k,ℓ) + v2(i,j,k,ℓ) = v(i,j,k,ℓ).

1.   PHASE ONE — THE CNI SAMPLE

1.1  CNI PSU Probability

     Since u (i,j,k) PSU's are selected at random and without replacement

from the U (i,j,k) PSU's in stratum (i,j,k),

     p(i,j,k) =    Overall probability of selection into the CNI for each

                    PSU in stratum (i,j,k)

\[
  p(i,j,k) = \frac{u(i,j,k)}{U(i,j,k)} . \tag{1}
\]


For the standard sampling procedure, in which one PSU was selected at

random from a stratum containing 48 PSU's,

\[
  p(i,j,k) = \frac{u(i,j,k)}{U(i,j,k)} = \frac{1}{48} \approx 0.02 .
\]


1.2  Conditional Probability for SSU's in the CNI

     Recall that m = 1, . . . , v(i,j,k,ℓ) indexes the CNI sample

points in PSU ℓ of stratum k.  Also recall that there are infinitely

many such points  available  for sampling in each PSU.  If the points are
                                 D-2

-------
considered to have  no dimensions and hence no area, any point picked at




random must have zero probability of being selected into the CNI sample.




This  is  because  there are  infinitely  many mutually  exclusive  points.




     However, a point with no dimensions cannot be  assigned  a land use




other  than that  of  a small undefined  physical area  surrounding  that



point.   Thus,  in  fact, a  small  undefined area  centered at  each CNI



sampling point  was sampled  rather than a  point,  per se.  Let  us  then



assume that  each CNI  sample point is effectively a  sampling unit  with



area a, where the area a does not depend upon PSU or stratum.   A probabil-



ity density  for  sample selection can then be distributed over each PSU,



resulting in a positive probability for each SSU.




     A reasonable  simplification seems  to  be to assume  that  the proba-



bility density for  selection is uniform over each PSU.  This  assumption



would imply, among  other  things, that there  is  no  border effect.  That



is, areas  near  the edge of  the PSU are neither over-  nor under-repre-



sented in  the sample, both as selected and as implemented in the field.




In  this  case,  if  a single  SSU  were  to be selected at  random within a




sampled  PSU,  its  conditional probability  of  selection would  be  a/A,



where A is the area of the PSU and a is the area of the SSU.



     A random number V(i,j,k,ℓ) of SSU's were selected from

PSU(i,j,k,ℓ).  Letting A(i,j,k,ℓ) denote the area of PSU(i,j,k,ℓ), the

conditional probability, given selection of PSU(i,j,k,ℓ), for the

selection of SSU(i,j,k,ℓ,m) is then

\[
  \operatorname{Prob}[\text{SSU}(i,j,k,\ell,m) \text{ is selected} \mid \text{PSU}(i,j,k,\ell) \text{ is selected}]
    = \frac{a}{A(i,j,k,\ell)} \, E[V(i,j,k,\ell) \mid \text{PSU}(i,j,k,\ell) \text{ is selected}] . \tag{2}
\]
                                 D-3

-------
     The expected  number  of sample SSU's in a PSU, i.e. E[V] in (2), is



proportional to the area, A(i,j,k,ℓ), of the PSU (except for 640-acre




PSU's).   The  density   of  the  sampling  template  for 640-acre  PSU's,




adjusted to a common photograph scale, was one-fourth that for all other




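PSU's.  Hence, the proportionality constant for 640-acre PSU's is

one-fourth of that for all other PSU's; that is,

\[
  E[V(i,j,k,\ell) \mid \text{PSU}(i,j,k,\ell) \text{ is selected}] =
  \begin{cases}
    0.25\, c\, A(i,j,k,\ell) & \text{for 640-acre PSU's} \\
    1.00\, c\, A(i,j,k,\ell) & \text{for all other PSU's,}
  \end{cases} \tag{3}
\]

where c is the proportionality constant for PSU's other than 640-acre

PSU's.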



Thus, from (2) and (3)



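\[
  \operatorname{Prob}[\text{SSU}(i,j,k,\ell,m) \text{ is selected} \mid \text{PSU}(i,j,k,\ell) \text{ is selected}] =
  \begin{cases}
    0.25\, c\, a & \text{for 640-acre PSU's} \\
    1.00\, c\, a & \text{for all other PSU's.}
  \end{cases} \tag{4}
\]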



That is, the conditional probability of selection of an SSU is a constant



that depends only upon size of the PSU.
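
The substitution that leads from (2) and (3) to (4) can be written out explicitly. The lines below only restate that step with the symbols already defined (a, A, c); ℓ is written as \ell, and the conditioning event is abbreviated.

```latex
\begin{align*}
\text{640-acre PSU's:} \quad
  \frac{a}{A(i,j,k,\ell)} \, E[V(i,j,k,\ell) \mid \text{PSU selected}]
  &= \frac{a}{A(i,j,k,\ell)} \cdot 0.25\, c\, A(i,j,k,\ell) = 0.25\, c\, a \\
\text{all other PSU's:} \quad
  \frac{a}{A(i,j,k,\ell)} \, E[V(i,j,k,\ell) \mid \text{PSU selected}]
  &= \frac{a}{A(i,j,k,\ell)} \cdot 1.00\, c\, A(i,j,k,\ell) = 1.00\, c\, a
\end{align*}
```

The PSU area cancels in both cases, which is why the conditional probability depends on the PSU only through its size class.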



1.3  CNI Sampling Weights



     Combining the results  of  1.1 and 1.2, we can determine the overall




probability of selection for the ultimate sampling units,  the SSU's, for




the CNI sample.  In particular, it follows from (1) and (4)  that




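\[
\begin{aligned}
  &\operatorname{Prob}[\text{SSU}(i,j,k,\ell,m) \text{ is selected into the CNI sample}] \\
  &\quad = \operatorname{Prob}[\text{PSU}(i,j,k,\ell) \text{ is selected into the CNI sample}] \\
  &\qquad \times \operatorname{Prob}[\text{SSU}(i,j,k,\ell,m) \text{ is selected} \mid \text{PSU}(i,j,k,\ell) \text{ is selected}] \\
  &\quad = \begin{cases}
      0.25\, a\, c\, p(i,j,k) & \text{for 640-acre PSU's} \\
      1.00\, a\, c\, p(i,j,k) & \text{for all other PSU's.}
    \end{cases}
\end{aligned} \tag{5}
\]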



Thus, for estimation of means,  a proper sampling weight for  each SSU



record in the CNI sample is simply
                                 D-4

-------
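\[
  W(i,j,k,\ell,m) =
  \begin{cases}
    \dfrac{1}{0.25\, p(i,j,k)} & \text{for 640-acre PSU's} \\[1ex]
    \dfrac{1}{p(i,j,k)} & \text{for all other PSU's.}
  \end{cases} \tag{6}
\]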
The constant factor, ac, cancels in any estimation of means.  Of course,

this weight  reflects  only the unequal probabilities of selection due to

the sampling design and can be further modified to reflect missing data,

failure to accurately locate sampling points, etc.
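
A weighted mean built from these weights can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the program's estimation code: the record layout, the function names, and the example values are hypothetical. Each weight is taken as the inverse of the overall selection probability, 0.25 a c p(i,j,k) for 640-acre PSU's and a c p(i,j,k) otherwise, so the example also shows numerically that the factor ac has no effect on an estimated mean.

```python
from dataclasses import dataclass


@dataclass
class SSURecord:
    value: float        # measured level at the sample point (hypothetical)
    p_stratum: float    # p(i,j,k), the PSU selection probability
    is_640_acre: bool   # True when the PSU is a 640-acre plot


def cni_weight(record: SSURecord, ac: float = 1.0) -> float:
    """Inverse of the overall SSU selection probability."""
    template_factor = 0.25 if record.is_640_acre else 1.0
    return 1.0 / (template_factor * ac * record.p_stratum)


def weighted_mean(records, ac: float = 1.0) -> float:
    """Weighted mean of the recorded values; ac cancels in the ratio."""
    weights = [cni_weight(r, ac) for r in records]
    total = sum(w * r.value for w, r in zip(weights, records))
    return total / sum(weights)


# Hypothetical records: two standard strata (1 PSU in 48) and one denser one.
records = [
    SSURecord(0.10, 1 / 48, False),
    SSURecord(0.30, 1 / 48, True),
    SSURecord(0.20, 1 / 24, False),
]
print(weighted_mean(records), weighted_mean(records, ac=3.7))
```

Both calls print the same mean (up to rounding), because multiplying every weight by the same constant leaves the ratio unchanged.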

1.4  Weighting the CNI to Estimate Total Land Area

     It seems  reasonable  that if each SSU of  the CNI is to be regarded

as having  area  equal  to one (unit free), then the sampling weight to be

assigned to an SSU is
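\[
\begin{aligned}
  W_T(i,j,k,\ell,m)
    &= \frac{E[\text{Area (in acres) represented by the SSU}]}{\operatorname{Prob}[\text{PSU}(i,j,k,\ell)]} \\[1ex]
    &= \frac{\text{Area of PSU}(i,j,k,\ell) \,/\, E[V(i,j,k,\ell)]}{\operatorname{Prob}[\text{PSU}(i,j,k,\ell)]} \\[1ex]
    &= \begin{cases}
         \dfrac{A(i,j,k,\ell)}{0.25\, c\, A(i,j,k,\ell)\, p(i,j,k)} & \text{for 640-acre PSU's} \\[2ex]
         \dfrac{A(i,j,k,\ell)}{c\, A(i,j,k,\ell)\, p(i,j,k)} & \text{for all other PSU's}
       \end{cases} \\[1ex]
    &= \begin{cases}
         \dfrac{1}{0.25\, c\, p(i,j,k)} & \text{for 640-acre PSU's} \\[2ex]
         \dfrac{1}{c\, p(i,j,k)} & \text{for all other PSU's.}
       \end{cases}
\end{aligned} \tag{7}
\]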
     Of course, the proportionality constant, c, or equivalently,

E[V(i,j,k,ℓ)], would have to be explicitly determined, probably

empirically, to actually use (7) in estimation of total acreage.  Although we

will not need (7) explicitly, since we are only interested in estimating

means or rates, it is reassuring that the weights (6) and (7) are of the

same form.

-------
2.   THE RSN SAMPLE

2.1  Preliminaries for the RSN

     The contribution of sampled CNI cropland PSU (i,j,k,ℓ) to the
cropland accumulation used by the USDA for selecting the RSN subsample
is the adjusted cropland ratio

\[
\frac{v_1(i,j,k,\ell)}{v(i,j,k,\ell)} \cdot \frac{0.02}{p(i,j,k)} .
\tag{8}
\]
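     Thus, the total of the cropland accumulation used by the USDA in
State i is

\[
\begin{aligned}
N_1(i) &= \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \sum_{\ell=1}^{u(i,j,k)}
          \frac{v_1(i,j,k,\ell)}{v(i,j,k,\ell)} \cdot \frac{0.02}{p(i,j,k)} \\
       &= 0.02 \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \frac{1}{p(i,j,k)}
          \sum_{\ell=1}^{u(i,j,k)} \frac{v_1(i,j,k,\ell)}{v(i,j,k,\ell)}
\end{aligned}
\tag{9}
\]

     Similarly, the total of the noncropland accumulation in State i is

\[
N_2(i) = 0.02 \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \frac{1}{p(i,j,k)}
         \sum_{\ell=1}^{u(i,j,k)} \frac{v_2(i,j,k,\ell)}{v(i,j,k,\ell)}
\tag{10}
\]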
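
As a rough sketch of how the accumulations (9) and (10) add up over the sampled PSU's of one State, consider the following. The function name `state_accumulations`, the record layout, and the two records are hypothetical; v2 is assumed here to be the number of noncropland sampling points in the PSU.

```python
def state_accumulations(psu_records):
    """Return (N1, N2) for one State from sampled PSU records (p, v1, v2, v).

    p  : p(i,j,k) for the PSU's stratum
    v1 : number of cropland sampling points in the PSU
    v2 : number of noncropland sampling points (assumed meaning)
    v  : total number of sampling points in the PSU
    """
    n1_total = 0.0
    n2_total = 0.0
    for p, v1, v2, v in psu_records:
        n1_total += 0.02 / p * v1 / v   # adjusted cropland ratio, as in (8)
        n2_total += 0.02 / p * v2 / v   # noncropland analogue
    return n1_total, n2_total


# Two hypothetical PSU's with 36 sampling points each.
psus = [(0.02, 18, 18, 36), (0.01, 9, 27, 36)]
N1, N2 = state_accumulations(psus)
```

With these made-up inputs each PSU contributes 0.5 to the cropland accumulation, and the second PSU, drawn from a stratum with half the value of p, contributes three times as much to the noncropland accumulation as the first.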

2.2  Estimation of Proportion of Cropland Acreage in
     the Rural Area of State i.

     The estimate used was

\[
\frac{\displaystyle \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \frac{1}{p(i,j,k)}
      \sum_{\ell=1}^{u(i,j,k)} \frac{v_1(i,j,k,\ell)}{v(i,j,k,\ell)}}
     {\displaystyle \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} U(i,j,k)}
\tag{11}
\]

                  = (Estimated total of the cropland proportions for all

                    PSU's in State i) ÷ (Total number of PSU's in State i)


-------
2.3  More Preliminaries for the RSN
     Let n1(i) denote the number of 10-acre RSN1 sites to be selected in
State i.  Recall that n1(i) is chosen so that 0.025% of the cropland
acreage in State i is sampled.




     The procedure for selecting n1(i) starting points for the n1(i)
RSN1 sample sites was:

     1)   Select a random number from the interval (0, w1(i)), where
          w1(i) = N1(i)/n1(i).  Call this random number q1(i).

     2)   Select as an RSN1 starting point the first CNI cropland SSU
          whose contribution to the cropland accumulation causes the
          accumulation to equal or exceed q1(i).

     3)   Repeat step (2) with q1(i) replaced by q1(i) + w1(i), q1(i) +
          2 w1(i), . . ., q1(i) + [n1(i) - 1] w1(i).








It should be noted that an RSN1 starting point did not uniquely determine
an RSN1 sample site.  In particular, an adjacent cropland SSU had to be
found, and the RSN1 sample site was centered about these two cropland
SSU's from the CNI.  Moreover, substitution procedures were employed when
an adjacent cropland SSU did not exist.  In addition, a substitute RSN1
site was selected if the selected site either could not be surveyed for
some reason or could no longer be considered a cropland site.




     Let us consider an alternate, and perhaps more useful, representation
of identically the same procedure for selecting the n1(i) starting
points for the RSN1 sample.  The contribution of each cropland SSU in
PSU (i,j,k,ℓ) to the cropland accumulation is

\[
\frac{1}{v(i,j,k,\ell)} \cdot \frac{0.02}{p(i,j,k)} .
\tag{12}
\]

-------
Let λ = 1, . . ., Λ(i) denote the SSU's selected into the CNI sample in

State i.  The cropland accumulation may then be represented as

\[
N_1(i) = \sum_{\lambda=1}^{\Lambda(i)} \pi(\lambda) ,
\tag{13}
\]

where π(λ) is given by (12) for cropland SSU's and is zero for noncropland

SSU's.  Thus, the cropland accumulation for State i may be thought

of as partitioned into Λ(i) zones, where each zone has width π(λ).

PSU's that are entirely noncropland will contribute a null zone with

zero width to the cropland accumulation.  The RSN1 procedure for selecting

a CNI SSU as a starting point for a 10-acre RSN1 site may then be

illustrated as:

               q1(i)                        q1(i) + γ w1(i)

| π(1) | π(2) | π(3) | π(4) | π(5) | π(6) | π(7) | . . . | π(λ-1) | π(λ) | π(λ+1) | . . .
0

A cropland SSU is selected as a starting point for locating an RSN1

site, if the sequence number

               q1(i) + γ w1(i)  for γ = 0, 1, . . ., n1(i) - 1      (14)

hits the zone representing the SSU.
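
The zone representation of the selection procedure can be sketched as a systematic sample with a random start. The sketch below is illustrative only: the function name `select_starting_points`, the optional `q1` argument (added so the example can be repeated with a fixed start), and the zone widths are assumptions, not part of the USDA procedure.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def select_starting_points(zone_widths, n1, q1=None):
    """Return the indices of the SSU's hit by q1 + g*w1 for g = 0, ..., n1 - 1.

    zone_widths : zone width of each CNI SSU in list order; zero for
                  noncropland SSU's, the value of (12) for cropland SSU's
    n1          : number of starting points to select
    q1          : random start; drawn from (0, w1) when not supplied
    """
    total = sum(zone_widths)               # cropland accumulation N1(i)
    w1 = total / n1                        # sampling interval w1(i)
    if q1 is None:
        q1 = random.uniform(0.0, w1)
    upper = list(accumulate(zone_widths))  # running accumulation after each SSU
    hits = []
    for g in range(n1):
        target = q1 + g * w1
        # First SSU whose contribution makes the accumulation equal or exceed target.
        idx = bisect_left(upper, target)
        hits.append(min(idx, len(upper) - 1))  # guard against rounding at the top end
    return hits


# Hypothetical widths; the second SSU is noncropland and has a null zone.
zones = [1.0, 0.0, 2.0, 1.0]
picked = select_starting_points(zones, n1=2, q1=0.5)
```

Here the interval is w1 = 2, so the sequence numbers are 0.5 and 2.5, which fall in the zones of the first and third SSU's. The null zone can never be hit because the running accumulation does not increase across it.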

2.4  Conditional probabilities for SSU's in the RSN1 given the CNI

     The SSU(i,j,k,ℓ,m1) is selected as an RSN1 starting point only if

the single random number q1(i) results in a sequence number given by

(14) that hits the zone representing SSU(i,j,k,ℓ,m1).  The chance of

multiple hits on this zone is almost identically zero since the width of

the zone representing a cropland SSU, given by (12), is very much

smaller than the distance w1(i) = N1(i)/n1(i) between cropland sequence

numbers.2/  Thus,
2/ Multiple hits within the same PSU have occurred  in  the RSN  sample
occasionally, however, due to the inadvertent  repetition of  some  PSU
records in the State lists used to select the  RSN subsamples.


-------
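\[
\begin{aligned}
&\mathrm{Prob}(\text{SSU}(i,j,k,\ell,m_1) \text{ will be selected as a starting point for an RSN}_1 \text{ site} \\
&\qquad \mid \text{SSU}(i,j,k,\ell,m_1) \text{ is in the CNI sample}) \\
&\quad= \frac{\text{Size of the zone representing SSU}(i,j,k,\ell,m_1)}{w_1(i)} \\[1ex]
&\quad= \frac{1}{v(i,j,k,\ell)} \cdot \frac{0.02}{p(i,j,k)} \cdot \frac{n_1(i)}{N_1(i)}
\end{aligned}
\tag{15}
\]

from (12) and the fact that q1(i) is a random number from the interval

(0, w1(i)).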

2.5  RSN sampling weights

     It will be recalled that the selection of a starting point for
locating a 10-acre RSN1 site did not uniquely specify the site.  There
was a procedure for determining the sampling site based on any cropland
starting point, however, as long as an appropriate site could be found
within the PSU containing the starting point.  To the extent that this
procedure was strictly applied, most RSN1 sites were uniquely determined.
However, it is apparent from considering several maps of RSN1 sites that
the specified procedure was only adhered to loosely.

     It should also be noted that, under the procedure for determining an
RSN1 site from a cropland starting point, the starting point was not
included in the resulting sample site if it was an isolated cropland
point.  Thus, there was an intentional bias away from isolated cropland
SSU's in the RSN1.

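     If the non-uniqueness of the RSN1 site determined by selection of a
starting point, and the bias away from isolated cropland SSU's, are
ignored, we obtain from (5) and (15),

\[
\begin{aligned}
&\mathrm{Prob}(\text{the RSN}_1 \text{ site resulting from selection of SSU}(i,j,k,\ell,m_1) \text{ as a} \\
&\qquad \text{starting point will be selected into the RSN}_1 \text{ sample}) \\
&\quad= \mathrm{Prob}(\text{SSU}(i,j,k,\ell,m_1) \text{ will be selected as a starting point for} \\
&\qquad\quad \text{an RSN}_1 \text{ site} \mid \text{SSU}(i,j,k,\ell,m_1) \text{ is in the CNI sample}) \\
&\qquad\times \mathrm{Prob}(\text{SSU}(i,j,k,\ell,m_1) \text{ is selected into the CNI sample}) \\[1ex]
&\quad= \frac{1}{v(i,j,k,\ell)} \cdot \frac{0.02}{p(i,j,k)} \cdot \frac{n_1(i)}{N_1(i)}
   \times \begin{cases}
   0.25\,a\,c\,p(i,j,k) & \text{for 640-acre PSU's} \\
   a\,c\,p(i,j,k) & \text{for all other PSU's}
   \end{cases} \\[1ex]
&\quad= \begin{cases}
   \dfrac{0.005\,a\,c\,n_1(i)}{v(i,j,k,\ell)\,N_1(i)} & \text{for 640-acre PSU's} \\[1.5ex]
   \dfrac{0.02\,a\,c\,n_1(i)}{v(i,j,k,\ell)\,N_1(i)} & \text{for all other PSU's}
   \end{cases}
\end{aligned}
\tag{16}
\]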
Thus, if we are willing to accept the simplifying assumptions at the

beginning of this section, a sampling weight for estimation of means for

the RSN1 site resulting from selection of SSU(i,j,k,ℓ,m1) as a starting

point is

\[
W(i,j,k,\ell,m_1) =
\begin{cases}
\dfrac{4\,v(i,j,k,\ell)\,N_1(i)}{n_1(i)} & \text{for 640-acre PSU's} \\[1.5ex]
\dfrac{v(i,j,k,\ell)\,N_1(i)}{n_1(i)} & \text{for all other PSU's.}
\end{cases}
\tag{17}
\]


It should be noted that the weight given by (17) will be approximately

the same for all RSN1 sites within those States where only one PSU size

was used.  This is because v(i,j,k,ℓ) will be very nearly the same for

all PSU's within such a State.

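     Of course, a sampling weight for estimation of means for the RSN2

site resulting from selection of SSU(i,j,k,ℓ,m2) as a starting point is

\[
W(i,j,k,\ell,m_2) =
\begin{cases}
\dfrac{4\,v(i,j,k,\ell)\,N_2(i)}{n_2(i)} & \text{for 640-acre PSU's} \\[1.5ex]
\dfrac{v(i,j,k,\ell)\,N_2(i)}{n_2(i)} & \text{for all other PSU's.}
\end{cases}
\tag{18}
\]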
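
A minimal sketch of the weights (17) and (18) follows. The function name `rsn_weight` and the input values are hypothetical; the same function serves the cropland and noncropland samples because the two formulas differ only in which accumulation and sample size are supplied.

```python
def rsn_weight(v, accumulation, n_sites, psu_acres):
    """Relative RSN weight per (17)/(18).

    v            : number of CNI sampling points in the PSU, v(i,j,k,l)
    accumulation : N1(i) for the cropland sample, N2(i) for the noncropland sample
    n_sites      : n1(i) or n2(i), matching the accumulation
    psu_acres    : size of the PSU in acres
    """
    factor = 4.0 if psu_acres == 640 else 1.0
    return factor * v * accumulation / n_sites


# Hypothetical inputs: 36 points per PSU, accumulation 500, 100 sample sites.
w_160 = rsn_weight(v=36, accumulation=500.0, n_sites=100, psu_acres=160)
w_640 = rsn_weight(v=36, accumulation=500.0, n_sites=100, psu_acres=640)
```

With equal v, a site in a 640-acre PSU receives four times the weight of a site in any other PSU, reflecting the factor 0.25 in (5).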

-------
3.   Comments

     A strict accounting of the bias away from isolated cropland SSU's
in the RSN1 would be quite difficult.  It would be necessary to determine,
for each RSN1 site, the number of SSU's that would have resulted
in selection of the site if that SSU had been chosen as a starting
point.  This number of SSU's chosen as starting points that would have
resulted in selection of the RSN1 site could theoretically be any positive
integer.  A value of one would hopefully be predominant, giving
exact agreement with (17) and (18).  However, two and three would surely
occur also.

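     Consider the conditional probabilities for the SSU's in the RSN1,
given the CNI, as considered in Section 2.4.  In particular, the sum
over all SSU's sampled by the CNI of the probability that SSU(i,j,k,ℓ,m1)
will be selected as a starting point for an RSN1 site is as follows from
(15):

\[
\begin{aligned}
&\sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \sum_{\ell=1}^{u(i,j,k)} \sum_{m_1=1}^{v_1(i,j,k,\ell)\,*}
  \frac{1}{v(i,j,k,\ell)} \cdot \frac{0.02}{p(i,j,k)} \cdot \frac{n_1(i)}{N_1(i)} \\
&\quad= \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \sum_{\ell=1}^{u(i,j,k)}
  \frac{n_1(i)}{N_1(i)} \cdot \frac{0.02}{p(i,j,k)} \cdot \frac{v_1(i,j,k,\ell)}{v(i,j,k,\ell)} \\
&\quad= \frac{0.02\,n_1(i)}{N_1(i)} \sum_{j=1}^{s(i)} \sum_{k=1}^{t(i,j)} \frac{1}{p(i,j,k)}
  \sum_{\ell=1}^{u(i,j,k)} \frac{v_1(i,j,k,\ell)}{v(i,j,k,\ell)} \\
&\quad= n_1(i)
\end{aligned}
\]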

from (9).  Thus, the sum of the SSU probabilities for the RSN1, conditional

on the CNI sample being regarded as fixed, is the RSN1 sample

size for State i, namely n1(i).  This result lends additional credence

to the correctness of the sampling weights as described by (17).
 * Summation is over the sample cropland points of the CNI sample because
these points constitute the population with regard to the conditional
RSN probabilities.
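
The identity above can be checked numerically. The following sketch uses made-up PSU records and a hypothetical helper `conditional_prob` implementing (15); it is an illustration of the algebra, not a reproduction of any State's data.

```python
def conditional_prob(v, p, n1, N1):
    """Conditional probability (15) for one cropland SSU."""
    return (1.0 / v) * (0.02 / p) * (n1 / N1)


# Hypothetical sampled PSU's for one State: (p(i,j,k), v1, v).
psus = [(0.02, 18, 36), (0.01, 9, 36), (0.02, 0, 36), (0.04, 30, 40)]

# Cropland accumulation (9) for these PSU's.
N1 = sum(0.02 / p * v1 / v for p, v1, v in psus)
n1 = 3  # hypothetical RSN1 sample size

# Each PSU holds v1 cropland SSU's, all with the same conditional probability.
total = sum(v1 * conditional_prob(v, p, n1, N1) for p, v1, v in psus)
```

The total equals n1 up to rounding, whatever values are chosen for the PSU records, because the inner sums reproduce N1 exactly.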

-------
4.   Approximation to the RSN Sampling Weights




     Exact implementation of the sampling weights given by (17) and (18)
is not a simple task.  The sample sizes n1(i) and n2(i) for the cropland
and noncropland samples are readily available (see Table 1.3).  However,
the State accumulations of the adjusted cropland and noncropland ratios,
N1(i) and N2(i), are only available from the hard-copy computer records
of the RSN sample selection.  These records are not entirely reliable,
since there is no guarantee that the copy available was the final copy
from which the sample was selected.  Dummy records were added to obtain
coverage of federal croplands for the noncropland sample, and the data
set was otherwise edited before sample selection.  The number of sampling
points, v(i,j,k,ℓ), is again available from the hard-copy computer
records.  However, it would be a monumental task to go through the
hard-copy computer records to obtain v(i,j,k,ℓ) for each RSN sampling
site.  A perusal of these sampling records reveals that individual CNI
sites were sometimes entered more than once, doubling the probability
that these sites would enter the RSN sample.




     If the sampling design had been implemented exactly as described in




the  text  for a particular  State  and all PSU's were  160 acres,  partial




PSU's  would still  occur around  the  boundaries  of  counties  and  other




large  scale geographic  strata, e.g.,  irrigated areas.   These  partial




PSU's would  be "nominal"  160-acre  PSU's,  but would  receive fewer than




the usual number of sampling points.




     The full 160-acre PSU's each receive approximately 36 sampling
points.  The random variation in v(i,j,k,ℓ) may be small for the full
PSU's, and a good approximation to the sampling weights given by (17)
and (18) is achieved by using the mean number of points assigned in
place of v(i,j,k,ℓ).

-------
     Identification of the "nominal" 160-acre PSU's is not a simple
task.  It requires close examination of the CNI sampling maps, at least.
It should be noted also that actual PSU's were, in practice, sometimes
larger than their "nominal" size.  These larger PSU's occurred mostly in
States that used 40-acre PSU's in "irrigated" strata, where the "nominal"
40-acre PSU's were sometimes larger than 40 acres around the stratum
boundaries.  Due to the problem of identifying PSU's considerably larger
or smaller than their "nominal" size, no adjustment in the sampling
weights (17) and (18) is being proposed for RSN sites occurring in these
PSU's.



     The sampling weights given by (17) and (18) are only appropriate if



the sampling design is implemented as described in the text.  Examination



of the  numbers of  points  assigned to CNI  sites  reveals,  however,  that



this was not  the  case.  The assignment of  sampling points within PSU's



was done at local USDA offices, and the design sampling protocol was not




consistently followed.  For example, nearly all sites in Nevada received



approximately 36  sampling points,  whether the PSU  size was 40  acres or



160 acres.   Moreover, it appears that the sampling template may not have



been  spun  for  Nevada sites  since most  received  exactly 36  sampling



points.  Also, the scales of the sampling template and the aerial photo-



graph were often not properly matched,  resulting in consistently more or



fewer  sampling points than  expected  from the  design protocol.   For




example, many 160-acre  PSU's  in  New  Mexico received  approximately 18



sampling points, rather than 36 sampling points.  Thus, a single sampling



protocol was  not  consistently applied  throughout the United States.  It



is probably not possible to determine exactly what protocol was used for



each sampling  site.   The consequences  of  these  variations in  sampling



protocol will presently be investigated.





-------
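     The investigation of the effects of variations in the CNI sampling
procedure upon the RSN sampling weights will be aided by considering the
sampling weight given by (17) as a product.  In particular, the sampling
weight (17) may be written as

\[
\begin{aligned}
W(i,j,k,\ell,m_1)
&= \begin{cases}
\dfrac{v(i,j,k,\ell)\,p(i,j,k)\,N_1(i)}{n_1(i)} \cdot \dfrac{4}{p(i,j,k)} & \text{for 640-acre PSU's} \\[1.5ex]
\dfrac{v(i,j,k,\ell)\,p(i,j,k)\,N_1(i)}{n_1(i)} \cdot \dfrac{1}{p(i,j,k)} & \text{for all other PSU's}
\end{cases} \\[1ex]
&= \begin{cases}
\dfrac{v(i,j,k,\ell)\,N_1(i)}{n_1(i)} \cdot 4 & \text{for 640-acre PSU's} \\[1.5ex]
\dfrac{v(i,j,k,\ell)\,N_1(i)}{n_1(i)} \cdot 1 & \text{for all other PSU's}
\end{cases}
\end{aligned}
\tag{19}
\]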

where  the first  factor is  the conditional RSN  weight and  the  second
factor is  the  CNI  weighting factor.  The CNI  weighting factor for 640-
acre  PSU's is  four  times  that  for all other PSU's because  each such
point  represents  four  times  as much  land  area  as  points in  PSU's  of
other sizes.
     A specific case may  help to clarify the effects  of variations  in
the CNI  sampling procedure upon the RSN sampling weights.   Once  again,
consider the case  of New Mexico where many  160-acre PSU's were sampled
at the design rate of about 36 sampling points per PSU,  while many other
160-acre PSU's  were sampled  at the lower  rate   of  about  18  points per
PSU.  For  those PSU's  sampled at the proper  intensity,  the appropriate
sampling weight is approximately
\[
\frac{36\,N_1(i)}{n_1(i)} .
\]
The sampling weight formula given by (19) would have to be modified for
the PSU's receiving only about 18 sampling points, since each point
represents twice as much land area.  The CNI factor of the sampling
weight is doubled, resulting in an approximate RSN sampling weight of

\[
\frac{18\,N_1(i)}{n_1(i)} \cdot 2 = \frac{36\,N_1(i)}{n_1(i)} ,
\]
which is exactly the same as the first case.  When half the usual number
of sampling points was assigned, the conditional RSN weight was halved.
However, each sampling point then represented twice as much land area,
doubling the CNI weighting factor.  In terms of probability, the conditional
RSN probability was doubled for each point, since it contributed
twice as much to the accumulation N1(i), but the unconditional CNI
probability was halved, since half as many sampling points were being
assigned within the PSU.  Thus, the procedural variations in the CNI
sampling protocol result in no change in the appropriate mean weight for
the RSN.  A single mean sampling weight is then appropriate for all
160-acre PSU's.  This sampling weight is

\[
\frac{v_{160}\,N_1(i)}{n_1(i)} ,
\]

where v160 is the average number of sampling points per PSU when the

design sampling procedure described in the text is applied.
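
The New Mexico argument can be written out as a small calculation. In the sketch below the function name `factored_weight` is hypothetical, the ratio N1(i)/n1(i) is a made-up value, and the CNI factor is taken to be the design number of points divided by the number actually assigned; the text states this only for the case of 18 points against a design value of 36, so the general form is an assumption.

```python
def factored_weight(v_actual, v_design, ratio):
    """RSN weight as the product of its two factors, in the spirit of (19).

    v_actual : sampling points actually assigned to the PSU
    v_design : sampling points called for by the design protocol
    ratio    : N1(i)/n1(i) for the State
    """
    conditional_rsn = v_actual * ratio   # conditional RSN weight
    cni_factor = v_design / v_actual     # each point stands for more land when fewer are assigned
    return conditional_rsn * cni_factor


ratio = 5.0  # hypothetical N1(i)/n1(i)
w_full = factored_weight(v_actual=36, v_design=36, ratio=ratio)
w_half = factored_weight(v_actual=18, v_design=36, ratio=ratio)
```

Both calls return 36 times the ratio: halving the number of points halves the conditional factor and doubles the CNI factor, so the product is unchanged.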

     This weight  fails  to  reflect the random variation in the number of

sampling points, v, assigned to a PSU within any given sampling protocol.

However, it  is  a  proper mean sampling weight regardless of the sampling

protocol.  The alternative  is  not feasible, requiring precise knowledge

of the  sampling procedure  used to assign the sampling points as well as

the  number  of  points assigned  for  each PSU  containing an  RSN  sample

site.  Since this  detailed information is not available, the mean sampl-

ing weights appear to be most appropriate.

     The following mean weights are suggested for the RSN cropland

sample:

                    40-acre PSU's  :       v40  N1(i)/n1(i)
                   100-acre PSU's  :       v100 N1(i)/n1(i)
                   160-acre PSU's  :       v160 N1(i)/n1(i)
                   640-acre PSU's  :     4 v640 N1(i)/n1(i)

where vA is the mean number of sampling points assigned to PSU's of area
A under the sampling protocol specified by the design.  It should be
noted, however, that this protocol results in vA being directly proportional
to the size, A, of the PSU, except for 640-acre PSU's where v640
is identical to v160.  Thus, the above mean RSN sampling weights may be
expressed as follows:






           40-acre PSU's       :      4 v10 N1(i)/n1(i)
          100-acre PSU's       :     10 v10 N1(i)/n1(i)
          160-acre PSU's       :     16 v10 N1(i)/n1(i)
          640-acre PSU's       :     64 v10 N1(i)/n1(i)

where v10 is the mean number of sampling points per 10 acres assigned to




all  but 640-acre  PSU's under  the sampling  protocol specified  by the




design.




     Since only relative sampling weights are required for estimation of




means,  the  constant factor,  v10,  in  the above sampling  weights  may be
cancelled.  Moreover,  the cropland  and noncropland  samples  of the RSN

can  be  regarded as two strata  in  the RSN sample of  the  rural  areas of

the  conterminous United  States.  As seen before, the derivation of the

noncropland sampling  weights parallels  that for the cropland sampling

weights in all respects.  The ratio N1(i)/n1(i) for the cropland sampling

weights in state i is replaced by N2(i)/n2(i) for the noncropland sample.

Otherwise, the  conditional RSN factor of the  sampling weights and the

unconditional CNI  factor  remain unchanged.   Thus, the final recommended

sampling  weights  for the  RSN  are  as given  in  Table  D-l.   The cropland

sampling  rate was  0.025  percent  of  the  cropland  acreage within each

state,  and the  noncropland sampling rate  was 0.0025  percent of the

noncropland acreage,  which is  reflected by N2(i)/n2(i)  being approxi-

mately 10 times as large as N1(i)/n1(i).

              Table D-l:  Recommended RSN Sampling Weights

     PSU Size                 Cropland                 Noncropland
      40 acres                 4 N1(i)/n1(i)            4 N2(i)/n2(i)
     100 acres                10 N1(i)/n1(i)           10 N2(i)/n2(i)
     160 acres                16 N1(i)/n1(i)           16 N2(i)/n2(i)
     640 acres                64 N1(i)/n1(i)           64 N2(i)/n2(i)
Notation:      n1(i) = Number of cropland sample sites in state i

               n2(i) = Number of noncropland sample sites in state i

               N1(i) = Total "cropland accumulation" for state i

               N2(i) = Total "noncropland accumulation" for state i
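
The multipliers in Table D-l can be applied with a simple lookup, sketched below. The names `MULTIPLIER`, `recommended_weight`, and `multiplier_from_design` are hypothetical, as are the input values; the second function only retraces the reasoning above (points proportional to PSU area, v640 equal to v160, and a CNI factor of 4 for 640-acre PSU's) with the common factor v10 cancelled.

```python
# Multipliers from Table D-l, keyed by PSU size in acres.
MULTIPLIER = {40: 4, 100: 10, 160: 16, 640: 64}


def recommended_weight(psu_acres, accumulation, n_sites):
    """Table D-l weight: multiplier times N(i)/n(i) for the relevant sample."""
    return MULTIPLIER[psu_acres] * accumulation / n_sites


def multiplier_from_design(psu_acres):
    """Retrace the multiplier from the design protocol, with v10 cancelled."""
    # Points per PSU in units of v10; 640-acre PSU's get the 160-acre number.
    points = 16 if psu_acres == 640 else psu_acres / 10
    cni_factor = 4 if psu_acres == 640 else 1
    return cni_factor * points


# Hypothetical State with accumulation 500 and 100 sample sites.
w = recommended_weight(160, accumulation=500.0, n_sites=100)
```

For the noncropland sample the same lookup applies with N2(i) and n2(i) supplied in place of N1(i) and n1(i).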

     It  should  be emphasized  that the sampling weights  shown in Table

D-l  reflect  only the  mean differences in the portion  of the selection

probabilities of the RSN sites that depend upon the size of the PSU.  It

has been argued that the selection probabilities would be fairly constant

for  a  given  size  of PSU, since the  total  number of CNI sampling points

would  be  fairly constant.   Undocumented variations in  the  CNI sampling

-------
protocol make it virtually impossible to quantify the smaller variations

in  selection probabilities  for RSN  sample sites  within the  group  of

PSU's of a given size.  There are many other factors that may be reflect-

ed in sampling weights, but are presently ignored.  Some of these factors

are:

      1)  Duplicate entry of  some  CNI sites in  the  list from which the
          RSN sample  was  selected,  doubling the chance of selection for
          all potential RSN sites within such PSU's.

      2)  CNI sites missed when the RSN sample was selected.

      3)  Inclusion of  some  CNI sites that fell outside the partial PSU
          being sampled.

      4)  Loss of some CNI site maps.

      5)  PSU's substantially over or under their "nominal" size.

      6)  Border effects, or PSU size effects, on the number of sampling
          points assigned within a PSU.

      7)  Random variation  in  the  total  number  of  CNI  sampling points
          assigned to a PSU.

      8)  Failure  to  accurately locate  the  selected  RSN   sites,  CNI
          sites, and/or CNI sampling points in the field.

      9)  Uncertainty associated with the  values found for the cropland
          and the noncropland accumulations for each state.

     10)  The existence  of  multiple CNI sampling points  that would all
          lead to selection of the same RSN site.

     11)  The use of substitute RSN sites.

5.   Implementation of the RSN Sampling Weights

     Implementation  of the  approximate RSN  sampling weights  shown  in

Table D-l required  that information be gathered  that  was not available

on the  data  records.   The size of the PSU from the 1967 CNI  sample into

which each RSN site fell was obtained from the Statistical Laboratory at

Iowa State University.   These findings are shown in  Tables  D-2 through

D-5.  The State accumulations of the  adjusted  cropland  and  noncropland

-------
              Table D-2:  States With Only 160-Acre PSU's

     State Code     State Name          State Code       State Name
         01         Alabama                 28           Mississippi
         06         California              29           Missouri
         08         Colorado                30           Montana
         12         Florida                 37           North Carolina
         13         Georgia                 38           North Dakota
         16         Idaho                   39           Ohio
         17         Illinois                40           Oklahoma
         18         Indiana                 41           Oregon
         19         Iowa                    45           South Carolina
         20         Kansas                  47           Tennessee
         21         Kentucky                48           Texas
         22         Louisiana               53           Washington
         26         Michigan                55           Wisconsin
         27         Minnesota
   Source:  Statistical Laboratory, Iowa State University



              Table D-3:  States With Only 100-Acre PSU's

     State Code     State Name          State Code       State Name
         09         Connecticut             36           New York
         10         Delaware                42           Pennsylvania
         24         Maryland                44           Rhode Island
         25         Massachusetts           50           Vermont
         33         New Hampshire           51           Virginia
         34         New Jersey              54           West Virginia
   Source:  Statistical Laboratory, Iowa State University
       Table D-4:  States With Constant PSU Size Within Counties

State Code   State Name     PSU Size     County Codes
    05       Arkansas       160 acres    9-15,23,39,49,53,59,61,89,
                                         99,103,109,129,133-137
                             40 acres    All others

    46       South Dakota   640 acres    7,19,31,33,41,47,55,63,71,
                                         75,81,85,93,95,103,105,113,
                                         117,121,131,137
                            160 acres    All others

   Source:  Statistical Laboratory, Iowa State University

-------
        Table D-5:  States With Varying PSU Size Within Counties

State Code   State Name    PSU Size     RSN Site Numbers a/
    23       Maine         400 acres    7,27,29,32-34,36-39,48,61,67
                           100 acres    All others
    04       Arizona       160 acres    1,3,10-55,107,108,160,163
                            40 acres    All others
    31       Nebraska      640 acres    67,68,70,180,194,195,243,
                                        246-248,321-324,434,448,449,
                                        and all sites in counties:
                                        5,9,31,41,45,63,73,75,91,
                                        117,161,165,171
                           160 acres    All others
    32       Nevada        160 acres    96,143
                            40 acres    All others
    35       New Mexico    640 acres    65,67,179
                            40 acres    1,4,59,62,64,117,120,123,125,
                                        177,183,184
                           160 acres    All others
    49       Utah          640 acres    12,51,91,97
                           160 acres    2,3,9,45,46,48,52,54,56,90,
                                        92,95,134,135,139,141
                            40 acres    All others
    56       Wyoming       640 acres    4-6,9,15,17-20,22-25,30-33,
                                        35-43,46,48-54,57,60,61,66,
                                        71,112,165,167,168,174,176
                           160 acres    All others

a/ Only sites for which data were collected were classified.  Completion
of this table for all RSN sample sites in these states would be very
time consuming.

   Source:  CNI site numbers corresponding to the RSN site numbers were
obtained from the EPA Field Studies Branch, Washington, D.C.  The PSU
size for each of these CNI sites was obtained from the Statistical
Laboratory at Iowa State University.

-------
ratios,  N1(i)  and  N2(i),  were  obtained from  the hard-copy  computer


records of  the  RSN sample selection.  The information obtained is shown


in Table  D-6.   The information in Tables D-2  through  D-6 was then used


for  the  sampling  weight  computations shown in  Table  D-l  for  each RSN


sample record, with the exceptions noted below.


     As shown in Table D-5,  the RSN  sample  sites in the State of Maine


fell  in  PSU's of  two sizes--100 acres and  400  acres.   Actually, Maine


had  a  few 200-acre PSU's, but none of these were in the RSN sample.  It


appears,  however,  from the RSN sampling documents  preserved  by the EPA


that each 400-acre PSU was treated as  four  100-acre PSU's when the RSN


sample  was selected.   The  effect  of  this  treatment  of  200-acre  and


400-acre  PSU's  in  Maine can be seen by considering the factored form of


the  RSN  sampling  weight  given  by (19).  It appears from the number of


CNI  sampling  points  assigned  to the 200-acre  and 400-acre  PSU's  that


they were sampled at the same rate as all other PSU's, except for the


640-acre  PSU's.    Thus,  the  unconditional  CNI  factor  in  the  sampling


weight  (19) is  one.   The conditional  RSN  factor  is  the same,  on the


average,  as that for 100-acre  PSU's, since the total points, v, for the


100-acre portion of the 400-acre PSU is the same, on the average, as that


for 100-acre PSU's.  Thus, the mean sampling weight was computed for all


sites in Maine as shown for 100-acre PSU's in Table D-l.


     The  State  accumulations of cropland and  noncropland ratios, N^i)


and  N2(i),  shown  in  Table  D-6 were  checked  for  logical  consistency.


This check  was  felt to be necessary  since these  values were based upon

                                                          3 /
hard-copy  computer output from the RSN sample selection.-   This hard-
3/
-    Only a hand written copy could be found for Maine.
                                 D-21

-------
           Table D-6:   State Accumulations"  of Cropland  Ratios,  Nx(i),  and
Noncropland Ratios, N2(i),  Together with Computed  Sample Sizes  and  Total  Land Area

State Code
01
04
05
06
08
09
10
12
13
16
17
18
19
20
21
22
23
24
25
26
27
28
29

State Name
Alabama
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri

Ni(i)
467.42123
179.53666
3656.18536
1284.14536
1256.23608
58.31892
120.95724
361.80749
587.92669
655.14071
3050.59595
1616.49756
3160.09155
3426.67927
718.47949
546.63631
228.28275
423.85901
56.72992
1158.50806
2557.73877
627.86577
1702.23242

ni(i)
92
36
216*/
268
240
8
12
72
120
132
568
312
608
684
124
108
32
52
8
220
488
124
328

N2(i)
3619.41845
9063.14655
10472.11792
10698.46741
7390.47656
601.65381
154.04031
3980.34901
4066.30930
5990.26966
1747.62573
1398.49561
1515.03809
3131.41689
2969.72778
3060.22173
3735.32919.
868.16471
949.29687
3655.05437
4150.38281
3151.17520
4035.95068

Mi)
72
176
64
224
140
8
4
80
80
120
32
28
28
64
52
60
48
12
12
68
80
64
76
(1000's of acres)
Total Land Area
32,597
72,680
33,468
100,076
66,486
3,127
1,266
34,721
37,263
52,933
35,766
23,132
35,839
52,425
25,511
28,596
19,848
6,319
5,033
36,515
51,201
30,250
44,235

-------
              Table D-6:  State Accumulations of Cropland Ratios, N1(i), and
    Noncropland Ratios, N2(i), Together with Computed Sample Sizes and Total Land Area

State                                                                    Total Land Area
Code   State Name            N1(i)     n1(i)          N2(i)     n2(i)   (1000's of acres)

 30    Montana          1819.92407     340      10579.96484     200           93,098
 31    Nebraska           38.16203      40 1/    1145.83691     120 1/        49,021
 32    Nevada             66.32464      12       8475.24410     176           70,264
 33    New Hampshire      57.27583       8       1047.69238      12            5,769
 34    New Jersey        157.60745      20        811.71170      12            4,810
 35    New Mexico        204.96760      40       9663.70741     192           77,688
 36    New York         1212.80716     152       4933.72984      60           30,670
 37    North Carolina    694.07227     124       3684.40039      68           31,331
 38    North Dakota     3396.55322     636       2544.91626      48           44,442
 39    Ohio             1390.60275     276       1889.30123      36           26,206
 40    Oklahoma         1310.92419     260       4200.03636      84           43,819
 41    Oregon            774.27002     152       7003.43794     140           61,587
 42    Pennsylvania      806.77905     152 1/    3020.83643      56           28,804
 44    Rhode Island        7.95469       4        128.59467       4              676
 45    South Carolina    378.91528      68       2251.41602      40           19,338
 46    South Dakota     2268.04346     420 1/    4287.50000      80           48,612
 47    Tennessee         617.46338     112       3029.04150      56           26,444
 48    Texas            3732.70468     744      17409.54254     344          168,001
 49    Utah              250.68592      48       6318.30648     128           52,722
 50    Vermont           142.21658      20       1017.74146      12            5,937
 51    Virginia          742.42676      84       4225.02734      56           25,458
 53    Washington        879.95784     180       4461.17404      88 1/        42,616
 54    West Virginia     200.94192      24       2878.29848      36           15,402
 55    Wisconsin        1425.55981     272       3141.84985      60           35,013
 56    Wyoming           365.49951      68       7940.41016     148           62,306

1/  These values differ from the actual sample sizes in Table 1.3.

*
    Source:  Hard copy computer records of the RSN sampling maintained by the EPA  Field Studies Branch,
Washington, D.C.


    Source:  Basic Statistics—National Inventory of Soil and Water Conservation Needs, 1967.

-------
copy record was believed to be the computer record of the final sample
selection for each State, but was not verified.  The check made was to
compute, as described in Section 1.2.2, the cropland and noncropland
sample sizes, n1(i) and n2(i), from the accumulations, N1(i) and N2(i),
and the total land area of each State as shown in Table D-6.  The
computed sample sizes, n1(i) and n2(i), differ from the actual sample
sizes shown in Table 1.3 by no more than four for all States except
Arkansas and Nebraska.  Small differences between the computed and
actual sample sizes can be explained by the fact that the total land
area used to compute the RSN sample sizes for each State did not agree
exactly with the figures shown in Table D-6. 4/  The relatively large
discrepancies for Arkansas and Nebraska were interpreted as meaning that
the accumulations N1(i) and N2(i) shown in Table D-6 for these States
are incorrect.  Thus, the ratios N1(i)/n1(i) and N2(i)/n2(i) used to
compute the sampling weights for these two States, from the formulas
shown in Table D-1, were the averages of N1(i)/n1(i) and N2(i)/n2(i)
for all other States except Rhode Island.  The Rhode Island data were
also excluded from this average because the very small size of Rhode
Island resulted in its cropland sample size being rounded up to 4 even
though its computed value was approximately one, which deflated the
value of N1(i)/n1(i) for Rhode Island.

4/   This is evident from hand computations of the RSN sample sizes
preserved by the EPA for some States.  The source of the land areas
actually used is not known.
                                 D-24
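
The substitution of average ratios for Arkansas and Nebraska can be
illustrated with a short sketch.  It uses five rows copied from Table
D-6 and assumes that "average" means a simple unweighted mean of the
per-State ratios; the report does not state the form of the average, and
the names rows, EXCLUDED, and mean_ratios are hypothetical.

```python
# Rows copied from Table D-6: (state, N1(i), n1(i), N2(i), n2(i)).
rows = [
    ("Montana",      1819.92407, 340, 10579.96484, 200),
    ("Nebraska",       38.16203,  40,  1145.83691, 120),
    ("Nevada",         66.32464,  12,  8475.24410, 176),
    ("Rhode Island",    7.95469,   4,   128.59467,   4),
    ("Wyoming",       365.49951,  68,  7940.41016, 148),
]

# States left out of the average, as described in the text.
EXCLUDED = {"Arkansas", "Nebraska", "Rhode Island"}


def mean_ratios(rows, excluded=EXCLUDED):
    """Return the mean of N1(i)/n1(i) and of N2(i)/n2(i) over retained States.

    Assumption: an unweighted mean of per-State ratios.
    """
    kept = [r for r in rows if r[0] not in excluded]
    r1 = sum(N1 / n1 for _, N1, n1, _, _ in kept) / len(kept)
    r2 = sum(N2 / n2 for _, _, _, N2, n2 in kept) / len(kept)
    return r1, r2


r1_bar, r2_bar = mean_ratios(rows)

# Rhode Island's own cropland ratio, deflated by rounding n1(i) up to 4.
ri_ratio = 7.95469 / 4
```

With these rows the Rhode Island cropland ratio is well below the
average of the retained States, which is the deflation the text
describes.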

-------
             APPENDIX E



Construction of an Analysis Data File
                 E-1

-------
                  Construction of an Analysis Data File



     The EPA computer records for the Rural Soils Network (RSN) are
structured for simple entry of the data from laboratory analyses.  For
example, a laboratory test that results in less than detectable levels
of a category of compounds produces only a single entry in the data
file.  To analyze the data for more than one compound simultaneously, it
is useful to restructure the data file so that it contains a distinct
variable representing the amount detected for each of the compounds to
be analyzed.  Thus, a SAS 1/ data set with this structure was created
for analysis purposes. 2/  The contents of this data set are shown in
Exhibit E-1.

     Each detection of a pesticide residue for a sample specimen resulted
in an entry into the EPA computer record for each of four variables: a
Residue Classification Code (RCC), an Individual Residue Code (IRC), an
amount, and a unit.  It was found that all amounts were in units of
parts per million; thus the unit variable was not included in the SAS
file constructed.  Only specific residues were tested for on a regular
basis.  These compounds are listed in Table E-1.  Other residues may
have been tested occasionally, but such data cannot be used for
inferential purposes.  Only data for the pesticides shown in Table E-1

1/   Statistical Analysis System (SAS) User's Guide, SAS Institute,
Inc., Raleigh, North Carolina, 1979.

2/   For those readers interested in using this data set, it is stored
on a user disk at the EPA North Carolina Computing Center.  The fully
qualified data set name is

          CN.EPAROY.SADD.PEST.SASFILE,

and the data are located in the data set member called TOTAL.

                                   E-2

-------
                     Exhibit E-1.  Contents of the SAS data set created*

STATISTICAL ANALYSIS SYSTEM                          FRIDAY, MARCH 6, 1981
CONTENTS OF SAS DATA SET CN.EPAROY.SADD.PEST.SASFILE
ALPHABETIC LIST OF VARIABLES

[Computer listing.  Only the variable names and types are legible in this
copy; names marked (?) are partly illegible.]

VARIABLE     TYPE       VARIABLE     TYPE       VARIABLE     TYPE

ACCNUM (?)   NUM        PEST091      NUM        PEST336      NUM
ANLDATE (?)  NUM        PEST149      NUM        PEST337      NUM
CLAY         NUM        PEST160      NUM        PEST338      NUM
CONAME       CHAR       PEST161      NUM        PEST341      NUM
COUNTY       NUM        PEST235      NUM        PEST342      NUM
CROPNUM (?)  NUM        PEST237      NUM        PEST343      NUM
CROPREG (?)  NUM        PEST240      NUM        PEST348      NUM
CROPYR (?)   NUM        PEST241      NUM        PEST380      NUM
FY           NUM        PEST243      NUM        PEST420      NUM
LAB          NUM        PEST244      NUM        PEST421      NUM
LANDUSE      NUM        PEST246      NUM        PEST448      NUM
ORGMAT (?)   NUM        PEST248      NUM        PEST497      NUM
PEST002      NUM        PEST258      NUM        PEST499      NUM
PEST013      NUM        PEST260      NUM        PEST518      NUM
PEST016      NUM                                PEST526      NUM
PEST080      NUM                                PEST531      NUM
                                                PEST534      NUM
                                                PEST620      NUM

     *Source:  Computer files supplied by the EPA Field Studies Branch, Washington, D.C.

-------
Exhibit E-1 (continued)

STATISTICAL ANALYSIS SYSTEM                          FRIDAY, MARCH 6, 1981

[Computer listing, continued.  Only the variable names and types are
legible in this copy.]

VARIABLE     TYPE       VARIABLE     TYPE

PEST638      NUM        RAIN         NUM
PEST643      NUM        ROUND        NUM
PEST646      NUM        SAMPDATE     NUM
PEST650      NUM        SAND         NUM
PEST670      NUM        SILT         NUM
PEST672      NUM        SITE         NUM
PEST687      NUM        SMC          NUM
PEST688      NUM        STATE        NUM
PEST786      NUM        STNAME       CHAR
PEST787      NUM        STRATUM      NUM
PEST795      NUM        TEMP         NUM
PEST799      NUM        WT           NUM
PH           NUM        YEAR         NUM
-------
      Table E-1:  Pesticide Residues Tested on a Regular Basis*

Residue Classification    Individual Residue
      Code (RCC)              Code (IRC)         Compound

          2                      499             Alachlor
          2                      002             Aldrin
          2                      160             Chlordane
          2                      237             DCPA
          2                      240             o,p'-DDE
          2                      241             p,p'-DDE
          2                      243             p,p'-DDT
          2                      244             o,p'-DDT
          2                      786             p,p'-TDE
          2                      787             o,p'-TDE
          2                      258             Dicofol
          2                      260             Dieldrin
          2                      638             Photodieldrin
          2                      336             Endosulfan I
          2                      337             Endosulfan II
          2                      338             Endosulfan Sulfate
          2                      341             Endrin
          2                      342             Endrin Aldehyde
          2                      343             Endrin Ketone
          2                      420             Heptachlor
          2                      421             Heptachlor Epoxide
          2                      448             Isodrin
          2                      497             Lindane
          2                      080             Benzene Heptachloride
          2                      526             Methoxychlor
          2                      646             PCNB
          2                      687             Propachlor
          2                      688             Ronnel
          2                      795             Toxaphene
          2                      799             Trifluralin
          2                      534             Mirex
          2                      620             Ovex
          2                      672 ?/          PCB
          2 1/                   670 ?/          Prolan
          2 1/                   091 ?/          Bulan
          2                      161 ?/          Gamma Chlordane
                                                                  (cont.)

?/   Footnote mark illegible in this copy.

1/   Shown on the computer records to have RCC = 7, but corrected to 2 by
personal communication with EPA Field Studies Branch, Washington, D.C.

2/   Tested for fiscal year 1972 specimens, and thereafter.

3/   Tested for fiscal year 1974 specimens, and thereafter.





                                   E-5

-------
       Table E-1:  Pesticide Residues Tested on a Regular Basis*
                               (continued)

Residue Classification    Individual Residue
      Code (RCC)              Code (IRC)         Compound

          3                      149             Carbophenothion
          3                      246             DEF
          3                      248             Diazinon
          3                      348             Ethion
          3                      380             Folex
          3                      518             Malathion
          3                      531             Methyl Parathion
          3                      643             Ethyl Parathion
          3                      650             Phorate
          4                      235             2,4-D
          5                      013             Arsenic
          6                      016 4/          Atrazine

4/   Tested in fiscal years 1969, 1972, and 1973 for specimens from
"cornbelt States" only, i.e., South Dakota, Nebraska, Kansas, Missouri,
Iowa, Minnesota, Wisconsin, Illinois, Indiana, Ohio, and Michigan.

*    Source:  Personal communication from the EPA Field Studies Branch,
Washington, D.C.
                                   E-6

-------
                Table E-2:  Residue Classification Codes*

Residue Classification      IRC for
      Code (RCC)          "none found"      Compound Category

          2                   905           Chlorinated hydrocarbons
          3                   910           Organophosphorus insecticides
          4                   911           Phenoxy acid derivative herbicides
          5                   901           Arsenic compounds
          6                   914           Triazines

*    Source:  Personal communication with EPA Field Studies Branch,
Washington, D.C.

were included in the SAS data file.  Table E-1 also shows the RCC for
each of the compounds tested regularly, and Table E-2 gives the
description of each of these RCC categories.  The RCC categories are
crucial to proper analysis of the data because all compounds with a
common RCC are tested simultaneously.  If the test is performed for
compounds with RCC = 2, for example, there are two possible types of
entry into the computer record.  Either each positive detection is
entered, or an IRC code is entered to indicate no positive detections,
as shown in Table E-2.  Each record in the EPA computer file corresponds
to a specific sample specimen and contains 40 repetitions of fields for
IRC, RCC, amount, and unit.  All of these fields were replaced in the
SAS data set by 48 pesticide amount variables, one for each of the 48
compounds listed in Table E-1.  A zero was entered for each compound for
which less than detectable levels were found. 2/  A decimal point, the SAS

2/   Indicated by one or more detectable amounts for the same RCC or a
9XX code as shown in Table E-2.



                                   E-7

-------
missing value symbol, was entered for the amount of compound detected
whenever the test for that compound was not performed. 3/
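
The zero versus missing rule can be sketched as follows.  This is an
illustration only, not the program used to build the SAS data set.  It
covers nine of the 48 compounds in Table E-1, uses None in place of the
SAS missing value, ignores the fiscal-year restrictions footnoted in
Table E-1, and assumes variable names of the form PESTnnn (IRC padded to
three digits), as Exhibit E-1 appears to use.  The function name to_wide
and the sample amounts are hypothetical.

```python
# Subset of Table E-1: IRC -> RCC (the full table lists 48 compounds).
IRC_TO_RCC = {2: 2, 241: 2, 243: 2, 260: 2, 246: 3, 650: 3, 235: 4, 13: 5, 16: 6}

# Table E-2: IRC entered for "none found" -> RCC.
NONE_FOUND = {905: 2, 910: 3, 911: 4, 901: 5, 914: 6}


def to_wide(entries):
    """Turn one specimen's (rcc, irc, amount_ppm) entries into PESTnnn values.

    detected compound                -> its amount
    RCC tested, compound not found   -> 0.0
    RCC not tested                   -> None (stands in for SAS missing)
    """
    # An RCC counts as tested if any entry carries it, including a 9XX code.
    tested = {rcc for rcc, _, _ in entries}
    detected = {irc: amt for _, irc, amt in entries if irc not in NONE_FOUND}

    wide = {}
    for irc, rcc in IRC_TO_RCC.items():
        name = f"PEST{irc:03d}"
        if irc in detected:
            wide[name] = detected[irc]
        elif rcc in tested:
            wide[name] = 0.0
        else:
            wide[name] = None
    return wide


# Hypothetical specimen: two chlorinated hydrocarbons detected, the
# organophosphorus test run with nothing found, other classes not tested.
specimen = [(2, 241, 0.05), (2, 260, 0.02), (3, 910, 0.0)]
wide = to_wide(specimen)
```

Here Aldrin (PEST002) receives a zero because another RCC = 2 compound
was detected in the same test, while Arsenic (PEST013) is missing
because no RCC = 5 entry or 901 code appears.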

     Several variables that were not on the original EPA data file were
added to the SAS data file for analysis purposes.  Among these new
variables are STNAME and CONAME, the State and county names.  A variable
called ROUND was constructed which has a value of one for records from
the first round of data collection, and a value of two for the second
round when the sites were revisited.  Also, the sampling year within
round is given by YEAR, e.g., YEAR = 1 for first-year sample sites.  If
the information were available, it would also be useful to have an
indicator variable to identify when site substitutions were made.  This
is especially important when substitutions were made in the second
round; the second round data for such a site should not be directly
compared to the first round data for that site. 4/  This is an important
consideration when estimating differences in residue levels from the
first round to the second round.

     Two other variables added to the data file for analysis purposes
are STRATUM and WT.  The variable STRATUM is used to identify
large-scale geographic strata within States as described in section
1.7.7.  The STRATUM codes and their meanings are given in Table E-3.
The variable WT is the approximate sampling weight, which was
constructed as

3/   Indicated by no detectable amounts for the same RCC and no
corresponding 9XX code as shown in Table E-2.

4/   EPA Field Studies Branch, Washington, D.C., assured RTI that such
substitutions in the second round amount to no more than 5 percent of
all second round sites.
                                   E-8

-------
              Table E-3:  STRATUM Codes and Their Meaning*

  STRATUM Code                Meaning

      40                      Irrigated stratum
  100 or 160 1/               Remainder stratum 2/
  400 or 640 1/               Sandhills, desert, or other relatively
                              homogeneous stratum

1/   Code used depends on PSU sizes used in the State.

2/   All sites in many States fall into the remainder stratum.

*    Source:  Constructed by RTI from:  a) Data files supplied by the
EPA Field Studies Branch, Washington, D.C.  b) Data and personal
communications from J. Jeffery Goebel, Statistical Laboratory, Iowa
State University.

shown in Appendix D.  This variable is, of course, essential for a
weighted analysis of the data that incorporates the sampling design
implications.
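
As an illustration of how WT might enter an analysis, the sketch below
computes a weighted mean residue level as the weighted total divided by
the sum of the weights, skipping records for which the compound was not
tested.  It is a generic estimator shown under stated assumptions, not
the estimation procedure of this report; the function name weighted_mean
and the three records are hypothetical, and None again stands in for the
SAS missing value.

```python
def weighted_mean(records, var, weight="WT"):
    """Weighted mean of `var` over records where `var` is not missing.

    Returns None when no record has a usable value.
    """
    num = 0.0
    den = 0.0
    for rec in records:
        x = rec.get(var)
        if x is None:  # test not performed for this specimen
            continue
        num += rec[weight] * x
        den += rec[weight]
    return num / den if den > 0 else None


# Hypothetical records: amounts in parts per million, made-up weights.
records = [
    {"WT": 100.0, "PEST260": 0.02},
    {"WT": 300.0, "PEST260": 0.0},   # tested, less than detectable
    {"WT": 50.0,  "PEST260": None},  # not tested
]
dieldrin_mean = weighted_mean(records, "PEST260")
```

Keeping zeros and missing values distinct matters here: the zero
contributes its weight to the denominator, while the untested specimen
is left out entirely.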

     Some quality assurance checks of the EPA computer files for the RSN
were made prior to creation of the SAS data set for analysis.
Twenty-three inconsistencies were discovered.  These inconsistencies are
summarized in Table E-4, and their resolution is discussed below.  Most
of these inconsistencies were resolved by consulting microfilm copies of
the Analysis Worksheets, Form 6-7, and the Sample Data Sheets, Form 6-
     Maintained by the EPA Field Studies Branch, Washington, D.C.
                                   E-9

-------
            Table E-4:  Data Inconsistencies in the Rural Soils Network Files

                                              Sample
Case    State Name           Site    Fiscal   Material    Accession  Individual Residue       Residue Class
Number  (State Number)       Number  Year     Code (SMC)  Number     Codes (IRC)              Codes (RCC)

  1     California (06)        39      69         1       3196       13,244,243,241,260,786   2,5
                                                          3407       911                      4
  2     Idaho (16)             67      69         1       1470       905                      2
                                                          11470      13                       5
  3     Idaho (16)             75      69         1       1471       160,260                  2
                                                          11471      13                       5
  4     Missouri (29)          21      69         1       1226       13                       5
                                                          11226      905                      2
  5     North Carolina (37)    31      69         1       4063       13,241,786,243,240,787   2,5
                                                          4061       911                      4
  6     Ohio (39)              22      69         1       3566       241,260,2,243            2
                                                          3568       13                       5
  7     Virginia (51)          37      69         1       3468       13,905                   2,5
                                                          4049       911                      4
  8     Virginia (51)          46      69         1       795        13,911                   4,5
                                                          3476       905                      2
  9     Illinois (17)         138      69         1       3017       13,160,914               2,5,6
                                                          3017 1/    911                      4
 10     New York (36)          78      70        63       10049      905                      2
                                                          100049     910                      3
 11     Alabama (01)           91      72         1       204007     13,241,243,910           2,3,5
                                                          204117     914                      6
 12     Mississippi (28)      113      72         1       204298     241,244,243              2
                                                          204298     13                       5
 13     Iowa (19)             559      73         1       312655     799                      2
                                                          372655     910                      3
 14     Oregon (41)           103      73         1       310110     910                      3
                                                          316110     905                      2
                                                                            (continued)

-------
            Table E-4:  Data Inconsistencies in the Rural Soils Network Files
                                      (continued)

                                              Sample
Case    State Name           Site    Fiscal   Material    Accession  Individual Residue       Residue Class
Number  (State Number)       Number  Year     Code (SMC)  Number     Codes (IRC)              Codes (RCC)

 15     Pennsylvania (42)     164      73         1       314025     16,241,243               2,6
                                                          340250     910                      3
 16     Nebraska (31)         151      74         1       426298     260                      2
                                                          427298     910                      3
 17     Illinois (17)          99      69         1       3028       13,905                   2,5
                                                          3017 1/    911                      4
 18     Louisiana (22)         26      69         1       3621       13,2,260                 2,5
                                                          4365       910                      3
 19     Mississippi (28)       49      70       138       8652       795,244,243,241,786,910  2,3
                                                          8675       241,244,243,795,246      2,3
 20     Alabama (01)          105      72         1       204111     13,241,910               2,3,5
                                                          204014     13,260,910               2,3,5
 21     New York (36)         194      73         1       314144     240,244,786,241,243,914  2,6
                                                          314085     910                      3
 22     West Virginia (54)     54 2/   73         1       314097     905,910                  2,3
 23     Mississippi (28)       49      69         1       781        13,243,244,786,241,341,  2,5 3/
                                                                     795,246,799,240

1/   Case 17 becomes case 9 after the site number for case 17 is corrected to 138.

2/   Noncropland site number; land use changed from cropland (1) to noncropland (2).

3/   RCC8 changed to 3 as IRC8 = 246.  See Table E-1.

*    Source:  Computer files supplied by EPA Field Studies Branch, Washington, D.C.

-------
     One final correction to the data file was to correct the cropping
region code for several counties.  Valid codes for the cropping regions
are the integers from one through 8.  Several records in the computer
file showed cropping region codes of 0 and 9.  These records were
corrected as shown in Table E-5.
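
A check of this kind can be sketched as follows.  The lookup holds only
four of the Table E-5 corrections for code 0, keyed by State and county
number; records with code 9 are set to missing (None here), as Table E-5
shows.  The names VALID_REGIONS, CORRECTIONS, and fix_region are
hypothetical, and the sketch is not the procedure actually used.

```python
VALID_REGIONS = range(1, 9)  # the integers 1 through 8

# Four rows of Table E-5: (state number, county number) -> corrected code.
CORRECTIONS = {
    (19, 163): 1,  # Iowa, Scott
    (21, 209): 6,  # Kentucky, Scott
    (27, 139): 5,  # Minnesota, Scott
    (28, 3): 3,    # Mississippi, Alcorn
}


def fix_region(state, county, code):
    """Return a valid cropping region code, or None for missing."""
    if code in VALID_REGIONS:
        return code
    if code == 0:
        # Corrected county by county; None if the county is not in the lookup.
        return CORRECTIONS.get((state, county))
    # Code 9 (and any other invalid value) is treated as missing.
    return None


iowa_scott = fix_region(19, 163, 0)
alpine_ca = fix_region(6, 3, 9)
unchanged = fix_region(17, 1, 4)
```

Keying the lookup on both State and county number avoids confusing the
several counties named Scott that appear in Table E-5.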
                                   E-14

-------
        Table E-5.  Resolution of Invalid Cropping Region Codes

  State Name              County Name              Cropping Region
(State Number)          (County Number)         Original    Corrected

Iowa (19)               Scott (163)                0            1
Kentucky (21)           Scott (209)                0            6
Minnesota (27)          Scott (139)                0            5
Mississippi (28)        Alcorn (3)                 0            3
Missouri (29)           Scott (201)                0            4
Nebraska (31)           Hayes (85)                 0            2
Nebraska (31)           Scotts Bluff (157)         0            5
Nebraska (31)           Thayer (169)               0            1
Oklahoma (40)           Cotton (33)                0            2
South Carolina (45)     Dorchester (35)            0            4
Tennessee (47)          Haywood (75)               0            3
Tennessee (47)          Scott (151)                0            6
California (6)          Alpine (3)                 9         Missing
Georgia (13)            Invalid (4)                9         Missing
Maryland (24)           Invalid (18)               9         Missing
New York (36)           Bronx (5)                  9         Missing
New York (36)           Invalid (32)               9         Missing
New York (36)           New York (61)              9         Missing
North Carolina (37)     Dare (55)                  9         Missing
Virginia (51)           Invalid (39)               9         Missing
Virginia (51)           Invalid (74)               9         Missing
Virginia (51)           Norfolk (129)              9         Missing
Virginia (51)           Princess Anne (151)        9         Missing
West Virginia (54)      McDowell (47)              9         Missing

Source:  Personal communication with EPA Field Studies Branch, Washington, D.C.
                                   E-15

-------