United States
                    Environmental Protection
                    Agency
 Risk Reduction Engineering
 Laboratory
 Cincinnati OH 45268
                    Research and Development
 EPA/600/S2-91/005 Feb. 1992
EPA          Project  Summary
                     PROTOCOL - A Computerized
                     Solid Waste Quantity  and
                     Composition Estimation System
                    Albert J. Klee
                      Efficient and statistically sound sam-
                    pling  protocols for estimating the
                    quantity and composition of solid waste
                    over a stated period of time in a given
                    location, such as a landfill site or at a
                    specific point in an industrial or commercial
                    process, are essential to the
                    design of resource  recovery systems
                    and waste minimization  programs, and
                    to the estimation of the  life of landfills
                    and the pollution burden on the land
                    posed by  the  generation  of solid
                    wastes. The theory  developed in this
                    study takes a significantly different approach
                    from the more traditional sampling
                    plans, resulting in lower costs
                    and more accurate  and precise esti-
                    mates of these critical entities. A com-
                    puter  program called PROTOCOL,
                    which  is designed to be run on per-
                    sonal computers with modest capabili-
                    ties, has also been developed to do the
                    calculations required.
                      This Project Summary  was developed
                    by EPA's Risk Reduction Engineering
                    Laboratory, Cincinnati, OH, to announce
                    key findings of the research project
                    that is fully documented in a separate
                    report  of the  same title (see Project
                    Report ordering information at back).


                    Introduction
                     Traditional sampling theory generally
                    follows this paradigm:

                     SAMPLE SELECTION—SAMPLE OB-
                    SERVATION—SAMPLE ESTIMATION.

                    Typically, the samples are chosen by an
                    unbiased procedure, such as simple ran-
                    dom sampling  or systematic  sampling
                    where it is assumed that the population is
already in random  order. In the sample
observation or data recording stage, it is
further assumed that observation of the
elements of the sample is an independent
process,  i.e., that there  is no queue of
sample elements building up, waiting to
be observed while an observation of one
sample element is being made. Unfortu-
nately, these assumptions do not fit the
circumstances when the problem is to es-
timate the quantity of solid waste arriving
at a given site or at a specific point in an
industrial or commercial process. For one
thing, the sample comes  to the investiga-
tor, which is the reverse of the situation
commonly described in standard sampling
textbooks. Since the investigator has no
control over the arrival of the sample ele-
ments, the sample observation process
often is far from independent.


Biased  Sampling for Greater
Efficiency
  Consider a situation where it is desired
to weigh a random or systematic sample
of 10% of the vehicles arriving at a land-
fill. Suppose that it is feasible to weigh up
to 10 vehicles per hour and that the aver-
age interarrival time of vehicles is 3.2 min.
On the average then, with either random
or systematic sampling, one vehicle would
be weighed every 32 min. If it takes an
average of 10  min to weigh a vehicle,
then this is well within the capability of the
sampling  system. Unfortunately, vehicles
do not have uniformly distributed arrival
times. There may be peak arrival periods
when the  number of trucks arriving and to
be sampled overwhelms the weighing ca-
pability and one is forced to  "default" on
weighing  some  of the vehicles selected
 by the sampling plan. One result is that
 fewer vehicles are weighed than the sam-
 pling plan calls for, thus reducing the pre-
 cision of the estimate of solid waste quan-
 tity. More important,  however, is that if the
 load weights of the defaulted samples differ
 appreciably  from  the  nondefaulted
 samples, bias will be introduced. For ex-
 ample, at many landfills vehicles arriving
 toward the end of the day  tend  to have
 smaller load weights than those arriving at
 other  times.  Since fewer vehicles arrive
 towards the end of the day, the tendency
 would  be to oversample  these lightly
 loaded vehicles and to undersample  the
 normally loaded vehicles arriving at peak
 hours, thus introducing a bias.
   The bias can be removed, however, in
 the estimation formula.
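
 As a minimal sketch of this correction, the snippet below weights each
 hour's sample mean by that hour's share of the week's arrivals, which is
 how Equation 1 is read here. The function names and the toy numbers (three
 hours, a capacity of five weighings per hour, identical load weights within
 an hour, weights in tons) are illustrative assumptions, not data from the
 study.

```python
def naive_mean(hours):
    """Unweighted mean of every weighed load, ignoring hourly arrival counts."""
    loads = [w for _, sampled in hours for w in sampled]
    return sum(loads) / len(loads)


def hour_weighted_mean(hours):
    """Weight each hour's sample by m_i / (n_i * N), as Equation 1 is read here.

    hours: list of (m_i, sampled_loads) pairs, where m_i is the number of
    vehicles that arrived in hour i and sampled_loads holds the n_i load
    weights actually measured in that hour.
    """
    total_arrivals = sum(m_i for m_i, _ in hours)  # N
    estimate = 0.0
    for m_i, sampled in hours:
        if m_i == 0:
            continue  # no arrivals in this hour, nothing to represent
        n_i = len(sampled)
        if n_i == 0:
            raise ValueError("an hour with arrivals needs at least one weighed vehicle")
        estimate += m_i / (n_i * total_arrivals) * sum(sampled)
    return estimate


# Hypothetical toy week: (arrivals m_i, loads weighed), capacity 5 per hour.
toy_hours = [
    (20, [8.0] * 5),  # peak hour, heavy loads, only 5 of 20 weighed
    (15, [7.0] * 5),  # busy hour, 5 of 15 weighed
    (5, [3.0] * 5),   # end of day, light loads, all 5 weighed
]

true_mean = (20 * 8.0 + 15 * 7.0 + 5 * 3.0) / 40
print(true_mean, naive_mean(toy_hours), hour_weighted_mean(toy_hours))
```

 In this toy case the unweighted mean is 6.0 tons against a true mean of
 7.0, because the lightly loaded end-of-day vehicles are over-represented;
 the hour-weighted estimate recovers 7.0 up to floating-point rounding.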

 Defining
   i          = index for hours
   j          = index for vehicles
   $m_i$      = number of vehicles arriving in ith sampling hour
   N          = total number of vehicles in week
   $n_i$      = number of vehicles sampled in the ith hour
   $\bar{x}$  = average vehicle-load weight for the week
   $x_{ij}$   = jth vehicle-load weight in ith hour
 then it can be shown (the derivation of these and other equations shown
 in this Project Summary can be found in the complete Project Report)
 that an unbiased estimate of $\bar{x}$ is:

      $\bar{x} = \sum_i \left[ \frac{m_i}{n_i N} \sum_j x_{ij} \right]$      (1)

 and that the variance of this estimate is:

      $\mathrm{var}(\bar{x}) =$ [right-hand side illegible in the source apart from an $n_i - 1$ term]      (2)

   When the rate of arrival of vehicles is not uniform throughout the day,
 random samples will produce some intervals of little or no activity and
 others of frenzied activity. Once scales are rented and labor hired to
 make the observations, it makes little sense not to use the equipment and
 labor to the fullest extent possible. If one samples to the fullest extent
 of the sampling capability, not only can Equations 1 and 2 be used to
 unbias the estimate, but the resulting estimate has a smaller variance
 than random or systematic sampling.


 Seasonality
   It is well-known that the quantity of solid waste generated varies
 significantly from month to month. Municipal solid waste generation, for
 example, is typically low during the months of January, February,
 November, and December, and peaks during June, July, and August, although
 there are some variations depending upon geographical location. Thus, it
 is not sufficient to sample 1 week out of the year to estimate generation
 for the entire year. A Monte Carlo simulation was performed using actual
 data obtained from a Boston solid waste site. Two sampling protocols were
 simulated: (1) random sampling and (2) systematic sampling. Sampling
 frequencies of from two to ten times per year were investigated.
 Systematic sampling was found to be superior to random sampling for all
 sampling frequencies except for a sampling frequency of two, where random
 sampling is identical to systematic sampling (see Table 1). At a sampling
 frequency of four times per year, for example, the coefficient of
 variation (i.e., the standard deviation as a fraction of the mean) of
 systematic sampling was only 56% that of random sampling.

 Table 1. Monte Carlo Simulation - Seasonality

 # Weeks Sampled    cb Random    cb Systematic
        2             10.0            9.9
        3              7.8            5.3
        4              7.3            4.1
        5              6.4            3.6
        6              5.9            3.2
        7              5.6            2.5
        8              5.2            3.0
        9              4.7            2.2
       10              4.6            2.4

 Key: cb = between-week coefficient of variation
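
 The random-versus-systematic comparison can be sketched with a small Monte
 Carlo run. The weekly profile below is a synthetic sinusoid with noise, an
 assumption standing in for the Boston data, which are not reproduced in
 this summary. The function names and parameter values are likewise
 illustrative, and the output is not expected to match Table 1.

```python
import math
import random
import statistics


def synthetic_weekly_quantity(rng, weeks=52, mean=100.0, swing=20.0, noise_sd=3.0):
    """Hypothetical seasonal profile: low in January, peaking mid-year, plus noise."""
    return [
        mean + swing * math.sin(2 * math.pi * (w - 13) / weeks) + rng.gauss(0.0, noise_sd)
        for w in range(weeks)
    ]


def random_weeks(k, weeks, rng):
    """Simple random sample of k distinct weeks."""
    return rng.sample(range(weeks), k)


def systematic_weeks(k, weeks, rng):
    """k evenly spaced weeks with a random starting point."""
    step = weeks / k
    start = rng.uniform(0.0, step)
    return [int(start + i * step) % weeks for i in range(k)]


def between_sample_cv(profile, pick_weeks, k, trials, rng):
    """Coefficient of variation of the estimated mean weekly quantity."""
    estimates = [
        statistics.fmean(profile[w] for w in pick_weeks(k, len(profile), rng))
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates) / statistics.fmean(estimates)


rng = random.Random(1992)
profile = synthetic_weekly_quantity(rng)
for k in (2, 4, 6):
    cv_random = between_sample_cv(profile, random_weeks, k, 2000, rng)
    cv_systematic = between_sample_cv(profile, systematic_weeks, k, 2000, rng)
    print(k, round(100 * cv_random, 1), round(100 * cv_systematic, 1))
```

 Evenly spaced weeks cancel most of a smooth seasonal cycle, so under this
 synthetic profile the systematic coefficient of variation should come out
 well below the random one. Real data with irregular seasonality will show
 a smaller gap.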

Sample Weights for
Component Estimation
  Traditional sampling  theory  assumes
that there are sampling elements, i.e., dis-
crete entities comprising the population
about which inferences  are to be drawn.
When it comes to sampling solid waste for
composition, however, there are no such
discrete entities. There is, for example, no
such thing as a basic unit of paper or of
textiles. Thus, sampling procedures based
upon discrete  distributions  (such as the
multinomial  or  binomial) are  not valid.
Nonetheless, some basic  unit weight of
sample must  be defined.  In traditional
cluster  sampling theory,  a  balance is
achieved between  the within-cluster  and
between-cluster  components of  the total
 variability  of  an estimate. If  the  cluster
 (i.e., in  this context, a sample of given
 weight)  is too small, then the between-
 cluster variability will be greater than the
 within-cluster variability and will result in a
 large  sample variability.  If  the  cluster
 weight is too large,  however,  the greater
 will be the time and  expense of sampling.
 Since this is not a linear relationship (i.e.,
 doubling the  cluster size will  not  neces-
 sarily  double the precision of the  esti-
 mate), the optimal procedure  is to select
 that cluster  size where  the precision  of
 the sampling estimate  does  not  signifi-
 cantly improve with  cluster size. For raw
 municipal solid waste, this sample weight
 has been  found to be between 200 and
 300 Ib. For processed waste  streams, it
 has been found that particle size distribu-
 tions are adequately described by func-
 tions based upon the exponential distribu-
 tion, such as the Rosin-Rammler equation.
 Thus,  for particles smaller than that found
 in raw municipal solid waste, the following
 equation is recommended:
            Y = Xe... [remainder of the expression is not legible in the source]      (3)
 where Y is the optimal sample weight in
 pounds, and X is the characteristic par-
 ticle size of the material to be sampled, in
 inches  (the characteristic particle size is
 the  screen size  at which  63.2% of  the
 material passes through).
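
 The 63.2% figure follows from the Rosin-Rammler form itself: the
 cumulative fraction passing a screen of size x is 1 - exp(-(x/X)^n), which
 equals 1 - 1/e, or about 0.632, at x = X whatever the shape exponent n.
 The sketch below checks this and estimates X from sieve data by linear
 interpolation. The function names and the sieve readings are hypothetical.

```python
import math


def fraction_passing(screen_size, characteristic_size, shape=1.0):
    """Rosin-Rammler cumulative fraction passing a screen of the given size."""
    return 1.0 - math.exp(-((screen_size / characteristic_size) ** shape))


def characteristic_size_from_sieves(sieve_data, target=1.0 - math.exp(-1.0)):
    """Interpolate the screen size at which the target fraction (63.2%) passes.

    sieve_data: (screen_size_inches, fraction_passing) pairs.
    """
    points = sorted(sieve_data)
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("sieve data do not bracket the target fraction")


# Hypothetical sieve analysis of a processed waste stream.
sieves = [(0.5, 0.22), (1.0, 0.40), (2.0, 0.80), (4.0, 0.97)]
X = characteristic_size_from_sieves(sieves)
print(round(X, 2), round(fraction_passing(X, X, shape=1.3), 3))
```

 With these made-up readings the interpolated characteristic size is about
 1.58 in., and the fraction passing at that size is 0.632 for any shape
 exponent.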

 Component Estimation
   When it is desired to place confidence
 intervals about the estimates made in the
 sample estimation process, traditionally the
 distribution  of either the population  or of
 the population parameter estimated is as-
 sumed to follow a specific classical prob-
 ability  distribution;  typically, the  normal
 distribution is assumed. In the sample se-
 lection process, similar assumptions are
 made  when determining the number  of
 samples to be taken. For example,  as-
 suming that at least the averages of the
 components are normally distributed, then
 the sample size is given by
            $n = (ts/d)^2$                       (4)
where d is the precision required (i.e., 1/2 the width of the confidence
interval desired), t is the t-value at significance level $\alpha$, and s is
the population standard deviation. Because the t-value depends on the
degrees of freedom, n - 1, and so is not known until after n is determined,
Equation 4 is applied in an iterative, trial-and-error procedure.
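
One way to carry out the trial-and-error step is to search for the smallest
n that satisfies Equation 4 when t is taken at n - 1 degrees of freedom.
The sketch below does this for a two-sided 95% interval only. The function
names, the short table of standard t-values, the large-sample series used
beyond 30 degrees of freedom, and the example inputs are assumptions made
for illustration; they are not taken from the PROTOCOL program.

```python
# Two-sided 95% t critical values for 1 to 30 degrees of freedom (standard table).
T_TABLE_95 = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
]


def t_value_95(df):
    """t critical value for a two-sided 95% interval with df degrees of freedom."""
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    if df <= len(T_TABLE_95):
        return T_TABLE_95[df - 1]
    z = 1.959964  # normal critical value for a two-sided 95% interval
    # Series expansion of the t quantile about the normal quantile (large df).
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)


def required_sample_size(s, d, t_value=t_value_95, max_n=100000):
    """Smallest n with n >= (t * s / d)**2, where t has n - 1 degrees of freedom."""
    for n in range(2, max_n + 1):
        if n >= (t_value(n - 1) * s / d) ** 2:
            return n
    raise ValueError("no sample size found below max_n")


# Hypothetical inputs: standard deviation of 5 and required precision of 2,
# both in percentage points of the component.
print(required_sample_size(s=5.0, d=2.0))
```

With these inputs the search stops at n = 27; using the normal value 1.96
in place of t would have suggested only 25 samples.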
  Although such  assumptions are justifi-
able in the estimation of solid waste quan-
tity, such  is  not  the  case  in  estimating
solid  waste  composition. For  one thing,
component fractions  are bounded, i.e.,
there are no components in solid waste
that are present in fractions less than zero
or greater than one. These  boundaries
are generally located close to the means
of their  distributions. Thus,  solid waste
component distributions  are, at the  very
least, positively  skewed  (i.e., skewed to
the right) and, at worst, are J-shaped (see
Figure 1). Nor does  reliance  on the Cen-
tral Limit Theorem of statistics help much,
since even averages of  component  frac-
tions  do not  approach normality quickly,
at least  not within  an economically fea-
sible  number of  samples. Distributions of
component averages still tend to be posi-
tively  skewed. This characteristic precludes
the rote  application of the traditional sta-
tistical formulas  for either the estimation
of sample size  (such as Equation 4) or
the construction  of confidence  intervals.
Although transformations can be used to
construct the proper asymmetric confi-
dence intervals  after the sample data is
obtained, these  are of little  help for  esti-
mating sample size before the sample is
taken. A knowledge  of the effect of posi-
tive skewness on the actual  level of sig-
nificance  of a confidence  interval, how-
ever,  can be  of help in  determining the
number of samples to take.
  Since  the  rationale behind  using  the
Central Limit Theorem is to permit the use
of t-statistics to construct  confidence inter-
vals about the estimation of  the percent-
age of any component in the waste stream,
an appropriate measure  of  the  ability to
meet  the normality  requirements is  the
fraction of confidence intervals that actu-
ally contain the true mean at a given  level
of significance. For example,  Monte Carlo
simulations (see Table 2) have shown that,
given an average of size 10, if a confidence interval at a nominal
significance level of $\alpha_n = .05$ were constructed about the mean, the
actual significance level would be $\alpha_a = .104$. In other words,
instead of a 95% confidence interval, we would actually be constructing an
89.6% confidence interval. As the size of the average gets larger, the
discrepancy gets smaller. For example, at a nominal significance level of
$\alpha_n = .05$ and an average of size 50, the actual significance level is
$\alpha_a = .058$. For moderately positively skewed
distributions, such as ferrous metals, these
discrepancies are very much smaller, even
for very small sizes of the average.
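
The kind of simulation behind these figures can be sketched as follows:
draw many samples, build the nominal t-interval about each sample mean, and
count how often the interval misses the true mean. The exponential
distribution is used here as a generic J-shaped stand-in and a normal
distribution as a symmetric reference. Neither is the textile or ferrous
metals data, and the function names are illustrative.

```python
import math
import random


def actual_alpha(draw, true_mean, n, t_crit, trials, rng):
    """Fraction of nominal t-intervals about the sample mean that miss true_mean."""
    misses = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        half_width = t_crit * math.sqrt(variance / n)
        if abs(mean - true_mean) > half_width:
            misses += 1
    return misses / trials


def j_shaped(rng):
    """Exponential draw with mean 1: a J-shaped stand-in, not the textile data."""
    return rng.expovariate(1.0)


def symmetric(rng):
    """Normal draw with mean 1 and standard deviation 0.2, for comparison."""
    return rng.gauss(1.0, 0.2)


rng = random.Random(2)
T_CRIT_95_DF9 = 2.262  # two-sided 95% t-value for an average of size 10
print(actual_alpha(j_shaped, 1.0, 10, T_CRIT_95_DF9, 10000, rng))
print(actual_alpha(symmetric, 1.0, 10, T_CRIT_95_DF9, 10000, rng))
```

The J-shaped run should miss noticeably more often than the nominal 5%,
while the normal run should stay close to it.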
  From the data in Table 2, equations have been obtained that relate
$\alpha_n$ and $\alpha_a$. For J-shaped distributions, such as textiles,
the equation is,

       [Equations 5 through 8 and their accompanying text did not survive
       in the source; see the complete Project Report.]

       Selecting a t-value, t, at a desired significance level, $\alpha$,
       and defining quantities L and U as:

                     $L = \mu - t\sigma$             (9)
                     $U = \mu + t\sigma$             (10)

       the confidence interval around T is computed as:

              lower boundary = $e^{L}$         (11)

              upper boundary = $e^{U}$         (12)
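
       Equations 9 through 12, read here as L = mu - t*sigma and
       U = mu + t*sigma followed by exponentiation, can be sketched in a few
       lines. Because the equations defining mu and sigma are not legible
       here, the function below takes them as inputs; its name and the
       example values are hypothetical.

```python
import math


def log_scale_interval(mu, sigma, t):
    """Confidence bounds from Equations 9 to 12: exponentiate mu -/+ t * sigma."""
    lower_log = mu - t * sigma  # L, Equation 9
    upper_log = mu + t * sigma  # U, Equation 10
    return math.exp(lower_log), math.exp(upper_log)  # Equations 11 and 12


# Hypothetical inputs: a log-scale centre of ln(5), i.e. 5% of the waste stream.
lower, upper = log_scale_interval(mu=math.log(5.0), sigma=0.2, t=2.0)
print(round(lower, 2), round(upper, 2))
```

       With these inputs the bounds are about 3.35 and 7.46, so the interval
       reaches further above 5 than below it. That asymmetry is what a
       positively skewed component calls for.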


         Although  the calculations  dictated by
       these protocols are arithmetically tedious,
       a computer program (designed to  be run
       on personal computers with modest capa-
       bilities) has been developed  to carry out
       these tasks. The program, called PROTO-
       COL, also contains routines  that check
       input data for  errors in  coding,  and an
       editor for preparing and modifying input
       data files.
       Figure 1. Distribution of textiles and ferrous metals in municipal solid waste.

 Table 2. Results of Simulation Studies for Two Municipal Solid Waste Components

                                                    Size of Average
                                1*    5*    10    15    20    25    30    35    40    45    50

 J-Function (Textiles)
 Coefficient of Skewness      1.45   .66   .46   .33   .32   .30   .27   .29   .28   .21   .20
 Actual α, Nominal α = .01      -   .075  .051  .041  .033  .031  .024  .026  .017  .018  .019
 Actual α, Nominal α = .02      -   .102  .065  .055  .046  .040  .035  .037  .028  .028  .031
 Actual α, Nominal α = .03      -   .119  .079  .065  .056  .051  .045  .046  .036  .038  .042
 Actual α, Nominal α = .04      -   .132  .089  .076  .065  .060  .055  .056  .044  .046  .049
 Actual α, Nominal α = .05      -   .148  .104  .090  .076  .067  .065  .064  .065  .060  .058
 Actual α, Nominal α = .10      -   .190  .138  .134  .128  .115  .114  .112  .104  .104  .099
 Actual α, Nominal α = .20      -   .268  .227  .215  .214  .211  .207  .210  .199  .200  .205

 Moderate Skew (Ferrous Metals)
 Coefficient of Skewness       .68   .32   .20   .15   .13   .17   .16   .19   .16   .12   .05
 Actual α, Nominal α = .01      -   .017  .017  .018  .018  .015  .013  .014  .009  .010  .012
 Actual α, Nominal α = .02      -   .026  .033  .027  .026  .023  .019  .025  .025  .025  .024
 Actual α, Nominal α = .03      -   .043  .039  .039  .040  .035  .033  .035  .035  .035  .036
 Actual α, Nominal α = .04      -   .059  .053  .047  .043  .042  .048  .043  .041  .047  .042
 Actual α, Nominal α = .05      -   .065  .063  .057  .055  .053  .047  .052  .059  .051  .051
 Actual α, Nominal α = .10      -   .116  .115  .110  .098  .100  .104  .091  .097  .100  .110
 Actual α, Nominal α = .20      -   .207  .204  .215  .204  .204  .214  .211  .197  .208  .203

 * The labels of the first two columns are not legible in the source; sizes 1 and 5
   are inferred from the 5-unit spacing of the remaining columns.

The EPA author, Albert J. Klee, is with the Risk Reduction Engineering Laboratory,
  Cincinnati, OH 45268.
The complete report, entitled "PROTOCOL - A Computerized Solid Waste Quantity
  and Composition Estimation System,"  and the diskette (Order No. PB 91-201 669/
  AS; Cost: $17.00, subject to change)  will be available only from:
        National Technical Information Service
        5285 Port Royal Road
        Springfield, VA 22161
        Telephone: 703-487-4650
The EPA author can be contacted at:
        Risk Reduction Engineering Laboratory
        U.S. Environmental Protection Agency
        Cincinnati, OH 45268