PROTOCOL - A Computerized Solid Waste Quantity and Composition Estimation System. Project Summary


                    United States
                    Environmental Protection
                    Agency
                    Risk Reduction Engineering
                    Laboratory
                    Cincinnati OH 45268
                    Research and Development
                    EPA/600/S2-91/005 Feb. 1992
                    EPA Project Summary
                    PROTOCOL - A Computerized
                    Solid Waste Quantity and
                    Composition Estimation System
                    Albert J. Klee
                      Efficient and statistically sound sam-
                    pling  protocols for estimating the
                    quantity and composition of solid waste
                    over a stated period of time in a given
                    location, such as a landfill site or at a
                    specific point in an industrial or
                    commercial process, are essential to the
                    design of resource  recovery systems
                    and waste minimization  programs, and
                    to the estimation of the  life of landfills
                    and the pollution burden on the land
                    posed by  the  generation  of solid
                    wastes. The theory  developed in this
                    study takes a significantly different
                    approach from the more traditional
                    sampling plans, resulting in lower costs
                    and more accurate  and precise esti-
                    mates of these critical entities. A com-
                    puter  program called PROTOCOL,
                    which  is designed to be run on per-
                    sonal computers with modest capabili-
                    ties, has also been developed to do the
                    calculations required.
                      This Project Summary  was developed
                    by EPA's Risk Reduction Engineering
                    Laboratory, Cincinnati, OH, to announce
                    key findings of the research project
                    that is fully documented in a separate
                    report  of the  same title (see Project
                    Report ordering information at back).


                    Introduction
                     Traditional sampling theory generally
                    follows this paradigm:

                     SAMPLE SELECTION—SAMPLE OB-
                    SERVATION—SAMPLE ESTIMATION.

                    Typically, the samples are chosen by an
                    unbiased procedure, such as simple ran-
                    dom sampling  or systematic  sampling
                    where it is assumed that the population is
already in random  order. In the sample
observation or data recording stage, it is
further assumed that observation of the
elements of the sample is an independent
process,  i.e., that there  is no queue of
sample elements building up, waiting to
be observed while an observation of one
sample element is being made. Unfortu-
nately, these assumptions do not fit the
circumstances when the problem is to es-
timate the quantity of solid waste arriving
at a given site or at a specific point in an
industrial or commercial process. For one
thing, the sample comes  to the investiga-
tor, which is the reverse of the situation
commonly described in standard sampling
textbooks. Since the investigator has no
control over the arrival of the sample ele-
ments, the sample observation process
often is far from independent.


Biased  Sampling for Greater
Efficiency
  Consider a situation where it is desired
to weigh a random or systematic sample
of 10% of the vehicles arriving at a land-
fill. Suppose that it is feasible to weigh up
to 10 vehicles per hour and that the aver-
age interarrival time of vehicles is 3.2 min.
On the average then, with either random
or systematic sampling, one vehicle would
be weighed every 32 min. If it takes an
average of 10  min to weigh a vehicle,
then this is well within the capability of the
sampling  system. Unfortunately, vehicles
do not have uniformly distributed arrival
times. There may be peak arrival periods
when the  number of trucks arriving and to
be sampled overwhelms the weighing ca-
pability and one is forced to  "default" on
weighing  some  of the vehicles selected
by the sampling plan. One result is that
fewer vehicles are weighed than the sam-
pling plan calls for, thus reducing the pre-
cision of the estimate of solid waste quan-
tity. More important, however, is that if the
load weights of the defaulted samples differ
appreciably from the nondefaulted
samples, bias will be introduced. For ex-
ample, at many landfills vehicles arriving
toward the end of the day tend to have
smaller load weights than those arriving at
other times. Since fewer vehicles arrive
towards the end of the day, the tendency
would be to oversample these lightly
loaded vehicles and to undersample the
normally loaded vehicles arriving at peak
hours, thus introducing a bias.
The bias can be removed, however, in
the estimation formula.
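
As an illustration of how the bias arises and how weighting removes it, the following is a minimal sketch rather than the PROTOCOL implementation. The hourly arrival counts, the load weights, and the use of tons as the unit are all invented for the example. It compares the plain average of every sampled load with an estimate that weights each hour's sample by that hour's share of all arrivals, which is the form taken by Equation 1 of this summary.

```python
# Hypothetical data: hour -> (vehicles arrived m_i, sampled load weights in tons).
hours = {
    "10:00": (30, [6.1, 5.8, 6.4]),       # peak hour, only 3 of 30 weighed
    "11:00": (28, [6.0, 6.3, 5.9]),       # peak hour, only 3 of 28 weighed
    "16:00": (4, [2.1, 2.4, 1.9, 2.2]),   # end of day: few, light, all weighed
}


def naive_mean(hours):
    """Plain average of every sampled load, ignoring how many vehicles arrived."""
    loads = [x for _, sample in hours.values() for x in sample]
    return sum(loads) / len(loads)


def weighted_mean(hours):
    """Hour-weighted estimate: sum over i of (m_i / (n_i * N)) * sum over j of x_ij."""
    total_arrivals = sum(m_i for m_i, _ in hours.values())   # N
    estimate = 0.0
    for m_i, sample in hours.values():
        n_i = len(sample)
        estimate += m_i / (n_i * total_arrivals) * sum(sample)
    return estimate


print("naive mean   :", round(naive_mean(hours), 2))
print("weighted mean:", round(weighted_mean(hours), 2))
```

With these invented figures the naive mean is pulled downward by the fully sampled, lightly loaded end-of-day vehicles, while the weighted estimate gives each hour an influence proportional to its arrivals.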

Defining
$i$ = index for hours
$j$ = index for vehicles
$m_i$ = number of vehicles arriving in the $i$th sampling hour
$N$ = total number of vehicles in the week
$n_i$ = number of vehicles sampled in the $i$th hour
$\bar{x}$ = average vehicle-load weight for the week
$x_{ij}$ = $j$th vehicle-load weight in the $i$th hour
then it can be shown (the derivation of these and other equations shown
in this Project Summary can be found in the complete Project Report)
that an unbiased estimate of $\bar{x}$ is:

$$\bar{x} = \sum_i \left[ \frac{m_i}{n_i N} \sum_j x_{ij} \right]$$ (1)

and that the variance of this estimate is:

$$\mathrm{var}(\bar{x}) = [\ldots]$$ (2)

[The right-hand side of Equation 2 is not legible in this copy; see the
complete Project Report.]

When the rate of arrival of vehicles is not uniform throughout the day,
random samples will produce some intervals of little or no activity and
others of frenzied activity. Once scales are rented and labor hired to
make the observations, it makes little sense not to use the equipment
and labor to the fullest extent possible. If one samples to the fullest
extent of the sampling capability, not only can Equations 1 and 2 be
used to unbias the estimate, but the resulting estimate has a smaller
variance than random or systematic sampling.

Seasonality
It is well-known that the quantity of solid waste generated varies
significantly from month to month. Municipal solid waste generation,
for example, is typically low during the months of January, February,
November, and December, and peaks during June, July, and August,
although there are some variations depending upon geographical
location. Thus, it is not sufficient to sample 1 week out of the year
to estimate generation for the entire year. A Monte Carlo simulation
was performed using actual data obtained from a Boston solid waste
site. Two sampling protocols were simulated: (1) random sampling and
(2) systematic sampling. Sampling frequencies of from two to ten times
per year were investigated. Systematic sampling was found to be
superior to random sampling for all sampling frequencies except for a
sampling frequency of two, where random sampling is identical to
systematic sampling (see Table 1). At a sampling frequency of four
times per year, for example, the coefficient of variation (i.e., the
standard deviation as a fraction of the mean) of systematic sampling
was only 56% that of random sampling.

Table 1. Monte Carlo simulation - Seasonality

# Weeks Sampled    cb Random    cb Systematic
       2              10.0           9.9
       3               7.8           5.3
       4               7.3           4.1
       5               6.4           3.6
       6               5.9           3.2
       7               5.6           2.5
       8               5.2           3.0
       9               4.7           2.2
      10               4.6           2.4

Key: cb = between-week coefficient of variation
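
The comparison of random and systematic sampling of weeks summarized in Table 1 can be illustrated with a small simulation. The sketch below does not use the Boston data and is not expected to reproduce the tabulated values. The weekly tonnages are invented (a summer peak plus noise), and the systematic rule used here, equally spaced weeks with a random starting week, is an assumption, since this summary does not state how the systematic samples were spaced.

```python
import math
import random
import statistics


def synthetic_year(rng, weeks=52):
    """Invented weekly tonnages: winter trough, summer peak, plus random noise."""
    return [100 + 15 * math.sin(2 * math.pi * (w - 13) / weeks) + rng.gauss(0, 3)
            for w in range(weeks)]


def random_weeks(rng, k, weeks=52):
    """Simple random sample of k distinct weeks."""
    return rng.sample(range(weeks), k)


def systematic_weeks(rng, k, weeks=52):
    """Equally spaced weeks with a random start (assumed rule, see text)."""
    step = weeks // k
    start = rng.randrange(step)
    return [start + i * step for i in range(k)]


def cv_of_estimates(year, pick_weeks, k, rng, trials=5000):
    """Between-replicate coefficient of variation, in percent, of the weekly mean."""
    estimates = []
    for _ in range(trials):
        chosen = pick_weeks(rng, k)
        estimates.append(statistics.fmean(year[w] for w in chosen))
    return 100 * statistics.stdev(estimates) / statistics.fmean(estimates)


rng = random.Random(1992)
year = synthetic_year(rng)
for k in (2, 4, 6):
    print(k,
          round(cv_of_estimates(year, random_weeks, k, rng), 1),
          round(cv_of_estimates(year, systematic_weeks, k, rng), 1))
```

Because equally spaced weeks straddle the seasonal cycle, their average is less sensitive to which weeks happen to be chosen, which is the qualitative effect the table reports.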

Sample Weights for
Component Estimation
Traditional sampling theory assumes
that there are sampling elements, i.e., dis-
crete entities comprising the population
about which inferences are to be drawn.
When it comes to sampling solid waste for
composition, however, there are no such
discrete entities. There is, for example, no
such thing as a basic unit of paper or of
textiles. Thus, sampling procedures based
upon discrete distributions (such as the
multinomial or binomial) are not valid.
Nonetheless, some basic unit weight of
sample must be defined. In traditional
cluster sampling theory, a balance is
achieved between the within-cluster and
between-cluster components of the total
variability of an estimate. If the cluster
(i.e., in this context, a sample of given
weight) is too small, then the between-
cluster variability will be greater than the
within-cluster variability and will result in a
large sample variability. If the cluster
weight is too large, however, the time
and expense of sampling increase.
Since this is not a linear relationship (i.e.,
doubling the cluster size will not necessarily
double the precision of the estimate),
the optimal procedure is to select the cluster
size beyond which the precision of the
sampling estimate no longer improves
significantly. For raw
municipal solid waste, this sample weight
has been found to be between 200 and
300 lb. For processed waste streams, it
has been found that particle size distribu-
tions are adequately described by func-
tions based upon the exponential distribu-
tion, such as the Rosin-Rammler equation.
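Thus, for particles smaller than those found
in raw municipal solid waste, the following
equation is recommended:
$$Y = X e^{[\ldots]}$$ (3)
[The exponent of $e$ in Equation 3 is not legible in this copy; see the
complete Project Report.]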
where Y is the optimal sample weight in
pounds, and X is the characteristic par-
ticle size of the material to be sampled, in
inches (the characteristic particle size is
the screen size at which 63.2% of the
material passes through).
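
The 63.2% figure follows from the form of the Rosin-Rammler equation. In its usual statement, the fraction passing a screen of opening x is 1 - exp(-(x/X)^n), so at x = X the fraction passing is 1 - 1/e, or about 0.632, whatever the spread exponent n. The short sketch below checks this numerically; the function name and the trial values of the exponent are illustrative only.

```python
import math


def rosin_rammler_passing(size, characteristic_size, spread):
    """Fraction of material passing a screen with the given opening."""
    return 1.0 - math.exp(-((size / characteristic_size) ** spread))


# At the characteristic size the passing fraction does not depend on the spread.
for spread in (0.8, 1.0, 1.5):
    print(spread, round(rosin_rammler_passing(2.0, 2.0, spread), 3))
```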

Component Estimation
When it is desired to place confidence
intervals about the estimates made in the
sample estimation process, traditionally the
distribution of either the population or of
the population parameter estimated is as-
sumed to follow a specific classical prob-
ability distribution; typically, the normal
distribution is assumed. In the sample se-
lection process, similar assumptions are
made when determining the number of
samples to be taken. For example, as-
suming that at least the averages of the
components are normally distributed, then
the sample size is given by
$$n = \left(\frac{t s}{d}\right)^2$$ (4)
where d is the precision required (i.e., half
the width of the confidence interval desired),
t is the t-value at significance level $\alpha$,
and s is the
population standard deviation. Because the
t-value is not known until after n is deter-
mined, Equation 4 is applied in an itera-
tive, trial-and-error procedure.
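
The trial-and-error use of Equation 4 can be sketched as follows. This is one possible arrangement, not the routine used in PROTOCOL: it starts from the normal-distribution value of n and raises n one unit at a time until n is at least (ts/d)^2, with t taken at n - 1 degrees of freedom. The helper `t_value` is a hypothetical stand-in built from numerical integration so that the example needs only the standard library; the values of s and d are invented planning figures.

```python
import math
from statistics import NormalDist


def t_cdf(t, df, steps=2000):
    """Student-t CDF for t >= 0, by Simpson's rule applied to the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_value(alpha, df):
    """Two-sided critical value t with P(|T| > t) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_size(s, d, alpha=0.05):
    """Smallest n satisfying n >= (t*s/d)^2 with t at n - 1 degrees of freedom."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = max(2, math.ceil((z * s / d) ** 2))   # lower bound, because t exceeds z
    while n < (t_value(alpha, n - 1) * s / d) ** 2:
        n += 1
    return n


# Invented planning values: standard deviation 0.05, precision 0.02.
print(sample_size(s=0.05, d=0.02))
```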
Although such assumptions are justifi-
able in the estimation of solid waste quan-
tity, such is not the case in estimating
solid waste composition. For one thing,
component fractions are bounded, i.e.,
there are no components in solid waste
that are present in fractions less than zero
or greater than one. These boundaries
are generally located close to the means
of their distributions. Thus, solid waste
component distributions are, at the very
least, positively skewed (i.e., skewed to
the right) and, at worst, are J-shaped (see
Figure 1). Nor does reliance on the Cen-
tral Limit Theorem of statistics help much,
since even averages of component frac-
tions do not approach normality quickly,
at least not within an economically fea-
sible number of samples. Distributions of
component averages still tend to be posi-
tively skewed. This characteristic precludes
the rote application of the traditional sta-
tistical formulas for either the estimation
of sample size (such as Equation 4) or
the construction of confidence intervals.
Although transformations can be used to
construct the proper asymmetric confi-
dence intervals after the sample data is
obtained, these are of little help for esti-
mating sample size before the sample is
taken. A knowledge of the effect of posi-
tive skewness on the actual level of sig-
nificance of a confidence interval, how-
ever, can be of help in determining the
number of samples to take.
Since the rationale behind using the
Central Limit Theorem is to permit the use
of t-statistics to construct confidence inter-
vals about the estimation of the percent-
age of any component in the waste stream,
an appropriate measure of the ability to
meet the normality requirements is the
fraction of confidence intervals that actu-
ally contain the true mean at a given level
of significance. For example, Monte Carlo
simulations (see Table 2) have shown that,
for a J-shaped component such as textiles
and an average of size 10, if a confidence
interval at a nominal significance level of
$\alpha_n$ = .05 were constructed about the
mean, the actual significance level would be
$\alpha_a$ = .104. In other words, instead of
a 95% confidence interval, we would actually
be constructing an 89.6% confidence interval.
As the size of the average gets larger, the
discrepancy gets smaller. For example, at a
nominal significance level of $\alpha_n$ = .05
and an average of size 50, the actual
significance level is $\alpha_a$ = .058
(see Table 2). For moderately positively skewed
distributions, such as ferrous metals, these
discrepancies are very much smaller, even
for very small sizes of the average.
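
The gap between nominal and actual significance levels can be demonstrated with a short Monte Carlo sketch. It is not a reconstruction of the simulations behind Table 2: the exponential distribution is used here only as a convenient J-shaped stand-in, the mean fraction of 0.03 is invented, and the t-value is fixed for averages of size 10.

```python
import random

SAMPLE_SIZE = 10
T_CRIT_95_DF9 = 2.262   # two-sided 5% t-value for 9 degrees of freedom


def coverage(draw, true_mean, trials=20000, seed=0):
    """Share of nominal 95% t-intervals that actually contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(SAMPLE_SIZE)]
        mean = sum(sample) / SAMPLE_SIZE
        variance = sum((x - mean) ** 2 for x in sample) / (SAMPLE_SIZE - 1)
        half_width = T_CRIT_95_DF9 * (variance / SAMPLE_SIZE) ** 0.5
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / trials


def j_shaped(rng):
    """Stand-in J-shaped component fraction: exponential with mean 0.03."""
    return rng.expovariate(1 / 0.03)


cov = coverage(j_shaped, true_mean=0.03)
print(f"actual coverage {cov:.3f}, actual alpha {1 - cov:.3f}")
```

The interval is built exactly as the normal-theory formula prescribes, yet it covers the true mean noticeably less often than 95% of the time, which is the behavior the text describes for strongly skewed components.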
From the data in Table 2, equations
have been obtained that relate $\alpha_n$ and $\alpha_a$.
For J-shaped distributions, such as textiles,
the equation is,

[Equations 5 through 8 and the text between them are not legible in
this copy; see the complete Project Report.]

Selecting a t-value, $t$, at a desired significance
level, $\alpha$, and defining quantities $L$
and $U$ as:

$$L = \mu - t\sigma$$ (9)

$$U = \mu + t\sigma$$ (10)

the confidence interval around T is computed as:

$$\text{lower boundary} = e^{L}$$ (11)

$$\text{upper boundary} = e^{U}$$ (12)
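
Reading Equations 9 through 12 as L = mu - t*sigma, U = mu + t*sigma, lower boundary = e^L, and upper boundary = e^U, the final step can be sketched as below. The quantities mu and sigma come from Equations 7 and 8, which are not legible in this copy, so the sketch simply accepts them as inputs; the numerical values are invented.

```python
import math


def log_scale_interval(mu, sigma, t):
    """Exponentiate mu -/+ t*sigma to obtain asymmetric confidence limits."""
    lower = math.exp(mu - t * sigma)   # e^L, Equations 9 and 11
    upper = math.exp(mu + t * sigma)   # e^U, Equations 10 and 12
    return lower, upper


# Hypothetical inputs: log-scale location and spread, t for 9 degrees of freedom.
center = 0.03
lower, upper = log_scale_interval(mu=math.log(center), sigma=0.25, t=2.262)
print(f"lower {lower:.4f}, upper {upper:.4f}")
print(f"distance below {center - lower:.4f}, distance above {upper - center:.4f}")
```

The limits are symmetric on the logarithmic scale but not on the original scale: the upper limit lies farther from the center than the lower one, which suits positively skewed component fractions.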

Although the calculations dictated by
these protocols are arithmetically tedious,
a computer program (designed to be run
on personal computers with modest capa-
bilities) has been developed to carry out
these tasks. The program, called PROTO-
COL, also contains routines that check
input data for errors in coding, and an
editor for preparing and modifying input
data files.
[Figure 1. Distribution of textiles and ferrous metals in municipal solid waste. Panel label shown: Ferrous Metals; axis in percent.]
Table 2. Results of Simulation Studies for Two Municipal Solid Waste Components

                                                 Size of Average
                               1     5    10    15    20    25    30    35    40    45    50

J-Function (Textiles)
Coefficient of Skewness     1.45   .68   .46   .33   .32   .30   .27   .29   .28   .21   .20
Actual α, Nominal α = .01         .075  .051  .041  .033  .031  .024  .026  .017  .018  .019
Actual α, Nominal α = .02         .102  .065  .055  .046  .040  .035  .037  .028  .028  .031
Actual α, Nominal α = .03         .119  .079  .065  .056  .051  .045  .046  .036  .038  .042
Actual α, Nominal α = .04         .132  .089  .076  .065  .060  .055  .056  .044  .046  .049
Actual α, Nominal α = .05         .148  .104  .090  .076  .067  .065  .064  .065  .060  .058
Actual α, Nominal α = .10         .190  .138  .134  .128  .115  .114  .112  .104  .104  .099
Actual α, Nominal α = .20         .268  .227  .215  .214  .211  .207  .210  .199  .200  .205

Moderate Skew (Ferrous Metals)
Coefficient of Skewness      .66   .32   .20   .15   .13   .17   .16   .19   .16   .12   .05
Actual α, Nominal α = .01         .017  .017  .018  .018  .015  .013  .014  .009  .010  .012
Actual α, Nominal α = .02         .026  .033  .027  .026  .023  .019  .025  .025  .025  .024
Actual α, Nominal α = .03         .043  .039  .039  .040  .035  .033  .035  .035  .035  .036
Actual α, Nominal α = .04         .059  .053  .047  .043  .042  .048  .043  .041  .047  .042
Actual α, Nominal α = .05         .065  .063  .057  .055  .053  .047  .052  .059  .051  .051
Actual α, Nominal α = .10         .116  .115  .110  .098  .100  .104  .091  .097  .100  .110
Actual α, Nominal α = .20         .207  .204  .215  .204  .204  .214  .211  .197  .208  .203

The EPA author, Albert J. Klee, is with the Risk Reduction Engineering Laboratory,
  Cincinnati, OH 45268.
The complete report, entitled "PROTOCOL - A Computerized Solid Waste Quantity
  and Composition Estimation System,"  and the diskette (Order No. PB 91-201 669/
  AS; Cost: $17.00, subject to change)  will be available only from:
        National Technical Information Service
        5285 Port Royal Road
        Springfield, VA 22161
        Telephone: 703-487-4650
The EPA author can be contacted at:
        Risk Reduction Engineering Laboratory
        U.S. Environmental Protection Agency
        Cincinnati, OH 45268