United States
Environmental Protection Agency
Risk Reduction Engineering Laboratory
Cincinnati, OH 45268
Research and Development
EPA/600/S2-91/005 Feb. 1992
EPA Project Summary
PROTOCOL - A Computerized Solid Waste Quantity and Composition Estimation System
Albert J. Klee
Efficient and statistically sound sampling protocols for estimating the quantity and composition of solid waste over a stated period of time at a given location, such as a landfill site or a specific point in an industrial or commercial process, are essential to the design of resource recovery systems and waste minimization programs, and to the estimation of landfill life and of the pollution burden on the land posed by the generation of solid wastes. The theory developed in this study takes an approach significantly different from that of the more traditional sampling plans, resulting in lower costs and more accurate and precise estimates of these critical quantities. A computer program called PROTOCOL, designed to run on personal computers with modest capabilities, has also been developed to do the required calculations.
This Project Summary was developed by EPA's Risk Reduction Engineering Laboratory, Cincinnati, OH, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).
Introduction
Traditional sampling theory generally follows the paradigm SAMPLE SELECTION → SAMPLE OBSERVATION → SAMPLE ESTIMATION. Typically, the samples are chosen by an unbiased procedure, such as simple random sampling, or by systematic sampling, where it is assumed that the population is already in random order. In the sample observation (data recording) stage, it is further assumed that observation of the elements of the sample is an independent process, i.e., that no queue of sample elements builds up, waiting to be observed, while one sample element is being observed. Unfortunately, these assumptions do not fit the circumstances when the problem is to estimate the quantity of solid waste arriving at a given site or at a specific point in an industrial or commercial process. For one thing, the sample comes to the investigator, which is the reverse of the situation commonly described in standard sampling textbooks. Since the investigator has no control over the arrival of the sample elements, the sample observation process is often far from independent.
Biased Sampling for Greater Efficiency
Consider a situation where it is desired to weigh a random or systematic sample of 10% of the vehicles arriving at a landfill. Suppose that it is feasible to weigh up to 10 vehicles per hour and that the average interarrival time of vehicles is 3.2 min. On average, then, with either random or systematic sampling, one vehicle would be weighed every 32 min (one in every ten arrivals, each 3.2 min apart). If it takes an average of 10 min to weigh a vehicle, this is well within the capability of the sampling system. Unfortunately, vehicle arrivals are not spread uniformly over the day. There may be peak arrival periods when the number of trucks arriving and selected for sampling overwhelms the weighing capability, and one is forced to "default" on weighing some of the vehicles selected by the sampling plan. One result is that fewer vehicles are weighed than the sampling plan calls for, which reduces the precision of the estimate of solid waste quantity. More important, however, if the load weights of the defaulted samples differ appreciably from those of the nondefaulted samples, bias will be introduced. For example, at many landfills, vehicles arriving toward the end of the day tend to have smaller load weights than those arriving at other times. Since fewer vehicles arrive toward the end of the day, when the scales are rarely overloaded, the tendency would be to oversample these lightly loaded vehicles and to undersample the normally loaded vehicles arriving at peak hours, thus introducing a bias.
The bias can be removed, however, in the estimation formula. Defining
$i$ = index for hours
$j$ = index for vehicles
$m_i$ = number of vehicles arriving in the $i$th sampling hour
$N$ = total number of vehicles in the week
$n_i$ = number of vehicles sampled in the $i$th hour
$x_{ij}$ = $j$th vehicle-load weight in the $i$th hour
$\bar{x}$ = average vehicle-load weight for the week
then it can be shown (the derivation of these and other equations in this Project Summary can be found in the complete Project Report) that an unbiased estimate of $\bar{x}$ is:
$$\bar{x} = \sum_{i}\sum_{j} \frac{m_i}{n_i N}\, x_{ij} \qquad (1)$$
and that the variance of this estimate is:
$$\operatorname{var}(\bar{x}) = \ldots \qquad (2)$$
When the rate of arrival of vehicles is not uniform throughout the day, random samples will produce some intervals of little or no activity and others of frenzied activity. Once scales are rented and labor is hired to make the observations, it makes little sense not to use the equipment and labor to the fullest extent possible. If one samples to the fullest extent of the sampling capability, not only can Equations 1 and 2 be used to remove the bias from the estimate, but the resulting estimate also has a smaller variance than one obtained by random or systematic sampling.
Seasonality
It is well known that the quantity of solid waste generated varies significantly from month to month. Municipal solid waste generation, for example, is typically low during January, February, November, and December, and peaks during June, July, and August, although there are some variations depending upon geographical location. Thus, it is not sufficient to sample 1 week out of the year to estimate generation for the entire year. A Monte Carlo simulation was performed using actual data obtained from a Boston solid waste site. Two sampling protocols were simulated: (1) random sampling and (2) systematic sampling. Sampling frequencies from two to ten times per year were investigated. Systematic sampling was found to be superior to random sampling for all sampling frequencies except two, where the two protocols gave essentially the same result (see Table 1). At a sampling frequency of four times per year, for example, the coefficient of variation (i.e., the standard deviation as a fraction of the mean) of systematic sampling was only 56% that of random sampling.
Table 1. Monte Carlo Simulation: Seasonality
# Weeks Sampled    cb, Random    cb, Systematic
 2                 10.0           9.9
 3                  7.8           5.3
 4                  7.3           4.1
 5                  6.4           3.6
 6                  5.9           3.2
 7                  5.6           2.5
 8                  5.2           3.0
 9                  4.7           2.2
10                  4.6           2.4
Key: cb = between-week coefficient of variation
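Equation 1 in the Biased Sampling for Greater Efficiency section weights each hour's sample mean by that hour's share of the week's arrivals, $m_i/N$, so hours that happen to be oversampled do not pull the estimate toward their loads. The sketch below is a minimal Python illustration; the function names, the dictionary layout, and the example week are all hypothetical. Because Equation 2 is not legible in this copy, the variance function assumes the textbook stratified-sampling form, $\sum_i (m_i/N)^2 (1 - n_i/m_i)\, s_i^2/n_i$, which may differ from the report's.

```python
from statistics import variance

def _check(arrivals, samples):
    """Every hour with arrivals needs at least one weighed vehicle, and n_i <= m_i."""
    for hour in set(arrivals) | set(samples):
        m_i, n_i = arrivals.get(hour, 0), len(samples.get(hour, []))
        if m_i > 0 and n_i == 0:
            raise ValueError(f"hour {hour}: vehicles arrived but none were weighed")
        if n_i > m_i:
            raise ValueError(f"hour {hour}: more vehicles weighed than arrived")

def hourly_weighted_mean(arrivals, samples):
    """Equation 1: sum over hours i and vehicles j of m_i / (n_i * N) * x_ij."""
    _check(arrivals, samples)
    N = sum(arrivals.values())
    return sum(arrivals[h] / (len(xs) * N) * sum(xs)
               for h, xs in samples.items() if xs)

def hourly_weighted_variance(arrivals, samples):
    """ASSUMED form of Equation 2: stratified variance with finite population correction."""
    _check(arrivals, samples)
    N = sum(arrivals.values())
    total = 0.0
    for h, xs in samples.items():
        if not xs:
            continue
        m_i, n_i = arrivals[h], len(xs)
        # statistics.variance divides by n_i - 1 and needs at least 2 values per hour
        total += (m_i / N) ** 2 * (1 - n_i / m_i) * variance(xs) / n_i
    return total

# Hypothetical week with three operating hours; light late-day loads are oversampled
# (2 of 10 weighed at hour 16 versus 2 of 20 at hour 8 and 3 of 30 at hour 9).
arrivals = {8: 20, 9: 30, 16: 10}                                  # m_i
samples = {8: [5.0, 7.0], 9: [6.0, 8.0, 7.0], 16: [3.0, 2.0]}      # x_ij, tons
x_bar = hourly_weighted_mean(arrivals, samples)
naive = sum(map(sum, samples.values())) / sum(map(len, samples.values()))
print(round(x_bar, 3), round(naive, 3))   # 5.917 5.429
```

The unweighted mean of the weighed loads comes out lower than the hour-weighted estimate because the light end-of-day loads are overrepresented in the sample, which is exactly the bias the text describes.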
Sample Weights for Component Estimation
Traditional sampling theory assumes that there are sampling elements, i.e., discrete entities comprising the population about which inferences are to be drawn. When it comes to sampling solid waste for composition, however, there are no such discrete entities. There is, for example, no such thing as a basic unit of paper or of textiles. Thus, sampling procedures based upon discrete distributions (such as the multinomial or binomial) are not valid. Nonetheless, some basic unit weight of sample must be defined. In traditional cluster sampling theory, a balance is achieved between the within-cluster and between-cluster components of the total variability of an estimate. If the cluster (i.e., in this context, a sample of given weight) is too small, the between-cluster variability will be greater than the within-cluster variability, and the sample variability will be large. If the cluster weight is too large, however, the time and expense of sampling become excessive. Since the relationship is not linear (i.e., doubling the cluster size will not necessarily double the precision of the estimate), the optimal procedure is to select the cluster size beyond which the precision of the sampling estimate no longer improves significantly. For raw municipal solid waste, this sample weight has been found to be between 200 and 300 lb. For processed waste streams, it has been found that particle size distributions are adequately described by functions based upon the exponential distribution, such as the Rosin-Rammler equation. Thus, for particles smaller than those found in raw municipal solid waste, the following equation is recommended:
Y-Xe
(3)
where Y is the optimal sample weight, in pounds, and X is the characteristic particle size of the material to be sampled, in inches (the characteristic particle size is the screen size through which 63.2% of the material passes).
Component Estimation
When it is desired to place confidence intervals about the estimates made in the sample estimation process, the distribution of either the population or the estimated population parameter is traditionally assumed to follow a specific classical probability distribution, typically the normal distribution. Similar assumptions are made in the sample selection process when determining the number of samples to be taken. For example, assuming that at least the averages of the components are normally distributed, the sample size is given by
$$n = \left(\frac{t\,s}{d}\right)^{2} \qquad (4)$$
where $d$ is the precision required (i.e., half the width of the desired confidence interval), $t$ is the t-value at significance level $\alpha$, and $s$ is the population standard deviation. Because the t-value depends on the degrees of freedom, $n - 1$, it is not known until $n$ has been determined, so Equation 4 is applied in an iterative, trial-and-error procedure.
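The trial-and-error procedure for Equation 4 can be sketched as a fixed-point iteration: start from a near-normal t-value, compute n, recompute t with $n - 1$ degrees of freedom, and repeat until n stops changing. This is a minimal sketch, assuming a two-sided interval (t taken at $1 - \alpha/2$) and a standard deviation s already known, for instance from an earlier survey; the function names are hypothetical. The t quantile is computed by Simpson's-rule integration and bisection so that only the standard library is needed; `scipy.stats.t.ppf` would serve equally well. As the next paragraphs caution, this normal-theory formula suits waste quantity better than composition.

```python
import math

def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t with df degrees of freedom, x >= 0 (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = x / steps                        # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + area * h / 3            # the density is symmetric about 0

def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size(s, d, alpha=0.05, max_iter=50):
    """Iterate Equation 4, n = (t*s/d)**2, with t at 1 - alpha/2 and df = n - 1."""
    p = 1 - alpha / 2
    n = max(2, math.ceil((t_quantile(p, 10_000) * s / d) ** 2))   # near-normal start
    seen = {n}
    for _ in range(max_iter):
        n_new = max(2, math.ceil((t_quantile(p, n - 1) * s / d) ** 2))
        if n_new == n:
            return n
        if n_new in seen:                # oscillating between values: keep the larger
            return max(n, n_new)
        seen.add(n_new)
        n = n_new
    return n

print(sample_size(s=10.0, d=5.0))        # 18
```

Here the normal start gives 16, the t-correction pushes n to 19, and the iteration settles at 18.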
Although such assumptions are justifiable in the estimation of solid waste quantity, such is not the case in estimating solid waste composition. For one thing, component fractions are bounded, i.e., no component of solid waste can be present in a fraction less than zero or greater than one. These boundaries, particularly the lower one, are generally located close to the means of the component distributions. Thus, solid waste component distributions are, at the very least, positively skewed (i.e., skewed to the right) and, at worst, J-shaped (see Figure 1). Nor does reliance on the Central Limit Theorem of statistics help much, since even averages of component fractions do not approach normality quickly, at least not within an economically feasible number of samples; distributions of component averages still tend to be positively skewed. This characteristic precludes the rote application of the traditional statistical formulas for either the estimation of sample size (such as Equation 4) or the construction of confidence intervals. Although transformations can be used to construct the proper asymmetric confidence intervals after the sample data are obtained, they are of little help in estimating sample size before the sample is taken. Knowing the effect of positive skewness on the actual significance level of a confidence interval, however, can help in determining the number of samples to take.
Since the rationale behind invoking the Central Limit Theorem is to permit the use of t-statistics to construct confidence intervals about the estimated percentage of any component in the waste stream, an appropriate measure of how well the normality requirement is met is the fraction of confidence intervals that actually contain the true mean at a given significance level. For example, Monte Carlo simulations (see Table 2) have shown that, for an average of size 10 (i.e., a mean of 10 samples) of a J-shaped component such as textiles, a confidence interval constructed about the mean at a nominal significance level of $\alpha_n = .05$ would have an actual significance level of $\alpha_a = .104$. In other words, instead of a 95% confidence interval, one would actually be constructing an 89.6% confidence interval. As the size of the average increases, the discrepancy decreases. For example, at a nominal significance level of $\alpha_n = .05$ and an average of size 50, the actual significance level is $\alpha_a = .058$. For moderately positively skewed distributions, such as ferrous metals, these discrepancies are much smaller, even for very small sizes of the average.
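The coverage measure described above can be estimated by Monte Carlo in a few lines: draw many samples of size n, build the usual t-interval for each, and count how often it contains the true mean. The sketch below is not the report's simulation, which used textile and ferrous-metal data; as an assumption, it uses a lognormal distribution as a stand-in for a positively skewed component and a normal distribution as a symmetric reference. The t-value 2.262 is the standard two-sided 95% value for 9 degrees of freedom; the function name is hypothetical.

```python
import math
import random

# Two-sided 95% t-value for n = 10 (df = 9), from standard t tables.
T_975_DF9 = 2.262

def coverage(draw, true_mean, n=10, t=T_975_DF9, trials=20_000, seed=1):
    """Fraction of nominal 95% t-intervals (mean +/- t*s/sqrt(n)) that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [draw(rng) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        half_width = t * s / math.sqrt(n)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / trials

# Symmetric reference population: coverage should land close to the nominal 0.95.
normal_cov = coverage(lambda r: r.gauss(0.0, 1.0), 0.0)
# Positively skewed stand-in (ASSUMPTION: lognormal with sigma = 1, mean exp(0.5)).
skewed_cov = coverage(lambda r: r.lognormvariate(0.0, 1.0), math.exp(0.5))
print(f"normal: {normal_cov:.3f}  lognormal: {skewed_cov:.3f}")
```

The actual significance level is one minus the printed coverage; for the skewed population it exceeds the nominal .05, which is the same effect Table 2 quantifies for textiles.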
From the data in Table 2, equations have been obtained that relate $\alpha_n$ and $\alpha_a$. For J-shaped distributions, such as textiles, the equation is:
+l1
L p2 J
(7)
(8)
Selecting a t-value, $t$, at a desired significance level, $\alpha$, and defining the quantities $L$ and $U$ as:
$$L = \mu - t\,\sigma \qquad (9)$$
$$U = \mu + t\,\sigma \qquad (10)$$
the confidence interval around the mean is computed as:
$$\text{lower boundary} = e^{L} \qquad (11)$$
$$\text{upper boundary} = e^{U} \qquad (12)$$
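Equations 9 through 12 build the interval on a logarithmic scale and transform it back, which is what makes it asymmetric about the center. A minimal sketch follows; the function name is hypothetical, and `mu` and `sigma` are assumed to be the log-scale location and spread produced by Equations 7 and 8, which are not legible in this copy. The inputs in the usage line are illustrative only.

```python
import math

def log_scale_interval(mu, sigma, t):
    """Equations 9-12: L = mu - t*sigma, U = mu + t*sigma, interval = (e**L, e**U).

    mu and sigma are ASSUMED to come from Equations 7 and 8 (not reproduced here).
    """
    lower = math.exp(mu - t * sigma)   # Equation 11
    upper = math.exp(mu + t * sigma)   # Equation 12
    return lower, upper

# Illustrative inputs: a component centered at 5% on the back-transformed scale.
lo, hi = log_scale_interval(math.log(5.0), 0.3, 2.0)
print(f"{lo:.2f} to {hi:.2f}")         # 2.74 to 9.11
```

The interval reaches about 4.1 percentage points above 5% but only about 2.3 below it, the kind of right-leaning interval that suits a positively skewed component.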
Although the calculations dictated by these protocols are arithmetically tedious, a computer program (designed to run on personal computers with modest capabilities) has been developed to carry them out. The program, called PROTOCOL, also contains routines that check input data for coding errors and an editor for preparing and modifying input data files.
Figure 1. Distribution of textiles and ferrous metals in municipal solid waste.
U.S. GOVERNMENT PRINTING OFFICE: 1992 - 648-080/40150
Table 2. Results of Simulation Studies for Two Municipal Solid Waste Components
Entries are the actual significance levels obtained at each nominal significance level, by size of average.
J-Function (Textiles)
Size of Average       5     10    15    20    25    30    35    40    45    50
Nominal α = .01     .075  .051  .041  .033  .031  .024  .026  .017  .018  .019
Nominal α = .02     .102  .065  .055  .046  .040  .035  .037  .028  .028  .031
Nominal α = .03     .119  .079  .065  .056  .051  .045  .046  .036  .038  .042
Nominal α = .04     .132  .089  .076  .065  .060  .055  .056  .044  .046  .049
Nominal α = .05     .148  .104  .090  .076  .067  .065  .064  .065  .060  .058
Nominal α = .10     .190  .138  .134  .128  .115  .114  .112  .104  .104  .099
Nominal α = .20     .268  .227  .215  .214  .211  .207  .210  .199  .200  .205
Moderate Skew (Ferrous Metals)
Size of Average       5     10    15    20    25    30    35    40    45    50
Nominal α = .01     .017  .017  .018  .018  .015  .013  .014  .009  .010  .012
Nominal α = .02     .026  .033  .027  .026  .023  .019  .025  .025  .025  .024
Nominal α = .03     .043  .039  .039  .040  .035  .033  .035  .035  .035  .036
Nominal α = .04     .059  .053  .047  .043  .042  .048  .043  .041  .047  .042
Nominal α = .05     .065  .063  .057  .055  .053  .047  .052  .059  .051  .051
Nominal α = .10     .116  .115  .110  .098  .100  .104  .091  .097  .100  .110
Nominal α = .20     .207  .204  .215  .204  .204  .214  .211  .197  .208  .203
Coefficient of skewness, as printed (column alignment not recoverable from this copy): 1.45, .68, .66, .32, .46, .20, .33, .32, .30, .27, .29, .15, .13, .17, .16, .19, .28, .16, .21, .12, .20, .05
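One hypothetical way to put Table 2 to work when planning a composition study is a simple lookup: for a planned size of average, choose the largest tabulated nominal α whose simulated actual α still meets the target, or, for a fixed nominal α, find the smallest tabulated size of average that meets it. The sketch below transcribes the textile (J-shaped) rows of Table 2; the function names and the lookup rule are illustrative assumptions, not PROTOCOL's procedure.

```python
# Nominal significance levels (rows of Table 2).
NOMINAL = (0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.20)

# Actual significance levels for textiles, keyed by size of average (from Table 2).
TEXTILES_ACTUAL = {
    5:  (.075, .102, .119, .132, .148, .190, .268),
    10: (.051, .065, .079, .089, .104, .138, .227),
    15: (.041, .055, .065, .076, .090, .134, .215),
    20: (.033, .046, .056, .065, .076, .128, .214),
    25: (.031, .040, .051, .060, .067, .115, .211),
    30: (.024, .035, .045, .055, .065, .114, .207),
    35: (.026, .037, .046, .056, .064, .112, .210),
    40: (.017, .028, .036, .044, .065, .104, .199),
    45: (.018, .028, .038, .046, .060, .104, .200),
    50: (.019, .031, .042, .049, .058, .099, .205),
}

def nominal_alpha_for(size, target, table=TEXTILES_ACTUAL):
    """Largest tabulated nominal alpha whose actual alpha does not exceed target, else None."""
    ok = [a_n for a_n, a_a in zip(NOMINAL, table[size]) if a_a <= target]
    return max(ok) if ok else None

def min_size_for(nominal, target, table=TEXTILES_ACTUAL):
    """Smallest tabulated size of average whose actual alpha at `nominal` is <= target."""
    k = NOMINAL.index(nominal)
    for size in sorted(table):
        if table[size][k] <= target:
            return size
    return None

print(nominal_alpha_for(20, 0.05))     # 0.02
print(min_size_for(0.05, 0.07))        # 25
```

For example, to obtain a true 95% interval for textiles from an average of size 20, the table suggests computing the interval at a nominal α of .02. Values between the tabulated sizes would need the fitted equations relating $\alpha_n$ and $\alpha_a$ rather than this lookup.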
The EPA author, Albert J. Klee, is with the Risk Reduction Engineering Laboratory, Cincinnati, OH 45268.
The complete report, entitled "PROTOCOL - A Computerized Solid Waste Quantity and Composition Estimation System," and the diskette (Order No. PB 91-201 669/AS; Cost: $17.00, subject to change) will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650
The EPA author can be contacted at:
Risk Reduction Engineering Laboratory
U.S. Environmental Protection Agency
Cincinnati, OH 45268