PROTOCOL
A COMPUTERIZED SOLID WASTE QUANTITY
AND COMPOSITION ESTIMATION SYSTEM
by
Albert J. Klee
Risk Reduction Engineering Laboratory
U.S. Environmental Protection Agency
Cincinnati, Ohio 45268
RISK REDUCTION ENGINEERING LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
CINCINNATI, OHIO 45268
DISCLAIMER
This report has been reviewed by the U.S. Environmental
Protection Agency and approved for publication. Approval does
not signify that the contents necessarily reflect the views and
policies of the U.S. Environmental Protection Agency, nor does
mention of trade names or commercial products constitute endorse-
ment or recommendation for use.
FOREWORD
Today's rapidly developing and changing technologies and
industrial products and practices frequently carry with them the
increased generation of materials that, if improperly dealt with,
can threaten both public health and the environment. The U.S.
Environmental Protection Agency is charged by Congress with
protecting the Nation's land, air, and water resources. Under a
mandate of national environmental laws, the agency strives to
formulate and implement actions leading to a compatible balance
between human activities and the ability of natural systems to
support and nurture life. These laws direct the EPA to perform
research to define our environmental problems, measure the im-
pacts, and search for solutions.
The Risk Reduction Engineering Laboratory is responsible for
planning, implementing, and managing research, development, and
demonstration programs to provide an authoritative, defensible
engineering basis in support of the policies, programs, and
regulations of the EPA with respect to drinking water, waste-
water, pesticides, toxic substances, solid and hazardous wastes,
and Superfund-related activities. This publication is one of the
products of that research and provides a vital communication link
between the researcher and the user community.
This report describes a system of sampling protocols for
estimating the quantity and composition of solid waste in a given
location, such as a landfill site, or at a specific point in an
industrial or commercial process, over a stated period of time.
An adequate estimation of these elements is essential to the
design of resource recovery systems and waste minimization pro-
grams, and to the estimation of the life of landfills and the
pollution burden on the land posed by the generation of solid
wastes. The theory developed in this report takes a significant-
ly different approach from the more traditional sampling plans,
resulting in lower cost and more accurate and precise estimates
of these critical quantities. Although the calculations dictated by
these protocols are tedious, a computer program, called PROTOCOL,
has also been developed to do these calculations, thus relieving
a great burden from the analyst. The program is designed to be
run on personal computers with modest capabilities.
For further information, please contact the Waste Minimiza-
tion, Destruction and Disposal Research Division of the Risk
Reduction Engineering Laboratory.
E. Timothy Oppelt, Director
Risk Reduction Engineering Laboratory
ABSTRACT
The assumptions of traditional sampling theory often do not
fit the circumstances when estimating the quantity and composi-
tion of solid waste arriving at a given location, such as a
landfill site, or at a specific point in an industrial or commer-
cial process. The investigator often has little leeway in the
sampling observation process. Traditional unbiased random sam-
pling will produce some intervals of little or no activity and
others of frenzied activity, clearly an inefficient and error-
prone procedure. In addition, there are no discrete entities of
solid waste composition, such as a basic unit of paper or of
textiles, comprising the population about which inferences are to
be drawn. Finally, with respect to solid waste composition, the
traditional assumptions of normality are not valid, thus preclud-
ing the rote application of the standard statistical formulas for
the estimation of sample sizes or the construction of confidence
intervals. This study describes the development of sampling
protocols for estimating the quantity and composition of solid
waste that deal with these problems. Since the methods developed,
although not mathematically complex, are arithmetically tedious,
a computer program (designed to be run on personal computers with
modest capabilities) was written to carry out the calculations
involved.
PREFACE
Traditional sampling theory generally follows this paradigm:
SAMPLE SELECTION - SAMPLE OBSERVATION - SAMPLE ESTIMATION.
Typically, the sample selection process is one in which the
samples are chosen by an unbiased procedure, such as simple
random sampling or systematic sampling where it is assumed that
the population is already in random order. Traditional sampling
theory assumes that there are sampling elements, i.e., discrete
entities comprising the population about which inferences are to
be drawn. In the sample observation (i.e., data recording)
stage, it is further assumed that observation of the elements of
the sample is an independent process, i.e., that there is no
queue of sample elements building up, waiting to be observed
while an observation on one sample element is being made. Final-
ly, when it is desired to place confidence intervals about the
estimates made in the sample estimation process, the distribution
of either the population or of the population parameter estimated
is assumed to follow a specific classical probability distribu-
tion; typically, the normal distribution is assumed. In the
sample selection process, similar assumptions are made when
determining the number of samples to be taken.
Unfortunately, these assumptions do not fit the circum-
stances when the problem is to estimate the quantity and composi-
tion of solid waste arriving at a given site, such as a landfill,
transfer station, incinerator, or a specific point in an indus-
trial or commercial process. For one thing, the sample comes to
the investigator, which is the reverse of the situation commonly
described in standard sampling textbooks. Since the investigator
has no control over the arrival of the sample elements, the
sample observation process often is far from independent. Consid-
er a situation where it is desired to weigh a random sample of
vehicles arriving at a landfill. Suppose that it is feasible to
weigh up to 10 vehicles per hour and that the average interar-
rival time of vehicles is 3.2 minutes. On the average then, with
either random or systematic sampling, one vehicle would be
weighed every 32 minutes. If it takes an average of 10 minutes to
weigh a vehicle, then this is well within the capability of the
sampling system. Unfortunately, vehicles do not have uniformly
distributed arrival times. There may be peak arrival periods when
the number of trucks arriving and to be sampled overwhelms the
weighing capability and one is forced to "default" on weighing
some of the vehicles selected by the sampling plan. One result
is that fewer vehicles are weighed than the sampling plan calls
-------
for, thus reducing the precision of the estimate of solid waste
quantity. More important, however, is that if the load weights of
the defaulted samples differ appreciably from the nondefaulted
samples, bias will be introduced. For example, at many landfills
vehicles arriving toward the end of the day tend to have smaller
load weights than those arriving at other times. Since fewer
vehicles arrive towards the end of the day, the tendency would be
to oversample these lightly loaded vehicles and to undersample
the normally loaded vehicles arriving at peak hours, thus intro-
ducing a bias. Making full use of the weighing capability, howev-
er, cannot be accomplished with unbiased samples; it can be
accomplished with biased samples, using the estimation process
to unbias the estimate. (This technique, by the way, is not
unknown in traditional sampling theory; it is used in making
estimates in stratified sampling.)
Traditional sampling theory assumes that there are sampling
elements, i.e., discrete entities comprising the population about
which inferences are to be drawn. When it comes to sampling solid
waste for composition, however, there are no such discrete enti-
ties. There is, for example, no such thing as a basic unit of
paper or of textiles. Thus, sampling procedures based upon dis-
crete distributions (such as the multinomial or binomial) are not
valid. Nonetheless, some basic unit weight of sample must be
defined. In traditional cluster sampling theory, a balance is
achieved between the within-cluster and between-cluster compo-
nents of the total variability of an estimate. If the cluster
(i.e., in this context, a sample of given weight) is too small,
then the between-cluster variability will be greater than the
within-cluster variability and will result in a large sample
variability. If the cluster weight is too large, however, the
time and expense of sampling become greater. Further compli-
cating this situation, the optimal sample weight is related to
the size of the particles in the sample.
Finally, although assumptions that solid waste quantities
follow normal distributions are justifiable in the estimation of
solid waste quantity, such is not the case in estimating solid
waste composition. For one thing, composition fractions are
bounded, i.e., there are no components in solid waste that are
present in fractions less than zero or greater than one. These
boundaries are generally located close to the means of their
distributions. Thus, solid waste component distributions are, at
the very least, positively-skewed (i.e., skewed to the right)
and, at worst, are J-shaped. Nor does reliance on the Central
Limit Theorem of statistics help much, since even averages of
component fractions do not approach normality quickly, at least
not within an economically feasible number of samples. Distribu-
tions of component averages still tend to be positively-skewed.
This characteristic precludes the rote application of the tradi-
tional statistical formulas for either the estimation of sample
size or the construction of confidence intervals. Although trans-
formations can be used to construct asymmetric confidence inter-
vals, these are of little help when estimating sample size. A
knowledge of the effect of positive skewness on the actual level
of significance of a confidence interval, however, can be of help
in determining the number of samples to take.
The purpose of this study was to develop sampling protocols
for estimating solid waste quantity and composition to solve the
problems enumerated above. This included both sampling and esti-
mating procedures. Since these methods, although not mathemati-
cally complex, are arithmetically tedious, a computer program
(designed to be run on personal computers with modest capabili-
ties) was developed to carry out all of the calculations required
by the protocols. The program, called PROTOCOL, also contains
routines that check input data for errors in coding, and an
editor for preparing and modifying input data files.
CHAPTER 1
QUANTITY ESTIMATION
1.1 INTRODUCTION
This study is concerned with innovative quantitative ap-
proaches to sampling solid waste streams for quantity and compo-
sition, and with methods for estimating these values for a given
waste shed over a period of time. In addition, computer-based
programs have been created for implementing the statistical
models selected for the determination of sample sizes for quanti-
ty and composition, and for producing the estimates (along with
measures of their uncertainties) of quantity and composition once
the data are collected.
There are two basic paths to the estimation of solid waste
quantity and composition: (1) direct measurement, and (2) predic-
tive models. Within both approaches there are many variations on
a theme. In direct measurement, for example, one can measure at
the point of generation (at each house, commercial establishment,
etc.) or at the point of destination (at a landfill, incinerator,
community recycling center, etc.). The former, however, is much
more costly and time consuming, and the sampling protocol or plan
is more complex to design and implement than in point of destina-
tion methods. Therefore, point of generation methods are neither
economically feasible nor of sufficient accuracy for waste shed
predictions.
Predictive models rely on surrogate or ancillary measure-
ments to estimate waste quantity or composition. On the one hand
there are the Leontief-type input-output models in which the
materials entering a waste shed are placed on one dimension of a
matrix, with their waste products placed on the other dimension.
Using suitable conversion rates for each cell in the matrix and
applying certain mathematical operations (such as matrix inver-
sion), it is possible to estimate the quantities of the wastes
produced. Although of some value on a global or strategic level,
such approaches are infeasible for local levels because (1) the
data bases simply are not available, and (2) the degree of accu-
racy achieved is inadequate for local objectives. Other predic-
tive models are those in the form of equations (usually of the
regression type) that relate quantity or composition to selected
independent or predictor variables. (The simplest example equates
quantity to the product of waste generated per capita and total
population.) However, many variables affect waste generation:
geography, climate, income level, local ordinances, etc. To
obtain models with any statistical validity, much data would have
to be gathered all over the country to estimate the parameters
(constants) of the model. Also, the significance of the individu-
al predictor variables and the values of the parameters would be
expected to change with time. Therefore, such models would have
to be maintained, a costly procedure that has little chance for
support, even by government agencies. It appears, therefore, that
the most pragmatic approach to the estimation of solid waste
quantity and composition is direct measurement at the point of
destination.
Upon examination, ordinary sampling techniques are inappro-
priate for the estimation of the quantity or composition of waste
arriving at a destination site. For one thing, the sample comes
to the investigator, which is usually the reverse of the situa-
tion commonly described in standard statistical sampling texts.
Once a scale is rented and the labor hired to make the measure-
ments, it makes good economic sense to use the equipment and
labor to the fullest extent possible. This is totally unlike
survey sampling where the variable cost of an interview is usual-
ly greater than the fixed cost (before sampling). In statistical
sampling, the elements of a population are defined jointly with
the population itself, i.e., the elements are the basic units
that comprise and define the population. In the solid waste
composition case, however, a truckload is simply too large to
serve as a single sampling unit. Furthermore, in sampling for
composition the situation resembles (but is not identical to)
multinomial sampling for which there is not much discussion in
the statistical texts. Solid waste quantity and composition
exhibit strong seasonal effects that must be taken into consider-
ation in the sampling protocols, another topic not commonly
encountered in the textbooks. Furthermore, traditional survey
sampling methodology usually attempts to select unbiased samples.
It can be shown, however, that it is more efficient to select
biased samples and then correct for the bias in the estimation
formulas employed. For all of these reasons, it is appropriate to
take a fresh look at the problems of estimating solid waste
quantity and composition.
1.2 SOME BASIC STATISTICAL CONSIDERATIONS
In statistical sampling, the elements of a population are
defined jointly with the population itself, i.e., the elements
are the basic units that comprise and define the population. When
sampling for solid waste quantity, the population elements usual-
ly can be taken as the individual vehicle-loads. ("Vehicle-load"
is used here to stress the fact that, for example, one might
observe 500 loads delivered in one day by only 100 different
vehicles. For simplicity, however, vehicle and vehicle-load will
be used interchangeably in this study.) A "parameter" is a numer-
ical quantity that, perhaps with other parameters, defines the
population completely. Suppose that the weight of solid waste in
a vehicle is distributed normally with mean μ and standard devia-
tion σ; μ and σ are the parameters of the distribution and, taken
together, completely define the distribution.
Precision and accuracy are two terms that are relevant to
the parameters of a population; more specifically, they are
applied to estimates of the parameters. A single estimate is
accurate if it is close to the true value, and a collection of
estimates is accurate if their average value is close to the true
value. The difference between the estimate (or average estimate)
and the true value is known as the "bias". Precision, on the
other hand, refers to the closeness of multiple estimates of a
parameter. If the estimates do not vary much among themselves,
the method of estimation is said to be "precise". The concept of
precision is inversely related to that of variance or standard
deviation in that the greater the precision, the smaller the
variance or standard deviation of the estimate. These concepts
are illustrated by the distributions shown in Figure 1. Each
individual drawing represents the distribution of repeated esti-
mates of some true value, T, and it is assumed that the estimate
of T is taken as the mean of the distributions shown. It will be
noted that the accurate distributions (A and C) are centered on
the true value, T, i.e., the bias is zero. The precise distribu-
tions (A and B) are those with less dispersion, i.e., smaller
standard deviation. Obviously, estimates that are both unbiased
and of high precision are preferred.
Accuracy is affected mainly by the selection process, i.e.,
the way by which the sampling units are selected; precision, on
the other hand, is affected mainly by the measurement process and
the sample size. Since the most difficult aspect of any protocol
for estimating solid waste quantity involves the selection proc-
ess, the major problem perforce is one of accuracy.
1.3 THE SAMPLE SELECTION PROCESS
It should be clear that if a destination site has or can be
fitted with scale facilities that permit all vehicles to be
weighed, there is no sample selection (or estimation) problem. It
is assumed here that this is not the case. For the moment, the
problems of trend and cyclical or seasonal variation of the
quantity of solid waste delivered to the site will be deferred,
and it will be assumed that sampling is to take place over a
period of one week. (If the site only operates x days a week,
then a one-week sample involves sampling on each of the x days).
When sampling over this period, variations in quantity due to
hour-to-hour and day-to-day differences are accounted for in the
estimate. Week to week differences, however, are far less impor-
tant than month to month differences. Therefore, it makes little
statistical sense to sample for periods of two or more consecu-
tive weeks. In order to achieve maximum sampling efficiency in a
statistical sense, the one-week sampling periods must be spread
throughout the year. This will be discussed in the next section.

FIGURE 1: CONCEPTS OF ACCURACY AND PRECISION
[Figure: four distributions of repeated estimates about a true
value T, with standard deviation and bias indicated: (A) accurate
and precise, (B) inaccurate but precise, (C) accurate but
imprecise, (D) inaccurate and imprecise.]
The problem of selection bias can be addressed in either of
two ways:
1. Make no assumptions about the nature of the arrivals of
the vehicle, and take a random sample, or
2. Assume that the vehicles arrive in random order and take
a systematic sample, i.e., one in which every kth vehicle
arriving at the site is weighed.
The second approach is attractive for two reasons: (1) the proce-
dure by which the trucks are selected for the sample is relative-
ly simple, and (2) if we are interested in separate estimates
for different types of vehicle (different sizes of trucks, com-
mercial versus residential, etc.), the systematic sample can
easily yield a proportionate sample, which has statistical advan-
tages that will be explained later.
The basic problem with systematic sampling has to do with
any departure from the assumed randomness in the arrival of the
vehicles. These departures are of two kinds:
1. A monotonic trend may exist in the weights of the
loads, e.g., the loads may increase with time over
the week. Since a systematic sample consists of a
random start followed by sampling each kth truck
afterwards, the estimate will depend on the random
start within the first interval. In Figure 2A, the
low random start (solid dots) will produce a lower
estimate than the high random start (open circles).
The estimates in these two cases will be biased
either low (solid dots) or high (open circles).
2. A cyclical or periodic trend may exist in the
weights of the loads. In Figure 2B, if the random
start happens to fall at the top of a cycle (solid
dots), the estimate will be high; if it falls at
the bottom (open circles), it will be low. Again,
in either case the estimates will be biased.
It does not appear, however, that either of these events poses a
real problem. The interval between vehicles is so short that
neither monotonic nor periodic effects would influence the esti-
mate significantly. Furthermore, simply changing the random start
each day would average out the effect of any single start.
As was mentioned, systematic sampling consists of sampling
every kth vehicle after a random start. The random start, r, is
the rth vehicle in the arriving vehicle sequence where r is a
number, chosen at random, between 1 and k. The succeeding vehi-
cles to be sampled are k+r, 2k+r, 3k+r, etc. The random start
from 1 to k imparts to each vehicle the selection probability
1/k = f, where f is known as the sampling frequency. If we know the
total number of vehicles, N, arriving during the sampling period,
the total sample size, n, is given as n = fN = N/k. (Note that n will
be an integer only if N is an integral multiple of k.)
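By way of illustration, this selection rule can be sketched as
follows (in Python; the function name and interface are illustra-
tive only and are not taken from the PROTOCOL program):

    import random

    def systematic_positions(N, k):
        # A systematic sample: random start r between 1 and k,
        # then every kth arriving vehicle (r, k+r, 2k+r, ...).
        r = random.randint(1, k)
        return list(range(r, N + 1, k))

    # Example: N = 750 arrivals with k = 10 yields n = N/k = 75.
    print(len(systematic_positions(750, 10)))   # -> 75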
FIGURE 2A: MONOTONIC TREND
[Figure: weight of vehicle-load rising steadily with time. Solid
dots mark samples from a low random start, open circles from a
high random start, each repeated at interval k.]

FIGURE 2B: CYCLICAL TREND
[Figure: weight of vehicle-load varying cyclically with time.
Solid dots mark a random start falling at the tops of the cycles,
open circles one falling at the bottoms.]
Although the concept of systematic sampling is relatively
simple, a problem arises when sampling is weighing scale-limited.
For example, suppose it is known that approximately 750 vehicles
arrive at a site over a five-day period, and a sample size of 75
is desired. Since k=N/n, the sampling interval, k, is every 10
vehicles (after a random start between 1 and 10). Suppose further
that it is practicable to weigh up to 10 vehicles per hour. The
average interarrival time of vehicles is (5 days)(8 hr/day)(60
min/hr)/750=3.2 minutes. On the average, then, one vehicle would
be weighed every 32 minutes, apparently well within the capabili-
ty of the weighing system. Unfortunately, vehicles do not have
uniformly distributed arrival times. There may be peak arrival
periods when the number of trucks arriving and to be sampled
overwhelms the weighing capability. One is forced to "default" on
weighing some of the vehicles selected by the sampling plan. This
has two effects. One result is that fewer vehicles are weighed
than the sampling plan called for, thus reducing the precision of
the estimate of solid waste quantity. Second, if the load weights
of the defaulted samples differ appreciably from the nondefaulted
samples, bias will be introduced. For example, at many landfills
vehicles arriving toward the end of the day tend to have smaller
load weights than those arriving at other times (often this is
simply a policy not to have vehicles stand overnight with refuse
in them). Since fewer vehicles arrive towards the end of the day,
the tendency would be to oversample the lightly loaded vehicles
and to undersample the normally loaded vehicles. Thus, a bias is
introduced.
The selection of a random sample is more complicated than
that of a systematic sample. Assuming a selection probability
equal to that of a systematic sample, each vehicle is sampled
with probability f = 1/k. If we desire a 10% sample (i.e., f = 0.10),
then we must consider each vehicle entering the site and, using a
probability generator of some sort, decide whether to sample it.
(For example, use a table of random numbers from 1-1000; if the
random number fell below 101 sample the vehicle; otherwise let it
pass.) Not only is this more complicated than systematic sam-
pling, it may result in a greater number of defaults since random
samples "bunch up" more than systematic samples. Thus, the sys-
tematic sample has a number of advantages over random sampling.
is the method of choice in this study.
1.4 ESTIMATION OF MEANS, TOTALS, AND VARIANCES
For a simple random sample, or for a systematic sample
where it can be assumed that the population contains neither
significant trend nor significant cyclical components, the mean,
\bar{x}, is estimated by Equation 1.1,

    \bar{x} = \sum_i w_i X_i = (1/n) \sum_i X_i      [1.1]

where X_i is the ith observation and the weight, w_i, is equal to
1/n, where n is the total number of observations. The total, X, is
estimated by Equation 1.2,

    X = N \bar{x}      [1.2]

where N is the total number of elements in the population for the
time sampled. The variance of the individual observations is
computed from Equation 1.3,

    var(X_i) = [ \sum_i X_i^2 - (\sum_i X_i)^2 / n ] / (n - 1)      [1.3]

and the variance of the mean and of the total are, respectively,

    var(\bar{x}) = (1 - f) var(X_i) / n      [1.4]

    var(X) = N^2 var(\bar{x})      [1.5]

where (1 - f) is the finite sampling correction factor and f = n/N.
These equations are well-known and comprise the basic relation-
ships for simple random sampling within a finite population (see
Kish, 1965).
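By way of illustration, Equations 1.1 through 1.5 can be sketched
as follows (in Python; the names are illustrative only and are not
taken from the PROTOCOL program; n >= 2 sampled loads are assumed):

    def simple_estimates(loads, N):
        # loads: weights of the n sampled vehicle-loads;
        # N: total number of vehicle-loads in the period sampled.
        n = len(loads)
        f = n / N                                     # sampling frequency
        xbar = sum(loads) / n                         # mean,  Eq 1.1
        total = N * xbar                              # total, Eq 1.2
        var_load = (sum(x * x for x in loads)
                    - sum(loads) ** 2 / n) / (n - 1)  # Eq 1.3
        var_mean = (1 - f) * var_load / n             # Eq 1.4
        var_total = N ** 2 * var_mean                 # Eq 1.5
        return xbar, total, var_mean, var_total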
We can restate Equation 1.1 for the estimation of a mean by
grouping, and then summing, the observations on an hourly basis,
i.e.,
    \bar{x} = \sum_{i=1}^{h} \sum_{j=1}^{n_i} w_i X_{ij} = (1/n) \sum_i \sum_j X_{ij}      [1.6]

where X_{ij} is the jth observation in the ith hour, n_i is the
number of observations in the ith hour, h is the number of hours,
n is the sum of the h n_i's, and w_i = 1/n. We note that, drawing
an analogy with Equation 1.3, the variance of X_{ij} in interval i,
for all intervals where n_i is not equal to 1, is:

    var_i(X_{ij}) = [ \sum_j X_{ij}^2 - (\sum_j X_{ij})^2 / n_i ] / (n_i - 1)      [1.7]
The variance over all X_{ij} is the weighted (by the number of
vehicles sampled in each hour) average of the hourly variances,
i.e.,

    var(X_{ij}) = (1/n) \sum_i n_i var_i(X_{ij})      [1.8]

Thus, the estimate of the variance of an individual load now
becomes:

    var(X_{ij}) = (1/n) \sum_i n_i [ \sum_j X_{ij}^2 - (\sum_j X_{ij})^2 / n_i ] / (n_i - 1)      [1.9]
(for all intervals where n_i is not equal to 1), and Equation 1.4
is slightly altered to:

    var(\bar{x}) = (1 - f) var(X_{ij}) / n      [1.10]
Equations 1.2 and 1.5 remain unchanged. Note that when using
Equation 1.9, if n_i = 1 the datum for that hour cannot be used in
the calculation. Also note that the "hour" interval used in these
equations really can be any time unit, e.g., half-hours, etc.
Furthermore, even if an hour is selected as the interval, it need
not begin on the hour (e.g., the first hour could be 7:30 AM to
8:30 AM, with the second hour 8:30 AM to 9:30 AM, etc.).
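A sketch of the hourly-grouped computation (Equations 1.6 through
1.10) follows, again in Python with illustrative names. One detail
is assumed here: hours with n_i = 1 are dropped from the variance,
as the text requires, and the divisor of Equation 1.9 counts only
the observations actually used.

    def grouped_estimates(hourly, N):
        # hourly: one list of sampled load weights per hour;
        # N: total number of vehicle-loads arriving in the week.
        n = sum(len(h) for h in hourly)
        f = n / N
        xbar = sum(sum(h) for h in hourly) / n              # Eq 1.6
        used = [h for h in hourly if len(h) > 1]            # drop n_i = 1
        n_used = sum(len(h) for h in used)
        # Hourly variances (Eq 1.7), averaged with weights n_i (Eqs 1.8-1.9):
        var_load = sum(len(h)
                       * (sum(x * x for x in h) - sum(h) ** 2 / len(h))
                       / (len(h) - 1)
                       for h in used) / n_used
        var_mean = (1 - f) * var_load / n                   # Eq 1.10
        return xbar, var_mean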
1.5 CORRECTION FOR DEFAULTING (UNDERSAMPLING)
We first consider the case where undersampling takes place
because of burdens placed upon the weighing system, i.e., al-
though the sampling plan calls for specific sample sizes to be
obtained each hour, the weighing system cannot keep up with the
requirements and we default on taking part of the sample. If
there is lack of randomness in the vehicle-load data (as there
would be, for example, if the loads tended to weigh less toward
the end of the day or if certain-sized vehicles tended to arrive
at a destination site at particular times of the day), the equal
weighting (w_i = 1/n) of the observations in Equation 1.6 would
bias the estimate of the mean.
Where there are no constraints on the weighing system,
random or systematic sampling can be considered equivalent to
proportionate sampling where the number of samples in a sampling
stratum (the ith hour, in this case) is proportional to the size
of the stratum. Thus,
    n_i = f m_i      [1.11]

where m_i is the total number of vehicles arriving in the ith
hour. Note, however, that in this case,

    w_i = 1/n = 1/(f N) = 1/(f_i N)      [1.12]

where f_i = n_i/m_i is the sampling frequency in the ith hour.
Introducing these new weights of Equation 1.12 into Equation 1.6
we obtain:

    \bar{x} = \sum_i w'_i \bar{x}_i      [1.13]

where \bar{x}_i is the mean in the ith hour, and w'_i = m_i/N. The
important observation to be made here is that it can be shown
(Appendix A.1) that, even with defaulting, Equation 1.13 produces
an unbiased estimate of the true mean. The within-hour variance of
Equation 1.9 now becomes:

    s_i^2 = [ \sum_j X_{ij}^2 - (\sum_j X_{ij})^2 / n_i ] / (n_i - 1)      [1.14]

for all intervals where n_i is not equal to 1.
To find the variance of \bar{x} in Equation 1.13, we utilize the
concept of propagation of error [Deming, 1950] which, for a
linear equation, can be expressed as:

    var[f(x_1, x_2, ..., x_n)] = \sum_i (\partial f/\partial x_i)^2 var(x_i)
        + \sum_i \sum_{j \ne i} (\partial f/\partial x_i)(\partial f/\partial x_j) cov(x_i, x_j)      [1.15]

Applying the propagation of error formula to Equation 1.13 (and
assuming that the covariance terms are zero) we obtain:

    var(\bar{x}) = (1/N^2) \sum_i m_i^2 (1 - n_i/m_i) s_i^2 / n_i      [1.16]

for all intervals where n_i is not equal to 1. The rather compli-
cated derivation of Equation 1.16 is given in Appendix A.2. Equa-
tions 1.2 and 1.5 still remain unchanged. Note that this method
of sampling requires another data set in addition to that re-
quired by ordinary systematic sampling, i.e., the total number of
vehicles arriving each hour. Note that another way of describing
the variability of \bar{x} is in terms of its coefficient of varia-
tion, i.e., the standard deviation as a fraction of the mean. Thus
the within-week coefficient of variation (expressed as a percent-
age), c_w, of \bar{x} is

    c_w = 100 [var(\bar{x})]^{1/2} / \bar{x}      [1.17]

Typical values for this coefficient of variation range, depending
upon the number of trucks sampled during a sampling week, from 2
to 5%. A confidence interval around the mean, \bar{x}, is

    \bar{x} ± t [var(\bar{x})]^{1/2}      [1.18]

where t is the Student t-value at some significance level, α, and
degrees of freedom, df. Defining "error" as one-half of this
confidence interval, i.e., t [var(\bar{x})]^{1/2}, the error of the
estimate of \bar{x} expressed as a percentage of \bar{x} is given by:

    E = 100 t [var(\bar{x})]^{1/2} / \bar{x}      [1.19]
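A sketch of the defaulting correction (Equations 1.13, 1.14, 1.16,
and 1.17; Python, illustrative names; hours with no samples or with
n_i = 1 are skipped in the variance, as the text requires):

    def corrected_estimates(hourly, arrivals):
        # hourly: sampled load weights per hour;
        # arrivals: m_i, total vehicles arriving in each hour.
        N = sum(arrivals)
        # Unbiased weighted mean (Eq 1.13): sum of (m_i/N) * xbar_i.
        xbar = sum(m * (sum(h) / len(h))
                   for h, m in zip(hourly, arrivals) if h) / N
        var_mean = 0.0
        for h, m in zip(hourly, arrivals):
            ni = len(h)
            if ni > 1:
                s2 = (sum(x * x for x in h)
                      - sum(h) ** 2 / ni) / (ni - 1)                  # Eq 1.14
                var_mean += m * m * (1 - ni / m) * s2 / (ni * N * N)  # Eq 1.16
        cw = 100 * var_mean ** 0.5 / xbar                             # Eq 1.17
        return xbar, var_mean, cw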
According to Bennett and Franklin (1954), if MS is a mean
square and

    MS = a_1 MS_1 + a_2 MS_2 + a_3 MS_3 + ...

where MS_i is based upon d_i degrees of freedom, then the effective
degrees of freedom, EDF, for MS is:

    EDF = MS^2 / \sum_i (a_i^2 MS_i^2 / d_i)      [1.20]

Defining, for all i where n_i is not equal to 1,

    MS_i = [ \sum_j X_{ij}^2 - (\sum_j X_{ij})^2 / n_i ] / (n_i - 1)      [1.21]

and applying the Bennett and Franklin relationship, Equation 1.20,
to Equation 1.16, the effective degrees of freedom, EDF, for con-
structing confidence intervals about means or totals for all
intervals is:

    EDF = (\sum_i a_i MS_i)^2 / \sum_i [a_i^2 MS_i^2 / (n_i - 1)]      [1.22]

where a_i = m_i^2 (1 - n_i/m_i)/(n_i N^2) and the sums are over all
i where n_i is not equal to 1, and N = \sum_i m_i where the sum is
over all i where n_i/m_i is not equal to 1. This value of EDF
should be used for the degrees of freedom when determining the
t-value for Equations 1.18 or 1.19.
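The same quantities give the effective degrees of freedom of
Equation 1.22. In the sketch below (Python, illustrative names),
N is restricted to the hours entering the sums, which is what the
worked example in Table 1 does; that restriction is an assumption
about the intended reading of the text.

    def effective_df(hourly, arrivals):
        pairs = [(h, m) for h, m in zip(hourly, arrivals) if len(h) > 1]
        N = sum(m for _, m in pairs)
        num = den = 0.0
        for h, m in pairs:
            ni = len(h)
            ms = (sum(x * x for x in h)
                  - sum(h) ** 2 / ni) / (ni - 1)        # MS_i, Eq 1.21
            a = m * m * (1 - ni / m) / (ni * N * N)     # a_i
            num += a * ms
            den += (a * ms) ** 2 / (ni - 1)
        return num ** 2 / den                           # EDF, Eq 1.22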
1.6 FULL SAMPLING
Typically, depending upon the method of readout and the
number of axles involved, it may require up to 30-45 minutes to
weigh one truck using wheel scales. However, platform scales are
frequently 20 to 30 times as fast. When using platform scales,
then, it will be possible to oversample some intervals, if not
all of them. As was mentioned previously, once a scale is rented
and the labor hired to make the measurements, it makes good
economic sense to use the equipment and labor to the fullest
extent possible. Since some intervals may be oversampled and
others undersampled, sampling to the fullest capacity of the
weighing system is defined in this study as "full sampling". Full
sampling means that, when finished weighing the current vehicle,
we sample the next available vehicle. Within a reasonably short
sampling interval (e.g., one-half or one hour) it can be assumed
that arrivals are random; indeed, this is the assumption of
traditional systematic sampling. Under such circumstances, sam-
pling the next available vehicle is identical to systematic sam-
pling with frequent, but random, starts.
Two questions then arise: (1) How are the estimates in full
sampling adjusted for bias?, and (2) Is there any advantage to
full sampling? The answer to the first question is simply that we
use the same equations as for defaulting or undersampling, i.e.,
Equations 1.13 and 1.16 (the equations for totals, 1.2 and 1.5,
also apply). As for the second question, when we apply full
sampling, the variance of the estimated mean or total decreases.
(The proof of this is given in Appendix A.3.) Thus, assuming that
the scales are rented for a given time period and that no extra
people need be hired, it always pays to sample fully.
The use of these formulas is illustrated in Table 1 where
data from a random or systematic sample are presented for an 8-
hour sampling period from a population with true mean = 18605.55.
(The population consisted of eight triangular distributions in
which the mean started low at hour one, then rose to a maximum at
hours four and five, and then finished low at hour eight.) For a
desired sampling frequency, f, of 0.1, the number of trucks
sampled each hour would have to be 1/1/2/8/10/3/3/2, respective-
ly. However, it was assumed that it was not physically possible
to sample more than six trucks in any one-hour period; therefore,
the actual number of trucks sampled each hour was
1/1/2/6/6/3/3/2, respectively. Since the sample is biased, the
estimate of 17560.46 for the mean obtained by using Equation 1.1
is also biased. Using Equation 1.13, however, the unbiased esti-
mate of the mean is 18774.53, which is much closer to the true
value of 18605.55. Note that 24 trucks were sampled
(1+1+2+6+6+3+3+2=24). If this were simple random sampling, the
degrees of freedom would be 24-1 or 23. Since the effective
degrees of freedom was 14.64, the efficiency of the degrees of
freedom is 14.64/23 = 0.6365 or 63.65%.
Table 2 shows the averages of 1000 daily samples (i.e. 1000
days of Monte Carlo simulation) from the same population used for
Table 1 for three different cases: (1) Random or systematic
sampling, no scale constraints, (2) Random or systematic sampling
with scale constraints (only six vehicles can be weighed in any
hour), and (3) Full sampling (again assuming that six vehicles
TABLE 1: CALCULATIONS FOR EXAMPLE OF SECTION 1.6

 {1}  {2}   {3}    {4}          {5}           {6}             {7}
  i   m_i   n_i    ΣX_ij        ΣX_ij²        {5}-{4}²/{3}    {6}/(n_i-1)
  1    -     -      5734.19          -                -                -
  2    -     -     10155.29          -                -                -
  3   20     2     36321.28     659856700      239110.10       239110.10
  4   80     6    144738.20    3587637000    96114180.00     19222840.00
  5  100     6    140308.40    3305738000    24664160.00      4932832.00
  6   30     3     45900.44     728406800    26123340.00     13061670.00
  7   30     3     31825.14     338253500      640261.20       320130.60
  8   20     2      6468.04      21095340      177597.50       177597.50
Tot: 280    22    421450.97

 {1}  {8}            {9}       {10}     {11}           {12}        {13}         {14}
  i   m_i/(300n_i)   {8}×{4}   m_i/280  a_i            {11}×{7}    {10}×{7}     {12}²/(n_i-1)
  1   .0333           191.13      -          -               -            -               -
  2   .0333           338.50      -          -               -            -               -
  3   .0333          1210.70   .071     .002295918       548.98    17079.29          301379
  4   .0444          6432.80   .286     .012585030    241920.00  5492239.00     11705057280
  5   .0556          7794.91   .357     .019982990     98572.75  1761726.00      1943317409
  6   .0333          1530.01   .107     .003443877     44982.79  1399465.00      1011725698
  7   .0333          1060.83   .107     .003443877      1102.49    34299.71          607742
  8   .0333           215.60   .071     .002295918       407.75    12685.53          166260
Tot:                18774.53                          387534.80  8717495.00     14661175768

Here a_i = m_i²(1 - n_i/m_i)/(n_i · 280²). Hours 1 and 2 (n_i = 1,
m_i = 10 each) are excluded from the variance columns; for those
columns N = 280, while the weights in column {8} use all N = 300
arrivals.

Using Equation 1.1,    \bar{x} = 421450.97/24 = 17560.46
Using Equation 1.13,   \bar{x} = 18774.53
Using Equation 1.9,    std(X_ij) = (8717495.00)^{1/2} = 2952.54
Using Equation 1.4,    std(\bar{x}) = (1 - 22/280)^{1/2} (2952.54/22^{1/2}) = 604.25
Using Equation 1.16,   std(\bar{x}) = (387534.80)^{1/2} = 622.52
Using Equation 1.17,   c_w = 100(622.52)/17560.46 = 3.5%
Using Equation 1.22,   EDF = (387534.80)² / 14661175768 ≈ 10.24
TABLE 2: AVERAGES FOR 1000 DAILY SAMPLES

                        SYSTEMATIC OR       SYSTEMATIC OR
                        RANDOM SAMPLING,    RANDOM SAMPLING,
                        NO SCALE            WITH SCALE          FULL
                        CONSTRAINTS         CONSTRAINTS         SAMPLING
SAMPLE                  {1/1/2/8/10/3/3/2}  {1/1/2/6/6/3/3/2}   {6/6/6/6/6/6/6/6}
SAMPLE SIZE             30                  24                  48
POPULATION σ [1.14]*    3426.37             3409.62             3317.31
TRUE POPULATION σ       ---------------     3351.65     ---------------
MEAN [1.13]*            18613.57            18630.06            18638.08
BIASED MEAN [1.1]*      18613.57            17545.41            14080.19
TRUE MEAN               ---------------     18605.55     ---------------
σ OF MEAN [1.16]*       614.29              715.59              595.47
TRUE σ OF MEAN          578.68              701.72              672.99

* Numbers in brackets refer to equations used.
can be weighed in any hour). Note that the estimate of the mean
can be quite biased if the ordinary sampling formulas, e.g.,
Equation 1.1, are used when defaulting or undersampling occurs.
The advantages of full sampling are also clear since the standard
deviation of the estimated mean has been reduced by approximately
17% from that obtained with systematic or random sampling. Since
the within-week sampling coefficient of variation, c_w, is impor-
tant for sample sizing determinations, the model of Tables 1 and
2 was used to simulate (using 100 iterations) a one-week sample
for different sample sizes. The results are shown in Table 3. In
general, the c_w is rather small, i.e., between 1-2%.
1.7 SEASONALITY
The sampling methodology described in the previous sections
is based upon a continuous sampling period such as successive
days or successive weeks. Thus, hour-to-hour and day-to-day
effects are accounted for in the estimation process. It is well-
known, however, that the quantity of solid waste generated fre-
quently varies significantly from month to month. (Municipal
solid waste generation, for example, is low during the months of
January, February, November, and December, and peaks during June,
July, and August, although there are some variations depending
upon geographical location. See Table 4.) Thus it is not suffi-
cient to sample one week out of the year to estimate generation
-------
Page 16
TABLE 3: AVERAGE WITHIN WEEK SAMPLING
COEFFICIENTS OF VARIATION,
SAMPLE SIZE SAMPLE SIZE COEFFICIENT OF
(ONE WEEK) (HOURLY) VARIATION, %
96 2 2.11
144 3 1.78
192 4 1.48
240 5 1.31
288 6 1.20
336 7 1.11
384 8 1.02
432 9 0.98
480 10 0.91
528 11 0.87
576 12 0.83
624 13 0.81
672 14 0.78
720 15 0.74
768 16 0.72
816 17 0.69
Note: Number of iterations = 100 weeks
The model assumes sampling
8 hours/day, 6 days/week.
for the complete year. -Theoretically, if one knew the weeks of
the year in which the curve of weekly generation crossed the
horizontal line representing the average weekly generation for
the year, we could schedule a one-week sampling period for one of
these intersection points and be confident that our estimate for
that week, multiplied by the number of weeks in that year, would
be identical to the quantity produced throughout the year. (This
is termed the "critical point" approach.) Information about these
critical points is, unfortunately, not available before the fact.
Even if it were available for the previous year, there is no
guarantee that the critical points will be the same for the
current year. Therefore, we are forced to sample additional weeks
throughout the year.
TABLE 4: TYPICAL SEASONAL VARIATIONS IN SOLID WASTE GENERATION

                 WASTE GENERATION AS % OF THE MEAN
   LOCATION        LOW MONTH         HIGH MONTH
   CONNECTICUT      85  NOV           111  MAY
   ENGLAND          67  JUL           132  JAN
   HAWAII           84  NOV           118  JUN
   KENTUCKY         85  MAR           125  AUG
   MISSOURI         79  FEB           113  JUL
   OHIO             87  JAN           113  JUL
   ONTARIO          90  MAR           106  JUN
   VIRGINIA         80  JAN           125  MAY
   WASHINGTON       86  FEB           108  MAY
   WISCONSIN        81  FEB           131  JUN

Suppose we sample r weeks out of the year and determine, for
each of these r weeks, the total quantity of solid waste, X_k,
arriving at the site in the kth week. Let \bar{y} be the average
total weekly quantity over the r weeks. Assuming that there are
365/7 = 52.1429 weeks per year, then an estimate of the total
quantity for the year, Y, is obtained by:

    Y = (52.1429) \bar{y} = (52.1429/r) \sum_k X_k      [1.23]
Applying the propagation of error formula (and assuming that the
covariance terms are zero),

    var(Y) = (52.1429/r)^2 \sum_k var(X_k)      [1.24]

Applying the Bennett and Franklin relationship, Equation 1.20, to
Equation 1.24, the effective degrees of freedom, EDF_Y, for use in
calculating confidence intervals for the total amount of waste is
given as:

    EDF_Y = [ (52.1429/r)^2 \sum_k var(X_k) ]^2 / { (52.1429/r)^4 \sum_k [var(X_k)]^2 / EDF_k }      [1.25]
Since the coefficient of variation is equal to the standard
deviation divided by the average, the coefficient of variation of
the between-week differences, c_b, is given as:

    c_b = s_b / \bar{y}      [1.26]

where s_b is the standard deviation of the r weekly totals, X_k.
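A sketch of the yearly roll-up (Equations 1.23 through 1.25;
Python, illustrative names):

    W = 365 / 7                                  # 52.1429 weeks per year

    def yearly_estimate(week_totals, week_vars, week_dfs):
        # week_totals: X_k for the r sampled weeks;
        # week_vars: var(X_k); week_dfs: EDF_k for each week.
        r = len(week_totals)
        Y = (W / r) * sum(week_totals)                       # Eq 1.23
        var_Y = (W / r) ** 2 * sum(week_vars)                # Eq 1.24
        edf_Y = var_Y ** 2 / ((W / r) ** 4 *
                sum(v * v / d for v, d in zip(week_vars, week_dfs)))  # Eq 1.25
        return Y, var_Y, edf_Y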
To obtain some idea of typical values of c_b, a Monte Carlo simu-
lation was performed (based upon real data obtained from a Boston
solid waste site), the results of which are shown in Table 5. A
conservative value of 2% was used for cw, and the simulation
involved 1000 iterations. Two sampling protocols were simulated:
(1) random sampling, and (2) systematic sampling. Sampling
frequencies from two to ten times per year were investigated.
The c_b of random sampling should closely approximate the true c_b
(columns 2 and 3 in Table 5) which it does for all sampling
times. However, except for a sampling frequency of two (where
random sampling is identical to systematic sampling), systematic
sampling is superior to random sampling for all sampling frequen-
cies. At a sampling frequency of four times per year, for exam-
ple, the c_b of systematic sampling is only 60% that of random
sampling. Since both sampling methods provide estimates of less
than 1% deviation from the true total (columns 5 and 6), system-
atic sampling is clearly the preferred method for sampling
throughout the year. Table 5 suggests that typical values of c_b
vary between 3 and 4% for sampling frequencies over the range,
four to eight times per year.
TABLE 5: MONTE CARLO SIMULATION - SEASONALITY

   (1)        (2)      (3)      (4)      (5)       (6)
   # WEEKS    c_b      c_b      c_b      TOTAL     TOTAL
   SAMPLED    TRUE     RANDOM   SYSTEM   RANDOM*   SYSTEM*
      2       10.7     10.0      9.9     +0.01     -0.27
      3        8.7      7.8      5.3     +0.10     +0.37
      4        7.6      7.3      4.1     +0.10     +0.00
      5        7.0      6.4      3.6     -0.01     +0.04
      6        6.2      5.9      3.2     +0.33     -0.03
      7        5.7      5.6      2.5     -0.28     +0.06
      8        5.3      5.2      3.0     +0.08     -0.06
      9        5.0      4.7      2.2     -0.10     +0.02
     10        4.8      4.6      2.4     -0.23     +0.02

   *Estimated total quantity as percent deviation from true total.
   Note: For all simulations in this table, c_w = 0.02.
Using the data of Tables 3 and 5, Table 6 was prepared. This
table shows the percentage error (assuming systematic full sam-
pling and a 90% confidence level) of the estimate of total
yearly quantity of solid waste for various sample sizes and
sampling frequencies.
1.8 STRATIFIED SAMPLING
Stratified sampling involves dividing the population into
distinct subpopulations called strata. The strata could be based
upon vehicle size, load type (residential, commercial, industri-
al, etc.) or any other attribute. Within each stratum a separate
sample is selected and a separate estimate made. The stratum
means and variances are then appropriately weighted to form a
combined estimate for the entire population. Generally, strati-
fied sampling is used to: (a) increase the precision of the
estimate, (b) afford different sampling methods within the
strata, or (c) provide separate estimates for different popula-
tion elements. With regard to increasing the precision of the
estimate, theory (see Kish, 1965) tells us that grouping like
elements within a stratum (for example, one stratum might consist
of small, private vehicles, and another might consist of munici-
pal or commercial, rear-loading, packer-type vehicles) increases
precision. With regard to affording different sampling methods
within the strata, one might use entirely different scales for
small, private vehicles than for municipal or commercial, rear-
loading, packer-type vehicles. With regard to providing separate
quantity estimates for different population elements, a "quanti-
ty" unit, such as a pound, might not be identical for all ele-
ments. For example, waste composition varies widely between
municipal and industrial waste.

TABLE 6: ERROR OF THE ESTIMATE OF TOTAL YEARLY QUANTITY OF SOLID
WASTE (90% CONFIDENCE LEVEL & SYSTEMATIC FULL SAMPLING)
AS A FUNCTION OF SAMPLE SIZE AND SAMPLING FREQUENCY

                     NUMBER OF TRUCKS SAMPLED PER WEEK
SAMPLING      800    492    341    244    189    157    127     97
FREQUENCY,
WEEKS/YEAR           NUMBER OF TRUCKS SAMPLED PER HOUR
             16.7   10.3    7.1    5.1    3.9    3.3    2.7    2.0
    3        5.1%   5.1%   5.2%   5.2%   5.2%   5.3%   5.4%   5.4%
    4        3.4    3.5    3.5    3.6    3.6    3.7    3.7    3.8
    5        2.7    2.7    2.8    2.8    2.9    2.9    3.0    3.1
    6        2.2    2.2    2.3    2.3    2.4    2.4    2.5    2.6
    7        1.6    1.7    1.7    1.8    1.8    1.9    2.0    2.0
    8        1.8    1.8    1.9    1.9    2.0    2.0    2.1    2.1
    9        1.3    1.3    1.4    1.4    1.5    1.5    1.6    1.7
   10        1.3    1.3    1.4    1.4    1.5    1.5    1.6    1.7

Notes:
(1) Table percentages are the errors as a percentage of
    the true total yearly quantity of solid waste;
(2) Table percentages can be converted to other confidence
    levels by the formula, new = (old·z)/1.645, where "old" is
    the old % error, "new" is the % error at the new
    confidence level, and z is the standard normal deviate
    at the new confidence level;
(3) Trucks sampled per week was converted to trucks sampled
    per hour by assuming 8 hours/day and 6 days/week sampling.
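The conversion in note (2) can be sketched as follows (Python;
NormalDist is in the standard library from Python 3.8 on):

    from statistics import NormalDist

    def convert_error(old_pct, new_confidence):
        # Note (2) of Table 6: rescale a 90%-confidence error
        # percentage to another confidence level, new = old*z/1.645.
        z = NormalDist().inv_cdf(1 - (1 - new_confidence) / 2)
        return old_pct * z / 1.645

    # Example: a 3.4% error at 90% confidence is about 4.1% at 95%.
    print(round(convert_error(3.4, 0.95), 1))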
If the standard deviation of a vehicle-load in a stratum is
proportional to the average weight of the load in that stratum (a
not unreasonable assumption), then a sampling plan that makes the
sampling effort proportional to the total quantity contribution
of each stratum is known as Neyman allocation (see Kish, 1965).
An example of a comparison between sampling with and without
stratification is shown in Table 7. The model used assumed that
10% of the vehicles had an average net weight of 306 lbs (repre-
senting small vehicles, such as pickup trucks, station wagons,
etc.), and 90% had an average net weight of 15000 lbs (represent-
ing typical commercial vehicles). The load distributions of the
two vehicle types were assumed to be normal, with the standard
deviation equal to 10% of their means. The results in Table 7
represent a simulation involving a total of 5000 vehicles, and
the stratification used was Neyman allocation. The estimated
means are close to the theoretical mean for sampling both with
and without stratification. The standard deviations (and coeffi-
cients of variation) of these estimates, however, are quite
different. The superiority of the stratified estimate is quite
evident.
TABLE 7: STRATIFIED VERSUS NON-STRATIFIED SAMPLING

                             NON-STRATIFIED     STRATIFIED
             THEORETICAL     SAMPLING           SAMPLING
MEAN            13531           13552             13547
σ                1423            4645              1444
cv              10.5%           34.3%             10.7%

cv = Coefficient of Variation.
Thus, if the population of vehicles consists of two or more
subpopulations with significantly different means, it is highly
advisable to (1) sample the subpopulations separately, making the
samples proportional to the total quantity contribution of each
stratum (i.e., Neyman allocation), (2) make separate total quan-
tity estimates for these subpopulations, and (3) add them to
arrive at an overall population quantity estimate. The standard
deviation of the population quantity estimate, std(x_c), can be
made by combining the subpopulation standard deviations by the
following formula:

    std(x_c) = [ \sum_i d_i s_i^2 / \sum_i d_i ]^{1/2}      [1.27]

where s_i is the standard deviation of the estimate of total
quantity for subpopulation i, and d_i is the degrees of freedom
for that estimate. Note that std(x_c) has \sum_i d_i degrees of
freedom.
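A sketch of this combination (Equation 1.27; Python, illustrative
names, assuming the s_i are pooled with their degrees of freedom
as weights):

    def combined_std(stds, dfs):
        # stds: s_i for each subpopulation; dfs: d_i degrees of freedom.
        pooled = sum(d * s * s for s, d in zip(stds, dfs)) / sum(dfs)
        return pooled ** 0.5     # has sum(d_i) degrees of freedom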
REFERENCES
1. Bennett, C.A., and Franklin, N.L., Statistical Analysis in
Chemistry and the Chemical Industry. John Wiley & Sons, New
York, N.Y., 1954.
2. Deming, W.E., "Some Variances In Random Sampling", in Some
Theory of Sampling. John Wiley & Sons, New York, N.Y., 1950,
pp. 127-134.
3. Kish, L., Survey Sampling. John Wiley & Sons, New York, N.Y.,
1965.
NOTATION
cv        = coefficient of variation, stratified sampling
c_b       = between-week coefficient of variation
c_w       = within-week coefficient of variation
d_i       = degrees of freedom for ith subpopulation
E         = error, i.e., one-half of a confidence interval
f         = weekly sampling frequency, n/N
f_i       = sampling frequency in ith hour, n_i/m_i
h         = total number of hours sampled during the week
i         = index for hours
j         = index for vehicles
k         = index for week
m_i       = number of vehicles arriving in ith sampling hour
n         = total number of vehicles sampled
N         = total number of vehicles in week
n_i       = number of vehicles sampled in the ith hour
r         = number of weeks sampled during the year
s_b       = standard deviation of the weekly totals, X_k
s_i       = subpopulation quantity standard deviation estimate
std(x_c)  = population quantity standard deviation estimate
var(X_ij) = variance of individual load measurement
w_i       = weighting factor for an individual observation, 1/n
X         = total weight of all vehicle-loads for the week
\bar{x}   = average vehicle-load weight for the week
\bar{x}_i = average vehicle-load weight in the ith hour
X_ij      = jth vehicle-load weight in ith hour
X_k       = total quantity of solid waste in the kth week
\bar{y}   = average total weekly quantity during r sampling weeks
Y         = total quantity for the year
CHAPTER 2
COMPOSITION ESTIMATION
2.1 INTRODUCTION
The estimation of waste composition is a more difficult task
than the estimation of waste quantity for at least four reasons:
1. Complexity: The estimation of waste composition involves
   the measurement of more than one attribute.

2. Cost: Weighing a collection vehicle is a relatively low-cost
   procedure. Selecting a sample of waste and separating it
   into a number of components is both a more expensive and a
   more unpleasant procedure.

3. Statistical problems: Unlike the estimation of waste quanti-
   ty, reliance on the Central Limit Theorem of statistics in
   order to assume normality (and hence permitting simpler
   calculations) is not always justified. Also, there are
   problems of what constitutes a sample unit and how to obtain
   random samples of such units.

4. Small sample size: Because of the time and expense required
   to sample for waste composition, there are fewer data avail-
   able regarding this aspect of waste characterization than
   for waste quantity. Hence our estimates have less precision
   than those for waste quantity.
In quantity estimation, the sample unit is clearly the
vehicle. One weighs the entire vehicle because it makes no sense
to select and weigh just a portion of a vehicle-load. In compo-
sition sampling, however, we usually cannot separate an entire
vehicle-load because of time and economic considerations. The
usual commercial vehicle-load is between 10,000 and 20,000 lbs,
and is obviously too large a sample unit to separate. It must be
remembered that separation has to be done manually. As a rough
approximation, typically one man can separate 65-300 lb of raw
municipal solid waste in one hour, depending upon the number and
type of components desired. Clearly, very small sample weights
make no sense physically. A large piece of wood or metal, for
example, could not physically ever be included in a sample of,
say, 5 lbs. Furthermore, small sample weights tend to be more
homogeneous than the population being sampled, i.e., the smaller
the sample weight, the greater the likelihood that it consists
entirely of wood or paper or glass, etc. Following a well-known
principle of cluster sampling (see Kish, 1965), this homogeneity
tends to increase the variance of the sample. Indeed, Klee and
Carruth (1970) found that the smaller the sample weight, the
greater the variance of raw municipal solid waste samples. Howev-
er, the relationship was not linear. Under 200 lbs the sample
variance increased rapidly; over 300 lbs it increased much more
slowly. Accordingly, they recommended that a sample weight of
200-300 lbs be used for general municipal solid waste sampling,
and this recommendation has been widely adopted by other investi-
gators for raw refuse streams.
The sample weight recommendation of 200-300 lbs is appropri-
ate for raw municipal refuse only. Clearly, optimal sample weight
is related to the particle size of the material sampled, but the
relationship is not linear. Other investigators (see Trezak,
1977) have found that processed waste stream particle size dis-
tributions are adequately described by functions based upon the
exponential distribution, the Rosin-Rammler equation, for exam-
ple. Thus, the author recommends the following model based upon
the exponential function to determine optimal sample weights for
other than raw municipal refuse:
    Y = X e^{βX}      [2.1]

where Y is the optimal sample weight in pounds, and X is the
characteristic particle size of the material to be sampled, in
inches. The characteristic particle size is the screen size at
which 63.2% of the material passes through. The boundary condi-
tion for Equation 2.1 is that at Y = 250 lbs (the median of the
200-300 lbs recommended for raw municipal solid waste), the
characteristic particle size is 18 inches. The value of β that
satisfies this condition is 0.146. Thus,

    Y = X e^{0.146X}      [2.2]
To illustrate the use of this equation, the output of the
Appelton West shredder was found to have a characteristic size of
2.2 in. (see Table 10 in Savage and Shiflett, 1979). To sample
the output from this shredder we would need to take samples of
weight,

    Y = 2.2 e^{0.146(2.2)} = 3.0 lb.
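The model is easily sketched in code (Python; the function name is
illustrative only):

    import math

    def optimal_sample_weight(x_inches):
        # Eq 2.2: optimal sample weight (lb) for a processed waste
        # stream with characteristic particle size x_inches.
        return x_inches * math.exp(0.146 * x_inches)

    print(round(optimal_sample_weight(2.2), 1))  # -> 3.0 lb, as above
    print(round(optimal_sample_weight(18.0)))    # -> 249 lb, near the
                                                 #    250-lb boundary condition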
Like quantity, the variation in composition of solid waste
can be expected to be influenced by within-week and between-week
differences. Therefore, it is important to sample both within
given weeks and at intervals spread throughout the year. However,
since the actual separation process is quite lengthy, a long time
elapses between samples.
Furthermore, a sample can be taken from a selected vehicle and
the vehicle allowed to move on. For these reasons, unlike with
quantity sampling, the taking of composition samples does not
result in vehicle queues regardless of the sampling method used.
Thus it makes more sense to consider an unbiased sampling scheme
for composition determination, e.g., either the random or system-
atic sampling schemes described in Chapter 1.
2.2 NORMALITY ASSUMPTIONS
The determination of the number of samples for the within-
week estimation of waste composition is, unfortunately, a much
more complex matter than for quantity estimation. Discrete dis-
tribution theory (such as multinominal or binomial) cannot be
used because we are not dealing with identical items in the
sampling unit. One piece of wood in a sample, for example, is
different in size and shape from the next piece that might be
found in the sample. One is tempted to reach for the Central
Limit Theorem once again and assume that either the component in
question is distributed normally or that averages taken from
their distributions are distributed normally. Previous composi-
tion studies, however, have shown that no component is distribut-
ed normally (see Klee, 1980). The question is then, "How many
samples must be taken so that averages of the samples are dis-
tributed normally?" For components with positively-skewed distri-
butions (i.e., skewed to the right - see Figure 1) - and this
includes most components, including newsprint, total paper,
plastics/rubber/leather (when combined into one component),
ferrous and other metals - averages of as low as n = 4 samples
closely approximate normality and, by n = 10, normality is all
but assured. However, for components with J-shaped distributions
(see Figure 1) - and this includes components such as textiles,
wood, and garden waste - reasonable normality is not approached
until averages of n = 40 or greater are taken.
One indication of normality is the coefficient of skewness,
g_1, which is based upon the third moment about the mean, i.e.,

    define  k_2 = \sum_i (x_i - \bar{x})^2 / (n - 1)      [2.3]

    and     k_3 = n \sum_i (x_i - \bar{x})^3 / [(n - 1)(n - 2)]      [2.4]

    then    g_1 = k_3 / (k_2)^{3/2}      [2.5]

Given the coefficient of skewness, g_1, of a parent population,
the coefficient of skewness of the distribution of averages of
size n taken from this distribution, g_1(\bar{x}), is:

    g_1(\bar{x}) = g_1 / n^{1/2}      [2.6]
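A sketch of these statistics (Equations 2.3 through 2.6; Python,
illustrative names, n >= 3 observations assumed):

    def skewness_g1(xs):
        # Sample coefficient of skewness, Eqs 2.3-2.5.
        n = len(xs)
        xbar = sum(xs) / n
        k2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)      # Eq 2.3
        k3 = (n * sum((x - xbar) ** 3 for x in xs)
              / ((n - 1) * (n - 2)))                         # Eq 2.4
        return k3 / k2 ** 1.5                                # Eq 2.5

    def skewness_of_averages(g1, n):
        return g1 / n ** 0.5                                 # Eq 2.6

    # Example from Table 1: textiles have g1 = 1.45, so averages of
    # size 5 should show skewness near 1.45/sqrt(5) = 0.65 (observed: .66).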
The coefficient of skewness is used in Table 1, which shows the
results of simulations for averages of different sizes (number of
iterations for each case = 5000) for two components, ferrous
metals (a positively-skewed distribution) and textiles (a J-
shaped distribution). The distributions are taken from the data
collected by Britton (1972).

FIGURE 1: DISTRIBUTION OF TEXTILES AND FERROUS METALS IN
MUNICIPAL SOLID WASTE
[Figure: distributions of the textiles (J-shaped) and ferrous
metals (positively-skewed) components of municipal solid waste.]

Since the rationale behind using the Central Limit Theo-
rem is to permit the use of t-statistics to construct confidence
intervals about the estimation of the percentage of any component
in the waste stream, an appropriate measure of the ability to
meet the normality requirements is the fraction of confidence
intervals that actually contain the true mean at a given level of
significance. This is shown in Table 1 by the Actual versus
Nominal a lines. For example, for textiles, given an average of
size 10, if a confidence interval at a significance level of o
= .05 were constructed about the mean, the actual significance
level would be o = .104. In other words, instead of a 95% confi-
dence interval, we would actually be constructing an 89.6% confi-
dence interval. Note that as the size of the average gets larger,
the discrepancy gets smaller. For example, at a significance
level of a = .05 and an average of size 50, the true significance
-------
Page 27
TABLE 1: RESULTS OF SIMULATION STUDIES FOR TWO MUNICIPAL SOLID WASTE COMPONENTS

J-FUNCTION (TEXTILES)                              SIZE OF AVERAGE

                              1      5     10     15     20     25     30     35     40     45     50
MEAN                     1.6145 1.5924 1.6001 1.6058 1.5956 1.5957 1.5983 1.5920 1.5972 1.5994 1.5951
STD (Population)         1.9037 1.9074 1.9277 1.9206 1.9226 1.9285 1.8979 1.9275 1.8790 1.9011 1.9290
STD (Mean)               1.9037  .8530  .6096  .4959  .4299  .3857  .3465  .3258  .2971  .2834  .2728
COEFFICIENT OF SKEWNESS    1.45    .66    .46    .33    .32    .30    .27    .29    .28    .21    .20
  standard deviation        .03    .03    .03    .03    .03    .03    .03    .03    .03    .03    .03
  t-value                 41.81  19.14  13.32   9.66   9.12   8.76   7.69   8.47   8.13   5.95   5.70
  significance level       .001   .001   .001   .001   .001   .001   .001   .001   .001   .001   .001
Actual α, Nominal α=.01       -   .075   .051   .041   .033   .031   .024   .026   .017   .018   .019
Actual α, Nominal α=.02       -   .102   .065   .055   .046   .040   .035   .037   .028   .028   .031
Actual α, Nominal α=.03       -   .119   .079   .065   .056   .051   .045   .046   .036   .038   .042
Actual α, Nominal α=.04       -   .132   .089   .076   .065   .060   .055   .056   .044   .046   .049
Actual α, Nominal α=.05       -   .148   .104   .090   .076   .067   .065   .064   .065   .060   .058
Actual α, Nominal α=.10       -   .190   .138   .134   .128   .115   .114   .112   .104   .104   .099
Actual α, Nominal α=.20       -   .268   .227   .215   .214   .211   .207   .210   .199   .200   .205

MODERATE SKEW (FERROUS METALS)

                              1      5     10     15     20     25     30     35     40     45     50
MEAN                     3.6712 3.6511 3.6603 3.6627 3.6533 3.6538 3.6583 3.6523 3.6587 3.6578 3.6564
STD (Population)         1.7796 1.7750 1.8034 1.7920 1.7982 1.7955 1.8620 1.7937 1.7563 1.7700 1.7960
STD (Mean)               1.7796  .7938  .5703  .4627  .4021  .3591  .3217  .3032  .2777  .2639  .2540
COEFFICIENT OF SKEWNESS     .68    .32    .20    .15    .13    .17    .16    .19    .16    .12    .05
  standard deviation        .03    .03    .03    .03    .03    .03    .03    .03    .03    .03    .03
  t-value                 19.74   9.26   5.88   4.32   3.67   4.91   4.67   5.54   4.53   3.36   1.51
  significance level       .001   .001   .001   .001   .001   .001   .001   .001   .001   .001   .131
Actual α, Nominal α=.01       -   .017   .017   .018   .018   .015   .013   .014   .009   .010   .012
Actual α, Nominal α=.02       -   .026   .033   .027   .026   .023   .019   .025   .025   .025   .024
Actual α, Nominal α=.03       -   .043   .039   .039   .040   .035   .033   .035   .035   .035   .036
Actual α, Nominal α=.04       -   .059   .053   .047   .043   .042   .048   .043   .041   .047   .042
Actual α, Nominal α=.05       -   .065   .063   .057   .055   .053   .047   .052   .059   .051   .051
Actual α, Nominal α=.10       -   .116   .115   .110   .098   .100   .104   .091   .097   .100   .110
Actual α, Nominal α=.20       -   .207   .204   .215   .204   .204   .214   .211   .197   .208   .203

NOTE: NUMBER OF ITERATIONS = 5000. THE FIRST COLUMN (SIZE 1) DESCRIBES THE PARENT DISTRIBUTION.
As the selected significance level is increased, the discrepancy
also gets smaller. For example, at a significance level of α = .10
and an average of size 10, the true significance level is α = .138.
Note that for the moderately positively-skewed ferrous metals
distribution, these discrepancies are very much smaller, even for
very small sizes of the average.
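The simulations behind Table 1 can be reproduced along the
following lines. This Python sketch assumes resampling with
replacement from a list of observed component percentages (the
report does not state the exact resampling scheme used) and counts
how often a nominal t confidence interval misses the true mean:

import random, statistics
from scipy.stats import t

def actual_alpha(population, n, nominal=.05, iters=5000):
    mu = statistics.fmean(population)        # treat the data as the population
    tval = t.ppf(1 - nominal / 2, df=n - 1)
    misses = 0
    for _ in range(iters):
        sample = random.choices(population, k=n)   # an "average of size n"
        xbar = statistics.fmean(sample)
        half = tval * statistics.stdev(sample) / n ** 0.5
        misses += not (xbar - half <= mu <= xbar + half)
    return misses / iters    # e.g., near .104 for textiles at n = 10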
2.3 WITHIN-WEEK SAMPLE SIZE DETERMINATION
Assuming that the averages of the components are normally
distributed, the sample size is given by (see Mace, 1964):

n = (ts/d)²    [2.7]

where d is the precision required (i.e., ½ the confidence inter-
val desired), t is the t-value at significance level α, and s is
the population standard deviation. Because the t-value is not
known until after n is determined, Equation 2.7 is actually a
trial-and-error equation. However, for starting purposes, the t-
value can be replaced by its corresponding z-value, i.e., the
value of the standard normal deviate at 1 - α.
Estimates of s are provided in Table 2. (These estimates are
based on various composition sampling studies throughout the
country, and include within-day and between-day sampling varia-
tion. The Britton (1972) estimates are not appropriate here
because they represent only the within-vehicle variation of one
truckload.)
There is a problem, however, in applying Equation 2.7. The
averages of the components are not normally distributed; they are
positively-skewed. One might consider an appropriate transforma-
tion, such as the lognormal, but since d in Equation 2.7 is not
constant over a lognormal scale (i.e., ln[a - b] is not equal to
ln[a] - ln[b]), we must also have some knowledge of the mean of
the distribution, the very quantity we are trying to estimate. A
simpler approach is to take advantage of the fact that there is a
strong correlation among the coefficient of skewness and the
actual and nominal values of α. For example, using the data of
Table 1, the following regression equation was found to have a
coefficient of determination (R²) of 0.98:

αn = .0206 + 1.00899αa - .141g1    [2.8]

Note: 1. If αn > αa, αn = αa
      2. If αn < 0, αn = .001

where αa is the actual level of significance, and αn is the
nominal level of significance.
TABLE 2: SUGGESTED POPULATION STANDARD DEVIATIONS
COMPONENT STANDARD DEVIATION, S
PAPER COMPONENTS
CORRUGATED PAPER .0744
NEWSPRINT .0687
TOTAL PAPER .1021
METAL COMPONENTS
ALUMINUM .0069
FERROUS METALS .0388
TOTAL METALS .0358
ORGANIC COMPONENTS
FOOD WASTE .0506
GARDEN WASTE .1269
WOOD .1376
TOTAL ORGANICS .1121
MISCELLANEOUS COMPONENTS
ASH/ROCKS/FINES .0572
GLASS/CERAMICS .0502
PLASTIC/RUBBER/LEATHER .0252
TEXTILES .0687
Equation 2.8 can be used to determine confidence intervals
or significance levels even when the distribution is decidedly
non-normal. The only input required is a knowledge of the coeffi-
cient of skewness (which is calculated from the data using
Equation 2.5), or the sample size, and the desired level of
significance, αa. For example, if αa = .10 and g1 = .33, then,
using Equation 2.8,

αn = .0206 + 1.00899(.10) - .141(.33) = .075

Thus, a confidence interval constructed at an α of .075 will
produce the required confidence interval at significance level
α = .10.
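In code, Equation 2.8 with its two side conditions amounts to the
following (a Python sketch, illustrative only):

def nominal_alpha(alpha_actual, g1):
    a_n = .0206 + 1.00899 * alpha_actual - .141 * g1   # Equation 2.8
    a_n = min(a_n, alpha_actual)                       # Note 1
    return max(a_n, .001)                              # Note 2

print(nominal_alpha(.10, .33))   # about .075, as in the example above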
Since the value of g1 will generally not be known until
after one has obtained a sample, Equation 2.8 is not particularly
useful when one is determining sample size. Equations can be
obtained, however, that relate αn and αa using n rather than g1.
For the textile data in Table 1, we obtain the following equation
(with a coefficient of determination, R², of .954):
αn = -.0633 + 1.0121αa + .00136n    [2.9]
             (37.3)     (11.5)

Note: 1. If αn > αa, αn = αa
      2. If αn < 0, αn = .001

The numbers in parentheses are the t-values of the estimated
coefficients. For the ferrous metals data in Table 1, we obtain
the following equation (with a coefficient of determination, R²,
of .995):

αn = -.0102 + .99087αa + .00019n    [2.10]
             (116.9)    (5.2)

Note: 1. If αn > αa, αn = αa
      2. If αn < 0, αn = .001

Thus, if one knows whether the component is distributed as a J-
function (assume this for textiles, wood, and garden wastes),
then Equation 2.9 is used; for all others, Equation 2.10 is used.
To illustrate the use of Equations 2.7, 2.9, and 2.10,
suppose we wished to estimate the concentration of ferrous metals
in the waste stream to within ±2 percentage points of the mean,
at a significance level of α = .05. Using Equation 2.7,

n = (1.9623 x .0388/.02)² = 14.49 or 15, rounded up.

Using Equation 2.10,

αn = -.0102 + .99087(.05) + .00019(15) = .0421.

At n = 15 and α = .0421, t = 2.2367. At iteration #2, therefore,

n = (2.2367 x .0388/.02)² = 18.82 or 19, rounded up.

Using Equation 2.10,

αn = -.0102 + .99087(.05) + .00019(19) = .0429.

At n = 19 and α = .0429, t = 2.1783. At iteration #3, therefore,

n = (2.1783 x .0388/.02)² = 17.86 or 18, rounded up.

Using Equation 2.10,

αn = -.0102 + .99087(.05) + .00019(18) = .0427.

At n = 18 and α = .0427, t = 2.1902. At iteration #4, therefore,

n = (2.1902 x .0388/.02)² = 18.05 or 19, rounded up.

Since we are cycling between 18 and 19, n = 19 and we are fin-
ished.
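The whole trial-and-error cycle is easily programmed. The sketch
below, in Python with scipy supplying the z and t quantiles
(PROTOCOL performs the equivalent computation internally in Option
7 of its menu), reproduces the ferrous metals iteration above,
returning the larger member of a two-point cycle as in the text:

import math
from scipy.stats import norm, t

def composition_sample_size(s, d, alpha=.05, j_shaped=False):
    # Start Equation 2.7 with the z-value, then iterate with t-values,
    # adjusting alpha for non-normality via Equation 2.9 or 2.10.
    n_prev = 0
    n = math.ceil((norm.ppf(1 - alpha / 2) * s / d) ** 2)
    while True:
        if j_shaped:
            a_n = -.0633 + 1.0121 * alpha + .00136 * n    # Eq 2.9
        else:
            a_n = -.0102 + .99087 * alpha + .00019 * n    # Eq 2.10
        a_n = max(min(a_n, alpha), .001)                  # side conditions
        tval = t.ppf(1 - a_n / 2, df=n - 1)
        n_new = math.ceil((tval * s / d) ** 2)            # Eq 2.7
        if n_new == n:
            return n                                      # converged
        if n_new == n_prev:
            return max(n, n_new)                          # cycling: take larger
        n_prev, n = n, n_new

print(composition_sample_size(.0388, .02))   # 19, as above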
Finally, note that because the shape of the distribution and
the standard deviation vary with the component, if there are m
components the field sample size, nf, will be the largest sample
size over all of the components, i.e.,

nf = max(n1, n2, ..., nm)
2.4 ESTIMATING THE STANDARD DEVIATION WHEN
NO SAMPLE DATA ARE AVAILABLE
In almost any situation, one can get at least a very rough
estimate of the standard deviation. The minimum information
involves the form of the distribution and the spread of values.
For example, if the values of the component fractions can be
assumed to follow a normal distribution, then either of the
following rules can be used to get an estimate of s:

(a) Estimate two values, a low one, a1, and a high one, b1,
between which you expect 99.7% (almost all) of the values to be.
Then estimate s as:

s = (b1 - a1)/6    [2.11]

(b) Estimate two values, a low one, a2, and a high one, b2,
between which you expect 95% of the values to be. Then estimate s
as:

s = (b2 - a2)/4    [2.12]

If the values of the component fractions can be assumed to follow
a positively-skewed distribution, then an alternative is to
assume a triangular distribution and estimate s as:

s = {[a3(a3 - b3) + c3(c3 - a3) + b3(b3 - c3)]/18}^½    [2.13]

where a3, b3, and c3 are the assumed smallest, most likely, and
largest values, respectively, that the distribution can take on.
(The bracketed quantity is the variance of the triangular distri-
bution; the square root gives the standard deviation.)

As an example, suppose we are going to sample for aluminum,
and we estimate that the smallest value is 0.2%, the most likely
is 1.5%, and the largest value is 4.1%. Assuming a positively-
skewed distribution, the estimated standard deviation is:

s = {[.2(.2 - 1.5) + 4.1(4.1 - .2) + 1.5(1.5 - 4.1)]/18}^½
  = (.657)^½ = .81% or .0081
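All three rules are one-line computations; a Python sketch
(function names invented for illustration):

import math

def sd_from_range_997(a1, b1):
    return (b1 - a1) / 6     # Equation 2.11: a1, b1 bracket ~99.7% of values

def sd_from_range_95(a2, b2):
    return (b2 - a2) / 4     # Equation 2.12: a2, b2 bracket ~95% of values

def sd_triangular(a3, b3, c3):
    # Equation 2.13: smallest a3, most likely b3, largest c3
    return math.sqrt((a3 * (a3 - b3) + c3 * (c3 - a3) + b3 * (b3 - c3)) / 18)

print(sd_triangular(.002, .015, .041))   # about .0081 - the aluminum example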
2.5 ESTIMATING SAMPLE SIZE IN A MULTI-STAGE PROCESS
Suppose we do not have a good estimate of the standard
deviation of the distribution of a component. Since the cost of
composition sampling is generally high, rather than take a larger
sample than is really necessary we can take the sample in more
than one stage. The method (sometimes called "Stein's Method" -
see Natrella, 1966) is as follows:

(1) Make a first estimate of s (using either Table 2 or
    the technique described in Section 2.4). From this,
    determine n, the size of the full sample (using the
    technique described in Section 2.3). Choose some frac-
    tion of n, n1, as the size of the first sample. (In
    Stein's Method, this fraction typically is ½.)

(2) This first sample of size n1 provides an estimate of s.
    Use this value to determine how large the second sample
    should be.

As an example, suppose we wished to estimate the concentra-
tion of ferrous metals in the waste stream to within ±2 percent-
age points of the mean, at a significance level of α = .05.
Assume that our best estimate of s is .04. Following the proce-
dure outlined in Section 2.3, our estimate of sample size is 20.
Our first sample size (using a fraction of ½), n1, is 10. After
taking this sample we find that the sample standard deviation
is .03. Recalculating the sample size we find n = 13. Since we
have already taken 10 of these samples, only 3 more are required.

Note that we can refine the method simply by making the
first fraction small (say 1/3 or 1/4), and then recalculating the
standard deviation (and hence, a new sample size) after each
additional sample obtained.
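The two-stage bookkeeping can be sketched as follows, reusing the
composition_sample_size() function given after Section 2.3 above
(illustrative only, and assuming that function is in scope):

import statistics

def first_stage_size(s_guess, d, alpha=.05, fraction=.5):
    # Stage 1: plan the full sample from the rough estimate of s
    return round(fraction * composition_sample_size(s_guess, d, alpha))

def additional_samples_needed(observations, d, alpha=.05):
    # Stage 2: re-plan from the standard deviation actually observed
    s_hat = statistics.stdev(observations)
    return max(0, composition_sample_size(s_hat, d, alpha) - len(observations))

With s_guess = .04 the plan is about 20 samples, so the first
stage takes 10; if those 10 observations show a standard deviation
of .03, the recalculated size falls to roughly the 13 of the
example (the exact value depends on the t-table used), and only a
few more samples are needed.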
2.6 ESTIMATION OF COMPOSITION
As with waste quantity, it is well known that the
composition of solid waste generated varies significantly from
month to month. Thus it is not sufficient to sample one week out
of the year to estimate composition for the complete year, and we
are forced to sample additional weeks throughout the year.

Suppose we sample r weeks out of the year and determine, for
each of these r weeks, the average (as a fraction) of a
particular component of solid waste, pk (where pk = Σpjk/nk, the
j sum being over nk, the number of samples taken in the kth
week), and Xk, the total quantity arriving at the site in the
kth week. Then an estimate of the fraction of the component over
the year, P, is obtained by weighting the weekly fractions by the
weekly totals (the k sums below run over the r weeks):

P = ΣpkXk/ΣXk    [2.14]
Applying the propagation of error formula (and assuming that the
covariance terms are zero),

var(P) = Σ(Xk/ΣXk)²var(pk) + Σ[(pk/ΣXk)(1 - Xk/ΣXk)]²var(Xk)    [2.15]

Applying the Bennett and Franklin relationship (Equation 1.20,
Chapter 1) to Equation 2.15, the effective degrees of freedom,
EDFp, for this variance is given as:

EDFp = [var(P)]²/{Σ[(Xk/ΣXk)²var(pk)]²/dpk
       + Σ[[(pk/ΣXk)(1 - Xk/ΣXk)]²var(Xk)]²/dXk}    [2.16]

where dpk and dXk are the degrees of freedom associated with
var(pk) and var(Xk), respectively.
We find the total quantity for the year of the component, T, by:

T = P*W    [2.17]

where W is the total quantity of waste for the year. Applying the
propagation of error formula (and assuming that the covariance
terms are zero),

var(T) = W²var(P) + P²var(W)    [2.18]

Again applying the Bennett and Franklin relationship (Equation
1.20, Chapter 1) to Equation 2.18, the effective degrees of
freedom, EDFt, for this variance is given as:

EDFt = [var(T)]²/{[W²var(P)]²/EDFp + [P²var(W)]²/EDFw}    [2.19]

where EDFw is the effective degrees of freedom associated with
the total quantity of waste for the year, W.
Unfortunately, one cannot construct confidence intervals
about the total of the component, T, in the usual fashion, i.e.,

T ± t[var(T)]^½

since the distribution of P is positively-skewed and thus the
distribution of T is also positively-skewed. The assumption of
normality is not appropriate under these circumstances. However,
we can use the logarithmic transformation, which is particularly
effective in normalizing distributions that have positive skew-
ness. If we assume that T′ = ln[T] is normally distributed with
mean, μ, and standard deviation, σ, then (see Aitchison and
Brown, 1957):

T = e^(μ + ½σ²)    [2.20]

and

var(T) = e^(2μ + σ²)[e^(σ²) - 1]    [2.21]

Solving for μ and σ²,

σ² = ln[var(T)/T² + 1]    [2.22]

μ = ln T - ½ln[var(T)/T² + 1] = ln T - ½σ²    [2.23]

One can then construct the desired confidence interval by first
computing:

L = μ - tσ    [2.24]

U = μ + tσ    [2.25]

where t is the t-value at significance level α with EDFt degrees
of freedom. The confidence interval around T then is given as:

lower = e^L    [2.26]

upper = e^U    [2.27]
An example of these calculations is shown in Table 4, using
the data given in Table 3. Note that, taking the W²var(P) and
P²var(W) terms as the contributions to var(T) by composition and
quantity respectively, the composition term contributed 93% of
the variability in this example while the quantity term contrib-
uted only 7%. Thus (and it should come as no great surprise),
the precision of an estimate of the yearly quantity of a given
component depends much more on the precision of the estimate of
the yearly component fraction than on the precision of the esti-
mate of the yearly waste quantity.
TABLE 3: DATA FOR SEASONALITY CALCULATIONS EXAMPLE

Total, Week #1:                               102,100,000 lbs
Standard Deviation of Total:                    1,743,900 lbs
Effective Degrees of Freedom of Total, d:              45
Number of Composition Samples, n:                      17
Average Ferrous Metals (as a fraction):             .0821
Standard Deviation of Average:                      .0105

Total, Week #2:                                93,013,000 lbs
Standard Deviation of Total:                    1,363,500 lbs
Effective Degrees of Freedom of Total, d:              29
Number of Composition Samples, n:                      17
Average Ferrous Metals (as a fraction):             .0773
Standard Deviation of Average:                      .0077

Total, Week #3:                                97,385,000 lbs
Standard Deviation of Total:                    1,261,900 lbs
Effective Degrees of Freedom of Total, d:              32
Number of Composition Samples, n:                      17
Average Ferrous Metals (as a fraction):             .0719
Standard Deviation of Average:                      .0072

Total, Week #4:                                89,211,000 lbs
Standard Deviation of Total:                    1,355,800 lbs
Effective Degrees of Freedom of Total, d:              42
Number of Composition Samples, n:                      17
Average Ferrous Metals (as a fraction):             .0589
Standard Deviation of Average:                      .0095

Total, Year:                                4,975,800,000 lbs
Standard Deviation of Total for Year:          81,829,000 lbs
Effective Degrees of Freedom for Year:                143
TABLE 4: SEASONALITY CALCULATIONS EXAMPLE

 {1}     {2}            {3}              {4}    {5}    {6}        {7}
 pk      Xk             pkXk             nk-1   dk     Xk/ΣXk     var(pk)
.0821   102,100,000    .83824100x10^7     16     45   .2674813   (.0105)²
.0773    93,013,000    .71899050x10^7     16     29   .2436752   (.0077)²
.0719    97,385,000    .70019820x10^7     16     32   .2551289   (.0072)²
.0589    89,211,000    .52545280x10^7     16     42   .2337147   (.0095)²
        381,709,000   2.78288200x10^7

 {8}               {9}               {10}
 {6}²{7}           pk/ΣXk            {9}[1-{6}]
.7887971x10^-5    .2150853x10^-9    .1575544x10^-9
.3520497x10^-5    .2025103x10^-9    .1531636x10^-9
.3374305x10^-5    .1883634x10^-9    .1403064x10^-9
.4929686x10^-5    .1543060x10^-9    .1182424x10^-9

 {11}            {12}               {13}
 var(Xk)         {10}²{11}          {8}+{12}
(1,743,900)²    .7549222x10^-7     .7963464x10^-5
(1,363,500)²    .4361353x10^-7     .3564110x10^-5
(1,261,900)²    .3134766x10^-7     .3405653x10^-5
(1,355,800)²    .2570029x10^-7     .4955386x10^-5
                                  1.9888613x10^-5

 {14}               {15}               {16}
 {8}²/{4}           {12}²/{5}          {14}+{15}
.3888756x10^-11    .1266461x10^-15   .3888882x10^-11
.7746187x10^-12    .6559102x10^-16   .7746842x10^-12
.7116210x10^-12    .3070862x10^-16   .7116517x10^-12
.1518863x10^-11    .1572631x10^-16   .1518878x10^-11
                                     .6894096x10^-11

USING EQUATION 2.14, P = 2.78288200x10^7/381,709,000 = .0729

USING EQUATION 2.15, std(P) = (1.9888613x10^-5)^½ = .0045

USING EQUATION 2.16, EDFp = (1.9888613x10^-5)²/(.6894096x10^-11) = 57.38

USING EQUATION 2.17, T = P*W = (.0729)(4,975,800,000) = 362,735,800 lbs

USING EQUATION 2.18, var(T) = W²var(P) + P²var(W)
  = (4,975,800,000)²(1.9888613x10^-5) + (.0729)²(81,829,000)²
  = .52799900x10^15
and std(T) = (.52799900x10^15)^½ = 22,978,230 lbs

USING EQUATION 2.19,
EDFt = [.52799900x10^15]²/
  {[(4,975,800,000)²(1.9888613x10^-5)]²/EDFp + [(.0729)²(81,829,000)²]²/EDFw}
  = 65.84

USING EQUATION 2.22,
σ² = ln[(22,978,230)²/(362,735,800)² + 1] = .4004793x10^-2

USING EQUATION 2.23,
μ = ln(362,735,800) - ½(.4004793x10^-2) = 19.70718

ASSUMING A 95% CONFIDENCE INTERVAL, AT EDFt = 65.84, t = 1.997,
AND USING EQUATIONS 2.24 AND 2.25,
L = 19.70718 - 1.997(.4004793x10^-2)^½ = 19.58083
U = 19.70718 + 1.997(.4004793x10^-2)^½ = 19.83354

USING EQUATIONS 2.26 AND 2.27,
lower = e^19.58083 = 319,041,400
upper = e^19.83354 = 410,766,800

(Note that this is an asymmetrical confidence interval
about the mean.)
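The chain of Equations 2.14 through 2.27 mechanizes directly. The
following Python sketch (an illustration, not the PROTOCOL source;
scipy supplies the t quantile) reproduces the Table 4 results from
the Table 3 data:

import math
from scipy.stats import t

p  = [.0821, .0773, .0719, .0589]                 # weekly fractions pk
sp = [.0105, .0077, .0072, .0095]                 # std dev of each pk
dp = [16, 16, 16, 16]                             # d.f. for var(pk): nk - 1
X  = [102100000, 93013000, 97385000, 89211000]    # weekly totals Xk, lbs
sX = [1743900, 1363500, 1261900, 1355800]         # std dev of each Xk
dX = [45, 29, 32, 42]                             # effective d.f. of each Xk
W, sW, dW = 4975800000, 81829000, 143             # yearly total, std dev, d.f.

SX = sum(X)
P = sum(pi * Xi for pi, Xi in zip(p, X)) / SX                    # Eq 2.14
tp = [(Xi / SX) ** 2 * si ** 2 for Xi, si in zip(X, sp)]
tX = [((pi / SX) * (1 - Xi / SX)) ** 2 * si ** 2
      for pi, Xi, si in zip(p, X, sX)]
varP = sum(tp) + sum(tX)                                         # Eq 2.15
EDFp = varP ** 2 / (sum(a * a / d for a, d in zip(tp, dp)) +
                    sum(a * a / d for a, d in zip(tX, dX)))      # Eq 2.16

T = P * W                                                        # Eq 2.17
varT = W ** 2 * varP + P ** 2 * sW ** 2                          # Eq 2.18
EDFt = varT ** 2 / ((W ** 2 * varP) ** 2 / EDFp +
                    (P ** 2 * sW ** 2) ** 2 / dW)                # Eq 2.19

sig2 = math.log(varT / T ** 2 + 1)                               # Eq 2.22
mu = math.log(T) - sig2 / 2                                      # Eq 2.23
tval = t.ppf(1 - .05 / 2, df=EDFt)
lower = math.exp(mu - tval * math.sqrt(sig2))                    # Eqs 2.24, 2.26
upper = math.exp(mu + tval * math.sqrt(sig2))                    # Eqs 2.25, 2.27
# P = .0729, T = 3.63x10^8 lbs, 95% interval about (3.19x10^8, 4.11x10^8) lbs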
REFERENCES

Aitchison, J., and Brown, J.A.C., The Lognormal Distribution
(London: Cambridge University Press, 1957), p. 8.

Britton, P.W., "Improving Manual Solid Waste Separation Studies",
J. San. Eng. Div., ASCE, 98(SA5):717-730 (1972).

Kish, L., "Cluster Sampling", in Survey Sampling (New York: John
Wiley & Sons, Inc., 1965), p. 161.

Klee, A.J., and Carruth, D., "Sample Weights in Solid Waste
Composition Studies", J. San. Eng. Div., ASCE, 96(SA4):945-954
(1970).

Mace, A.E., "Estimation Problems", in Sample Size Determination
(New York: Van Nostrand Reinhold Co., 1964), pp. 35-37.

Natrella, M., "Characterizing Measured Performance", in Experi-
mental Statistics (Washington: U.S. Department of Commerce,
National Bureau of Standards, 1966), pp. 2-10 to 2-11.

Savage, G.M., and Shiflett, G.R., Processing Equipment for
Resource Recovery Systems: III - Field Test Evaluation of Shred-
ders, Contract No. 68-03-2589, U.S. Environmental Protection
Agency, Cincinnati, OH, 1979.

Trezek, G., Significance of Size Reduction in Solid Waste Manage-
ment, EPA-600/2-77-131, U.S. Environmental Protection Agency,
Cincinnati, OH, 1977.
NOTATION

β       = Constant in sample weight Equation 2.1
d       = Precision (i.e., ½ the confidence interval) desired
g1      = Coefficient of skewness
j       = jth sample in a given week
k       = kth week
k2      = Part of formula for coefficient of skewness
k3      = Part of formula for coefficient of skewness
n       = Number of samples
nk      = Number of samples in the kth week
P       = Fraction of a given component over the year
pjk     = Fraction of a given component in jth sample in kth week
pk      = Average fraction of a given component in kth week
r       = Number of weeks sampled throughout the year
s       = Population standard deviation of given component
sg1     = Standard deviation of g1
std(P)  = Standard deviation of P
std(T)  = Standard deviation of T
t       = t-value at significance level α
T       = Total quantity for the year of a given component
T′      = Log-transform of T
tg1     = t-value for the coefficient of skewness
var(pk) = Variance of pk
var(P)  = Variance of P
var(W)  = Variance of W
var(Xk) = Variance of Xk
W       = Total quantity of waste for the year
x       = Part of formula for coefficient of skewness (the sample mean)
xi      = Part of formula for coefficient of skewness (the ith observation)
Xk      = Total quantity of refuse in the kth week
α       = Level of significance
αa      = Actual level of significance
αn      = Nominal level of significance
APPENDIX A

DERIVATION OF QUANTITY & COMPOSITION
SAMPLING AND ANALYSIS FORMULAS

A.1 PROOF THAT x IS UNBIASED - EQUATION 1.13

To show that Equation 1.13 is an unbiased estimate of the
true mean, μ, let μi be the true mean in the ith sampling inter-
val, τi be the true total for the ith sampling interval, and τ be
the true total for the entire population consisting of h time
periods. Then, from Equation 1.13, and taking expectations, ε, of
both sides,

ε(x) = Σ(mi/N)ε(Σxij/ni) = Σ(mi/N)μi = Στi/N = τ/N = μ

where the outer sums run over the h sampling intervals and the
inner sum runs over the ni vehicles sampled in the ith interval.
A.2 DERIVATION OF VAR(x) - EQUATION 1.16

Let aij be a random variate that takes the value 1 if the
jth unit is in the sample and the value 0 otherwise. The sample
mean of Equation 1.16 may be written as:

x = (1/m)ΣiΣj aij(mi/ni)Xij    [A-1]

where h is the total number of sampling intervals over the
duration of the sampling, the i sum runs over the h intervals,
and the j sum runs over the mi units in the ith interval.
Clearly, for a given i,

Pr(aij = 1) = ni/mi, and Pr(aij = 0) = 1 - ni/mi

Thus, aij is distributed as a binomial variate in a single trial
with p = ni/mi. Hence,

ε(aij) = p = ni/mi    [A-2]

and

var(aij) = pq = (ni/mi)(1 - ni/mi)    [A-3]
To find var(x) we need also the covariance of aij and aik
(j ≠ k). The product, aij·aik, is 1 if both subscripted units are
in the sample, and is zero otherwise. The probability that two
specific units are both in the sample for any given i is clearly
ni(ni - 1)/[mi(mi - 1)]. Hence,

cov(aij,aik) = ni(ni - 1)/[mi(mi - 1)] - (ni/mi)²
             = -(ni/mi)(1 - ni/mi)/(mi - 1)    [A-4]

Applying the propagation of error concept (Equation 1.15) to
Equation A-1 and setting, for convenience, Yij = miXij/ni, we
obtain:

var(x) = (1/m²)Σi[Σj Yij²var(aij) + Σj≠k YijYik cov(aij,aik)]    [A-5]

Completing the square on the cross-product term (using
Σj≠k YijYik = (Σj Yij)² - Σj Yij²) gives:

var(x) = (1/m²)Σi (ni/mi)(1 - ni/mi)[mi/(mi - 1)][Σj Yij² - (Σj Yij)²/mi]
       = (1/m²)Σi (ni/mi)(1 - ni/mi)mi·var(Yij)    [A-6]

But var(Yij) = mi²var(Xij)/ni². Thus,

var(x) = (1/m²)Σi (mi²/ni)(1 - ni/mi)var(Xij)    [A-7]
A.3 PROOF OF EFFICIENCY OF FULL SAMPLING

To show that full sampling is more efficient (i.e., has
smaller variance) than unbiased systematic or random sampling, it
is sufficient to show that the coefficient of the population
variance in Equation 1.10 is greater than that in Equation 1.16,
i.e.,

(1 - f)/n > (1/m²)Σmi²(1 - fi)/ni    [A-8]

Since m = Σmi, fi = ni/mi and f = n/m, inequality A-8 can be
rewritten (after first multiplying both sides by m²) as follows:

(m² - m²n/m)/n > Σ(mi² - mi²ni/mi)/ni

so that

(m²/n) - m > Σ(mi²/ni) - m

and

m²/n > Σmi²/ni    [A-9]

Thus it is necessary only to show that inequality A-9 holds.

We start by noting that in systematic sampling (assuming
that the mi are integral multiples of k), ni′ = mi/k and k = m/n,
where k is the sampling interval. (Since our symbol for the
number of vehicles sampled in the ith interval is ni for biased
sampling, we have used ni′ for its unbiased counterpart.) If the
mi are not integral multiples of k, or if simple random sampling
is used, then the ni′ will not necessarily be equal to mi/k.
However, the expectation of ni′, ε(ni′), is equal to mi/k in such
cases. When sampling to the fullest capacity of our scales, ni
cannot be less than ni′, so ε(ni′) ≤ ni for all i. Consequently,

m²/n = mk = Σmi·k = Σmi²/(mi/k) = Σmi²/ε(ni′) ≥ Σmi²/ni

and inequality A-9 is proved.
A.4 DERIVATION OF VAR(P) - EQUATION 2.15

Since P = f(p1,...,pr,X1,...,Xr) = ΣpkXk/ΣXk, from the
propagation of error formula (Equation 1.15),

var(P) = Σ[(∂f/∂pk)²var(pk) + (∂f/∂Xk)²var(Xk)]

Quickly we see that ∂f/∂pk = Xk/ΣXk. To calculate ∂f/∂Xk, apply
the quotient rule to the kth term:

∂f/∂Xk = [pkΣXk - pkXk]/(ΣXk)² = (pk/ΣXk)(1 - Xk/ΣXk)

Substituting these partial derivatives into the propagation of
error formula yields Equation 2.15.
PROTOCOL
A COMPUTERIZED SOLID WASTE QUANTITY
AND COMPOSITION ESTIMATION SYSTEM
OPERATIONAL MANUAL
BY
ALBERT J. KLEE
RISK REDUCTION ENGINEERING LABORATORY
U.S. ENVIRONMENTAL PROTECTION AGENCY
CINCINNATI, OHIO 45268
MANUAL

PROTOCOL VERSION 1.01

TABLE OF CONTENTS

A. INTRODUCTION
   A.1 FILES AND PROMPTS

B. DATA FILES
   B.1 CONSTRUCTING AND ADDING TO THE QUANTITY DATA FILE (OPTIONS 1 AND 2)
   B.2 CONSTRUCTING AND ADDING TO THE COMPOSITION DATA FILE (OPTIONS 4 AND 5)

C. ANALYZING QUANTITY AND COMPOSITION DATA
   C.1 ANALYZING QUANTITY DATA (OPTION 3)
   C.2 ANALYZING COMPOSITION DATA (OPTION 6)

D. DETERMINING COMPOSITION SAMPLE SIZE (OPTION 7)

E. EDITING PRO-QUAN.DAT OR PRO-COMP.DAT FILES (OPTION 8)
A. INTRODUCTION
A.1 FILES AND PROMPTS
PROTOCOL consists of a single file, PROTOCOL.EXE, and is
invoked merely by entering PROTOCOL at the keyboard. You can,
however, rename this file if you wish. The following is the
PROTOCOL logo.
PROTOCOL
(Solid Waste Quantity & Composition Sampling Protocols)
VERSION 1.01
US ENVIRONMENTAL PROTECTION AGENCY
PROTOCOL creates and uses a number of files with the following
standard (or default) names:
PROTOCOL.OUT - the standard output file for the program,
created by PROTOCOL.
PRO-COMP.DAT - a file that serves as input to the composi-
tion analysis portion of the program. This
file is created by the user.
PRO-QUAN.DAT - a file that serves as input to the quantity
analysis portion of the program. This file
is also created by the user.
QCACCESS.DAT - an internal file that serves as input to the
composition analysis portion of the program.
This file is automatically created by PROTO-
COL.
With the exception of QCACCESS.DAT, if any of the above files
already exist, PROTOCOL warns you of this fact and asks whether
you wish to overwrite them (thus destroying them in the process).
PROTOCOL.OUT can be overwritten if you so wish, but if you do not
want PRO-COMP.DAT OR PRO-QUAN.DAT overwritten, you must exit from
PROTOCOL and save these files by renaming them.
The following is the PROTOCOL Main Menu. Options 1 and 4
construct the PRO-QUAN.DAT and PRO-COMP.DAT files, respectively
(Options 2 and 5 are used for adding to these files), and Options
3 and 6 deal with estimation of waste quantity and composition.
Option 7 determines composition sample size (Section 2.3, Chapter
2), and Option 8 is a full-screen editor for editing the PRO-
QUAN.DAT and PRO-COMP.DAT files created with Options 1 and 4.
OPTIONS
1. CONSTRUCT A QUANTITY DATA FILE
2. ADD TO AN EXISTING QUANTITY DATA FILE
3. ANALYZE SET OF WEEKLY QUANTITY DATA
4. CONSTRUCT A COMPOSITION DATA FILE
5. ADD TO AN EXISTING COMPOSITION DATA FILE
6. ANALYZE SET OF WEEKLY COMPOSITION DATA
7. DETERMINE COMPOSITION SAMPLE SIZE
8. EDIT QUANTITY OR COMPOSITION DATA FILE
9. QUIT
Use up-arrow and down-arrow keys to move bar
to desired selection (or use number to go to
numbered selection). Then press Enter key.
The PgUp and PgDn keys will move the bar to
the top and bottom positions, respectively.
The selected Option is indicated by a highlighted bar (in this
manual, a "<" arrow is used to show the position of the high-
lighted bar).
B. DATA FILES
B.1 CONSTRUCTING AND ADDING TO THE QUANTITY
DATA FILE (OPTIONS 1 AND 2)
When constructing the quantity data file (PRO-QUAN.DAT), you
will be asked to supply a heading for the PRO-QUAN.DAT file (for
identification purposes), e.g.,
ENTER HEADING FOR QUANTITY DATA FILE:
» PROTOCOL QUANTITY DATA EXAMPLE
Data entry follows, first with the entry of the number of vehi-
cles arriving within the first hour of the first week, e.g.,
ENTER NUMBER OF TRUCKS ARRIVING IN HOUR #1, WEEK #1,
» 27
(Press Enter Key if finished entering data for this week.)
then with the entry of the vehicle weights for that hour, e.g.,
ENTER VEHICLE WEIGHTS FOR HOUR 1,
SEPARATED BY A COMMA OR ONE OR MORE SPACES, e.g.,
21232,20395,16930,21020,12046,22739,15322,18205,20397,16724
(When done for this hour, press Enter Key twice.)
21387,17538,18328,23748,26849,19610,16351,21489
Data entry is completed by defaulting on the entry of the number
of trucks in the first hour of a new week, e.g.,
ENTER NUMBER OF TRUCKS ARRIVING IN HOUR #1, WEEK #5,
»
(Starting new week; press Enter Key if finished entering data.)
Upon completion of data entry, the PRO-QUAN.DAT file looks
like that shown in Figure 1. Note that the file consists of a
specific repeating format, i.e., Week-Header/Hour-Header/Data,
etc., where the Week Header consists of three lines, e.g.,

(Blank Line)
WEEK #1
(Blank Line)

the Hour Header consists of two lines, e.g.,

(Blank Line)
HOUR #1: Total Number of Vehicles = 27

and the Data consists of one or more lines using the comma or
space protocol, e.g.,

29586,28773,18276,23038,21520,18205,22013,19610,16351,21489
16889,23985,25805,15029,24396,21078

When editing the PRO-QUAN.DAT file, this format must not be
altered.
Option 2 allows you to add additional weeks of data to an
existing PRO-QUAN.DAT file. Since the file already exists with an
identification file heading, you will not be asked to enter a new
one.
PROTOCOL QUANTITY EXAMPLE DATA

WEEK #1

HOUR #1: Total Number of Vehicles = 27
6889,4597,4267,6148,3530

HOUR #2: Total Number of Vehicles = 29
7541,11807,11806,11574,6797,10515,10718

HOUR #48: Total Number of Vehicles = 63
2692,5960,4700,4744,4639,3750,4498

WEEK #2

HOUR #1: Total Number of Vehicles = 28
13749,4197,4931,5531

HOUR #48: Total Number of Vehicles = 56
5760,3234,3847,5030,3805,5253,3773,5424

WEEK #4

HOUR #1: Total Number of Vehicles = 29
21232,20395,16930,21020,12046,22739

HOUR #48: Total Number of Vehicles = 61
5242,3724,4425,3999,5823,4894

FIGURE 1: EXAMPLE OF A PARTIAL PRO-QUAN.DAT FILE
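Because the file is plain text, it can also be produced or checked
by other programs. The following Python sketch (illustrative only;
PROTOCOL's own reader additionally validates the vehicle counts
and reports offending line numbers) parses a file in the Figure 1
layout:

def read_pro_quan(path):
    weeks, week, hour = {}, None, None
    with open(path) as f:
        next(f)                                   # skip identification heading
        for line in f:
            line = line.strip()
            if not line:
                continue                          # blank separator lines
            if line.startswith("WEEK"):
                week = int(line.split("#")[1])
                weeks[week] = {}
            elif line.startswith("HOUR"):
                hour = int(line.split("#")[1].split(":")[0])
                weeks[week][hour] = []
            else:                                 # comma- or space-separated weights
                weeks[week][hour] += [int(w) for w in line.replace(",", " ").split()]
    return weeks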
B.2 CONSTRUCTING AND ADDING TO THE COMPOSITION
DATA FILE (OPTIONS 4 AND 5)

Data entry for the PRO-COMP.DAT file is similar to that for
the PRO-QUAN.DAT file. You will be asked to supply a heading for
the PRO-COMP.DAT file (for identification purposes), e.g.,

ENTER HEADING FOR COMPOSITION DATA FILE:
» PROTOCOL COMPOSITION DATA EXAMPLE

Data entry follows, first with the entry of the component name,
e.g.,

ENTER COMPONENT NAME:
» FERROUS METALS
(Press Enter Key if finished entering data.)

then followed by entry of the component fractions for each week,
e.g.,

ENTER COMPONENT FRACTIONS FOR WEEK #1,
SEPARATED BY A COMMA OR ONE OR MORE SPACES, e.g.,
.043,.145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135
(When done for this week, press Enter Key once;
When done for this component, press Enter Key twice.)
.003,.003,.002,.170,.125,.005,.002,.002,.001,.005,.000,.156
Data for additional components can be added to this file, until
there are no more components. Option 5 allows you to add data for
additional components to an existing PRO-COMP.DAT file. Since the
file already exists with an identification file heading, you will
not be asked to enter a new one.
Upon completion of data entry, the PRO-COMP.DAT file looks
like that shown in Figure 2. Note that the file consists of a
specific repeating format, i.e., Component-Header/Week-
Header/Data, etc., where the Component Header consists of two
lines, e.g.,
COMPONENT: TOTAL METALS
(Blank Line)

the Week Header consists of two lines, e.g.,

(Blank Line)
WEEK #1:

and the Data consists of one or more lines using the comma or
space protocol, e.g.,

.148,.115,.027,.094,.021,.008,.036,.037,.031,.075,.031,.097
.032,.063,.061,.097,.028

When editing the PRO-COMP.DAT file, this format must not be
altered.

PROTOCOL COMPOSITION EXAMPLE DATA

COMPONENT: TOTAL METALS

WEEK #1:
.043,.145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135
.177,.065,.045,.102,.114

WEEK #2:
.053,.059,.054,.143,.128,.061,.074,.066,.059,.071,.076,.132
.037,.050,.052,.089,.108

WEEK #4:
.148,.115,.027,.094,.021,.008,.036,.037,.031,.075,.031,.097
.032,.063,.061,.097,.028

COMPONENT: TEXTILES

WEEK #1:
.003,.183,.069,.001,.071,.002,.001,.003,.001,.001,.006,.140
.244,.001,.001,.055,.086

WEEK #2:
.003,.003,.002,.170,.125,.005,.002,.002,.001,.005,.000,.156
.003,.002,.004,.035,.088

WEEK #4:
.182,.094,.005,.036,.001,.000,.004,.001,.001,.003,.007,.047
.000,.001,.001,.050,.000

FIGURE 2: EXAMPLE OF A PARTIAL PRO-COMP.DAT FILE
C: ANALYZING QUANTITY AND COMPOSITION DATA
C.1 ANALYZING QUANTITY DATA (OPTION 3)
Option 3 makes all of the calculations, using formulas de-
rived in Chapter 1, to estimate weekly and yearly waste quanti-
ties. A PRO-QUAN.DAT file must exist; otherwise an error message
is issued and the program is terminated. After a check to see if
the PROTOCOL.OUT file already exists (and, if it does, whether
you wish to overwrite it), you will be asked to supply an identi-
fication title for the new PROTOCOL.OUT file that will be creat-
ed, e.g.,
ENTER PROTOCOL.OUT FILE IDENTIFICATION TITLE
» PROTOCOL QUANTITY EXAMPLE
You then will be asked for the significance level for the confi-
dence intervals that will be constructed for the estimates, using
a highlighted bar menu, e.g.,
SIGNIFICANCE
LEVEL FOR
CONFIDENCE
INTERVALS

.01
.05
.10
.20

Another query made at this time concerns the plotting of the
hourly means, e.g.,

SKIP PLOT OF
HOURLY MEANS?

YES
NO

If NO is selected, plots of the hourly means will be
constructed after the end of data reading and analysis. A typical
plot is shown in Figure 3.
PLOT OF WEEKLY MEANS FOR EACH HOUR
(Dotted line = overall mean)

        NUMBER OF    SAMPLE
HOUR    VEHICLES      SIZE
  1         27          5
  2         29          7
  3         65          6
  4        241          8
  5        309          5
  6         91          4
  7         86          8
  8         54          7
  9         27          6
 10         28          6
 11         57          4
 12        263          6
 13        284          6
 14         93          4
 15         93          6
 16         59          8

Minimum = 4,426    Mean = 18,720    Maximum = 25,895
(The plotted points themselves are not reproducible here.)

FIGURE 3: EXAMPLE OF HOURLY PLOT OF MEANS
After selection of the significance level, PROTOCOL will read the data
in the PRO-QUAN.DAT file and, if there are no errors in the file, will
proceed to make the necessary calculations. If an error is found, this
will be reported, e.g.,
ERROR IN DATA FILE, LINE NUMBER 28.
SPECIFICALLY, THERE WAS AN ERROR IN
THE RECORDED WEIGHT OF A VEHICLE
IN HOUR NUMBER 4. THE FOLLOWING
IS THE ERROR LINE:
2123X,20395,16930,21020,12046,22739
EDIT QUANTITY DATA FILE
TERMINATE PROTOCOL PROGRAM
The error in this line consisted of a letter (X) where an integer
was expected. PROTOCOL would have caught this error if Option 1
(or 2) was used to enter the data. This sort of error can occur
only if you use your own editor to prepare the PRO-QUAN.DAT file.
In any event, PROTOCOL gives you the option of terminating the
program, or immediately entering its full-screen editor to cor-
rect the file.
Figure 4 shows one of the weekly calculation summaries
produced by PROTOCOL. After all of the data in the PRO-QUAN.DAT
file have been processed, a yearly summary is prepared (shown in
Figure 5). The weekly and yearly summaries are written to the
PROTOCOL.OUT output file, as well as the plots of the hourly
means if plotting was selected, and the program is then
terminated.

SUMMARY FOR WEEK # 1

NUMBER OF SAMPLING HOURS:                        48
NUMBER OF VEHICLES:                           5,454
NUMBER OF VEHICLES SAMPLED:                     295
AVERAGE NUMBER OF VEHICLES SAMPLED/HOUR:        6.1
VEHICLE SAMPLING FREQUENCY:                   5.41%

ESTIMATED WEEKLY MEAN: 18,720
STANDARD DEVIATION OF ESTIMATED WEEKLY MEAN: 320
CONFIDENCE INTERVAL ABOUT ESTIMATED WEEKLY MEAN AT ALPHA = .05:
18,076 < > 19,364
Effective Degrees of Freedom for Mean: 45
Within-Week Coefficient of Variation: 1.71%

ESTIMATED WEEKLY TOTAL: 102,100,000
STANDARD DEVIATION OF ESTIMATED WEEKLY TOTAL: 1,743,900
CONFIDENCE INTERVAL ABOUT ESTIMATED WEEKLY TOTAL AT ALPHA = .05:
98,586,000 < > 105,610,000

FIGURE 4: EXAMPLE OF A WEEKLY QUANTITY SUMMARY
SUMMARY FOR YEAR
NUMBER OF WEEKS SAMPLED = 4
NUMBER OF SAMPLING HOURS: 192
NUMBER OF VEHICLES: 21,663
NUMBER OF VEHICLES SAMPLED: 1,175
AVERAGE NUMBER OF VEHICLES SAMPLED/HOUR: 6.1
VEHICLE SAMPLING FREQUENCY: 5.42%
ESTIMATED YEARLY TOTAL: 4,975,800,000
STANDARD DEVIATION OF ESTIMATED YEARLY TOTAL: 81,829,000
CONFIDENCE INTERVAL ABOUT ESTIMATED YEARLY TOTAL AT ALPHA = .05:
4,814,100,000 < > 5,137,600,000
ERROR AS % OF ESTIMATED YEARLY TOTAL: 3.25%
Effective Degrees of Freedom for Total: 143
Coefficient of Variation of Total: 1.64%
Between-Week Coefficient of Variation: 5.83%
FIGURE 5: EXAMPLE OF A YEARLY QUANTITY SUMMARY
Also, a special file, QCACCESS.DAT, is written, containing the
weekly and yearly quantity means, standard deviations, and
degrees of freedom. This file is used in the computation of
yearly component quantities, and must not be edited by the user.
C.2 ANALYZING COMPOSITION DATA (OPTION 6)
Option 6 makes all of the calculations, using formulas de-
rived in Chapter 2, to estimate weekly component fractions and
yearly component quantities. A PRO-COMP.DAT file must exist;
otherwise an error message is issued and the program is terminat-
ed. After a check to see if the PROTOCOL.OUT file already exists
(and, if it does, whether you wish to overwrite it), you will be
asked to supply an identification title for the new PROTOCOL.OUT
file that will be created, e.g.,
ENTER PROTOCOL.OUT FILE IDENTIFICATION TITLE
» PROTOCOL COMPOSITION EXAMPLE
As with the quantity analysis, you will be asked for the signifi-
cance level for the confidence intervals that will be constructed
for the estimates, using a highlighted bar menu, e.g.,
SIGNIFICANCE
LEVEL FOR
CONFIDENCE
INTERVALS
.01
.05
.10
.20
After selection of the significance level, PROTOCOL will read the
data in the PRO-COMP.DAT file and, if there are no errors in the
file, will proceed to make the necessary calculations. If an
error is found, this will be reported, e.g.,
ERROR IN LINE NUMBER 20 OF THE COMPOSITION DATA FILE,
i.e., THERE WAS AN ERROR IN THE RECORDING OF SAMPLE DATA
FOR COMPONENT NUMBER 2 IN WEEK NUMBER 3.
THE FOLLOWING IS THE ERROR LINE:
.043.145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135
EDIT COMPOSITION DATA FILE
TERMINATE PROTOCOL PROGRAM
The error in this line consisted of a missing comma between the
first two component fractions. PROTOCOL would have caught this
error if Option 4 (or 5) was used to enter the data. This sort of
error can occur only if you use your own editor to prepare the
PRO-COMP.DAT file. In any event, PROTOCOL gives you the option of
terminating the program, or immediately entering its full-screen
editor to correct the file.
Figure 6 shows one of the component summaries produced by
PROTOCOL. The mean, standard deviation, coefficient of skewness,
and degrees of freedom are calculated for each week and for the
year (the yearly estimate is obtained by weighting using the
waste quantity information in the QCACCESS.DAT file and the
appropriate formulas in Chapter 2). An estimate of the total
yearly component quantity, its standard deviation, and a confi-
dence interval for the estimate are also presented. The signifi-
cance level for the construction of the confidence interval is
automatically adjusted (using Equation 2.8 in Chapter 2 and the
calculated coefficient of skewness for the year). The coefficient
of skewness for the year is obtained by weighting the weekly
values by both the number of weekly component samples and the
weekly waste quantities. If an adjustment (for skewness) in the
significance level is made, the actual significance level used is
reported.
If the QCACCESS.DAT file does not exist, however, only the
weekly component calculations will be made. If a QCACCESS.DAT
file containing a different number of weeks than the PRO-COMP.DAT
file is used, this will be reported, e.g.,

WARNING! NUMBER OF WEEKS IN QCACCESS.DAT FILE NOT EQUAL TO
NUMBER OF WEEKS IN PRO-COMP.DAT FILE. IT APPEARS THAT THE
QCACCESS.DAT FILE BELONGS TO ANOTHER DATA SET. SUGGEST
RE-RUNNING QUANTITY ANALYSIS IN ORDER TO OBTAIN THE PROPER
QCACCESS.DAT FILE. PROCESSING WILL CONTINUE BUT PROGRAM
WILL NOT BE ABLE TO CALCULATE WEIGHTED YEARLY COMPONENT
FRACTIONS OR WEIGHTS.

Note that processing will continue, but only weekly summaries
will be made.
If the QCACCESS.DAT and the PRO-COMP.DAT file contain the
same number of weeks, PROTOCOL has no way of determining if they
"belong" to each other. You will have to make sure that the right
QCACCESS.DAT file is used if you are working on data involving
more than one site. A damaged QCACCESS.DAT file will be detected
by PROTOCOL, e.g.,
WARNING! ERROR IN DATA IN LINE 3 OF THE QCACCESS.DAT FILE!
FOR SOME REASON THIS FILE HAS BEEN DAMAGED. SUGGEST RE-RUNNING
QUANTITY ANALYSIS IN ORDER TO PRODUCE A NEW, ERROR-FREE FILE.
PROCESSING WILL CONTINUE BUT PROGRAM WILL NOT BE ABLE TO
CALCULATE WEIGHTED YEARLY COMPONENT FRACTIONS OR WEIGHTS.
(Press Enter Key to continue.) »
Again, note that processing will continue, but only weekly sum-
maries will be made.

COMPONENT: TOTAL METALS

                     STANDARD     COEFFICIENT
WEEK       MEAN      DEVIATION    OF SKEWNESS    SAMPLE SIZE
  1       .0822        .0105          .18             17
  2       .0772        .0077          .25             17
  3       .0719        .0072          .23             17
  4       .0589        .0095          .20             17
YEAR      .0729        .0045          .22      EFFECTIVE DF = 57.4

ESTIMATED YEARLY TOTAL:                        362,750,000
STANDARD DEVIATION:                             22,982,000
CONFIDENCE INTERVAL ABOUT ESTIMATED YEARLY TOTAL AT ALPHA = .05:
                                     < > 410,790,000
ERROR AS % OF ESTIMATED YEARLY TOTAL:                6.34%
EFFECTIVE DEGREES OF FREEDOM FOR ESTIMATED YEARLY TOTAL = 65.9

FIGURE 6: EXAMPLE OF A COMPONENT SUMMARY

D: DETERMINING COMPOSITION SAMPLE SIZE (OPTION 7)

Option 7 makes all of the trial-and-error calculations,
using formulas derived in Chapter 2, to determine composition
sample size. After a check to see if the PROTOCOL.OUT file al-
ready exists (and, if it does, whether you wish to overwrite it),
you will be asked to supply an identification title for the new
PROTOCOL.OUT file that will be created, e.g.,

ENTER PROTOCOL.OUT FILE IDENTIFICATION TITLE:
» PROTOCOL COMPOSITION SAMPLE SIZE EXAMPLE
You will also be asked to supply a component identification
title, e.g.,
ENTER COMPONENT IDENTIFICATION TITLE:
» FERROUS METALS
Next the level of significance desired is entered, e.g.,
SELECT
LEVEL
OF
SIGNIFICANCE
.01
.05
.10
.20
and then the assumption you wish to make concerning the shape
of the distribution is selected, along with the standard
deviation and the desired precision, e.g.,

ASSUME SKEWED DISTRIBUTION
ASSUME J-SHAPED DISTRIBUTION
PROVIDE COEFFICIENT OF SKEWNESS
MAKE NO DISTRIBUTION CORRECTIONS

ENTER STANDARD DEVIATION: » .0388
ENTER PLUS OR MINUS REQUIRED: » .02
The program then does the trial-and-error calculations described
in Chapter 2, Section 2.3. When a steady-state solution has been
achieved, the results are reported in the following manner:

COMPONENT: FERROUS METALS
at DESIRED PRECISION = .0200
and STANDARD DEVIATION = .0388
FINAL SAMPLE SIZE = 19
(Adjusted for Skew-shaped non-normality.)

You can do additional sample size calculations, either for dif-
ferent components or for the same components with different input
parameters.
E: EDITING PRO-QUAN.DAT OR PRO-COMP.DAT FILES (OPTION 8)
If the EDIT option is selected when errors are reported when
processing quantity or composition information, you will automat-
ically be transferred to PROTOCOL'S full-screen editor. You can
also enter the editor from the Main Menu by selecting Option 8,
"EDIT QUANTITY OR COMPOSITION DATA FILE". Care should be taken
when editing PRO-QUAN.DAT or PRO-COMP.DAT files to preserve the
formats discussed in Sections B.I and B.2.
After entering the editor, the 25th line will display the
following commands,
F1 ABORT  F2 UNDO  F3 MARK  F4 CUT  F5 PASTE  F6 SAVE  F7 DEL EOL  F8 DEL L  F9 UDEL L  F10 ?  INS ON

These commands have the following meanings:

F1 ABORT   - Aborts edit and returns user to entering routine -
             query is "Abandon changes (Y/N)?";
F2 UNDO    - Restores deleted letters;
F3 MARK    - A block of text is defined by toggling this command
             and moving the cursor keys - the marked area is
             shown in reverse video;
F4 CUT     - Removes blocked text to a buffer from which it can
             be pasted (using F5) at any point where the cursor
             is located;
F5 PASTE   - Moves the text in the paste buffer to the cursor
             location;
F6 SAVE    - Saves edited file and returns user to entering
             routine - query is "Save under file name of
             PRO-QUAN.DAT" (or PRO-COMP.DAT);
F7 DEL EOL - Deletes text from cursor position to the end of the
             line;
F8 DEL L   - Deletes line containing cursor;
F9 UDEL L  - Restores the most recent deletion by F7 or F8;
F10 ?      - This is the F10 key - it brings up the secondary
             line-25 commands;
INS ON     - Text is entered in insert mode by default; pressing
             the Ins key toggles to overstrike mode. When in
             overstrike mode, this will read INS OFF.
After pressing the F10 key, the 25th line will display the
following commands,

→  ←  ↑  ↓  Home(ROW BEG)  End(ROW END)  PgUp  PgDn  ^PgUp(F BEG)  ^PgDn(F END)  Ins  Del

These secondary commands have the following meanings:

→             - Moves cursor one character to the right;
←             - Moves cursor one character to the left;
↑             - Moves cursor up one line;
↓             - Moves cursor down one line;
Home(ROW BEG) - Moves cursor to the 1st column in the row;
End(ROW END)  - Moves cursor to the last column in the row;
PgUp          - Moves text up one screen;
PgDn          - Moves text down one screen;
^PgUp(F BEG)  - This is Control-PgUp - moves the cursor to the
                beginning of the file;
^PgDn(F END)  - This is Control-PgDn - moves the cursor to the
                end of the file;
Ins           - Toggles Insert/Overstrike modes;
Del           - Deletes the character to the right of the cursor.
Lines may be broken by pressing the Enter key at any point, and
may be rejoined by pressing the Del key when the cursor is at the
end of a line.
This version of PROTOCOL is configured for an IBM PC/XT/AT
or compatible (the only real restriction is that the computer is
running under MS DOS 2.x or a higher version), with or without a
math coprocessor. (If a math coprocessor is present, PROTOCOL
will utilize it; if one is not present, PROTOCOL will emulate
coprocessor instructions.) Another requirement is that the
driver, ANSI.SYS, must be present in the CONFIG.SYS file.
Included on the distribution disk are three data files: PRO-
QUAN.DAT, PRO-COMP.DAT, and QCACCESS.DAT. These are the data
files used in the examples in this manual.
If any bugs are found or you have any questions about the pro-
gram, please contact:

     Dr. Albert J. Klee
     Risk Reduction Engineering Laboratory
     United States Environmental Protection Agency
     26 King Drive
     Cincinnati, Ohio 45268
     513/569-7493
     FTS 684-7493
Although PROTOCOL has been tested extensively, you may do
something that never has been done before and may unearth a bug
that no one else has found. Please provide a copy of the data
that led to the failure and record as much as you can about the
circumstances of the failure. We can identify and correct prob-
lems only if we can reproduce them.
On "cosmetic" bugs (i.e., bugs that involve screen output),
it would be helpful if a screen dump were included with a de-
scription of the problem. (Use "Shift-PrtSc" to print the screen
to your printer.)