                          PROTOCOL

             A COMPUTERIZED SOLID WASTE QUANTITY
               AND COMPOSITION ESTIMATION SYSTEM

                              by

                         Albert J. Klee

            RISK REDUCTION ENGINEERING LABORATORY
             OFFICE OF RESEARCH AND DEVELOPMENT
            U.S. ENVIRONMENTAL PROTECTION AGENCY
                   CINCINNATI, OHIO 45268

-------
                           DISCLAIMER

     This report has been reviewed by the U.S. Environmental
Protection Agency and approved for publication. Approval does
not signify that the contents necessarily reflect the views and
policies of the U.S. Environmental Protection Agency, nor does
mention of trade names or commercial products constitute
endorsement or recommendation for use.

-------
                           FOREWORD

     Today's rapidly  developing and changing technologies and
industrial products and practices  frequently carry with them the
increased generation of materials that, if improperly dealt  with,
can threaten both  public  health and the environment.  The U.S.
Environmental Protection Agency is  charged by Congress with
protecting the Nation's land, air, and water resources. Under a
mandate of national environmental laws, the  agency strives to
formulate and implement actions leading to a compatible balance
between human activities  and  the  ability of natural systems to
support and nurture life.  These laws direct the EPA to perform
research to define our environmental problems,  measure the im-
pacts, and search for  solutions.

     The Risk Reduction Engineering Laboratory is responsible for
planning, implementing, and managing research, development, and
demonstration programs to provide an authoritative, defensible
engineering basis in support of the policies, programs, and
regulations of the EPA with respect to drinking water, waste-
water, pesticides,  toxic substances,  solid and hazardous wastes,
and Superfund-related  activities.  This  publication is one of the
products of that research  and provides a vital communication link
between the researcher and the user community.

     This report describes a system of  sampling protocols for
estimating the quantity and composition of solid  waste in a  given
location, such as  a landfill site, or at a specific point  in an
industrial or commercial  process,  over a stated  period of  time.
An adequate  estimation of these  elements is essential  to the
design of resource recovery systems and waste minimization pro-
grams, and to the  estimation of the  life of  landfills and the
pollution burden on the land posed by the generation of solid
wastes.  The theory developed in this report takes a significant-
ly different approach from the more traditional sampling plans,
resulting in lower cost and more accurate and precise estimates
of these critical entities. Although the calculations dictated by
these protocols  are tedious, a computer program,  called  PROTOCOL,
has also been developed to do these  calculations, thus  relieving
a great burden  from the  analyst.  The program is designed to be
run on personal  computers  with modest capabilities.

     For further information, please contact the Waste Minimiza-
tion, Destruction  and Disposal Research  Division  of the Risk
Reduction Engineering  Laboratory.


                           E. Timothy Oppelt,  Director
                           Risk Reduction Engineering Laboratory



-------
                            ABSTRACT

     The assumptions  of traditional sampling theory often do not
fit the circumstances when estimating the quantity and composi-
tion of  solid waste arriving at a given location,  such as a
landfill site,  or at  a  specific point in an industrial  or commer-
cial process.  The investigator often has little leeway in the
sampling observation process.  Traditional unbiased random sam-
pling will produce some  intervals of  little or no activity and
others of frenzied activity,  clearly  an inefficient and error-
prone procedure. In addition, there are no discrete entities of
solid waste  composition, such as a basic unit  of  paper or of
textiles, comprising  the  population about which inferences are to
be drawn. Finally, with respect to solid waste composition, the
traditional  assumptions of normality are not valid,  thus preclud-
ing the rote application  of the standard statistical  formulas for
the estimation of sample sizes or the  construction of  confidence
intervals.  This study describes the development  of sampling
protocols for  estimating the  quantity and composition of solid
waste that deal with  these problems.  Since the methods  developed,
although not mathematically complex, are arithmetically tedious,
a computer program (designed to be run  on personal computers with
modest capabilities) was written to carry out the calculations
involved.

-------
                            PREFACE

     Traditional sampling theory generally follows this paradigm:

  SAMPLE SELECTION - SAMPLE OBSERVATION - SAMPLE  ESTIMATION.

Typically,   the  sample  selection process is one  in  which  the
samples are chosen by  an unbiased procedure,  such as simple
random sampling or systematic sampling where it is assumed that
the population is already in random order. Traditional sampling
theory assumes that there are sampling elements, i.e.,  discrete
entities comprising the  population  about which inferences are to
be drawn.  In  the sample observation  (i.e.,  data recording)
stage, it is further  assumed  that observation of the elements of
the sample  is  an independent process,  i.e., that there is no
queue of sample  elements building up,  waiting to be  observed
while an observation on one sample element is being made. Final-
ly, when it  is desired  to place  confidence intervals  about the
estimates made  in the  sample estimation process,  the distribution
of either the population or of the population parameter  estimated
is assumed to  follow a specific classical probability distribu-
tion; typically, the normal distribution is assumed.  In the
sample  selection process,  similar assumptions are made when
determining the number of  samples to be taken.

     Unfortunately, these assumptions  do not fit the circum-
stances when the  problem is to estimate the quantity and composi-
tion of solid waste arriving at a given site,  such  as a  landfill,
transfer station, incinerator, or a specific point in an indus-
trial or commercial process.  For one thing, the sample comes to
the investigator, which  is the reverse of  the situation commonly
described in standard  sampling textbooks.  Since the investigator
has no  control over  the  arrival  of the  sample elements,  the
sample observation process often is far from independent.  Consid-
er a situation where  it is desired to weigh a random  sample of
vehicles arriving at a landfill.  Suppose that it is feasible to
weigh up to 10 vehicles per hour and that the average interar-
rival time  of vehicles is  3.2  minutes. On  the average then, with
either  random or systematic sampling, one  vehicle  would be
weighed every 32  minutes.  If it takes an average  of 10 minutes to
weigh a vehicle, then this is well within the capability of the
sampling system.  Unfortunately,   vehicles do not have uniformly
distributed arrival times. There may be peak arrival periods when
the number of  trucks  arriving and  to  be sampled overwhelms the
weighing capability and one  is forced to  "default"  on weighing
some of the vehicles selected by the sampling plan.   One result
is  that fewer  vehicles are weighed than the sampling plan calls

-------
for, thus reducing the precision of the estimate of solid waste
quantity. More important, however, is that if the load weights of
the defaulted  samples  differ  appreciably from the nondefaulted
samples, bias will be introduced. For example, at many landfills
vehicles arriving toward  the end  of the day  tend to have smaller
load weights  than those arriving  at other times.  Since fewer
vehicles arrive towards the end of the day,  the tendency would be
to oversample  these  lightly loaded vehicles and to undersample
the normally loaded  vehicles arriving at  peak  hours, thus intro-
ducing a bias. The remedy is to correct for the bias in the
estimation stage; such a correction cannot be accomplished with
unbiased samples, but it can be accomplished with biased samples,
using the estimation process to unbias the estimate. (This
technique, by the way, is not unknown in traditional sampling
theory; it is used in making estimates in stratified sampling.)

     Traditional sampling theory  assumes  that  there are sampling
elements, i.e., discrete entities comprising the population about
which inferences are  to be drawn. When it comes to sampling solid
waste for composition,  however, there are no such  discrete enti-
ties. There is,   for example,  no such thing as a basic unit of
paper or of textiles. Thus, sampling procedures based upon dis-
crete distributions  (such as the multinomial or binomial) are not
valid. Nonetheless,  some basic  unit weight  of sample  must be
defined. In traditional  cluster sampling theory,  a balance is
achieved between  the within-cluster  and between-cluster compo-
nents of the total variability of an estimate. If  the cluster
(i.e., in this context, a sample of given weight) is too small,
then the between-cluster variability will  be greater  than the
within-cluster variability and  will  result  in  a  large sample
variability. The larger the cluster weight, however, the greater
will be the time and expense of sampling. Further compli-
cating this situation, the optimal sample  weight is related to
the size of the particles in the sample.

     Finally,  although assumptions that  solid waste quantities
follow normal distributions are justifiable  in the estimation of
solid waste quantity,  such is  not the case in estimating solid
waste composition.   For  one  thing, composition fractions  are
bounded, i.e., there are no components  in  solid waste  that are
present  in fractions less than zero  or  greater than one.  These
boundaries are generally located close to  the means  of their
distributions.  Thus,  solid waste  component distributions are, at
the very least,  positively-skewed (i.e., skewed to the right)
and, at  worst,  are J-shaped.  Nor does reliance on  the Central
Limit Theorem  of statistics help much,   since even  averages of
component fractions do not  approach normality quickly,  at least
not within an economically  feasible number of  samples. Distribu-
tions of component averages still tend to be positively-skewed.
This characteristic precludes the rote application of the  tradi-
tional statistical formulas for either the estimation  of  sample
size or the construction  of confidence intervals.  Although  trans-
formations can be used to construct asymmetric confidence inter-
vals, these are of little help when estimating sample size. A
knowledge of the effect of positive skewness  on the actual level
of significance of a  confidence interval,  however, can be of help
in determining the number of samples to take.

     The purpose of this  study was to develop  sampling protocols
for estimating solid  waste quantity and composition to  solve the
problems enumerated above. This  included  both sampling  and esti-
mating procedures. Since  these methods,  although not mathemati-
cally complex,  are arithmetically tedious,  a computer program
(designed to be run on personal computers with modest  capabili-
ties) was developed to carry out all of the calculations required
by the protocols. The program, called PROTOCOL, also contains
routines  that check  input data for errors  in  coding, and an
editor for preparing  and  modifying input data files.

-------
                           CHAPTER 1

                       QUANTITY ESTIMATION
1.1  INTRODUCTION

     This study  is  concerned with innovative quantitative  ap-
proaches to sampling solid waste streams for quantity and compo-
sition, and with methods for estimating these values  for a given
waste  shed  over  a period of time. In  addition, computer-based
programs  have been  created for  implementing  the statistical
models selected for  the determination of sample sizes for quanti-
ty and composition, and for producing  the estimates  (along with
measures of their uncertainties) of quantity and composition once
the data are collected.

     There are two basic paths to the  estimation of  solid waste
quantity and composition:  (1) direct measurement, and (2) predic-
tive models. Within both approaches there are many  variations on
a theme. In direct measurement, for example,  one can measure at
the point of generation (at each house, commercial establishment,
etc.) or at the point of destination (at a landfill, incinerator,
community recycling center, etc.). The former,  however,  is much
more costly and time consuming, and the sampling protocol or plan
is more complex to design  and implement than in point of destina-
tion methods. Therefore, point  of generation methods  are neither
economically feasible nor of sufficient  accuracy for waste shed
predictions.

     Predictive models  rely on surrogate or  ancillary  measure-
ments to estimate waste quantity or composition.  On the one hand
there are the Leontief-type input-output models in which the
materials entering a waste shed are placed on one dimension of a
matrix, with their waste products placed on  the other dimension.
Using suitable conversion rates for each cell in the matrix  and
applying certain mathematical operations (such as matrix inver-
sion), it is possible to estimate the quantities of the wastes
produced.   Although  of some value on a global or strategic level,
such approaches are infeasible for local levels because (1)  the
data bases simply are not available, and (2)  the degree  of accu-
racy achieved is inadequate for local  objectives.  Other predic-
tive models are those in  the  form of  equations (usually of  the
regression type)  that relate quantity  or composition  to  selected
independent or predictor variables. (The simplest example equates
quantity to the product of waste generated per  capita and total
population.)  However,  many variables affect waste  generation:
geography,  climate,  income level, local ordinances,  etc.  To
obtain models with any statistical validity,  much data would have
to be gathered all over the country to estimate the parameters
(constants) of the model. Also, the significance of the individu-
al predictor variables and  the values  of  the parameters would be
expected to change with time. Therefore, such models would have
to be maintained, a costly procedure that has little chance for
support, even by government agencies. It appears,  therefore, that
the most  pragmatic approach  to  the estimation  of solid waste
quantity and  composition is  direct  measurement at  the point of
destination.

     Upon examination,  ordinary sampling  techniques  are inappro-
priate for the estimation of the quantity or composition of waste
arriving at a destination site. For one thing, the  sample comes
to the investigator, which is usually the reverse of the situa-
tion commonly described in standard statistical sampling texts.
Once a scale is rented and the labor hired to make  the measure-
ments,  it makes good economic sense  to use the equipment and
labor to  the fullest extent  possible.  This is  totally  unlike
survey sampling where  the variable cost of an interview  is usual-
ly greater than the fixed cost (before sampling). In statistical
sampling, the elements of a population are defined  jointly with
the population itself,  i.e., the elements are  the basic units
that comprise and define  the population.  In the  solid waste
composition case,  however,  a truckload  is simply  too large to
serve as a  single  sampling unit.   Furthermore,  in  sampling for
composition the situation  resembles  (but is not identical to)
multinomial sampling  for which there  is  not much discussion in
the statistical texts. Solid waste  quantity and  composition
exhibit strong seasonal effects that must be  taken into  consider-
ation in  the sampling protocols, another  topic  not  commonly
encountered in the textbooks. Furthermore, traditional  survey
sampling methodology usually attempts to select  unbiased samples.
It can be shown,  however,  that it is more efficient  to  select
biased samples  and then correct  for the  bias  in the estimation
formulas employed.  For all of these reasons,  it  is appropriate to
take a  fresh look at the  problems of  estimating  solid waste
quantity and composition.
1.2  SOME BASIC STATISTICAL CONSIDERATIONS

     In statistical sampling, the elements  of  a population are
defined jointly with the population  itself,  i.e.,  the elements
are the basic units that comprise and define  the population. When
sampling for solid waste quantity, the population elements usual-
ly can be taken as the individual vehicle-loads.  ("Vehicle-load"
is used here  to stress the fact that, for example,   one might
observe 500  loads  delivered in  one  day by only 100  different
vehicles.  For simplicity, however, vehicle and  vehicle-load will
be used interchangeably in this study.)  A "parameter" is a numer-
ical quantity that, perhaps with other parameters,  defines the
population completely.  Suppose that the weight  of solid waste in
a vehicle is distributed normally with mean μ and standard devia-
tion σ. Then μ and σ are the parameters of the distribution and, taken
together, completely define the distribution.

     Precision and  accuracy are  two  terms that are relevant to
the parameters of  a  population; more specifically,  they are
applied to estimates of the parameters. A single estimate is
accurate if  it is  close  to the  true  value, and a collection of
estimates is accurate if  their average value is close to the true
value. The difference between  the estimate (or average  estimate)
and the  true value is known  as the "bias".  Precision, on the
other hand,  refers  to the closeness  of multiple estimates  of  a
parameter.  If the  estimates do  not  vary much among themselves,
the method of estimation is  said to be  "precise". The concept of
precision is inversely  related  to that of variance or  standard
deviation  in that the greater  the  precision,  the smaller the
variance or  standard  deviation  of the  estimate. These  concepts
are illustrated by the distributions shown in Figure 1. Each
individual drawing represents the distribution of repeated esti-
mates of some true value, T, and it  is  assumed that the estimate
of T is taken as the mean of the distributions shown.   It will be
noted that the accurate  distributions  (A  and  C) are centered on
the true value, T,  i.e.,  the bias is  zero. The precise  distribu-
tions  (A and B)  are those  with less dispersion,  i.e., smaller
standard deviation. Obviously, estimates  that are  both  unbiased
and of high precision are preferred.

     Accuracy is affected mainly by  the selection process,  i.e.,
the way by which the  sampling units are selected;  precision, on
the other hand,  is  affected mainly by the measurement process and
the sample size.  Since the most  difficult  aspect of any protocol
for estimating solid waste quantity  involves the selection  proc-
ess, the major problem perforce is one of accuracy.
1.3  THE SAMPLE SELECTION PROCESS

     It should be clear that if  a  destination site has or can be
fitted  with scale facilities that permit all vehicles to be
weighed, there is no  sample  selection (or estimation)  problem. It
is assumed here  that this is  not  the  case. For the moment, the
problems  of trend and cyclical or seasonal variation of the
quantity of solid waste delivered to the site will be deferred,
and it  will  be assumed that  sampling is to take  place over a
period  of one week.  (If  the site only operates x  days a week,
then a one-week sample  involves  sampling on each of the  x days).
When sampling over this  period, variations  in quantity due to
hour-to-hour and day-to-day  differences are accounted for in the
estimate.  Week to week  differences, however, are far  less impor-
tant than month to month  differences.  Therefore, it makes little
statistical sense to sample for periods of two or more  consecu-
tive weeks. In order to achieve maximum sampling efficiency in a
statistical sense, the one-week sampling periods must be spread
throughout the year. This will be discussed in the next section.

[FIGURE 1: CONCEPTS OF ACCURACY AND PRECISION. Four distributions
of repeated estimates of a true value T: (A) accurate and precise;
(B) inaccurate but precise; (C) accurate but imprecise; (D)
inaccurate and imprecise. σ = standard deviation.]

     The problem of selection bias can be addressed in either of
two ways:

     1. Make  no assumptions about the nature of  the arrivals  of
        the vehicle,  and take a random sample, or
     2. Assume that the vehicles arrive in random order and  take
        a systematic sample, i.e.,  one in which every kth vehicle
        arriving at the site is weighed.

The second approach is attractive for two reasons:  (1) the proce-
dure by which the trucks are selected for the sample is relative-
ly simple, and  (2)  if we are interested in  separate estimates
for different types of  vehicle (different sizes of trucks, com-
mercial  versus residential,  etc.),  the systematic  sample can
easily yield a proportionate sample,  which has statistical advan-
tages that will be explained later.

     The basic  problem  with systematic  sampling has  to do with
any departure from the assumed randomness in the arrival of the
vehicles. These departures are of two kinds:

     1. A  monotonic trend may exist  in the  weights of  the
        loads, e.g.,  the loads may increase  with  time  over
        the  week. Since a systematic sample consists of  a
        random start followed by sampling each kth truck
        afterwards,  the estimate will depend on the  random
        start within the first interval.  In  Figure 2A,  the
        low random start (solid dots)  will produce a  lower
        estimate than the high random start  (open circles).
        The  estimates  in these two  cases will   be  biased
        either low (solid dots)  or high (open circles).

     2. A  cyclical  or  periodic trend may  exist  in  the
        weights  of the loads. In Figure 2B, if the  random
        start happens to fall at the  top of  a cycle  (solid
        dots),  the estimate will be  high; if it  falls  at
        the  bottom (open circles),  it will  be low.  Again,
        in either case the estimates  will be biased.

It does not appear, however, that either of these events poses a
real problem. The interval between vehicles is so short that
neither monotonic nor periodic effects would influence the esti-
mate significantly.  Furthermore,  simply changing  the random start
each day would average out the effect of any single start.

     As was mentioned,  systematic sampling  consists of sampling
every kth vehicle after a random start. The random start,  r, is
the rth  vehicle in the arriving vehicle sequence where r is a
number, chosen at random, between 1 and k. The succeeding vehi-
cles to be sampled are k+r, 2k+r, 3k+r, etc. The random start
from 1 to k imparts to each vehicle the selection probability
1/k = f, where f is known as the sampling frequency. If we know the
total number of vehicles, N, arriving during the sampling period,
the total sample size, n, is given as n = fN = N/k. (Note that n
will be an integer only if N is an integral multiple of k.)
[FIGURE 2A: MONOTONIC TREND and FIGURE 2B: CYCLICAL TREND. Both
plot weight of vehicle-load against increasing time. In 2A a low
random start (solid dots) yields a low estimate and a high random
start (open circles) a high one; in 2B the same occurs when the
start falls at the bottom or top of a cycle.]

     Although the concept  of systematic sampling  is  relatively
simple, a problem arises when sampling is weighing scale-limited.
For example, suppose it is known that approximately 750 vehicles
arrive at a site over a five-day period,  and a sample  size of 75
is desired. Since k=N/n,  the sampling interval, k, is every 10
vehicles (after a random start between 1 and 10). Suppose further
that it is practicable to weigh up to 10 vehicles per hour. The
average interarrival  time  of vehicles is  (5  days)(8  hr/day)(60
min/hr)/750=3.2 minutes. On the  average,  then, one  vehicle would
be weighed every 32  minutes,  apparently well within the capabili-
ty of the weighing  system. Unfortunately,   vehicles do not have
uniformly distributed arrival times.  There may be  peak arrival
periods when  the number of  trucks arriving  and  to be  sampled
overwhelms the weighing capability. One is forced to "default" on
weighing some  of the vehicles selected by the sampling plan. This
has two effects. One  result is  that fewer vehicles are weighed
than the sampling plan called for, thus reducing the precision of
the estimate of solid  waste quantity. Second, if the load weights
of the defaulted samples differ appreciably from the nondefaulted
samples, bias will be introduced. For example, at many landfills
vehicles arriving toward the end of the  day tend to have smaller
load weights  than those arriving at other times (often this is
simply a policy not to have vehicles stand overnight with refuse
in them).  Since fewer  vehicles arrive towards the end of the day,
the tendency would  be  to oversample the  lightly loaded vehicles
and to undersample the normally  loaded vehicles. Thus,  a bias is
introduced.

     The selection  of  a random  sample is more complicated than
that of a systematic sample. Assuming  a selection probability
equal to that of a systematic sample, a random sample is sampled
with probability f = 1/k. If we desire a 10% sample (i.e., f = 0.10),
then we must consider  each  vehicle entering the site and, using a
probability generator of some sort,  decide whether  to  sample it.
(For example,  use a table  of random numbers from 1-1000; if the
random number fell below 101, sample the vehicle; otherwise let it
pass.) Not  only is  this more complicated than systematic sam-
pling, it may  result in a greater number of defaults since random
samples "bunch up" more than systematic samples. Thus, the sys-
tematic sample has a number of advantages over random sampling
and is the method of choice in this study.
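For contrast, a minimal sketch of the random selection rule just
described, using the 1-1000 random number device from the text
(again an illustration, not PROTOCOL code):

    import random

    def random_picks(N, f, seed=None):
        """Random sample: for each arriving vehicle draw a number from
        1-1000 and sample the vehicle if the draw is at most 1000*f
        (for f = 0.10, a draw of 100 or less)."""
        rng = random.Random(seed)
        return [v for v in range(1, N + 1)
                if rng.randint(1, 1000) <= int(1000 * f)]

    picks = random_picks(750, 0.10, seed=1)
    print(len(picks))   # near 75 on average, but it varies from run to run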
1.4  ESTIMATION OF MEANS,  TOTALS, AND VARIANCES

     For a  simple random sample, or  for a systematic sample
where it can be assumed that the population contains neither
significant trend nor significant cyclical components, the mean,
x, is estimated by Equation 1.1,

$$\bar{x} = \sum_{i=1}^{n} w_i X_i = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad [1.1]$$


where X_i is the ith observation and the weight, w_i, is equal to
1/n where n is the total number of observations. The total, X, is
estimated by Equation 1.2,

$$X = N\bar{x} \qquad [1.2]$$

where N is the total number of elements in the population  for the
time sampled. The variance of the  individual  observations is
computed from Equation 1.3,
$$\mathrm{var}(X_i) = \frac{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2/n}{n-1} \qquad [1.3]$$

and the variance of the mean and of the total  are,  respectively,

$$\mathrm{var}(\bar{x}) = \frac{1-f}{n}\,\mathrm{var}(X_i) \qquad [1.4]$$

$$\mathrm{var}(X) = N^2\,\mathrm{var}(\bar{x}) \qquad [1.5]$$


where (1-f) is the  finite sampling correction factor and f=n/N.
These equations are well-known and comprise the basic relation-
ships  for simple random sampling within a  finite population (see
Kish, 1965).
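These estimators translate directly into a few lines of code. The
Python sketch below (an illustration under the assumptions above,
not a transcription of PROTOCOL) computes Equations 1.1 through 1.5
for a list x of sampled load weights drawn from N arrivals:

    def simple_estimates(x, N):
        """Equations 1.1-1.5 for a simple random (or trend-free
        systematic) sample x of load weights from N vehicle-loads."""
        n = len(x)
        f = n / N                                     # sampling frequency
        mean = sum(x) / n                             # [1.1]
        total = N * mean                              # [1.2]
        var_load = (sum(v * v for v in x)
                    - sum(x) ** 2 / n) / (n - 1)      # [1.3]
        var_mean = (1 - f) / n * var_load             # [1.4]
        var_total = N ** 2 * var_mean                 # [1.5]
        return mean, total, var_load, var_mean, var_total

    # Hypothetical data: four load weights (lbs) sampled from 40 arrivals.
    print(simple_estimates([12400.0, 15100.0, 9800.0, 14200.0], N=40))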
     We can restate Equation 1.1 for the estimation of a mean by
grouping, and then  summing,  the observations on an hourly basis,
i.e.,

$$\bar{x} = \sum_{i=1}^{h}\sum_{j=1}^{n_i} w_i X_{ij} = \frac{1}{n}\sum_{i=1}^{h}\sum_{j=1}^{n_i} X_{ij} \qquad [1.6]$$

where X_ij is the jth observation in the ith hour, n_i is the
number of observations in the ith hour, h is the number of hours,
n is the sum of the h n_i's, and w_i = 1/n. We note that, drawing
an analogy with Equation 1.3, the variance of X_ij in interval i
for all intervals where n_i is not equal to 1 is:

$$\mathrm{var}(X_{ij}) = \frac{\sum_{j=1}^{n_i} X_{ij}^2 - \left(\sum_{j=1}^{n_i} X_{ij}\right)^2/n_i}{n_i - 1} \qquad [1.7]$$
The variance over all X_ij is the weighted (by the number of
vehicles sampled in each hour) average of the h hourly variances,
i.e.,

$$\mathrm{var}(X_{ij}) = \frac{1}{n}\sum_{i=1}^{h} n_i\,\mathrm{var}_i(X_{ij}) \qquad [1.8]$$
Thus, the estimate of the variance of an individual load now
becomes:

$$\mathrm{var}(X_{ij}) = \frac{1}{n}\sum_{i=1}^{h} n_i\,\frac{\sum_{j=1}^{n_i} X_{ij}^2 - \left(\sum_{j=1}^{n_i} X_{ij}\right)^2/n_i}{n_i - 1} \qquad [1.9]$$
(for all intervals where n_i is not equal to 1), and Equation 1.4
is slightly altered to:

$$\mathrm{var}(\bar{x}) = \frac{1-f}{n}\,\mathrm{var}(X_{ij}) \qquad [1.10]$$
Equations 1.2 and 1.5 remain unchanged. Note that when using
Equation 1.9, if n_i = 1 the datum for that hour cannot be used in
the calculation. Also note that the "hour" interval used in these
equations  really can  be  any time unit,  e.g.,  half-hours,  etc.
Furthermore,  even if an hour is selected as the interval,  it need
not begin on the hour (e.g., the first hour could be  7:30 AM to
8:30 AM, with the second hour 8:30 AM to 9:30 AM,  etc.).
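The hourly grouping is equally direct in code. The sketch below
(illustrative only) applies Equations 1.6 through 1.10 to a list
with one inner list of sampled weights per hour; following the note
above, hours with a single observation are dropped from the
variance, and the divisor n is restricted accordingly (an
assumption about how the dropped hours are handled):

    def hourly_estimates(hours, N):
        """Equations 1.6-1.10: 'hours' is a list of lists of load
        weights, one inner list per hour; N is the week's arrivals."""
        n = sum(len(h) for h in hours)
        mean = sum(sum(h) for h in hours) / n           # [1.6]
        pooled, used = 0.0, 0
        for h in hours:
            ni = len(h)
            if ni < 2:
                continue                                # n_i = 1: datum unusable
            s2 = (sum(v * v for v in h)
                  - sum(h) ** 2 / ni) / (ni - 1)        # [1.7]
            pooled += ni * s2                           # weighted by n_i
            used += ni
        var_load = pooled / used                        # [1.8]/[1.9]
        var_mean = (1 - n / N) / n * var_load           # [1.10]
        return mean, var_load, var_mean

    # Hypothetical week: three hours of sampled loads out of N = 120.
    print(hourly_estimates([[13000.0, 12500.0], [16000.0],
                            [9000.0, 9800.0, 9500.0]], N=120))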
1.5  CORRECTION FOR DEFAULTING (UNDERSAMPLING)

     We first consider the case where undersampling takes place
because of burdens placed upon the weighing  system,  i.e., al-
though the sampling plan  calls  for specific sample sizes to be
obtained each hour, the weighing system cannot keep up with the
requirements  and we default on  taking part of the  sample. If

-------
                                                          Page 10
there is lack of randomness in the vehicle-load data (as there
would be, for example, if the loads tended to weigh less toward
the end of the day or if certain-sized vehicles tended to arrive
at a destination site at particular times of the day), the equal
weighting (w_i = 1/n) of the observations in Equation 1.6 would
bias the estimate of the mean.

     Where there  are no constraints  on the  weighing system,
random or systematic sampling can  be  considered equivalent  to
proportionate sampling  where the  number of samples  in  a sampling
stratum (the ith hour,  in this case) is proportional to the size
of the stratum. Thus,

$$n_i = f\,m_i \qquad [1.11]$$

where m_i is the total number of vehicles arriving in the ith
hour. Note, however, that in this case,

$$w_i = 1/n = 1/(fN) = 1/(f_i N) \qquad [1.12]$$
Introducing these new weights of Equation 1.12 into Equation 1.6,
with f_i = n_i/m_i now varying from hour to hour, we obtain:

$$\bar{x} = \sum_{i=1}^{h}\sum_{j=1}^{n_i} \frac{X_{ij}}{f_i N} = \sum_{i=1}^{h} w_i'\,\bar{x}_i \qquad [1.13]$$

where x̄_i is the mean in the ith hour, and w'_i = m_i/N. The important
observation to be made here is that it can be shown (Appendix
A.1) that, even with defaulting, Equation 1.13 produces an un-
biased estimate of the true mean. With the hourly weights n_i/n of
Equation 1.9 replaced by m_i/N, Equation 1.9 now becomes:
$$\mathrm{var}(X_{ij}) = \frac{1}{N}\sum_{i=1}^{h} m_i\,\frac{\sum_{j=1}^{n_i} X_{ij}^2 - \left(\sum_{j=1}^{n_i} X_{ij}\right)^2/n_i}{n_i - 1} \qquad [1.14]$$
The propagation of error formula states that:

$$\mathrm{var}[f(x_1, x_2, \ldots, x_n)] = \sum_{i=1}^{n}\left(\frac{\partial f}{\partial x_i}\right)^2 \mathrm{var}(x_i) + \sum_{i=1}^{n}\sum_{j \ne i} \frac{\partial f}{\partial x_i}\frac{\partial f}{\partial x_j}\,\mathrm{cov}(x_i, x_j) \qquad [1.15]$$
Applying the propagation of error formula to Equation 1.13 (and
assuming that the covariance terms are zero) we obtain:

$$\mathrm{var}(\bar{x}) = \frac{1}{N^2}\sum_{i=1}^{h} \frac{m_i^2\,(1 - n_i/m_i)}{n_i}\cdot\frac{\sum_{j=1}^{n_i} X_{ij}^2 - \left(\sum_{j=1}^{n_i} X_{ij}\right)^2/n_i}{n_i - 1} \qquad [1.16]$$
for all intervals where n_i is not equal to 1. The rather compli-
cated derivation of Equation 1.16 is given in Appendix A.2. Equa-
tions 1.2 and 1.5 still remain unchanged. Note that this method
of sampling requires another data set in addition to that re-
quired by ordinary systematic sampling, i.e., the total number of
vehicles arriving each hour. Note that another way of describing
the variability of x̄ is in terms of its coefficient of variation,
i.e., the standard deviation as a fraction of the mean. Thus the
within-week coefficient of variation (expressed as a percentage),
c_w, of x̄ is

$$c_w = 100\,[\mathrm{var}(\bar{x})]^{1/2}/\bar{x} \qquad [1.17]$$

Typical values for this coefficient of variation range, depending
upon the number of trucks sampled during a sampling week, from 2
to 5%. A confidence interval around the mean, x̄, is

$$\bar{x} \pm t\,[\mathrm{var}(\bar{x})]^{1/2} \qquad [1.18]$$

where t is the Student t-value at some significance level, α, and
degrees of freedom, df. Defining "error" as one-half of this
confidence interval, i.e., t[var(x̄)]^{1/2}, the error of the estimate
of x̄ expressed as a percentage of x̄ is given by:

$$E = 100\,t\,[\mathrm{var}(\bar{x})]^{1/2}/\bar{x} \qquad [1.19]$$



     According to Bennett and Franklin (1954), if MS is a mean
square and

$$MS = a_1\,MS_1 + a_2\,MS_2 + a_3\,MS_3 + \ldots$$

where MS_i is based upon d_i degrees of freedom, then the effective
degrees of freedom, EDF, for MS is:

$$EDF = \frac{MS^2}{\sum_i a_i^2\,MS_i^2/d_i} \qquad [1.20]$$

Defining, for all i where n_i is not equal to 1,

$$MS_i = \frac{\sum_{j=1}^{n_i} X_{ij}^2 - \left(\sum_{j=1}^{n_i} X_{ij}\right)^2/n_i}{n_i - 1} \qquad [1.21]$$


and applying the Bennett and Franklin relationship, Equation
1.20, to Equation 1.16, the effective degrees of freedom, EDF,
for constructing confidence intervals about means or totals for
all intervals is:

$$EDF = \frac{\left(\sum_i a_i\,MS_i\right)^2}{\sum_i a_i^2\,MS_i^2/(n_i - 1)} \qquad [1.22]$$

where a_i = m_i^2(1 - n_i/m_i)/(n_i N^2) and the sums are over all i
where n_i is not equal to 1, and N = Σm_i where the sum is over all
i where n_i is not equal to 1. This value of EDF should be used for
the degrees of freedom when determining the t-value for Equations
1.18 or 1.19.
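Equations 1.13, 1.16, 1.21, and 1.22 work together, so a single
sketch can show the whole default-corrected calculation. The Python
below is an illustration only (the hour counts m_i and per-hour
weight lists are hypothetical inputs); per the notes above, hours
with n_i = 1 are excluded from the variance and from the N used in
the a_i:

    def default_corrected(per_hour):
        """per_hour: list of (m_i, weights_i) pairs, one per hour, where
        m_i is the number of vehicles that arrived in the hour and
        weights_i holds the loads actually weighed (at least one per
        hour is assumed).  Returns the unbiased mean [1.13], its
        variance [1.16], and the effective degrees of freedom [1.22]."""
        N = sum(m for m, _ in per_hour)                 # all arrivals, for the mean
        Nv = sum(m for m, x in per_hour if len(x) > 1)  # usable hours, for a_i
        mean = sum(m / N * sum(x) / len(x) for m, x in per_hour)     # [1.13]
        var_mean, edf_den = 0.0, 0.0
        for m, x in per_hour:
            ni = len(x)
            if ni < 2:
                continue                                # n_i = 1 hours are dropped
            ms = (sum(v * v for v in x)
                  - sum(x) ** 2 / ni) / (ni - 1)        # MS_i [1.21]
            a = m * m * (1 - ni / m) / (ni * Nv * Nv)   # a_i of Equation 1.22
            var_mean += a * ms                          # [1.16]
            edf_den += (a * ms) ** 2 / (ni - 1)
        return mean, var_mean, var_mean ** 2 / edf_den  # EDF [1.22]

    # The error [1.19] is then E = 100*t*var_mean**0.5/mean, with t taken
    # from a Student t table at EDF degrees of freedom.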
1.6  FULL SAMPLING

     Typically,  depending upon  the method  of readout  and the
number of axles involved, it may require up to 30-45 minutes to
weigh one truck using wheel scales. However,  platform scales are
frequently 20 to 30  times as fast. When  using platform scales,
then, it will be possible to oversample  some  intervals, if not

all of them. As was mentioned  previously, once a scale is rented
and the  labor hired to  make  the measurements, it makes  good
economic sense to use the equipment  and labor to the  fullest
extent possible.  Since some  intervals may be  oversampled and
others undersampled,  sampling to the fullest capacity  of the
weighing system is defined in this study as  "full sampling". Full
sampling means that, when finished weighing the current vehicle,
we sample the next available vehicle.  Within a reasonably short
sampling interval  (e.g.,  one-half or one hour) it can be assumed
that arrivals are random;  indeed,  this is  the assumption of
traditional systematic sampling.  Under such circumstances,  sam-
pling the next available  vehicle  is identical to systematic sam-
pling with frequent, but random, starts.

     Two questions then arise:  (1) How are the estimates in full
sampling adjusted  for  bias?,  and  (2)  Is there any advantage to
full sampling? The answer to the first  question is simply that we
use the same equations as for defaulting or undersampling, i.e.,
Equations 1.13 and 1.16 (the equations for totals, 1.2 and 1.5,
also apply). As for the second question, when we apply full
sampling, the variance  of the  estimated mean or total decreases.
(The proof of this is given in Appendix A.3.)  Thus, assuming that
the scales are rented for a given time period and that no extra
people need be hired, it always pays  to sample fully.

     The use of these  formulas is illustrated in  Table  1 where
data from a random or systematic sample are presented for an 8-
hour sampling period from a population  with  true mean = 18605.55.
(The population consisted of  eight triangular distributions in
which the mean started  low at hour one, then rose to a maximum at
hours four and five, and  then  finished  low at hour eight.) For a
desired  sampling  frequency,  f,  of 0.1, the number of  trucks
sampled each hour  would have to be 1/1/2/8/10/3/3/2, respective-
ly. However, it was assumed that it  was not physically possible
to sample more than six trucks in any one-hour period; therefore,
the  actual  number  of  trucks  sampled  each  hour   was
1/1/2/6/6/3/3/2, respectively. Since  the sample  is  biased,  the
estimate of 17560.46 for  the mean obtained by using Equation 1.1
is also biased. Using  Equation 1.13, however, the unbiased esti-
mate of the mean  is 18774.53, which is much  closer  to the true
value of 18605.55. Note that 24 trucks were sampled
(1+1+2+6+6+3+3+2=24). If this were simple random sampling, the
degrees of freedom would be 24-1 or 23. Since the effective
degrees of freedom is 10.24, the efficiency of the degrees of
freedom is 10.24/23 = 0.4452 or 44.52%.
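The Table 1 arithmetic can be checked from the hourly totals alone.
The sketch below assumes m_1 = m_2 = 10 for the first two hours
(implied by f = 0.1 and the one truck called for in each, although
Table 1 leaves those entries blank):

    m  = [10, 10, 20, 80, 100, 30, 30, 20]        # arrivals/hour (first two assumed)
    n  = [1, 1, 2, 6, 6, 3, 3, 2]                 # trucks actually weighed
    sx = [5734.19, 10155.29, 36321.28, 144738.20, # column {4}: hourly weight sums
          140308.40, 45900.44, 31825.14, 6468.04]

    N = sum(m)                                    # 300
    unbiased = sum(mi / (ni * N) * s
                   for mi, ni, s in zip(m, n, sx))  # Equation 1.13
    biased = sum(sx) / sum(n)                       # Equation 1.1, unweighted
    print(round(unbiased, 2), round(biased, 2))     # 18774.53  17560.46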

     Table 2 shows the  averages of 1000 daily  samples (i.e.  1000
days of Monte Carlo simulation) from  the same  population used for
Table  1  for three different  cases:  (1) Random or systematic
sampling, no scale constraints, (2) Random or  systematic sampling
with scale constraints (only six vehicles can be weighed in any
hour),  and  (3) Full sampling  (again assuming that  six vehicles

-------
         TABLE 1: CALCULATIONS FOR EXAMPLE OF SECTION 1.6

 {1}   {2}   {3}       {4}           {5}            {6}            {7}
  i    mi    ni       ΣXij          ΣXij²      {5}-{4}²/{3}    {6}/({3}-1)
  1     -     -      5734.19            -              -              -
  2     -     -     10155.29            -              -              -
  3    20     2     36321.28    659856700      239110.10      239110.10
  4    80     6    144738.20   3587637000    96114180.00    19222840.00
  5   100     6    140308.40   3305738000    24664160.00     4932832.00
  6    30     3     45900.44    728406800    26123340.00    13061670.00
  7    30     3     31825.14    338253500      640261.20      320130.60
  8    20     2      6468.04     21095340      177597.50      177597.50
 Tot  280    22    421450.97

 {1}     {8}        {9}      {10}       {11}          {12}          {13}           {14}
  i   mi/(300ni) {8}x{4}   mi/280    {10}x{7}   mi²(1-ni/mi)/   {12}x{7}    {13}²/({3}-1)
                                                   (280²ni)
  1    .0333      191.13       -            -             -            -               -
  2    .0333      338.50       -            -             -            -               -
  3    .0333     1210.70    .071      17079.29   .002295918       548.98       301379.29
  4    .0444     6432.80    .286    5492239.00   .012585030    241920.00  11705057280.00
  5    .0556     7794.91    .357    1761726.00   .019982990     98572.75   1943317409.00
  6    .0333     1530.01    .107    1399465.00   .003443877     44982.79   1011725698.00
  7    .0333     1060.83    .107      34299.71   .003443877      1102.49       607742.71
  8    .0333      215.60    .071      12685.53   .002295918       407.75       166260.53
 Tot            18774.53            8717495.00                 387534.80  14661175768.00

 Note: hours where n_i = 1 are excluded from the variance columns
 and from the m_i and n_i totals.

 Using Equation 1.1:    x̄ = 421450.97/24 = 17560.46
 Using Equation 1.13:   x̄ = 18774.53
 Using Equation 1.14:   σ = (8717495.00)^½ = 2952.54
 Using Equation 1.4:    std(x̄) = (1-22/280)^½ (2952.54)/√22 = 604.25
 Using Equation 1.16:   std(x̄) = (387534.80)^½ = 622.52
 Using Equation 1.17:   cw = 100(622.52)/17560.46 = 3.5%
 Using Equation 1.22:   EDF = (387534.80)²/14661175768 ≈ 10.24

-------


                TABLE 2: AVERAGES FOR 1000 DAILY SAMPLES

                      SYSTEMATIC OR        SYSTEMATIC OR
                      RANDOM SAMPLING,     RANDOM SAMPLING,
                      NO SCALE             WITH SCALE           FULL
                      CONSTRAINTS          CONSTRAINTS          SAMPLING

SAMPLE               {1/1/2/8/10/3/3/2}   {1/1/2/6/6/3/3/2}  {6/6/6/6/6/6/6/6}

SAMPLE SIZE                 30                   24                 48

POPULATION σ [1.14]       3426.37              3409.62            3317.31
TRUE POPULATION σ        <-------------------- 3351.65 -------------------->

MEAN [1.13]*             18613.57             18630.06           18638.08
BIASED MEAN [1.1]*       18613.57             17545.41           14080.19
TRUE MEAN                <-------------------- 18605.55 ------------------->

σ OF MEAN [1.16]*          614.29               715.59             595.47
TRUE σ OF MEAN             578.68               701.72             672.99

* Numbers in brackets refer to equations used.
can be weighed in any hour). Note that the estimate of the mean
can be quite biased if the ordinary sampling formulas, e.g.,
Equation 1.1, are used when defaulting or undersampling occurs.
The advantages of full sampling are also clear since the standard
deviation of the estimated mean has been reduced by approximately
17% from that obtained with systematic or random sampling. Since
the within-week sampling coefficient of variation, c_w, is impor-
tant for sample size determinations, the model of Tables 1 and
2 was used to simulate (using 100 iterations) a one-week sample
for different sample sizes. The results are shown in Table 3. In
general, the c_w is rather small, i.e., between 1-2%.

1.7  SEASONALITY

     The sampling methodology described in the previous sections
is based upon a continuous sampling period such as successive
days or successive weeks. Thus, hour-to-hour and day-to-day
effects are accounted for in the estimation process. It is well-
known, however, that the quantity of solid waste generated fre-
quently varies significantly from month to month. (Municipal
solid waste generation, for example, is low during the months of
January, February, November, and December, and peaks during June,
July, and August, although there are some variations depending
upon geographical location. See Table 4.) Thus it is not suffi-
cient to sample one week out of the year to estimate generation
for the complete year.
-------
                TABLE 3: AVERAGE WITHIN-WEEK SAMPLING
                     COEFFICIENTS OF VARIATION
                SAMPLE SIZE   SAMPLE SIZE  COEFFICIENT OF
                (ONE WEEK)       (HOURLY)    VARIATION, %
                     96           2            2.11
                    144           3            1.78
                    192           4            1.48
                    240           5            1.31
                    288           6            1.20
                    336           7            1.11
                    384           8            1.02
                    432           9            0.98
                    480          10            0.91
                    528          11            0.87
                    576          12            0.83
                    624          13            0.81
                    672          14            0.78
                    720          15            0.74
                    768          16            0.72
                    816          17            0.69

                  Note:  Number of iterations = 100 weeks
                        The model assumes sampling
                        8 hours/day, 6 days/week.
     Theoretically, if we knew the weeks of
the year  in which the  curve  of weekly  generation crossed the
horizontal  line representing  the average weekly generation for
the year,  we could  schedule a  one-week sampling period for one  of
these intersection  points and  be confident  that  our estimate for
that week, multiplied by the number of  weeks  in  that year,  would
be identical to the quantity produced throughout the year.  (This
is termed the "critical  point" approach.) Information about these
critical points is,  unfortunately, not available before the fact.
Even if it were available for  the previous  year, there  is  no
guarantee that the  critical  points will  be the  same for the
current year. Therefore,  we are  forced to sample additional weeks
throughout the year.

     Suppose we sample r weeks out of the year and determine, for
each of these r weeks, the total quantity of solid waste, X_k,
arriving at the site in the kth week. Let ȳ be the average total
weekly quantity over the r weeks. Assuming that there are
365/7 = 52.1429 weeks per year, then an estimate of the total
quantity for the year, Y, is obtained by:

$$Y = 52.1429\,\bar{y} = \frac{52.1429}{r}\sum_{k=1}^{r} X_k \qquad [1.23]$$

  TABLE 4: TYPICAL SEASONAL VARIATIONS IN SOLID WASTE GENERATION

                  WASTE GENERATION AS % OF THE MEAN

        LOCATION        LOW     MONTH      HIGH     MONTH

        CONNECTICUT      85      NOV        111      MAY
        ENGLAND          67      JUL        132      JAN
        HAWAII           84      NOV        118      JUN
        KENTUCKY         85      MAR        125      AUG
        MISSOURI         79      FEB        113      JUL
        OHIO             87      JAN        113      JUL
        ONTARIO          90      MAR        106      JUN
        VIRGINIA         80      JAN        125      MAY
        WASHINGTON       86      FEB        108      MAY
        WISCONSIN        81      FEB        131      JUN

Applying the propagation of error formula (and assuming that the
covariance terms are zero),

$$\mathrm{var}(Y) = \left(\frac{52.1429}{r}\right)^2 \sum_{k=1}^{r} \mathrm{var}(X_k) \qquad [1.24]$$

Applying the Bennett and Franklin relationship, Equation 1.20, to
Equation 1.24, the effective degrees of freedom, EDF_Y, for use in
calculating confidence intervals for the total amount of waste is
given as:

$$EDF_Y = \frac{\left[(52.1429/r)^2 \sum_k \mathrm{var}(X_k)\right]^2}{(52.1429/r)^4 \sum_k [\mathrm{var}(X_k)]^2/EDF_k} \qquad [1.25]$$


     Since the coefficient of variation is equal to the standard
deviation divided by the average, the coefficient of variation of
the between-week differences, c_b, is given as:

$$c_b = s_b/\bar{y} \qquad [1.26]$$

where s_b is the standard deviation of the r weekly totals.
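Equations 1.23 through 1.26 reduce to a short routine. The sketch
below is illustrative only; each sampled week supplies its total
X_k, the variance of that total (from Equations 1.16 and 1.5), and
its effective degrees of freedom, and the example figures are made
up:

    def yearly_estimates(weekly):
        """weekly: list of (X_k, var_k, edf_k) triples for the r
        sampled weeks.  Returns Y [1.23], var(Y) [1.24], EDF for Y
        [1.25], and the between-week coefficient c_b [1.26]."""
        r = len(weekly)
        W = 52.1429                                         # weeks per year
        y_bar = sum(X for X, _, _ in weekly) / r
        Y = W * y_bar                                       # [1.23]
        var_Y = (W / r) ** 2 * sum(v for _, v, _ in weekly) # [1.24]
        edf_Y = (var_Y ** 2 /
                 ((W / r) ** 4 *
                  sum(v * v / d for _, v, d in weekly)))    # [1.25]
        s_b = (sum((X - y_bar) ** 2
                   for X, _, _ in weekly) / (r - 1)) ** 0.5
        return Y, var_Y, edf_Y, s_b / y_bar                 # c_b [1.26]

    print(yearly_estimates([(3.1e6, 4.0e9, 10.2), (3.4e6, 4.4e9, 11.0),
                            (2.9e6, 3.9e9, 9.7), (3.2e6, 4.1e9, 10.5)]))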
To obtain some idea of typical values of c_b, a Monte Carlo simu-
lation was performed (based upon real data obtained from a Boston
solid waste site), the results of which are shown in Table 5. A
conservative value of 2% was used for c_w, and the simulation
involved 1000 iterations. Two sampling protocols were simulated:

(1) random sampling, and (2) systematic sampling. Sampling
frequencies from two to ten times per year were investigated.
The c_b of random sampling should closely approximate the true c_b
(columns 2 and 3 in Table 5), which it does for all sampling
times. However, except for a sampling frequency of two (where
random sampling is identical to systematic sampling), systematic
sampling is superior to random sampling for all sampling frequen-
cies. At a sampling frequency of four times per year, for exam-
ple, the c_b of systematic sampling is only 60% that of random
sampling. Since both sampling methods provide estimates of less
than 1% deviation from the true total (columns 5 and 6), system-
atic sampling is clearly the preferred method for sampling
throughout the year. Table 5 suggests that typical values of c_b
vary between 3 and 4% for sampling frequencies over the range of
four to eight times per year.
         TABLE 5: MONTE CARLO SIMULATION - SEASONALITY

   (1)        (2)       (3)       (4)       (5)        (6)
 # WEEKS      c_b       c_b       c_b      TOTAL      TOTAL
 SAMPLED     TRUE     RANDOM    SYSTEM    RANDOM*    SYSTEM*

    2        10.7      10.0       9.9      +0.01      -0.27
    3         8.7       7.8       5.3      +0.10      +0.37
    4         7.6       7.3       4.1      +0.10      +0.00
    5         7.0       6.4       3.6      -0.01      +0.04
    6         6.2       5.9       3.2      +0.33      -0.03
    7         5.7       5.6       2.5      -0.28      +0.06
    8         5.3       5.2       3.0      +0.08      -0.06
    9         5.0       4.7       2.2      -0.10      +0.02
   10         4.8       4.6       2.4      -0.23      +0.02

 *Estimated total quantity as percent deviation from true total.
  Note: For all simulations in this table, c_w = 0.02.
     Using the data of Tables 3 and 5, Table 6 was prepared. This
table shows the percentage error (assuming systematic full sam-
pling and a 90% confidence level) of the estimate of total
yearly quantity of solid waste for various sample sizes and
sampling frequencies.
1.8  STRATIFIED SAMPLING

     Stratified sampling involves dividing the population into
distinct subpopulations called strata. The strata could be based
upon vehicle size, load type (residential, commercial, industri-
al, etc.) or any other attribute.  Within each stratum a separate
sample is selected and a separate estimate made.  The stratum
means and variances are then appropriately weighted to form  a
combined estimate for the entire population. Generally, strati-
fied sampling is used  to:  (a) increase the precision of the
estimate,  (b) afford different  sampling  methods  within the
strata, or (c) provide separate estimates for different popula-
tion elements. With regard to increasing the precision of the
estimate, theory (see Kish, 1965) tells us that grouping like
elements within a stratum (for example, one stratum might consist
of small, private vehicles,  and another might consist of munici-
pal or commercial, rear-loading,  packer-type vehicles)  increases
precision. With  regard  to affording different sampling methods
within the strata,  one  might  use  entirely different scales for
small,  private vehicles  than for municipal  or commercial, rear-
loading,  packer-type vehicles. With regard to providing separate
quantity estimates for different population elements, a "quantity"
unit, such as a point, might not be identical for all elements.
For example, waste composition varies widely between municipal
and industrial waste.

TABLE 6: ERROR OF THE ESTIMATE OF TOTAL YEARLY QUANTITY OF SOLID
         WASTE (90% CONFIDENCE LEVEL & SYSTEMATIC FULL SAMPLING)
         AS A FUNCTION OF SAMPLE SIZE AND SAMPLING FREQUENCY

                      NUMBER OF TRUCKS SAMPLED PER WEEK
SAMPLING        800   492   341   244   189   157   127    97
FREQUENCY,
WEEKS/YEAR            NUMBER OF TRUCKS SAMPLED PER HOUR
               16.7  10.3   7.1   5.1   3.9   3.3   2.7   2.0

    3          5.1%  5.1%  5.2%  5.2%  5.2%  5.3%  5.4%  5.4%
    4          3.4   3.5   3.5   3.6   3.6   3.7   3.7   3.8
    5          2.7   2.7   2.8   2.8   2.9   2.9   3.0   3.1
    6          2.2   2.2   2.3   2.3   2.4   2.4   2.5   2.6
    7          1.6   1.7   1.7   1.8   1.8   1.9   2.0   2.0
    8          1.8   1.8   1.9   1.9   2.0   2.0   2.1   2.1
    9          1.3   1.3   1.4   1.4   1.5   1.5   1.6   1.7
   10          1.3   1.3   1.4   1.4   1.5   1.5   1.6   1.7

Notes:
  (1) Table percentages are the errors as a percentage of
      the true total yearly quantity of solid waste;
  (2) Table percentages can be converted to other confidence
      levels by the formula new = (old*z)/1.645, where "old" is
      the old % error, "new" is the % error at the new
      confidence level, and z is the standard normal deviate
      at the new confidence level.
  (3) Trucks sampled per week was converted to trucks sampled
      per hour by assuming 8 hours/day and 6 days/week sampling.

     If the standard deviation of a vehicle-load in a stratum is
proportional to the average weight of the load in that stratum (a
not unreasonable assumption), then a sampling plan that makes the
sampling effort proportional to the total quantity contribution
of each stratum is known as Neyman allocation (see Kish, 1965).
An example of a comparison between sampling with and without
stratification is shown in Table 7. The model used assumed that
10% of the vehicles had an average net weight of 306 lbs (repre-
senting small vehicles, such as pickup trucks, station wagons,
etc.), and 90% had an average net weight of 15000 lbs (represent-
ing typical commercial vehicles). The load distributions of the
two vehicle types were assumed to be normal, with the standard
deviation equal to 10% of their means. The results in Table 7
represent a simulation involving a total of 5000 vehicles, and
the stratification used was Neyman allocation. The estimated
means are close to the theoretical mean for sampling both with
and without stratification. The standard deviations (and coeffi-
cients of variation) of these estimates, however, are quite
different. The superiority of the stratified estimate is quite
evident.
        TABLE 7: STRATIFIED VERSUS NON-STRATIFIED SAMPLING

                                NON-STRATIFIED     STRATIFIED
                 THEORETICAL       SAMPLING         SAMPLING

     MEAN           13531           13552            13547
     σ               1423            4645             1444
     cv             10.5%           34.3%            10.7%

        cv = Coefficient of Variation.
     Thus,  if the population  of vehicles consists of two or more
subpopulations with significantly different means, it is highly
advisable to (1)  sample the subpopulations  separately, making the
samples proportional to the total quantity contribution of each
stratum (i.e.,  Neyman allocation),  (2) make separate total quan-
tity estimates  for these subpopulations,  and  (3) add  them to
arrive at an overall population quantity estimate. An estimate of
the standard deviation of the combined population quantity,
std(x_c), can be made by combining the subpopulation standard
deviations with the following formula:

$$\mathrm{std}(x_c) = \left[\frac{\sum_i d_i\,s_i^2}{\sum_i d_i}\right]^{1/2} \qquad [1.27]$$

where s_i is the standard deviation of the estimate of total
quantity for subpopulation i, and d_i is the degrees of freedom
for that estimate. Note that std(x_c) has Σd_i degrees of freedom.
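A minimal sketch of Equation 1.27 (the stratum figures below are
made up for illustration):

    def combined_std(strata):
        """Equation 1.27: pool subpopulation standard deviations s_i,
        each with d_i degrees of freedom.  strata: list of (s_i, d_i)."""
        return (sum(d * s * s for s, d in strata) /
                sum(d for _, d in strata)) ** 0.5   # has sum(d_i) df

    # e.g. a small-vehicle stratum and a packer-truck stratum:
    print(round(combined_std([(31.0, 9), (1480.0, 89)]), 1))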
                           REFERENCES

1. Bennett, C.A., and Franklin, N.L., Statistical Analysis in
   Chemistry and the Chemical Industry, John Wiley & Sons, New
   York, N.Y., 1954.

2. Deming, W.E., "Some Variances in Random Sampling", in Some
   Theory of Sampling, John Wiley & Sons, New York, N.Y., 1950,
   pp. 127-134.

3. Kish, L., Survey Sampling, John Wiley & Sons, New York, N.Y.,
   1965.

-------

                             NOTATION
cv        = coefficient of variation, stratified sampling
c_b       = between-week coefficient of variation
c_w       = within-week coefficient of variation
d_i       = degrees of freedom for ith subpopulation
E         = error, i.e., one-half of a confidence interval
f         = weekly sampling frequency, n/N
f_i       = sampling frequency in ith hour, n_i/m_i
h         = total number of hours sampled during the week
i         = index for hours
j         = index for vehicles
k         = index for weeks
m_i       = number of vehicles arriving in ith sampling hour
n         = total number of vehicles sampled
N         = total number of vehicles in week
n_i       = number of vehicles sampled in the ith hour
r         = number of weeks sampled during the year
s_i       = subpopulation quantity standard deviation estimate
std(x_c)  = population quantity standard deviation estimate
var(X_ij) = variance of individual load measurement
w_i       = weighting factor for an individual observation, 1/n
X         = total weight of all vehicle-loads for the week
x̄         = average vehicle-load weight for the week
x̄_i       = average vehicle-load weight in the ith hour
X_ij      = jth vehicle-load weight in ith hour
X_k       = total quantity of solid waste in the kth week
ȳ         = average total weekly quantity during r sampling weeks
Y         = total quantity for the year

-------
                                                          Page 23
                            CHAPTER 2

                     COMPOSITION ESTIMATION
2.1  INTRODUCTION

     The estimation of waste composition is a more difficult task
than the estimation of waste quantity  for at least four reasons:

  1. Complexity: The estimation of waste composition involves
     the measurement of more than one attribute.

  2. Cost: Weighing a collection vehicle is a relatively low-cost
     procedure. Selecting a sample of waste and separating it
     into a number of components is both a more expensive and a
     more unpleasant procedure.

  3. Statistical problems: Unlike the estimation of waste quanti-
     ty, reliance on the Central Limit Theorem of statistics in
     order to assume normality (and hence permitting simpler
     calculations) is not always justified. Also, there are
     problems of what constitutes a sample unit and how to obtain
     random samples of such units.

  4. Small sample size: Because of the time and expense required
     to sample for waste composition, there are fewer data avail-
     able regarding this aspect of waste characterization than
     for waste quantity. Hence our estimates have less precision
     than those of waste quantity.


     In quantity estimation, the sample unit is clearly the
vehicle. One weighs the entire vehicle because it makes no sense
to select and weigh just a portion of a vehicle-load. In compo-
sition sampling, however, we usually cannot separate an entire
vehicle-load because of time and economic considerations. The
usual commercial vehicle-load is between 10,000 and 20,000 lbs,
and is obviously too large a sample unit to separate. It must be
remembered that separation has to be done manually. As a rough
approximation, typically one man can separate 65-300 lb of raw
municipal solid waste in one hour, depending upon the number and
type of components desired. Clearly, very small sample weights
make no sense physically. A large piece of wood or metal, for
example, could not physically ever be included in a sample of,
say, 5 lbs. Furthermore, small sample weights tend to be more
homogeneous than the population being sampled, i.e., the smaller
the sample weight, the greater the likelihood that it consists
entirely of wood or paper or glass, etc. Following a well-known
principle of cluster sampling (see Kish, 1965), this homogeneity

-------
                                                         Page 24


tends to increase the variance of the sample. Indeed, Klee and
Carruth (1970) found that the smaller the sample weight, the
greater the variance of raw municipal solid waste samples. Howev-
er, the relationship was not linear: under 200 lbs the sample
variance increased rapidly; over 300 lbs it increased much more
slowly. Accordingly, they recommended that a sample weight of
200-300 lbs be used for general municipal solid waste sampling,
and this recommendation has been widely adopted by other investi-
gators for raw refuse streams.

     The sample weight recommendation of 200-300 lbs is appropri-
ate for raw municipal refuse only. Clearly, optimal sample weight
is related to the particle size of the material sampled, but the
relationship is not linear. Other investigators (see Trezak,
1977) have found that processed waste stream particle size dis-
tributions are adequately described by functions based upon the
exponential distribution, the Rosin-Rammler equation, for exam-
ple. Thus, the author recommends the following model based upon
the exponential function to determine optimal sample weights for
other than raw municipal refuse:

                         Y = X e^(βX)                      [2.1]

where Y is the optimal sample weight in pounds, and X is the
characteristic particle size of the material to be sampled, in
inches. The characteristic particle size is the screen size
through which 63.2% of the material passes. The boundary condi-
tion for Equation 2.1 is that at Y = 250 lbs (the median of the
200-300 lbs recommended for raw municipal solid waste), the
characteristic particle size is 18 inches. The value of β that
satisfies this condition is 0.146. Thus,

                      Y = X e^(0.146X)                     [2.2]

     To illustrate the use of this equation, the output of the
Appelton West shredder was found to have a characteristic size of
2.2 in. (see Table 10 in Savage and Shiflett, 1979). To sample
the output from this shredder we would need to take samples of
weight,

                  Y = 2.2 e^(0.146 x 2.2) = 3.0 lb.
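
     A small helper makes Equation 2.2 and its boundary condition
easy to check; the function name and its defaults below are
illustrative and are not part of PROTOCOL.

  import math

  def sample_weight_lbs(x_inches, beta=0.146):
      # Equation 2.2: Y = X*exp(beta*X); beta = 0.146 forces Y ~ 250 lb
      # at the 18-in. characteristic size of raw municipal refuse
      return x_inches * math.exp(beta * x_inches)

  print(round(sample_weight_lbs(2.2), 1))  # Appelton West shredder: 3.0 lb
  print(round(sample_weight_lbs(18.0)))    # ~249 lb, near the 250-lb boundary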

     Like quantity, the variation in composition of solid waste
can be expected to be influenced  by  within-week and  between-week
differences. Therefore, it is important to sample within given
weeks throughout the week. However, since the actual separation
process is quite lengthy, a long time elapses between samples.
Furthermore, a sample can be taken from a selected vehicle and
the vehicle then allowed to move on. For these reasons, unlike
with quantity sampling, the taking of composition samples does
not result in vehicle queues regardless of the sampling method used.
Thus it makes more sense to consider an unbiased sampling scheme

-------
                                                          Page 25


for composition determination, e.g., either the random or system-
atic sampling schemes  described in Chapter 1.


2.2  NORMALITY ASSUMPTIONS

     The determination of the number of samples for the within-
week estimation of waste composition is, unfortunately, a much
more complex matter than for quantity estimation. Discrete dis-
tribution theory (such as multinomial or binomial) cannot be
used because we are not dealing with identical items in the
sampling unit. One piece of wood in a sample, for example, is
different in size and shape from the next piece that might be
found in the sample. One is tempted to reach for the Central
Limit Theorem once again and assume that either the component in
question is distributed normally or that averages taken from
their distributions are distributed normally. Previous composi-
tion studies, however, have shown that no component is distribut-
ed normally (see Klee, 1980). The question is then, "How many
samples must be taken so that averages of the samples are dis-
tributed normally?" For components with positively-skewed distri-
butions (i.e., skewed to the right - see Figure 1) - and this
includes most components, including newsprint, total paper,
plastics/rubber/leather (when combined into one component),
ferrous and other metals - averages of as low as n = 4 samples
closely approximate normality and, by n = 10, normality is all
but assured. However, for components with J-shaped distributions
(see Figure 1) - and this includes components such as textiles,
wood, and garden waste - reasonable normality is not approached
until averages of n = 40 or greater are taken.

     One indication of normality is the coefficient of skewness,
g₁, which is based on the third moment about the mean, i.e.,

          define  k₂ = Σ(x_i - x̄)²/(n-1)                   [2.3]

          and     k₃ = nΣ(x_i - x̄)³/[(n-1)(n-2)],          [2.4]

          then    g₁ = k₃/(k₂)^(3/2)                       [2.5]

Given the coefficient of skewness, g₁, of a parent population,
the coefficient of skewness of the distribution of averages of
size n taken from this distribution, g₁(x̄), is:

                      g₁(x̄) = g₁/√n                        [2.6]
The coefficient of skewness is used in Table 1, which shows the
results of simulations for averages of different sizes (number of
iterations for each case = 5000) for two components: ferrous
metals (a positively-skewed distribution) and textiles (a J-
shaped distribution). The distributions are taken from the data
collected by Britton (1972). Since the rationale behind using the
Central Limit Theo-

-------
                                                             Page 26
     [Figure 1 (histograms omitted): percent textiles (a J-shaped
     distribution) and percent ferrous metals (a moderately
     skewed distribution) in municipal solid waste]

          FIGURE 1: DISTRIBUTION OF TEXTILES AND FERROUS METALS
                       IN MUNICIPAL SOLID WASTE
rem is to permit the use  of  t-statistics  to construct confidence
intervals about the estimation of the percentage of any component
in the waste stream, an  appropriate measure  of the ability to
meet the  normality requirements is the  fraction of confidence
intervals that actually contain the true mean at a given level of
significance. This is shown in Table 1 by the Actual versus
Nominal α lines. For example, for textiles, given an average of
size 10, if a confidence interval at a significance level of α
= .05 were constructed about the mean, the actual significance
level would be α = .104. In other words, instead of a 95% confi-
dence interval, we would actually be constructing an 89.6% confi-
dence interval. Note that as the size of the average gets larger,
the discrepancy gets smaller. For example, at a significance
level of α = .05 and an average of size 50, the true significance
-------
                                                                                               Page  27
     TABLE 1: RESULTS OF SIMULATION STUDIES FOR TWO MUNICIPAL SOLID WASTE COMPONENTS

                                          SIZE OF AVERAGE
                            1      5     10     15     20     25     30     35     40     45     50

                                      J-FUNCTION (TEXTILES)
MEAN                     1.6145 1.5924 1.6001 1.6058 1.5956 1.5957 1.5983 1.5920 1.5972 1.5994 1.5951
STD (Population)         1.9037 1.9074 1.9277 1.9206 1.9226 1.9285 1.8979 1.9275 1.8790 1.9011 1.9290
STD (Mean)               1.9037  .8530  .6096  .4959  .4299  .3857  .3465  .3258  .2971  .2834  .2728
COEFFICIENT OF SKEWNESS    1.45    .66    .46    .33    .32    .30    .27    .29    .28    .21    .20
  standard deviation        .03    .03    .03    .03    .03    .03    .03    .03    .03    .03    .03
  t-value                 41.81  19.14  13.32   9.66   9.12   8.76   7.69   8.47   8.13   5.95   5.70
  significance level       .001   .001   .001   .001   .001   .001   .001   .001   .001   .001   .001
Actual α, Nominal α = .01    -    .075   .051   .041   .033   .031   .024   .026   .017   .018   .019
Actual α, Nominal α = .02    -    .102   .065   .055   .046   .040   .035   .037   .028   .028   .031
Actual α, Nominal α = .03    -    .119   .079   .065   .056   .051   .045   .046   .036   .038   .042
Actual α, Nominal α = .04    -    .132   .089   .076   .065   .060   .055   .056   .044   .046   .049
Actual α, Nominal α = .05    -    .148   .104   .090   .076   .067   .065   .064   .065   .060   .058
Actual α, Nominal α = .10    -    .190   .138   .134   .128   .115   .114   .112   .104   .104   .099
Actual α, Nominal α = .20    -    .268   .227   .215   .214   .211   .207   .210   .199   .200   .205

                                  MODERATE SKEW (FERROUS METALS)
MEAN                     3.6712 3.6511 3.6603 3.6627 3.6533 3.6538 3.6583 3.6523 3.6587 3.6578 3.6564
STD (Population)         1.7796 1.7750 1.8034 1.7920 1.7982 1.7955 1.8620 1.7937 1.7563 1.770  1.7960
STD (Mean)               1.7796  .7938  .5703  .4627  .4021  .3591  .3217  .3032  .2777  .2639  .2540
COEFFICIENT OF SKEWNESS     .68    .32    .20    .15    .13    .17    .16    .19    .16    .12    .05
  standard deviation        .03    .03    .03    .03    .03    .03    .03    .03    .03    .03    .03
  t-value                 19.74   9.26   5.88   4.32   3.67   4.91   4.67   5.54   4.53   3.36   1.51
  significance level       .001   .001   .001   .001   .001   .001   .001   .001   .001   .001   .131
Actual α, Nominal α = .01    -    .017   .017   .018   .018   .015   .013   .014   .009   .010   .012
Actual α, Nominal α = .02    -    .026   .033   .027   .026   .023   .019   .025   .025   .025   .024
Actual α, Nominal α = .03    -    .043   .039   .039   .040   .035   .033   .035   .035   .035   .036
Actual α, Nominal α = .04    -    .059   .053   .047   .043   .042   .048   .043   .041   .047   .042
Actual α, Nominal α = .05    -    .065   .063   .057   .055   .053   .047   .052   .059   .051   .051
Actual α, Nominal α = .10    -    .116   .115   .110   .098   .100   .104   .091   .097   .100   .110
Actual α, Nominal α = .20    -    .207   .204   .215   .204   .204   .214   .211   .197   .208   .203

NOTE: NUMBER OF ITERATIONS = 5000

-------
                                                         Page 28


level is α = .058. As the selected significance level is in-
creased, the discrepancy also gets smaller. For example, at a
significance level of α = .10 and an average of size 10, the true
significance level is α = .138. Note that for the moderately
positively-skewed ferrous metals distribution, these discrepan-
cies are very much smaller, even for very small sizes of the
average.
2.3 WITHIN-WEEK SAMPLE SIZE DETERMINATION

     Assuming that the averages of the components are normally
distributed, the sample size is given by (see Mace, 1964):

                     n = (ts/d)²                           [2.7]

where d is the precision required (i.e., ½ the confidence inter-
val desired), t is the t-value at significance level α, and s is
the population standard deviation. Because the t-value is not
known until after n is determined, Equation 2.7 is actually a
trial-and-error equation. However, for starting purposes, the t-
value can be replaced by its corresponding z-value, i.e., the
value of the standard normal deviate at 1 - α.

     Estimates of s are provided in Table 2. (These estimates are
based on various composition sampling studies throughout the
country, and include within-day and between-day sampling varia-
tion. The Britton (1972) estimates are not appropriate here
because they represent only the within-vehicle variation of one
truckload.)

     There is a problem, however, in applying Equation 2.7. The
averages of the components are not normally distributed; they are
positively-skewed. One might consider an appropriate transforma-
tion, such as the lognormal, but since d in Equation 2.7 is not
constant over a lognormal scale (i.e., ln[a-b] is not equal to
ln[a]-ln[b]) we must also have some knowledge of the mean of the
distribution, the very quantity we are trying to estimate. A
simpler approach is to take advantage of the fact that there is a
strong correlation among the coefficient of skewness and the
actual and nominal values of α. For example, using the data of
Table 1, the following regression equation was found to have a
coefficient of determination (R²) of 0.98:

             α_n = .0206 + 1.00899α_a - .141g₁             [2.8]

           Note:   1. If α_n > α_a, α_n = α_a
                   2. If α_n < 0, α_n = .001

where α_a is the actual level of significance, and α_n is the
nominal level of significance.

-------
                                                          Page 29
        TABLE 2: SUGGESTED POPULATION STANDARD DEVIATIONS

               COMPONENT          STANDARD DEVIATION, S
           PAPER COMPONENTS
             CORRUGATED PAPER              .0744
             NEWSPRINT                     .0687
             TOTAL PAPER                   .1021

           METAL COMPONENTS
             ALUMINUM                      .0069
             FERROUS METALS                 .0388
             TOTAL METALS                  .0358

           ORGANIC COMPONENTS
             FOOD WASTE                    .0506
             GARDEN WASTE                  .1269
             WOOD                          .1376
             TOTAL ORGANICS                 .1121

           MISCELLANEOUS COMPONENTS
             ASH/ROCKS/FINES                .0572
             GLASS/CERAMICS                 .0502
             PLASTIC/RUBBER/LEATHER         .0252
             TEXTILES                      .0687
     Equation 2.8 can be used to determine confidence  intervals
or significance  levels  even when the distribution is  decidedly
non-normal. The only input required is a knowledge of the coeffi-
cient of skewness (which is calculated from the data, using
Equation 2.5) or the sample size, and the desired level of
significance, α_a. For example, if α_a = .10 and g₁ = .33, then,
using Equation 2.8,

          α_n = .0206 + 1.00899(.10) - .141(.33) = .075

Thus, a confidence interval constructed at an α of .075 will
produce the required confidence interval at significance level
α = .10.
     Since the value of g₁ will generally not be known until
after one has obtained a sample, Equation 2.8 is not particularly
useful for sample size determination. Equations can be obtained,
however, that relate α_n and α_a using n rather than g₁. For the
textile data in Table 1, we obtain the following equation (with a
coefficient of determination, R², of .954):

-------
                                                           Page  30


             α_n = -.0633 + 1.0121α_a + .00136n            [2.9]
                           (37.3)       (11.5)

             Note:   1. If α_n > α_a, α_n = α_a
                     2. If α_n < 0, α_n = .001

The numbers in parentheses are the t-values of the estimated
coefficients. For the ferrous metals data in Table 1, we obtain
the following equation (with a coefficient of determination, R²,
of .995):

             α_n = -.0102 + .99087α_a + .00019n            [2.10]
                           (116.9)      (5.2)

             Note:   1. If α_n > α_a, α_n = α_a
                     2. If α_n < 0, α_n = .001

Thus, if one knows whether the component is distributed as a J-
function (assume this for textiles, wood, and garden wastes),
then Equation 2.9 is used; for all others, Equation 2.10 is used.

     To illustrate the use of Equations 2.7, 2.9, and 2.10,
suppose we wished to estimate the concentration of ferrous metals
in the waste stream to within ±2 percentage points of the mean,
at a significance level of α = .05. Using Equation 2.7,

          n = (1.9623*.0388/.02)² = 14.49 or 15, rounded up.

Using Equation 2.10,

          α_n = -.0102 + .99087(.05) + .00019(15) = 0.0421.

At n = 15 and α = .0421, t = 2.2367. At iteration #2, therefore,

          n = (2.2367*.0388/.02)² = 18.82 or 19, rounded up.

Using Equation 2.10,

          α_n = -.0102 + .99087(.05) + .00019(19) = 0.0429.

At n = 19 and α = .0429, t = 2.1783. At iteration #3, therefore,

          n = (2.1783*.0388/.02)² = 17.86 or 18, rounded up.

Using Equation 2.10,

          α_n = -.0102 + .99087(.05) + .00019(18) = 0.0427.

At n = 18 and α = .0427, t = 2.1902. At iteration #4, therefore,

          n = (2.1902*.0388/.02)² = 18.05 or 19, rounded up.

-------
                                                          Page 31


Since we are  cycling between 18 and 19, n = 19 and we  are fin-
ished.
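
     The iteration just illustrated is mechanical and easily
coded. The sketch below combines Equation 2.7 with the skewness
corrections of Equations 2.9 and 2.10; it assumes n - 1 degrees
of freedom for the t-value, so its intermediate t-values can
differ slightly from the worked values above (e.g., 1.9600 for
the starting z rather than 1.9623), but under these assumptions
it settles on the same n = 19 for the ferrous metals example.

  import math
  from scipy.stats import norm, t

  def sample_size(s, d, alpha=0.05, j_shaped=False, max_iter=50):
      # iterate Equation 2.7, n = (t*s/d)^2, refreshing t each pass
      tval = norm.ppf(1 - alpha / 2)         # start with the z-value
      seen = []
      for _ in range(max_iter):
          n = math.ceil((tval * s / d) ** 2) # Equation 2.7, rounded up
          if n in seen:                      # converged (or cycling):
              return max(n, seen[-1])        # take the larger of the pair
          seen.append(n)
          # Equation 2.9 (J-shaped) or 2.10 (moderate skew): nominal alpha
          c = (-.0633, 1.0121, .00136) if j_shaped else (-.0102, .99087, .00019)
          a_n = c[0] + c[1] * alpha + c[2] * n
          a_n = min(max(a_n, .001), alpha)   # the two side conditions
          tval = t.ppf(1 - a_n / 2, n - 1)   # assumed df = n - 1
      return n

  print(sample_size(s=.0388, d=.02))         # ferrous metals: 19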

     Finally, note that because the shape of the distribution and
the standard deviation vary with the component, if there are m
components the field sample size, n_f, will be the largest sample
size over all of the components, i.e.,

                     n_f = max(n_1,n_2,...,n_m).


2.4  ESTIMATING THE STANDARD  DEVIATION WHEN
     NO SAMPLE DATA  ARE AVAILABLE

     In almost any situation, one can get at least a very rough
estimate of the standard deviation. The minimum information
involves the form of the distribution and the spread of values.
For example, if the values of the component fractions can be
assumed to follow a normal distribution, then either of the
following rules can be used to get an estimate of σ:

     (a) Estimate two values, a low one, a₁, and a high one, b₁,
between which you expect 99.7% (almost all) of the values to be.
Then estimate σ as:

                        (b₁ - a₁)/6                       [2.11]

     (b) Estimate two values, a low one, a₂, and a high one, b₂,
between which you expect 95% of the values to be. Then estimate σ
as:
                        (b₂ - a₂)/4                       [2.12]

If the values of the component fractions can be assumed to follow
a positively-skewed distribution, then an alternative is to
assume a triangular distribution and estimate σ as:

     {[a₅(a₅ - b₅) + c₅(c₅ - a₅) + b₅(b₅ - c₅)]/18}^½     [2.13]

where a₅, b₅, and c₅ are the assumed smallest, most likely, and
largest values, respectively, that the distribution can take on.
(The bracketed quantity is a variance; the square root converts
it to a standard deviation.)

     As an example, suppose we are going to sample for aluminum,
and we estimate that the smallest value is 0.2%, the most likely
is 1.5%, and the largest value is 4.1%. Assuming a positively-
skewed distribution, the estimated standard deviation is:

 {[.2(.2-1.5) + 4.1(4.1-.2) + 1.5(1.5-4.1)]/18}^½ = .81% or .0081
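
     The rules of this section condense to one-liners, as in the
sketch below; note again that the bracketed quantity of Equation
2.13 is a variance, so the triangular rule takes its square root
before reporting a standard deviation.

  import math

  def sigma_from_997_range(a1, b1):
      return (b1 - a1) / 6                   # Equation 2.11

  def sigma_from_95_range(a2, b2):
      return (b2 - a2) / 4                   # Equation 2.12

  def sigma_triangular(a, m, b):
      # a = smallest, m = most likely, b = largest value (Equation 2.13)
      return math.sqrt((a * (a - m) + b * (b - a) + m * (m - b)) / 18)

  print(round(sigma_triangular(0.2, 1.5, 4.1), 2))  # aluminum example: 0.81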


2.5  ESTIMATING SAMPLE SIZE  IN A MULTI-STAGE PROCESS

     Suppose we do not have  a good estimate of  the standard
deviation of the distribution of a component.  Since  the  cost of

-------
                                                          Page 32


composition sampling is generally high, rather than take a larger
sample than is really necessary we can take the sample in more
than one stage. The method (sometimes called "Stein's Method" -
see Natrella, 1966) is as follows:

     (1) Make a first estimate of σ (using either Table 2 or
         the technique described in Section 2.4). From this,
         determine n, the size of the sample (using the
         technique described in Section 2.3). Choose some frac-
         tion of n, n₁, as the size of the first sample. (In
         Stein's Method, this fraction typically is ½.)

     (2) This first sample of size n₁ provides an estimate of σ.
         Use this value to determine how large the second sample
         should be.

     As an example, suppose we wished to estimate the concentra-
tion of ferrous metals in the waste stream to within ±2 percent-
age points of the mean, at a significance level of α = .05.
Assume that our best estimate of σ is .04. Following the proce-
dure outlined in Section 2.3, our estimate of sample size is 20.
Our first sample size (using a fraction of ½), n₁, is 10. After
taking this sample we find that the sample standard deviation
is .03. Recalculating the sample size we find n = 13. Since we
have already taken 10 of these samples, only 3 more are required.

     Note that we can refine the method simply by making the
first fraction small (say 1/3 or 1/4), and then recalculating the
standard deviation (and hence, the new sample size) after each
additional sample obtained, as sketched below.
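
     The following self-contained sketch shows the two-stage
idea. For brevity it sizes samples with Equation 2.7 and a
z-value (the skewness adjustment of Section 2.3 could be layered
on top), and the first-stage observations are invented for
illustration.

  import math
  import statistics
  from scipy.stats import norm

  def plan_n(s, d, alpha=0.05):
      z = norm.ppf(1 - alpha / 2)
      return math.ceil((z * s / d) ** 2)     # Equation 2.7 with z for t

  d, s_guess = 0.02, 0.04
  n_plan = plan_n(s_guess, d)                # first estimate of n
  n_first = round(n_plan / 2)                # Stein-type fraction of 1/2

  stage1 = [.05, .03, .08, .02, .04, .09, .06, .03, .07, .05][:n_first]
  s_hat = statistics.stdev(stage1)           # re-estimated sigma
  n_total = plan_n(s_hat, d)                 # revised total sample size
  print(n_plan, n_first, max(0, n_total - n_first))  # extra samples needed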
2.6 ESTIMATION OF COMPOSITION

     As with waste quantity, it is well known that the composi-
tion of solid waste generated varies significantly from month to
month. Thus it is not sufficient to sample one week out of the
year to estimate composition for the complete year, and we are
forced to sample additional weeks throughout the year.

     Suppose we sample r weeks out of the year and determine, for
each of these r weeks, the average (as a fraction) of a particu-
lar component of solid waste, p_k (where p_k = Σ_j p_jk/n_k, the
j sum running over n_k, the number of samples taken in the kth
week), and X_k, the total quantity arriving at the site in the
kth week. Then an estimate of the fraction of the component over
the year, P, is obtained by weighting the weekly fractions by the
weekly totals:

                     P = Σp_kX_k/ΣX_k                     [2.14]

where the k sums run over the r weeks.

-------
                                                          Page 33


Applying the propagation of error formula (and assuming that the
covariance terms are zero),

  var(P) =

   Σ(X_k/ΣX_k)²var(p_k) + Σ[(p_k/ΣX_k)(1 - X_k/ΣX_k)]²var(X_k)   [2.15]

Applying the Bennett and Franklin relationship (Equation 1.20,
Chapter 1) to Equation 2.15, the effective degrees of freedom,
EDF_P, for this variance is given as:

  EDF_P = [var(P)]²/{Σ[(X_k/ΣX_k)²var(p_k)]²/d_pk +

          Σ[[(p_k/ΣX_k)(1 - X_k/ΣX_k)]²var(X_k)]²/d_Xk}          [2.16]

We find the total quantity for the year of the component, T, by:

                           T = P·W                         [2.17]

where W is the total quantity of waste for the year. Applying the
propagation of error formula (and assuming that the covariance
terms are zero),

                 var(T) = W²var(P) + P²var(W)              [2.18]

Again applying the Bennett and Franklin relationship (Equation
1.20, Chapter 1) to Equation 2.18, the effective degrees of
freedom, EDF_T, for this variance is given as:

   EDF_T = [var(T)]²/{[W²var(P)]²/EDF_P + [P²var(W)]²/EDF_W}   [2.19]

where EDF_W is the effective degrees of freedom associated with
the total quantity of waste for the year, W.

     Unfortunately, one cannot construct confidence intervals
about the total of the component, T, in the usual fashion, i.e.,

                       T ± t_α[var(T)]^½

since the distribution of P is positively-skewed and thus the
distribution of T is also positively-skewed. The assumption of
normality is not appropriate under these circumstances. However,
we can use the logarithmic transformation, which is particularly
effective in normalizing distributions which have positive skew-
ness. If we assume that T' = ln[T] is normally distributed with
mean, μ, and standard deviation, σ, then (see Aitchison and
Brown, 1957):

                       T = e^(μ + ½σ²)                     [2.20]

-------
                                                          Page 34
and
                var(T) = e^(2μ + σ²)[e^(σ²) - 1]           [2.21]

Solving for μ and σ²,

                σ² = ln[var(T)/T² + 1]                     [2.22]

                μ = ln T - ½ ln[var(T)/T² + 1]             [2.23]

One can then construct the desired confidence interval by first
computing:

                       L = μ - t_α σ                       [2.24]

                       U = μ + t_α σ                       [2.25]

The confidence interval around T then is given as:

                       lower = e^L                         [2.26]

                       upper = e^U                         [2.27]
     An example of these calculations is shown in Table 4, using
the data given in Table 3. Note that, taking the W²var(P) and
P²var(W) terms as the contributions to var(T) by component and
quantity respectively, the component term contributed 93% of the
variability in this example while the quantity term contributed
only 7%. Thus (and it should come as no great surprise), the
precision of an estimate of the yearly quantity of a given compo-
nent depends much more on the precision of the yearly component-
fraction estimate than on the precision of the yearly quantity
estimate.

-------
                                                         Page 35
     TABLE 3: DATA FOR SEASONALITY CALCULATIONS EXAMPLE

Total, Week #1:                              102,100,000 lbs
  Standard Deviation of Total:                 1,743,900 lbs
  Effective Degrees of Freedom of Total, d:           45
  Number of Composition Samples, n:                   17
  Average Ferrous Metals (as a fraction):          .0821
  Standard Deviation of Average:                   .0105

Total, Week #2:                               93,013,000 lbs
  Standard Deviation of Total:                 1,363,500 lbs
  Effective Degrees of Freedom of Total, d:           29
  Number of Composition Samples, n:                   17
  Average Ferrous Metals (as a fraction):          .0773
  Standard Deviation of Average:                   .0077

Total, Week #3:                               97,385,000 lbs
  Standard Deviation of Total:                 1,261,900 lbs
  Effective Degrees of Freedom of Total, d:           32
  Number of Composition Samples, n:                   17
  Average Ferrous Metals (as a fraction):          .0719
  Standard Deviation of Average:                   .0072

Total, Week #4:                               89,211,000 lbs
  Standard Deviation of Total:                 1,355,800 lbs
  Effective Degrees of Freedom of Total, d:           42
  Number of Composition Samples, n:                   17
  Average Ferrous Metals (as a fraction):          .0589
  Standard Deviation of Average:                   .0095

Total, Year:                               4,975,800,000 lbs
Standard Deviation of Total for Year:         81,829,000 lbs
Effective Degrees of Freedom for Year:               143

-------
Page 36
               TABLE 4: SEASONALITY CALCULATIONS EXAMPLE

   {1}      {2}            {3}          {4}   {5}    {6}        {7}
   p_k      X_k            p_k X_k      n_k-1 d_k    X_k/ΣX_k   var(p_k)
  .0821  102,100,000   .83824100x10+7    16    45   .2674813    .0105²
  .0773   93,013,000   .71899050x10+7    16    29   .2436752    .0077²
  .0719   97,385,000   .70019820x10+7    16    32   .2551289    .0072²
  .0589   89,211,000   .52545280x10+7    16    42   .2337147    .0095²
         381,709,000  2.78288200x10+7

   {8}               {9}              {10}
   [{6}]²var(p_k)    p_k/ΣX_k         {9}[1-{6}]
   .7887971x10-5     .2150853x10-9    .1575544x10-9
   .3520497x10-5     .2025103x10-9    .1531636x10-9
   .3374305x10-5     .1883634x10-9    .1403064x10-9
   .4929686x10-5     .1543060x10-9    .1182424x10-9

   {11}           {12}               {13}
   var(X_k)       [{10}]²var(X_k)    {8}+{12}
   1,743,900²     .7549222x10-7      .7963464x10-5
   1,363,500²     .4361353x10-7      .3564110x10-5
   1,261,900²     .3134766x10-7      .3405653x10-5
   1,355,800²     .2570029x10-7      .4955386x10-5
                                    1.9888613x10-5

   {14}              {15}              {16}
   [{8}]²/{4}        [{12}]²/{5}       {14}+{15}
   .3888756x10-11    .1266461x10-15    .3888882x10-11
   .7746187x10-12    .6559102x10-16    .7746842x10-12
   .7116210x10-12    .3070862x10-16    .7116517x10-12
   .1518863x10-11    .1572631x10-16    .1518878x10-11
                                       .6894096x10-11
-------
                                                  Page  37
           TABLE 4 CONTINUED
USING EQUATION 2.14, P = 2.78288200x10+7/381,709,000 = .0729
USING EQUATION 2.15, std(P) = (1.9888613x10-5)^½ = .0045
USING EQUATION 2.16, EDF_P = (1.9888613x10-5)²/.6894096x10-11
                           = 57.38
USING EQUATION 2.17,    T = P·W = (.0729)(4,975,800,000)
                          = 362,735,800 lbs.
USING EQUATION 2.18,  var(T) = W²var(P) + P²var(W)
      = (4,975,800,000)²(1.9888613x10-5) + .0729²(81,829,000)²
      = .52799900x10+15
      and std(T) = (.52799900x10+15)^½ = 22,978,230 lbs.
USING EQUATION 2.19,
      EDF_T = [.52799900x10+15]²/
              {[(4,975,800,000)²(1.9888613x10-5)]²/EDF_P +
              [.0729²(81,829,000)²]²/EDF_W} = 65.84.
USING EQUATION 2.22,
      σ² = ln[(22,978,230)²/(362,735,800)² + 1]
         = .4004793x10-2
USING EQUATION 2.23,
      μ = ln(362,735,800) - ½(.4004793x10-2) = 19.70718
ASSUMING A 95% CONFIDENCE INTERVAL, AT EDF_T = 65.84, t = 1.997
AND USING EQUATIONS 2.24 AND 2.25,
      L = 19.70718 - 1.997(.4004793x10-2)^½ = 19.58083
      U = 19.70718 + 1.997(.4004793x10-2)^½ = 19.83354
USING EQUATIONS 2.26 AND 2.27,
      lower = e^19.58083 = 319,041,400
      upper = e^19.83354 = 410,766,800
  (Note that this is an asymmetrical confidence interval
                      about the mean.)
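
     Because Table 4 is tedious to follow by hand, the sketch
below replays it from the Table 3 inputs; within rounding it
should reproduce P = .0729, T = 362.7 million lbs, and the
asymmetrical interval (319.0, 410.8) million lbs. SciPy's t
quantile stands in for the t = 1.997 read from a table.

  import math
  from scipy.stats import t as t_dist

  p  = [.0821, .0773, .0719, .0589]                 # weekly fractions
  X  = [102.1e6, 93.013e6, 97.385e6, 89.211e6]      # weekly totals, lbs
  sp = [.0105, .0077, .0072, .0095]                 # std of weekly p_k
  sX = [1.7439e6, 1.3635e6, 1.2619e6, 1.3558e6]     # std of weekly X_k
  dX = [45, 29, 32, 42]                             # df of weekly X_k
  dp = [16] * 4                                     # df of p_k (n_k - 1)
  W, sW, dW = 4.9758e9, 81.829e6, 143               # yearly quantity data

  S = sum(X)
  P = sum(pk * xk for pk, xk in zip(p, X)) / S                   # Eq 2.14
  a = [(xk / S) ** 2 * spk ** 2 for xk, spk in zip(X, sp)]
  b = [((pk / S) * (1 - xk / S)) ** 2 * sxk ** 2
       for pk, xk, sxk in zip(p, X, sX)]
  varP = sum(a) + sum(b)                                         # Eq 2.15
  edfP = varP ** 2 / (sum(ai ** 2 / di for ai, di in zip(a, dp)) +
                      sum(bi ** 2 / di for bi, di in zip(b, dX)))  # Eq 2.16

  T    = P * W                                                   # Eq 2.17
  varT = W ** 2 * varP + P ** 2 * sW ** 2                        # Eq 2.18
  edfT = varT ** 2 / ((W ** 2 * varP) ** 2 / edfP +
                      (P ** 2 * sW ** 2) ** 2 / dW)              # Eq 2.19

  sig2 = math.log(varT / T ** 2 + 1)                             # Eq 2.22
  mu   = math.log(T) - sig2 / 2                                  # Eq 2.23
  tv   = t_dist.ppf(0.975, edfT)                                 # alpha = .05
  lo   = math.exp(mu - tv * math.sqrt(sig2))                     # Eqs 2.24/2.26
  hi   = math.exp(mu + tv * math.sqrt(sig2))                     # Eqs 2.25/2.27
  print(f"P={P:.4f}  T={T:,.0f} lbs  CI=({lo:,.0f}, {hi:,.0f})")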

-------
                                                          Page 38
                           REFERENCES

Aitchison, J., and Brown, J.A.C., The Lognormal Distribution,
Cambridge University Press, London, 1957, p. 8.

Britton, P.W., "Improving Manual Solid Waste Separation Studies",
J. San. Eng. Div., ASCE, 98(SA5):717-730 (1972).

Kish, L., "Cluster Sampling", in Survey Sampling, John Wiley &
Sons, Inc., New York, N.Y., 1965, p. 161.

Klee, A.J., and Carruth, D., "Sample Weights in Solid Waste
Composition Studies", J. San. Eng. Div., ASCE, 96(SA4):945-954
(1970).

Mace, A.E., "Estimation Problems", in Sample Size Determination,
Van Nostrand Reinhold Co., New York, N.Y., 1964, pp. 35-37.

Natrella, M., "Characterizing Measured Performance", in Experi-
mental Statistics, United States Department of Commerce, National
Bureau of Standards, Washington, D.C., 1966, pp. 2-10 to 2-11.

Savage, G.M., and Shiflett, G.R., Processing Equipment for
Resource Recovery Systems: "III - Field Test Evaluation of Shred-
ders", Contract No. 68-03-2589, U.S. Environmental Protection
Agency, Cincinnati, OH, 1979.

Trezak, G., Significance of Size Reduction in Solid Waste Manage-
ment, Grant No. EPA-600/2-77-131, U.S. Environmental Protection
Agency, Cincinnati, OH, 1977.
                            NOTATION

β        = constant in sample weight Equation 2.1

d        = precision (i.e., ½ the confidence interval) desired

g₁       = coefficient of skewness

j        = jth sample in a given week

k        = kth week

k₂       = part of formula for coefficient of skewness

k₃       = part of formula for coefficient of skewness

n        = number of samples

n_k      = number of samples in the kth week

-------
                                                           Page 39
P        = fraction of a given component over the year
p_jk     = fraction of a given component in jth sample in kth week
p_k      = average fraction of a given component in kth week
r        = number of weeks sampled throughout the year
s        = population standard deviation of given component
s_g1     = standard deviation of g₁
std(P)   = standard deviation of P
std(T)   = standard deviation of T
t        = t-value at significance level α
T        = total quantity for the year of a given component
T'       = log-transform of T
t_g1     = t-value for the coefficient of skewness
var(p_k) = variance of p_k
var(P)   = variance of P
var(W)   = variance of W
var(X_k) = variance of X_k
W        = total quantity of waste for the year
x̄        = part of formula for coefficient of skewness
x_i      = part of formula for coefficient of skewness
X_k      = total quantity of refuse in the kth week
α        = level of significance
α_a      = actual level of significance
α_n      = nominal level of significance

-------
                                                          Page 40
                           APPENDIX A

              DERIVATION OF QUANTITY & COMPOSITION
                 SAMPLING AND ANALYSIS FORMULAS
A.1 PROOF THAT x̄ IS UNBIASED - EQUATION 1.13

     To show that Equation 1.13 is an unbiased estimate of the
true mean, μ, let μ_i be the true mean in the ith sampling inter-
val, τ_i be the true total for the ith sampling interval, and τ be
the true total for the entire population consisting of h time
periods. Then, from Equation 1.13 and taking expectations, e, of
both sides,

          e(x̄) = Σ_i (m_i/N) e[(Σ_j X_ij)/n_i]

               = Σ_i (m_i/N)μ_i = Σ_i τ_i/N = τ/N
A.2 DERIVATION OF VAR(x̄) - EQUATION 1.16

     Let a_ij be a random variate that takes the value 1 if the
jth unit is in the sample and the value 0 otherwise. The sample
mean of Equation 1.16 may be written as:

             x̄ = (1/m) Σ_i Σ_j a_ij (m_i/n_i) X_ij          [A-1]

where h is the total number of sampling intervals over the
duration of the sampling (the i sum runs from 1 to h, the j sum
over the m_i units in the ith interval). Clearly, for a given i,

      Pr(a_ij = 1) = n_i/m_i,  and  Pr(a_ij = 0) = 1 - n_i/m_i

Thus, a_ij is distributed as a binomial variate in a single trial
with p = n_i/m_i. Hence,

                 e(a_ij) = p = n_i/m_i                      [A-2]
and
          var(a_ij) = pq = (n_i/m_i)(1 - n_i/m_i)           [A-3]

-------
                                                          Page 41
     To find var(x̄) we need also the covariance of a_ij and a_ik.
The product a_ij·a_ik is 1 if both subscripted units are in the
sample, and is zero otherwise. The probability that two specific
units are both in the sample for any given i is clearly
n_i(n_i-1)/[m_i(m_i-1)]. Hence,

     cov(a_ij,a_ik) = n_i(n_i-1)/[m_i(m_i-1)] - (n_i/m_i)²

                    = -(n_i/m_i)(1 - n_i/m_i)/(m_i-1)       [A-4]

Applying the propagation of error concept (Equation 1.15) to
Equation A-1 and setting, for convenience, Y_ij = m_i X_ij/n_i, we
obtain:

   var(x̄) = (1/m²){Σ_i Σ_j Y_ij² var(a_ij) +

            Σ_i Σ_{j≠k} Y_ij Y_ik cov(a_ij,a_ik)}           [A-5]

-------
                                                          Page 42
Completing the square on the cross-product term gives:

      var(x̄) = (1/m²) Σ_i n_i(1 - n_i/m_i) var(Y_ij)        [A-6]

But var(Y_ij) = m_i² var(X_ij)/n_i². Thus,

      var(x̄) = (1/m²) Σ_i (m_i²/n_i)(1 - f_i) var(X_ij)     [A-7]
A.3 PROOF OF EFFICIENCY OF FULL SAMPLING

     To show that full sampling is more efficient (i.e., has
smaller variance) than unbiased systematic or random sampling, it
is sufficient to show that the coefficient of the population
variance in Equation 1.10 is greater than that in Equation 1.16,
i.e.,

            m²(1 - f)/n > Σ_i m_i²(1 - f_i)/n_i             [A-8]

Since m = Σm_i, f_i = n_i/m_i and f = n/m, inequality A-8 can be
rewritten (after first multiplying both sides by m²) as follows:

            (m² - m²n/m)/n > Σ_i (m_i² - m_i²n_i/m_i)/n_i
so that
                 m²/n - m > Σ_i m_i²/n_i - m
and
                   m²/n > Σ_i m_i²/n_i                      [A-9]
Thus it is necessary only to  show that inequality A-9 holds.

-------
                                                          Page  43


     We start by noting that in systematic sampling (assuming
that the m_i are integral multiples of k), n'_i = m_i/k and k = m/n,
where k is the sampling interval. (Since our symbol for the
number of vehicles sampled in the ith interval is n_i for biased
sampling, we have used n'_i for its unbiased counterpart.) If the
m_i are not integral multiples of k or if simple random sampling
is used, then the n'_i will not necessarily be equal to m_i/k.
However, the expectation of n'_i, e(n'_i), is equal to m_i/k in such
cases. When sampling to the fullest capacity of our scales, n_i
cannot be less than n'_i, so e(n'_i) ≤ n_i for all i. Consequently,

  m²/n = mk = Σ_i m_i k = Σ_i m_i²/(m_i/k) = Σ_i m_i²/e(n'_i) ≥ Σ_i m_i²/n_i

and inequality A-9 is proved.



A.4 DERIVATION OF VAR(P) - EQUATION 2.15

Since P = Σp_kX_k/ΣX_k, from the propagation of error formula
(Equation 1.15),

       var(P) = Σ[(∂P/∂p_k)²var(p_k) + (∂P/∂X_k)²var(X_k)]

Quickly we see that ∂P/∂p_k = X_k/ΣX_k.

To calculate ∂P/∂X_k, differentiate the kth term of the sum with
respect to X_k, which gives

   ∂P/∂X_k = p_k/ΣX_k - p_kX_k/(ΣX_k)² = (p_k/ΣX_k)(1 - X_k/ΣX_k),

the coefficient appearing in the second term of Equation 2.15.
-------
            PROTOCOL
A COMPUTERIZED SOLID WASTE QUANTITY
 AND COMPOSITION ESTIMATION SYSTEM
         OPERATIONAL MANUAL
                 BY
             ALBERT J. KLEE
    RISK REDUCTION ENGINEERING LABORATORY
     U.S. ENVIRONMENTAL PROTECTION AGENCY
           CINCINNATI, OHIO 45268

-------
                             MANUAL

                      PROTOCOL VERSION  1.01



                        TABLE OF CONTENTS
                                                           Page

A. INTRODUCTION.............................................   1

   A.1  FILES AND PROMPTS...................................   1

B. DATA FILES...............................................   2

   B.1  CONSTRUCTING AND ADDING TO THE QUANTITY
     DATA FILE (OPTIONS 1 AND 2)............................   2

   B.2  CONSTRUCTING AND ADDING TO THE COMPOSITION
     DATA FILE (OPTIONS 4 AND 5)............................   5

C: ANALYZING QUANTITY AND COMPOSITION DATA..................   7

   C.1  ANALYZING QUANTITY DATA (OPTION 3)..................   7

   C.2  ANALYZING COMPOSITION DATA (OPTION 6)...............  10

D: DETERMINING COMPOSITION SAMPLE SIZE (OPTION 7)...........  12

E: EDITING PRO-QUAN.DAT OR PRO-COMP.DAT FILES (OPTION 8)....  14

-------
                                          Protocol Manual Page 1


                        A.  INTRODUCTION
A.1  FILES AND PROMPTS

     PROTOCOL consists of a single file, PROTOCOL.EXE, and is
invoked merely by entering PROTOCOL at the keyboard. You can,
however, rename this file if you wish. The following is the
PROTOCOL logo.
                           PROTOCOL
     (Solid Waste Quantity  & Composition Sampling Protocols)
                         VERSION 1.01
               US ENVIRONMENTAL PROTECTION AGENCY
PROTOCOL creates  and  uses  a number of files with the  following
standard (or default)  names:

   PROTOCOL.OUT   - the standard  output file for the  program,
                    created by PROTOCOL.
   PRO-COMP.DAT  -  a  file  that serves as input to the composi-
                    tion analysis  portion of the program.   This
                    file is created   by  the  user.
   PRO-QUAN.DAT   -  a  file  that  serves as input to the quantity
                    analysis portion of the program.    This  file
                    is also created  by the user.
   QCACCESS.DAT  -  an internal  file that serves as input  to the
                    composition  analysis  portion of the program.
                    This file is automatically  created by  PROTO-
                    COL.

With the exception of QCACCESS.DAT,  if  any of the above  files
already exist, PROTOCOL warns you of  this fact  and asks whether
you wish to overwrite  them  (thus destroying them in the process).
PROTOCOL.OUT can be overwritten if you so wish, but if you do not
want PRO-COMP.DAT or PRO-QUAN.DAT overwritten, you must exit from
PROTOCOL and save these files by renaming them.

     The following is the PROTOCOL Main Menu. Options 1 and 4
construct the PRO-QUAN.DAT and PRO-COMP.DAT files, respectively
(Options 2 and 5 are used for adding to these files), and Options
3 and 6 deal with estimation of waste quantity and composition.
Option 7 determines composition sample size (Section 2.3, Chapter
2), and Option 8 is a full screen editor for editing the PRO-
QUAN.DAT and PRO-COMP.DAT files created with Options 1 and 4.

-------
                                           Protocol Manual Page 2
                           OPTIONS
           1. CONSTRUCT A QUANTITY DATA FILE
           2. ADD TO AN EXISTING QUANTITY DATA FILE
           3. ANALYZE SET OF WEEKLY QUANTITY DATA
           4. CONSTRUCT A COMPOSITION DATA FILE
           5. ADD TO AN EXISTING COMPOSITION DATA FILE
           6. ANALYZE SET OF WEEKLY COMPOSITION DATA
           7. DETERMINE COMPOSITION SAMPLE SIZE
           8. EDIT QUANTITY OR COMPOSITION DATA FILE
           9. ---------------- QUIT ------------------
          Use up-arrow and down-arrow keys to move bar
          to desired selection (or use number to go to
          numbered selection).  Then press Enter key.
          The PgUp and PgDn keys will move the bar to
          the top and bottom positions, respectively.
The selected Option is indicated by a highlighted bar (in this
manual, an "<--" arrow is used to show the position of the high-
lighted bar).
                          B. DATA FILES

B.1  CONSTRUCTING AND ADDING TO THE QUANTITY
     DATA FILE (OPTIONS 1 AND 2)

     When constructing the quantity data file (PRO-QUAN.DAT), you
will be asked to supply a heading for the PRO-QUAN.DAT file (for
identification purposes), e.g.,
              ENTER HEADING FOR QUANTITY DATA FILE:
              » PROTOCOL QUANTITY DATA EXAMPLE
Data entry follows,  first with the entry of the number  of vehi-
cles arriving within the first hour of the first week,  e.g.,
      ENTER NUMBER OF TRUCKS ARRIVING IN HOUR #1,  WEEK #1,
      » 27
   (Press Enter Key if finished entering data for this week.)
then with the entry of the vehicles weights for that hour,  e.g.,

-------
                                             Protocol Manual Page 3
                  ENTER VEHICLE WEIGHTS  FOR HOUR  1,
          SEPARATED BY A COMMA OR ONE  OR MORE SPACES, e.g.,
     21232,20395,16930,21020,12046,22739,15322,18205,20397,16724
           (When done for this hour, press Enter  Key twice.)

     21387,17538,18328,23748,26849,19610,16351,21489
  Data entry is completed by defaulting on the entry of the number
  of trucks in the first hour of a new week, e.g.,
       ENTER NUMBER OF TRUCKS ARRIVING IN HOUR #1, WEEK #5,
       »
 (Starting new week; press Enter Key if finished entering data.)
     Upon completion of data entry, the PRO-QUAN.DAT file looks
like that shown in Figure 1. Note that the file consists of a
specific repeating format, i.e., Week-Header/Hour-Header/Data,
etc., where the Week Header consists of three lines, e.g.,
                          (Blank Line)
                  ---------- WEEK #1 ----------
the Hour Header consists of two lines, e.g.,
    (Blank Line)
    HOUR #1: Total Number of Vehicles =    27

and the Data consists of  one  or more  lines using  the comma or
space protocol,  e.g.,


   29586,28773,18276,23038,21520,18205,22013,19610,16351,21489
   16889,23985,25805,15029,24396,21078


When editing the PRO-QUAN.DAT file, this format must not be
altered.

     Option 2 allows you to add additional weeks of data to an
existing PRO-QUAN.DAT file. Since the file already exists with an
identification file heading, you will not be asked to enter a new
one.

-------
                                        Protocol Manual Page 4
PROTOCOL QUANTITY EXAMPLE DATA
                          WEEK #1
HOUR #1: Total Number of Vehicles =   27
6889,4597,4267,6148,3530

HOUR #2: Total Number of Vehicles =   29
7541,11807,11806,11574,6797,10515,10718
HOUR #48: Total Number of Vehicles =  63
2692,5960,4700,4744,4639,3750,4498
                          WEEK #2
HOUR #1: Total Number of Vehicles =   28
13749,4197,4931,5531
HOUR #48: Total Number of Vehicles =  56
5760,3234,3847,5030,3805,5253,3773,5424
                          WEEK #4
HOUR #1: Total Number of Vehicles =   29
21232,20395,16930,21020,12046,22739
HOUR #48: Total Number of Vehicles =  61
5242,3724,4425,3999,5823,4894
      FIGURE 1: EXAMPLE OF A PARTIAL PRO-QUAN.DAT FILE
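
     If you generate quantity data with other software, a file in
the Figure 1 layout can be written directly. The helper below is
hypothetical (it is not part of PROTOCOL), and the exact decora-
tion of the Week Header line is an assumption; verify any file
built this way against one produced by Options 1 and 2, or run it
through the Option 8 editor.

  def write_pro_quan(path, heading, weeks):
      # weeks: one list per week; each week is a list of (total, weights)
      # tuples, one per sampling hour. Layout per Figure 1 (assumed).
      with open(path, "w") as f:
          f.write(heading + "\n")
          for w, hours in enumerate(weeks, start=1):
              f.write(f"\n{25 * '-'} WEEK #{w} {25 * '-'}\n")
              for h, (total, weights) in enumerate(hours, start=1):
                  f.write(f"\nHOUR #{h}: Total Number of Vehicles = {total:4d}\n")
                  for i in range(0, len(weights), 10):   # 10 weights per line
                      f.write(",".join(map(str, weights[i:i + 10])) + "\n")

  write_pro_quan("PRO-QUAN.DAT", "PROTOCOL QUANTITY EXAMPLE DATA",
                 [[(27, [6889, 4597, 4267, 6148, 3530])]])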

-------
                                          Protocol Manual Page 5
B.2  CONSTRUCTING AND ADDING TO THE COMPOSITION
     DATA FILE (OPTIONS 4 AND 5)

     Data entry for the PRO-COMP.DAT file is similar to that for
the PRO-QUAN.DAT file. You will be asked to supply a heading for
the PRO-COMP.DAT file (for identification purposes), e.g.,
             ENTER HEADING FOR COMPOSITION DATA FILE:
             » PROTOCOL COMPOSITION DATA EXAMPLE
Data entry follows, first with the entry of the component name,
e.g.,
          ENTER COMPONENT NAME:
          » FERROUS METALS
          (Press Enter Key if finished entering data.)
then followed by entry of the component fractions for each week,
e.g.,
              ENTER COMPONENT FRACTIONS FOR WEEK #1,
          SEPARATED BY A COMMA OR ONE OR MORE SPACES, e.g.,
   .043,.145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135
         (When done for this week, press Enter Key once;
      When done for this component, press Enter Key twice.)

   .003,.003,.002,.170,.125,.005,.002,.002,.001,.005, .000,.156
Data for additional components can be  added to this file, until
there are no more components.  Option 5  allows you to add data for
additional components  to an existing PRO-COMP.DAT file. Since the
file already exists with an identification file heading, you will
not be asked to enter  a new one.

     Upon completion of data  entry, the  PRO-COMP.DAT file looks
like that  shown in Figure  2. Note that  the file consists of a
specific  repeating   format,  i.e.,  Component-Header/Week-
Header/Data,  etc., where the Component  Header consists  of two
lines,  e.g.,

-------
                                        Protocol Manual Page  6
PROTOCOL COMPOSITION EXAMPLE DATA
COMPONENT: TOTAL METALS

WEEK #1:
.043, .145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135
.177,.065,.045,.102,.114

WEEK #2:
.053,.059,.054,.143,.128,.061,.074,.066,.059,.071,.076,.132
.037,.050,.052,.089,.108
WEEK #4:
.148,.115,.027,.094,.021,.008,.036,.037,.031,.075,.031,.097
.032,.063,.061,.097,.028

COMPONENT: TEXTILES

WEEK #1:
.003,.183,.069,.001,.071,.002,.001,.003,.001,.001,.006,.140
.244,.001,.001,.055,.086

WEEK #2:
.003,.003,.002,.170,.125,.005,.002,.002,.001,.005,.000,.156
.003,.002,.004,.035,.088
WEEK #4:
.182,.094,.005,.036,.001,.000,.004,.001,.001,.003,.007,.047
.000,.001,.001,.050,.000
      FIGURE  2:  EXAMPLE OF A PARTIAL PRO-COMP.DAT FILE
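
     The composition file can likewise be generated outside
PROTOCOL. As with the quantity sketch above, the helper below is
hypothetical and the header decoration is an assumption; check
the result against a file built with Options 4 and 5.

  def write_pro_comp(path, heading, components):
      # components: dict of component name -> list of weekly fraction
      # lists, written in the Figure 2 layout (assumed)
      with open(path, "w") as f:
          f.write(heading + "\n")
          for name, weeks in components.items():
              f.write(f"\nCOMPONENT: {name}\n")
              for w, fracs in enumerate(weeks, start=1):
                  f.write(f"\nWEEK #{w}:\n")
                  for i in range(0, len(fracs), 12):  # 12 fractions per line
                      # leading-dot style as in Figure 2 (fractions < 1)
                      f.write(",".join(f"{x:.3f}"[1:] for x in fracs[i:i + 12])
                              + "\n")

  write_pro_comp("PRO-COMP.DAT", "PROTOCOL COMPOSITION EXAMPLE DATA",
                 {"TEXTILES": [[.003, .183, .069, .001]]})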

-------
                                           Protocol Manual Page 7
   COMPONENT: TOTAL METALS
   	  (Blank Line)
the Week Header consists of two lines,  e.g.,


             	 (Blank Line)  	
  WEEK #1:

and the Data consists of one  or more lines using  the comma or
space protocol, e.g.,


  .148,.115,.027,.094,.021,.008,.036,.037,.031,.075,.031,.097
  .032,.063,.061,.097,.028

When editing the PRO-COMP.DAT file, this format must not be
altered.
           C: ANALYZING QUANTITY AND COMPOSITION DATA

C.I  ANALYZING QUANTITY DATA (OPTION 3)

     Option 3 makes all of  the calculations,  using formulas de-
rived in Chapter 1, to estimate weekly and yearly waste quanti-
ties. A PRO-QUAN.DAT file must exist; otherwise an error message
is issued and the program is terminated. After a check to see if
the PROTOCOL.OUT file  already exists  (and, if  it does,  whether
you wish to overwrite it),  you will  be  asked to  supply an identi-
fication title for the new  PROTOCOL.OUT file that will be creat-
ed, e.g.,
          ENTER PROTOCOL.OUT FILE IDENTIFICATION TITLE

          » PROTOCOL QUANTITY EXAMPLE
You then will be asked for the significance level for the confi-
dence intervals that will be  constructed  for the estimates, using
a highlighted bar menu,  e.g.,

-------
                                           Protocol Manual Page 8
SIGNIFICANCE
LEVEL FOR
CONFIDENCE
INTERVALS
.01
.05
.10
.20
     Another query made at this time concerns the plotting of the
hourly means, e.g.,
SKIP PLOT OF
HOURLY MEANS?
YES
NO
If this option is selected, plots of the hourly means will be
constructed after the end of data  reading  and analysis. A typical
plot is shown in Figure 3.
              PLOT OF WEEKLY MEANS FOR EACH HOUR
                 (Dotted line = overall mean)

         NUMBER OF   SAMPLE
  HOUR   VEHICLES     SIZE
    1       27          5
    2       29          7
    3       65          6
    4      241          8
    5      309          5
    6       91          4
    7       86          8
    8       54          7
    9       27          6
   10       28          6
   11       57          4
   12      263          6
   13      284          6
   14       93          4
   15       93          6
   16       59          8

  Minimum = 4,426    Mean = 18,720    Maximum = 25,895
  [character plot of the hourly means omitted]

            FIGURE 3: EXAMPLE OF HOURLY PLOT OF MEANS
After selection of the significance level, PROTOCOL will read the data
in the PRO-QUAN.DAT file and, if there are no errors in the file, will
proceed to make the necessary calculations. If an error is found, this
will be reported, e.g.,

-------
                                           Protocol Manual  Page  9
               ERROR IN DATA FILE,  LINE NUMBER 28.
               SPECIFICALLY, THERE  WAS  AN ERROR IN
               THE RECORDED WEIGHT  OF A VEHICLE
               IN HOUR NUMBER 4.  THE FOLLOWING
               IS THE ERROR LINE:
               2123X,20395,16930,21020,12046,22739
                   EDIT QUANTITY DATA FILE
                   TERMINATE PROTOCOL PROGRAM
The error in this line consisted of a letter (X) where an integer
was expected. PROTOCOL would have caught this error if Option 1
(or 2) was used to enter the data. This sort of error can occur
only if you use your own editor to prepare the PRO-QUAN.DAT file.
In any event, PROTOCOL gives you the option of terminating the
program, or immediately entering its full screen editor to cor-
rect the file.

     Figure  4 shows  one  of  the weekly  calculation summaries
produced by PROTOCOL.  After all of the data in the PRO-QUAN.DAT
file has been processed,  a  yearly summary is prepared (shown in
Figure  5).  The weekly and  yearly summaries  are written   to  the
                        SUMMARY FOR WEEK #   1

NUMBER OF SAMPLING HOURS:                                          48
NUMBER OF VEHICLES:                                             5,454
NUMBER OF VEHICLES SAMPLED:                                       295
AVERAGE NUMBER OF VEHICLES SAMPLED/HOUR:                          6.1
VEHICLE SAMPLING FREQUENCY:                                     5.41%
ESTIMATED WEEKLY MEAN:                                         18,720
STANDARD DEVIATION OF ESTIMATED WEEKLY MEAN:                       320
CONFIDENCE INTERVAL ABOUT ESTIMATED WEEKLY MEAN AT ALPHA =   .05:
                           18,076 <	> 19,364
Effective Degrees of Freedom for Mean:                              45
Within-Week Coefficient of Variation:                            1.71%

ESTIMATED WEEKLY TOTAL:                                   102,100,000
STANDARD DEVIATION OF ESTIMATED WEEKLY TOTAL:                1,743,900
CONFIDENCE INTERVAL ABOUT ESTIMATED WEEKLY TOTAL AT ALPHA =  .05:
                       98,586,000 <	>  105,610,000

-------
                                         Protocol  Manual  Page  10


         FIGURE 4: EXAMPLE OF A WEEKLY QUANTITY SUMMARY

                          SUMMARY FOR YEAR
                   NUMBER  OF WEEKS  SAMPLED = 4

NUMBER OF SAMPLING HOURS:                                          192
NUMBER OF VEHICLES:                                            21,663
NUMBER OF VEHICLES SAMPLED:                                      1,175
AVERAGE NUMBER OF VEHICLES SAMPLED/HOUR:                            6.1
VEHICLE SAMPLING FREQUENCY:                                      5.42%

ESTIMATED YEARLY TOTAL:                                  4,975,800,000
STANDARD DEVIATION OF ESTIMATED YEARLY TOTAL:                81,829,000
CONFIDENCE INTERVAL ABOUT ESTIMATED  YEARLY TOTAL AT ALPHA =   .05:
                    4,814,100,000 <	> 5,137,600,000
ERROR AS % OF ESTIMATED YEARLY TOTAL:                             3.25%

Effective Degrees of Freedom for Total:                             143
Coefficient of Variation of Total:                                1.64%
Between-Week Coefficient of Variation:                            5.83%


        FIGURE  5:  EXAMPLE OF  A YEARLY QUANTITY SUMMARY


PROTOCOL.OUT output file, as well as the plots of the hourly
means if plotting was selected, and the program is terminated.
Also, a special file, QCACCESS.DAT, is written, containing weekly
and yearly quantity means, standard deviations, and degrees of
freedom. This file is used in the computation of yearly component
quantities, and must not be edited by the user.
C.2  ANALYZING COMPOSITION DATA (OPTION 6)

     Option 6 makes all of  the calculations,  using formulas de-
rived in Chapter 2,  to estimate weekly  component fractions and
yearly  component quantities.  A PRO-COMP.DAT file must exist;
otherwise an error message is issued and the program is terminat-
ed. After a check to see  if  the PROTOCOL.OUT  file already exists
(and, if it does, whether you wish to overwrite it), you will be
asked to supply an identification title  for the new PROTOCOL.OUT
file that will be created, e.g.,
          ENTER PROTOCOL.OUT FILE IDENTIFICATION  TITLE

          » PROTOCOL COMPOSITION EXAMPLE
As with the quantity analysis,  you will  be asked  for  the  signifi-

-------
                                          Protocol Manual Page 11


cance level for the confidence intervals that will be constructed
for the estimates,  using a highlighted bar menu, e.g.,
SIGNIFICANCE
LEVEL FOR
CONFIDENCE
INTERVALS
.01
.05
.10
.20
After selection of the significance level, PROTOCOL will read the
data in the PRO-COMP.DAT file and,  if  there  are  no errors  in the
file, will  proceed to  make  the necessary  calculations.  If  an
error is found, this will be  reported, e.g.,
    ERROR IN LINE NUMBER 20 OF THE COMPOSITION DATA FILE,
    i.e., THERE WAS AN ERROR IN THE RECORDING OF SAMPLE DATA
    FOR COMPONENT NUMBER 2 IN WEEK NUMBER 3.
    THE FOLLOWING IS THE ERROR LINE:
   .043.145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135
                   EDIT COMPOSITION DATA FILE
                   TERMINATE PROTOCOL PROGRAM
The error in this line consisted of a missing comma between  the
first two component  fractions.  PROTOCOL would have caught this
error if Option 4 (or 5) was used to enter the data.  This sort of
error can occur  only if you use your own editor to prepare  the
PRO-COMP.DAT file.  In any  event, PROTOCOL gives you the option of
terminating the program,  or immediately  entering  its  full screen
editor to correct the file.
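
     The kind of check that catches such a line amounts to a
field count and a numeric parse of each field.  The sketch below
(Python, illustrative only; the count of 12 components is taken
from the example line, and the helper name is hypothetical) shows
the idea:

     def validate_composition_line(line, n_components=12):
         # Return an error description, or None if the line
         # is well formed.
         fields = line.split(",")
         if len(fields) != n_components:
             return "expected %d fields, found %d" % (
                 n_components, len(fields))
         try:
             fractions = [float(f) for f in fields]
         except ValueError:
             return "non-numeric field"
         if any(not 0.0 <= v <= 1.0 for v in fractions):
             return "fraction outside [0, 1]"
         return None

     bad = ".043.145,.109,.037,.103,.046,.069,.039,.081,.041,.046,.135"
     print(validate_composition_line(bad))  # missing comma leaves 11 fields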

     Figure 6 shows  one of the component summaries produced by
PROTOCOL. The mean, standard  deviation,  coefficient  of skewness,
and degrees of freedom are calculated for each week and for  the
year  (the  yearly estimate is  obtained by weighting using  the
waste quantity  information  in the  QCACCESS.DAT  file and  the
appropriate  formulas in Chapter 2).  An  estimate of the total
yearly component quantity, its standard deviation, and  a confi-
dence interval for  the estimate are also presented.  The  signifi-
cance level for the construction of the confidence interval is
automatically adjusted (using Equation 2.7 in Chapter 2 and  the
calculated coefficient of  skewness for the year).  The coefficient
of skewness  for the year is  obtained by weighting the weekly
values by  both the number of weekly component samples and the
weekly waste quantities.  If  an  adjustment (for skewness) in the
significance level is made, the  actual  significance level used is
reported.
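
     As a rough illustration of the weighting idea (Python, not
PROTOCOL's code; the exact formulas are those of Chapter 2, and
the weekly quantities after week 1 below are hypothetical):

     def weighted_yearly_value(weekly_values, weekly_quantities):
         # Weight weekly statistics by the corresponding weekly
         # waste quantities.
         total = sum(weekly_quantities)
         return sum(v * q for v, q in zip(weekly_values,
                                          weekly_quantities)) / total

     means = [0.0822, 0.0772, 0.0719, 0.0589]  # Figure 6, TOTAL METALS
     qty = [102.1e6, 98.0e6, 95.5e6, 101.3e6]  # week 1 from Figure 4;
                                               # weeks 2-4 hypothetical
     print(weighted_yearly_value(means, qty))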

     If the QCACCESS.DAT  file does  not exist,  however,  only the
weekly component calculations  will be made. If a QCACCESS.DAT
file containing a different number of weeks than the PRO-COMP.DAT
file is used,  this will be reported,  e.g.,
   WARNING! NUMBER OF WEEKS IN QCACCESS.DAT FILE NOT EQUAL TO
   NUMBER OF WEEKS IN PRO-COMP.DAT FILE. IT APPEARS THAT THE
   QCACCESS.DAT FILE BELONGS TO ANOTHER DATA SET.  SUGGEST
   RE-RUNNING QUANTITY ANALYSIS IN  ORDER TO OBTAIN THE PROPER
   QCACCESS.DAT FILE.  PROCESSING WILL CONTINUE BUT PROGRAM
   WILL NOT BE ABLE TO CALCULATE WEIGHTED YEARLY COMPONENT
   FRACTIONS OR WEIGHTS.
Note that processing will continue, but only weekly summaries
will be made.

     If the QCACCESS.DAT and PRO-COMP.DAT files contain the
same number of weeks, PROTOCOL has no way of determining if they
"belong" to each other.  You will  have  to make sure that the right
QCACCESS.DAT file  is used if you are working on  data  involving
more than one site. A damaged QCACCESS.DAT file will be detected
by PROTOCOL,  e.g.,
   WARNING! ERROR IN DATA IN LINE  3 OF THE QCACCESS.DAT FILE!
   FOR SOME REASON THIS FILE HAS BEEN DAMAGED. SUGGEST RE-RUNNING
   QUANTITY ANALYSIS IN ORDER TO PRODUCE A NEW, ERROR-FREE FILE.
   PROCESSING WILL CONTINUE BUT PROGRAM WILL NOT BE ABLE TO
   CALCULATE WEIGHTED YEARLY COMPONENT FRACTIONS OR WEIGHTS.

              (Press Enter Key to continue: »)
Again, note that processing will continue,  but only weekly sum-
maries will be made.
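
     A sketch of such consistency checks follows (Python,
illustrative only; QCACCESS.DAT's actual record layout is
internal to PROTOCOL, so the assumed one-line-per-week format of
mean, standard deviation, and degrees of freedom is a guess for
illustration):

     def check_qcaccess(path, n_weeks_in_comp_file):
         # Return a warning string, or None if the file
         # looks usable for weighted yearly estimates.
         with open(path) as f:
             lines = [ln.strip() for ln in f if ln.strip()]
         if len(lines) != n_weeks_in_comp_file:
             return ("number of weeks in QCACCESS.DAT not equal "
                     "to number of weeks in PRO-COMP.DAT")
         for i, line in enumerate(lines, start=1):
             fields = line.split(",")
             try:
                 mean, sd, df = (float(x) for x in fields)
             except ValueError:
                 return "error in data in line %d; file damaged" % i
         return None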
COMPONENT: TOTAL METALS

                  STANDARD      COEFFICIENT
 WEEK    MEAN     DEVIATION     OF SKEWNESS    SAMPLE SIZE
   1    .0822       .0105           .18             17
   2    .0772       .0077           .25             17
   3    .0719       .0072           .23             17
   4    .0589       .0095           .20             17
 YEAR   .0729       .0045           .22     EFFECTIVE DF = 57.4

ESTIMATED YEARLY TOTAL:                                    362,750,000
STANDARD DEVIATION:                                         22,982,000
CONFIDENCE INTERVAL ABOUT ESTIMATED YEARLY TOTAL AT ALPHA = .05:
                                  <------> 410,790,000
ERROR AS % OF ESTIMATED YEARLY TOTAL:                            6.34%
EFFECTIVE DEGREES OF FREEDOM FOR ESTIMATED YEARLY TOTAL =         65.9


            FIGURE 6: EXAMPLE OF A COMPONENT SUMMARY


        D: DETERMINING COMPOSITION SAMPLE SIZE (OPTION 7)

     Option 7 makes all of the trial-and-error calculations,
using formulas derived in Chapter 2, to determine composition
sample size.  After a check to see if the PROTOCOL.OUT file al-
ready exists (and, if it does, whether you wish to overwrite it),
you will be asked to supply an identification title for the new
PROTOCOL.OUT file that will be created, e.g.,

          ENTER PROTOCOL.OUT FILE IDENTIFICATION TITLE:

          » PROTOCOL COMPOSITION SAMPLE SIZE EXAMPLE
You will  also be  asked to supply a component identification
title, e.g.,
              ENTER COMPONENT IDENTIFICATION TITLE:

              » FERROUS METALS
Next, the level of significance desired is entered, e.g.,
                            SELECT
                           LEVEL OF
                         SIGNIFICANCE

                             .01
                             .05
                             .10
                             .20

and then select the assumption you wish to make concerning the
shape of the distribution, and supply the standard deviation and
the desired sensitivity, e.g.,
                ASSUME SKEWED DISTRIBUTION
                ASSUME J-SHAPED DISTRIBUTION
                PROVIDE COEFFICIENT OF SKEWNESS
                MAKE NO DISTRIBUTION CORRECTIONS
               ENTER STANDARD DEVIATION:  »  .038
              ENTER PLUS OR MINUS REQUIRED:  »  .02
The program then does the trial-and-error calculations described
in Chapter 2, Section 2.3. When a steady-state solution has been
achieved, the results are reported in the  following manner:
       COMPONENT: METALS
                              at DESIRED PRECISION =  .0200
                            and STANDARD DEVIATION =  .0385
                                 FINAL SAMPLE SIZE =     19
            (Adjusted for Skew-shaped non-normality.)
You can do additional  sample  size calculations, either for dif-
ferent components or for the same components with different input
parameters.
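
     The iteration itself can be sketched as follows (Python with
SciPy, a reconstruction under normal-theory assumptions rather
than PROTOCOL's source; the skewness adjustment of Equation 2.7
is omitted, which is why this yields an unadjusted size of 17
rather than the reported final size of 19):

     import math
     from scipy.stats import t

     def sample_size(std_dev, precision, alpha=0.05, n_start=30):
         # Iterate n = ceil((t(1 - alpha/2, n - 1) * s / d) ** 2)
         # until a steady-state solution is reached.
         n = n_start
         for _ in range(100):
             t_val = t.ppf(1 - alpha / 2, n - 1)
             n_next = math.ceil((t_val * std_dev / precision) ** 2)
             if n_next == n:
                 return n
             n = n_next
         return n

     print(sample_size(0.0385, 0.02))  # converges to 17 for these inputs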
     E: EDITING PRO-QUAN.DAT OR PRO-COMP.DAT FILES (OPTION 8)

     If the EDIT option is selected when errors are reported
during the processing of quantity or composition information,
you will automatically be transferred to PROTOCOL's full-screen
editor.  You can
also enter the editor  from the Main  Menu  by selecting Option 8,
"EDIT QUANTITY OR COMPOSITION DATA FILE".  Care  should be taken
when editing PRO-QUAN.DAT or  PRO-COMP.DAT files to preserve the
formats discussed in Sections B.1 and B.2.

     After entering  the editor,  the 25th line will  display the
following commands,

  F1 ABORT  F2 UNDO  F3 MARK  F4 CUT  F5 PASTE  F6 SAVE  F7 DEL EOL  F8 DEL L  F9 UDEL L  F0 ?  INS ON
These commands have the  following meanings:
F1 ABORT     Aborts edit and returns user to entering routine -
             query is "Abandon changes (Y/N)?";
F2 UNDO      Restores letters deleted by the Del key;
F3 MARK      A block of text is defined by toggling this command
             and moving the cursor keys - the marked area is
             shown in reverse video;
F4 CUT       Removes blocked text to a buffer from which it can
             be pasted (using F5) at any point where the cursor
             is located;
F5 PASTE     Moves the text in the paste buffer to the cursor
             location;
F6 SAVE      Saves edited file and returns user to entering
             routine - query is "Save under file name of PRO-
             QUAN.DAT" (or PRO-COMP.DAT);
F7 DEL EOL   Deletes text from cursor position to the end of the
             line;
F8 DEL L     Deletes line containing cursor;
F9 UDEL L    Restores the most recent deletion by F7 or F8;
F0 ?         This is the F10 key - it brings up secondary line 25
             commands;
INS ON       Text is entered in insert mode by default; pressing
             the Ins key toggles to overstrike mode.  When in
             overstrike mode, this will read INS OFF.
     After pressing the F10 key, the  25th line will  display the
following commands,
  →  ←  ↑  ↓  Home(ROW BEG)  End(ROW END)  PgUp  PgDn  ^PgUp(F BEG)  ^PgDn(F END)  Ins  Del

These secondary  commands have  the following meanings:

→                Moves cursor one character to the right;
←                Moves cursor one character to the left;
↑                Moves cursor up one line;
↓                Moves cursor down one line;
Home(ROW BEG)    Moves cursor to the 1st column in the row;
End(ROW END)     Moves cursor to the last column in the row;
PgUp             Moves text up one screen;
PgDn             Moves text down one screen;
^PgUp(F BEG)     This is Control-PgUp - moves cursor to the
                 beginning of the file;
^PgDn(F END)     This is Control-PgDn - moves cursor to the
                 end of the file;
Ins              Toggles Insert/Overstrike modes;
Del              Deletes character to the right of the cursor.


Lines may be broken by pressing the Enter key at any point, and
may be joined by pressing the Del key when the cursor is at the
end of a line.
     This version of PROTOCOL is configured  for  an  IBM PC/XT/AT
or compatible (the only real restriction is that the computer is
running under MS-DOS 2.x or a higher version), with or without a
math coprocessor.   (If a math coprocessor is  present,  PROTOCOL
will utilize  it;  if one is not present,  PROTOCOL will  emulate
coprocessor instructions.)  Another requirement is that the
driver, ANSI.SYS, must be loaded through the CONFIG.SYS file; a
typical entry is shown below.
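The directory in the entry below is an assumption for a typical
DOS installation; adjust the path to wherever ANSI.SYS resides on
your system:

     DEVICE=C:\DOS\ANSI.SYS

After adding this line to CONFIG.SYS, reboot the computer so that
the driver is loaded.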

     Included on the distribution disk are three data files:
PRO-QUAN.DAT, PRO-COMP.DAT, and QCACCESS.DAT.  These are the data
files used in the examples in this manual.
If any bugs are  found  or you have any questions about  the pro-
gram, please contact:

                       Dr. Albert J. Klee
            Risk Reduction Engineering Laboratory
          United States Environmental Protection Agency
               26 West Martin Luther King Drive
                     Cincinnati, Ohio 45268

                          513/569-7493
                          FTS 684-7493
     Although PROTOCOL has been  tested  extensively,  you may do
something that never has been done before and may  unearth a bug
that no one else has  found.   Please provide a copy of  the data
that led to the failure and record as much as you can about the
circumstances of the failure.   We can identify  and  correct prob-
lems only if we can reproduce them.

     On "cosmetic"  bugs  (i.e.,  bugs  that involve  screen  output),
it would be helpful if a screen dump were  included with a de-
scription of the problem.  (Use "Shift-PrtSc" to  print  the screen
to your printer.)
