EPA-600/4-76-044
August 1976
Environmental Monitoring Series
                       THE EPA PROGRAM FOR THE
                STANDARDIZATION OF STATIONARY
           SOURCE EMISSION TEST METHODOLOGY
                                              A Review
                              Environmental Monitoring and Support Laboratory
                                     Office of Research and Development
                                    U.S. Environmental Protection Agency
                                          Washington, D.C. 20460

-------
                RESEARCH REPORTING SERIES

Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency,  have been  grouped into  five series. These five  broad
categories were established to facilitate further development and application of
environmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The five series are:

     1.    Environmental Health Effects Research
     2.    Environmental Protection Technology
     3.    Ecological Research
     4.    Environmental Monitoring
     5.    Socioeconomic Environmental Studies

This report has been assigned to the ENVIRONMENTAL MONITORING series.
This series describes research conducted to develop new or improved methods
and  instrumentation for the identification and quantification of environmental
pollutants at the lowest conceivably significant concentrations.  It also includes
studies to determine the ambient concentrations of pollutants in the environment
and/or the variance of pollutants as a function of time or meteorological factors.
This document is available to the public through the National Technical
Information Service, Springfield, Virginia 22161.

-------
                                          EPA-600/4-76-044
                                            August 1976
   THE EPA PROGRAM FOR THE STANDARDIZATION OF

        STATIONARY SOURCE EMISSION TEST

           METHODOLOGY - A REVIEW

                      by

              M.  Rodney Midgett
           Quality Assurance Branch
Environmental Monitoring and Support Laboratory
 Research Triangle Park, North Carolina  27711
ENVIRONMENTAL MONITORING AND SUPPORT LABORATORY
           QUALITY ASSURANCE BRANCH
      OFFICE OF RESEARCH AND DEVELOPMENT
     U.S. ENVIRONMENTAL PROTECTION AGENCY
RESEARCH TRIANGLE PARK, NORTH CAROLINA  27711

-------
                                 DISCLAIMER

     This report has been reviewed by the Environmental Monitoring and Support
Laboratory, U.S. Environmental Protection Agency, and approved for publication.
Mention of trade names or commercial products does not constitute endorsement
or recommendation for use.

-------
                                CONTENTS
LIST OF TABLES
ACKNOWLEDGMENTS
1.  INTRODUCTION
2.  CONCLUSIONS AND FUTURE PLANS
3.  THE METHODS STANDARDIZATION PROCESS
    Steps in the Standardization Process
    Design of Collaborative Tests
    Analysis of Collaborative Test Data
4.  RESULTS OF THE METHODS STANDARDIZATION PROGRAM
    Stack Gas Velocity and Volumetric Flow Rate
    Gas Analysis for Carbon Dioxide, Excess Air, and Dry Molecular Weight
    Moisture Fraction
    Particulates
    Sulfur Dioxide
    Nitrogen Dioxide
    Sulfuric Acid Mist/Sulfur Dioxide
    Opacity of Stack Emissions
    Carbon Monoxide
    Beryllium
5.  REFERENCES
TECHNICAL REPORT DATA AND ABSTRACT

-------
                               LIST OF TABLES
Number
  1.  Methods Collaboratively Tested under the Methods
        Standardization Program
  2.  Precision Estimates for Those Parameters Where Standard
        Deviation Was Proportional to the Mean Value, $\delta$
  3.  Precision Estimates for Those Parameters Where Standard
        Deviation Was Independent of the Mean Value, $\delta$
  4.  Collaborative Test of EPA Method 9

-------
                          ACKNOWLEDGMENTS

      The author wishes to acknowledge with appreciation the assistance
of Dr. Henry F. Hamil, Southwest Research Institute, San Antonio, Texas,
and Mr. Paul C. Constant, Jr., of Midwest Research Institute, Kansas City,
Missouri, who had responsibility for planning and coordinating the testing
under contract to the Environmental Protection Agency.   Appreciation is also
extended to the many other individuals and organizations who took part in the
planning and execution of the tests, including those who participated as
collaborators.  A very special acknowledgment is due those organizations and
company personnel who voluntarily made their plant facilities available as
test sites and who otherwise cooperated in making these studies possible.
These organizations and individuals are too numerous to list here, but each
has been given proper acknowledgment in the individual  collaborative test
reports.

-------
                             SECTION 1
                            INTRODUCTION
     Under the authority of Section 111 of the Clean Air Act, as amended,
the U.S. Environmental Protection Agency (EPA), on December 23, 1971,
promulgated its first group of new source performance standards, which placed
restrictions on the allowable emissions from new plants in five industrial
categories.¹  These were followed by standards for seven additional industries,²
and standards covering several others have now either been promulgated or are
in varying stages of development.  In addition, four substances (asbestos,
beryllium, mercury, and vinyl chloride) have been designated hazardous air
pollutants, and emission standards for the first three have been promulgated
under Section 112 of the Clean Air Act.
      Of fundamental importance to enforcement of the above standards is
the measurement process.  The new source performance standards are set for
a particular facility after first determining, in existing well-controlled
sources, the emission limitation levels attainable using the best available
control technology, with consideration also given to cost.  A
specific source test method is used to determine these emission levels, and
that method in turn becomes the reference method for demonstrating compliance
with the new source performance standard.  At the time that the initial new
source performance standards were established, many of these methods had not
been fully evaluated, nor had their precision, accuracy, and general reliability
in the hands of typical users been determined.  It is for this reason that
the Quality Assurance Branch, Environmental  Monitoring and Support Laboratory,
EPA, has for the past 3 years been engaged in a systematic program to standardize

-------
or validate those source test methods that will be used to determine compliance
with Federal emission standards.
      Traditionally, the within-laboratory and between-laboratory precision
of test methods is determined through collaborative  testing  (round-robin
testing).  The collaborative test  is designed so  that each participant makes
one or more measurements on identical samples using  the same test method.
Then, from a statistical analysis  of the  results, an estimate is made of the
within-laboratory  and between-laboratory  precision of the test method.  This
general  technique  has been  used very widely  for the  validation of methods  for
the analysis of such  items  as water, drugs,  food  and agricultural products,
fertilizers, coal,  and  ores.
      Our experience has shown that before a stationary source test method
can be successfully collaboratively tested, it must be described in sufficient
detail to ensure that each collaborator uses exactly the same sampling and
analysis procedures, and further, it must give repeatable results when one
laboratory analyzes the same sample several times.  This can only be
determined through intensive method evaluation, which now constitutes a large
portion of the total program.  This includes a rigorous evaluation of the
field sampling as well as the analytical aspects of the method prior to
collaborative testing.
      This standardization program has resulted in more fully described methods
of known precision and accuracy and of proven reliability.  With such reliable
methodology, the agency should be in a better position to enforce compliance
with the standards that are either now or soon to be in the Federal regulations.

-------
                             SECTION 2
                    CONCLUSIONS AND FUTURE PLANS
      Results obtained thus far from the methods standardization program
indicate that the program has been successful  for the most part, although
questions still remain concerning the performance of some of the methods
tested.  We have shown that the methods for stack gas velocity and
volumetric flow rate, particulates, sulfur dioxide, nitrogen oxides, and
plume opacity (Methods 2, 5, 6, 7, 9 - Ref. 1, pp. 24884-24885, 24888-24893,
24895) are indeed reliable if used properly under the conditions for which
they were designed.  The Orsat procedure (Method 3 - Ref. 1, pp. 24886-24887)
is generally satisfactory provided its limitations are recognized and its
limits of precision can be accepted.  The method for carbon monoxide (Method
10 - Ref. 2, pp. 9319-9321) is thought to be capable of good accuracy and
precision, but it appears that the suppliers of standard gases need to improve
the state of their technology.  The collaborative test also indicates that some
users of nondispersive infrared (NDIR) instrumentation need further training
in correcting for the nonlinear response characteristics of their instruments.
     Results of the tests of the sulfuric acid mist/sulfur dioxide method
(Method 8 - Ref. 1, pp. 24893-24895) indicate that either this method suffers
from extremely poor precision, or the test design was incapable of compensating
for the normal range of concentration and velocity variation in time and space
at the selected test site.  The cause of this poor precision has still not been
found, but another test of the method, similar to the paired-train test of
Method 5, is planned as a future project.
      Much of the imprecision and lack of accuracy observed in the test of
the beryllium method (Method 104 - Ref. 3, pp. 8846-8850) seemed to occur in
the analysis phase.  But this was probably related more to collaborator

-------
competency than to deficiencies in the method itself.  Atomic absorption
is known to be a reliable technique for the analysis of  beryllium  in the
absence of interferences or when such interferences can  be eliminated.  There
are presently no plans for a future test of Method 104.  But any such future
test would likely  include a more elaborate test  design as mentioned above for
Method 5.
     Before leaving  the subject of these methods, it should be  pointed out  that
EPA Methods 1 through 8 have recently been revised and many of  these revisions
reflect refinements  and improvements brought about through the  methods stand-
ardization program.  Other changes are  the result of other experience gained
within EPA since the initial promulgation  in December  1971, as  well as the
present agency  policy toward use of metric units.  While the basic chemistry
and procedures  of  these methods remain  unchanged, the  revisions supply much
needed detail and  correct  other deficiencies of  the original 1971  versions.
They  are  due  for proposal  in a forthcoming issue of the  Federal Register.
      Several other test methods are now in various stages of the standardization
process.  The EPA method for determining mercury emissions from chlor-alkali
plants (Method 102 - Ref. 3, pp. 8840-8845) has been evaluated and the analytical
phase of the method has been collaboratively tested.⁴  The results indicate that
many analysts have difficulty achieving satisfactory precision with the method, so
a modified procedure has been developed, which eliminates many of the problems of
the original Method 102.  A collaborative test of this modified procedure has
shown conclusively that it is superior to the original.  The agency is now
planning to adopt this modified procedure as the official compliance test method.
      The EPA method for determining  the hydrogen sulfide content  of petroleum
refinery  process fuel gases  (Method  11  - Ref.  2, pp. 9321-9323) has been evaluated
in the laboratory  and found to suffer a major  interference from thiols, which are

-------
common constituents of such gas streams.  A modified method, which is designed
to eliminate this interference problem, was therefore developed, and this
method is now being evaluated.  A full-scale collaborative test is planned
for the near future; if successful, the method will be recommended as a
replacement for the current Method 11.

     The methods for fluorides (Methods 13A, 13B) and vinyl chloride
(Method 106) are currently undergoing evaluation, although this work is
still in its early stages.  Field investigations of the fluoride methodology
have indicated that the field sampling phase of Methods 13A and 13B
                      Q
is generally reliable.  These methods will also be submitted to interlaboratory
collaborative testing upon completion of the laboratory and
field evaluations, provided that these evaluations prove them to be
technically sound.  Other methods will be introduced into the program
as priorities dictate, and as time and funds permit.

-------
                                SECTION 3
                     THE METHODS STANDARDIZATION PROCESS

STEPS IN THE STANDARDIZATION PROCESS
     The validation of source test methodology is a complex, lengthy, and
costly process, but years of experience have indicated the need for a complete
and systematic examination even for those methods and measurement principles
with fairly extensive histories of usage.  Basically, this examination
consists of the following steps:
      First, the method is examined for technical accuracy, clarity, complete-
ness of detail, etc.  Regardless of how good the inherent capabilities of a
measurement principle, a method may not give reliable results if it is poorly
written, has errors in critical spots, or has such a scarcity of procedural
detail that operations cannot be duplicated from one user to another.  If the
method is found to be deficient here,  it may need to be rewritten.
      Second, the method is subjected  to a thorough and rigorous laboratory
evaluation.  This evaluation may include investigations of sample collection
efficiency, applicable concentration range, mode of calibration, effects of
interferences, etc.  It may be  said that these are the job of the researcher
and should be done at the time  the method is developed.  This is true in
principle, but experience has shown that many times such investigations were
not conducted, or were carried  out in  such a superficial manner as not to
uncover significant method deficiencies.  This laboratory evaluation generally
                                                         Q
concludes with an experimentally designed ruggedness test  to determine the
critical operational parameters of the method.  The results of this evaluation
may indicate the need to modify or rewrite the method.

-------
      Third, the method receives field evaluation at an applicable test site
to determine its overall suitability for making the intended measurement.
This evaluation may approach routine source testing, and is designed to
check out the performance of the method under typical field conditions
and to evaluate the source itself as a possible site for a collaborative
test.  When laboratory investigations by themselves are insufficient
for determining the performance of certain source test methods, extensive
field evaluations and statistically designed experiments using novel and
original evaluation techniques may be required.⁸,¹⁰  As before, results
of this work may indicate the need for method modification or revision.
     As the culmination of this chain of events, the method is submitted
to an interlaboratory collaborative test at an appropriate test site using
qualified participants to determine its precision, accuracy, and field
reliability.  Based on the test results and other information gained from
the test, a final draft of the method is prepared and a recommendation is made
for its adoption by the agency.  A report is also prepared documenting the
test itself.  Since collaborative testing is the major milestone event in
validating the performance of a method as used by different individuals, it
will be discussed separately in a subsequent section of this report.
     The above sequence of steps represents a validation process that has
evolved with time from a more simple and naive approach at the beginning of
the program.  When the standardization process was first begun for those
methods that had already been published in the Federal regulations, the
assumption was made that these were well written, fully described, and
adequately researched methods needing no further evaluation.  Therefore,
steps 1 and 2 of this sequence were all but eliminated, and plans were made
for method collaborative testing after limited field evaluation.  This is

-------
now considered to have been a mistake.  While  it  is  true that most of  these
methods are based on sound measurement principles  that  had  been widely used
prior to promulgation, it was found  that  procedure details  were occasionally
too sketchy to ensure that different users would  execute procedural operations
in a sufficiently similar manner,  too many options were frequently allowed in
the way certain  operations were  executed, and  in  a few  instances, the  method
had faulty or  incorrect  instructions.   It is for  these  reasons that all
methods introduced  into  the validation program are now  taken through the
sequence given above.

DESIGN OF  COLLABORATIVE  TESTS
      Traditionally,  analytical  methodology  has been validated through the
process of collaborative testing.   The  collaborative test  is designed  so
that  qualified participants  (collaborators)  each  make one  or more measure-
ments  on  identical  samples  using the same test method.  Then, from a
statistical  analysis  of  the  results, an  estimate  is  made of the within-
laboratory and between-laboratory precision  of the test method.   If the
samples are of known concentration (unknown, of course, to the analyst), or
if a  material  can be  supplied for analysis  that simulates  a known sample
concentration, then an estimate of the  accuracy of the  method can be obtained.
This  general  technique has  been used very widely for the validation of methods
for the  analysis of water,  drugs, food  and  agricultural products, fertilizers,
coal,  ores,  etc.
       When collaboratively testing a stationary source  emission method, the
sampling  procedure, as well  as the analytical  aspects of  the method, must  be
evaluated.   This usually means that the participants must  sample  a  real
source representative of those where the method will be used.   However,

-------
depending upon the physical state of the material being sampled, this
may create a series of complex problems with no easy solutions.  For the
test to be successful, all participants must have access to the same
pollutant concentration in the stack, for, if they cannot obtain identical
samples, they surely will not get reproducible results.  For gaseous
pollutants, this can frequently be accomplished by extracting a side
stream from the stack and piping it to ground level, where it is delivered
through a manifold.  The collaborators simultaneously sample the gaseous
pollutant through ports on the manifold.  Although a manifold could be
constructed to accommodate a relatively large number of collaborators,
participation is usually limited to ten or fewer because of coordination
problems, the great expense of maintaining personnel in the field, and
the relative scarcity of qualified collaborators for any particular test.
     Attempts to collaboratively test methods for pollutants that exist
in particulate form become complicated by the requirement that all test
teams sample the material isokinetically directly from the stack.  Here
the problem becomes one of the simultaneous extraction of representative
samples from the stack by each of the collaborative test teams.  Since
spatial and temporal variations may constantly be occurring in both the
velocity profile and the pollutant profile, an attempt must be made to
compensate for this so that each participant has access to statistically
identical or equivalent samples.  In the first particulate tests, this was
attempted by allowing each test team to sample at each traverse point in
the stack for the same period of time during the 2-hour run, although
in any given time period each team would be sampling at a different
point.¹¹,¹²,¹³  For circular stacks, this automatically limited participation
to four test teams, each sampling independently through one of the

-------
90-degree ports, and each rotating to the next port on a signal from the test
coordinator.  It was reasoned that, over the entire sampling period, each
collected sample should be representative of the average stack pollutant
concentration for that time period.  However, since the estimates of
between-laboratory variability are based upon the differences observed
among collaborators within each sampling run, these estimates would be
affected to the extent that the samples are nonrepresentative in character.
      Due to  the nature of the sampling procedures and the requirements
for simultaneous sampling by all  collaborators, 2 weeks was the minimum
time  in which a collaborative test of some source emission methods could
be conducted.   It  is difficult to find a test site where unit operation
is essentially  constant for that  length of time, due  to load demand
changes  or  the  possibility of process upsets.   Such changes can affect
the pollutant loading  from run to run during the test.  Thus, the
collection  of true replicate samples on consecutive runs becomes almost
impossible,  and a  more indirect approach was used to  estimate the varia-
bility within laboratories on  repeated measurements.  This was done by
grouping the determinations  (runs)  into blocks  of approximately equal
average  concentration  using  the most appropriate blocking criteria avail-
able.   (Such blocking  criteria  are  based  upon unit operating parameters,
which would be  expected  to  influence the  emission levels, such as fuel
feed  rates,  power  generation  levels, production rate, raw material feed
rates, and  electrostatic  precipitator  voltages. Also, opacity data from
in-stack transmissometers have  been used  along  with  unit operating
parameters  in test data  blocking.)   The within-laboratory standard
deviation estimates  were  then  calculated  based  upon  the variability of
each  collaborator  within  these  blocks.   But, while this procedure did tend

-------
to reduce the influence of process changes on these precision estimates,
it could not eliminate their effects completely, and in some cases this was
reflected in within-laboratory estimates that are somewhat higher than would
otherwise be obtained.
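
The blocking procedure described above can be illustrated with a short sketch. It assumes the runs have already been assigned to blocks of similar operating conditions and pools each collaborator's run-to-run variance within each block. The function name, the data layout, and the example values are hypothetical; the report does not state the exact computation that was used.

```python
from collections import defaultdict
from statistics import variance


def pooled_within_block_sd(results):
    """Pool the run-to-run variance of each collaborator within each block.

    results: iterable of (block, collaborator, concentration) tuples.
    Returns the pooled standard deviation over all block/collaborator
    cells that contain at least two runs.
    """
    cells = defaultdict(list)
    for block, collaborator, concentration in results:
        cells[(block, collaborator)].append(concentration)

    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for values in cells.values():
        if len(values) >= 2:
            # variance() is the sample variance; multiply by its degrees
            # of freedom to recover the within-cell sum of squares.
            sum_of_squares += variance(values) * (len(values) - 1)
            degrees_of_freedom += len(values) - 1

    if degrees_of_freedom == 0:
        raise ValueError("no block/collaborator cell has replicate runs")
    return (sum_of_squares / degrees_of_freedom) ** 0.5


# Hypothetical runs, already grouped into blocks (illustrative values only).
example_runs = [
    ("high load", "Lab A", 100.0), ("high load", "Lab A", 104.0),
    ("high load", "Lab B", 110.0), ("high load", "Lab B", 106.0),
    ("low load", "Lab A", 50.0), ("low load", "Lab A", 52.0),
    ("low load", "Lab B", 55.0), ("low load", "Lab B", 57.0),
]
print(pooled_within_block_sd(example_runs))
```

Because the differences between blocks and between collaborators never enter the sum of squares, only the scatter of a collaborator's own runs within a block contributes to the estimate. Any process drift inside a block still inflates it, which is the limitation noted in the text.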
      Because of problems cited in the last two paragraphs, a new and
improved approach was sought to the collaborative testing of methods for
pollutants that exist in particulate form.  The objective was to develop
a test design that would allow sampling by a greater number of collab-
orators and that would not be affected by the random variations in the
velocity and pollutant concentration profiles mentioned above.  The result
was a new test design using paired sampling trains in which two probe-pitot
tube assemblies could simultaneously sample at very nearly the same point in
the stack.  Since the paired probe tips sample in rather close proximity,
this greatly minimizes the effects of spatial and temporal stack variation
on the samples collected by the adjacent probes.  In addition, this allows
the extraction of up to eight individual  samples per run on a circular stack,
with a resultant increase in the number of degrees of freedom for the
statistical  analysis.
      The test design  specified a 3-week sampling period with six
independent  test teams operating separate trains in three of the paired
train systems for the  entire duration of the test.  Both trains in the
remaining pair were operated by a single team, with one operator running
both meter boxes.  Since all equipment in each train in this pair was
virtually identical, had been carefully calibrated, and was operated by
the same individual, then the sample pair collected during any given
run could be considered replicates.  The participation of the collaborator
that operated this pair of trains was restricted to 1 week, with a


-------
different team participating in this capacity during each of the 3 weeks.
Thus, four pairs of samples were obtained on each run -- one sample
pair by a single collaborating laboratory, and three sample pairs by three
pairs of laboratories.  At the end of each 30-minute sampling interval,
each paired train assembly and test team rotated to an adjacent port in the
stack so that, at the conclusion of each run, each team and train had sampled
an equal time at each traverse point.
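
A minimal sketch of such a rotation follows. It assumes four paired-train assemblies, four ports, and one rotation per sampling interval; the labels and the cyclic order are illustrative assumptions, not details taken from the test plan.

```python
def rotation_schedule(assemblies, ports):
    """Return one {assembly: port} assignment per sampling interval.

    Each assembly starts at its own port and moves to the next port at
    the end of every interval, so that after len(ports) intervals every
    assembly has sampled once through every port.
    """
    if len(assemblies) != len(ports):
        raise ValueError("this sketch assumes one assembly per port")
    count = len(ports)
    return [
        {assembly: ports[(start + interval) % count]
         for start, assembly in enumerate(assemblies)}
        for interval in range(count)
    ]


# Hypothetical labels for the four paired-train assemblies and the ports.
assemblies = ["pair 1", "pair 2", "pair 3", "pair 4"]
ports = ["port A", "port B", "port C", "port D"]
for interval, assignment in enumerate(rotation_schedule(assemblies, ports), start=1):
    print(interval, assignment)
```

In every interval the four assemblies occupy four different ports, and over the full cycle each assembly visits each port exactly once. That is the balance property the design relies on.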
      Estimates of the  variability within a laboratory were based upon
the  differences in concentration reported by the paired-train laboratory
for  the replicate samples on each run.  Differences among laboratories were
estimated by contrasts  between paired trains that were operated by the six
single-train laboratories.  This test design has been applied to a collaborative
test of EPA Method 5 to be discussed in an ensuing section.¹⁴
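
As an illustration of how replicate pairs can yield a within-laboratory estimate, the sketch below uses the standard duplicate-pair estimator, s = sqrt(sum(d^2) / 2n), where d is the difference within each of n pairs. This estimator is an assumption chosen for illustration; the report does not give the formula actually applied, and the function name and values are hypothetical.

```python
import math


def within_lab_sd_from_pairs(pairs):
    """Estimate the within-laboratory standard deviation from replicate pairs.

    pairs: list of (train_1, train_2) concentrations, one pair per run,
    both collected by the same laboratory at the same time.
    """
    if not pairs:
        raise ValueError("at least one replicate pair is required")
    sum_of_squared_differences = sum((a - b) ** 2 for a, b in pairs)
    # Each difference of two replicates has variance 2 * sigma^2,
    # hence the factor of 2 in the denominator.
    return math.sqrt(sum_of_squared_differences / (2 * len(pairs)))


# Hypothetical replicate results from the paired-train laboratory.
replicate_pairs = [(10.0, 12.0), (20.0, 18.0), (15.0, 15.0)]
print(within_lab_sd_from_pairs(replicate_pairs))
```

Because both trains in a pair sample at nearly the same point and time, run-to-run changes in the source cancel in each difference. This is why the paired design avoids the blocking step needed in the earlier tests.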
      Collaborative testing of source emission methods suffers from two
restrictions that are not found in testing methods for other materials.
The more important of these is the limited number of participants.  Attempts
to compensate for this by taking a greater number of samples only partially
solve the problem; i.e., with only four laboratories participating, an
equipment malfunction or a deficiency in the performance of just one
laboratory can have a very adverse effect on the outcome of the test.
This obstacle has been largely overcome by the paired-train test design
discussed above, but at a great increase in cost.  A less serious
restriction concerns  the limited  pollutant concentration  range  that
can  be  examined in  a  collaborative  test at a  real  source. Of course,
with cooperation  from plant personnel,  some  range in  concentration can
be obtained by varying  conditions  such  as precipitator  voltages, excess
air, etc.,  and the  test can sometimes be  augmented  by standard  cylinder


-------
gases covering a range of concentration.  But only rarely can the complete
applicable range of a method be investigated using real  samples at a
single test site.  Also, the use of such standard gases  is usually the
only means by which method accuracy can be evaluated, since the true
concentrations of pollutants in the stack are rarely, if ever, known.
However, since one cannot duplicate in a cylinder gas the environmental
conditions and possible interferences that could exist in some stack gas
streams, such cylinder gas data must be regarded as the  best accuracy of
which the method is capable under ideal conditions.

ANALYSIS OF COLLABORATIVE TEST DATA
     Before discussing the results obtained from the collaborative tests,
a brief discussion of the information available from collaborative
testing, and of the manner in which this information is derived, is in
order.  A primary purpose of the test is the determination of the precision
components of the method, i.e., how closely a user can expect to repeat
his results on subsequent application of the method on identical samples
and how closely different users can expect to agree when analyzing
separate but identical samples.  These precision components are estimated
using either a coefficient of variation approach or an analysis of
variance technique after first performing suitable data  transformations
when necessary.
     Prior to evaluating the precision of the method, the determinations
are tested for equality of variance using Bartlett's test for homogeneity
of variances.  In addition, the determinations are passed through two
common variance stabilizing transformations, the logarithmic and the square
root, and Bartlett's test is again applied.  The use of transformations

-------
serves two purposes.  First, it can put the data into an acceptable form
for an analysis of variance; and second, it can provide information
concerning the true nature of the distribution of the sample points.
The transformation that provides the highest degree of run equality of
variance is accepted and used in deriving the precision estimates.
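
The selection step can be sketched as follows. The code computes Bartlett's statistic for the data in linear, logarithmic, and square-root form and picks the form with the smallest statistic. Reading "highest degree of equality of variance" as "smallest Bartlett statistic" is an assumption, as are the function names and the example data; the groups stand for the sets of determinations whose variances are being compared.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's test statistic for homogeneity of variances.

    groups: list of lists of observations (each with at least two values
    and a nonzero variance). Larger values indicate less equal variances.
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]
    df_total = sum(dfs)
    pooled = sum(df * v for df, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        df * math.log(v) for df, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


# The three forms of the data named in the text (data must be positive).
TRANSFORMS = {
    "linear": lambda x: x,
    "logarithmic": math.log,
    "square root": math.sqrt,
}


def best_transformation(groups):
    """Return (name, scores) for the form giving the most equal variances."""
    scores = {
        name: bartlett_statistic([[f(x) for x in g] for g in groups])
        for name, f in TRANSFORMS.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical data whose spread grows in proportion to the mean.
example_groups = [[9.0, 10.0, 11.0], [90.0, 100.0, 110.0], [900.0, 1000.0, 1100.0]]
best, scores = best_transformation(example_groups)
print(best, scores)
```

With data of this kind the logarithmic form equalizes the group variances, which corresponds to the proportional case discussed next. Data with a constant spread at all levels would instead favor the linear form.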
      Acceptance of the logarithmic transformation implies that there is
a proportional relationship between the true mean, $\delta$, and the true standard
deviation, $\sigma$, and that the ratio of the standard deviation to the mean
(the coefficient of variation, $\beta$) remains constant.
Once this relationship has been established, the data may be analyzed
in its linear form and the standard deviations presented as a
coefficient of variation times an unknown mean, $\delta$, i.e.,
$$\sigma = \beta\delta.$$
Alternately, an analysis of variance may be performed on the transformed
data, and the components of variance then converted back to the linear
form to provide uniform coefficient of variation estimates for the
determinations.
      When  the distributional nature  of the data is  such that its original
or  linear form provides the highest degree of equality of variance, then
this  implies that there is a  constant variance that  is independent of
the mean level.   In this  case, the variances are estimated by a pooled
analysis of variance on the original  data.
      In order  to provide  the  maximum  useful information, the test must
be  designed and  the data  analyzed  in  such a fashion  that the precision
estimates for  a  determination can  be  partitioned into its respective


-------
variance components.  The variance components of interest are those that
estimate the variability within a laboratory, the overall variability between
laboratories, and that portion of the overall variability that is due to the
individual biases of different laboratories.
      The within-laboratory standard deviation, 0, measures the dispersion
in replicate single determinations made by one laboratory team (same field
operators, laboratory analysts, and equipment) sampling the same concentration
level.  Simply stated, this is the measure of a laboratory's ability to repeat
its own test results when all experimental factors and relevant environmental
conditions are held constant.  This term has also been referred to as the
standard deviation of repeatability or, more simply, "repeatability," and carries
with it the concept of making repeated measurements on the same sample, or on
identical samples.  This value is estimated from within each collaborator-block
combination or from replicate samples collected by the same laboratory using
paired sampling trains.
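      As a rough sketch of how a within-laboratory standard deviation might be
estimated from paired sampling trains, the Python example below pools the
differences between duplicate results.  The duplicate-difference estimator
(square root of the sum of squared differences divided by twice the number of
pairs) is a standard textbook formula and is assumed here; it is not confirmed
as the exact procedure of the collaborative test reports.  The function name
and the data are hypothetical.

```python
import math

def within_lab_sd_from_pairs(pairs):
    """Pooled standard deviation from duplicates: sqrt(sum(d^2) / (2 * n))."""
    n = len(pairs)
    sum_sq = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(sum_sq / (2 * n))

# Hypothetical paired-train results (mg/m3) from one laboratory over four runs.
pairs = [(100.0, 104.0), (120.0, 118.0), (95.0, 99.0), (110.0, 108.0)]

sd = within_lab_sd_from_pairs(pairs)
```

Because each run contributes only the difference between its two trains,
run-to-run changes in the true stack concentration do not inflate the estimate.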
     The between-laboratory standard deviation, $\sigma_b$, measures the total
variability in a determination due to simultaneous determinations by
different laboratories sampling the same true stack concentration, $\mu$.
The between-laboratory variance, $\sigma_b^2$, may be expressed as

                    $\sigma_b^2 = \sigma_L^2 + \sigma^2$

and consists of a within-laboratory variance plus a laboratory bias
variance, $\sigma_L^2$.  The between-laboratory standard deviation is estimated
using the run results or the within-run differences between paired sampling
trains operated by different laboratories.  This term estimates the degree
of agreement to be expected among different laboratories who have independ-
ently collected and analyzed identical samples.  The between-laboratory
standard deviation is frequently called standard deviation of reproducibility,
or "reproducibility".                              	
      The laboratory bias standard deviation, $\sigma_L = \sqrt{\sigma_b^2 - \sigma^2}$, is that
portion of the total variability that can be ascribed to differences in the
field operators, analysts, and instrumentation, and to different manners
of performance of procedural details left unspecified in the method.
This term measures that part of the total variability in a determination that
results from  the use of the  method by different laboratories, as well as
from modifications in usage  by a single laboratory over a period of time.
The laboratory bias standard deviation is estimated from the within-laboratory
and between-laboratory estimates previously obtained.
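      The subtraction of variance components described above can be sketched
in Python as follows.  The input values are hypothetical, and returning zero
when the within-laboratory estimate exceeds the between-laboratory estimate is
an assumption made here for the sketch, not a rule stated in this review.

```python
import math

def lab_bias_sd(sigma_between, sigma_within):
    """sigma_L = sqrt(sigma_b^2 - sigma^2), clamped at zero (assumed convention)."""
    diff = sigma_between ** 2 - sigma_within ** 2
    return math.sqrt(diff) if diff > 0 else 0.0

# Hypothetical estimates in the same units.
sigma_L = lab_bias_sd(sigma_between=5.0, sigma_within=4.0)
```

With these inputs the laboratory bias standard deviation is 3.0, since
25 - 16 = 9.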
       Before  leaving this section, it is appropriate to say  something
about  how method accuracy is expressed.  With respect to the accuracy of a
method we attempt to define  its absolute accuracy; i.e., how well does the
measurement value agree with the actual or true value.  As stated previously,
estimates of  method accuracy must  frequently be based on the analysis of
standard cylinder gases.  One  approach is to have each collaborator measure
the concentration of the  cylinder  gas  (or other material), after which a
mean and a standard deviation  are  calculated for the group of collaborators.
A 95 percent  confidence  interval  is  then calculated around this mean.  If
the true concentration of the  cylinder gas lies within this  95 percent
confidence interval, then the  method is  said to be unbiased  and accurate
within the limits of its  precision.   A more  common means of  stating method
accuracy consists of averaging the respective  biases of all  collaborators
and expressing this as a  percentage  (either  positive or negative) of the
overall mean, or of the true value,  when known.  Both approaches to stating
method accuracy will be found  in  the various collaborative test reports.
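      Both ways of stating accuracy can be sketched in Python.  The collaborator
results and the cylinder concentration below are hypothetical.  The interval
is assumed to be a two-sided Student t interval on the mean of the
collaborators' results; the review does not spell out the exact form used.
The small table `T_95` holds standard 95 percent t critical values for 1 to 10
degrees of freedom.

```python
import math
import statistics

# Two-sided 95 percent Student t critical values, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_interval_95(values):
    """95 percent confidence interval around the mean of the collaborators."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = T_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width

def is_unbiased(values, true_value):
    """True when the known concentration lies inside the interval."""
    low, high = confidence_interval_95(values)
    return low <= true_value <= high

def mean_bias_percent(values, true_value):
    """Average collaborator bias as a signed percentage of the true value."""
    return 100.0 * (statistics.mean(values) - true_value) / true_value

# Hypothetical results from four collaborators on one cylinder gas (mg/m3).
measured = [490.0, 505.0, 498.0, 507.0]
true_concentration = 500.0

unbiased = is_unbiased(measured, true_concentration)
bias_pct = mean_bias_percent(measured, true_concentration)
```

With only four collaborators the interval is wide, so the first approach can
declare a method unbiased even when individual results scatter noticeably.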

                             SECTION 4
             RESULTS OF THE METHODS STANDARDIZATION PROGRAM

     Since the initiation of the program in August 1972, evaluations and
collaborative studies have been conducted on a number of methods.  While
the overall aim of the project is the standardization of these methods,
the evaluations, collaborative tests, and subsequent data analysis have
been structured to determine both the strong and weak points of the
methods.  By determining those areas of weakness in a given method,
recommendations have been made for changes that will improve the accuracy
and precision of that method.  The actual collaborative testing phase of
the program began with a test of EPA Method 7 for oxides of nitrogen (NOx)
in December 1972 and has more recently included tests of Method 9
(opacity) in October 1974 and Method 5 (particulates) in September 1975.
Table 1 lists those methods for which some collaborative testing has already
been completed, and a discussion of the results of these investigations will
now follow.

STACK GAS VELOCITY AND VOLUMETRIC FLOW RATE
     Collaborative tests of the Type S Pitot Tube Method (EPA Method 2 - Ref. 1,
pp. 24884-24885) were conducted in conjunction with tests of EPA Method 5
at three sites:  a Portland cement plant, a coal-fired power plant, and a
municipal incinerator (Ref. 18).  There were 15, 16, and 12* traverses at the three
respective sites and four collaborating laboratories at each.  The data from
one laboratory at the power plant site were not used, and some determinations
were not made due to equipment failure during the sampling runs.  This resulted
in a total of 150 separate determinations of both velocity and volumetric flow
rate for use in the data analysis.
      The runs at each site were grouped into blocks based upon the
velocity heads.  The precision components were shown to be proportional
to the mean of the determinations and are expressed as percentages of
the true mean as shown in Table 2 for both the velocity and the
volumetric flow rate determinations.
      A more recent test of Method 2 was conducted at a different
municipal incinerator, and included 13 runs by six different
collaborators (Ref. 14).  Test design and data analysis were similar to those used
for the above studies, as were the resulting precision estimates.
      Based upon the results of these tests, the precision of the
volumetric flow rate determination seems adequate for use with other test
methods in determining pollutant emission rates.  The small laboratory bias
standard deviation, $\sigma_L$, indicates
that the method is inherently rugged; i.e., it is not subject to large
biases from one user to another.  A previous single-laboratory study
indicated that for nonturbulent streams, Method 2 provides an accurate
estimate of the true stack gas velocity at the higher velocities of 55 to 60
feet per second (Ref. 19).  Relative accuracy is somewhat less at velocities of
about 30 feet per second.

GAS ANALYSIS FOR CARBON DIOXIDE,  EXCESS  AIR, AND DRY MOLECULAR WEIGHT
      Collaborative tests of the Orsat methodology for the determination of
CO2, excess air, and stack gas molecular weight (Ref. 20) were conducted in
conjunction with the three tests of EPA Method 5 to be discussed in a
subsequent section.  The Orsat procedure tested was similar to that of
EPA Method 3 (Ref. 1, pp. 24886-24887) with one important exception.
Method 3 required that the analysis of a gas sample be repeated until
three consecutive analyses that vary no more than 0.2 percent by volume
for each component being analyzed are obtained.  In these tests, the
average of three consecutive analyses was used, but the requirement
that they differ by no more than 0.2 percent by volume was not enforced.
This was a very significant deviation from Method 3, and the test schedules
were such that the results may be questioned.  (See subsequent discussion
of Method 5 tests.)  The results will therefore not be reported here.
      Five other collaborative tests have been conducted to investigate
various aspects of the Orsat methodology (Ref. 21).  Four of these were field
studies in which four to seven collaborators analyzed replicate samples
from a larger bulk sample of combustion effluent gas.  The number of
replicate analyses allowed varied according to the design of the
experiment, and ranged from four to seven.  Under these restrictions,
none of the collaborators met the Method 3 operator performance criterion
of three consecutive analyses that differ by no more than 0.2 percent
by volume for each component, so the results are not relatable directly
to Method 3.
      The fifth test in this series was a laboratory study in which seven
collaborators analyzed replicate samples from an EPA stationary source
simulator facility.  Three different levels of carbon dioxide and oxygen
were studied, and only those values that met the performance criterion for
a valid Method 3 analysis were used in the data analysis.  From these
results, between-laboratory standard deviations for Method 3 in the range
of 0.20 to 0.39 percent CO2 and 0.38 to 0.55 percent O2 were obtained for
these two components, depending on the level tested.  Within-laboratory
standard deviations were not calculated because repeated measurements of
sets of three analyses that met the Method 3 performance criterion were
not made.
      The most recent collaborative field test of Method 3 was conducted
at a municipal incinerator in conjunction with a test of Method 5, and
consisted of 13 runs with up to seven collaborators sampling per run.
A revised version of Method 3 was used for this test.  The operator
performance criterion of the revised method states that the analysis
must be repeated until the molecular weight for three consecutive
analyses differs from their mean by no more than 0.3 gram/gram-mole.
      Precision estimates were obtained for the various parameters using
an ANOVA approach and are summarized in terms of standard deviation in
Table 3.  In addition, the Orsat CO2 data were examined to estimate
the magnitude of the error that might be introduced when a determined
particulate concentration is corrected to 12 percent CO2.  The between-
laboratory standard deviation was 0.40 percent CO2 by volume at the
levels encountered at this test site.  If the true CO2 level were 2.3
percent, then two independent laboratories might be expected to obtain
values of 2.1 and 2.5 percent, respectively.  For two laboratories that
had determined the same particulate concentration, this would result in
a 19 percent difference in the reported particulate concentration after
correction to 12 percent CO2.
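      The 19 percent figure can be reproduced with a few lines of Python,
assuming the usual correction in which the measured concentration is multiplied
by 12 and divided by the measured percent CO2.  The particulate concentration
of 100 is an arbitrary placeholder, since only the ratio matters, and the
difference is expressed relative to the lower corrected value.

```python
def correct_to_12_percent_co2(concentration, co2_percent):
    """Scale a particulate concentration to a 12 percent CO2 reference."""
    return concentration * 12.0 / co2_percent

# Both laboratories measure the same particulate concentration (arbitrary units).
measured = 100.0

lab_a = correct_to_12_percent_co2(measured, 2.1)  # laboratory reading 2.1 percent CO2
lab_b = correct_to_12_percent_co2(measured, 2.5)  # laboratory reading 2.5 percent CO2

# Difference between the corrected values, relative to the lower one, in percent.
relative_difference = (lab_a - lab_b) / lab_b * 100.0
```

At low CO2 levels the correction factor is large, so a small absolute
disagreement in the Orsat reading becomes a large relative disagreement in the
corrected loading.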
      Based upon the results of all studies completed, it is concluded
that:  (1) the Orsat method is tedious and requires great attention to
detail and technique; (2) the original EPA Method 3 operator performance
criterion was not easily met in the field, and even meeting this criterion
does not ensure that highly reproducible and accurate results will be
obtained; (3) the use of Orsat data to routinely convert particulate
catches to such reference conditions as 12 percent CO2 and 50 percent
excess air can introduce significant errors into the corrected particulate
loading; and (4) the Orsat is quite satisfactory for use in determining
stack gas molecular weights.

MOISTURE FRACTION WITH USE OF METHOD 5
     Collaborative tests of the procedure for determination of moisture
fraction (in conjunction with EPA Method 5 - Ref.  1, pp. 24888-24890)
have been conducted at a Portland cement plant, a  coal-fired power plant,
and a municipal incinerator, using four sampling teams carrying out 15,
16, and 12 sampling runs, respectively, at the three sites (Ref. 20).  The absence
of several values from the data set necessitated using runs as repetitions,
and undoubtedly caused the error term to be inflated due to run-to-run
variation in stack moisture content.  Other factors, as discussed in the
succeeding section on particulates, likely adversely affected the results.
An analysis of variance procedure on these data produced an estimated within-
laboratory standard deviation of 0.032, a between-laboratory standard
deviation of 0.045, and a laboratory bias standard deviation of 0.032.
     A more recent test of a revised version of Method 5 was conducted at
a second municipal incinerator.  This test consisted of 13 runs over a
3-week period with eight trains sampling per run (Ref. 14).  The data were submitted
to statistical analysis using an ANOVA model. A two-way model without
interaction was used to avoid blocking the runs, and the run by train
interaction was used for the error term.  This test design and data
analysis resulted in estimates of 0.009, 0.012, and 0.008 for the moisture
fraction within-laboratory, between-laboratory, and laboratory bias standard
deviations, respectively.  These are considerably better than those previously
reported and are probably more representative of the true performance capa-
bilities of the method.

PARTICULATES
      Collaborative tests of EPA Method 5 (Ref. 1, pp. 24888-24890) for
determination of particulate matter emissions were conducted at a
coal-fired power plant (Ref. 11), a Portland cement plant (Ref. 12), and a
municipal incinerator.  Four sampling teams participated in each test, accomplishing
16, 15, and 11 runs, respectively, at the three sites.  At the cement
plant and  the  incinerator,  sampling was performed through four ports
located  at 90-degree  angles on  the  circular  stacks.  The power plant
sampling was  done  through four  ports  in a horizontal duct leading directly
to the stack.   In  an  attempt  to  ensure  collection of statistically equivalent
and representative samples  by all  participants, each team sampled at each
traverse point in  the stack for the same period of  time during the course
of each 2-hour run.  This required that the  teams sample simultaneously,
each sampling through a different port  and then rotating to an adjacent
port at each  quarterly time interval  until all  four ports had been sampled
by each team.
      For the  purpose  of statistical  treatment,  the  determinations were
grouped into  blocks using the most appropriate  blocking criteria  that
could be devised for  each test.   A coefficient  of variation approach was
then used  to  calculate a within-laboratory,  between-laboratory, and
laboratory bias component for each test.  These ranged  from 25.3  to 31.1
percent, 36.7  to 58.4 percent,  and 19.6 to 51.0 percent, respectively, for
the three  tests.
      It immediately becomes obvious that these estimates indicate Method 5
to have relatively poor precision.  However, it had been shown by other
single-laboratory studies that under very carefully controlled conditions,
using multiple probes sampling simultaneously over a very small area, with
well designed equipment, highly competent personnel who execute all operations
in an identical and representative manner, etc., Method 5 is capable
of giving precise and reliable results (Ref. 22).  So before accepting these results
as being representative of the true capabilities of Method 5, a few factors
were examined that could have contributed to this apparent imprecision.
      The first factor concerns the limited number of participants that
could be accommodated, as has been mentioned earlier.  While we attempted
to find fully qualified people to participate in these tests, four collab-
orative teams per test represents a very small statistical population
upon which to base our conclusions.  Thus, any bias or deficiency in the
performance of a single team has a very significant effect on the apparent
precision of the method.
      Another factor that might have contributed to the apparent imprecision
of Method 5 is that of collaborator fatigue.  Because of the very
considerable expense involved in running these tests, it was decided that two
sampling runs per day would be made in order to collect, within a 2-week
period, the 12 to 16 samples per collaborator required for a meaningful
statistical treatment of the data.  With the uprigging and downrigging of
the equipment of four test teams, the movement of this equipment around the
stack during port changes, performance of Orsat analyses, etc., excessively
long days of 12 to 14 hours were often required.  While such work days may not
be uncommon in source testing, it is abnormal to maintain such schedules for
the duration of time required for a collaborative test.  It is thus possible
that participant fatigue may have had some adverse effects on method precision.
      The tests were designed so that during each run the collaborators
rotated from port to port, each sampling the same points in the stack over
the course of each 2-hour run, though not at the same point at the same
time.  With this pattern, it was hoped that any effects due to spatial and
temporal variations in the stack particulate concentration would be randomized
out and that all participants would statistically be able to collect identical
and representative samples.  However, we doubt that we completely eliminated
all effects due to spatial and temporal concentration variations, and these
could  be reflected to some extent in  the precision.
    At the time of these tests, it was difficult to find sites with the
necessary facilities and with personnel who would voluntarily cooperate in having
their  plants used for such a program.  Therefore, the selected test sites were
frequently less than desirable from the standpoint of port  location, distance
of sampling point from  flow disturbances,  velocity profile, control equipment,
pollutant concentration  range, etc.   Also,  it  could not  be  required that the
plant  maintain steady-state conditions over  the  duration of the collaborative
test as  could be required  in  compliance  testing.  For example, the sampling at
the power plant was  conducted  in  a  horizontal  duct under conditions that did
not really conform to EPA  Method  1.   And the particulate loading at the cement
plant  varied by a factor of eight over the 2-week period of the test.  Obviously,
this might make the  simultaneous  collection  of representative samples by the
various  test teams more  difficult.
       At the time of designing and  performing  these tests,  it was believed
that Method 5, as promulgated, was written in  sufficient detail to assure
that different users would execute  it in a proper and reproducible manner.
Therefore, participants  in these  tests were  allowed to use  Method 5 in
accordance with their exact, individual interpretations of the method's
instructions without outside influence from the test coordinator.   However,
it is now apparent that the method lacked sufficient clarity in some critical
areas, and some test teams lacked the experience necessary for its proper
application.  So, it may be possible that the inclusion of more detail  into
the method would have improved its precision.
      Because of problems and uncertainties in the original test designs, a
fourth collaborative test of Method 5 was undertaken using the paired sampling
train test design previously discussed.    The test, conducted at a municipal
incinerator in September 1975, used a revised and more detailed version of
Method 5 since the original method write-up was considered deficient.  At the
same time, the philosophy of conducting collaborative testing was changed to some
extent.  First, potential collaborators were screened more carefully to ensure
that only well experienced and competent personnel would be selected to
participate in the test.  And the role of the test supervisor was increased
in order to assure that the collaborators operated within the constraints of
the revised Method 5 and the associated Methods 2 and 3.  Thirteen runs were
accomplished over a 3-week period, with one run per day, eight samples  per
run.
     The data analysis for the within-laboratory precision estimate was
based upon the differences in concentration reported by the paired-train
laboratory for the replicate samples on a given run.  The standard
deviation estimated from the pooled data of all three laboratories is
13.8 mg/m3, for a coefficient of variation of 10.4 percent of the average
determined concentration (Table 2).  The laboratory bias standard deviation
was estimated by ANOVA from the contrasts between paired trains operated
by the six single-train laboratories.  This estimate was 8.15 mg/m3, which
gives a coefficient of variation of 6.1 percent.  Combining these gives
a between-laboratory standard deviation estimate of 16.0 mg/m3, or 12.1
percent of the mean.  There was no detectable effect among these results due
to spatial and/or temporal changes in the stack flow.
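      The combination of the two components can be checked with a short Python
sketch, assuming the components add as variances (the square root of the sum
of squares).  The function name is illustrative only; the input figures are
the ones reported in the paragraph above.

```python
import math

def between_lab_sd(sigma_within, sigma_bias):
    """Combine within-laboratory and laboratory bias components as variances."""
    return math.sqrt(sigma_within ** 2 + sigma_bias ** 2)

# Standard deviations in mg/m3, as reported for the fourth Method 5 test.
sd_mg_m3 = between_lab_sd(13.8, 8.15)

# The same combination applied to the coefficients of variation, in percent.
cv_percent = between_lab_sd(10.4, 6.1)
```

Rounded to one decimal place, the two results agree with the reported
16.0 mg/m3 and 12.1 percent.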
     These test results show the precision capabilities of Method 5 to be
considerably greater than had been previously thought, and this may be
due in part to the improved test design.  Spatial and temporal source
effects were eliminated; the testing followed a more relaxed pace; and
the larger population of participants was a definite advantage.  In
addition, much tighter control was exercised over the actions of the
collaborators than was previously done.  The participating test teams
were  probably among  the best in  this country; nevertheless, three of the
meter boxes were  found to be outside the allowable specifications for
dry gas meter calibration according to  the revised method when checked
at the test site.  Had not  the test plan provided for calibration checks
on-site (and recalibration  if necessary) the outcome of the test would
surely have been  adversely  affected.  Other problems were observed during
the test  itself,  and these  were  called  to the attention of the collaborators
for correction.   Such corrective actions had not been taken in the former tests.
      It is impossible to determine the  exact effect on the test of using
the revised method.  However, since the revision does supply much of the
needed detail in  Method 5,  it is safe to assume that the effect was
positive.  In fact,  if anything  is to be learned from these studies, it
is that successful execution of Method  5 requires care and close attention
to detail.   In the hands of a competent test team who will use such care
and attention, Method 5 is  capable of giving satisfactory and precise
results,  as shown by this most recent study.

SULFUR DIOXIDE
     EPA Method 6 for SO2 (Ref. 1, pp. 24890-24891) was evaluated (Ref. 19) and
then was collaboratively tested at two different sites, a 140-megawatt
coal-fired electric generating plant and an oil-fired pilot combustion
plant.  A randomized block design was employed at each site, with four
different blocks of emission concentration levels that ranged from about
232 mg/m3 to about 1750 mg/m3.  These blocks, each of which consisted of
four runs sampled at 60-minute intervals, were obtained on consecutive
days.  The intent was to maintain a constant true SO2 emission
concentration level at the sampling points on the four runs within each block
to permit an accurate determination of the within-laboratory precision of
Method 6.  Samples at the power plant were collected from a manifold
through which a stream of the stack gas was delivered to rooftop level,
and SO2 concentration levels were varied by the injection of dilution air
upstream of the sampling ports.  At the pilot plant, samples were collected
directly from the duct downstream of the furnace and heat exchanger, and
concentration levels were varied by doping the system with SO2.  Each run
involved the simultaneous collection of an exhaust sample over about a
20-minute period by each of four collaborating laboratories through their
assigned ports.
     In addition to the above experiments, two auxiliary tests were also
conducted at both sites to complement the real-sample data obtained.  The
first of these was a gas cylinder accuracy test to provide an independent
assessment of the accuracy of Method 6.  This test involved three different
standard cylinder gases containing mixtures of SO2 in nitrogen, the
concentrations of which had previously been determined by the supplier
with a claimed accuracy of ±1 percent.  On each of the test days, each
collaborator obtained one sample from each cylinder according to the
Method 6 procedure to be later analyzed with the day's collaborative
test samples.  The second auxiliary test involved the triplicate analytical
determination of the SO2 concentrations implicit in four unknown standard
sulfate solutions to isolate the accuracy and precision of the sample
analysis phase of Method 6.
     An analysis of the collaborative test data using a coefficient of
variation approach provided estimates of the precision components listed
in Table 2.  From these values, it is evident that Method 6 is capable
of good precision when used by competent personnel.  Analysis of the
standard sulfate solution produced standard deviations of 1.1, 2.4, and
2.2 percent  of the mean value, respectively, for the within-laboratory,
between-laboratory, and laboratory bias terms.  In comparing these
values with  the tabled values for the entire method, it becomes obvious
that most of the precision variation resides in the field sampling phase
of Method 6, as opposed to the analytical phase.
     The gas cylinder accuracy test showed Method 6 to be accurate at
SO2 concentrations up to about 480 mg/m3, but indicated that it acquires
a significant negative bias above the range of about 480 to 800 mg/m3.  The
apparent bias was found to be in the field sampling phase rather than in
the analytical phase of the method.  This conclusion was based on the
fact that the collaborators reported values for the high-level cylinder gases
that were generally lower than those claimed by the gas supplier.
However, it is now thought that this conclusion of negative bias was
incorrect, and that the low reported values probably resulted from decay
of the cylinder SO2 concentration or some other unknown phenomenon.
[...] practically 100 percent and the method is unbiased up to SO2 concentrations
of at least 5000 mg/m3.

NITROGEN OXIDES
     EPA Method 7 for NOx (Ref. 1, pp. 24891-24893) was evaluated for
interference effects in the laboratory (Ref. 19) and then subjected to collaborative
testing at the same two sites used for the Method 6 tests described
above.  A third test was conducted at a nitric acid plant which utilizes
a proprietary catalytic ammonia oxidation process (Ref. 24).  The tests were based on
a randomized block design similar to that described above for SO2.  Tested
concentrations ranged from about 160 mg/m3 to about 2400 mg/m3, expressed
as NO2.  Auxiliary tests at the three sites included the sampling of
standard cylinder gases at the coal-fired power plant and the pilot
combustion plant, and the sampling of a standard test atmosphere at the
nitric acid plant set up and controlled by personnel of the National
Bureau of Standards.  Four collaborators participated in each test.  In
addition, the collaborators were given a series of unknown potassium
nitrate standard solutions to be analyzed with the samples.
     The data from the first two tests were pooled to provide a larger
data base and then analyzed using a coefficient of variation approach.
A similar analysis was performed on the nitric acid plant data.  The
resulting precision estimates are presented in Table 2, first for the
pooled power plant/pilot combustion plant data, and then for the nitric
acid plant data.  Note that the estimates for the nitric acid plant study
were uniformly higher, roughly  by a factor of two.  Because of the larger
data base resulting from the pooling of the data  from the first two tests,
and because of the frequently unstable conditions encountered at  the
nitric acid plant, it is felt that more reliance may be placed on the
precision estimates obtained from the former tests.
     An analysis of variance on the nitrate solution data disclosed that
nearly all of the analytical laboratory-to-laboratory variance component
is attributable to day-to-day variation in laboratory measurements instead
of to significant laboratory biases.  Analysis of the standard test atmos-
phere established that Method 7 is unbiased and accurate within the limits
of its precision.

SULFURIC ACID MIST/SULFUR DIOXIDE
     EPA Method 8 (Ref. 1, pp. 24893-24895) for the measurement of sulfuric acid
(H2SO4) mist (including any free SO3) and sulfur dioxide (SO2) was collaboratively
tested at a dual absorption contact process sulfuric acid plant with a rated
capacity of 900 tons of acid per day (Ref. 25).  Simultaneous samples were collected by
four collaborative test teams (in a manner analogous to that previously described
for two of the Method 5 tests) through four ports located at 90-degree angles in
the stack.
     The collaborative test plan called for the collaborators to obtain 16 samples
during a 2-week period.  The sampling was curtailed by inclement weather, and as
a result only 14 sampling runs were made.  The collaborators were also provided
with standard sulfuric acid solutions to be analyzed along with their field test
samples as described in previous tests.
     An inspection of the collaborative test data revealed that H2SO4
mist concentrations in this test varied by as much as an order of magnitude
between collaborators within single runs, with several high values that
were of a magnitude to suggest that they were not representative of the
true concentration in the stack.  Sulfur dioxide determinations showed a
similar variation, varying by as much as a factor of two.  A correlation
analysis of the test data showed a significant negative correlation between
the H2SO4 mist determinations and the SO2 determinations, i.e., high values
reported for acid mist were associated with low SO2 values at a greater
frequency than could be expected by chance alone.
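      A correlation of this kind can be sketched in Python as shown below.  The
review does not say which correlation statistic was used; the Pearson
coefficient is assumed here, and the paired values are invented solely to show
the sign convention.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired determinations from one run (mg/m3).
acid_mist = [10.0, 20.0, 30.0, 40.0]
so2 = [800.0, 700.0, 650.0, 500.0]

r = pearson_r(acid_mist, so2)
```

A coefficient near -1 means that high acid mist values tend to accompany low
SO2 values, which is the pattern described above.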
     The data from the test were arranged in blocks and analyzed both
with and without the inclusion of six extraordinarily high acid mist
values and their corresponding SO2 values.  The precision components
shown in Table 2 for H2SO4 mist and in Table 3 for SO2 were developed
after these values were excluded from the data set.  It is immediately
obvious that the precision of the acid mist determination was extremely
poor in this test even after the elimination of the six most extreme
values.  Considering that the tested SO2 concentrations ranged from
about 480 mg/m3 to about 800 mg/m3, it is also apparent that the SO2
determination, while better than the acid mist, was not highly precise.
     The results from the analyses of the unknown sulfate solutions were
used to evaluate the accuracy and precision of the analytical phase of
the method separate from the field sampling phase.  For the analytical
phase of the method, the within-laboratory standard deviation was found
to be independent of the mean level, and was estimated as 3.51 mg SO2/m3.
The between-laboratory standard deviation and the laboratory bias standard
deviation were determined to be proportional to the mean level, and were
estimated as 3.7 percent of δ and 3.5 percent of δ, respectively.  The
analytical phase was shown to be accurate, within the precision of the
method, at all three levels of concentration studied.  These levels ranged
from 254 to 1,073 mg/m3 equivalent SO2 concentration.
     From the precision estimates given above, it is quite evident that
the predominant sources of error were in the field sampling phase of the
test.  Because of the significant negative correlation between the H2SO4
mist determinations and the SO2 determinations, one is immediately
led to suspect some intrinsic problem in the method such as an inability
to satisfactorily separate the SO2 and the acid mist fractions of the
sample.  But, with limited data from only one test, it is impossible to
say at present whether the imprecision observed is due to a real
deficiency in the method, some unknown phenomenon peculiar to the test
site, or to other factors such as those discussed in the preceding
section on Method 5.  We are presently planning additional work to
determine the true reliability of Method 8.

OPACITY OF STACK EMISSIONS
     Collaborative testing of EPA Method 9 (Ref. 1, p. 24895) for visual
determination of opacity of emissions from stationary sources was conducted
using certified observers, to obtain data that would allow statistical
evaluation of the method.  Three collaborative test sites were used: a
training smoke generator, a sulfuric acid plant, and a fossil fuel-fired
steam generator.  The initial test on the training smoke generator was
conducted to provide background information on the use of the method,
while the tests at the sulfuric acid plant and the fossil fuel-fired steam
generator were conducted to obtain information on the use of the method
on applicable sources under field conditions.  At no time during any of
the tests were warm-up or practice runs allowed prior to the test itself.
     These tests required the determination of average opacity, defined as the
average of 25 readings taken at 15 second intervals.  For the purpose of this
study, one set of 25 readings was designated a "run".  The collaborators began
taking readings on a signal from the test supervisor, and thereafter at 15
second intervals until the required 25 observations were obtained.  Concurrent
with the observers' readings, plume opacity readings were taken from the in-
stack transmissometer.  The accuracy of the method was judged by the devi-
ations of the observers'  readings from the actual opacity as measured by this
calibrated in-stack transmissometer.
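
The run average and the observer deviation described above can be sketched as follows. This is only an illustration: the function names are hypothetical, the readings are invented, and it assumes the transmissometer readings for a run are averaged in the same way as the observer's 25 readings.

```python
READINGS_PER_RUN = 25   # one "run" = 25 readings
INTERVAL_SECONDS = 15   # readings taken at 15 second intervals


def run_average_opacity(readings):
    """Average opacity (percent) for one run of exactly 25 readings."""
    if len(readings) != READINGS_PER_RUN:
        raise ValueError(f"a run requires {READINGS_PER_RUN} readings")
    return sum(readings) / len(readings)


def run_deviation(observer_readings, meter_readings):
    """Observer run average minus concurrent meter run average (percent opacity)."""
    return run_average_opacity(observer_readings) - run_average_opacity(meter_readings)


# Hypothetical run: observer reads in 5 percent increments, meter is steady.
observer = [10] * 15 + [15] * 10
meter = [12.5] * 25

print(run_average_opacity(observer))   # observer run average, percent opacity
print(run_deviation(observer, meter))  # negative value = observer read below the meter
print(READINGS_PER_RUN * INTERVAL_SECONDS / 60, "minutes per run (approximately)")
```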
     Five separate tests of EPA Method 9 were conducted, if both the
white smoke and the black smoke phases of the training generator study are
considered as comprising one test.  For each test, Table 4 presents the
pertinent information on the number of runs completed, the number of
observers participating, and the range of opacity studied.  The studies
were deliberately restricted to the lower opacity ranges within which the
EPA standards lie.
     While the smoke generator and the sulfuric acid plant tests were
designed to evaluate the accuracy and precision of Method 9 as written,
the steam station studies were also designed to investigate the effects
of various factors on the performance of the method.  The experimental
factors studied included the angle of observation and the relative
experience of the observer.  Variations to the method that were evaluated
included reading in increments of 1 percent rather than 5 percent, and
using the average response of two observers as opposed to a single
observer's result, to determine whether these yielded increased accuracy.
The observers at each test were divided into two groups, a control group
and an experimental group.  The control group observed the plume
at all times from a position consistent with the method as written and read
in increments of 5 percent.  The experimental group either read the plume
from a more extreme angle in increments of 5 percent or from the same angle
as the control but in increments of 1 percent.  Each group was composed
both of observers who had considerable field experience with the method and
of observers who had relatively little such experience.
     Due to the adverse sky and wind conditions during Tests 1 and 2 at
the steam station, not all of the planned evaluations were useful.  There
was an inability to read the low opacity plumes against the type of
background that existed, and as a result, the determinations were generally
well below the concurrent meter average.  The precision estimated, however,
is independent of the accuracy of the determination.  Separate precision
estimates were therefore developed for these tests, and for the tests at
the other sites.  Composite estimates based upon the results of all the
tests were also derived, and because the individual estimates were similar
from one test to another, only the composite estimates shown in Table 3
will be presented in this report.  Using data from the training generator
and from Test 3 at the steam station, a composite estimate of the accuracy
of Method 9 was derived for ideal (clear-sky) conditions.  This estimate
compares the expected deviation of the observer from the average metered
opacity and is given by the equation, deviation = 3.13 - 0.31 (meter average),
for the range from 5 to 35 percent average opacity.  As the equation indicates,
observers tend to read slightly high at the very low opacities, exhibit good
accuracy at around 10 to 15 percent average opacity, and acquire a definite
negative bias at the higher opacities.
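
The accuracy equation above can be evaluated directly. The sketch below simply applies the stated coefficients over the stated 5 to 35 percent range; the function name and the range check are illustrative choices, and a positive result is taken to mean the observer reads above the meter.

```python
INTERCEPT = 3.13            # percent opacity
SLOPE = -0.31               # per percent of meter average
VALID_RANGE = (5.0, 35.0)   # percent average opacity covered by the estimate


def expected_deviation(meter_average):
    """Expected observer deviation (percent opacity) at a given meter average."""
    low, high = VALID_RANGE
    if not low <= meter_average <= high:
        raise ValueError("equation was derived only for 5 to 35 percent opacity")
    return INTERCEPT + SLOPE * meter_average


for meter_average in (5, 10, 15, 25, 35):
    print(f"{meter_average:>2} percent: {expected_deviation(meter_average):+.2f}")

# Meter average at which the expected deviation is zero.
zero_bias_point = INTERCEPT / -SLOPE
print(f"zero expected deviation near {zero_bias_point:.1f} percent")
```

Under these coefficients the expected deviation is about +1.6 at 5 percent, close to zero near 10 percent, about -1.5 at 15 percent, and about -7.7 at 35 percent, which matches the pattern described in the text.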
     With respect to the other experimental factors and variables studied, it
was concluded from the clear-sky data of Test 3 that (1) the angle of observation
does affect the observer's determinations, and in this study, the most accurate
readings were made when the group was at an approximately 45 degree angle to
the sun; (2) the experienced observers were able to read average opacity more
accurately than the inexperienced observers, but the difference occurred mainly
in the higher opacity range (>25 percent); (3) the 1 percent increment data
exhibited greater within-observer variability and was less accurate than the
5 percent increment data; and (4) averaging the results of two observers
yielded increased accuracy over the result of a single observer.  Based
partly on the results of these studies, Method 9 was revised and improved
and has now been repromulgated (Ref. 26) to replace the original method of 1971.

CARBON MONOXIDE
     A collaborative test of EPA Method 10 for carbon monoxide (CO) (Ref. 2,
pp. 9319-9321) was carried out at a petroleum refinery, where seven collaborators
sampled the emissions from the CO boiler downstream of the fluid catalytic
cracking unit catalyst regenerator (Ref. 27).  All collaborators simultaneously sampled
through a manifold connected to the CO boiler stack using the integrated
sampling option of Method 10.  Each collaborator obtained four 60-minute samples
per day until 16 runs were completed at each of two CO concentration levels.
In addition to the stack samples, each collaborator analyzed six standard
cylinder gases (CO in nitrogen) that had been supplied for the test by the
National Bureau of Standards.
     It was the intent of the experimental design to maintain the CO
concentration constant for the 16 runs at each of the two concentration levels
(blocks) so that readings within each block could be considered replicates for
the purpose of calculating the within-laboratory precision component.  However,
it was found that the blocks could not be physically maintained at constant
concentration, so an indirect approach based upon the pairing of runs of similar
concentration was used to estimate the within-laboratory standard deviation of
Method 10.  This value, and the estimated values for the between-laboratory and
laboratory bias standard deviations, are given in Table 3.  From an analysis
of the data for the NBS standards, a somewhat similar between-laboratory term
was calculated (26 mg/m3 as compared to the 32 mg/m3 shown in Table 3 for
the field data).  However, the standards data showed about a threefold
improvement in the within-laboratory standard deviation over the field data
(5.2 mg/m3 vs 14 mg/m3), and this is probably due to the presence of some
source variability in the field estimates.
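
As a rough illustration of the size of the source variability suggested above, the two within-laboratory figures can be compared. The sketch assumes that source variability and analytical variability are independent and add as variances; that decomposition is an assumption made here for illustration and is not a calculation reported in the study.

```python
import math


def residual_sd(total_sd, component_sd):
    """Standard deviation left after removing one independent variance component."""
    if component_sd > total_sd:
        raise ValueError("component cannot exceed the total")
    return math.sqrt(total_sd ** 2 - component_sd ** 2)


field_within_sd = 14.0      # mg/m3, within-laboratory, field data (from the text)
standards_within_sd = 5.2   # mg/m3, within-laboratory, NBS standards (from the text)

# Assumed decomposition: field variance = standards variance + source variance.
implied_source_sd = residual_sd(field_within_sd, standards_within_sd)
improvement_ratio = field_within_sd / standards_within_sd

print(f"implied source component: {implied_source_sd:.1f} mg/m3")
print(f"field / standards ratio:  {improvement_ratio:.2f}")
```

Under that assumption the implied source component is roughly 13 mg/m3, and the ratio of about 2.7 corresponds to the "about a threefold" improvement noted in the text.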
     In analysis of the NBS standards, collaborators differed in the amount
of bias exhibited, and the average bias was dependent on the CO levels.  In
general, a sizeable positive bias was shown at the lower CO levels, but a
negative bias was evident at the highest CO level.  Method 10 as executed
in this study produced results with only moderate accuracy of ± 101 mg/m3
(2σ level) on the average over the concentration range of 277 to 1048 mg CO/m3.
One factor that adversely affected the accuracy of Method 10 is that most
commercial NDIR instruments have a significant amount of curvature in the
calibration curves, and many of the collaborators did not adequately correct
for this nonlinearity of response.  Another factor is the calibration gases
themselves, since some calibration gas suppliers provided certificates of
analysis that showed errors of as much as 30 percent when compared with the
NBS standard gases.

BERYLLIUM
     The EPA beryllium method (Method 104 - Ref. 3, pp. 8846-8850) was
collaboratively tested in a process plant where different beryllium ceramic
products are manufactured, a process that involves machining, grinding, blending,
priming, forming, and polishing (Ref. 28).  Air from the process is continuously
exhausted through a series of HEPA filters before entering the 3-by-5-foot stack
from which sampling was done simultaneously by four collaborators.  This
collaborative test comprised 13 runs, each on a different day, where four
different collaborative organizations sampled simultaneously over the same
30-point traverse, with each point being sampled 8 minutes by each collaborator.
The emission levels of beryllium in the stack sampled were low, being in the
neighborhood of one-tenth that of the permissible standard emission rate.
     Three types of samples were prepared by the National Bureau of Standards
specifically for this collaborative test:  filters with beryllium oxide, ampoules
with suspended beryllium oxide, and ampoules with soluble beryllium in 0.25
molar hydrochloric acid.  These samples were given to the collaborators at the
field site to be later analyzed with the field samples in their home laboratories.
     There were three statistical analyses performed.  The primary one was a
two-way analysis of variance to obtain the variance of repeated observations per
collaborator and to obtain the variance between collaborators.  A secondary
analysis was the same except beryllium-loading results were used in place of
the emission rate results.  The third analysis was to determine if the average
velocity per sampling point per run correctly represented the geometrical variance
in velocity throughout the test run even though they were measured at different
times.
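
A two-way analysis of variance of the kind named above (runs by collaborators, one result per cell) can be sketched as follows. This is a generic textbook layout, not the study's actual computation: it assumes no run-by-collaborator interaction, so that the residual mean square estimates the variance of repeated observations per collaborator, and the example values are hypothetical.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication.

    table[i][j] is the result for run i and collaborator j.
    Returns sums of squares plus the two variance estimates of interest.
    """
    r = len(table)       # number of runs
    c = len(table[0])    # number of collaborators
    if r < 2 or c < 2 or any(len(row) != c for row in table):
        raise ValueError("need a rectangular table, at least 2 runs x 2 collaborators")

    grand_mean = sum(sum(row) for row in table) / (r * c)
    run_means = [sum(row) / c for row in table]
    collab_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_runs = c * sum((m - grand_mean) ** 2 for m in run_means)
    ss_collab = r * sum((m - grand_mean) ** 2 for m in collab_means)
    ss_resid = max(0.0, ss_total - ss_runs - ss_collab)

    ms_collab = ss_collab / (c - 1)
    ms_resid = ss_resid / ((r - 1) * (c - 1))

    return {
        "ss_runs": ss_runs,
        "ss_collab": ss_collab,
        "ss_resid": ss_resid,
        # Residual mean square: variance of repeated observations (assumed model).
        "within_var": ms_resid,
        # Collaborator variance component; negative estimates truncated at zero.
        "between_collab_var": max(0.0, (ms_collab - ms_resid) / r),
    }


# Hypothetical emission-rate results: 4 runs x 3 collaborators (NOT test data).
example = [
    [1.10, 1.40, 0.90],
    [0.95, 1.30, 0.85],
    [1.20, 1.55, 1.05],
    [1.00, 1.25, 0.80],
]
result = two_way_anova(example)
print(result["within_var"], result["between_collab_var"])
```

The same function could be applied to beryllium-loading results in place of emission rates, which is the substitution the secondary analysis describes.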
     The precision estimates for the emission rate data are given in Table 2.
Estimates derived from the beryllium-loading results were virtually identical to
these, so it is evident that the velocity and volumetric flow rate measurements
did not contribute significantly to the imprecision observed in the emission
rate data.  It appears that almost all the differences between collaborators
during a run were due to differences in the solution (wash plus impinger contents)
portion of the samples.  Three of the four collaborators did not differ signifi-
cantly in the amount of beryllium collected per run on their filters.  Since, on
the average, about 77 percent of the beryllium was collected from the solution
portion of the sample (probably from the nozzle and probe washes), it is likely
that the sample clean-up was a major source of error.  This would be compounded by
the fact that beryllium concentrations were extremely low at this test site.
     The collaborators' relative precision in the measurement of beryllium from
the NBS standard samples was considerably better than for their field samples
(within-laboratory standard deviation of about 10 percent), but the standard
samples contained larger amounts of beryllium.  However, analysis of these
standard samples indicated a definite collaborator bias, which in general
was proportional to the beryllium level, and, on the average, was about
20 percent negative.  The average bias on the filter samples was essentially
zero, but only because large negative and positive biases cancelled out.
One collaborator exhibited essentially no bias on any of the sample types,
and one laboratory measured the filter concentrations without bias.  Since
one collaborator always managed to measure beryllium without bias and since
bias was sometimes positive and sometimes negative, it is apparent that the
observed bias is a property of the collaborators rather than being inherent
in the method itself.  Thus, because of questionable competency of some of
the collaborators, it is unlikely that the true performance capabilities of
Method 104 were determined by this test.
TABLE 1.  METHODS COLLABORATIVELY TESTED UNDER THE METHODS STANDARDIZATION PROGRAM

 Parameter                    Method of determination           EPA Method No. a/

 Stack gas velocity and       S-type pitot tube                         2
 volumetric flow rate

 Stack gas molecular          Orsat                                     3
 weight and CO2, excess air

 Stack gas moisture           Condensation and volumetric               5
 content                      measurement

 Particulates                 Dry filtration and gravimetric            5
                              determination

 Sulfur dioxide               Selective absorption and                  6
                              barium thorin titration

 Nitrogen oxides              Phenol disulfonic acid                    7

 Sulfuric acid                Selective absorption and                  8
 mist/sulfur dioxide          barium thorin titration

 Opacity of stack             Visual estimation of                      9
 emissions                    percent opacity

 Carbon monoxide              Nondispersive infrared                   10
                              absorption

 Beryllium                    Filtration/impingement and              104
                              atomic absorption

a/ Methods 2, 3, 5, 6, 7, 8, and 9 are described in Reference 1, Method 10 in
   Reference 2, and Method 104 in Reference 3.

TABLE 2.  PRECISION ESTIMATES FOR THOSE PARAMETERS WHERE STANDARD DEVIATION
          WAS PROPORTIONAL TO THE MEAN VALUE, δ

                                                     Standard deviations,
                                                 percent of mean value (δ) a/
 Method No.   Parameter, units                      σ        σb       σL

 2            Velocity, ft/sec                      3.9      5.0      3.2
 2            Volumetric flow rate, ft3/hr          5.5      5.6      1.1
 5            Particulate matter, mg/m3            10.4     12.1      6.1
 6            SO2, mg/m3                            4.0      5.8      4.2
 7 b/         NOx, mg/m3                            6.6      9.5      6.9
 7 c/         NOx, mg/m3                           14.9     18.5     10.5
 8            H2SO4 mist (including SO3), mg/m3    58.5     66.1     30.8
 104          Be, g/day                            43.5     57.7     37.9

a/ σ = within-laboratory deviation; σb = between-laboratory deviation;
   σL = laboratory bias.

b/ Pooled power plant/pilot combustion plant data.

c/ Nitric acid plant data.

TABLE 3.  PRECISION ESTIMATES FOR THOSE PARAMETERS WHERE STANDARD DEVIATION
          WAS INDEPENDENT OF THE MEAN VALUE.

                                            Standard deviation,
                                            parameter units a/
 Method No.   Parameter, units              σ        σb       σL

 3            CO2, percent                  0.20     0.40     0.35
 3            O2, percent                   0.32     0.61     0.52
 3            Dry mol. wt, g/g-mole         0.035    0.048    0.033
 5            Moisture fraction             0.009    0.012    0.008
 8            SO2, mg/m3                    123      115      99
 9            Opacity, percent              2.05     2.42     1.29
 10           CO, mg/m3                     14.3     32.3     29.0

a/ σ = within-laboratory deviation; σb = between-laboratory deviation;
   σL = laboratory bias.
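
Several rows of Tables 2 and 3 appear consistent with the between-laboratory term being the root sum of squares of the within-laboratory and laboratory bias terms. The sketch below checks that relation on a few selected rows; the relation is assumed here from the tabulated numbers, and the sketch does not claim that every row reproduces exactly.

```python
import math


def between_lab_sd(within_sd, lab_bias_sd):
    """Between-laboratory SD under the assumed relation sb^2 = s^2 + sL^2."""
    return math.sqrt(within_sd ** 2 + lab_bias_sd ** 2)


# (label, within-lab sigma, lab-bias sigma_L, tabulated between-lab sigma_b)
selected_rows = [
    ("Method 2, velocity (Table 2, percent of mean)", 3.9, 3.2, 5.0),
    ("Method 8, H2SO4 mist (Table 2, percent of mean)", 58.5, 30.8, 66.1),
    ("Method 3, CO2 (Table 3, percent CO2)", 0.20, 0.35, 0.40),
    ("Method 10, CO (Table 3, mg/m3)", 14.3, 29.0, 32.3),
]

for label, within_sd, lab_bias_sd, tabulated in selected_rows:
    computed = between_lab_sd(within_sd, lab_bias_sd)
    print(f"{label}: computed {computed:.2f}, tabulated {tabulated}")
```

For these rows the computed values agree with the tabulated between-laboratory figures to within rounding.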
TABLE 4.  COLLABORATIVE TEST OF EPA METHOD 9

                              Number       Number          Opacity range,
 Site/test                    of runs      of observers    percent

 Training generator:
   White smoke                  20            9              0-35
   Black smoke                  16            9              0-35
 Sulfuric acid plant            30           11              0-15
 Steam station/test 1           10           10              0-30
 Steam station/test 2           18           10              0-25
 Steam station/test 3           24            8              0-40

                                     SECTION 5

                                     REFERENCES

 1.  U.S. Environmental Protection Agency.  Standards of Performance for New
     Stationary Sources.  Federal Register.  36(247):24876-24895. 1971.

 2.  U.S. Environmental Protection Agency.  Standards of Performance for New
     Stationary Sources.  Federal Register.  39(47):9308-9323. 1974.

 3.  U.S. Environmental Protection Agency.  National Emission Standards for
     Hazardous Air Pollutants.  Federal Register.  38(66):8820-8850. 1973.

 4.  Mitchell, W. J. and M. R. Midgett.  Improved Procedure for Determining
     Mercury Emissions from Mercury Cell Chlor-Alkali Plants.  Jour. APCA.
     (In Press.)  U.S. Environmental Protection Agency, Research Triangle Park,
     North Carolina.

 5.  Knoll, J. E. and M. R. Midgett.  Determination of Hydrogen Sulfide in
     Refinery Fuel Gases.  U.S. Environmental Protection Agency, Research
     Triangle Park, North Carolina.  (In Press.)

 6.  U.S. Environmental Protection Agency.  Standards of Performance for New
     Stationary Sources.  Federal Register.  40(152):33157-33166. 1975.

 7.  U.S. Environmental Protection Agency.  National Emission Standards for
     Hazardous Air Pollutants.  Federal Register.  40(248):59549-59550. 1975.

 8.  Mitchell, W. J. and M. R. Midgett.  Adequacy of Sampling Trains and
     Analytical Procedures Used for Fluoride.  Atmos. Environ.  (In Press.)
     U.S. Environmental Protection Agency, Research Triangle Park, North Carolina.

 9.  Youden, W. J.  Statistical Techniques for Collaborative Tests.  The
     Association of Official Analytical Chemists, Washington, D. C., 1973.
     pp. 33-36.

10.  Mitchell, W. J. and M. R. Midgett.  Means to Evaluate Performance of
     Stationary Source Test Methods.  Environ. Sci. Technol. 10(1):85-88. 1976.

11.  Hamil, H. F. and R. E. Thomas.  Collaborative Study of Method for the
     Determination of Particulate Matter Emissions from Stationary Sources
     (Fossil Fuel-Fired Steam Generators).  U.S. Environmental Protection Agency,
     Research Triangle Park, North Carolina.  Report No. EPA-650/4-74-021.  1974.

12.  Hamil, H. F. and D. E. Camann.  Collaborative Study of Method for the
     Determination of Particulate Matter Emissions from Stationary Sources
     (Portland Cement Plants).  U.S. Environmental Protection Agency, Research
     Triangle Park, North Carolina.  Report No. EPA-650/4-74-029.  1974.

13.  Hamil, H. F. and R. E. Thomas.  Collaborative Study of Method for the
     Determination of Particulate Matter Emissions from Stationary Sources
     (Municipal Incinerators).  U.S. Environmental Protection Agency, Research
     Triangle Park, North Carolina.  Report No.  EPA-650/4-74-022.  1974.


14.  Hamil, H. F. and R. E. Thomas.  Collaborative Study of Particulate
     Emissions Measurements by EPA Methods 2, 3, and 5 Using Paired Particulate
     Sampling Trains (Municipal Incinerators).  U.S. Environmental Protection
     Agency, Research Triangle Park, North Carolina.  Report No. EPA-600/4-76-014.
     1976.

15.   Dixon, W. J. and F. J. Massey, Jr.  Introduction to Statistical Analysis,
     3rd Ed.  New York.  McGraw-Hill.  1969.

16.   Hamil, H. F., D. E. Camann, and R. E. Thomas.  The Collaborative Study of
     EPA Methods 5, 6, and 7 in Fossil Fuel-Fired Steam Generators, Final Report.
     U.S. Environmental Protection Agency, Research Triangle Park,  North Carolina.
     Report No. EPA-650/4-74-013.  1974.

17.   Hamil, H. F., R. E. Thomas, and N. F. Swynnerton.  Evaluation  and Collabo-
     rative Study of Method for Visual Determination of Opacity of Emissions from
     Stationary Sources.  U.S. Environmental Protection Agency, Research Triangle
     Park, North Carolina.  Report No. EPA-650/4-75-009.  1975.

18.   Hamil, H. F. and R. E. Thomas.  Collaborative Study of Method for Determination
     of Stack Gas Velocity and Volumetric Flow Rate in Conjunction with EPA Method  5.
     U.S. Environmental Protection Agency, Research Triangle Park,  North Carolina.
     Report No. EPA-650/4-74-033.  1974.

19.   Hamil, H. F.  Laboratory and Field Evaluations of EPA Methods 2, 6, and 7.
     U.S. Environmental Protection Agency, Research Triangle Park,  North Carolina.
     Report No. EPA-650/4-74-039.  1973.

20.  Hamil, H. F. and R. E. Thomas.  Collaborative Study of Method for Stack Gas
     Analysis and Determination of Moisture  Fraction with Use of Method 5.  U.S.
     Environmental Protection Agency,  Research Triangle Park, North Carolina.
     Report No. EPA-650/4-74-026.  1974.

21.  Mitchell, W. J. and M. R. Midgett.  Field Reliability of the Orsat Analyzer.
     U.S. Environmental Protection Agency, Research Triangle Park, North Carolina.
     Jour. APCA.  26(5):491-495.  1976.

22.  Mitchell, W. J. and M. R. Midgett.  Method for Obtaining Replicate Particulate
     Samples  from Stationary Sources.  U.S.  Environmental Protection Agency, Research
     Triangle Park,  North  Carolina.  Report  No. EPA-650/4-75-025.  1975.

23.  Knoll, J. E. and M. R. Midgett.  The Application of EPA Method 6 to High Sulfur
     Dioxide Concentrations.  U.S. Environmental Protection Agency, Research Triangle
     Park, North Carolina.  (In Press.)

24.  Hamil, H. F. and  R. E. Thomas.  Collaborative Study of Method for the Deter-
     mination of Nitrogen  Oxide  Emissions from Stationary Sources  (Nitric Acid
     Plants).  U.S.  Environmental  Protection Agency, Research Triangle Park, North
     Carolina.  Report  No. EPA-650/4-74-028.  1974.

25.  Hamil, H. F., D.  E. Camann,  and R. E. Thomas.  Collaborative Study of Method
     for  the  Determination of  Sulfuric Acid  Mist  and Sulfur Dioxide Emissions from
     Stationary Sources.   U.S. Environmental  Protection Agency, Research Triangle
     Park, North Carolina.  Report No. EPA-650/4-75-003.  1974.

26.  U.S. Environmental Protection Agency.   Standards of Performance for New
     Stationary Sources.  Federal Register.  39(219):39874-39876.  1974.

27.  Constant, P. C. Jr., G. Sheil, and M.  C. Sharp.   Collaborative Study
     of Method 10 - Reference Method for the Determination of Carbon Monoxide
     Emissions from Stationary Sources - Report of Testing.   U.S.  Environ-
     mental Protection Agency, Research Triangle Park, North Carolina.
     Report No. EPA-650/4-75-001.  1975.

28.  Constant, P. C. Jr., and M. C. Sharp.   Collaborative Study of Method
     104 - Reference Method for Determination of Beryllium Emission from
     Stationary Sources.  U.S. Environmental Protection Agency, Research
     Triangle Park, North Carolina.  Report No. EPA-650/4-74-023.   1974.
                                  TECHNICAL REPORT DATA

 1. REPORT NO.:  EPA-600/4-76-044
 2.
 3. RECIPIENT'S ACCESSION NO.:
 4. TITLE AND SUBTITLE:  THE EPA PROGRAM FOR THE STANDARDIZATION OF STATIONARY
    SOURCE EMISSION TEST METHODOLOGY - A REVIEW
 5. REPORT DATE:  August 1976
 6. PERFORMING ORGANIZATION CODE:
 7. AUTHOR(S):  M. Rodney Midgett
 8. PERFORMING ORGANIZATION REPORT NO.:
 9. PERFORMING ORGANIZATION NAME AND ADDRESS:
10. PROGRAM ELEMENT NO.:  1HD621
11. CONTRACT/GRANT NO.:
12. SPONSORING AGENCY NAME AND ADDRESS:
    Environmental Monitoring and Support Laboratory
    Office of Research and Development
    U.S. Environmental Protection Agency
    Washington, D.C. 20460
13. TYPE OF REPORT AND PERIOD COVERED:  Final - In-house
14. SPONSORING AGENCY CODE:  EPA-ORD
15. SUPPLEMENTARY NOTES:
16. ABSTRACT

      This report contains the results from a program designed to standardize those
 emission test methods promulgated by the EPA for use in determining compliance with
 Federal emission standards.  The approach taken has been to conduct at least a
 limited laboratory and field evaluation, followed by an interlaboratory collaborative
 test of each method.  Emphasis here is placed on the collaborative testing, the
 results of which are presented in terms of within-laboratory, between-laboratory, and
 laboratory bias standard deviations.  These estimates are based on single-run results,
 and not on the results of three consecutive runs as would be required in conducting
 compliance testing.  A brief discussion is given of the manner in which the precision
 estimates are derived.  Determination of method accuracy is also considered where
 practical.  The design of each test, deficiencies in test designs, and other problems
 affecting the test results are discussed.  An improved test design that overcomes
 most of the problems observed in earlier tests is described.  A brief discussion of
 current projects and future plans is given as well as references to the numerous
 reports on the results of the methods standardization activities.

17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS:  Air Pollution; Sampling; Evaluation
    b. IDENTIFIERS/OPEN ENDED TERMS:  Collaborative Testing; Methods Standardization
       (or Methods Evaluation); Stationary Sources; Emissions Testing Methods
    c. COSATI Field/Group:  13B
18. DISTRIBUTION STATEMENT:  Release to Public
19. SECURITY CLASS (This Report):  Unclassified
20. SECURITY CLASS (This page):  Unclassified
21. NO. OF PAGES:  51
22. PRICE:
EPA Form 2220-1 (9-73)