FINAL REPORT
U.S. ENVIRONMENTAL PROTECTION AGENCY
    Office of Air and Waste Management
 Office of Air Quality Planning and Standards
Research Triangle Park, North Carolina 27711

-------
                                       EPA-450/3-75-070
                SOTDAT
          FINAL  REPORT
                      by

                     TRW
Transportation and Environmental Engineering Operations
               800 Follin Lane, SE
              Vienna, Virginia 22180
          Contract No. 68-02-1007, Task 3
       EPA Project Officer:  Gregory Bujewski
                  Prepared for

       ENVIRONMENTAL PROTECTION AGENCY
         Office of Air and Waste Management
     Office of Air Quality Planning and Standards
     Research Triangle Park, North Carolina 27711

                   July 1975

-------
This report is issued by the Environmental Protection Agency to report
technical data of interest to a limited number of readers. Copies are
available free of charge to Federal employees, current contractors and
grantees, and nonprofit organizations - as supplies permit - from the
Air Pollution Technical Information Center, Environmental Protection
Agency, Research Triangle Park, North Carolina 27711; or, for a fee,
from the National Technical Information Service, 5285 Port Royal Road,
Springfield, Virginia 22161.
This report was furnished to the Environmental Protection Agency by
TRW, Transportation and Environmental Engineering Operations,
Vienna, Virginia 22180, in fulfillment of Contract No.  68-02-1007.  The
contents of this report are reproduced herein as received from TRW,
Transportation and Environmental Engineering Operations.  The opinions,
findings, and conclusions expressed are those of the author and not
necessarily those of the Environmental Protection Agency.  Mention of
company or product names is not to be considered as an endorsement
by the Environmental Protection Agency.
                   Publication No. EPA-450/3-75-070

-------
                          1.0  INTRODUCTION

1.1   GENERAL DESCRIPTION OF THE SOURCE TEST DATA (SOTDAT) SYSTEM
      Throughout the country, there is a vast amount of source test data
which has been compiled in recent years.  These data are on file in EPA
offices, both in Durham and in the regions; in state and local control
agency offices; with private consultants who have conducted stack tests;
at industrial plants where tests have been run; with control equipment
manufacturers; and elsewhere.  Until now, these data have been of little
use to anyone needing a large amount of data, because they are stored in
so many different places and formats.
      The Source Test Data (SOTDAT) System is a useful solution to that
problem.  The SOTDAT System permits the gathering of source test data from
many places and their storage in a computer-accessible data bank in a com-
mon format.  SOTDAT is designed so that each record describes, in detail,
one run of a stack test.  Variables included are most of those which enter
into the normal stack test calculations, as well as some which will be
necessary to future users of SOTDAT.  Information stored in SOTDAT contains
an adequate number of source parameters (e.g. plant name, location, stack
height, etc.) and concentrates heavily on data describing a specific test
run.  Since each SOTDAT record is keyed to a record in the National Emis-
sions Data System (NEDS), any required source parameters are readily avail-
able from a NEDS listing.  An exception to this will exist in the case where
test data are coded anonymously in order to protect the confidentiality of
the data.  For a complete list and description of the SOTDAT variables, see
the August 1973 National Air Data Branch publication "Source Test Data Sys-
tem (SOTDAT)" which describes in detail each data element.
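Appendix B (item 4) notes that the computer sorts and groups stored records by the NEDS ID fields (State, County, AQCR, Plant, and Point) together with the run number. The Python sketch below illustrates that keying under stated assumptions: the class and attribute names (`NedsKey`, `SotdatRun`, `pollutant_code`, `test_result`) and the function `group_by_point` are hypothetical, the codes are held as strings, and this is not the actual SOTDAT record layout, which is defined in the August 1973 NADB publication.

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Optional


@dataclass(frozen=True, order=True)
class NedsKey:
    """Hypothetical NEDS identifier; fields compare in the order listed."""
    state: str
    county: str
    aqcr: str
    plant: str
    point: str


@dataclass(order=True)
class SotdatRun:
    """Hypothetical one-run record, ordered by NEDS key and then run number."""
    key: NedsKey
    run_number: int
    pollutant_code: str = field(compare=False)
    test_result: Optional[float] = field(default=None, compare=False)


def group_by_point(runs: list[SotdatRun]) -> dict[NedsKey, list[int]]:
    """Sort the runs and return the run numbers recorded for each NEDS point."""
    ordered = sorted(runs)
    return {key: [run.run_number for run in group]
            for key, group in groupby(ordered, key=lambda run: run.key)}
```

If the Plant and Point codes are left blank, as item 4 reports for most current records, distinct sources sharing the same State, County, and AQCR codes would fall into a single group.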

1.2   VALUE OF SOTDAT INFORMATION
      The data contained in the SOTDAT System will be useful for many
purposes.  The single fact that makes these data so useful is that,
unlike NEDS, which is a mixture of measured, calculated, and estimated
data, SOTDAT is composed entirely of measured data.  This greatly
increases the reliability of any deductions based on data from SOTDAT.

      The most immediate use to which SOTDAT will be put is to validate
and/or correct existing emission factors, and to create new ones in areas
where factors have not yet been compiled.  In conjunction with this use,
SOTDAT could probably be used as a validity check on the NEDS system.
Estimated emissions in NEDS which are grossly inconsistent with SOTDAT-
generated factors could be flagged for further investigation.
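A minimal Python sketch of the consistency check described above, assuming that a SOTDAT-derived factor is the mean of per-run emission factors (emission rate divided by process rate) and that the NEDS emission estimate and activity level are expressed in units consistent with that factor. The function names and the two-fold `tolerance` are hypothetical choices for illustration, not part of SOTDAT or NEDS.

```python
from statistics import mean


def emission_factor(emission_rate: float, process_rate: float) -> float:
    """Per-run emission factor, e.g. (lb/hr) / (tons/hr) = lb/ton."""
    if process_rate <= 0:
        raise ValueError("process rate must be positive")
    return emission_rate / process_rate


def flag_neds_estimate(neds_emissions: float, activity: float,
                       sotdat_factors: list[float],
                       tolerance: float = 2.0) -> bool:
    """Return True if the NEDS estimate differs from the SOTDAT-based
    expectation (mean factor * activity) by more than `tolerance`-fold."""
    expected = mean(sotdat_factors) * activity
    if expected <= 0:
        raise ValueError("expected emissions must be positive")
    ratio = neds_emissions / expected
    return ratio > tolerance or ratio < 1.0 / tolerance
```

With three hypothetical runs of 12, 9, and 15 lb/hr at 4, 3, and 5 tons/hr (3 lb/ton each), an activity of 1,000 tons/yr gives an expected 3,000 lb/yr; a NEDS estimate of 3,500 lb/yr passes, while one of 9,000 lb/yr is flagged for investigation.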
      Another use for data in SOTDAT is the development of accurate methods
for calculating control device efficiencies based on specific operating
parameters.  These parameters are part of the SOTDAT data base.
      A system containing the kind of basic, fundamental data that SOTDAT
holds is sure to become extremely valuable in the future.  Data dealing
with actual (not estimated or calculated) emissions from specific
pollution sources are certainly more valuable than what has been
available thus far.  It is hoped that the individuals charged with
maintaining the SOTDAT system will be sensitive to the need for these
data, and will remain flexible enough to implement changes as needed.

-------
                       2.0  DATA ACCUMULATION

2.1   DATA ON FILE IN THE EMISSIONS MEASUREMENT BRANCH OF EPA
      Many source tests have been performed by personnel of EPA's Emissions
Measurement Branch (EMB) or by EMB-obtained private contractors.  Results
from many of these tests are also on file in the National Air Data Branch
(NADB), and were therefore available for removal from Durham.  The data
from these 155 reports were coded onto SOTDAT coding forms in TRW's McLean,
Va. office.  This effort produced 1292 completed coding forms.
      Following the data validation process described in Section 3.0, the
data from another 9 test reports (submitted to NADB after the original
coding effort) were entered on 109 SOTDAT coding forms.
      Another 26 test reports were on file in the EMB office but not in the
NADB.  Since these reports could not be removed, the data they contained were
coded in Durham.  These additional data generated 209 completed forms.
2.2   DATA ON FILE IN THE EMISSION STANDARDS AND ENGINEERING DIVISION OF EPA
      The Emission Standards and Engineering Division of EPA has a file of
incinerator test results located in the IRL Building in Durham.  These are
test reports which have been submitted to EPA in an attempt to obtain EPA
certification for a specific model of an incinerator.  The file contains
reports on incinerators which have received certification as well as those
which have been unable to meet the certification standards.  Data from 68
reports were coded onto 173 coding forms during the data accumulation effort
at this location.
2.3   SUMMARY OF RESULTS
      All in all, during the project, 190 source test reports were read;
data were extracted from them and coded onto 1607 SOTDAT coding forms.
      The data now present in the SOTDAT system comprise a relatively good
cross section of most types of industries.  However, since the majority of
the tests were performed to accumulate data to be used in the establishment
of New Source Performance Standards, the SOTDAT data base may presently be
biased toward the better controlled, or more efficient, sources.

-------
                        3.0  DATA VALIDATION

3.1   NEED FOR VALIDATION
      After the coding effort was completed, the data were keypunched and
loaded into the computer.  The resulting output from the system revealed
that several problems existed, either in the input data or in the computer
program.  It was decided that, for the SOTDAT System to be a truly useful
tool, it would be necessary to rectify as many of the existing problems
as possible.
3.2   DATA VALIDATION EFFORT
      The apparent errors noted during the initial brief examination of the
computer output included missing data, erroneous values, and unexplainable
printed symbols in the place of data.  The approach employed to identify
and correct the errors involved reviewing each output record (one per input
coding form), and checking for noticeable errors of the type listed above.
      Whenever suspected errors were discovered, it was necessary to deter-
mine whether an error actually existed, and if so, note it appropriately
for later correction.  This was accomplished by checking each entry which
appeared to be wrong against the original coding form, and then, if neces-
sary, against the data in the stack test report.  True errors were noted
directly on the output.
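The screening described above was done by hand. As a hedged illustration only, a first pass could be automated in Python along these lines: it flags any printed field containing the ampersand symbol that the system substitutes for data it cannot interpret (see Appendix B), plus any required field that came out empty, leaving each flagged entry to be checked against the coding form and the test report as before. The record layout (a dict of field name to printed text), the function name, and the field names in `REQUIRED_FIELDS` are assumptions.

```python
# Hypothetical names of fields that should never be empty in a printed record.
REQUIRED_FIELDS = ("flow_rate", "test_method", "sampling_location")


def suspect_fields(record: dict[str, str],
                   required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """Return field names that need manual checking against the coding form."""
    suspects = []
    # Printed ampersands stand in for data the program could not interpret.
    for name, text in record.items():
        if "&" in text:
            suspects.append(name)
    # Required fields that came out empty are treated as missing data.
    for name in required:
        if not record.get(name, "").strip() and name not in suspects:
            suspects.append(name)
    return suspects
```

For a hypothetical record `{"plant": "WOOD RIVER", "flow_rate": "&&&&&", "test_method": "", "sampling_location": "3"}`, the function returns `["flow_rate", "test_method"]`.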
      At the time this effort was taking place, it was impossible to ascer-
tain definitive procedures for updating the data stored in the computer, so
the changes were made directly on the original coding forms.  This retained
the maximum flexibility, since either the entire form or just the card or
cards which required correction could be repunched.  In all, approximately
300 coding forms were found to contain errors.
      Examination of the apparent errors revealed many general problems
with the computer program and generated several suggestions for improving
the system.  They were noted during the validation process and are discussed
in Appendix B.

-------
                         4.0  PROBLEM AREAS

4.1   QUESTIONS ARISING DURING THE PROJECT
      The instruction manual supplied by the EPA Project Officer was very
complete and made a very successful attempt to deal with all problems
which might occur.  However, a few questions required answers which were
not available from the manual.  These questions, along with their answers
were documented as they arose, and a copy was given to the Project Officer
at the completion of the task (see Appendix A).  The problems raised by
the questions should be considered prior to any future revision to the
coding procedure manual.
      Probably the most serious problem deals with the case where an ex-
haust gas stream from a single pollution-producing piece of equipment is
split into two or more streams, not all of which are sampled.  In this situ-
ation, there is no correlation between the process rate for the piece of
equipment and the emissions as determined by the test.  Either the process
rate must be reduced a proportionate amount, or the emissions increased.  The
problem is what (if any) apportioning factor to use.
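The report leaves the choice of apportioning factor open. One possible choice, shown here only as a sketch, is the fraction of the total exhaust flow that passed through the sampled streams. It assumes the pollutant is distributed among the branches in proportion to volumetric flow (the same concentration in every branch) and that the flows of the unsampled branches are known; the function names are hypothetical.

```python
def sampled_fraction(sampled_flows: list[float], all_flows: list[float]) -> float:
    """Fraction of the equipment's total exhaust flow that was sampled.

    ASSUMPTION: pollutant mass divides among branches in proportion to flow.
    """
    total = sum(all_flows)
    sampled = sum(sampled_flows)
    if total <= 0 or not 0 < sampled <= total:
        raise ValueError("flows must be positive and sampled flow <= total")
    return sampled / total


def apportioned_process_rate(process_rate: float, fraction: float) -> float:
    """Option 1: reduce the process rate to match the sampled streams."""
    return process_rate * fraction


def scaled_up_emissions(emission_rate: float, fraction: float) -> float:
    """Option 2: scale the measured emissions up to the whole equipment."""
    return emission_rate / fraction
```

For example, if 30,000 of 40,000 scfm were sampled (fraction 0.75), a process rate of 50 tons/hr would be apportioned to 37.5 tons/hr, or measured emissions of 15 lb/hr would be scaled up to 20 lb/hr; either way the resulting emission factor is the same, 0.4 lb/ton.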
      Another problem, applicable almost exclusively to the EMB data, was the
lack of process data and control equipment efficiency data.  Without these
two data elements, emission factors cannot be calculated.  Some effort should
be made to ensure that these data are taken during a test and, equally
important, that they are included in the report.
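To show why both data elements are needed: one common way to derive an uncontrolled emission factor from a test made downstream of a control device divides the measured emission rate by the process rate, and then by the fraction of pollutant that passes the device, (1 - efficiency). The Python sketch below assumes the efficiency is given as a fraction between 0 and 1 and returns `None` when either input is missing; the function name is hypothetical.

```python
from typing import Optional


def uncontrolled_emission_factor(emission_rate: float,
                                 process_rate: Optional[float],
                                 control_efficiency: Optional[float]
                                 ) -> Optional[float]:
    """Uncontrolled factor = emission_rate / process_rate / (1 - efficiency).

    Returns None when the process rate or the control efficiency is missing,
    since the factor cannot be computed without both.
    """
    if process_rate is None or control_efficiency is None:
        return None
    if process_rate <= 0 or not 0.0 <= control_efficiency < 1.0:
        raise ValueError("need process_rate > 0 and 0 <= efficiency < 1")
    controlled_factor = emission_rate / process_rate
    return controlled_factor / (1.0 - control_efficiency)
```

For instance, 10 lb/hr measured at the outlet of a 90 percent efficient device serving a 20 tons/hr process gives 0.5 lb/ton controlled and about 5 lb/ton uncontrolled.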

-------
                    5.0  SUGGESTED FUTURE EFFORT


      The two most promising areas for obtaining additional data are probably
the individual state control agency offices and the control equipment
manufacturers.

      o  State Control Agency Offices - Although less data are probably
         available from the state offices, they will certainly be easier
         to obtain.  Some states have already expressed an interest in
         having their data coded into SOTDAT, and it seems unlikely that
         other states would refuse to make their data available.  All
         states will, however, resent having to supply the manpower
         necessary for the coding effort.
      o  Control Equipment Manufacturers - Control equipment manufacturers
         usually conduct an inlet and outlet stack test whenever a new piece
         of equipment is delivered, to ensure that the guaranteed efficiency
         is being met.  Therefore, a large amount of test data exists, but
         the manufacturers are extremely reluctant to release the data
         without first making them anonymous, because they are afraid of
         releasing proprietary information about their customers.  However,
         the great amount of data available, and their usefulness in
         evaluating control devices, may justify the additional time and
         expense required to obtain them.

-------
                            6.0  SUMMARY

      The effort expended on this project has produced a sizable SOTDAT data
bank, and the data obtained are a good representation of most
types of pollution sources.  However, this effort only scratches the surface
of what is available.  Many other sources of data are available in addition
to those discussed in the preceding section.  Some of these are private
consulting firms which have conducted tests, industrial trade associations,
plants which have either done their own testing or contracted for required
tests, and other government agencies which have conducted tests in connec-
tion with research and development projects or the preparation of environ-
mental impact statements and/or permit applications.  Since the SOTDAT sys-
tem has the potential to accept a very large amount of additional data, and
since there exists a virtually unending supply of data, the data
accumulation can continue far into the future, constantly improving and increasing
the capabilities and value of the system.

-------
                             APPENDIX  A
           QUESTIONS AND ANSWERS CONCERNING SOTDAT CODING

      During the initial SOTDAT coding effort, a list of questions was
compiled, the answers to which could not be determined from the manual
of coding procedures.  Those questions are presented here along with the
answers supplied by the Project Officer.  It is hoped that this will fa-
cilitate future coding of source test data by persons unfamiliar with the
system.
      Q.  Are Orsat analyses considered as test results for coding on
          C cards?
      A.  No.  These data are to be entered in field B 10.

      Q.  Should control devices listed on D cards be all devices on the
          piece of equipment or just those indicated in field C 05?
      A.  Only those in C 05.
      Q.  If a device control efficiency is unknown, what code should
          be used?
      A.  Use the code for a medium efficiency device.
      Q.  What pollutant code should be used for total gaseous
          hydrocarbons, since "total" is listed under aliphatic compounds
          and "gross" is listed under aromatic compounds?
      A.  Use code 3101.
      Q.  Are gaseous samples which are taken non-simultaneously with a
          particulate sample considered part of the same run?
      A.  If at least one-half of the gaseous sample was taken during the
          particulate sample, they are considered to be part of the same
          run.  If not, code the gaseous sample on a separate form even
          though most process stream parameters will be unavailable.
      Q.  If a traverse point is sampled more than once during a
          particulate run, how many times is it counted for coding in field B 06?
      A.  Only once.
      Q.  If a test is actually performed, and the result is nil or zero
          (below the detection limit for the method used) should the test
          be recorded?
      A.  Yes.  Enter the result as zero.

Q.  Is the code for particulate caught by a control device to be
    "total particulate", "filterable particulate", or "condensable
    particulate"?

A.  Use code "A 1101" (total particulate), because most particulate
    devices are designed to control both the filterable and condensable
    fractions, and design efficiencies (field D 02) are usually given in
    terms of total particulate.  However, if design efficiencies are given
    for the other particulate fractions, they should also be entered
    alongside their respective pollutant codes ("B 1101" for "filterable
    particulate" and "C 1101" for "condensable particulate").

Q.  What should be done with data that are either too large or too small
    to "fit" in the field(s) allotted for them on the coding form?

A.  Enter the value in "Comments" (Section E), and fill the appropriate
    field(s) with nines.

Q.  How does one enter a negative pollutant temperature?

A.  Leave field C 07 blank, and write the true temperature in "Comments".

Q.  If the effective duct cross-sectional area is different from the
    design area (due to negative flow or sediment buildup), which should
    be entered in field B 03?

A.  Enter the effective area in field B 03 and write the actual area in
    "Comments".

Q.  If a single stack fed by several gas streams, each containing a
    different number of control devices, is sampled, how many devices
    are considered to be upstream from the sampling point?

A.  The number of devices found in the stream containing the largest
    number of devices is used.  Indicate this in "Comments".

Q.  Are operating parameters (field D 05) "operating" or "design"
    values?
A.  Operating.  No design data are to be entered in this field.

Q.  If the exhaust from a single piece of equipment breaks into two
    or more separate gas streams, and all of the streams are tested, what
    values are entered in fields A 11 and A 12 (activity levels)?

A.  None.  Leave those fields blank and enter the activity levels in
    "Comments" along with a statement such as: "This form contains test
    data from one of three stacks.  See form numbers ____ and ____ for
    data on the other stacks."
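Two of the answers above are mechanical enough to express as code. The Python sketch below, offered only as an illustration, applies the one-half overlap rule for deciding whether a gaseous sample belongs to the same run as a particulate sample, and the rule that a value which does not fit its field is replaced by nines and recorded in "Comments". Sampling times are assumed to be minutes from a common reference, and the field is assumed to hold the value as printed, with an explicit decimal point; the function names are hypothetical, and the real coding-form field layouts are defined in the coding manual.

```python
from typing import Optional


def same_run(gas_start: float, gas_end: float,
             part_start: float, part_end: float) -> bool:
    """True if at least one-half of the gaseous sample was taken during
    the particulate sample (times in minutes)."""
    gas_duration = gas_end - gas_start
    if gas_duration <= 0:
        raise ValueError("gaseous sample must have a positive duration")
    overlap = max(0.0, min(gas_end, part_end) - max(gas_start, part_start))
    return overlap >= 0.5 * gas_duration


def code_field(value: float, width: int,
               decimals: int = 0) -> tuple[str, Optional[str]]:
    """Right-justify `value` in a field of `width` characters.

    If the value is too large for the field, or so small that it would be
    coded as zero, fill the field with nines and return a comment line.
    """
    text = f"{value:.{decimals}f}"
    too_large = len(text) > width
    too_small = value != 0 and round(value, decimals) == 0
    if too_large or too_small:
        return "9" * width, f"Actual value: {value:g}"
    return text.rjust(width), None
```

For example, `same_run(0, 60, 20, 120)` is true (40 of 60 minutes overlap) while `same_run(0, 60, 40, 120)` is false (20 of 60), and `code_field(123456, 5)` returns `("99999", "Actual value: 123456")`.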

-------
                             APPENDIX  B

                  OBSERVATIONS ON THE SOTDAT SYSTEM


During the data validation process, several items (some essential and
some not) came to mind concerning ways to improve the SOTDAT System.
These were noted at the time, and are discussed in this appendix.

1.    The original EPA Project Officer directed that, instead of spending a
      great amount of time writing the plant name and address on each form,
      the name and address be written only on the first form of a series of
      tests at a plant, and the form number of that first form be written
      in place of the name and address on subsequent forms.  The keypunching
      instructions were supposed to state that the name and address from
      the first form be duplicated on the subsequent forms, but the
      instruction was apparently either not given or misunderstood.
      Therefore, on forms with a form number greater than A 00390, the
      form number of the first form for a series of tests appears in the
      output as the name and address on subsequent forms.
2.    Test results are coded three per C card.  If the computer finds data
      in the first test result fields, it expects to find data in the
      remaining fields.  Therefore, fields which are specified as requiring
      numeric data and are left blank are interpreted as containing illegal
      characters, and are printed out as ampersands.

3.    Related to the previous problem is the problem of how to treat
      unknown data.  All data now in the system were coded assuming (as is
      the case with NEDS) that unknown data should be left blank, while data
      with a numeric value of zero should be coded as a zero.  Both types
      of entries are printed out as zero in cases where blanks are legal
      characters for the field, and as ampersands where they are illegal.
      Some of the fields where blanks are illegal are: "Control Device Year
      Installed", "Sampling Location", "Flow Rate", "Flow Rate Units", and
      "Test Method".
4.    On almost all records, the NEDS ID data (State, County, AQCR, Plant,
      and Point) are incomplete.  These should be as complete as possible,
      because the computer uses these items, along with the run number, to
      sort and group the stored data.  It was decided during the original
      coding effort that contractor time could be better spent coding data,
      leaving the NEDS/SOTDAT correlation for NADB personnel.  Based on
      instructions from the Project Officer, this is the approach which
      was taken.  Determining the NEDS ID data is a matter of taking the
      plant's city and state from the SOTDAT form, going to an atlas, and
      looking up that state and city in the index.  From the index, the
      county can be determined.  Then the AQCR can be found in AP-102.
      After the state, county, and AQCR NEDS codes have been found, the
      NEDS Plant ID can be determined by checking for that plant's name in
      a listing of NEDS sources.  This, of course, will be successful only
      if the plant in question has been input into NEDS (not the case for
      test data from foreign plants, or for data which are to be input into
      SOTDAT anonymously).
5.    Frequently, extra zeroes randomly appear at the beginning of some of
      the output data fields.  Some of these are: "Process Rate" (both
      capacity and "This Run"), "Test Result", "Cross Section Area", and
      "Flow Rate".

6.    The output would be cleaner and easier to read if some or all of the
      unused fields (all printed as zero) were suppressed.

7.    One digit is often not sufficient to code "Sampling Location".  In
      those cases, to prevent the ampersand that appears when the field is
      left blank, a nine was coded in that field, and the actual value was
      written in "Comments".  For future efforts, however, it is probably
      unnecessary to enter the actual value in "Comments", since only the
      fact that the value is greater than seven is of any significance.

8.    Where more than one control device is entered on a form, the
      computer drops the pollutant codes for all devices after the first.

9.    During the original coding effort, two forms were inadvertently
      coded with form number A 00098.  It seems unlikely that either one
      of them is currently stored in the computer.  Additionally, the
      A 00098 form for the Wood River Power Plant should have 77.32
      instead of 30.44 coded in the "Gas Pressure" field.  No attempt was
      made to correct this problem, since the proper procedure for
      correction was unknown.

10.   When trace metal sampling results are to be coded in field C 06, the
      results will often be too small to enter in the field.  It is
      suggested that another Units code be adopted for field C 03 to
      represent millimicrograms per cubic meter.
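Items 2, 3, and 6 describe blank numeric fields being printed as zero or as ampersands, and item 5 describes stray leading zeros. As a hedged sketch of one alternative, not the existing SOTDAT program, the Python below decodes a fixed-width numeric field so that a blank becomes `None` (unknown) while a coded zero stays 0, prints unknown values as blanks and numbers without leading zeros, and includes the unit conversion behind item 10 (1 microgram = 1,000 millimicrograms). The function names are hypothetical.

```python
from typing import Optional


def decode_numeric(raw: str) -> Optional[float]:
    """Blank field -> None (unknown); otherwise the numeric value.

    Leading zeros such as "00012" are accepted; illegal characters raise
    ValueError instead of being printed as ampersands.
    """
    text = raw.strip()
    if not text:
        return None
    return float(text)


def display(value: Optional[float]) -> str:
    """Print unknown values as blanks and numbers without leading zeros."""
    return "" if value is None else f"{value:g}"


def ug_to_millimicrograms(ug_per_m3: float) -> float:
    """Convert micrograms per cubic meter to millimicrograms per cubic meter."""
    return ug_per_m3 * 1000.0
```

Here `decode_numeric("     ")` returns `None`, whereas `decode_numeric("    0")` returns `0.0`, so the two cases described in item 3 remain distinguishable in storage and in print.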

-------
                                   TECHNICAL REPORT DATA
                            (Please read Instructions on the reverse before completing)
 1. REPORT NO.
  EPA-450/3-75-070
 2.
 3. RECIPIENT'S ACCESSION NO.
 4. TITLE AND SUBTITLE
  SOTDAT Final Report
 5. REPORT DATE
  July, 1975
 6. PERFORMING ORGANIZATION CODE
 7. AUTHOR(S)
 8. PERFORMING ORGANIZATION REPORT NO.
  96005.003
 9. PERFORMING ORGANIZATION NAME AND ADDRESS
  TRW
  Transportation and Environmental Engineering Operations
  800 Follin Lane, SE
  Vienna, Virginia 22180
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
  68-02-1007
12. SPONSORING AGENCY NAME AND ADDRESS
  U.S. Environmental Protection Agency
  Office of Air Quality Planning & Standards
  Research Triangle Park, N. C. 27711
13. TYPE OF REPORT AND PERIOD COVERED
  Final Report
14. SPONSORING AGENCY CODE
15. SUPPLEMENTARY NOTES
16. ABSTRACT
       Throughout the country, there is a vast amount of source test data which has
  been compiled in recent years.  Up until now, these data have been of little use
  to anyone needing a large amount of data, because they are stored in so many
  different places and formats.
       The Source Test Data (SOTDAT) System is a useful solution to that problem.  The
  SOTDAT System permits the gathering of source test data from many places and their
  storage in a computer-accessible data bank in a common format.  SOTDAT is designed so
  that each record describes, in detail, one run of a stack test.  Variables included
  are most of those which enter into the normal stack test calculations, as well as
  some which will be necessary to future users of SOTDAT.  Information stored in SOTDAT
  contains an adequate number of source parameters and concentrates heavily on data
  describing a specific test run.  Since each SOTDAT record is keyed to a record in the
  National Emissions Data System (NEDS), any required source parameters are readily
  available from a NEDS listing.  An exception to this will exist in the case where
  test data are coded anonymously in order to protect the confidentiality of the data.
  For a complete list and description of the SOTDAT variables, see the August 1973
  National Air Data Branch publication "Source Test Data System (SOTDAT)", which
  describes in detail each data element.
17. KEY WORDS AND DOCUMENT ANALYSIS
  a. DESCRIPTORS
  b. IDENTIFIERS/OPEN ENDED TERMS
  c. COSATI Field/Group
  SOTDAT
  NEDS
  Emission Factors
18. DISTRIBUTION STATEMENT
  Release Unlimited
19. SECURITY CLASS (This Report)
  Unclassified
20. SECURITY CLASS (This page)
  Unclassified
21. NO. OF PAGES
  15
EPA Form 2220-1 (9-73)

-------