United States
            Environmental Protection
            Agency
              Environmental Monitoring
              Systems Laboratory
              Research Triangle Park NC 27711
EPA-600/9-79-042
September 1979
EPA
            Research and Development
Data Validation
Conference Proceedings

-------
                  RESEARCH REPORTING SERIES


Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad categories
were established to facilitate further development and application of environmental
technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:

    1.  Environmental Health Effects Research

    2.  Environmental Protection Technology

    3.  Ecological Research

    4.  Environmental Monitoring

    5.  Socioeconomic Environmental Studies

    6.  Scientific and Technical Assessment Reports (STAR)

    7.  Interagency Energy-Environment Research and Development

    8.  "Special" Reports

    9.  Miscellaneous Reports


This report has been assigned to the MISCELLANEOUS REPORTS series. This
series is reserved for reports whose content does not fit into one of the other specific
series. Conference proceedings, annual reports, and bibliographies are examples
of miscellaneous reports.
                        EPA REVIEW NOTICE
     This report has been reviewed by the U.S. Environmental Protection Agency, and
approved for publication. Approval does not signify that the contents necessarily
reflect the views and policy of the Agency, nor does mention of trade names or
commercial products constitute endorsement or recommendation for use.

     This document is available to the public through the National Technical Information
Service, Springfield, Virginia 22161.

-------
                DATA VALIDATION CONFERENCE

                       Proceedings


                 Hosted and Sponsored by
         The U.S. Environmental Protection Agency
RTP Interlaboratory Quality Assurance Coordinating Committee
                     November 4, 1977
                         Edited by
                     Raymond C. Rhodes
                            and
                    Seymour Hochheiser
             Office of Research and Development
            U.S. Environmental Protection Agency
        Research Triangle Park, North Carolina  27711

-------
                                 DISCLAIMER
     This report is a compilation of the papers presented at the Data
Validation Conference.  Each individual paper may not have received peer
technical review.  Technical review and clearance of these proceedings
was based primarily on the review of the executive summary and on the
general merits of the proceedings as a total entity.

     The contents of these proceedings do not necessarily reflect the
views and policies of the U.S. Environmental Protection Agency, nor does
mention of trade names or commercial products constitute endorsement or
recommendation for use.

-------
                                  FOREWORD


     Measurement and monitoring research efforts are designed to anticipate
potential environmental problems, to support regulatory actions by developing
an in-depth understanding of the nature and processes that impact health and
the ecology, to provide innovative means of monitoring compliance with regu-
lations and to evaluate the effectiveness of health and environmental pro-
tection efforts through the monitoring of long-term trends.  The Environmental
Monitoring Systems Laboratory, Research Triangle Park, North Carolina, has
the responsibility for:  assessment of environmental monitoring technology
and systems; implementation of agency-wide quality assurance programs for
air pollution measurement systems; and supplying technical support to other
groups in the Agency including the Office of Air, Noise and Radiation, the
Office of Toxic Substances and the Office of Enforcement.

     Data validation, an element of quality assurance, is necessary to provide
accurate and reliable environmental data.  Data of known and acceptable
quality are needed for measuring compliance with regulations, assessing
health effects, and developing optimum strategies to cope with environmental
pollution situations.  A unified treatment of validation of particular types
of data bases is needed to support broad-scale uses of these data.  Current
in-use data validation procedures were presented at the conference to promote
a better understanding of available techniques.  Hopefully, the conference
and these proceedings will provide an impetus toward the development of more
unified and systematic approaches to data validation.


                                      Thomas R. Hauser, Ph.D.
                                              Director
                            Environmental Monitoring Systems Laboratory
                               Research Triangle Park, North Carolina

-------
                                  ABSTRACT
     These proceedings are a record for future reference of the technical
presentations made at a conference on Data Validation for environmental
data.  The conference was hosted and sponsored by the U. S. Environmental
Protection Agency, Research Triangle Park Interlaboratory Quality Assurance
Coordinating Committee on November 4, 1977, at the Research Triangle Park.
Various data validation approaches and techniques were presented and are
documented in this publication.

-------
                              CONTENTS


FOREWORD 	   iii

ACKNOWLEDGEMENTS 	   vii

INTRODUCTION	     1
EXECUTIVE SUMMARY AND RECOMMENDATIONS 	     3
     Seymour Hochheiser
     Raymond C. Rhodes

WHAT IS DATA VALIDATION? 	     7
     Raymond C. Rhodes

THE SHEWHART CONTROL CHART TEST FOR SCREENING
   24-HOUR AIR POLLUTION MEASUREMENTS 	    17
     William F. Hunt

DISTRIBUTION GAP TEST FOR HOURLY AIR POLLUTION
   DATA 	    25
     Thomas C. Curran

USE OF STATISTICAL SAMPLING IN VALIDATING
   HEALTH EFFECTS DATA 	    31
     Carolyn P. Chamblee

USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
   RATIO TEST FOR DATA VALIDATION 	    39
     Tyler Hartwell

CLUSTER ANALYSIS AS A DATA VALIDATION
   TECHNIQUE 	    71
     Harold L. Crutcher

ENGINEERING COMPUTATIONS AND DATA COLLECTION
   FORMATS USEFUL IN DATA VALIDATION 	    81
     A. Carl Nelson, Jr.

VALIDATION PROCEDURES APPLIED TO IN-USE MOTOR
   VEHICLE EMISSION DATA 	    99
     Marcia E. Williams

-------
DATA VALIDATION TECHNIQUES USED IN MOBILE
   SOURCE TESTING 	     125
     C. Don Paulsell

VALIDATION OF CONTINUOUS STACK MONITORING DATA 	     131
     Joseph E.  McCarley

SCREENING CHECKS USED BY THE NATIONAL CLIMATIC
   CENTER 	     135
     William E. Klint

DATA VALIDATION FOR UPPER AIR SOUNDING DATA
   AND EMISSION INVENTORY DATA 	     199
     J. H. Novak

VALIDATION OF BIOMEDICAL DATA THROUGH AN ON-LINE
   COMPUTER SYSTEM 	     209
     Larry D. Claxton

REGIONAL VALIDATION OF STATE AND LOCAL AIR
   POLLUTION DATA 	     219
     Thomas H.  Rose

DATA VALIDATION FOR THE LOS ANGELES CATALYST
   STUDY (LACS) 	     223
     Charles E. Rodes

VALIDATION TECHNIQUES USED IN CONTINUOUS AIR
   MONITORING  	     237
     Marvin B.  Hertz

USE OF PRECISION AND ACCURACY ESTIMATES FOR
   VALIDATION OF DATA 	     247
     David T. Mage

VALIDATION SYSTEM USED IN THE ST. LOUIS REGIONAL
   AIR MONITORING STUDY (RAMS) 	     265
     Robert B.  Jurgens

NAMES AND ADDRESSES:

    PROGRAM  	     295

    EPA/RTP  INTERLABORATORY QUALITY ASSURANCE
      COORDINATING COMMITTEE  	     305

    DATA VALIDATION CONFERENCE, SPEAKERS  	     306

    DATA VALIDATION CONFERENCE, ATTENDEES  	     308

-------
                               ACKNOWLEDGMENTS
     The cooperation of all  participants  in the conference  is  gratefully
acknowledged.   Particular appreciation is due to the participants who
prepared written copy of their presentations for post-documentation  of
the conference.

-------
                                  SECTION I

                                INTRODUCTION
     A conference on data validation was held on November 4, 1977, at
Research Triangle Park, North Carolina.  The conference was sponsored,
organized, and hosted by the Environmental Research Center - Research
Triangle Park (ERC-RTP) Interlaboratory Coordinating Committee.  Con-
ference participants represented  (a) EPA's RTP Research Laboratories and
Program Offices, (b) an EPA Regional Office, (c) EPA Contractors, and (d)
the National Climatic Center, Asheville, North Carolina.

     Welcoming remarks were made by Dr. John K. Burchard, Director of the
Industrial Engineering and Research Laboratory, and Senior ORD official at
RTP.  Each speaker presented his or her current practices for data
validation.  The conference provided an opportunity for a free exchange of
viewpoints and techniques and was intended to advance the state of the art
of data validation.

-------
   EXECUTIVE SUMMARY AND RECOMMENDATIONS
                      by
              Seymour  Hochheiser
                      and
              Raymond  C. Rhodes
Environmental Monitoring and Support Laboratory
     U.S. Environmental Protection Agency
 Research Triangle Park, North Carolina  27711

-------
                                SECTION II

                    EXECUTIVE SUMMARY AND RECOMMENDATIONS


     The nature and scope of data validation activities vary considerably
among those involved with the collection, analysis, review, and use of
environmental data for research and monitoring purposes.  The objective of
the conference was to review and discuss the various current practices of
data validation, and to provide a forum for the exchange of information.
It was intended that as a result of this conference, the function of data
validation might become more specifically defined and more uniformly and
widely implemented.

     Following is a general review of the contents of the papers presented
at the conference.

DEFINITIONS AND SCOPE OF DATA VALIDATION

     The conference authors implied differing definitions of data validation
and used various words relating to data validation activities.  The scope of
activities extended from that of simple checks for data transfer errors to
that of a total quality assurance program.  Words used in relation to data
validation activities included: editing, screening, verification, checking,
auditing, and qualification.


TYPES OF DATA INVOLVED

     The type of data involved in most of the papers was  ambient air
pollution concentrations.  However, other types of data -- including epidemi-
ological, meteorological, stationary source emission, mobile source emission,
and in vitro and in vivo bioassays -- were discussed.

     Data were validated in a variety of forms, including strip charts, hand-
written forms, computer printouts, magnetic tape, and optical sensing records.

     In some cases the activities of data validation, as indicated by the
authors, were performed by those producing the data; in other cases, by
independent reviewers; and in still other cases, by the users.


SIZE OF DATA BLOCKS CONSIDERED

     In most cases, the data being validated were reviewed in definable
blocks, varying from one day to one year.  In one case, a real-time

-------
computerized system was used, and in another, the results of single tests
were reviewed individually.  The number of data values considered as a
block, or group of data, varied from only three results for stack sampling
tests to over 30,000 for pibals (meteorological balloons).


TYPES OF TECHNIQUES EMPLOYED

     Both manual and computer techniques were used, usually depending upon
the amount of data involved.  Some systems employed both manual and computer
methods.  Only a few of the papers included graphical techniques for reviewing
data.  In fewer than half of the systems described, specific checks were
made of the identification (or coding) of the data.  In about half of the
data validation systems, statistical techniques were used.  Some of the
techniques included were Dixon outlier tests, Shewhart control chart limits,
exponential distributions, and asymptotic singular decomposition.  In several
instances, statistical sampling plans were utilized to select specific data
sets for checking.
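The Dixon outlier tests mentioned above can be illustrated with a short sketch. It assumes the simplest Dixon ratio, r10: the gap between a suspect extreme value and its nearest neighbour, divided by the range of the sample, which is the form usually applied to small samples. The critical value depends on the sample size and the chosen significance level and must be taken from a published Dixon table, so it is passed in as the `critical` argument rather than hard-coded. The function names are hypothetical.

```python
def dixon_r10(values):
    """Return (r_low, r_high): Dixon r10 ratios for the smallest and largest values."""
    x = sorted(values)
    if len(x) < 3:
        raise ValueError("Dixon's test needs at least 3 values")
    spread = x[-1] - x[0]
    if spread == 0:
        return 0.0, 0.0  # all values identical: no value stands out
    r_low = (x[1] - x[0]) / spread    # gap below the minimum's neighbour
    r_high = (x[-1] - x[-2]) / spread  # gap above the maximum's neighbour
    return r_low, r_high


def dixon_flags(values, critical):
    """Flag the lowest and/or highest value if its r10 ratio exceeds `critical`.

    `critical` is a table value for len(values) at the chosen significance
    level; the caller supplies it.
    """
    x = sorted(values)
    r_low, r_high = dixon_r10(x)
    flagged = []
    if r_low > critical:
        flagged.append(x[0])
    if r_high > critical:
        flagged.append(x[-1])
    return flagged
```

For the hypothetical sample `[52, 55, 58, 61, 140]`, r_high = (140 - 61) / (140 - 52), about 0.90, so `dixon_flags(..., critical=0.71)` returns `[140]`. The 0.71 is a placeholder for whatever the table gives. A flagged value is only a candidate for investigation, not automatically an error.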


WRITTEN DATA VALIDATION PROCEDURES

     In only one case were detailed procedures written describing the data
validation activities and criteria.

FLAGGING AND REJECTION OF DATA

     In most cases, questionable results of the data validation process
were flagged, i.e., identified for more detailed evaluation and/or marked
as questionable values in the data records.  In most cases, after further
investigation, questionable data were either invalidated (rejected) or
corrected.


RECOMMENDATIONS

     Although no official conference recommendations were made, the following
recommendations were generally expressed:

1.   The functions and scope of data validation should be more specifically
     defined.

2.   Data validation techniques should be presented and summarized in some
     logical manner.

3.   Data validation systems should be recommended or specified for use in
     certain situations.

     The above recommendations could be pursued in several possible ways.
One would be for a task group to be formed to develop standardized nomencla-
ture, to summarize in a systematic way various activities and techniques of
data validation, and to recommend data validation systems for specific

-------
situations.  The above tasks could also be performed  by a  knowledgeable
contractor.
SUMMARY

     It is evident that the current practices of data validation vary
widely in nature and scope.  The conference provided an excellent opportunity
for an open exchange of information concerning data validation practices
and should result in a broader utilization of data validation techniques.
In addition, the conference discussions and these proceedings should promote
a greater awareness of the need to develop a more organized and unified
approach to this important element of quality assurance for environmental
data.

-------
           WHAT IS DATA VALIDATION?
                      by
              Raymond C. Rhodes
Environmental Monitoring and Support Laboratory
     U.S. Environmental Protection Agency
 Research Triangle Park, North Carolina  27711

-------
                         WHAT  IS DATA VALIDATION?

                               R.C.Rhodes

     Just what is data validation?
     Many of us are involved in activities which, we feel, constitute data
validation, or at least,  a part of  a data validation process.  My first
encounter with the term "data  validation" in  connection with air pollution
monitoring data occurred  about five years ago.  Since  then my concepts of
the function and scope of data validation have  expanded considerably, and
in fact they are still changing.
     I'm sure that each person attending this conference  has his or her own
concept—probably different from  anyone else's—of  data validation.  Whatever
these concepts are, we're here to exchange our  ideas,  thoughts, and techni-
ques on the subject.  I feel sure that  each  of  us will  learn something new
and useful for our own particular area  of application.
     Before we hear the other speakers, let's think a  little bit about this
subject of "data validation."   Webster  defines  "validation" as follows:
          VALIDATION
               —  THE ACT OR PROCESS OF VALIDATING*
That doesn't help us very much, does  it? So we might  look at the  definition
of the word "valid" itself.
          VALID
               --  HAVING LEGAL EFFICACY OR FORCE
               —  SUPPORTED BY OBJECTIVE TRUTH
*The capitalized items in this paper were used as visual  aids for the
 presentation.

-------
This definition is getting a little closer to our desired meaning in the
data validation sense.   The word "valid"  does have some connotation of a
"stamp of approval," indicating that things are "right."
     In the "Quality Assurance Handbook for Air Pollution Measurement
Systems," EPA 600/9-76-005, the following definition is given:
          DATA VALIDATION
               --  THE  PROCESS WHEREBY DATA ARE FILTERED AND ACCEPTED OR
                   REJECTED BASED ON A SET OF CRITERIA
There is a short section on "data validation" in the Handbook,  which you
may be interested in reading.  My own definition, which I use in the "Data
Validation" lecture of  Air Pollution Training Institute (APTI)  Course 470,
"Quality Assurance for  Air Pollution Measurement Systems," is somewhat more
detailed:
          DATA VALIDATION
               —  A SYSTEMATIC PROCEDURE OF REVIEWING A BODY OF DATA
                   AGAINST A SET OF CRITERIA TO PROVIDE ASSURANCE OF
                   ITS  VALIDITY PRIOR TO ITS INTENDED USE
The above definition says, in other words, that "a body of data" is reviewed
according to some previously defined plan in a rather comprehensive and
extensive way using all available expertise and knowledge at hand to assure
that the data are technically consistent, correctly identified, and contain
no obvious errors before the data are used.
     Following are a number of terms which seem to involve functions or
activities related to data validation.
          RELATED TERMS
               —  DATA EDITING
               —  DATA SCREENING
               --  DATA AUDITING
               --  DATA VERIFICATION
               --  DATA EVALUATION
               —  DATA QUALIFICATION
               --  DATA QUALITY ASSESSMENT

-------
During the remaining presentations  today, you will  hear further  references
or usages of some of these terms.   Since some of  these  terms  are used  inter-
changeably, I believe we need more  specific definitions for each of  the
above to better understand how each one is, or  is not,  involved  in data
validation.  As I define data validation and the  above  terms, I  would
include data editing, data screening,  data  auditing and data  verification
as part of data validation.  However,  according to my definitions, data
evaluation, data qualification and  data assessment are  not parts of  the  data
validation process.
     Before considering some of the aspects of  data validation,  let  us
consider the obvious need for data  validation and its relation to quality
assurance.  EPA and other organizations need good data from which to make
good decisions.  This truism applies equally to research studies and to
monitoring programs, although data validation is not usually considered
a separate activity in research efforts.
          GOOD DATA  -->  GOOD DECISIONS
               --  RESEARCH STUDIES
               —  MONITORING PROGRAMS
Since data validation is concerned  with an assurance of having obtained good
data, one might think that data validation includes everything that is done
to get valid, or good, data.  But that is the concern of quality assurance.
Whereas quality assurance is concerned with all activities which may affect
data quality, the activities of data validation involve an after-the-fact
review of the data, along with related information, to assure that valid
data have, in fact, been obtained.   As such, data validation is considered
as only one element of quality assurance.
          DATA VALIDATION
               --  AN ELEMENT OF QUALITY ASSURANCE
In the APTI Course 470, data validation is one of 23 elements of quality
assurance as shown by the following "Q.A. Wheel" in the Quality Assurance
Handbook.

-------
              QUALITY ASSURANCE ELEMENTS AND RESPONSIBILITIES

                      (THE QUALITY ASSURANCE WHEEL)
          [Figure: the Quality Assurance Wheel, showing the elements of quality assurance]

-------
     What are some of the attributes  of a data  validation  system?   With  no
intent to restrict the other speakers concerning  their views  of  data  valida-
tion, following are some key features, in my opinion,  of a data  validation
system.
     After-the-Fact Review.   Data validation is an after-the-fact  review of
data to assure that good data have been obtained.   Many activities of
quality assurance are concerned with  the planning and  acquiring  of data, but
these activities are accomplished before or during the acquisition of the
data.  Data validation activities (a  part of quality assurance)  are accom-
plished after the data are obtained.
     Applied to Blocks of Data.  Data validation is applied to incremental
blocks of data.  For air monitoring data sent to the National Air Data Bank
(NADB), the blocks could be the quarterly blocks of data submitted to the
NADB.  For source emissions testing, the block would most likely be the three
individual test runs that make up a test set.  In automotive emissions
monitoring, the block may perhaps be the data from a single test.  So, a block
depends on what seems logical for a particular type of data gathering.  In any
case, the data should be given a validation review as a defined block of data.
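As a sketch of what "applying validation to a defined block" might look like for the quarterly case, the function below groups air monitoring records into one block per site and calendar quarter. The record layout `(site_id, date, value)` and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d):
    """Calendar quarter (1 to 4) for a date."""
    return (d.month - 1) // 3 + 1


def quarterly_blocks(records):
    """Group (site_id, date, value) records into blocks keyed by (site, year, quarter)."""
    blocks = defaultdict(list)
    for site, day, value in records:
        blocks[(site, day.year, quarter_of(day))].append((day, value))
    # Sort each block chronologically so later checks see the data in time order.
    return {key: sorted(vals) for key, vals in blocks.items()}
```

Each block can then be reviewed as a unit, with the same checks applied to every block in turn. This is consistent with the systematic and uniform application described next.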
     Systematic and Uniform Application.  Data validation  should not be
conducted on an occasional or spot-check basis.  Once  the  procedure is
defined it should be applied systematically and uniformly  to all sequential
blocks of data acquired.  This is not to say that the  procedure should not
be continually improved.  It is helpful for details of the procedure to  be
written to assure uniform application of the procedure in  case of change of
personnel and to avoid "reinventing the wheel."
     A Set of Criteria.  A set of criteria ought to be developed and docu-
mented as a part of the written procedure to be used during data validation
to determine whether the data are valid, questionable, or invalid.  If the causes
of questionable or invalid data are not evident from the data validation activity,
their detection could trigger an investigation into possible causes, with
appropriate corrective action implemented to preclude recurrence of questionable
or invalid data.
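A written set of criteria can be expressed directly in code. The sketch below sorts each value into the three categories named above, using limits that are purely hypothetical: an impossible range (for example, a negative concentration) marks a value invalid, and a plausible but unusual range marks it questionable. The `Criteria` fields and function names are illustrative assumptions, not criteria from these proceedings.

```python
from dataclasses import dataclass


@dataclass
class Criteria:
    """Hypothetical acceptance limits for one measured parameter."""
    invalid_below: float       # physically impossible below this value
    invalid_above: float       # beyond the instrument's range
    questionable_above: float  # plausible but unusual; needs investigation


def classify(value, c):
    """Return 'invalid', 'questionable', or 'valid' for one value."""
    if value < c.invalid_below or value > c.invalid_above:
        return "invalid"
    if value > c.questionable_above:
        return "questionable"
    return "valid"


def review_block(values, c):
    """Classify every value in a block and list those needing follow-up."""
    results = [(v, classify(v, c)) for v in values]
    needs_follow_up = [v for v, status in results if status != "valid"]
    return results, needs_follow_up
```

With illustrative limits `Criteria(0.0, 2000.0, 500.0)`, the block `[45.0, -3.0, 620.0, 88.0]` yields `-3.0` as invalid and `620.0` as questionable. Both would be candidates for the investigation and corrective action described above.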


-------
     Checks for Internal Consistency.  Data validation might include checks
for internal consistency, such as relationships among pollutants, or rela-
tionships between pollutants and meteorology.
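One example of a relationship among pollutants is that nitrogen oxides (NOx) are the sum of nitric oxide (NO) and nitrogen dioxide (NO2). A valid NO2 reading should therefore not exceed the NOx reading taken at the same site and time. The sketch below applies that check to paired readings. The pairing layout, the function name, and the `tolerance` allowance for instrument noise are assumptions made for illustration.

```python
def no2_nox_inconsistencies(pairs, tolerance=0.0):
    """Return indices of (no2, nox) pairs where NO2 exceeds NOx by more than `tolerance`.

    NOx = NO + NO2, so NO2 greater than NOx at the same place and time
    points to an instrument, calibration, or transcription problem.
    """
    return [i for i, (no2, nox) in enumerate(pairs) if no2 > nox + tolerance]
```

For hypothetical hourly pairs in ppm, `[(0.04, 0.07), (0.09, 0.06), (0.05, 0.05)]`, only index 1 is reported.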
     Checks for Temporal and Spatial Continuity.   Data validation might
include checks for continuity with respect to time, as might be evaluated by
having a chronological plot of the data, to look for discontinuities, spikes,
gaps, etc.  The data may also have some spatial continuity if the data are
from a network within some relatively small region, such as a local  air
monitoring network.
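A chronological plot remains the most direct way to see discontinuities. The same idea can also be screened automatically with a simple sketch like the one below, which reports runs of missing values (gaps) and isolated spikes. It treats a value as a spike only when it differs from both of its neighbours by more than `max_jump`. That rule and the threshold are hypothetical choices, and a sustained change in level is deliberately not flagged.

```python
def continuity_flags(series, max_jump):
    """Scan a chronological list of values (None = missing).

    Returns (gaps, spikes): gaps are (start_index, length) runs of missing
    values; spikes are indices whose value differs from BOTH neighbours by
    more than max_jump, i.e., an isolated excursion.
    """
    gaps, spikes = [], []
    i = 0
    while i < len(series):
        if series[i] is None:
            start = i
            while i < len(series) and series[i] is None:
                i += 1
            gaps.append((start, i - start))
        else:
            i += 1
    for j in range(1, len(series) - 1):
        prev, cur, nxt = series[j - 1], series[j], series[j + 1]
        if None in (prev, cur, nxt):
            continue  # cannot judge a spike next to missing data
        if abs(cur - prev) > max_jump and abs(cur - nxt) > max_jump:
            spikes.append(j)
    return gaps, spikes
```

For the hypothetical hourly series `[3, 4, 4, None, None, 5, 30, 5, 6]` with `max_jump=10`, the sketch reports one two-hour gap starting at index 3 and a spike at index 6.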
     Checks for Proper Identification.  To be useful, data must be properly
identified.  Improperly identified data may well  be considered "no data."
Although identification may seem to be a trivial  thing, the Regions, for
example, have difficulties with such improper identifications as (a) one
state reporting data identified for another state, (b) data for October 35,
and (c) duplicate data from one site and none from another.  For medical
history questionnaires of health effects studies, checks may be made to
make sure that children are not older than their mothers!
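The identification problems listed above lend themselves to simple mechanical checks. The sketch below covers a state code that does not match the submitting state, impossible calendar dates such as October 35, duplicate reports for the same site and day, and the child-older-than-mother case. The record fields (`state`, `site`, `year`, `month`, `day`, `child_age`, `mother_age`, `id`) and the function names are hypothetical.

```python
from collections import Counter
from datetime import date


def wrong_state(records, submitting_state):
    """Records identified with a state other than the one submitting them."""
    return [r for r in records if r["state"] != submitting_state]


def bad_dates(records):
    """Records whose (year, month, day) is not a real calendar date."""
    bad = []
    for r in records:
        try:
            date(r["year"], r["month"], r["day"])
        except ValueError:  # e.g., October 35
            bad.append(r)
    return bad


def duplicate_keys(records):
    """(site, year, month, day) keys reported more than once."""
    counts = Counter((r["site"], r["year"], r["month"], r["day"]) for r in records)
    return [key for key, n in counts.items() if n > 1]


def child_not_younger_than_mother(families):
    """IDs of questionnaires where a child's age is not less than the mother's."""
    return [f["id"] for f in families if f["child_age"] >= f["mother_age"]]
```

Duplicates are only half of the problem noted above. The missing site would be found by comparing the sites actually reported against the list of sites expected in the block.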
     Checks for Transmittal Errors.  For paperwork systems, simple checks
may be made to assure that the data have not been incorrectly transferred
from one paper to another.  With more sophisticated electronic and computer
data handling and with telemetry of data, checks could be made to assure
that the data have not been changed in the process.
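Two common ways of confirming that data survived a transfer unchanged are sketched below. The first is a control total (record count and sum) that can be compared before and after a manual transcription. The second is a checksum computed over the transmitted text, here CRC-32 from Python's standard `zlib` module. The function names and the record text format are hypothetical.

```python
import zlib


def control_totals(values):
    """Record count and sum, compared before and after a paper transfer."""
    return len(values), round(sum(values), 6)


def crc_of_lines(lines):
    """CRC-32 of the data as transmitted text; changed characters change the value."""
    return zlib.crc32("\n".join(lines).encode("ascii"))
```

A transposition such as `52.5` recorded as `25.5` changes both the sum and the checksum. Two whole records exchanged in order leave the count and sum unchanged, however, and only an order-sensitive check such as the CRC would normally reveal them.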
     Flagged or Rejected Data.  A data validation system might include a
scheme for flagging questionable data and may make provision for outright
rejection of data for use.  It may be desirable, however, to retain such
data in the data system with proper indication of their status.
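Retaining flagged data rather than deleting them can be as simple as carrying a status and a reason with each value. Users then filter on the status at the time of use. In the sketch below, the `Reading` layout, the status names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    site: str
    day: str
    value: float
    status: str = "valid"  # 'valid', 'questionable', or 'rejected'
    note: str = ""         # reason recorded when the status changes


def flag(reading, status, note):
    """Change a reading's status without deleting it, keeping the reason."""
    reading.status = status
    reading.note = note


def usable(readings, include_questionable=False):
    """Readings fit for use; rejected ones stay stored but are excluded here."""
    allowed = {"valid", "questionable"} if include_questionable else {"valid"}
    return [r for r in readings if r.status in allowed]
```

Because nothing is deleted, a later investigation can reverse a flag, and the record of what was rejected and why is kept alongside the data.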
     In summary, some of the aspects I consider as parts of a Data Validation
System are as indicated below:
          DATA VALIDATION
               —  AFTER-THE-FACT REVIEW
               —  APPLIED TO BLOCKS OF DATA
               --  SYSTEMATICALLY AND UNIFORMLY APPLIED

-------
               —  A SET OF CRITERIA
                    —  CHECKS FOR INTERNAL CONSISTENCY
                    --  CHECKS FOR TEMPORAL AND SPATIAL  CONTINUITY
                    —  CHECKS FOR PROPER IDENTIFICATION
                    —  CHECKS FOR TRANSMITTAL ERRORS
               —  DATA FLAGGED OR REJECTED
     Techniques of Data Validation.  Obviously, because  the methods of data
gathering are so varied, the particular techniques that  are to be used for
data validation for a particular program will  depend upon many things.
A few of the factors that need to be considered are mentioned below.
The nature of the response output, that is, whether you get a response on a
strip chart recorder, or whether it is generated on paper tape, magnetic
tape, or is fed directly into a computer will  determine  the technique of
data validation.  The techniques will depend upon the method of data
reduction, i.e., whether it is a manual-type method or a computer system.
The form of the data transmittal, i.e., whether data are transmitted by some
handwritten form, typewritten form, computer printout, or magnetic tape will
determine the types of data checks to use.  The techniques will also depend
upon the amount of data.  As we get involved with larger studies and larger
blocks of data involved, such as NADB, different techniques must be used
from those utilized for small sets of data.  The techniques will depend upon
the type and amount of ancillary (related) data that can be used for evalua-
tion, comparison, or for correlation purposes.  Techniques will depend upon
what computing capability is available for use.  The extent of available
plotting capability is an important consideration, particularly for large
blocks of data.  Personally,  I would like to see graphical presentations
used in data validation.  Much more can be learned by graphical representa-
tion that would be very difficult—almost impossible—to learn from visual
review of large masses of data.  Finally, the  nature and extent of data
validation techniques would depend on the intended use of  the  data.
Different criteria may be used for validating data from which long-term
trends are estimated as compared to data for three-hour peak values, for
example.  To summarize:

-------
          TECHNIQUES WILL DEPEND ON
               --  NATURE OF RESPONSE
               —  METHOD OF DATA REDUCTION
               --  FORM OF DATA TRANSMITTAL
               --  AMOUNT OF DATA
               —  AMOUNT OF ANCILLARY DATA
               —  COMPUTING CAPABILITY
               —  PLOTTING CAPABILITY
               —  USE OF THE DATA
     Lastly, there are two key principles of data validation that I  want to
mention.  First, data validation ought to occur as close in time and
location as possible to the origin of the data.   If question-
able values are discovered, and corrective actions need to be made to the
system, they must be made in a very timely and effective manner.  For
example, NADB may be validating data for as much as two years after the
initial generation of the data.  That is much too late to get effective
corrective action at the local level.  Therefore, data validation techniques
should be located as closely as possible to the source of the data.   Second,
where possible, the persons having data validation responsibilities  should
not be the persons directly responsible for acquiring the data.  Ideally,
the person or persons responsible for data validation should be independent
of the data acquisition activities and should be the most knowledgeable and
experienced technical individual available to perform the function.
     Thus,
          DATA VALIDATION SHOULD BE
               —  CLOSE TO THE ORIGINATION OF THE DATA
               —  INDEPENDENT
     Perhaps I have raised a number of questions in your mind concerning the
subject of data validation.  Hopefully, the other speakers will answer some
of these questions, or will raise further questions, and will promote bene-
ficial discussions and interchange of ideas and techniques of data validation.

-------
    THE SHEWHART CONTROL CHART TEST FOR
SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS
                     by
              William F. Hunt
 Office  of  Air  Quality  Planning  and Standards
     U.S. Environmental  Protection Agency
 Research Triangle  Park, North Carolina  27711

-------
                          THE SHEWHART CONTROL CHART TEST FOR

                      SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS

                                       W.F.  Hunt


                                      INTRODUCTION
     A quality control program is being  developed for the U.S.  Environmental Protection
Agency's (EPA) National Aerometric Data  Bank (NADB).   The initial phases of the work
were reported previously.(1,2)  The purpose of the program is to develop and apply
quality control tests to check ambient air quality data for anomalies,  such as  trans-
cription and keypunch errors,  as well as to detect erroneous data resulting from the
periodic malfunctioning of air monitoring instruments.  For the sake  of completeness, it
is worth reviewing some aspects involved in the collection and uses of  air quality data.
To begin with, air quality data are primarily collected to measure the  success of
emission control plans in achieving the National Ambient Air Quality  Standards.

     National Ambient Air Quality Standards (NAAQS) have been established by EPA for
five pollutants:  total suspended particulate (TSP), sulfur dioxide (SO2), carbon monox-
ide (CO), photochemical oxidants (Ox), and nitrogen dioxide (NO2).  These standards are
intended to protect both human health and welfare.  They may be stated as annual means
or as upper limit values that may not be exceeded more than once per  year.  Although
different averaging times are used for various standards, this paper  is primarily con-
cerned with the examination of 24-hour average values for TSP, SO2, and NO2 concentra-
tions.  While only TSP and SO2 standards are in terms of 24-hour averages, all three
pollutants have standards expressed in terms of annual averages.  Because of the impor-
tance that is attached to violations  of the NAAQS, a quality control program to ensure
the validity of the measurement of both short- and long-term concentrations is extremely
important.

     The application of the Dixon Ratio Test and the Shewhart Control Chart Test to
measured levels of three major pollutants (TSP, SO2, and NO2) is examined.  The tests
apply to data from monitoring instruments which generate one measurement per 24-hour
period and are operated on a  systematic sampling schedule of approximately once every 6
days.  In the cases of SO2 and NO2, there are also continuous monitoring instruments,
which monitor the pollutants constantly; but our discussion here is concerned only
with 24-hour data.  Application of the tests results in flagged data, which need to
be verified as either valid or invalid.

     These statistical tests  are  presently being applied to data collected in EPA's
Region V.  Region V encompasses  the states of Illinois,  Indiana, Michigan, Minnesota,
Ohio, and Wisconsin.  In  terms of population, it is the  largest of EPA's regions, and
there is extensive monitoring of  the  above pollutants.   The purpose of  the Region V
evaluation is to determine whether the data  flagged by the  tests are valid or  invalid
and  to  identify, if possible, the source of  the error.

     This paper will discuss the flow of data from state and local governments, the
data-editing process, the basic characteristics of the data, and the application and
evaluation of the two tests; it will conclude with our recommendations.

                                        DATA FLOW
     Most ambient air quality data are collected by state  and local air pollution con-
 trol agencies and are forwarded  via  EPA's  Regional Offices  to the NADB.  A considerable
 amount  of data  is   forwarded.  For example,  the minimum  legal requirements  for air
 pollution monitoring  across the  nation will  result in the  annual  submittal of  over  20
million air quality measurements to the NADB. The data are sent quarterly in a standard
format that specifies the site location; the year, month, and day of sampling; and
the measurement itself (24-hour or 1-hour value) in micrograms or milligrams per cubic
meter (µg/m3 or mg/m3) or parts per million (ppm). A corresponding site file contains
descriptive information on the sampling-site environment. EPA edits the submitted
data, checking for consistency with acceptable monitoring methods and other identifying
parameters. In the data-editing program, air quality data with extremely high values
are flagged. Data that do not pass these checks or that have values exceeding certain
predetermined limits are returned to the originating agency via the Regional Office
for correction and resubmittal.

  Copyright ©1977 American Society For Quality Control, Inc.  Reprinted by permission.

     As might be expected with data sets this large, there are still anomalous measure-
ments that slip through the existing editing and validation procedures.  Therefore,
there is a need for a simple cost-effective statistical test that can be applied to
the air quality data by which to detect, primarily, obvious transcription, keypunch, and
measurement errors. Statistical tests do not eliminate, however, the need for more
intensive quality assurance at the local level.  For example, inadequate calibration
procedures or similar problems that result in measurement bias will not be detected by
our statistical procedures, which are intended primarily for macroanalysis.

                   BASIC CHARACTERISTICS OF TSP, SO2, AND NO2 DATA
     Basic characteristics of the TSP, SO2, and NO2 data were considered in selecting
the quality control tests being used. To begin with, the tests were applied to data
obtained from monitoring instruments that generate one measurement per 24-hour period.
For such monitoring methods, EPA recommends that a systematic sampling procedure of
once every 6 days, or 61 samples per year, be used at a minimum to collect the
data.(8) Such a sampling procedure generates data that, for our purposes, may be
considered approximately independent.
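As a rough illustration of why a 1-in-6-day schedule gives 61 samples per year and monthly groups of about five, the Python sketch below generates such a schedule. Starting on January 1 is an assumption made only for illustration, and `every_sixth_day` is a hypothetical helper, not part of any EPA program.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List, Optional


def every_sixth_day(year: int, start: Optional[date] = None) -> List[date]:
    """Sampling dates of a 1-in-6-day schedule within one calendar year.

    The first sampling date defaults to January 1 (an illustrative assumption).
    """
    day = start if start is not None else date(year, 1, 1)
    dates: List[date] = []
    while day.year == year:
        dates.append(day)
        day += timedelta(days=6)
    return dates


dates = every_sixth_day(1974)
per_month = Counter(d.month for d in dates)
print(len(dates))                            # 61
print([per_month[m] for m in range(1, 13)])  # [6, 4, 5, 5, 6, 5, 5, 5, 5, 5, 5, 5]
```

Under this assumed start date, nine of the twelve months contain exactly five samples, which is consistent with the monthly subgroup of about five used later for the Shewhart test.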

     In examining the distributional properties of the data, past research has shown
that ambient TSP concentrations are approximately lognormally distributed. This is
sometimes, but not always, true for SO2 and NO2 as well; current work suggests that
these pollutants may follow an exponential or Weibull distribution.

     In selecting the quality control tests, the averaging times that correspond to
the NAAQS are important. The values of interest are the peak concentrations (24-hour
average measurements) for TSP and SO2, and the annual means for TSP, SO2, and NO2.

     The final data characteristic of importance is the seasonality of the pollutants.
As an example, in some areas of the country, TSP and SO2 measurements are highest in
the winter months and lowest in the summer months.  Therefore, the factor of seasonality
had to be considered in the selection of the quality control test to minimize this as a
possible source of error.

                                THE QUALITY CONTROL TESTS
     Two quality control tests, the Dixon Ratio Test and the Shewhart Control Chart
Test, are presently being applied, and the results of their application are being
evaluated. The output of the quality control tests is a listing of the suspicious data,
including the site and the time of occurrence. The tests are discussed below.

Dixon Ratio Test
     The use of the Dixon Ratio Test was discussed in an earlier paper. The test
was applied to TSP quarterly data and was found to work reasonably well in detecting a
single anomalous value.  Problems occurred when there were multiple transcription errors
within a quarter, such as the miscoding of an entire month of data.  This problem was
corrected when the test was applied to monthly averages.

     As part of the evaluation of quality control of Region V data, the Dixon Test
was applied to all 1974 monthly averages of TSP, SO2, and NO2 on a site-by-site basis
to examine the data for possible multiple transcription, keypunch, or measurement
errors occurring within a month. Applying the test to monthly averages allows the
assumption of normality to be approximately satisfied, although the monthly averages
are not entirely independent because of the seasonality in the data. This must be
considered in examining the flagged data.

     The Dixon Ratio Test requires that the monthly averages be ordered in increasing
order of magnitude. The test basically constructs an "r" ratio that compares the
distance of the maximum (minimum) observation from its neighbors with the range of all
but one or two of the observations. Let $Y_i$ denote the ith-order pollutant monthly
average, where $Y_N$ is the highest monthly average and N is the number of months
within the year for which there are data. The test procedure is as follows:

1.  Choose α, the probability (risk) of rejecting an observation that really belongs in
    the group.

2.  Order the monthly averages from $Y_1$ through $Y_N$, where $Y_N$ is the highest value.

3.  If $3 \le N \le 7$, compute $r_{10} = (Y_N - Y_{N-1})/(Y_N - Y_1)$;

    if $8 \le N \le 10$, compute $r_{11} = (Y_N - Y_{N-1})/(Y_N - Y_2)$;

    if $11 \le N \le 12$, compute $r_{21} = (Y_N - Y_{N-2})/(Y_N - Y_2)$;

    where $Y_N$ is the highest value.

4.  Look up $r_{1-\alpha}$ for $r_{ij}$ from a table of critical values.(4)

5.  If $r_{ij}$ is greater than $r_{1-\alpha}$, print out a list showing the suspect
    monthly averages, the remaining monthly averages, and the site location.
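Steps 2 through 5 can be sketched in Python as follows. This is an illustrative sketch, not the program used in the Region V evaluation: the function names `dixon_ratio` and `dixon_flag` are hypothetical, and the critical value $r_{1-\alpha}$ is left as an input because it must be taken from a published table such as the one in reference 4.

```python
from typing import Sequence, Tuple


def dixon_ratio(monthly_means: Sequence[float]) -> Tuple[str, float]:
    """Dixon ratio for the highest of N monthly averages (3 <= N <= 12).

    Uses r10 for 3 <= N <= 7, r11 for 8 <= N <= 10, and r21 for 11 <= N <= 12,
    following step 3 of the procedure.
    """
    y = sorted(monthly_means)  # y[0] is Y_1 (lowest), y[-1] is Y_N (highest)
    n = len(y)
    if not 3 <= n <= 12:
        raise ValueError("need between 3 and 12 monthly averages")
    if n <= 7:
        name, num, den = "r10", y[-1] - y[-2], y[-1] - y[0]
    elif n <= 10:
        name, num, den = "r11", y[-1] - y[-2], y[-1] - y[1]
    else:
        name, num, den = "r21", y[-1] - y[-3], y[-1] - y[1]
    if den == 0:
        raise ValueError("ratio undefined: denominator is zero")
    return name, num / den


def dixon_flag(monthly_means: Sequence[float],
               critical_value: float) -> Tuple[bool, str, float]:
    """Step 5: flag the highest monthly average when r exceeds r_(1-alpha).

    critical_value must be looked up in a published table of Dixon critical
    values for the chosen alpha and N; it is not computed here.
    """
    name, r = dixon_ratio(monthly_means)
    return r > critical_value, name, r


# Monthly TSP means, February through December, for the example site discussed
# later in this paper (no January data, so N = 11).
means = [67, 60, 56, 70, 67, 66, 73, 59, 591, 82, 41]
name, r = dixon_ratio(means)
print(name, round(r, 3))  # r21 0.968
```

Keeping the table lookup outside the function means the same code serves any choice of α.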

Shewhart Control Chart Test
     The Shewhart Control Chart Test can be used to examine shifts in both the monthly
averages and the monthly ranges. From the former it can detect possible multiple
errors and from the latter, single anomalous values. In this test the data can be
divided into what Shewhart called rational subgroups.(12) In a manufacturing process
the subgroups would most likely relate to the order of production. Ambient air quality
measurements can be viewed in the same way because they are collected by a monitoring
instrument over time. A month of data was selected as the rational subgroup because
the air quality data are recorded by the state and local agencies on a monthly basis
in a standard format. The monthly subgroup generally consists of five measurements,
based on EPA's recommended sampling schedule of 61 observations per year;(8) five is
also a common subgroup size in industrial use. Using a subgroup size of five, it can be
assumed that the distribution of the monthly means is nearly normal, even though the
samples are taken from a nonnormal universe.

     The test was applied to the 1974 Region V data  on a moving  4-month basis:   that  is,
the averages and range of values in the month in  question were compared with  the overall
averages of  the three previous monthly averages and  monthly  ranges.   The moving  4-month
comparison was used  to minimize the effect of the  seasonality of the pollutants.  The
formulas for calculating the trial  limits are as  follows:

For the monthly range:  $UCL_R = D_4\bar{R}$, and

                        $LCL_R = D_3\bar{R}$.

For the monthly means:  $UCL_{\bar{x}} = \bar{\bar{x}} + A_2\bar{R}$, and

                        $LCL_{\bar{x}} = \bar{\bar{x}} - A_2\bar{R}$,

where R = the monthly range; $\bar{R}$ = the average of the three previous monthly
ranges; $\bar{x}$ = the monthly average in question; $\bar{\bar{x}}$ = the average of the
three previous monthly averages; and $D_3$, $D_4$, and $A_2$ are factors for determining
from $\bar{R}$ the 3-sigma control limits for $\bar{x}$ and R. (See Table C on page 562,
reference number 5.)
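A minimal Python sketch of these trial limits on the moving 4-month basis is shown below. The factor values are the commonly tabulated 3-sigma constants for subgroups of two to five and should be checked against the table actually in use; taking the factors from the subgroup size of the month in question is an assumption, as are the helper names `shewhart_limits` and `out_of_control`.

```python
from statistics import mean
from typing import Dict, Sequence

# 3-sigma control chart factors as commonly tabulated for subgroup size n
# (D3 is zero for n <= 6). Published tables may differ in the last digit.
FACTORS: Dict[int, Dict[str, float]] = {
    2: {"A2": 1.880, "D3": 0.0, "D4": 3.267},
    3: {"A2": 1.023, "D3": 0.0, "D4": 2.574},
    4: {"A2": 0.729, "D3": 0.0, "D4": 2.282},
    5: {"A2": 0.577, "D3": 0.0, "D4": 2.114},
}


def shewhart_limits(prev_means: Sequence[float], prev_ranges: Sequence[float],
                    n: int) -> Dict[str, float]:
    """Trial limits for the month in question from the three previous months."""
    if len(prev_means) != 3 or len(prev_ranges) != 3:
        raise ValueError("moving 4-month basis needs exactly three previous months")
    f = FACTORS[n]
    x_bar_bar = mean(prev_means)  # average of the three previous monthly means
    r_bar = mean(prev_ranges)     # average of the three previous monthly ranges
    return {
        "UCL_x": x_bar_bar + f["A2"] * r_bar,
        "LCL_x": x_bar_bar - f["A2"] * r_bar,
        "UCL_R": f["D4"] * r_bar,
        "LCL_R": f["D3"] * r_bar,
    }


def out_of_control(x_bar: float, r: float, limits: Dict[str, float]) -> Dict[str, bool]:
    """True for each chart on which the month in question falls outside its limits."""
    return {
        "mean": not limits["LCL_x"] <= x_bar <= limits["UCL_x"],
        "range": not limits["LCL_R"] <= r <= limits["UCL_R"],
    }


# October at the example TSP site, compared with July, August, and September.
lim = shewhart_limits([66, 73, 59], [37, 64, 68], n=5)
print({k: round(v, 1) for k, v in lim.items()})
# {'UCL_x': 98.5, 'LCL_x': 33.5, 'UCL_R': 119.1, 'LCL_R': 0.0}
print(out_of_control(591, 595, lim))  # {'mean': True, 'range': True}
```

Computed this way, the October mean and range both lie far above their upper limits, in line with the chart results reported for that site.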

                     RESULTS  OF APPLICATION OF QUALITY CONTROL TESTS
      During 1974, TSP, SO2, and NO2 were being monitored in Region V at 855, 366, and
 303 sites, respectively. Both the Dixon and Shewhart Tests were applied to all 1974
 TSP, SO2, and NO2 data from Region V. An extensive effort, still in progress, is being
 made by EPA personnel in Region V, in conjunction with state air pollution control
 officials, to evaluate the air quality data flagged by the Dixon and Shewhart Tests. As
 an initial phase of this evaluation, those data in which the flagged monthly mean or
 range exceeded one of the pollutant-specific NAAQS were examined. For TSP and SO2,
 appropriate cutoffs were thought to be 260 µg/m3 and 365 µg/m3, which are their
 respective primary short-term 24-hour standards. In the case of NO2, the annual primary
 NAAQS of 100 µg/m3 was used because NO2 has no short-term primary standard. Although
 this choice was somewhat arbitrary, the NAAQS were used as cutoffs because their
 violation results in reexamination of the overall adequacy of the local air pollution
 control measures in effect. Thus, high values must be verified because they can have a
 significant impact on the original control strategy designed to achieve the NAAQS.

      Table 1 indicates the number of Region V sites reporting TSP, SO2, and NO2 data
 that were flagged by the Dixon Test, by the Shewhart Control Chart Test, and by both
 tests. As would be expected, more sites were flagged as having anomalous data by the
 Shewhart Control Chart Test than by the Dixon Test, because the Shewhart Test looks at
 shifts in both the monthly mean and range, while the Dixon Test examines only the
 monthly means. The preliminary evaluation of the flagged sites is also given as the
 number of flagged sites that were found to have one or more erroneous 24-hour
 measurements.

           TABLE 1.  Comparison of Dixon Ratio and Shewhart Control Chart
                     Tests as Applied to Sites in Region V Monitoring TSP,
                     SO2, and NO2 in 1974

                                                     Pollutant
                                             TSP          SO2          NO2
  High value in question (a), µg/m3         ≥ 260        ≥ 365        ≥ 100
  Total sites, no.                            855          366          302
  Dixon test
    Flagged sites, no.                         35            1           25
    Flagged sites with errors, no.             31            1           11
  Shewhart test
    Flagged sites, no.                         38            4           36
    Flagged sites with errors, no.             31            3           16
  Both tests
    Flagged sites, no.                         32            1           19
    Flagged sites with errors, no.             31            1           10

  (a) The high value in question is the monthly mean in the case of the Dixon Test and
  the monthly mean or range in the Shewhart Control Chart Test. The National Ambient
  Air Quality Standards (NAAQS) were used as high-value cutoffs: 260 µg/m3 and
  365 µg/m3 are the 24-hour primary NAAQS for TSP and SO2, respectively, while
  100 µg/m3 is the annual primary NAAQS for NO2.

      Of the 855 sites in Region V measuring TSP in 1974, 35 were flagged by the Dixon
 Test, 38 by the Shewhart Control Chart Test, and 32 by both tests. The flagged sites
 reported at least one monthly mean and/or range equal to or greater than 260 µg/m3.
 The preliminary evaluation indicates that data from 31 of the sites flagged by both
 tests had multiple transcription or keypunch errors. In the case of SO2, 1 of the 366
 sites was flagged by the Dixon Test, 4 by the Shewhart Test, and 1 by both tests. The
 monthly means and ranges in question were equal to or greater than 365 µg/m3. Data
 from the site flagged by both tests were found to have multiple transcription errors,
 while data from two of the remaining sites flagged by the Shewhart Test had single
 transcription errors. Finally, of the 302 sites measuring NO2, 25 were flagged by the
 Dixon Test, 36 by the Shewhart Test, and 19 by both tests. The monthly means and
 ranges in question equalled or exceeded 100 µg/m3. Transcription and keypunch errors
 were found at 11 of the sites flagged by the Dixon Test, 16 of the sites flagged by
 the Shewhart Test, and 10 of the sites flagged by both.

      An example of a site flagged by both tests was one that measured TSP for 11 months
 in 1974. The monthly means (x̄), ranges (R), and subgroup sizes (n) are indicated below
 by month (no data were reported for January):

           Jan   Feb   Mar   Apr   May   June   Jul   Aug   Sept   Oct   Nov   Dec
   x̄        -     67    60    56    70    67     66    73    59     591   82    41
   R        -     74    25    71    44    102    37    64    68     595   68    30
   n        0     4     5     5     5     3      5     5     5      5     3     4
The Dixon Ratio Test was applied to the entire year of data: the difference between the
largest monthly mean, 591, and the third largest mean, 73, was compared with the
difference between the largest mean and the second smallest monthly mean, 56. With
N = 11, the test statistic is

                  $r_{21} = \frac{591 - 73}{591 - 56} = \frac{518}{535} \approx 0.97$,

which is significant at the 0.005 level.
     The Shewhart Control Chart Test was applied on a moving 4-month basis. When the
monthly average and range for October became the values in question, they were
compared with the overall averages of the July, August, and September averages and
ranges. The test results are shown in Figure 1 for both the monthly mean and range. In
both cases the air quality data are "out of control" for the month of October, with
both the October average and range well above their respective upper control limits.
The problem was later identified as multiple transcription errors in which all numbers
in the month of October were off by a factor of 10.

                 a. R chart for monthly range; b. x̄ chart for monthly mean
                 (July through October, with control limits; vertical scale 0 to 600)
                 Figure 1. Example of Shewhart Control Chart Test applied to data with
                 multiple transcription errors in month of October.
                                      CONCLUSION
     From the initial results of the Region V evaluation, it appears that both the
Dixon and Shewhart Tests work well on the TSP, SO2, and NO2 data and are in reasonably
good agreement. Ideally, both tests should be used in the screening process. However,
if an air pollution control agency wanted to employ only one test, the Shewhart Control
Chart Test would be preferable, because it can simultaneously examine shifts in both
the monthly mean and range and can be presented graphically. Further, in the case of
SO2 and NO2, the Shewhart Test flagged sites with a single transcription or keypunch
error (identified by shifts in the range) that were not identified by the Dixon Test.

     The second phase of the Region V evaluation will cover those sites whose highest
measured value did not exceed one of the pollutant-specific NAAQS.  This phase will be
examined in a later paper, along with the development of quality control tests for data
generated by the continuous monitoring methods.

                                   ACKNOWLEDGMENTS
     The authors wish to express their appreciation to the state air pollution control
agencies in Region V for their help in the evaluation of the tests, to Mrs. Ann Rogers
and Mrs. Aline Rolaff for providing the computer programming support, to Mrs. Joan
Bivins, Miss Hazel Browning, and Mr. Willie Tigs for their clerical support, and to
Dr. Thomas Curran for his many helpful comments on earlier drafts of the paper.

                                      REFERENCES
 1.  Hunt, W. F., Jr., and T. C. Curran.  An Application of Statistical Quality Control
     Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality
     Standards.  Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts,
     May 1974.

 2.  Hunt, W. F., Jr., T. C. Curran, N. K. Frank, and R. B. Faoro.  Use of Statistical
     Quality Control Procedures in Achieving and Maintaining Clean Air.  Transactions
     of the Joint European Organization for Quality Control/International Academy for
     Quality Conference, Venice Lido, Italy, September 1975.

 3.  Title 40 - Protection of Environment.  National Primary and Secondary Ambient Air
     Quality Standards.  Federal Register.  36(84):8186-8201, April 30, 1971.

 4.  Dixon, W. J.  Processing Data for Outliers.  Biometrics.  9^75, 1953.

 5.  Grant, E. L.  Statistical Quality Control.  New York, McGraw Hill Book Co.
     p. 122-128.  1964.

 6.  SAROAD Users Manual.  U. S. Environmental Protection Agency, Research Triangle
     Park, N.C.  Publication No. APTD-0663.  July 1971.

 7.  Hoffman, A. J., T. C. Curran, T. B. McMullen, W. M. Cox, and W. F. Hunt, Jr.
     EPA's Role in Ambient Air Quality Monitoring.  Science.  190(4211):243-248,
     October 1975.

 8.  Title 40 - Protection of Environment.  Requirements for Preparation, Adoption, and
     Submittal of Implementation Plans.  Federal Register.  36(158):15490, August 14,
     1971.

 9.  Larsen, R. I.  A Mathematical Model for Relating Air Quality Measurement to Air
     Quality Standards.  U.  S. Environmental Protection Agency, Research Triangle Park,
     N.C.  Publication No. AP-89.  1971.

10.  Hunt, W. F., Jr.  The Precision Associated with the Sampling Frequency of Lognor-
     mally Distributed Air Pollutant Measurements.  J. Air Poll. Control Assoc.  22(9):
     687, 1972.

11.  Curran, T. C. and N. H. Frank.  Assessing the Validity of the Lognormal Model When
     Predicting Maximum Air Pollutant Concentrations.  Presented at the 68th Annual
     Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.

12.  Shewhart, W. A.  Economic Control of Quality of Manufactured Product.  Princeton,
     D. Van Nostrand Company, Inc.  1931.  p.  299.



-------
     DISTRIBUTION GAP TEST FOR HOURLY
            AIR POLLUTION DATA
                     by
              Thomas  C.  Curran
 Office of Air Quality Planning and Standards
    U.S. Environmental Protection Agency
Research Triangle Park, North Carolina  27711


     Previous papers have discussed techniques for screening air pollution data sets
with particular attention given to 24-hour measurements.   The present paper focuses
upon the use of screening procedures for hourly ambient air quality measurements.  As
with any quality control procedure, it is useful to consider the nature and intended
use of the data before discussing the screening technique.

     Hourly air pollution data sets present some interesting practical problems when one
considers the use of a screening procedure.   The most obvious feature is the volume of
data.  For example, 24-hour air pollution measurements are usually obtained by every-
sixth-day sampling resulting in approximately 60 samples per year.  In contrast, hourly
measurements are obtained from continuous monitors that operate every day and, therefore,
may produce as many as 8,760 values per year.  Thus, hourly data sets are commonly 100
times larger than those for daily measurements.  The reason that the volume of data is
important becomes apparent when the use of the data is examined. For the most part, air
pollution data are collected to determine status with respect to certain legal
standards, such as the National Ambient Air Quality Standards.(4) These standards
specify upper limits for air pollution concentrations. Of particular interest for this
paper are the standards for oxidants and carbon monoxide, which specify hourly values
"not to be exceeded more than once per year."(4) In these situations it is the second
highest value from a data set of 8,760 observations that becomes the decision-making
value. Obviously, this places a premium on ensuring data quality.

     From a practical viewpoint, maintaining a data bank for air pollution measurements
involves the basic conflict of having to routinely process large volumes of data and
yet at the same time ensure an almost zero defect level of data quality.  Many sites
monitor for several pollutants so that on the national level,  thousands of sites are
routinely submitting tens of thousands of data points each year. However, because of
the nature of the standards, many users may be interested only in the two highest
values at each site for each pollutant. It should be noted that two values from a data
set of 8,760 observations constitute 0.023 percent of the data. This means that the
user's perception of data quality may be entirely different from the true data
quality. For example, if only 0.05 percent of the data points were too high due to
errors, that would be about four values in a full year of hourly data, enough to
displace both of the two highest values, and this would still be sufficient to have
the user complain that "the data are useless." On the other hand, if elaborate editing
checks are introduced, the sheer volume of data may result in high costs or processing
delays, and the user may now complain that the data are not sufficiently current for
him to make timely decisions.

     With this background in mind, it is apparent that an air quality data screening
program must be able to process large volumes of data in an inexpensive fashion while
flagging virtually every error. Also, because it is frequently difficult and time
consuming to verify suspect data points, every flagged value should be a genuine error.
Unfortunately, while these characteristics are obviously desirable, they are also
almost impossible to attain. The approach presented here is primarily intended to
eliminate the more glaring errors from these hourly data sets. The major emphasis is on
screening the higher concentration values to check for general internal consistency
within the data set.

                            RATIONALE  FOR SCREENING  PROCEDURE
      In our initial development of a screening procedure for hourly data, a computer
program was developed that checked for departures from typical patterns. These typical
patterns were selected on the basis of experience with various types of air pollution
data. Basically, the values were flagged on a yes-no decision, and there was no
probability statement associated with the rejected values. One stage in this
development was to give sample data sets to experienced air pollution data analysts to
see what values they would reject. There were two reasons for this step. The most
obvious was to ensure that the computerized screening procedure was consistent with
so-called expert judgment. However, another reason was the need for a test that would
mimic the decisions made by an experienced analyst, in an attempt to avoid a black-box
approach in which the screening procedure was viewed as a mysterious oracle delivering
arbitrary decisions. The point here is that it can be quite time consuming for the data
analyst to check flagged data points. Values that appear to be quite unlikely from a
statistical viewpoint may actually be quite likely in the real world. For example,
massive traffic jams do happen and may result in high carbon monoxide levels.
Windstorms can mean high total suspended particulate levels. Sudden shifts in wind
direction can mean that a monitor near a point source goes from a zero reading to
almost full scale and back in a few hours. The high variability associated with peak
air pollution values makes it almost impossible to develop a screening procedure that
does not occasionally flag real values. But it seemed desirable to avoid the situation
where an air pollution analyst would tire of repeatedly checking flagged values that
turned out to be correct. Therefore, emphasis was given to developing a test that would
flag values that an air pollution analyst would want to investigate. An effective way
to accomplish this was to develop a test that would mimic experienced human judgment so
that the analyst would understand why the value was flagged.

Copyright ©1977  American Society For Quality Control, Inc.   Reprinted by permission.

     To a large degree the preliminary test on patterns was successful.  Experienced
analysts used the same basic approach of looking for unusual jump discontinuities between
successive hourly values or departures from expected diurnal or seasonal patterns.
However, there were two main deficiencies in this computerized procedure based upon
departures from suspected patterns. One was the lack of a probabilistic framework. The
second, and probably the more serious from a practical standpoint, was the need to vary
the amount of allowable departure from site to site.  The probabilistic framework could
be provided by a  time series model,  and the parameters varied from site to site.  However,
it became apparent during the preliminary investigation that many of the outliers could
be detected by a  much simpler approach.  In most cases, unusually high values could be
detected by examining the frequency  distribution of the hourly data  for a given period
of time, such as  a month, quarter, or year.  Suspect values would be associated with
large gaps in the  frequency distribution.  The length of the gap and the number of
values above the  gap afforded a convenient means of detecting possible errors. With this
simplification of  the problem, it becomes possible to develop a probabilistic framework
for the problem as discussed below.

                                 PROBABILITY OF A GAP
     In order to  compute  the probability of a gap in the empirical frequency distribu-
tion, it is necessary to  assume some type of underlying distribution.  Although this
involves an oversimplification because it ignores dependency between successive hourly
values,  such approaches have traditionally been used with success in air pollution data
analysis.   The lognormal distribution has customarily been used for this purpose.  How-
ever, the exponential distribution has also been found to provide a  reasonable approxi-
mation for the upper tail, or higher concentrations, of hourly air pollution data.
Because  the higher  concentration values were of primary  interest and the exponential
distribution  is mathematically convenient, it was used as the underlying distribution.
As with any measurements, although the approximating distribution is continuous, the air
pollution values are discrete valued. For simplicity, they may be assumed to be
integers because this involves merely a change of scale. A gap in the frequency
distribution may then be described in terms of its length, the number of values above
the gap, and the concentration at which the gap begins. Therefore, if a monthly
empirical frequency distribution of hourly values has n values greater than
concentration c but no values between c and c+k, this would be a gap of length k
starting at c with n observations above the gap. To compute the probability of this
event, consider the following:

     Let X be an exponential random variable with rate λ and location θ.

     Then $\Pr(X \le c) = 1 - e^{-\lambda(c-\theta)}$, where $\lambda > 0$, $c \ge \theta$.

     Thus, $\Pr(X > c) = e^{-\lambda(c-\theta)}$.

The probability that X is greater than c+k given that X is greater than c is

     $\Pr(X > c+k \mid X > c) = \frac{\Pr(X > c+k)}{\Pr(X > c)} = \frac{e^{-\lambda(c+k-\theta)}}{e^{-\lambda(c-\theta)}} = e^{-\lambda k}$.

 Because X is distributed exponentially, this expression is independent of the
 concentration c.

      Assuming independence, the probability that n values are greater than c+k given
 that these n values are greater than c is

     $\left(e^{-\lambda k}\right)^n = e^{-n\lambda k}$.

      Thus, the probability of a gap of length k with n values above the gap is
 $e^{-n\lambda k}$. This probability then becomes the criterion for rejecting suspect data.
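The gap probability can be checked numerically. The Python sketch below computes $e^{-n\lambda k}$ directly and compares it with a Monte Carlo estimate that relies only on the memoryless property of the exponential distribution. Both functions are hypothetical illustrations, and, like the derivation, the simulation assumes independent values.

```python
import math
import random


def gap_probability(n: int, lam: float, k: float) -> float:
    """e^(-n*lam*k): chance that n independent values above c all exceed c + k."""
    return math.exp(-n * lam * k)


def simulate_gap_probability(n: int, lam: float, k: float,
                             trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability.

    Given X > c, the excess X - c is again exponential with rate lam
    (memoryless property), so c never enters the simulation.
    """
    rng = random.Random(seed)
    hits = sum(
        all(rng.expovariate(lam) > k for _ in range(n))
        for _ in range(trials)
    )
    return hits / trials


print(round(gap_probability(1, 0.5, 10), 4))  # 0.0067
print(simulate_gap_probability(2, 0.1, 5))    # approximately exp(-1) = 0.368
```

Because the result depends only on the product nλk, a long gap with one value above it and a shorter gap with several values above it can be equally improbable.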

                                       APPLICATION
      A relatively simple FORTRAN program was written to process hourly data, compute
 the empirical frequency distribution, and examine any gaps. Because of the manner in
 which the data are routinely submitted to the U.S. Environmental Protection Agency's
 National Aerometric Data Bank, the program was written to check the data on a monthly
 basis (up to 744 hourly values). The parameter λ obviously varies from one data set to
 another. For simplicity, λ was determined from the 50th and 95th percentiles of the
 data. This was computationally convenient and also emphasized the fit for the upper
 tail. Results to date in evaluating this test indicate that this approach is adequate.
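The monthly check can be sketched in Python as below, under stated assumptions. For the exponential model above, the pth percentile is $x_p = \theta - \ln(1-p)/\lambda$, so the 50th and 95th percentiles give $\lambda = \ln 10/(x_{0.95} - x_{0.50})$. The paper does not say exactly how the two percentiles were used, so this is one plausible reading rather than the FORTRAN program itself. The nearest-rank percentile, the restriction to gaps starting at or above the median, and the convention that k is the distance from c to the next observed value are likewise illustrative choices, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def percentile(sorted_vals: Sequence[int], p: float) -> int:
    """Nearest-rank percentile (0 < p <= 1) of an already sorted sequence."""
    idx = max(0, math.ceil(p * len(sorted_vals)) - 1)
    return sorted_vals[idx]


def estimate_lambda(values: Sequence[int]) -> float:
    """lambda = ln(10) / (x_0.95 - x_0.50) under a shifted exponential model."""
    v = sorted(values)
    spread = percentile(v, 0.95) - percentile(v, 0.50)
    if spread <= 0:
        raise ValueError("50th and 95th percentiles coincide; lambda undefined")
    return math.log(10) / spread


def worst_gap(values: Sequence[int],
              lam: Optional[float] = None) -> Optional[Tuple[int, int, int, float]]:
    """Least probable upper-tail gap in one month of integer-scaled hourly values.

    Returns (c, k, n, probability): c is the last value below the gap, k the
    distance to the next observed value, n the number of values above the gap.
    Returns None when no gap (k >= 2) starts at or above the median.
    """
    v = sorted(values)
    if lam is None:
        lam = estimate_lambda(v)
    median = percentile(v, 0.50)
    best: Optional[Tuple[int, int, int, float]] = None
    for i in range(len(v) - 1):
        c, k = v[i], v[i + 1] - v[i]
        if c < median or k < 2:  # skip the lower half and adjacent integers
            continue
        n_above = len(v) - (i + 1)
        prob = math.exp(-n_above * lam * k)
        if best is None or prob < best[3]:
            best = (c, k, n_above, prob)
    return best


# Hypothetical month: a smooth, decreasing histogram of values 0-20 plus one
# miscoded value of 500.
month = [v for v in range(21) for _ in range(21 - v)] + [500]
print(worst_gap(month)[:3])  # (20, 480, 1)
```

A month would then be flagged when the returned probability falls below a cutoff chosen by the analyst; no particular cutoff is implied here.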

      Past experience has indicated that an occasional source of error is the miscoding
 of units, so that an entire month of data would be internally consistent yet too high
 by some scale factor. To account for this, a second estimate of λ was computed using
 an assumed value for the 99.9th percentile, i.e., a value that historically should not
 be exceeded more than one time in a thousand.
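A short derivation suggests why the second estimate is needed. If every value in a month is multiplied by a scale factor s, the estimate from the month's own 50th and 95th percentiles shrinks by the same factor while every gap length grows by it. The product nλk, and hence the gap probability, is therefore unchanged. An estimate based on an assumed historical 99.9th percentile $x^{*}_{0.999}$ does not depend on the month's own data. Taking the location θ = 0 below is an illustrative assumption; the paper does not state exactly how the second estimate was formed.

```latex
\lambda_1 = \frac{\ln 10}{x_{0.95} - x_{0.50}} \;\longrightarrow\; \frac{\lambda_1}{s},
\qquad k \longrightarrow s\,k,
\qquad n\,\frac{\lambda_1}{s}\,(s\,k) = n\,\lambda_1 k

e^{-\lambda_2 x^{*}_{0.999}} = 0.001
\;\Longrightarrow\;
\lambda_2 = \frac{\ln 1000}{x^{*}_{0.999}},
\qquad
e^{-n\lambda_2 (s\,k)} = \left(e^{-n\lambda_2 k}\right)^{s}
```

With $\lambda_2$ held fixed, the probability of a scaled gap falls geometrically as s grows, so a month that is too high by a factor such as 10 should stand out even when it is internally consistent.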

                                         RESULTS
      In order to provide a realistic test of this screening procedure, actual data  sets
 were used.  One of particular interest  involved carbon monoxide data that had been
 quickly key-punched and then manually edited for a  specific study.  This  provided a pre-
 liminary and corrected version of the file.  The preliminary file  had known  errors  and
 the corrected file was presumably valid.  The first test  run on the preliminary  file
 processed 21,362 hourly values from 40  monthly data sets.   Eight  of these monthly data
sets were flagged.  Hourly carbon monoxide values would be expected to fall mostly in
the range of 0 to 50 ppm.  In this first test, values of 900, 800, 700, and 500 were
found, resulting in gap lengths greater than 100 and associated probabilities of less
than 1 in 10,000.  These results are shown in Table 1.  Of the eight flagged data sets,

                   TABLE 1.  Rejected Site Months From Sample Data Set
Columns: Site; Month/year; Number of values; Maximum; 2nd high; Gap length;
Starting at; Number of values above; Probability
33
33
33
33
33
39
901
901
Oct.
Nov.
Dec.
Jan.
Feb.
June
July
Aug.
1974
1974
1974
1975
1975
1975
1974
1974
530
604
671
653
510
707
620
334
30
500
800
500
33
900
15
800
13
300
500
500
18
700
14
800
16
>100
>100
>100
14
>100
3
>100
14
15
41
20
19
27
11
. 11
1
3
4
Z.
1
3
3
5
.0006
• .0001
- .0001
-.000:
. :ooi
.000;
.0056
.0001

-------
seven had keypunch errors.  The one remaining month was flagged on the basis of a gap of
length 3, and its data appeared to be reasonable.  This presented no difficulty for the
analyst because the computer printout was sufficient to indicate that these data were in
an intuitively acceptable range and probably did not warrant further investigation.

     It took less than 30 seconds on EPA's UNIVAC 1110 to process these 21,362 hourly
values, and the total cost was approximately $1.  It should be noted that the program
does several other editing checks so that this cost includes more than the screening
procedure for gaps.


                                      CONCLUSIONS
     Using gaps in monthly frequency distributions appears to be a convenient means of
screening hourly air pollution data sets for outliers.  Results to date indicate that it
satisfies the criteria of being easy and economical to implement while producing output
that is intuitively understandable to an air pollution data analyst.  The test
successfully spots the more obvious errors.  As expected, the initial results also
suggest that these types of data sets have a much lower error rate than users
perceive, because of the emphasis on only the few highest values.

     There are certain refinements that can be made in screening these types of data sets.
Time series models and the use of associated data, such as meteorological variables,
would be expected to increase sensitivity and possibly result in even better data quality.
However, it remains to be seen if these more elaborate approaches are cost effective
when processing vast quantities of data from locations throughout the nation.

     As a final comment, it should be noted that once a value is flagged as a possible
anomaly, it cannot be arbitrarily dropped from the data set.  It must first be verified
that the data point actually is incorrect.  The fact that the data point is statistically
unusual does not necessarily mean that it did not occur.

                                      REFERENCES
1.  Hunt, W. F., Jr., and T. C. Curran.  An Application of Statistical Quality Control
    Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality
    Standards.  Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts,
    May 1974.

2.  Hunt, W. F., Jr., T. C. Curran, N. H. Frank, and R. B. Faoro.  Use of Statistical
    Quality Control Procedures in Achieving and Maintaining Clean Air.  Transactions of
    the Joint European Organization for Quality Control/International Academy  for Quality
    Conference, Venice Lido, Italy, September 1975.

3.  Hunt, W. F., Jr., R. B. Faoro, and S. K. Goranson.  A Comparison of the Dixon Ratio
    Test and Shewhart Control Chart Test Applied to the National Aerometric Data Bank.
    Presented at the 30th Annual Conference of the American Society for Quality Control,
    Toronto, Ontario, Canada, June 1976.

4.  Title 40 - Protection of Environment.  National Primary and Secondary Ambient Air
    Quality Standards.  Federal Register, 36(84):8186-8201, April 30, 1971.

5.  Larsen, R. I.  A Mathematical Model for Relating Air Quality Measurements to Air
    Quality Standards.  U.S. Environmental Protection Agency, Research Triangle Park,
    N.C.  Publication No. AP-89.  1971.

6.  Curran, T. C. and N. H. Frank.  Assessing the Validity of the Lognormal Model when
    Predicting Maximum Air Pollutant Concentrations.  Presented at the 68th Annual
    Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.

-------
  USE OF STATISTICAL SAMPLING IN VALIDATING
            HEALTH EFFECTS DATA
                     by
             Carolyn  P. Chamblee
     Health Effects Research Laboratory
    U.S.  Environmental Protection Agency
Research  Triangle Park, North Carolina  27711

-------
                       USE OF STATISTICAL SAMPLING IN
                       VALIDATING HEALTH EFFECTS DATA

                             Carolyn P.  Chamblee
                    Statistics and Data  Management Office
                     Health Effects Research Laboratory
                Research Triangle Park,  North Carolina  27711


                                  ABSTRACT


     A quality control  plan has been adopted for large computer data files of
health effects research studies.  The Dodge-Romig acceptance sampling technique
was selected.  This procedure has the capability of guaranteeing within specific
tolerance limits the agreement between the information on the computer files
and the information on the original data documents.  The method is easy to use
and is adaptable to a wide range of files and to a varying quantity of documents.
The type of plan chosen utilizes a file as a lot, a single document as a
characteristic, a single sampling procedure, and a 2% Lot Tolerance Per Cent
Defective (LTPD).  Our experience with this acceptance sampling plan has been
positive enough that we have extended its use to most of our current studies.

-------

     The Statistics and Data Management Office supplies statistical and
data processing support to the Health Effects Research Laboratory (HERL) as
required.   One of the principal responsibilities of the Laboratory, as the
name suggests, is to research and assess the effects of air pollution on
human health.  One method HERL uses to carry out this responsibility is to
conduct nationwide epidemiological research to establish the relationship
between human health and community air quality.  This research includes field
studies that examine the health of population groups residing in communities
exposed to definable air pollutants.  Exposure-response relationships and
injury thresholds are estimated and the studies document changes in health
that accompany changes in environmental quality.

     Questionnaires are designed for the field studies to allow more uniform
collection of data.  These questionnaires are usually designed as keypunch
entry or optical  scanning documents or a combination of both.  As one would
imagine for nationwide studies, the collection effort results in a large
volume of information which is then processed through various steps and
results in a computerized master file or files.

     During the period from 1970 to 1975, HERL conducted a large number of
epidemiological studies commonly referred to as CHESS or the Community Health
and Environmental Surveillance System.  There are approximately 83 of these
studies, covering five different study types in five areas.  While
successfully completing this intensive data collection effort during a period
of in-house personnel limitations, computer conversion from an IBM 360/50 to a
Univac 1110 and high contractor personnel turnover, HERL developed a backlog
of raw data.  Although these data were computerized and computer-edited, no one
could make a definitive statement regarding the accuracy of the files versus the
original source documents.  Given the importance of quality control procedures
and our inability to qualify the data files, it was clear that a quality control
program for these files had to be developed.  I was assigned
to develop the quality control plan.

     The plan selected had to be able to guarantee that, when properly followed,
the contents of the computer file reflected the data reported on the forms within
a small error tolerance.  Also, each file had to meet the error tolerance.
That is to say, a statement that the error rate over all 83 files is less than
the specified tolerance is not sufficient.  Each individual file must satisfy
the limit.  Lastly, the quality control plan had to minimize the verification
effort while also being simple to use, easy to understand, and adaptable over a
wide range of data files and a varying number of data forms.


-------
     A number of statistical  and quality control  references  were  reviewed
before it was decided that the Dodge-Romig acceptance sampling  technique was
the most desirable.   This approach is  discussed in  detail  by Harold  F.  Dodge
and Harry G. Romig (1).  Their book is very easy to read and understand and
offers a twelve-step procedure for selecting a specific sampling plan.  To
explain how we decided on the plan we  currently use, I will  briefly  describe
the steps and discuss how we implemented Dodge-Romig.

     1.  Decide what characteristics to include.   For example,  a  characteristic
         could be considered a variable or a data field,  which  could lead  to
         distinctly different error rates.  In our case,  we  considered  all
         information on a single questionnaire form as a  group  so that  one form
         equals one record.

     2.  Decide what is to constitute a lot.  A lot is defined  as a  homogeneous
         material unit from a common source.  In choosing the lot unit  we
         balanced the fact that a small number of large lots can  shorten inspec-
         tion time against the additional difficulty of processing the  rejected
         lots.  In our case a lot equals one file.

     3.  Choose the type of protection.  There are two types of protection:
         Lot Tolerance Per Cent Defective (LTPD) and Average Outgoing Quality
         Limit  (AOQL).  The AOQL applies to the average level of quality over
         all lots being inspected.  It is appropriate for a  continuing  supply
         of a product.  The LTPD applies to the quality level of each
         individual lot.  We chose the LTPD type of protection.

     4.  Choose a suitable level of LTPD or AOQL.  For LTPD  choose the value of
         per cent defective you are willing to accept not more than 10 per cent
         of the time, that is, reject at least 90 per cent of the time.  We
         balanced the inspection costs against the consequences of accepting a
         file of bad quality.  We considered rates in the range of 1% to 3%
         LTPD.  We decided that a 1% LTPD would be too costly and selected a
         rate of 2% LTPD.

     5.  Choose between single sampling and double sampling.  For better economy
         in an  overall inspection effort, double sampling is usually preferable.
         However, for minimum variation in the workload, single sampling should
         be used.  We selected single sampling as the more straightforward and
         preferable method in our case.

     6.  Select the proper sampling table on the basis of the preceding choices.
         We selected the Single Sampling Table for LTPD = 2  per cent (Figure 1
         reproduced from reference 1).

     7.  Obtain an estimate of the Process Average Per Cent  Defective  (PA).  Use
         previous data to obtain the  PA.  Even a rough estimate should be used
         if little prior data are available.  A poor estimate will only decrease
         the economy of the plan; the LTPD protection remains the same.  After
         some  initial examination of  HERL data, the  column  entitled "Process
         Average  0.61% to 0.80%" was  used.


-------
                         Single Sampling Table for
                Lot Tolerance Per Cent Defective (LTPD) = 2.0%

                                              Process Average
                 0 to 0.02%      0.03 to 0.20%    0.21 to 0.40%    0.41 to 0.60%    0.61 to 0.80%    0.81 to 1.00%
Lot size           n   c AOQL%      n   c AOQL%      n   c AOQL%      n   c AOQL%      n   c AOQL%      n   c AOQL%
1-75             All   0     0    All   0     0    All   0     0    All   0     0    All   0     0    All   0     0
76-100            70   0  0.16     70   0  0.16     70   0  0.16     70   0  0.16     70   0  0.16     70   0  0.16
101-200           85   0  0.25     85   0  0.25     85   0  0.25     85   0  0.25     85   0  0.25     85   0  0.25
201-300           95   0  0.26     95   0  0.26     95   0  0.26     95   0  0.26     95   0  0.26     95   0  0.26
301-400          100   0  0.28    100   0  0.28    100   0  0.28    160   1  0.32    160   1  0.32    160   1  0.32
401-500          105   0  0.28    105   0  0.28    105   0  0.28    165   1  0.34    165   1  0.34    165   1  0.34
501-600          105   0  0.29    105   0  0.29    175   1  0.34    175   1  0.34    175   1  0.34    235   2  0.36
601-800          110   0  0.29    110   0  0.29    180   1  0.36    240   2  0.40    240   2  0.40    300   3  0.41
801-1000         115   0  0.28    115   0  0.28    185   1  0.37    245   2  0.42    305   3  0.44    305   3  0.44
1001-2000        115   0  0.30    190   1  0.40    255   2  0.47    325   3  0.50    380   4  0.54    440   6  0.68
2001-3000        115   0  0.31    190   1  0.41    260   2  0.48    385   4  0.58    450   5  0.60    565   7  0.64
3001-4000        115   0  0.31    195   1  0.41    330   3  0.54    450   5  0.63    510   6  0.66    690   9  0.70
4001-5000        195   1  0.41    260   2  0.50    335   3  0.54    455   5  0.63    575   7  0.69    750  10  0.74
5001-7000        195   1  0.42    265   2  0.50    335   3  0.55    515   6  0.69    640   8  0.73    870  12  0.80
7001-10,000      195   1  0.42    265   2  0.50    395   4  0.62    520   6  0.69    760  10  0.79   1050  16  0.88
10,001-20,000    200   1  0.42    285   2  0.51    460   5  0.67    650   8  0.77    885  12  0.86   1230  18  0.94
20,001-50,000    200   1  0.42    335   3  0.58    520   6  0.73    710   9  0.81   1060  15  0.93   1520  23   1.0
50,001-100,000   200   1  0.42    335   3  0.58    585   7  0.76    770  10  0.84   1180  17  0.97   1690  26   1.1
                         Figure 1 - Reference 1
Reproduced by permission of John Wiley & Sons, Inc. and Copyright (1959)
Bell Telephone Laboratories from Sampling Inspection Tables Single and
Double Sampling, 2nd Edition by Harold F. Dodge and Harry G. Romig.

-------
    8.   Choose a sampling plan for the given lot size and estimated PA.
         Since the  sampling plan is designed as a function of the PA, use
         the estimated PA as the table entry.  Remember to obtain revised
         PA estimates from new data and, if possible, to select a more
         economical plan.  For one HERL study, there were 7800 source
         documents.  Based on our estimated PA of 0.61% to 0.80%, we would
         go to the 2% LTPD single sampling table, locate the correct PA
         column, and read the plan for 7800 forms from the row for lots
         of 7001 to 10,000 forms: a sample of 760 forms, with no more
         than 10 errors allowed (c = 10).  For the purposes of our plan,
         the original source form was considered correct and any code
         difference on the computer file was considered an error.

    9.   Find the operating characteristic (OC) curve of the sampling plan.
         If the OC curve is satisfactory, choose the plan.  The OC curve
         for our plan is shown in Figure 2.

    10.   Select sample units  from the lot by a random  procedure.  A preferred
         method for accomplishing randomization is the use of random numbers.

    11.   Follow the prescribed procedure  for single sampling.   Inspect each
         unit for the characteristics adopted  in step  one and in  accordance
         with sampling procedures.

    12.   Keep a running check of the  PA.  Change the sampling plan  as necessary
         to match  shifts  in  the PA.   Adopt a definite  time period for making
         new estimates such  as every  month or  every quarter.  In  our experience,
         the PA did not change significantly over 6 to 7 months.
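     The plan can also be checked numerically.  The Python sketch below assumes a
binomial model (each sampled form is independently in error with probability p), which
approximates, and may differ slightly from, the published tables.  It computes the
probability of accepting a file under the n = 760, c = 10 plan for several true error
rates, the single sampling decision, a random draw of form numbers as in step 10, and an
approximate AOQL, assuming (as the AOQL concept does) that rejected files are fully
inspected and corrected.  All function names are hypothetical.

```python
import math
import random


def prob_accept(n, c, p):
    """P(c or fewer errors in a sample of n forms) when the true error
    rate is p, using a binomial model."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def file_accepted(errors_found, c):
    """Single sampling decision: accept the file if errors <= c."""
    return errors_found <= c


def draw_sample(lot_size, n, seed=None):
    """Step 10: choose n distinct form numbers 1..lot_size at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, lot_size + 1), n))


def approx_aoql(n, c, lot_size, step=0.0001, max_p=0.05):
    """Grid search for the maximum of AOQ(p) = p * Pa(p) * (1 - n / N),
    assuming rejected files are fully inspected and corrected."""
    best = 0.0
    for i in range(1, int(max_p / step) + 1):
        p = i * step
        best = max(best, p * prob_accept(n, c, p) * (1 - n / lot_size))
    return best


n, c = 760, 10  # plan read from the 2% LTPD table for 7001-10,000 forms
for pct in (0.0, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00):
    print(f"true error rate {pct:4.2f}%: P(accept) = {prob_accept(n, c, pct / 100):.3f}")
print(f"approximate AOQL for 7800 forms: {100 * approx_aoql(n, c, 7800):.2f}%")
sample_ids = draw_sample(7800, n, seed=1)
print(len(sample_ids), file_accepted(errors_found=4, c=c))
```

The printed acceptance probabilities should be close to those plotted in Figure 2 (about
85% at a 1% error rate and about 10% at 2%), and the AOQL for a 7800-form file should come
out near the 0.79% given for this plan.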

     The Dodge-Romig acceptance sampling plan described is being used not
only on the past CHESS studies but also on current studies.
For each study  undertaken  by  the data processing staff,  a data  processing
protocol is  prepared in addition to  the  normal  study protocol.  The protocol
describes what  is  to be done  including manual  and computer  steps  and the
expected timeframe.  Edit  checks to  be performed usually  include

     1.   valid code checks
     2.   range checks
     3.   field type checks (numeric, alpha, and/or alphanumeric)
     4.   consistency checks, such as date of birth versus age.

Edits may be accomplished by an individually designed program or one or more
SPSS runs.  SPSS frequency distributions are principally used to identify
out-of-range and other unacceptable codes.  Audit trails are maintained
throughout the processing.
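     The four kinds of edit checks listed above can be illustrated with a short Python
sketch for a single keypunched record.  The field names, the sex codes, and the age limits
are hypothetical stand-ins; in practice they would come from the study's data processing
protocol, and the actual edits were done with custom programs or SPSS runs, not this code.

```python
from datetime import date

# Hypothetical code set and limits; a real study protocol defines its own.
VALID_SEX_CODES = {"1", "2"}
AGE_LIMITS = (0, 110)


def completed_years(born, on):
    """Age in completed years on a given date."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


def edit_record(rec):
    """Run the four kinds of edit checks on one record; return the failures."""
    failures = []
    # 1. valid codes
    if rec.get("sex") not in VALID_SEX_CODES:
        failures.append("invalid sex code")
    # 3. field type: age must be numeric (checked before 2 and 4, which need it)
    age_text = rec.get("age", "")
    if not age_text.isdigit():
        failures.append("age not numeric")
        return failures
    age = int(age_text)
    # 2. range check
    low, high = AGE_LIMITS
    if not low <= age <= high:
        failures.append("age out of range")
    # 4. consistency: reported age versus date of birth
    if completed_years(rec["birth_date"], rec["interview_date"]) != age:
        failures.append("age inconsistent with date of birth")
    return failures


record = {"sex": "3", "age": "40",
          "birth_date": date(1940, 5, 2), "interview_date": date(1974, 6, 1)}
print(edit_record(record))
```

Here the record fails the valid-code check and the consistency check, since the dates
imply an age of 34.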

     In conclusion, we believe we have a successful operational quality control
program for our current needs in processing large computer data files.
For similar applications, I would recommend reviewing these procedures as
described by Dodge and Romig and making greater use of SPSS for a quick
evaluation of the contents of the data files.



-------
         PROBABILITY OF ACCEPTING A FILE WITH TRUE ERROR
           RATE θ USING DODGE-ROMIG LTPD (2.0%) PLAN FOR
             n = 760, c = 10; ~10,000 DIARIES WITH ESTIMATED
                      ERROR IN RANGE .61 - .80%

          % TRUE ERROR RATE (θ)     PROBABILITY OF ACCEPTING WITH θ (%)
                  0.00                           100.0
                  0.25                           100.0
                  0.50                            99.0
                  0.75                            97.0
                  1.00                            85.0
                  1.25                            65.0
                  1.50                            42.0
                  1.75                            22.0
                  2.00                            10.0

          AOQL = 0.79%

            Figure 2 - Operating Characteristic Curve

Reproduced by permission of John Wiley & Sons, Inc. and Copyright
(1959) Bell Telephone Laboratories from Sampling Inspection Tables
Single and Double Sampling, 2nd Edition by Harold F. Dodge and
Harry G. Romig.

-------
                                 REFERENCE

1.  Dodge, Harold F.  and  Romig,  Harry  G., Sampling  Inspection Tables Single and
    Double Sampling,  2nd  Edition,  John Wiley and Sons,  Inc., New York, 1959.

-------
  USE OF  SUCCESSIVE TIME  DIFFERENCES AND  DIXON
       RATIO TEST FOR  DATA  VALIDATION
                      by
                Tyler  Hartwell
         Research Triangle Institute
Research Triangle Park, North Carolina  27709

-------
            USE OF SUCCESSIVE  TIME DIFFERENCES AND DIXON

                  RATIO TEST FOR DATA VALIDATION

                           Tyler Hartwell*


                            ABSTRACT


     This paper describes preliminary work on two statistical data editing
procedures designed to flag suspect minute and hourly data from the Regional
Air Pollution Study (RAPS) computer data bank, which contains data from the
Regional Air Monitoring System (RAMS) network of monitoring stations in
St. Louis, Missouri.  In particular, the data editing procedures are:
(i) an intraparameter check where the differences of successive minute
averages for a given variable and station are evaluated, and (ii) an
intraparameter check where hourly averages for a given hour and variable are
compared across the RAMS network or across a selected subset of stations by
use of the Dixon ratio.  The paper describes how the procedures were developed
for their current application and gives results of applying the procedures to
actual data on the RAPS data bank.  In addition, suggestions for future
research on the two procedures are presented.  It is concluded that at the
present time the two data editing procedures should be useful to EPA in
flagging suspect minute and hourly data from the RAPS data bank.

*  Dr. Hartwell is a senior statistician, Statistical Methodology and
   Analysis Center, Research Triangle Institute, Research Triangle Park,
   North Carolina 27709.

-------
                         I.  INTRODUCTION

     The RAMS network of 25 monitoring stations in and around St. Louis,
Missouri collects data on a large number of pollutant (e.g., O3, CO,
THC, CH4, NO, NOx, SO2, TS, H2S) and meteorological variables (e.g.,
wind speed, wind direction, temperature, dew point, delta temperature,
barometric pressure).  Figure 1 presents a map of the locations of the 25
RAMS stations.  The figure indicates that the urban stations (nos. 101-108)
may be as much as 8 miles apart, while the rural stations (e.g., nos.
122-125) may be as much as 35 miles apart.  The RAPS Data Bank contains
data from the RAMS network of stations.

     The purpose of the two statistical data editing rules (i.e., minute
successive differences and the Dixon ratio) examined in this paper is
only to flag suspect RAPS data, not to delete it from the data bank.
That is, because of the vast amount of data collected by the RAMS
network, data editing rules are needed to limit the amount of suspect
data that meteorologists and atmospheric chemists need to examine in
detail.  Thus, the purpose of this paper is to examine two data editing
rules that identify data that should be examined in more detail by EPA
personnel who have an intimate knowledge of the data that the RAMS
network collects.

     In addition, it is important to note that the work presented here is
only preliminary.  Because of the complexity of trying to obtain data
editing rules that apply to a large network of monitoring stations,
additional work needs to be done on refining the two rules.  However, at
this point in time, it is felt that the two data editing rules presented
should prove to be useful in flagging suspect data from the RAMS network.


-------
                II.   MINUTE SUCCESSIVE DIFFERENCES

     The RAMS data received at Research Triangle Park, North Carolina,
contain minute data on several air pollution and meteorological variables.
Several computerized range validation checks are performed on these data by
the prime contractor prior to forwarding them to the RAPS Data Bank.  The
RAPS Data Bank was interested in determining whether a statistical procedure
could be used in further validation of the data, to flag minute data values
that appeared to be outliers.  In particular, there was a need to develop and
evaluate a procedure that (i) could be applied to each station's data for one
variable at a time and (ii) was easy to compute and required only one pass
through the data.  Accordingly, this study was limited to a simple
statistical procedure that required little computation.  After discussions
between EPA and RTI staff members, it was decided to examine a statistical
data editing rule based on minute successive differences.

     In general, the editing rule examined is designed to flag minute
values that are much higher or lower than the preceding minute value, as
in the following sketch:

     [Sketch: variable level plotted against time in minutes, with a single
     spike labeled "flagged value"]

Thus, the editing rule is designed to detect large spikes in the minute
values of a variable at a station.


-------
     In particular, the data editing rule is the following:  at a particular
station, compute successive differences between minute values of a
particular variable, and if a successive difference is "too large," flag
this value.  This rule is extremely simple to apply and requires only one
pass through the data base.
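
     The rule as stated can be sketched in a few lines of Python.  The threshold for
"too large" is left as a parameter, since it still has to be determined; missing minutes
are represented as `None`, and the function name is hypothetical.  Read literally, the
rule also flags the minute after a spike, because the return to normal is itself a large
difference from the preceding (spiked) value.

```python
def flag_spikes(minute_values, threshold):
    """One pass over a station's minute values for one variable.

    Flags index i when |x[i] - x[i-1]| exceeds `threshold`; a missing
    minute (None) breaks the chain, so no difference is formed across it.
    """
    flagged = []
    previous = None
    for i, value in enumerate(minute_values):
        if value is not None and previous is not None:
            if abs(value - previous) > threshold:
                flagged.append(i)
        previous = value
    return flagged


# Hypothetical ozone minute values (ppm) with a one-minute spike at index 3.
ozone = [0.05, 0.05, 0.06, 0.50, 0.06, None, 0.05]
print(flag_spikes(ozone, threshold=0.1))
```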




     In order to determine when a successive difference was "too large,"
the (i) distributions, (ii) sample means, and (iii) sample standard
deviations (s.d.) of minute successive differences for several stations,
times of the day, and air pollution and meteorological variables from
the RAMS network were examined.  For example, Figures 2, 3, and 4 present
three of these distributions for the variables wind speed, ozone, and
NO2.  In all, over 200 of these minute successive difference plots were
examined.

     After examining these distributions and the corresponding sample
means and standard deviations in detail, it appeared reasonable to
assume that in general the minute successive differences were
approximately normally distributed with a mean of zero.  However, it was
also clear that the standard deviation of minute successive differences
was not constant over stations, times of the day, seasons of the year,
and pollutant or meteorological variables.  For example, Table 1
presents s.d.s of minute successive differences for CO and methane by
time of the day (0-4 a.m., 4-8 a.m., and 8-12 a.m.), season of the year,
and station (two rural and two urban stations).  It is obvious from the
table that the s.d.s vary a great deal over these factors.

     Accordingly, it was decided to assume that the distribution of
minute successive differences for variables in the RAMS network was
normal with a mean of zero and a standard deviation that

                                FIGURE 3

             DISTRIBUTION OF OZONE MINUTE SUCCESSIVE DIFFERENCES
              FOR STATION 122; DAY 180, 1976; TIME 0 TO 4 A.M.
                       (histogram; sample size = 236)

                                FIGURE 4

            DISTRIBUTION OF NO2 MINUTE SUCCESSIVE DIFFERENCES
            FOR STATION 122; DAY 180, 1976; TIME 4 TO 8 A.M.

may vary over time of the day, season, and type of station (urban or
rural).  This implied that a minute successive difference would be
flagged when it exceeded a multiple of the appropriate standard
deviation (e.g., a standard procedure for detecting outliers from a
normal distribution with mean zero is to flag observations that are
greater than +4 s.d.s or less than -4 s.d.s).  The probability that an
observation from a normal distribution falls outside ±4 s.d.s is less
than .0001.  Thus, the problem reduced to determining, for each variable
of interest, the appropriate standard deviation, which might depend on
time of day, season of the year, and type of station.
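The .0001 figure can be checked directly: for a standard normal variable Z, P(|Z| > k) = erfc(k/√2).  The Python sketch below (function names are illustrative, not from the study) computes this tail probability and applies the corresponding ±4 s.d. flag to a single mean-zero value.

```python
import math


def normal_two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def flag_outlier(x: float, sd: float, k: float = 4.0) -> bool:
    """Flag a value from a mean-zero normal population if it lies outside +/- k s.d.s."""
    return abs(x) > k * sd


p = normal_two_sided_tail(4.0)
print(f"P(|Z| > 4) = {p:.2e}")     # roughly 6.3e-05, i.e. below .0001
print(flag_outlier(0.005, 0.001))  # True: 0.005 lies outside +/- 0.004
```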




     To examine this problem, standard deviations of minute successive
differences were computed for each variable of interest for
approximately 4 days per season in 1976, 4 stations (2 urban and 2
rural), and 3 times of day (0-4 a.m., 4-8, and 8-12).  Thus, between 100
and 192 standard deviations (4 days x 4 seasons x 4 stations x 3 times
= 192) were computed for each variable.  A standard deviation was
computed only if at least 60 minute successive differences were
available during the 4-hour period being considered.  The results of
some of these computations are summarized in Table 2 for 10 of the
variables measured in the RAMS network.  These 10 variables were chosen
not only because there was interest in editing their minute values but
also because sufficient data were available on them for the days
selected for computing standard deviations.  The s.d.s presented in
Table 2 are average s.d.s (i.e., averaged over several stations and
days).
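A minimal Python sketch of this computation, under stated assumptions: each day is a list of 1,440 minute values with `None` for missing minutes; a difference is formed only when both adjacent minutes are present; and the sample standard deviation (`statistics.stdev`) is used, since the report does not say which form was used.  All helper names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence

MIN_DIFFS = 60    # at least 60 differences required per 4-hour period
BLOCK_HOURS = 4   # blocks 0-4 a.m., 4-8, 8-12


def successive_differences(values: Sequence[Optional[float]]) -> List[float]:
    """x[t] - x[t-1] for adjacent minutes where both readings are present."""
    return [b - a for a, b in zip(values, values[1:])
            if a is not None and b is not None]


def block_sd(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sample s.d. of minute successive differences, or None if fewer than MIN_DIFFS."""
    diffs = successive_differences(values)
    if len(diffs) < MIN_DIFFS:
        return None
    return statistics.stdev(diffs)


def sds_by_time_of_day(day: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """s.d.s for the 0-4, 4-8, and 8-12 a.m. blocks of one day of minute data."""
    result: Dict[str, Optional[float]] = {}
    for start in range(0, 12, BLOCK_HOURS):
        block = day[start * 60:(start + BLOCK_HOURS) * 60]
        result[f"{start}-{start + BLOCK_HOURS}"] = block_sd(block)
    return result


day = [float(t) for t in range(1440)]  # steadily rising test series
print(sds_by_time_of_day(day))         # every difference is 1.0, so each s.d. is 0.0
```

A block with fewer than 61 usable minutes yields fewer than 60 differences and returns `None`, mirroring the rule that no s.d. was computed in that case.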




     Using the s.d.s in Table 2, an analysis of variance (ANOVA) was used
to test whether these average standard deviations were significantly
different (in a statistical





                                   49

-------
                                  TABLE 2
[Average standard deviations of minute successive differences for 10
RAMS variables by station, time of day, and season of the year; table
body not legible in the scanned original.]
                                                               50

-------
sense).  In particular, a separate ANOVA was carried out for each of the
10 variables.  In each ANOVA, statistical tests were used to determine
whether the average standard deviations differed significantly by time
of day, type of station, and season of the year.  The 10 ANOVAs
indicated that in the majority of cases the average s.d.s in Table 2
were significantly different.  For example, in column 3 of the table
(i.e., ozone by station), the average s.d. of minute successive
differences for ozone was .0020 for the two urban stations and .0012
for the two rural stations; these two averages were significantly
different at the .01 level.  Note that Table 2 also presents the overall
average s.d. for each variable across time of day, season of the year,
and type of station (e.g., .0016 for ozone).
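The report does not show its ANOVA computations, and its analyses covered several factors at once.  As a simplified, single-factor illustration, the Python sketch below computes the one-way ANOVA F statistic for groups of s.d.s (e.g., one group per station type); the p-value would then come from the F distribution with the returned degrees of freedom (SciPy's `scipy.stats.f_oneway` performs the whole test).  The function name and toy data are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values: List[float] = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```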




     The average standard deviations in Table 2 clearly indicate that,
from a statistical point of view, the s.d.s of minute successive
differences differ significantly for several of the variables across
one or more of the factors examined (i.e., station type, time of day,
and season of the year).  Thus, to be strictly correct (in a statistical
sense) in applying the data editing rule based on minute successive
differences, it would be necessary to let the s.d. vary by season of
the year, etc. (i.e., the rule would be ±4 s.d.s, where the s.d.s are
those given in Table 2).
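The strictly correct rule amounts to a table lookup followed by a ±4 s.d. comparison.  This Python sketch assumes an s.d. is available for every combination of station type, time-of-day block, and season; Table 2 reports averages by factor, so this keying, the names, and the sample s.d.s are hypothetical.

```python
from typing import Mapping, Tuple

# Hypothetical stratum key: (station type, time-of-day block, season).
Stratum = Tuple[str, str, str]


def stratified_flag(diff: float, stratum: Stratum,
                    sd_table: Mapping[Stratum, float], k: float = 4.0) -> bool:
    """Flag a minute successive difference lying outside +/- k s.d.s for its stratum."""
    return abs(diff) > k * sd_table[stratum]


# Illustrative s.d.s only; these are not values from Table 2.
SD_TABLE = {
    ("urban", "8-12", "summer"): 0.0030,
    ("rural", "0-4", "winter"): 0.0007,
}

print(stratified_flag(0.005, ("rural", "0-4", "winter"), SD_TABLE))   # True
print(stratified_flag(0.005, ("urban", "8-12", "summer"), SD_TABLE))  # False
```

The same difference is flagged in one stratum and accepted in another, which is the practical consequence of letting the s.d. vary.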




     Because the above data editing rule might prove somewhat confusing,
a more conservative rule that is easier to program was examined
initially (of course, only actual application of a data editing rule
can determine its practical usefulness).  For each variable in Table 2,
this rule compares every minute successive difference with ±4 times the
largest average s.d. across station type,







                                   51

-------
time of day, and season of the year.  Thus, for ozone the rule would be
based on ±4 times the s.d. of .0024 (i.e., the average s.d. for
summer).  This rule is extremely easy to apply, since each minute
successive difference of a variable is compared with a single limit
regardless of station, time of day, or season.  Of course, it is
conservative in the sense that the limit is somewhat high in many cases
(e.g., for ozone in winter a more exact rule would be based on an s.d.
of .0007).
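The conservative rule needs only one number per variable.  Using the ozone s.d.s quoted in this section (.0020 urban, .0012 rural, .0024 summer, .0007 winter, .0016 overall), the Python sketch below reproduces the ±.0096 ozone limit of Table 3; the function names are illustrative.

```python
from typing import Iterable


def conservative_limit(average_sds: Iterable[float], k: float = 4.0) -> float:
    """k times the largest average s.d. across station type, time of day, and season."""
    return k * max(average_sds)


def flag_difference(diff: float, limit: float) -> bool:
    """Flag a minute successive difference whose magnitude exceeds the limit."""
    return abs(diff) > limit


OZONE_SDS = [0.0020, 0.0012, 0.0024, 0.0007, 0.0016]  # ppm, from the text
ozone_limit = conservative_limit(OZONE_SDS)
print(round(ozone_limit, 4))                 # 0.0096
print(flag_difference(-0.012, ozone_limit))  # True
```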




     Accordingly, using the largest average s.d. for each variable in
Table 2 and basing the data editing rule on ±4 s.d. limits, Table 3
gives possible limits for flagging minute successive differences by
variable for the RAMS network of stations.  In deriving the limits in
Table 3, it was noted that in the RAMS network CO, methane, and THC were
measured only every 5 minutes, and total sulfur (with the exception of
Station 117) and SO2 only every 3 minutes.  Therefore, the s.d.s given
to RTI, which were based on minute successive differences of these
variables, underestimate the variability relevant for detecting spikes
at five (or three) minute intervals (i.e., the average s.d.s in Table 2
are too small for these five variables).  To compensate for this
underestimate, Table 3 includes an adjustment factor for these five
variables: the ±4 s.d. limits for CO, methane, and THC are multiplied
by √5, and the ±4 s.d. limits for total sulfur and SO2 by √3.  These
factors were derived by assuming that on the RAMS data file the minute
successive differences for CO, methane, and THC are zero except for
every 5th minute (and, similarly, except for every 3rd minute for total
sulfur and SO2).
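The √5 and √3 factors follow from the stated assumption.  If a variable changes only every m minutes, then m - 1 of every m minute differences are zero, so the mean square of the minute differences is 1/m of the mean square of the m-minute changes, and the minute s.d. understates the m-minute s.d. by a factor of √m.  The Python sketch below checks this numerically with hypothetical 5-minute changes, using the root mean square (an s.d. about a zero mean, as in the ±4 s.d. rule) rather than a sample s.d.

```python
import math
from typing import List, Sequence


def rms(xs: Sequence[float]) -> float:
    """Root mean square, i.e. the s.d. about a zero mean."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def minute_diffs_when_sampled_every(changes: Sequence[float], m: int) -> List[float]:
    """Minute successive differences when the reading updates only every m minutes:
    m - 1 zeros followed by the actual m-minute change."""
    diffs: List[float] = []
    for change in changes:
        diffs.extend([0.0] * (m - 1))
        diffs.append(change)
    return diffs


def adjusted_limit(minute_sd: float, m: int, k: float = 4.0) -> float:
    """+/- k s.d. limit scaled by sqrt(m), as done for Table 3."""
    return k * minute_sd * math.sqrt(m)


changes = [0.5, -0.3, 0.8, -0.6, 0.2]  # hypothetical 5-minute CO changes, ppm
minute = minute_diffs_when_sampled_every(changes, 5)
print(rms(changes) / rms(minute))      # about 2.236, i.e. sqrt(5)
```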




     Using the limits given in Table 3, RTI then examined the percentage
of minute successive differences that would be flagged for Stations 101
and 122 on 8 days in 1976 for 10 RAMS variables.  The results of these





                                 52

-------
                          TABLE 3
         POSSIBLE MINUTE SUCCESSIVE DIFFERENCE LIMITS
                 ON 10 RAMS VARIABLES 1/

                  VARIABLE                LIMIT
       WINDSPEED (METERS/SEC.)            ±3.0
       TEMPERATURE (°C)                   ± .660
       OZONE (PPM)                        ± .0096
       CO (PPM)                           ±1.97
       METHANE (PPM)                      ± .316
       THC (PPM)                          ± [illegible]
       NO (PPM)                           ± .028
       NOX (PPM)                          ± .035
       TOTAL SULFUR (PPM)                 ± .022
       SO2 (PPM)                          ± .015

1/ BASED ON ±4 STANDARD DEVIATION LIMITS.  IN ADDITION,
   FOR CO, METHANE, AND THC, THE ±4 S.D. LIMITS HAVE BEEN
   MULTIPLIED BY √5 TO ADJUST FOR THE FACT THAT THESE
   VARIABLES ARE ONLY MEASURED EVERY 5 MINUTES.  SIMILARLY,
   FOR TOTAL SULFUR AND SO2 THE ±4 S.D. LIMITS HAVE BEEN
   MULTIPLIED BY √3 SINCE THESE VARIABLES ARE ONLY MEASURED
   EVERY 3 MINUTES.
                            53

-------
computations are given in Table 4.  The table shows that, except for
ozone, the percent flagged per variable was less than .6 percent.  For
ozone, it appears that far too many minute differences were flagged.
For example, Figure 5 presents a plot over two days of minute values of
ozone for Station 105.  Examination of the figure indicates that a
relatively large percentage of the minute values would be flagged using
the limits given in Table 3, although EPA personnel have indicated that
the data in Figure 5 are not atypical.  In addition, discussions with EPA
personnel have indicated that five of the stations in the RAMS network
(101, 104, 105, 107, and 115) are heavily affected by traffic.  Thus, it
may be necessary for these stations to have higher limits for flagging
minute ozone values than those given in Table 3.  It has also been
suggested that for these traffic-affected stations it may be necessary
to examine ozone and NOx minute values simultaneously before flagging
ozone values (e.g., if a minute ozone value jumps significantly from one
minute to the next but the NOx reading does not jump, then and only then
should the ozone value be flagged).
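The suggested ozone/NOx check can be sketched as follows.  This is a hypothetical Python implementation, not a rule adopted in the study: it treats a "jump" as a minute successive difference exceeding the Table 3 limit for that variable (±.0096 ppm for ozone and ±.035 ppm for NOx), which is an assumption.

```python
from typing import List, Sequence

OZONE_LIMIT = 0.0096  # ppm, Table 3
NOX_LIMIT = 0.035     # ppm, Table 3; using it as the NOx "jump" threshold is an assumption


def flag_ozone_minutes(ozone: Sequence[float], nox: Sequence[float],
                       o3_limit: float = OZONE_LIMIT,
                       nox_limit: float = NOX_LIMIT) -> List[int]:
    """Indices t where ozone jumps from minute t-1 to t while NOx does not."""
    if len(ozone) != len(nox):
        raise ValueError("ozone and NOx series must cover the same minutes")
    flagged: List[int] = []
    for t in range(1, len(ozone)):
        ozone_jump = abs(ozone[t] - ozone[t - 1]) > o3_limit
        nox_jump = abs(nox[t] - nox[t - 1]) > nox_limit
        if ozone_jump and not nox_jump:
            flagged.append(t)
    return flagged


ozone = [0.030, 0.030, 0.045, 0.045, 0.020]  # hypothetical ppm
nox = [0.050, 0.050, 0.050, 0.050, 0.120]    # hypothetical ppm
print(flag_ozone_minutes(ozone, nox))  # [2]: the ozone drop at minute 4 coincides with a NOx jump
```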



     In addition to the results given in Table 4, the percentage of
minute successive differences that would be flagged for Station 105 on
8 days in 1976 using the limits given in Table 3 was examined.  The
results were the following:



             Percentage Flagged for Station 105

               Variable       Percent Flagged
               WS                   .04
               Temp.                .12
               O3                  7.9
               CO                   .56
               CH4                  .41
               THC                  .90
               NO                  1.5
               NOx                 1.4
               TS                  1.5
               SO2                 1.8
                                   54

-------
The above table clearly indicates that additional work needs to be done
for ozone, since far too many values are being flagged.  In addition,
for the other pollutant variables many more values are being flagged
than in Table 4 (Stations 101 and 122 combined).  Thus, the variables at
Station 105 are probably being affected by automobile traffic.
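The percentages compared here are shares of minute successive differences whose magnitude exceeds the Table 3 limit.  A minimal Python sketch, assuming a complete minute series with no gaps; the function name and sample series are hypothetical.

```python
from typing import Sequence


def percent_flagged(values: Sequence[float], limit: float) -> float:
    """Percentage of minute successive differences with |difference| > limit."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    n_flagged = sum(1 for d in diffs if abs(d) > limit)
    return 100.0 * n_flagged / len(diffs)


series = [1.0, 1.2, 5.0, 5.1, 5.0]    # hypothetical CO minute values, ppm
print(percent_flagged(series, 1.97))  # 25.0: only the 1.2 -> 5.0 step exceeds 1.97
```

For Table 4 the counts are pooled over both stations and all 8 days, so the denominator is 23,040 differences.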



     Accordingly, at the present time it appears that the minute
successive difference limits in Table 3, except for ozone, are probably
reasonable for a majority of the RAMS monitoring stations.  However, for
Stations 101, 104, 105, 107, and 115, which are heavily affected by
traffic, additional work needs to be done on the limits for the
pollutant variables (the limits for windspeed and temperature appear
reasonable for all stations).  Of course, since only Stations 101, 104,
105, 116, 117, and 122 have been examined in this analysis to date,
additional work may be needed for other specific stations.  For Stations
101, 104, 105, 107, and 115, wider minute successive difference limits
should be examined for the pollutant variables, particularly for ozone.
As mentioned previously, simple minute successive difference limits may
be impractical for ozone (i.e., for ozone at the traffic-affected
stations it may be necessary to have a minute data editing rule that is
tied to another pollutant such as NOx).


     Another refinement of the limits given in Table 3 that might be
examined is to let them vary by time of day (see Table 2).  For example,
for ozone the s.d. of minute successive differences is much higher from
8 a.m. to noon than from midnight to 8 a.m.



     Finally, Figures 6 and 7 present plots of minute CO values from the
RAPS Data Bank for Stations 101 and 105,






                                  55

-------
respectively.  These plots mark the minute CO values that were flagged
by the CO limit given in Table 3, and they suggest that the data editing
rule based on minute successive differences may be quite useful in
detecting minute outliers for CO.
                                    56

-------
                        TABLE 4

PERCENTAGE OF MINUTE SUCCESSIVE DIFFERENCES FLAGGED FOR
STATIONS 101 AND 122 FOR 8 DAYS IN 1976, BY VARIABLE 1/

        VARIABLE             PERCENT FLAGGED
        WINDSPEED                  .53
        TEMPERATURE                .17
        OZONE                     3.0
        CO                         .06
        CH4                        .11
        THC                        .24
        NO                         .12
        NOX                        .12
        TS                         .51
        SO2                        .34

1/ TOTAL NUMBER OF MINUTE SUCCESSIVE DIFFERENCES COMPUTED
   = 23,040 (8 DAYS x 2 STATIONS x 1,440 MINUTES/DAY).
   THE DAYS WERE 1, 2, 96, 97, 231, 232, 286, AND 287.
                            57

-------
                                                    58

-------
                                                             59

-------
                 CO VERSUS TIME BY MINUTE FOR DAYS 2 AND 3 IN 1976
                 [Plot of minute CO values; plot body not reproduced.]
                                                 60

-------
                         III.  DIXON RATIO







     This study also examined the possibility of flagging hourly data
across the RAMS network by use of the Dixon Ratio.  That is, for a
particular variable and hour of the day, this ratio flags stations whose
hourly averages are "too high" or "too low" compared with the other
stations in the network for that hour and variable.  The Dixon criterion
was examined as a potential validation procedure for hourly averages
because it is simple, easy to compute, and requires only one pass
through the data.



     In brief, the Dixon criterion is the following:

          For a particular hour and variable (e.g., O3), rank the N
     hourly values $X_i$ over stations such that
     $X_1 \le X_2 \le \cdots \le X_N$.  Then compute the criterion
     (if N is between 11 and 13)

$$
\begin{aligned}
R_H &= \frac{X_N - X_{N-2}}{X_N - X_2} && \text{(to check largest hourly value)} \\
R_L &= \frac{X_3 - X_1}{X_{N-1} - X_1} && \text{(to check smallest hourly value)}
\end{aligned}
\qquad (1)
$$

     If $R_H$ (or $R_L$) is too large, reject the largest (smallest)
     hourly value (e.g., if the number of stations = 12 and
     $R_H > .642$, reject $X_N$ at the .01 level of significance under
     normal distribution assumptions).

     Note: see [1] for the Dixon criterion when N is less than 11 or
     greater than 13.
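Equation (1) translates directly into code.  The Python sketch below (function name hypothetical) sorts the station values and returns $R_H$ and $R_L$; these are the ratios Dixon [1] tabulates for samples of 11 to 13, and the .642 critical value is the one quoted above for N = 12 at the .01 level.  Critical values for other N must be taken from Dixon's tables.

```python
from typing import Sequence, Tuple

CRITICAL_N12_ALPHA_01 = 0.642  # from the text: N = 12, .01 level


def dixon_ratios(values: Sequence[float]) -> Tuple[float, float]:
    """Return (R_H, R_L) of Equation (1); valid only for 11 <= N <= 13."""
    x = sorted(values)  # x[0] is X_1, x[-1] is X_N
    n = len(x)
    if not 11 <= n <= 13:
        raise ValueError("Equation (1) applies only when N is between 11 and 13")
    high_span = x[-1] - x[1]  # X_N - X_2
    low_span = x[-2] - x[0]   # X_{N-1} - X_1
    # A zero span means the values are tied, so nothing stands out.
    r_high = (x[-1] - x[-3]) / high_span if high_span else 0.0
    r_low = (x[2] - x[0]) / low_span if low_span else 0.0
    return r_high, r_low


hourly = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 30.0]
r_high, r_low = dixon_ratios(hourly)
print(round(r_high, 3), round(r_low, 3))  # 0.714 0.2
print(r_high > CRITICAL_N12_ALPHA_01)     # True: the largest value (30.0) is rejected
```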



     In general, the Dixon criterion is designed to reject station hourly
values of the following type:
 [1]  Dixon, W. J., "Processing Data for Outliers," Biometrics, Vol.  9

    (1953), p. 74.



                                  61

-------
                                                     Flagged value
                   Station Hourly Values






Thus, the criterion is designed to flag hourly values that account for a
relatively large percentage of the range of the hourly values across the
network.  Note that the criterion as given in (1) flags only one high and
one low hourly value, not multiple hourly values.  For the present
study, RTI did not examine the flagging of multiple hourly values for a
particular hour of the day.




     The Dixon criterion given in Equation (1) was first applied across
the entire RAMS network of 25 stations.  The results of these
calculations indicated that far too many hourly values were being
flagged for several of the RAMS variables (e.g., ozone, CO, and NO2).
To determine why so many hourly values were being flagged, the sample
means of several RAMS variables were examined for urban, residential,
and rural stations in the RAMS network.  Table 5 summarizes some of
these computations for six RAMS variables, giving the sample means and
standard deviations for the three types of stations by season of the
year.  In general, the table shows that the sample means of hourly
values for a particular variable are not the same for urban,
residential, and rural stations in the RAMS network.  (This is
particularly true of the pollutant variables, whereas for the
meteorological variables the means are much more similar across the
network.)  Statistical tests showed these station-type means to be
significantly different in several cases.  Thus, it became clear that
one of the underlying assumptions of the Dixon criterion was being
violated, namely, that the hourly station values come from a
                                  62

-------
                                   TABLE 5
HOURLY SAMPLE MEANS AND STANDARD DEVIATIONS FOR URBAN, RESIDENTIAL, AND RURAL
      STATIONS 1/ IN THE RAMS NETWORK BY SEASON OF THE YEAR AND VARIABLE

                                          STATION TYPE
                             URBAN             RESIDENTIAL            RURAL
VARIABLE      SEASON    MEAN 2/  STD. DEV.   MEAN   STD. DEV.    MEAN   STD. DEV.

O3            SPRING      .030      .013      .034     .013       .041     .015
(PPM)         SUMMER      .052      .023      .055     .018       .063     .021
              FALL        .017      .021      .018     .011       .025     .013
              WINTER      .008      .006      .009     .006       .016     .008

CO            SPRING      .699      .756      .634     .670       .333     .474
(PPM)         SUMMER      .999      .917     1.072     .837       .329     .638
              FALL        .837      .987      .793    1.092       .249     .307
              WINTER      .451      .526      .418     .327       .215     .163

NO2           SPRING      .030      .017      .025     .016       .011     .012
(PPM)         SUMMER      .046      .087      .047     .094       .011     .010
              FALL        .026      .015      .023     .012       .011     .010
              WINTER      .024      .012      .017     .010       .010     .008

CH4           SPRING     1.781      .257     1.604     .202      1.480     .294
(PPM)         SUMMER     2.068      .486     1.905     .279      1.778     .297
              FALL       1.834      .391     1.787     .244      1.621     .195
              WINTER     1.768      .275     1.436     .235      1.604     .210

WINDSPEED     SPRING     4.122     1.775     3.928    1.525      4.020    1.839
(METERS/SEC.) SUMMER     2.799     1.082     2.710    1.183      2.249     ?.898
              FALL       4.518     1.385     4.122    1.252      3.967    1.483
              WINTER     5.255     1.656     4.871    1.602      5.277    2.097

TEMPERATURE   SPRING    13.993     4.337    13.864    4.314     13.367    4.408
(°C)          SUMMER    26.474     1.879    25.767    3.153     25.497    2.026
              FALL      11.924     6.271    11.277    6.212     10.786    6.335
              WINTER    -5.180     6.648    -4.612    7.228     -5.535    6.722

1/ URBAN = STATIONS 101 TO 108; RESIDENTIAL = STATIONS 111 TO 113, 119,
   AND 120; RURAL = STATIONS 109, 110, 114 TO 118, 121 TO 125.
2/ MEANS AND STANDARD DEVIATIONS ARE BASED UPON 4 HOURS PER DAY FOR 5 DAYS
   OVER THE VARIOUS STATIONS (URBAN, RESIDENTIAL, OR RURAL).
?  LEADING DIGIT ILLEGIBLE IN THE SCANNED ORIGINAL.
                                     63

-------
normal distribution with the same mean and variance.  Instead, for
example, the hourly station values for ozone have different means for
urban and rural stations (the means are higher for rural stations).  The
consequence of groups of stations having different means is illustrated
below:





          Urban                      Rural
                 Station Hourly Values






The figure above shows that, under the Dixon criterion, some rural
stations may be flagged simply because their means are consistently
higher than the means for urban stations.  Accordingly, after examining
sample means such as those presented in Table 5, RTI decided to apply
the Dixon criterion separately to rural stations (Stations 109, 110, 114
to 118, and 121 to 125) and urban-residential stations (101 to 108, 111
to 113, 119, and 120).
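Splitting the network before applying the ratio can be sketched as below.  The station lists follow the footnote to Table 5, which gives groups of 12 rural and 13 urban-residential stations, both within the 11 to 13 range covered by Equation (1).  Function names are hypothetical, and only the high-value ratio $R_H$ is computed.

```python
from typing import Dict, List, Mapping

# Station groups from the footnote to Table 5.
RURAL = {109, 110, *range(114, 119), *range(121, 126)}
URBAN_RESIDENTIAL = {*range(101, 109), 111, 112, 113, 119, 120}


def split_by_group(hourly: Mapping[int, float]) -> Dict[str, List[float]]:
    """Separate one hour's station values into rural and urban-residential groups."""
    groups: Dict[str, List[float]] = {"rural": [], "urban-residential": []}
    for station, value in sorted(hourly.items()):
        if station in RURAL:
            groups["rural"].append(value)
        elif station in URBAN_RESIDENTIAL:
            groups["urban-residential"].append(value)
    return groups


def high_ratio(values: List[float]) -> float:
    """R_H of Equation (1): (X_N - X_{N-2}) / (X_N - X_2)."""
    x = sorted(values)
    span = x[-1] - x[1]
    return (x[-1] - x[-3]) / span if span else 0.0


def high_ratios_by_group(hourly: Mapping[int, float]) -> Dict[str, float]:
    """Apply the high-value Dixon ratio within each station group separately."""
    return {name: high_ratio(vals) for name, vals in split_by_group(hourly).items()}


hourly = {s: 0.030 for s in URBAN_RESIDENTIAL}   # hypothetical ozone, ppm
hourly.update({s: 0.041 for s in RURAL})
hourly[125] = 0.100                              # one rural station well above the rest
print(high_ratios_by_group(hourly))  # {'rural': 1.0, 'urban-residential': 0.0}
```

Because each group is ranked on its own, the generally higher rural means no longer push rural stations to the top of a pooled ranking; only a station that stands out within its own group is singled out.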




     After applying the Dixon criterion to the two types of stations
separately, RTI examined the results and again found that too many
hourly values were being flagged.  Accordingly, after discussions with
meteorologists and air chemists, it was decided that for certain
pollutant variables the Dixon criterion should be applied across the
rural or urban-residential stations only if the following criteria were
met:

      (i) the high station value > twice the low station value, and

      (ii) the high station value > some constant (e.g., constant = .03 ppm
           for ozone).




Criterion (i) reflects the fact that a factor of 2 in an hourly average
across the network is not uncommon.  Criterion (ii) limits the
application of the Dixon Ratio to situations where most of the
measurements are






                                  64

-------
well above the minimum detectable level.  In addition, it was decided
that the Dixon criterion could not be used for hourly NO, NOx, TS, and
SO2 values.  Furthermore, the use of the Dixon criterion for CO was
considered questionable because of the heavy influence of traffic on
this variable.
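The pre-conditions and the .7 cutoff can be combined into a single check on the largest value.  In the Python sketch below, the minimum high values (.03 ppm for ozone, 3.0 ppm for CO, 2.0 ppm for CH4 and THC, and 3.0 m/s for windspeed, the last without the factor-of-2 condition) are taken from the footnotes to Table 6; the variable keys and function name are hypothetical, and variables without listed conditions (temperature, dew point) are checked on the ratio alone.

```python
from typing import Dict, Sequence, Tuple

# (minimum high value, require high > 2 x low), per the footnotes to Table 6.
GATES: Dict[str, Tuple[float, bool]] = {
    "ozone": (0.03, True),
    "co": (3.0, True),
    "ch4": (2.0, True),
    "thc": (2.0, True),
    "windspeed": (3.0, False),
}


def flag_high_hour(values: Sequence[float], variable: str,
                   ratio_limit: float = 0.7) -> bool:
    """True if the largest station value for this hour should be flagged."""
    x = sorted(values)
    if not 11 <= len(x) <= 13:
        raise ValueError("Equation (1) applies only when N is between 11 and 13")
    high, low = x[-1], x[0]
    gate = GATES.get(variable)
    if gate is not None:
        min_high, need_factor_two = gate
        if high <= min_high:
            return False  # criterion (ii) not met
        if need_factor_two and high <= 2 * low:
            return False  # criterion (i) not met
    span = high - x[1]
    if span == 0:
        return False
    return (high - x[-3]) / span > ratio_limit  # R_H compared with the cutoff


ozone = [0.010, 0.011, 0.012, 0.013, 0.014, 0.015,
         0.016, 0.017, 0.018, 0.019, 0.020, 0.080]  # hypothetical ppm values
print(flag_high_hour(ozone, "ozone"))  # True: R_H is about 0.88
```

For dew point, the looser cutoff described in the next paragraph would be passed as `ratio_limit=0.6`.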



     With the above restrictions, the Dixon criterion for detecting high
hourly values only (i.e., $R_H$ in Equation (1)) was then applied to 7
variables in the RAPS Data Bank for both urban-residential and rural
stations.  In applying the rule, an hourly value was flagged if $R_H$ was
greater than .7 (except for dew point, where $R_H > .6$ was flagged).
The results of these computations are presented in Tables 6 and 7.  In
addition, Tables 8 and 9 present examples of flagged hourly values for
several different RAMS variables.  Tables 6 and 7 indicate that the
percent flagged is usually 5% or less; thus, the Dixon rule as applied
does not appear to be impractical.  In addition, the examples given in
Tables 8 and 9 clearly identify hourly values for several variables in
the data bank that should be examined in more detail by knowledgeable
meteorologists and air chemists.



     Accordingly, as with minute successive differences, RTI feels that
the Dixon criterion will be useful in flagging hourly data across the
RAMS network.  However, further refinement of the rule may be required.
For example, two points that need further examination are:

       (i) can the rule be applied to flagging low hourly values ($R_L$ in
           Equation (1)), and
                                  65

-------
(ii)  can the rule be applied in practice to the stations that are
     heavily influenced by traffic (101, 104, 105, 107, and 115),
     particularly for CO?  An alternative here would be to use the
     Dixon rule only for the 20 stations not heavily influenced by
     traffic.
                              66

-------
                         TABLE 6

  RESULTS OF APPLYING DIXON RATIO TO 12 URBAN STATIONS
                  IN RAMS NETWORK 1/ 2/ 3/

  VARIABLE        PERCENT OF TIME RATIO > .7        NUMBER FLAGGED
OZONE                       1.0                            5
CO                          2.9                           14
CH4                         2.1                           10
THC                         5.2                           25
TEMPERATURE                 2.7                           13
DEW POINT 4/                4.4                           21
WINDSPEED                   4.0                           19

1/  RATIO APPLIED TO ALL 24 HOURS ON 20 DIFFERENT DAYS FOR THE
    VARIOUS POLLUTANT AND METEOROLOGICAL VARIABLES (= 480 RATIOS
    FOR EACH VARIABLE).

2/  RATIO ONLY USED TO FLAG HIGH VALUES.

3/  FOR OZONE RATIO APPLIED ONLY IF HIGH STATION VALUE > .03 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR CO RATIO APPLIED ONLY IF HIGH STATION VALUE > 3.0 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR CH4 RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR THC RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR WINDSPEED RATIO APPLIED ONLY IF HIGH STATION
      VALUE > 3.0 METERS/SEC.

4/  FOR DEW POINT PERCENT OF TIME RATIO > .6.

                             TABLE 7

           RESULTS OF APPLYING DIXON RATIO TO 13 RURAL
                 STATIONS IN RAMS NETWORK 1/ 2/ 3/

  VARIABLE        PERCENT OF TIME RATIO > .7        NUMBER FLAGGED
OZONE                       3.8                           18
CO                          6.0                           29
CH4                         4.2                           20
THC                         5.0                           24
TEMPERATURE                 8.5 5/                        41
DEW POINT 4/                6.5                           31
WINDSPEED                   4.4                           21

1/  RATIO APPLIED TO ALL 24 HOURS ON 20 DIFFERENT DAYS FOR THE
    VARIOUS POLLUTANT AND METEOROLOGICAL VARIABLES (= 480 RATIOS
    FOR EACH VARIABLE).

2/  RATIO ONLY USED TO FLAG HIGH VALUES.

3/  FOR OZONE RATIO APPLIED ONLY IF HIGH STATION VALUE > .03 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR CO RATIO APPLIED ONLY IF HIGH STATION VALUE > 3.0 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR CH4 RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR THC RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
      AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
    FOR WINDSPEED RATIO APPLIED ONLY IF HIGH STATION
      VALUE > 3.0 METERS/SEC.

4/  FOR DEW POINT PERCENT OF TIME RATIO > .6.

5/  FOR TEMPERATURE THE DIXON RATIO FLAGGED EVERY HOURLY VALUE FOR
    TWO CONSECUTIVE DAYS IN WINTER WHERE ONE STATION IN THE RURAL
    NETWORK READ APPROXIMATELY 8°C AND ALL OTHER STATIONS READ LESS
    THAN -2°C; SEE TABLE 9 (THIS COULD PERHAPS BE DUE TO A SIGN
    MISTAKE AT THE HIGH STATION).

                                 TABLE 8
             EXAMPLES OF FLAGGED VALUES FOR SEVERAL VARIABLES
   USING DIXON RATIO ON 12 URBAN STATIONS IN THE RAPS DATA BANK 1/

                   DIXON                     STATION VALUES
VARIABLE           RATIO   HIGHEST     2 HI     3 HI     3 LO     2 LO   LOWEST

WINDSPEED           .895      15.7      4.5      3.6      2.4      2.1      2.0
(METERS/SEC)        .821      14.4      4.3      4.2      2.6      1.9      1.7
                    .776       6.4      2.7      2.6      1.5      1.5      1.3

TEMPERATURE         .766       6.5      3.2      1.0     -.30     -.65     -1.6
(°C)                .768      32.1     29.2     28.4     27.3     27.2     27.0
                    .769      18.7     15.9     15.2     14.2     14.2     13.9

DEW POINT           .724      -2.8     -6.2     -9.6    -11.8    -12.1    -13.8
(°C)                .723      15.7      9.6      7.0      4.0      3.7     -2.2
                    .889      18.4      7.1     -6.5     -8.7     -9.6    -10.1

OZONE               .853      .163     .037     .033     .012     .011     .003
(PPM)               .755      .133     .100     .089     .080     .075     .002
                    .797      .041     .040     .010     .002     .002     .002

CO                  .755      3.81     1.41     1.04      .16      .14      .13
(PPM)               .996      43.5      .51      .24      .07      .06      .05
                    .932       4.2      .42      .37      .13      .09      .07

CH4                 .789       4.2      1.8      1.7      1.3      1.0      .06
(PPM)               .941       5.6      1.8      1.8      1.6      1.5      1.4
                    .931       4.1      1.7      1.7      1.5      1.5      1.4

THC                 .743      3.10     1.97     1.97     1.63     1.58     1.40
(PPM)               .734      4.19     1.91     1.78     1.28      .91      .05
                    .924      4.31     1.80     1.59     1.50     1.36     1.25

1/  RATIO ONLY USED TO FLAG HIGH VALUES.

                                  TABLE 9
              EXAMPLES OF FLAGGED VALUES FOR SEVERAL VARIABLES
   USING DIXON RATIO ON 13 RURAL STATIONS IN THE RAPS DATA BANK 1/

                   DIXON                     STATION VALUES
VARIABLE           RATIO   HIGHEST     2 HI     3 HI     3 LO     2 LO   LOWEST

WINDSPEED           .815       6.4      3.9      3.8      3.3      3.2      3.1
(METERS/SEC)        .808       4.1      2.2      1.5      .95      .84      .49
                    .772       5.5      3.1      3.1      2.7      2.3      1.9

TEMPERATURE         .940       8.0     -3.1     -3.6     -4.3     -4.4     -4.9
(°C)                .957       8.5    -11.4    -11.5    -12.3    -12.4    -12.6
                    .921       6.1     -7.0     -7.2     -8.3     -8.4     -8.5

DEW POINT           .845      31.4      6.7      .44     -5.1     -5.2     -7.8
(°C)                .782      18.1      5.0      4.6      2.1      .82     -2.9
                    .701      33.2     17.6     17.3     13.4     10.6      6.9

OZONE               .933      .206     .018     .016     .006     .002     .002
(PPM)               .900      .252     .051     .048     .028     .026     .015
                    .771      .045     .015     .012     .003     .002     .002

CO                  .929       8.7      .96      .77      .21      .17      .12
(PPM)               .973      28.9      1.7      .94      .43      .17      .17
                    .889       8.4      1.3      1.1      .28      .21      .14

CH4                 .944      10.1      2.4      2.3      1.9      1.8      1.7
(PPM)               .873       7.6      3.9      2.7      2.1      2.0      1.9
                    .791       5.0      2.9      2.7      2.1      2.1      2.0

THC                 .842       4.2      2.3      2.3      2.0      1.9      1.7
(PPM)               .904       3.9      2.4      1.9      1.7      1.7      1.3
                    .855       5.4      3.0      2.2      1.9      1.7      1.4

1/  RATIO ONLY USED TO FLAG HIGH VALUES.

CLUSTER ANALYSIS AS A DATA VALIDATION
              TECHNIQUE
                  by
          Harold L. Crutcher
              (Consultant)
           35  Westall  Avenue
    Asheville, North  Carolina   28804
               CLUSTER ANALYSIS AS A DATA VALIDATION TECHNIQUE
                               H.L.  Crutcher
                               INTRODUCTION

     In any study the collection,  processing, and storage of data are fun-
damental.  Contaminated, adulterated, or "noisy" data confuse the investi-
gator.  Data do not necessarily fall into neat categories.   Usually there
are mixtures.  Some of these are determinant; some are not.
     There are many techniques used to cluster and to classify data.   This
paper discusses one.  This technique separates mixed data sets into subsets.
Each subset will exhibit homogeneous characteristics.  The investigator can
assess the relative importance of the subsets and the nature of the subsets.
Outlying subsets may indicate anomalous true conditions or may indicate mal-
functioning of some part of the observational program.*  Thus some idea of
data quality may be obtained.
     The techniques used here assume that the data are normally
distributed.  If the data are not normally distributed, then some
transformation to approximate normality should be made.  The logarithmic
transformation is often used.  Where this is known to be inapplicable,
another transformation is needed.  For example, cloud cover is well
represented by neither the normal nor the log-normal distribution.
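As a rough illustration of why the logarithmic transformation helps, the following sketch (not part of the clustering program) computes the moment coefficient of skewness, which is zero for a symmetric distribution such as the normal, for a hypothetical right-skewed sample before and after taking natural logarithms. The sample is constructed as the exponential of a symmetric set of values, so its logarithms are exactly symmetric; real data will only approximate this, and zero or negative values cannot be log-transformed at all.

```python
import math
from statistics import fmean


def skewness(xs):
    """Moment coefficient of skewness m3 / m2**1.5 (zero if symmetric)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def log_transform(xs):
    """Natural-log transformation; defined only for strictly positive data."""
    if any(x <= 0 for x in xs):
        raise ValueError("log transform needs strictly positive values")
    return [math.log(x) for x in xs]


# Hypothetical right-skewed sample: exp() of a symmetric set of values.
raw = [math.exp(v) for v in (-2, -1, -1, 0, 0, 0, 1, 1, 2)]
print(f"skewness of raw data:   {skewness(raw):.2f}")
print(f"skewness after log(x):  {skewness(log_transform(raw)):.2f}")
```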
     The clustering program discussed here was initially developed by Wolfe
(1) and modified by Crutcher and Joiner (2).  It will accept any input data.
However, the criteria selected by the user to enable the computer to make
decisions are based on the assumption of normality of distribution.  Any
departure from this assumption introduces some uncertainty in the results.
There is always enough uncertainty without introducing more.  For example,
although the 0.05 probability level is selected for decision, departure from
normality may actually cause the decision to be made at some other level, but
this level will never be known.

  *  In particular, the outlying subsets may be examined for their validity.
     The minimum number in a set is determined by the number of elements being
     examined simultaneously.  With five elements, the minimum subset will be
     one more than five, or six.

                             ESSENTIAL PHILOSOPHY
     Many elements and many observations may be treated.  Computer capacity,
time, and money will be the controlling factors.  Within these constraints
the user may wish to select a representative random sample of the data for
processing.
     Most investigators choose to standardize their data.  This produces
dimensionless numbers with a mean of zero and a variance of one for each
element.  In the multivariate case, the mean vector is zero and the total
variance (the sum of the element variances) equals n, the number of
variables, so the root-mean-square distance of the data points from the
origin is the square root of n.
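To make the multivariate case concrete, the sketch below standardizes each element of a small hypothetical data set with its population mean and standard deviation. Each element then has mean zero and variance one, the sum of the element variances (the mean squared distance of the points from the origin) is n, and the root-mean-square distance is the square root of n. The data values are invented for illustration only.

```python
import math
from statistics import fmean, pstdev


def standardize(rows):
    """Return z-scores: each column (element) minus its mean, divided by
    its population standard deviation."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical observations of n = 3 elements (height, temperature, u wind).
data = [
    [23760.0, -57.0, -6.0],
    [23800.0, -55.5, 3.0],
    [23720.0, -58.0, -25.0],
    [23790.0, -56.0, 5.0],
]
z = standardize(data)
n = len(data[0])
mean_sq_radius = fmean([sum(v * v for v in row) for row in z])
print(f"total variance = {mean_sq_radius:.6f} (n = {n}), "
      f"RMS radius = {math.sqrt(mean_sq_radius):.4f}")
```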
     If the elements are uncorrelated and homogeneous, a spherical cluster
of data points is obtained.  If the data are correlated, the original
element axes are rotated so that the components along the new axes are
uncorrelated.  The new system will then be spherical in shape if the data
are homogeneous.
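For two elements the rotation can be written in closed form: rotating the axes by the angle θ with tan 2θ = 2 s_xy / (s_xx - s_yy), where s_xx and s_yy are the variances and s_xy the covariance, makes the new components uncorrelated. The sketch below applies this to a hypothetical correlated pair; for more than two elements the same result is obtained from the eigenvectors of the covariance (or, for standardized data, the correlation) matrix. It is an illustration only, not the program's routine.

```python
import math
from statistics import fmean


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])


def decorrelate_2d(xs, ys):
    """Rotate 2-D data to principal axes so the new components are uncorrelated."""
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # rotation angle of the axes
    c, s = math.cos(theta), math.sin(theta)
    u = [c * x + s * y for x, y in zip(xs, ys)]
    v = [-s * x + c * y for x, y in zip(xs, ys)]
    return u, v, theta


# Hypothetical, positively correlated pair of elements.
xs = [-1.5, -0.5, 0.0, 0.5, 1.5]
ys = [-1.0, -0.8, 0.2, 0.4, 1.2]
u, v, theta = decorrelate_2d(xs, ys)
print(f"rotation {math.degrees(theta):.1f} deg, cov(u, v) = {covariance(u, v):.1e}")
```

A rotation preserves the total variance, so only the orientation of the cloud of points changes, not its overall spread.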

                                 CLUSTERING
     If the data are clustered, tests of normality will reject the null
hypothesis even though the data have been standardized.  Therefore, the
usual procedure is to cluster the data into probable groups.  Null
hypotheses are then established to compare two groups against one, three
against two, and so on, until the null hypothesis is not rejected.
Initial
     The computer program may be set to establish any number of initial
clusters.  Here, the first 40 data entered serve to establish 40 clusters,
but arbitrary clusters could have been inserted instead.
     The number of elements is n, so each datum is an n-vector with its
point in n-space.  The 40 clusters represent 40 centroidal points in
n-space.  The distances between the centroids are computed, and the two
closest are merged into a new centroid that is the mean (average) of the
two.  After merging, there are 39 clusters.  A new datum then enters from
storage to fill out the 40 spaces reserved again.  This procedure is
repeated until all data have entered and have been assigned to one of the
clusters.  Variance considerations or other distance measures, as well as
distances between the centroids, can be used.
Intermediate
     Forty clusters were obtained initially.  These 40 clusters are compared
on the basis of the distance between centroids, or of variance
considerations, as before.  The two most nearly alike are merged into one
cluster.  The procedure continues until one final cluster remains.
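The initial and intermediate stages can be sketched in Python as below. This is a simplified illustration, not the Wolfe/Crutcher-Joiner program: it uses only the Euclidean distance between centroids (the program may also use variance considerations), and it assumes that a merged centroid is the count-weighted mean of the two clusters, i.e., the mean of all their member data. The buffer size (40 in the text) is a parameter, and all names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    centroid: list[float]
    members: list[int] = field(default_factory=list)  # indices of input data


def merge_closest(clusters: list[Cluster]) -> float:
    """Merge the two clusters with the closest centroids (Euclidean distance).

    ASSUMPTION: the merged centroid is the count-weighted mean of the two
    centroids, i.e. the mean of all member data. Returns the merge distance.
    """
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = math.dist(clusters[i].centroid, clusters[j].centroid)
            if best is None or d < best[0]:
                best = (d, i, j)
    d, i, j = best
    a, b = clusters[i], clusters[j]
    na, nb = len(a.members), len(b.members)
    centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a.centroid, b.centroid)]
    del clusters[j], clusters[i]  # j > i, so deleting j first keeps i valid
    clusters.append(Cluster(centroid, a.members + b.members))
    return d


def sequential_cluster(data, capacity: int = 40) -> list[Cluster]:
    """Initial stage: data enter one at a time; when all `capacity` slots
    are full, the two closest clusters are merged to free a slot."""
    clusters: list[Cluster] = []
    for idx, point in enumerate(data):
        if len(clusters) == capacity:
            merge_closest(clusters)
        clusters.append(Cluster(list(point), [idx]))
    return clusters


def coalesce(clusters: list[Cluster]) -> list[tuple[int, float]]:
    """Intermediate stage: merge pairwise until one cluster remains.
    Returns (clusters remaining, merge distance) for each step."""
    history = []
    while len(clusters) > 1:
        d = merge_closest(clusters)
        history.append((len(clusters), d))
    return history


# Hypothetical 2-element data with two obvious groups; buffer reduced to 4.
pts = [(0.0, 0.1), (10.0, 10.2), (0.2, 0.0), (9.8, 10.1), (0.1, 0.3), (10.1, 9.9)]
clusters = sequential_cluster(pts, capacity=4)
history = coalesce(clusters)
for remaining, dist in history:
    print(f"{remaining} cluster(s) left, merged at distance {dist:.2f}")
```

With this toy input the two well-separated groups survive until the last merge, whose distance is far larger than any earlier one; such a jump is only an informal hint of distinct subsets, whereas the program decides with formal statistical tests.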
     Figure 1, taken from Figure 1 of Crutcher and Joiner (3), illustrates
in abbreviated form the flow of 74 observations until their final
coalescence into the final group.  The final group is made up of the
initial data, but the sequence is altered to show the order of entrance
into the group.  The data are 4-space upper-air observations at the
Canton Island 30-mb surface (1960-1964).  The four elements of the
subspace at the 30-mb surface are:
     1.  Height of the surface;
     2.  Temperature of the surface;
     3.  East-west wind component;
     4.  North-south wind component (orthogonal to east-west component).
Final
     The user must provide the number of clusters wanted for review and the
probability level for rejecting the null hypothesis.  The sequential tests
are for k+1 groups versus k groups, where k runs from 1 to 40.  The tests
continue until the null hypothesis is not rejected or until the requested
number of clusters has been examined.
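The sequential k+1 versus k stopping rule can be sketched as a short loop. The test statistic actually used by the program (after Wolfe's NORMIX, reference 1) is not described here, so `p_value` is a hypothetical callable standing in for it, and treating the requested number of clusters as an upper limit on k is an interpretation of the text.

```python
from typing import Callable


def choose_number_of_groups(p_value: Callable[[int], float],
                            alpha: float = 0.05,
                            max_groups: int = 40) -> int:
    """Sequential testing of k+1 groups (alternative) against k groups (null).

    `p_value(k)` is a HYPOTHETICAL callable standing in for the program's
    test; it returns the p-value for k+1 versus k groups.
    """
    k = 1
    while k < max_groups:
        if p_value(k) >= alpha:  # k groups not rejected: stop here
            return k
        k += 1                   # k rejected in favour of k+1; test again
    return max_groups            # requested number of clusters examined


# Hypothetical p-values: 1 and 2 groups are rejected, 3 groups are not.
fake_p = {1: 0.001, 2: 0.01, 3: 0.40}
print(choose_number_of_groups(lambda k: fake_p.get(k, 0.5)))  # -> 3
```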
Output
     At the completion of the computer program, output is presented as:
     1.  The initial set of data in some sequence established by the user.
     2.  Matrix of observational setup and preliminary comparisons.
     3.  Forty groups (clusters) with identified input data and means.
     4.  Coalescence, step by step, into fewer and fewer groups until the
         single final group is obtained.
     5.  Statistics with means and standard deviations for the main group
         and each subset.  Each set comparison has the actual probability
         level printed.  An option in the program permits selection of
         either the eigenvector-eigenvalue output or the correlation matrix
         output.
     6.  Discriminant function scores for each datum.
     7.  If the eigenvector-eigenvalue output is selected, computer
         print-plots of discriminant scores are shown with the appropriate
         assignment of each datum to a cluster.  The clusters are numbered.
     8.  The final printing shows the data in the order of input,
         identified by the cluster configuration assignment.  This permits
         easy review of the classification of each individual datum.

[Figure 1. Coalescence of 74 Canton Island 30-mb observations into the
final group; after Crutcher and Joiner (3), Figure 1.]

                                 EXAMPLES
     Table 1 is taken from Table 5 of Crutcher and Joiner (3).  The
statistics are for two clusters derived from a January data set of Canton
Island 30-mb surface data, 1953-67.  There are 434 observations.  The set
is separated into two clusters which comprise 31 and 69 percent of the
total set.  The zonal wind component, which averages -6.4 m/s, is
separated into two groups whose means are -28.1 m/s and 3.5 m/s.
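The cluster statistics in Table 1 can be checked against the Total column with two standard mixture identities: the overall mean is the fraction-weighted mean of the cluster means, and the overall variance is the fraction-weighted sum of each cluster's variance plus its squared offset from the overall mean. The sketch below applies them to the January zonal wind values; treating the tabulated standard deviations as population values is an assumption that matters little for N = 434.

```python
import math

# January zonal wind (u) statistics read from Table 1:
# (data fraction, mean u in m/s, standard deviation s_u in m/s)
easterly = (0.310, -28.1, 5.8)
westerly = (0.690, 3.5, 7.5)


def pooled(groups):
    """Overall mean and standard deviation of a mixture of groups.

    mean = sum p_i m_i ;  var = sum p_i (s_i**2 + (m_i - mean)**2)
    (standard deviations treated as population values)
    """
    mean = sum(p * m for p, m, _ in groups)
    var = sum(p * (s * s + (m - mean) ** 2) for p, m, s in groups)
    return mean, math.sqrt(var)


mean_u, s_u = pooled([easterly, westerly])
print(f"mean u = {mean_u:.1f} m/s, s_u = {s_u:.1f} m/s")
```

The result, about -6.3 m/s and 16.2 m/s, agrees closely with the tabulated -6.4 m/s and 16.2 m/s; the last-digit difference in the mean is plausibly due to rounding of the tabulated values. Most of the total spread of u comes from the separation of the two cluster means rather than from the spread within either cluster.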
     Figure 2 is taken from Figure 3 of Crutcher and Joiner (3).  The figure
illustrates the separation of the Canton Island January data in the 2-space
of the orthogonal components of the wind.  The mean height and temperature
data are shown.  The variances may be compared with those in Table 1.

                               APPLICATIONS
     Data of any type may be examined to determine whether there are
reasonable subsets with homogeneous characteristics.  Because the
standardization techniques remove the dimensions of the data, i.e.,
degrees, mph, meters, grams, etc., any measurements may be used.  Thus,
application may be made to environmental data ensembles which include
measurements of elements such as particulates, gaseous pollutants,
precipitation, wind, temperatures, pressures, or changes in any of these.
     Pollutant source or likely deposition areas may be identified or
suggested.  Extension from one observational point to several will permit
mapping in a topographical sense.  These techniques can be applied to data
sets derived from topographical studies made by use of polynomials,
trigonometric polynomials, orthogonal polynomials, or other types of
polynomials.  Recent advances in the computation of orthogonal polynomials,
known as asymmetric singular decomposition (ASD) procedures, make these
polynomials easier to obtain and use.

TABLE 1. CANTON ISLAND, 30-MB DATA AND THEIR
         SEPARATION INTO TWO CLUSTERS, 1953-67

                       JANUARY (N = 434)                   APRIL (N = 509)
                   Group 1    Group 2    Group 3      Group 1    Group 2    Group 3
                   (total) (easterly) (westerly)      (total) (easterly) (westerly)
Data fraction        1.000      0.310      0.690        1.000      0.427      0.573
Mean H (gpm)       23764.9    23720.6    23785.0      23801.3    23773.8    23821.9
s_H (gpm)             94.9       65.6      100.0         81.8       71.0       84.5
Mean T (°C)          -57.0      -57.4      -56.9        -54.9      -56.1      -54.0
s_T (°C)               2.7        2.3        2.9          3.0        2.7        3.0
Mean u (m s-1)        -6.4      -28.1        3.5         -4.6      -20.9        7.6
s_u (m s-1)           16.2        5.8        7.5         16.7       10.7        7.6
Mean v (m s-1)         0.4        1.1        0.0         -0.0       -0.2        0.1
s_v (m s-1)            4.3        4.7        4.1          4.0        3.7        4.2
r_uv                  -0.1        0.2        0.1          0.0        0.0        0.0

                        JULY (N = 558)                    OCTOBER (N = 476)
                   Group 1    Group 2    Group 3      Group 1    Group 2    Group 3
                   (total) (easterly) (westerly)      (total) (easterly) (westerly)
Data fraction        1.000      0.470      0.530        1.000      0.330      0.670
Mean H (gpm)       23938.0    23879.6    23989.7      23886.6    23829.3    23915.0
s_H (gpm)             94.0       78.5       75.6        104.4       67.6      108.4
Mean T (°C)          -53.6      -55.4      -52.1        -55.1      -55.9      -54.7
s_T (°C)               3.1        2.2        3.0          2.8        2.2        2.9
Mean u (m s-1)        -4.3      -23.4       12.6         -3.8      -30.2        9.3
s_u (m s-1)           19.6        5.8        9.3         19.7        4.7        7.3
Mean v (m s-1)         0.1        0.3       -0.2          0.2        0.0        0.3
s_v (m s-1)            3.9        3.3        4.3          4.2        3.7        4.4
r_uv                  -0.0        0.1        0.1          0.0       -0.1        0.0

[Figure 2. Separation of Canton Island 30-mb data into clusters in the
2-space of the orthogonal wind components; after Crutcher and Joiner (3),
Figure 3.  Legend: group 1, H = 23887 gpm; group 2, H = 23829 gpm;
group 3, H = 23915 gpm.]

                                 REFERENCES
1. Wolfe, J.H. NORMIX 360 Computer Program.  Research Memorandum SRM 72-4,
   Naval Personnel and Training Research Laboratory, San Diego, CA (1971)
   125 pp.
2. Crutcher, H.L. and Joiner, R.L.  Separation of Mixed Data Sets into Homo-
   geneous Sets.  NOAA Technical Report EDS 19, National Oceanic and Atmos-
   pheric Administration, Asheville, North Carolina 28804 (1977) 165 pp.
3. Crutcher, H.L. and Joiner, R.L.  Another Look at the Upper Winds of the
   Tropics.  J. Applied Meteorology, 16 (5), (May 1977) pp. 462-476.

ENGINEERING COMPUTATIONS AND DATA COLLECTION
     FORMATS USEFUL IN DATA VALIDATION
                     by
            A. Carl Nelson, Jr.
      PEDCo  Environmental,  Incorporated
           505 South Duke Street
        Durham, North Carolina  27701
       ENGINEERING COMPUTATIONS AND  DATA COLLECTION
            FORMATS USEFUL  IN  DATA VALIDATION
                      A.C.  Nelson, Jr.

          A considerable number of "after-the-fact" data
validation techniques will be presented during this one-day
conference.  The intent here is not to summarize all of
these techniques but to indicate some of the important
validation procedures from the viewpoint of laboratory and
field experts.  The approach taken was to ask laboratory and
field experts:  What are some of the important areas of data
validation for obtaining data of good quality?  Some of the
areas listed are briefly described below for both ambient
air monitoring and source testing.
Data Validation - Ambient Air Monitoring Data
     1.   Audits
               The purpose of audits should not be to point
          a finger at the organization or team being audited;
          in some cases the auditor can be in error.  The
          value of an audit is that it can identify a gross
          bias or inaccuracy in reported data.  One recent
          problem involved the use of an incorrect method.
          The audit pointed out the problem, special
          instruction was given, and the condition was, it
          is hoped, corrected.  It is possible that no
          after-the-fact data validation technique could
          have identified the error in this case, since the
          same procedure had been taught to all operators
          and the bias was therefore present throughout the
          region.  The audit served its purpose well in
          this example.
               In EPA-sponsored audits the auditor is almost
          always checked out at the EPA Quality Assurance
          Branch (QAB) prior to conducting a field audit.
          This provides traceability to a common standard
          and method.
     2.   Knowledge of the instrument
               This is certainly one of the most important
          considerations in obtaining good quality data.
          The operator must know the sensitivities and
          interferences of the instrument.  An example
          would be the interference of CO2 concentration
          on an SO2 analyzer employing a flame photometric
          detector.  A test was designed to check for this
          possible sensitivity, and the results indicated a
          definite and reproducible relationship.
               This particular type of error in a particular
          analyzer could be very difficult to detect by
          after-the-fact validation procedures.  Some
          independent and accurate check must be made using
          an instrument which has been tested for possible
          interferences/sensitivities.  If the sensitivity
          of an instrument to an interference has been
          precisely determined and is reproducible, then a
          correction can be made to obtain the result which
          would have been observed if no sensitivity
          existed; this was true in the case mentioned.
               Another means of gaining information about
          the instrument/method is to design a ruggedness
          test to check out possible gross factors/steps
          which may have a significant effect on the results
          or measurements if appropriate control is not
          exercised.
     3.   Interlaboratory tests
               Participation in these tests provides a means
          of checking the laboratory analysis methods and of
          validating the current multipoint calibration
          curve.  Feedback from the laboratory that performs
          the overall analysis of the results from all
          participating laboratories is most important.  For
          example, if a laboratory is consistently in error
          for a particular analysis or range of
          concentrations, then this laboratory must have
          some means of correcting the problem through
          communication with the overall test laboratory or
          some representative thereof.
               The performance survey is a very good means
          of validating data.  Furthermore, it is also a
          good source of information about what can be
          expected from a particular analysis procedure.
               A supervisor at one laboratory indicated that
          they had been performing their CO analysis
          incorrectly and that the performance survey
          helped them to identify a problem which they did
          not know they had.
               On a small scale, three or four laboratories
          could set up their own interlaboratory test for a
          particular analysis for which no performance
          survey data are being obtained by EPA, NIOSH, or
          some other agency.
     4.   Standards traceable to an NBS standard
               It is often said that calibration gases are
          not accurately analyzed.  It is thus necessary
          that the user check the calibration gases prior
          to their use in developing new calibration
          curves.  All measurements must ultimately be
          traceable to a primary standard.
               Significant errors in calibration gases can
          usually be detected by a check against the
          previous calibration curve obtained using the most
          recent gas.  The EPA QAB has developed a protocol
          for traceability of gases.
     5.    Data reduction
               The raw data must be recorded  legibly  and
          completely  on appropriate data formats.   The
          calculations should be checked either completely
          or on a  sampling  basis.   In this  manner the equations
          used, the substitution of the correct values  in
          these equations,  and the calculated results are
          all checked.  This check should be part of an
          internal audit as well as of an external audit.
     6.    Other considerations
               Some other considerations are  the  more routine
          types of quality  control and assurance  techniques
          which are primarily internal functions.   Some of
          these techniques  are the use of blind reference
          samples, quality  control limits for internal
          checks of reference samples, comparison checks  of
          two or more calibrators, ruggedness tests,  inter-
          nal audits  by an independent operator,  and  chain
          of custody  procedures.

Data Validation -  Source Tests
     The results of the first series of collaborative source
tests clearly showed that more quality control and data
validation were needed to ensure good quality data.  The
results of the first Method 5 particulate collaborative
test, using average testing teams and no special quality
control, showed a relative standard deviation for each run
in excess of 50%, even with the outliers excluded.  As a
result of this poor reproducibility, several quality control
and data checks were incorporated into the collaborative
test series.  These controls and the use of selected testing
firms produced results that were repeatable to within about
10 percent for each run.  Most of these additional quality
control checks are now detailed or implied in the revised
methods contained in the Federal Register, August 18, 1977.
The collaborative test series showed two other areas of
concern with respect to quality control and data validation.
The first was that the methods and additional written
procedures were thought to describe their execution clearly.
This was found not to be the case in the early collaborative
tests, as many variations were noted in the performance of
the methods.  It became obvious early in the program that
the performance of the average testing team should be
observed by a qualified observer.  Also, the nature of most
of the errors was such that they could not or would not have
been detected by any data validation of the emission test
report.
     The second area of concern was that most of the quality
control techniques were executed prior to the performance of
the field test and the assumption was made that all com-
ponents remained unchanged during testing.  Two examples are
dry gas meter calibration and the pretest leak check.  If
the dry gas meter calibration changed during testing or if  a
leak developed in the sampling train, this would not be
detected.
     The collaborative testing program has clearly demon-
strated that to properly perform the needed data validations,
controls must be clearly defined and observed before, during
and after each test series.  Data validation of uncontrolled
and unobserved sampling is not effective as a general rule
and usually will not clearly determine acceptability or
unacceptability of data.
     The revised method published August 18, 1977 contains
many equipment performance calibrations and validations.
The examples used before, a dry gas meter calibration and
leak check only prior to testing, have now been changed to
include a post-test leak check and a post-test meter cali-
bration.
     The best method for data validation has become equipment
performance validation.  If the equipment is operating
properly the data should be accurate/precise within the
determined limits for that method.

Source Test Report Review
     One aspect of source testing report review involves
checking the results, not only that the correct equations
were used and there were no mathematical errors, but also
that the correct values were used as inputs into the equations.
The latter requirement can be checked quickly if all of the
required data were measured and recorded legibly, and the
raw data sheets are submitted with the report.  Any report
which includes a computer listing of the raw data, instead
of the original data sheets, should be rejected.
     The degree to which the calculations should be checked
is generally a function of the consistency of the results
and the reviewer's confidence in the tester's ability.  The
various levels of review possible for the calculations would
be  (1) none at all,  (2) random spot checks,  (3) complete
review of results which seem inconsistent, with respect to
each other or to typical results,  (4) complete review of one
randomly chosen run, and  (5) complete review of all runs.
     There are some empirical techniques that can be used to
check or validate process and sampling data provided by the
tester and the source.  In some cases, the sampling data
from the tester can be used to check process data supplied
by the source.  Some of the available techniques are given
herein.  The experienced reviewer will ultimately develop
his own list of shortcuts, cross-checks, and rules of
thumb.

  DSSE Workshop - draft report.

     1.   Barometeric Pressure
               Incorrect barometric pressure measurement
          will not generally cause errors of more than 10 to
          15 percent, but it is a very common error.  The
          value reported by the tester can be checked in two
          separate ways:   (1) At sea level, the barometric
          pressure is almost always between 29 and 31 inches
          of mercury, and usually close to 30.  For every
          1000 feet above sea level, the value will decrease
          by  1.1 in. Hg.  Therefore, if a test is run in
          Denver, with an elevation of 5000 feet above sea
          level, the barometric pressure reported should be
          from 23.5 to 25.5  inches of mercury.   (2) The
          reviewer can call the airport closest to the test
          site and ask for the "station" pressure (not
          corrected to sea level) for the date of the test.
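     As a rough illustration, the first check can be scripted.  This
is a minimal sketch that assumes the rule of thumb above (29 to 31
in. Hg at sea level, falling about 1.1 in. Hg per 1000 feet) holds
linearly over the elevations of interest; the function names are
hypothetical.

```python
def expected_pressure_range(elevation_ft: float) -> tuple[float, float]:
    """(low, high) barometric pressure, in. Hg, expected at an elevation.

    Uses the rule of thumb from the text: 29 to 31 in. Hg at sea level,
    decreasing about 1.1 in. Hg for every 1000 feet of elevation.
    """
    drop = 1.1 * elevation_ft / 1000.0
    return 29.0 - drop, 31.0 - drop


def pressure_is_plausible(reported_in_hg: float, elevation_ft: float) -> bool:
    """True if the reported pressure lies inside the rule-of-thumb range."""
    low, high = expected_pressure_range(elevation_ft)
    return low <= reported_in_hg <= high


# Denver, about 5000 feet: the range works out to roughly 23.5-25.5 in. Hg.
print(expected_pressure_range(5000))
print(pressure_is_plausible(29.9, 5000))  # False: looks like a sea-level value
```

A value that fails this screen is not necessarily wrong, but it is
worth confirming against the airport station pressure.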

     2.   Leak Tests
               If the report claims that leak tests were
          performed either before each test or after filter
          changes, the dry gas meter readings on the data
          sheet would indicate this.  In other words, it is
          unlikely that a  leak test was done before run #2
          if  the final volume reading for run #1 is the same
          as  the initial volume reading on run #2.  If a
          leak test was made in the middle of the run  (because
          of  a filter change, for example), the volume
          readings before and after the leak test would be
         shown on the data sheet,  so that the computed
         meter volume could be adjusted accordingly.
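     The run-to-run comparison can be sketched as follows.  This
assumes the dry gas meter readings for each run are available as
numbers, and that a genuine leak test between runs would advance the
meter at least slightly; the class, function, readings, and default
zero tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MeterRun:
    """Dry gas meter readings (cubic feet) recorded for one sampling run."""
    name: str
    initial: float
    final: float


def runs_without_leak_test(runs: list[MeterRun], tol: float = 0.0) -> list[str]:
    """Names of runs whose initial reading equals the previous run's final
    reading (within tol), suggesting no leak test was made in between."""
    suspects = []
    for prev, cur in zip(runs, runs[1:]):
        if abs(cur.initial - prev.final) <= tol:
            suspects.append(cur.name)
    return suspects


runs = [
    MeterRun("run 1", 100.000, 145.210),
    MeterRun("run 2", 145.210, 190.884),  # identical to run 1 final reading
    MeterRun("run 3", 191.052, 236.400),  # meter advanced between runs
]
print(runs_without_leak_test(runs))  # ['run 2']
```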

    3.    Moisture Data
              The results presented in the report for the
         volume percent of water vapor in the gases sampled
         can be checked in several different ways.   For any
         combustion source, the moisture content can be
         approximated by use of nomographs  if the reviewer
         calculates the excess air and can estimate the
         ambient temperature, ambient humidity, and the
         free water in the fuel.  Hopefully, the process
         data will include an analysis of the fuel.  If
         not, use zero for gas and oil, 10 percent for
         bituminous coal, and 25 percent for lignite, bark,
         wood, and refuse unless the fuel has been rained
         on recently.  If the best estimates available are
         ranges, use the high and low estimates to bracket
         the moisture content.
              Entrained droplets of liquid water in the
         stack gases can yield an erroneously high moisture
         content.  All moisture data should be checked
         (even if there are no entrained water droplets) to
         ensure that the reported value is not higher than
         the saturation moisture content.  Nomographs
         provide moisture content at saturation as a function
         of stack absolute pressure and stack gas temperature.
         If the reported value is higher than the maximum
         read from the nomograph, the data are suspect.
         Generally, if the high reading was caused by
         entrained water droplets, the value is adjusted to
         the saturation moisture content.
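     The saturation check can also be done numerically.  The sketch
below substitutes the Antoine equation for water for the nomograph;
the constants (valid roughly 1 to 100 °C, vapor pressure in mm Hg),
the unit conversions, and the function names are assumptions of this
example.  Above about 100 °C the constants are outside their range,
so the result is simply capped at 100 percent moisture.

```python
# Antoine constants for water (assumed: log10 P[mm Hg] = A - B / (C + T[deg C]),
# valid roughly 1-100 deg C).
ANTOINE_A, ANTOINE_B, ANTOINE_C = 8.07131, 1730.63, 233.426
MM_HG_PER_IN_HG = 25.4


def saturation_moisture_fraction(stack_temp_f: float, stack_press_in_hg: float) -> float:
    """Largest water-vapor volume fraction the stack gas can hold at the
    given temperature (deg F) and absolute stack pressure (in. Hg)."""
    temp_c = (stack_temp_f - 32.0) * 5.0 / 9.0
    p_sat_mm_hg = 10.0 ** (ANTOINE_A - ANTOINE_B / (ANTOINE_C + temp_c))
    p_sat_in_hg = p_sat_mm_hg / MM_HG_PER_IN_HG
    # At or above the boiling point the gas could be entirely water vapor.
    return min(p_sat_in_hg / stack_press_in_hg, 1.0)


def check_moisture(reported_fraction: float, stack_temp_f: float,
                   stack_press_in_hg: float) -> tuple[bool, float]:
    """Return (suspect, value_to_use).  A value above saturation is flagged,
    and the saturation value is returned as the value to use if the excess
    is attributed to entrained droplets."""
    b_sat = saturation_moisture_fraction(stack_temp_f, stack_press_in_hg)
    if reported_fraction > b_sat:
        return True, b_sat
    return False, reported_fraction


# Hypothetical run: 120 deg F stack at 29.9 in. Hg, reported moisture 15%.
print(check_moisture(0.15, 120.0, 29.9))
```

At 120 °F and 29.9 in. Hg this gives a saturation moisture of
roughly 11.5 percent, so the reported 15 percent would be flagged.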
              In  sources where the process  involves drying
         (removing water) from a raw material or product, a
         water balance  across the process should validate
         the moisture data  in the report.   Remember to
         include  the water  introduced  as humidity  in  the
         ambient  air.

    4.    Orsat Data
              For any combustion source, the  relative
         amounts  of oxygen  and carbon  dioxide in the  flue
         gases can be predicted by the use  of a nomograph.
         When a report is submitted containing orsat data
         (or CO2 and O2 data from any other instrument),
         the data can be checked by aligning the type of
         fuel with the %CO2 and reading the %O2 from the
         nomograph for comparison with the reported value.
         If the two do not agree, there is a problem with
         the reported data.
              This nomograph also gives the percent excess
         air based on  the type  of fuel and orsat  analysis.
         The reviewer  should be  cautioned  that if  the orsat
         data  were taken after  a water scrubber,  the  nomo-
         graph will  not work,  since  the scrubber  will
         remove  an  indeterminate  amount of carbon  dioxide.
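     The nomograph itself is not reproduced here.  As a stand-in,
this sketch uses two relations of the kind used in EPA Reference
Method 3 calculations: the fuel factor Fo = (20.9 - %O2)/%CO2 and the
excess-air formula %EA = 100(%O2 - 0.5%CO)/(0.264%N2 - (%O2 - 0.5%CO)),
with N2 taken as the balance.  The Fo ranges are approximate values of
the kind tabulated in later EPA guidance (Method 3B) and are an
assumption to be verified before use.  As the text warns, neither
check applies downstream of a water scrubber.

```python
# Approximate F_o ranges by fuel (assumed values; verify against
# the F_o table in EPA Method 3B before relying on them).
FO_RANGES = {
    "bituminous coal": (1.083, 1.230),
    "residual oil": (1.210, 1.370),
    "distillate oil": (1.260, 1.413),
    "natural gas": (1.600, 1.836),
}


def fuel_factor(o2_pct: float, co2_pct: float) -> float:
    """F_o = (20.9 - %O2) / %CO2 for dry flue gas."""
    return (20.9 - o2_pct) / co2_pct


def orsat_consistent(fuel: str, o2_pct: float, co2_pct: float) -> bool:
    """True if the O2/CO2 pair falls in the expected F_o range for the fuel."""
    low, high = FO_RANGES[fuel]
    return low <= fuel_factor(o2_pct, co2_pct) <= high


def percent_excess_air(o2_pct: float, co2_pct: float, co_pct: float = 0.0) -> float:
    """Excess air from a dry Orsat analysis, with N2 taken as the balance."""
    n2_pct = 100.0 - co2_pct - o2_pct - co_pct
    net_o2 = o2_pct - 0.5 * co_pct
    return 100.0 * net_o2 / (0.264 * n2_pct - net_o2)


print(orsat_consistent("bituminous coal", 6.0, 13.0))  # True
print(orsat_consistent("natural gas", 3.0, 14.0))      # False
print(round(percent_excess_air(6.0, 13.0), 1))          # 39.0
```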

    5.    Volumetric  Flow Rate Data
              The volumetric flow rate is  difficult  to
         cross-check accurately,  but  there are several ways
         of determining if  the  reported values are in the
         "ball-park".   In  any duct  or  stack where  the air
         is moved by a blower,  the  design criteria generally
         result  in  a gas velocity of  25-40 feet per  second.
The idea is that higher velocities cause prohi-
bitive pressure losses, and lower velocities are
uneconomical due to the cost of the duct work.
Since the size of many stacks is dependent on
structural strength or future needs, the check
works best for the duct work leading to the stack.
If the velocity measurements are made in the
stack, and the stack cross-sectional area is much
larger than that of the duct work, apply the
25-40 feet per second check by dividing the
volumetric flow rate by the duct area.  If there
is no fan or blower in the process, such as with a
natural draft boiler or incinerator, the flow will
generally be 5-15 feet per second.  Keep in mind
that the ranges given here are not theoretical
limits, but merely commonly encountered values.
If the test results presented do not fall within
these ranges, it is only a signal to look at the
velocity data more closely.
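     The velocity screen amounts to dividing the actual volumetric
flow by the duct area (the duct area, not the stack area, when the
stack is oversized) and comparing the result with the commonly
encountered ranges quoted above.  A minimal sketch, assuming flow in
actual cubic feet per minute and area in square feet; the function
and dictionary names are hypothetical.

```python
def duct_velocity_fps(flow_acfm: float, duct_area_ft2: float) -> float:
    """Average gas velocity (ft/s) from actual volumetric flow (ft3/min)."""
    return flow_acfm / 60.0 / duct_area_ft2


# Commonly encountered ranges from the text, not theoretical limits.
TYPICAL_VELOCITY_FPS = {"fan": (25.0, 40.0), "natural draft": (5.0, 15.0)}


def velocity_needs_review(flow_acfm: float, duct_area_ft2: float,
                          draft: str = "fan") -> bool:
    """True if the velocity falls outside the usual range, which is only a
    signal to look at the velocity data more closely."""
    low, high = TYPICAL_VELOCITY_FPS[draft]
    v = duct_velocity_fps(flow_acfm, duct_area_ft2)
    return not (low <= v <= high)


# Hypothetical: 60,000 acfm through a 30 ft2 duct is about 33 ft/s.
print(velocity_needs_review(60_000, 30.0))
print(velocity_needs_review(60_000, 30.0, draft="natural draft"))
```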
     In reviewing test results, it is always
desirable to have available the results from any
previous tests on the same source, previous tests
on any similar sources (such as an identical unit
at the same plant), or tests performed at the
inlet to the control device.  If the inlet tests
were done simultaneously with the outlet tests,
the volumetric flow rates  (corrected to standard
conditions) should match from inlet to outlet.  If
the control device uses water, the checks should
be made on a dry basis.  Air leakage in or out of
the control device can occur, which would lessen
the value of this check, but air leakage can
generally be identified by a change in the moisture,
     temperature, or CO2 content from inlet to outlet.
     Since inlet tests are not usually done for com-
     pliance, they are often performed in ducts with
     little or no straight run, which can produce falsely
     high velocity readings, a factor to consider when
     making inlet-outlet comparisons.
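     Comparing inlet and outlet flows requires putting both on the
same dry, standard basis.  A minimal sketch, assuming standard
conditions of 68 °F (528 °R) and 29.92 in. Hg and the usual ideal-gas
correction; the function names and readings are hypothetical, and how
large a difference warrants follow-up is left to the reviewer.

```python
T_STD_R = 528.0      # assumed standard temperature, 68 deg F in deg Rankine
P_STD_IN_HG = 29.92  # assumed standard pressure, in. Hg


def dry_standard_flow(flow_acfm: float, stack_temp_f: float,
                      stack_press_in_hg: float, moisture_fraction: float) -> float:
    """Convert actual flow (ft3/min) to dry standard ft3/min."""
    return (flow_acfm * (1.0 - moisture_fraction)
            * (T_STD_R / (stack_temp_f + 460.0))
            * (stack_press_in_hg / P_STD_IN_HG))


def percent_difference(inlet_dscfm: float, outlet_dscfm: float) -> float:
    """Outlet minus inlet, as a percent of the inlet flow."""
    return 100.0 * (outlet_dscfm - inlet_dscfm) / inlet_dscfm


# Hypothetical simultaneous inlet and outlet tests on a wet scrubber.
inlet = dry_standard_flow(50_000, 350.0, 29.5, 0.08)
outlet = dry_standard_flow(45_000, 130.0, 29.6, 0.15)
print(f"outlet differs from inlet by {percent_difference(inlet, outlet):+.1f}%")
```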
         Many sources have fan performance curves  for
     the fans  used  in the process,  and these  can  be
     used as a check against  the  reported  flow rate
     data.   The gas flow moved by the  fan  is  a function
     of the pressure head produced (or induced, or
     both), the gas temperature,  the gas composition,
     and the fan speed  (rpm).  Unless  all  of  these
     factors are controlled or quantified  (which  is a
     rare  situation) the  fan  curves can  only  be used to
     estimate  or roughly  check the flow  rates.
         When process equipment  and/or  control devices
     are designed,  there  is generally  a  design  speci-
     fication  on volumetric  flow  rate.  If these  speci-
     fications are  available  (from the source or  from
     permit forms)  they  can be used to check  the  tester's
     results.

6.   Process Data
          There are probably  as many different ways to
     check process  data  as  there  are types of processes.
     Some  are  so much  a  part  of  a particular  process
     that  they could not all  be discussed  here.   There-
     fore,  if  the  checks mentioned herein  are not
     adequate  for  the  process in  question, then that
     process should be  studied  (using  the  literature
     and  communications  with  the  source) to determine
     if some additional  checks  are available  for  use.
     For many processes, the production rate is
relatively constant from day to day.  In this
case, the production rate reported should compare
favorably with the annual production rate (or
annual raw material usage rate) divided by the
number of operating days.
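     The production-rate comparison is simple arithmetic; it is
sketched here only to make the comparison explicit.  The plant
figures and function names are hypothetical.

```python
def expected_daily_rate(annual_quantity: float, operating_days: float) -> float:
    """Average production (or raw material use) per operating day."""
    return annual_quantity / operating_days


def relative_deviation(reported: float, expected: float) -> float:
    """Fractional difference of the reported rate from the expected rate."""
    return (reported - expected) / expected


# Hypothetical plant: 250,000 tons per year over 300 operating days,
# with 900 tons per day reported for the test day.
expected = expected_daily_rate(250_000, 300)
print(f"expected {expected:.0f} tons/day; reported is "
      f"{relative_deviation(900, expected):+.0%} off")
```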
     In a case where the reviewer wants to compute
the production rate from the raw material rate, or
compute the raw material rate to check the produc-
tion rate, the principle of material balances
should be employed.  If one ignores nuclear
reactions, then it can be stated that in any
process, matter will be neither destroyed nor
created.  This means that any materials entering
the process must either accumulate or leave the
process (in minus out equals accumulation).  The
material balance can be done on all components of
the process stream, or it can be limited to a
single component such as water or carbon dioxide.
     Drying operations, like grain dryers, are
good examples of sources which adapt readily to a
water balance.  Water enters the process from the
grain itself, from the drying air (which is
generally ambient air), and from the combustion of
fuels containing hydrogen; it leaves as water
vapor in the exhaust and as residual water in the
grain.  The stack test data provide the total
water vapor leaving the dryer, by multiplying the
total gas flow rate by the percent water vapor,
and converting the result to a mass rate.  From
the amount of fuel burned, one can compute the
water vapor produced by the combustion.  If the
     ambient  temperature  and  relative  humidity  are
     known, the water  supplied  by  the  drying  air  can  be
     computed.

     [Figure: grain DRYER with water entering from ambient air (A),
     in the wet grain (B), and from combustion (C), and leaving as
     water vapor in the exhaust (D) and as residual water in the
     grain (E).]
                 Figure 1.   MATERIAL BALANCE

     From the symbols shown in Figure 1, the water balance would be:

               $A + B + C = D + E$,

     and A, C, and D have been computed.  If F is the
     weight of grain dried, W is the inlet moisture
     fraction for the grain, and W' is the outlet
     moisture fraction for the grain, then

               $B = WF$, and $E = W'F$.

     Substituting these expressions in the above equation
     and solving for F yields:

               $F = \dfrac{D - A - C}{W - W'}$
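     The water balance can be solved directly once A, C, and D have
been computed.  A minimal sketch, assuming all water terms are in the
same mass-rate units and the grain moisture fractions are known; the
function name and the numbers in the example are hypothetical.

```python
def grain_dried(a_air: float, c_combustion: float, d_exhaust: float,
                w_in: float, w_out: float) -> float:
    """Solve the dryer water balance A + W*F + C = D + W'*F for F.

    a_air        A, water entering with the drying air
    c_combustion C, water formed by burning the fuel
    d_exhaust    D, water vapor leaving in the exhaust (from the stack test)
    w_in, w_out  W and W', moisture fractions of the grain in and out
    All water terms must be in the same mass-rate units (e.g. lb/hr).
    """
    if w_in <= w_out:
        raise ValueError("inlet moisture fraction must exceed outlet fraction")
    return (d_exhaust - a_air - c_combustion) / (w_in - w_out)


# Hypothetical rates in lb/hr and moisture fractions.
print(grain_dried(a_air=100.0, c_combustion=200.0, d_exhaust=1300.0,
                  w_in=0.25, w_out=0.15))
```

The computed F (about 10,000 lb/hr in this example) can then be
compared with the grain throughput reported by the source.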
7.    Emission Results
          Unfortunately, the most difficult data to
     validate are the emission results, which also are
     the most important data to validate.  Emission
     rates for gaseous pollutants, such as SO2 and NOx,
     can often be checked against process parameters;
     this works because these pollutants are rarely
     controlled.  For example, since essentially all
     the sulfur present in coal or oil will be liber-
     ated as SO2 during combustion, a sulfur balance
     should yield a good check of SO2 emission results.
     For a specific design of boiler or incinerator,
     the amount of NOx produced can be estimated from
     the emission factor published by EPA.²
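     The sulfur balance reduces to a one-line calculation: each pound
of fuel sulfur burned to SO2 yields about two pounds of SO2
(molecular weights 64.06 and 32.06).  A minimal sketch, assuming the
fuel rate and sulfur content come from the process data; the
`conversion` parameter, defaulting to the text's "essentially all",
is an assumption a reviewer might lower if some sulfur is known to
remain in the ash.  The names and numbers are hypothetical.

```python
SULFUR_MW = 32.06
SO2_MW = 64.06


def so2_from_sulfur_balance(fuel_lb_hr: float, sulfur_fraction: float,
                            conversion: float = 1.0) -> float:
    """SO2 emission rate (lb/hr) expected from the fuel sulfur content.

    fuel_lb_hr       fuel firing rate
    sulfur_fraction  weight fraction of sulfur in the fuel (0.02 = 2%)
    conversion       fraction of fuel sulfur leaving as SO2 (assumed 1.0)
    """
    return fuel_lb_hr * sulfur_fraction * conversion * SO2_MW / SULFUR_MW


# Hypothetical boiler: 20,000 lb/hr of 2% sulfur coal, roughly 800 lb/hr SO2.
expected = so2_from_sulfur_balance(20_000, 0.02)
measured = 610.0  # hypothetical stack test result, lb/hr
print(f"measured/expected = {measured / expected:.2f}")
```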
              The  generation  of particulate  pollutants  is a
          function  of a large  number of process  parameters,
          many of which cannot be measured.   Emission  factors
          are  available for  many sources of particulates,
          but  most  of these  sources  use some  type  of control
          device  to remove the bulk  of  particulates prior  to
          the  stack exhaust.   As an  example,  consider  the
          particulate emissions  from a  utility boiler  with
          an electrostatic precipitator, or from an asphalt
          batch plant with a baghouse collector.  The emission
          factors for these sources, prior to the control
          device, are listed in the emission factor book and
          the literature, and for now it will be assumed
          that they are accurate.  The control devices that
          are used would have a design efficiency of 99 to
          99.5 percent, but the actual efficiency could
          range from 50 to 99.9 percent.  In other words,
          if the uncontrolled emissions are 100 pounds
          per hour, the design efficiency would yield
          emissions of 0.5 to 1.0 pounds per hour, but
          the actual emissions could range from 0.1 to 50
          pounds  per  hour.   The  emission  factor  book lists
          factors for various  types  of  control devices,  but
          these  are design efficiencies,  and  the reviewer
          should  resist paying much  attention to them.  They
          only reflect  the emissions if the control equip-
          ment is operating  at its design efficiency,  and  if
          that is assumed to be  true, then  there is no need
          to perform  a  compliance test  in the first place.
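     The arithmetic behind the 100 pound-per-hour example can be
written out as follows.  This sketch only restates the text's
numbers; the function name is hypothetical and efficiencies are
expressed as fractions.

```python
def controlled_range(uncontrolled: float, eff_low: float,
                     eff_high: float) -> tuple[float, float]:
    """(lowest, highest) controlled emissions for a range of collection
    efficiencies; the higher efficiency gives the lower emission rate."""
    return uncontrolled * (1.0 - eff_high), uncontrolled * (1.0 - eff_low)


design = controlled_range(100.0, 0.99, 0.995)  # about 0.5 to 1.0 lb/hr
actual = controlled_range(100.0, 0.50, 0.999)  # about 0.1 to 50 lb/hr
print(design, actual)
```

The spread between the two ranges is the reason a design efficiency
says little about what a particular test should have found.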
²  "Compilation of Air Pollution Emission Factors", U.S. EPA,
   Publication No. AP-42.
     One approach which is often suggested (and
used)  by control agencies is the idea of comparing
the three runs to one another.  In other words,
the validity of the data can be measured by the
proximity of the three results to the average.
This would work if all of the variation in the
results were a function of random sampling errors.
Under this assumption, data could be handled as
in the following examples:  (1) the three results
are 2, 3, and 4, and the reported emission rate is
3; and (2) the three results are 2, 4, and 15; the
15 value is thrown out as an outlier, and 3 is
reported as the emission rate.  The second example
says that since 2 and 4 are close to one another,
and 15 is not close to 2 or 4, that the 15 must
represent a gross sampling error, and only the 2
and 4 should be averaged together to get the
emission rate.
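     The rule described above can be made explicit to show how
fragile it is.  This is a sketch of one hypothetical way to encode
"not close to the others" (a result is discarded if it lies more than
three times farther from its nearest neighbor than the other two lie
from each other); the threshold and function name are assumptions,
not a method endorsed by the text.

```python
from statistics import mean, median


def naive_three_run_rate(results: list[float], ratio: float = 3.0) -> float:
    """Outlier rule of the kind the text cautions against: if the result
    farthest from the median is more than `ratio` times farther from its
    nearest neighbor than the other two are from each other, drop it and
    average the remaining two; otherwise average all three."""
    if len(results) != 3:
        raise ValueError("this sketch handles exactly three results")
    m = median(results)
    suspect = max(results, key=lambda x: abs(x - m))
    others = list(results)
    others.remove(suspect)
    gap = abs(others[0] - others[1])
    distance = min(abs(suspect - x) for x in others)
    if distance > ratio * gap:
        return mean(others)
    return mean(results)
```

Applied to the first three of the four hypothetical samples
discussed below (2, 4, 15) the rule reports 3; applied to the last
three (4, 15, 15) it reports 15.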
     Several additional considerations should
discourage the reviewer from applying this vali-
dation technique.  There is no question that three
nearly  identical results will  instill confidence
in the reviewer's mind, and that three widely
different results will reduce  that confidence.
Using the example above, however, with the results
of 2, 4, and 15, how can the observer tell what
the rest of the "population" looks like?  Had four
samples been taken instead of three, with results
of 2, 4, 15, and 15, and had the last three been
reported instead of the first three, the 4 would
have been thrown out as an outlier and 15 would have
been reported as the emission rate.  Process variations
can occur during testing that  could  produce ten-
to-one  variations in the actual emission rates,
          and these variations can occur at any time,
          without any warning, and often without being
          noticed.
               As a final note, any statistician can supply
          dozens of methods for evaluating a set of results,
          including ways to calculate confidence limits and
          eliminate outliers.  Any statistician will also
          tell you, however, that a single set of three
          results is really too small to study statistically.
          And all the statistics in the world cannot replace
          common sense.
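     To see why three results are too few, consider a conventional
95 percent confidence interval for their mean.  This sketch assumes
normally distributed random errors (an assumption the text argues
often does not hold) and hard-codes the Student t value for two
degrees of freedom; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

T_95_DF2 = 4.303  # two-sided 95% Student t critical value, 2 degrees of freedom


def ci95_of_three(results: list[float]) -> tuple[float, float]:
    """95% confidence interval for the mean of exactly three test runs."""
    if len(results) != 3:
        raise ValueError("the t value above is for exactly three results")
    m = mean(results)
    half_width = T_95_DF2 * stdev(results) / sqrt(3)
    return m - half_width, m + half_width


print(ci95_of_three([2, 4, 15]))
```

For the 2, 4, and 15 example the interval runs from roughly -10 to
24, wider than the spread of the data themselves, even under the
favorable assumption of purely random error.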

Summary
     In summary, data validation must be an integral part of
the data collection, analysis, reduction, and reporting
process.  Several useful data validation techniques which
will aid in detecting large inaccuracies in the reported
results are described.  However, there are obvious limitations
to the types of data inaccuracies which can be identified in
both ambient and source tests as pointed out in this paper.
It is hoped that the techniques suggested herein, from the
viewpoint of laboratory and field experts, will stimulate
further discussion.

Acknowledgement
     Lawrence Elfers and William DeWees of PEDCo Environmental
provided much of the information which is briefly summarized
herein.  In addition, appreciation is due to Entropy Environ-
mentalists, Incorporated, because the draft copy of the Division
of Stationary Source Enforcement Workshop report contains information
provided by this organization.  I hope that this paper does not
misinterpret their written and verbal suggestions.   These inputs
are greatly appreciated.

-------
  VALIDATION  PROCEDURES  APPLIED  TO  IN-USE
        MOTOR VEHICLE  EMISSION DATA
                     by
             Marcia E.  Williams
Office of Mobile Source Air Pollution Control
    U.S. Environmental Protection Agency
         Ann Arbor, Michigan  48105
                               ABSTRACT
     One of the functions of the Office of Mobile Source Air Pollu-
tion Control  is the collection and subsequent assessment of data on
the emission performance of in-use vehicles.   On an annual  basis,
over 4.5 million fields of data are collected.   These data  must be
carefully validated before they are used by EPA, by other government
agencies, and by private citizens.  The current data editing proce-
dures are designed to be fairly routine but quite complete.  Systematic
problems are eliminated with thorough laboratory facility check-
outs, frequent calibration checks, and correlation programs with the
EPA laboratory.

     The data editing procedure is divided into two parts - manual
editing of the large amount of supporting data forms and strip
charts and computer editing of all data cards.   This edit procedure
has detected error rates of from 14 to 32 percent manually and from
5 to 50 percent in the computerized phase.  Most of these errors are
correctable and less than five percent of tests are invalidated.
Although some of the errors can be found in either the computer or
manual phase, many errors can only be detected in one of the two
phases.  To avoid needless effort, the phases are performed in
series rather than in parallel.

     Editing costs are less than two percent of the total program
cost and the current edit program is estimated to achieve a final
error rate of about one percent.  Future changes to the editing
procedure focus on reducing the EPA manpower requirements without
sacrificing the current quality level.  An effort will be undertaken
to determine how much effect various types of errors have on the
ultimate uses of the data so that the question of "How good do the
data have to be?"  can be factored into the design of data valida-
tion methodology.
                              BACKGROUND

     The Office of Mobile Source Air Pollution Control  is responsible
for generating a data base on the in-use emission and fuel  economy
performance of all mobile sources.   These data are used by many
groups within EPA as well as by other Federal  agencies, state and
local governments, private industry, and private citizens.   A fairly
complete list of uses for emission factor data is given in Table 1.
Different degrees of data accuracy are needed  for different data
applications.  Most applications are concerned with having an accurate
estimate of average emissions or fuel economy.  However, those items
in Table 1  which are notated with an asterisk  require that the data
on every vehicle be completely accurate.  At present, the
data edit procedure is geared to ensure that all fields of data are
correct.

     Table 2 provides a list of typical ongoing test programs.  On
an annual basis, OMSAPC spends between 2.5 and 4.5 million contract
dollars on characterizing the performance of in-use vehicles.  Each test
program involves the procurement and subsequent testing of consumer
owned vehicles.  Vehicle owners are given incentives such as a U.S.
savings bond, a loaner car, and a free tank of gas to participate in
the EPA test program.  Vehicles are then tested over a variety of
different test sequences.  In some cases, entire test sequences are
repeated with vehicles in different states of  tune or with ambient
test conditions varied.  Table 3 lists the types of variables which
are collected for each vehicle test.  For each vehicle test sequence,
there are 150 to 600 pieces of information gathered.  For each
vehicle tested, there are from one to six test sequences performed.
Thus, on an annual basis, approximately 4.5 million fields of data
are collected and must be validated.

                           GENERAL APPROACH

     The data validation procedure begins with the assumption that
there is no systematic bias or error in the data.  Systematic errors
are prevented by the development of detailed test procedures, record-
keeping procedures, and mandatory recording formats.  Each contractor
must undergo a rigorous facility check-out at the beginning and end
of each test program.  The check-out includes  performance tests for
all contractor personnel including equipment operators, drivers,
test technicians, and data handlers.  In addition, EPA personnel
specify frequent calibration checks on all equipment, carry out
reference gas and vehicle correlation testing against the EPA
laboratory in Ann Arbor,  and implement both announced and unan-
nounced contractor inspections throughout the duration of each test
program.  Table 4 details the procedures which are implemented to
prevent systematic bias.

     All EPA contractors  are required to perform data validation
before submitting any data to EPA.   EPA contracts specify that the
contractor must use some  form of computerized edit procedure in
addition to a manual procedure.  However, the exact contractor
procedure is not specified.  Since  contractors are not paid for
tests until EPA accepts the tests as valid, an incentive exists to
submit correct data as soon as possible after the completion of a
vehicle test.

     The EPA edit procedure is diagrammed in Figure 1.  The manual
and computer aspects of data validation are carried out in series to
avoid needless manpower effort and  to ensure that at no time will
data files contain any data that are not validated.  The manual  edit
procedure is performed first and many of the steps in that procedure
are listed in Table 5.  The manual  procedure concentrates on strip
chart data including the driving trace and the emission concentrations.
Although there is a trend toward contractor computerization of these
items, a fully computerized data acquisition system is expensive and
is not required by EPA due to the short term (annual), fixed-price
nature of EPA contracts.   Thus, errors in following the appropriate
driving trace and properly zeroing and calibrating analyzers can
only be detected in the manual phase.

     Table 6 presents the types of checks which are performed in the
computer edit procedure.   As shown in Figure 1, the computer editing
does not occur until the manual edit checks indicate a potentially
valid test.  Table 7 summarizes the types and the severity of errors
which have been detected.  Tests are invalidated only in cases where
test procedure errors are uncovered or in cases where key data are
missing.  Table 8 lists typical reasons that complete test sequences
have been invalidated.
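     The computer phase consists of range and consistency checks on
each data card.  The sketch below illustrates the idea only: the
actual checks are those listed in Table 6, and the field names,
limits, and the catalyst/model-year consistency rule here are
hypothetical.

```python
# Hypothetical field names and limits, for illustration only.
RANGES = {
    "model_year": (1960, 1980),
    "odometer_miles": (0, 300_000),
    "hwy_hc_g_mi": (0.0, 50.0),
}


def edit_card(card: dict) -> list[str]:
    """Return edit messages for one data card (an empty list means it passes)."""
    messages = []
    for field, (low, high) in RANGES.items():
        value = card.get(field)
        if value is None:
            messages.append(f"{field}: missing")
        elif not low <= value <= high:
            messages.append(f"{field}: {value} outside {low}-{high}")
    # Hypothetical cross-field consistency rule.
    if card.get("catalyst") and card.get("model_year", 9999) < 1975:
        messages.append("catalyst coded for a pre-1975 model year")
    return messages


card = {"model_year": 1972, "odometer_miles": 42_000,
        "hwy_hc_g_mi": 4.52, "catalyst": True}
print(edit_card(card))
```

A highway value keyed as 4.92 instead of 4.52 passes both kinds of
check, which is the limitation discussed in the following paragraph.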

     The one type of error which the current edit procedure  is not
specifically designed to detect is discrepancies between the computer
cards and the supporting documentation.  If each computer card entry
is within range and consistent with other computer data fields,  it
will not be flagged.  For example, if a highway emission result  is
incorrectly keypunched as 4.52 instead of 4.92, it would not be
detected since both numbers could be equally valid.  One would have
to examine the analyzer trace and the data packet notation to know
which value was correct.  Since data are double keypunched,  these
types of errors are assumed to be minimal.   However,  consideration
is being given to the implementation of an  acceptance sampling
procedure to ensure that information on the data cards matches
information in the supporting documentation.
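     Double keypunching works by comparing two independent keyings of
the same card field by field.  A minimal sketch, with hypothetical
field names and values.  Presumably it catches a keying slip by one
operator, but not a value that was wrong on the coding form before
keying, since both operators would key the same wrong number; that
gap is what sampling against the supporting documentation would
address.

```python
def keying_mismatches(first: dict, second: dict) -> list[str]:
    """Fields whose two independent keypunchings disagree (sorted by name)."""
    fields = sorted(set(first) | set(second))
    return [name for name in fields if first.get(name) != second.get(name)]


# Hypothetical card keyed twice by different operators.
first = {"vehicle_id": "0413", "hwy_hc": "4.52", "hwy_co": "21.7"}
second = {"vehicle_id": "0413", "hwy_hc": "4.92", "hwy_co": "21.7"}
print(keying_mismatches(first, second))  # ['hwy_hc']
```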

                         RESOURCES AND RESULTS

     Table 9 indicates the EPA detected error rates in three recent
test programs.  In each case, the error rate is the percentage of
vehicles with at least one detected error;   some vehicles may have
multiple errors.  The range of detected error rates clearly indicates
that not all contractors employ the same levels of quality control.
However, despite the high detected error rates, less  than five
percent of total tests are invalidated;  most errors  can be corrected.

     Table 10 summarizes the required EPA editing resources for two
recent test programs.  These resources are  examined as a fraction of
total contract cost in Table 11.   Assuming  that EPA manpower used to
perform data editing costs $20,000 per manyear, the EPA cost for
data validation is about two percent of total contract cost.  With
this resource effort, it is estimated that  the undetected error rate
is less than one percent.

                           FUTURE APPROACHES

     Table 12 lists a number of additional  data validation approaches.
More automated data acquisition is being implemented in one ongoing
contract.  The system has taken considerable time to debug and it is
too early to judge the cost-effectiveness of this approach.  In
recent contracts, EPA has increased the contractor data validation
requirements.  Again, it is too early to determine whether this
action will prove to be a cost-effective way to achieve a low final
error rate.  Finally, in one large test program, EPA has stationed
personnel at the contractor's site on a full time basis.  Again, the
effect on final error rate is not yet known.

     Table 12 lists three additional approaches which have not yet
been implemented.  If improved contractor error rates can be achieved,
EPA will attempt to reduce dedicated edit manpower by implementing a
general spot check procedure.  Such procedures are based on statistical
principles and one such procedure is outlined in detail in Tables 13
and 14.
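     The procedure in Tables 13 and 14 is not reproduced here.  As a
generic illustration of the statistical principle, the sketch below
computes the operating characteristic of a single sampling plan:
inspect n records from a batch and accept the batch if no more than
c are in error.  It assumes records are in error independently with
probability p (a binomial model); n = 20 and c = 0 are hypothetical
choices, not EPA's.

```python
from math import comb


def prob_accept(n: int, c: int, p: float) -> float:
    """Probability of accepting a batch when n records are inspected and at
    most c errors are allowed, each record erring independently with
    probability p."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(c + 1))


# Hypothetical plan: inspect 20 records, accept only if none is in error.
for p in (0.01, 0.05, 0.14, 0.32):
    print(f"true error rate {p:.0%}: P(accept) = {prob_accept(20, 0, p):.2f}")
```

Under this plan a batch with a 1 percent error rate is accepted
about four times in five, while a batch at the 14 percent rate is
accepted only about 5 percent of the time, and one at 32 percent
almost never.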

     Before edit procedures can be implemented which attempt to
lower the cost of editing by applying statistical procedures to key
data fields, two important philosophical questions must be answered.
First, it must be determined how good the data have to be.  Variables
of maximal interest need to be specified and in each case,  the
confidence and range within which the variable needs to be  known
must be determined.  The second major area of uncertainty requires a
determination of the impact that various errors make on the vari-
ables of maximal interest.  Answers to these question areas are
currently being pursued so that statistical edit procedures can be
considered in more detail.

-------
                              Table 1
                        USES FOR TEST DATA
 EPA
  1.  ASSESSMENT OF EMISSION AND DETERIORATION RATES FOR AP-42
      (HANDBOOK OF AIR POLLUTION EMISSION FACTORS)
  2.  DEVELOPMENT OF EMISSION AND FUEL ECONOMY CORRECTION FACTORS
      FOR AP-42
  3.  COMPARISON OF IN-USE LEVELS WITH CERTIFICATION, ASSEMBLY
      LINE LEVELS
  4.  DETERMINATION OF REASONS FOR POOR IN-USE VEHICLE PERFORMANCE
  5.  ASSESSMENT OF SHORT TEST/FTP CORRELATABILITY FOR ALT, 207(B)
      (APPLICABLE SECTION OF THE CLEAN AIR ACT)
  6.  ASSESSMENT OF INSPECTION/MAINTENANCE BENEFITS
 *7.  EVALUATION OF IN-USE VEHICLE COMPLIANCE WITH STANDARDS -
      SUPPORT FOR AGENCY RECALL PROGRAM
 *8.  COMPARISON OF PRODUCTION/PROTOTYPE FUEL ECONOMY LEVELS
  9.  SUPPORT FOR REGULATION DEVELOPMENT PACKAGES - ENVIRONMENTAL
      IMPACT ANALYSES
 10.  PRIORITIZATION OF AGENCY REGULATION/COMPLIANCE PROGRAMS

 OTHER USERS
  1.  HIGHWAY ENVIRONMENTAL IMPACT STATEMENT (EIS) WORK
  2.  INDIRECT SOURCE REVIEW
  3.  REGION/STATE EMISSION INVENTORY WORK
  4.  EVALUATION OF IMPROVED PUBLIC TRANSPORTATION SYSTEMS
  5.  EVALUATION OF VEHICLE MILES TRAVELED (VMT) REDUCTION STRATEGIES
  6.  GENERAL TRANSPORTATION CONTROL PLAN (TCP) EVALUATION
  7.  FUEL AVAILABILITY STUDIES
  8.  AIR QUALITY MODEL INPUTS
  9.  STATE IMPLEMENTATION PLAN (SIP) CONFORMANCE WITH AMBIENT
      STANDARDS
 10.  HEALTH ASSESSMENT STUDIES

-------
                           Table 2

                TYPICAL ONGOING TEST PROGRAMS

1.    ANNUAL IN-USE AUTOMOBILE TESTING PROGRAM

     A.    7 CITIES
     B.    WIDE RANGE OF MODEL-YEARS
     C.    LARGE NUMBER OF EMISSION TEST CONDITIONS
     D.    32000 VEHICLES PER YEAR

2.    ANNUAL AUTOMOBILE RESTORATIVE MAINTENANCE TESTING PROGRAM

     A.    3-4 CITIES
     B.    PRIMARILY NEW MODEL-YEAR VEHICLES
     C.    EXTENSIVE DIAGNOSTIC AND MAINTENANCE WORK PERFORMED
     D.    3400 VEHICLES PER YEAR

3.    IN-USE LIGHT DUTY TRUCK TESTING PROGRAM

     A.    MULTIPLE CITIES
     B.    WIDE RANGE OF MODEL-YEARS
     C.    LARGE NUMBER OF EMISSION TEST CONDITIONS
     D.    3200 VEHICLES PER YEAR

4.    IN-USE HEAVY DUTY TRUCK TESTING PROGRAM

     A.    SINGLE CITY
     B.    WIDE RANGE OF MODEL-YEARS
     C.    LARGE NUMBER OF EMISSION TEST CONDITIONS
     D.    3200 VEHICLES IN FY78

5.    IN-USE MOTORCYCLE TESTING PROGRAM

     A.    2 CITIES (HIGH AND LOW ALTITUDE)
     B.    WIDE RANGE OF MODEL-YEARS
     C.    LARGE NUMBER OF EMISSION TEST CONDITIONS
     D.    3250 VEHICLES IN CURRENT TEST PROGRAM

6.    INSPECTION/MAINTENANCE DEMO PROJECT

     A.    PORTLAND, OREGON
     B.    1972-1977 MODELS
     C.    LARGE NUMBER OF EMISSION TEST CONDITIONS AND
           AMBIENT CONDITIONS
     D.    33000 VEHICLES AND 6000 TESTS

7.    OTHER SMALL TESTING PROGRAMS

     A.    MOPEDS
     B.    DIAL-A-RIDE BUSES
     C.    AMBIENT TEMPERATURE TESTING
     D.    GOOD TECHNOLOGY VEHICLES

-------
                           Table 3


         TYPES OF VARIABLES COLLECTED FOR EACH TEST

1.    IDENTIFICATION DATA
     MODEL YEAR
     MAKE
     MODEL
     ENGINE DISPLACEMENT
     CARBURETOR VENTURIS
     CATALYTIC CONVERTER
     AIR PUMP
     NUMBER OF CYLINDERS
     TRANSMISSION TYPE
     VEHICLE IDENTIFICATION NUMBER (VIN)
     ENGINE FAMILY CODE

2.    EMISSION DATA
     IDLE DATA (2 POLLUTANTS)
     SHORT TEST DATA (3 POLLUTANTS, UP TO 5 TESTS)
     FTP DATA (4 POLLUTANTS, 3 BAG VALUES, COMPOSITE VALUE)
     OTHER CYCLES (4 POLLUTANTS)
     FUEL ECONOMY DATA
     EVAPORATIVE EMISSION DATA
     SULFATE EMISSION TESTS
     PARTICULATE EMISSION TESTS
     METHANE - NON-METHANE MEASUREMENTS
     MODAL TESTING

3.    AMBIENT CONDITIONS
     TEMPERATURE
     HUMIDITY
     BAROMETRIC PRESSURE

4.    TEST CONDITIONS
     ROAD LOAD HORSEPOWER
     INERTIA WEIGHT
     SOAK TIME
     PRE-CONDITIONING SCHEDULE

5.    PARAMETRIC DATA
     ENGINE IDLE SPEED
     ENGINE TIMING
     MANUFACTURER SPEC VALUES
     COMPLETE DIAGNOSTIC CHECKS (9 MAJOR VEHICLE SYSTEMS,
      5-10 COMPONENTS PER SYSTEM)
     TAMPERING DATA
     DRIVEABILITY DATA

6.    OWNER QUESTIONNAIRE
     NUMBER OF TRIPS PER DAY
     NUMBER OF MILES PER YEAR
     TYPE OF DRIVING
     TYPICAL PASSENGER LOADING
     LAST MAINTENANCE PERFORMED - TYPE AND COST
     FUEL ECONOMY ESTIMATE
     TYPE OF GASOLINE USED


-------
                         Table 4



            PROCEDURES TO PREVENT SYSTEMATIC BIAS



1.    SPECIFICATION OF GAS ANALYZER, DYNAMOMETER, AND CVS
     EQUIPMENT (WITH PROVISIONS MADE FOR EQUIVALENT
     SUBSTITUTIONS).

2.    EPA NAMES REFERENCE GASES.

3.    EPA SPECIFIES ANALYZER CALIBRATION PROCEDURES, INCLUDING
     GAS TYPES, CYLINDER FITTINGS, IMPURITY LEVELS, CURVE-FITTING
     PROCEDURE, AND REQUIRED ACCURACY.

4.    SPAN GASES, SIMILAR TO CALIBRATION GASES, ARE SPECIFIED.

5.    SYSTEM PLUMBING MATERIALS ARE SPECIFIED.

6.    EQUIPMENT CHECKS ARE SPECIFIED:
     A.    DAILY LEAK CHECKS - CVS AND ANALYTICAL SYSTEM
     B.    WEEKLY COMPLETE CURVE CHECK - ANALYTICAL SYSTEM
     C.    DYNAMOMETER WARM-UP PROCEDURES
     D.    COMPLETE CURVE CHECKS AFTER ANY SYSTEM MAINTENANCE
     E.    DYNAMOMETER CALIBRATED BI-WEEKLY
     F.    SAMPLE BAGS LEAK CHECKED BEFORE EACH TEST
     G.    DAILY NOx ANALYZER CONVERTER EFFICIENCY TEST
     H.    COMPLETE DAILY LOGS OF ALL GASES, CALIBRATIONS,
           MAINTENANCE, ETC.

7.    MAXIMUM BACKGROUND LEVELS SPECIFIED.

8.    COMPLETE EPA FACILITY CHECK-OUT, INCLUDING TESTS GIVEN TO
     CONTRACTOR PERSONNEL.

9.    UNANNOUNCED EPA VISITS.

10.   CORRELATION TESTING WITH EPA LAB.

     CVS = CONSTANT VOLUME SAMPLER

-------
       Figure  1.   Flow  Diagram of  Edit  Procedure

  [Diagram not reproduced. Boxes: Data Cards Arrive at EPA; Support Data
  Arrive at EPA; Data Arrival Logged Into Record Book; Supporting Data
  Screened; Call Contractor for Additional Data or Clarification when not
  acceptable (No Solution leads to Eliminate Vehicle); Data Cards to
  Computer Group; Data Card Corrections Received; Update Log Book;
  Acceptable Vehicle Added to File.]
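The edit procedure in Figure 1 can be read as a small decision flow. The Python sketch below is one reading of it, assuming one round of contractor follow-up per stage (the diagram may allow repeated rounds); the hook functions, log messages, and `Outcome` names are hypothetical, not part of the EPA procedure.

```python
from enum import Enum, auto
from typing import Callable, List


class Outcome(Enum):
    """End states for one vehicle's submission (names are illustrative)."""
    ADDED_TO_FILE = auto()
    ELIMINATED = auto()


def edit_vehicle(
    screen_support_data: Callable[[], bool],
    clarify_support_data: Callable[[], bool],
    computer_edit: Callable[[], bool],
    correct_data_cards: Callable[[], bool],
    log: List[str],
) -> Outcome:
    """Walk one vehicle through a simplified reading of the Figure 1 flow.

    The four callables are hypothetical hooks.  Screening hooks return True
    when the data are acceptable; contractor hooks return True when the
    contractor's additional data or corrections resolve the problem.
    """
    log.append("data arrival logged into record book")

    # Manual stage: screen the supporting data packet.
    if not screen_support_data():
        log.append("contractor called: additional data or clarification")
        if not clarify_support_data():
            log.append("no solution: vehicle eliminated")
            return Outcome.ELIMINATED

    # Computer stage: data cards go to the computer group for editing.
    log.append("data cards sent to computer group")
    if not computer_edit():
        log.append("contractor called: data card problems")
        if not correct_data_cards():
            log.append("no solution: vehicle eliminated")
            return Outcome.ELIMINATED
        log.append("data card corrections received")

    log.append("log book updated")
    return Outcome.ADDED_TO_FILE
```

Passing `lambda: True` for every hook models a clean submission; making `computer_edit` return False while `correct_data_cards` returns True models a keypunch error that the contractor fixes, and the vehicle is still added to the file.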

-------
             [Rotated table (apparently Table 5), pages 110 to 114 of the
             original: a checklist with entries marked X. Its text did not
             survive extraction.]
    

    -------
                              Table 6
    
    
                      TYPES OF COMPUTER CHECKS
    
    1.    RANGE CHECKS ON ALL VARIABLES

         EXAMPLE:  68°F ≤ DRY BULB TEMP ≤ 86°F (VALID TEST RANGE)

    2.    ID CHECK FOR ALL TESTS WHICH SHOULD BE INCLUDED FOR
         EACH VEHICLE

         EXAMPLE:  A SET OF 1/0 CODES IS BUILT INTO THE VEHICLE
         ID, INDICATING WHETHER FTP, HFET, EVAP., SULFATE, MODAL,
         SHORT TESTS ... ARE RUN.  THE EDIT PROGRAM THEN CHECKS
         FOR THE APPROPRIATE DATA CARDS.

    3.    ID INFO CHECKED AGAINST VIN

         EXAMPLE:  MODEL YEAR CHECKED AGAINST VIN CODE.

    4.    FUNCTIONAL RELATIONSHIPS ARE DEVELOPED WHEREVER POSSIBLE

         EXAMPLES:  MODEL YEAR RELATED TO MILEAGE;  ROAD-LOAD
         HORSEPOWER RELATED TO INERTIA WEIGHT;  ENGINE SETTINGS
         COMPARED WITH MANUFACTURER SPECIFICATIONS;  ALLOWABLE
         EMISSION LEVELS DEPENDENT UPON MODEL YEAR;  NUMBER OF
         CYLINDERS RELATED TO ENGINE DISPLACEMENT;  FUEL ECONOMY
         RELATED TO ENGINE DISPLACEMENT.

    5.    COMPOSITE VALUES COMPUTED FROM COMPONENTS AND COMPARED
         TO CARD VALUE

         EXAMPLES:  COMPOSITE FTP COMPUTED FROM INDIVIDUAL BAGS;
         FUEL ECONOMY COMPUTED FROM HC, CO, CO2 DATA USING CARBON
         BALANCE.

    6.    RANKING COMPARISON OF RELATED VARIABLES

         EXAMPLES:  IDLE MODE EMISSIONS LESS THAN HIGH SPEED MODE
         EMISSIONS;  HIGHWAY FUEL ECONOMY GREATER THAN FTP FUEL
         ECONOMY;  COLD EMISSIONS GREATER THAN STABILIZED EMISSIONS.

    7.    CHECK THAT EXPECTED BLANK COLUMNS ARE BLANK TO ENSURE
         PROPER COLUMN ALIGNMENT

    8.    DATA ON VEHICLE COMPARED TO PREVIOUS DATA ON SAME VEHICLE

         EXAMPLE:  EMISSIONS TAKEN AT TWO DIFFERENT LOCATIONS OR
         TWO DIFFERENT TIMES ARE COMPARED.
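Checks 1, 2, 6, and 7 in Table 6 are simple predicates over a vehicle record. A minimal Python sketch follows; the function names, message formats, and the convention that columns past the end of a card count as blank are assumptions for illustration. Each check returns a list of error messages so results from all checks can be concatenated into one edit listing.

```python
def range_check(name, value, low, high):
    """Check 1: flag a missing value or one outside [low, high] (inclusive)."""
    if value is None or not (low <= value <= high):
        return [f"{name}={value!r} outside [{low}, {high}]"]
    return []


def id_test_check(test_codes, cards_present):
    """Check 2: every test coded 1 in the vehicle ID needs a data card.

    test_codes maps a test name to its 1/0 code, e.g. {"FTP": 1, "EVAP": 0}.
    """
    return [f"no data card for {test}"
            for test, code in test_codes.items()
            if code == 1 and test not in cards_present]


def ranking_check(lower_name, lower, higher_name, higher):
    """Check 6: flag a pair whose expected ordering (lower < higher) fails."""
    if lower < higher:
        return []
    return [f"expected {lower_name} < {higher_name}, got {lower} vs {higher}"]


def blank_column_check(card, columns):
    """Check 7: 1-based card columns that should be blank must hold spaces.

    Columns past the end of the string are treated as blank (unpunched).
    """
    return [f"column {col} not blank"
            for col in columns
            if col <= len(card) and card[col - 1] != " "]


if __name__ == "__main__":
    errors = (
        range_check("dry bulb temp (F)", 91, 68, 86)
        + id_test_check({"FTP": 1, "HFET": 1, "EVAP": 0}, {"FTP"})
        + ranking_check("FTP mpg", 14.2, "HFET mpg", 19.8)
        + blank_column_check("1234 5678", [5])
    )
    for message in errors:
        print(message)
```

Run as a script, this prints two messages: the 91°F temperature outside the 68 to 86°F test range, and the missing HFET card. The fuel-economy ranking and the blank-column check both pass.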
    
    
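Check 5 in Table 6 recomputes derived values and compares them with what was keypunched. The sketch below assumes the standard 1975 FTP bag weighting (0.43 cold-start leg, 0.57 hot-start leg, with the stabilized bag shared by both) and the conventional EPA carbon-balance constants for gasoline; both should be confirmed against the regulations in force for a given program. The 1% comparison tolerance is an assumption, not a figure from this paper.

```python
import math


def composite_ftp(bag_grams, bag_miles):
    """Composite FTP result (g/mi) from the three bags.

    bag_grams and bag_miles are (cold transient, stabilized, hot transient)
    tuples.  The stabilized bag counts toward both the cold-start and the
    hot-start legs, which are weighted 0.43 and 0.57 (assumed 1975 FTP).
    """
    y_ct, y_s, y_ht = bag_grams
    d_ct, d_s, d_ht = bag_miles
    cold_leg = (y_ct + y_s) / (d_ct + d_s)
    hot_leg = (y_ht + y_s) / (d_ht + d_s)
    return 0.43 * cold_leg + 0.57 * hot_leg


def carbon_balance_mpg(hc, co, co2):
    """Fuel economy (mi/gal) from HC, CO and CO2 emissions in g/mi.

    2421 g is taken as the carbon in one gallon of test gasoline; 0.866,
    0.429 and 0.273 are the carbon mass fractions of HC, CO and CO2.
    """
    return 2421.0 / (0.866 * hc + 0.429 * co + 0.273 * co2)


def matches_card(computed, card_value, rel_tol=0.01):
    """Compare a recomputed value with the card value (assumed 1% tolerance)."""
    return math.isclose(computed, card_value, rel_tol=rel_tol)
```

A mismatch from `matches_card` does not say which side is wrong; per Table 7, the packet would then be consulted to decide between a calculation error and a keypunch error.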
    

    -------
                              Table 7
                  TYPES/SEVERITY OF  DETECTED ERRORS
    
    1.    ERRORS IN TEST PROCEDURE
         A.    DETECTED IN MANUAL AND/OR COMPUTER EDIT
         B.    TEST IS INVALIDATED
         C.    EXAMPLES:  DRIVING TRACE OUT OF SPECS (MANUAL)
              WRONG INERTIA WEIGHT SETTING (COMPUTER)
    2.    ERRORS IN CALCULATION METHODOLOGY
         A.    DETECTED IN MANUAL AND/OR COMPUTER EDIT
         B.    ALL DATA ARE CORRECTED BY LOOKING AT PACKET
         C.    EXAMPLES:  USED WRONG SCALE TO READ EMISSIONS (MANUAL)
              COMPOSITE FTP INCORRECTLY CALCULATED (COMPUTER)
    3.    KEYPUNCH ERRORS
         A.    DETECTED IN COMPUTER EDIT
         B.    ALL DATA ARE CORRECTED BY LOOKING AT PACKET
         C.    EXAMPLE:  ENGINE DISPLACEMENT DISAGREES WITH VIN
              AND/OR IS OUT OF RANGE
    4.    MISSING DATA
         A.    DETECTED IN MANUAL AND/OR COMPUTER EDIT
         B.    TEST INVALIDATED UNLESS MISSING DATA CAN BE FOUND
         C.    EXAMPLES:  DRIVING TRACE MISSING FROM PACKET (MANUAL)
              BLANK FIELD FOR ENGINE DISPLACEMENT (COMPUTER)
    5.    DISCREPANCY BETWEEN DATA CARD AND DATA PACKET
         A.    DETECTED IN COMPUTER EDIT CHECK-OUT PHASE
         B.    PACKET VALUE ASSUMED CORRECT
         C.    EXAMPLE:  RLHP READING ON DATA CARD IS OUT OF RANGE
              AND DISAGREES WITH WHAT IS RECORDED IN THE PACKET
    

    -------
                              Table 8
              TYPICAL REASONS TESTS HAVE BEEN REJECTED
    
    1.    WRONG CVS COUNTS, EITHER TOO HIGH OR TOO LOW
    2.    EXCESSIVE CRANKING TIME (OVER 10 SECONDS) WITHOUT REGARD
         FOR PRESCRIBED RESTART PROCEDURES
    3.    WRONG INERTIA WEIGHT SETTING ON DYNAMOMETER
    4.    WRONG HORSEPOWER SETTING ON DYNAMOMETER
    5.    EMISSION CONCENTRATIONS READ OFF-SCALE ON ANALYTICAL
         EQUIPMENT
    6.    LABORATORY BACKGROUND EMISSION LEVELS TOO HIGH
    7.    VEHICLE HAS WRONG AXLE RATIO
    8.    SAMPLE BAGS NOT ANALYZED WITHIN 10 MINUTES OF TEST
         COMPLETION
    9.    DRIVER'S TRACE NOT FOLLOWED AS PRESCRIBED
    10.  RECORDING MALFUNCTION, 110°F DURING TEST
    11.  INITIAL FUEL TEMPERATURE TOO HIGH (63°F OR HIGHER)
    12.  SOAK AREA TEMPERATURE TOO HIGH FOR PRESCRIBED PORTION
         OF VEHICLE SOAK PERIOD
    13.  TEST AREA TEMPERATURE TOO HIGH FOR VEHICLE TEST PERIOD
    14.  CVS TEMPERATURE NOT WITHIN ±10° OF SET POINT
    15.  ANALYTICAL INSTRUMENT(S) SPANNED INCORRECTLY
    16.  TEST ITEM(S) NOT DOCUMENTED AS REQUIRED
    17.  ENGINE TIMING NOT CHECKED
    18.  ENGINE TIMING SET INCORRECTLY
    19.  ENGINE IDLE CO NOT CHECKED
    20.  ENGINE IDLE CO SET INCORRECTLY
    21.  ENGINE IDLE RPM SET INCORRECTLY
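Four of the Table 8 rejection reasons carry explicit numeric limits and could be screened automatically from recorded test conditions. The sketch assumes a dictionary-based test record with hypothetical field names and treats all temperatures as °F (Table 8 does not state the unit for the CVS tolerance); the remaining reasons need the packet or the driver's trace and are not covered.

```python
def rejection_reasons(test):
    """Return the Table 8 item numbers triggered by one test record.

    Field names are hypothetical; temperatures are assumed to be in °F.
    Only the four items with numeric limits are screened here.
    """
    reasons = []
    # Item 2: cranking over 10 s without following the restart procedure.
    if test["cranking_s"] > 10 and not test["restart_procedure_followed"]:
        reasons.append(2)
    # Item 8: bags must be analyzed within 10 minutes of test completion.
    if test["bag_analysis_delay_min"] > 10:
        reasons.append(8)
    # Item 11: initial fuel temperature of 63°F or higher.
    if test["initial_fuel_temp_f"] >= 63:
        reasons.append(11)
    # Item 14: CVS temperature must stay within ±10° of the set point.
    if abs(test["cvs_temp_f"] - test["cvs_set_point_f"]) > 10:
        reasons.append(14)
    return reasons
```

Returning item numbers rather than a single pass/fail keeps the output traceable to the table, which matters when the contractor is called about a rejected test.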
    

    -------
                              Table 9
                    CURRENT DETECTED ERROR RATES*
                  MANUAL
    PROGRAM
    1
    2
    3
    
    CONTRACTOR 1
    14%
    17%
    17%
    COMPUTER
    CONTRACTOR 1
    CONTRACTOR 2
    
    21%
    26%
    (PROGRAM 1)
    CONTRACTOR 2
    CONTRACTOR 3
    
    
    32%
    CONTRACTOR 3
                                   10%
    50%
    *    PERCENTAGE OF VEHICLES WITH AT LEAST ONE ERROR DETECTED.
         LESS THAN 5% OF TESTS ARE INVALIDATED; MOST ERRORS
         CAN BE CORRECTED.
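The Table 9 footnote separates two different rates: the share of vehicles with at least one detected error, and the much smaller share of tests that end up invalidated. A sketch of both calculations in Python, assuming hypothetical input shapes (a mapping from vehicle ID to its list of detected errors, and one validity flag per test):

```python
def detected_error_rate(errors_by_vehicle):
    """Percent of vehicles with at least one detected error (Table 9 basis).

    errors_by_vehicle maps a vehicle ID to the list of errors found for it.
    """
    if not errors_by_vehicle:
        return 0.0
    flagged = sum(1 for errors in errors_by_vehicle.values() if errors)
    return 100.0 * flagged / len(errors_by_vehicle)


def invalidated_rate(test_valid_flags):
    """Percent of tests invalidated, given one True/False validity flag per test."""
    if not test_valid_flags:
        return 0.0
    invalid = sum(1 for ok in test_valid_flags if not ok)
    return 100.0 * invalid / len(test_valid_flags)
```

A vehicle with three keypunch errors counts once toward the first rate and not at all toward the second, since those errors are corrected from the packet.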
    

    -------
                              Table 10
                             EDITING EFFORT PER CAR (MAN-HOURS)


                                         PROGRAM 1    PROGRAM 2

    LOGGING, FILING, SCOREKEEPING           .1
    INITIAL REVIEW                          .3
    REVIEW AND SCOREKEEPING, RETURNED       .1           .2
      AND RESUBMITTED PACKETS
    SUPPLEMENTAL TESTS                      .2           .4
    COMPUTER EDIT                           .05          .15
    PRO-RATED COMPUTER PROGRAM              .05          .3
      DEVELOPMENT*
    PRO-RATED MANUAL EDIT                   .10          .5
      PROCEDURES DEVELOPMENT**
                                            ----         ----
                                            .90         2.35

    *    BASED ON 100 HOURS OF EFFORT SPREAD OVER 2000 VEHICLES
         FOR PROGRAM 1 AND 120 HOURS OF EFFORT SPREAD OVER 400
         VEHICLES FOR PROGRAM 2.

    **   BASED ON 200 HOURS OF EFFORT SPREAD OVER 2000 VEHICLES
         FOR PROGRAM 1 AND 200 HOURS OF EFFORT SPREAD OVER 400
         VEHICLES FOR PROGRAM 2.
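The pro-rated rows of Table 10 are one-time development hours divided by the number of vehicles edited. The Python sketch below reproduces that arithmetic and the Program 1 column total; the `prorate` helper and the abbreviated row labels are illustrative.

```python
import math


def prorate(total_hours, vehicles):
    """Spread a one-time development effort evenly over the vehicles edited."""
    return total_hours / vehicles


# Program 1 rows of Table 10, in man-hours per car.
PROGRAM_1 = {
    "logging, filing, scorekeeping": 0.10,
    "initial review": 0.30,
    "returned and resubmitted packets": 0.10,
    "supplemental tests": 0.20,
    "computer edit": 0.05,
    "computer program development": prorate(100, 2000),        # footnote *
    "manual edit procedures development": prorate(200, 2000),  # footnote **
}

total_per_car = sum(PROGRAM_1.values())
```

The same `prorate` call gives the Program 2 development rows: 120 hours over 400 vehicles is 0.3, and 200 hours over 400 vehicles is 0.5. At 0.90 man-hours per car, editing all 2000 Program 1 vehicles comes to about 1800 man-hours.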
    

    -------
    [Rotated table following Table 10; it appears to list dollar amounts,
    but its entries did not survive extraction.]
    
    
    
    
    
    o
    o
    o
    
    m
    
    rH
    •CO-
    
    
    
    
    
    
    CO
    o
    o
    *
    o
    o
    
    •C^h
    
    
    
    
    
    
    0
    o
    o
    »l
    o
    m
    rH
    {/)-
    
    
    
    
    
    
    o
    o
    o
    
    o
    o
    
    *
    CM
    •0>
    O
    O
    O
    •V
    o
    CO
    m
    9\
    [X^
    
    
    
    CU
    4-1
    o cu
    O CO
    Z cu
    rH
    rH O
    Cfl -H
    O CU
    H >
    
    
    O ON O O
    CM CM m m
    CM CO CM CM
    CM CM
    
    
    
    
    
    
    0 O O
    O 0 0
    **^ *"H **^
    ^O
    
    
    
    
                               cfl
                              la
                                                                                         ON
    W
    
    m
    
    
    £
    o
    o
    m
    00  CO
    
    o  a
    o  3
    O  )H
    vC  IH
     O
     4-)
    
    a  co
        CU
    in rH
    r~  o
                                                                        m
                                                                                      13
                                                                                       C
                                                                                       cfl
                                                                                       O
                                                                                      PL.
                                                                                  CO
                                                                                  cu
                                                                                  o
    
                                                                                  cu
                                                                                  3
                                                                                  a1
                                                                                  cu
                                                                                  co
    O
    m
    CM
    
     0)
    •a
    
    
     a
     c
                                                                                                                   *
                                                                                                                   *
                                                               120
    

    -------
                               Table 12
    
     FUTURE APPROACHES - CURRENTLY BEING TESTED ON A TRIAL BASIS


    1.   MORE AUTOMATED DATA ACQUISITION

         A.   DEDICATED COMPUTER ON SITE (GENERATE DRIVING
              TRACE, SET DYNAMOMETERS, AUTOMATIC DATA RECORDING).

         B.   CENTRALIZED COMPUTER.

         C.   COST IS HIGH ($75K - $125K FOR A DEDICATED SYSTEM)
              AND DIFFICULT TO JUSTIFY FOR YEAR-AT-A-TIME CONTRACTS.

    2.   REQUIRE STRICTER CONTRACTOR DATA EDIT PROCEDURES

         A.   ALL KEYPUNCHED DATA MUST BE ENTERED AND THEN
              VERIFIED BY TWO DIFFERENT PEOPLE.

         B.   REQUIRE CONTRACTOR TO APPLY MANUAL AND COMPUTER
              EDITING TECHNIQUES.

         C.   CHECK-OUT OF ALL CONTRACTOR DATA HANDLING PERSONNEL.

         D.   CONTRACTOR WILL SUBMIT ERROR PRINTOUT WITH EACH
              GROUP OF TEST PACKETS.

    3.   STATION EPA PERSONNEL FULL-TIME AT EACH TEST SITE.

    4.   ASSUMING CONTRACTOR ERROR RATE DECREASES, USE SPOT
         INSPECTION OF PACKETS.

    5.   USE STRATIFIED SAMPLING SPOT INSPECTION OF PARAMETERS
         TO MINIMIZE COST OF ERROR TIMES VARIANCE IN VARIABLE J.
         STRATA CAN BE DIFFERENT DRIVERS, ETC. THE SAMPLE SIZE IN
         EACH STRATUM FOR EACH VARIABLE IS INVERSELY PROPORTIONAL
         TO THE SQUARE ROOT OF ERROR COST AND DIRECTLY PROPORTIONAL
         TO THE VARIANCE OF THE MEASUREMENT IN THE STRATUM. THIS IS
         ONLY VALID FOR ESTIMATING THE MEAN OF VARIABLE J.

    6.   SEQUENTIAL LIKELIHOOD RATIO TEST, BASED ON MINIMIZING
         THE COST TO ASSURE A GIVEN OVERALL ERROR RATE. START BY
         EDITING THE VARIABLE WITH THE HIGHEST IMPACT ON THE
         EMISSION RATE, OR WITH THE SMALLEST COST/BENEFIT RATIO IF
         COSTS OF EDITING ARE DIFFERENT.
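
    Item 6 refers to a sequential likelihood ratio test. The sketch below
    shows one textbook form, Wald's sequential probability ratio test,
    applied to packets inspected one at a time; it illustrates only the
    stopping rule, not the variable-ordering strategy described in item 6.
    The acceptable error rate p0, the unacceptable rate p1, and the alpha and
    beta risks are hypothetical inputs, not values from the slide.

```python
import math
from typing import Iterable


def sprt_error_rate(outcomes: Iterable[bool], p0: float, p1: float,
                    alpha: float = 0.05, beta: float = 0.10) -> tuple[str, int]:
    """Wald SPRT on a stream of packet checks (True = error found).

    H0: error rate = p0 (acceptable); H1: error rate = p1 (too high), p1 > p0.
    Returns (decision, number of packets inspected when the decision was made).
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("require 0 < p0 < p1 < 1")
    upper = math.log((1 - beta) / alpha)   # crossing above rejects H0
    lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    llr = 0.0                              # cumulative log likelihood ratio
    n = 0
    for n, has_error in enumerate(outcomes, start=1):
        if has_error:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "error rate too high", n
        if llr <= lower:
            return "error rate acceptable", n
    return "continue sampling", n
```

    With p0 = 0.02, p1 = 0.10 and the default risks, 27 consecutive
    error-free packets are enough to accept the contractor's data, while
    errors in each of the first two packets reject it immediately.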
    

    -------
                              Table 13
                            SPOT TESTING
    QUESTION:   GIVEN A TOTAL TEST POPULATION OF N CARS, IF
         WE EDIT A SAMPLE OF n CARS AND FIND NO ERRORS, WHAT
         IS THE LIKELIHOOD THAT THE ERROR RATE IN THE ENTIRE
         POPULATION IS LESS THAN p?


         ASSUME AN UNKNOWN ERROR RATE p (AS A FRACTION) AND A SAMPLE
         SIZE n. THE TOTAL NUMBER OF BAD TESTS IS X = pN; THE TOTAL
         NUMBER OF GOOD TESTS IS N - X = Y.
         THE PROBABILITY OF DRAWING n GOOD PACKETS IS

    $$\Pr(n \text{ good}) = \frac{Y}{N}\cdot\frac{Y-1}{N-1}\cdot\frac{Y-2}{N-2}\cdots\frac{Y-n+1}{N-n+1} = \frac{Y!\,(N-n)!}{(Y-n)!\,N!}$$


         SET P (CONFIDENCE LEVEL), N, AND p (ERROR RATE), AND
         DETERMINE THE SMALLEST n FOR WHICH THIS PROBABILITY IS
         AT MOST 1 - P.
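
    The product above is the hypergeometric probability of drawing n good
    packets, without replacement, from N packets of which Y are good. The
    sketch below evaluates it with binomial coefficients and searches for the
    smallest n that meets the confidence criterion. Passing the number of bad
    tests X as a whole number (rather than the rate p) and the function names
    are assumptions made for illustration.

```python
from math import comb


def prob_no_errors(N: int, X: int, n: int) -> float:
    """P(no bad tests among n drawn without replacement from N, X of them bad).

    Equals Y!(N-n)! / ((Y-n)! N!) with Y = N - X good tests.
    """
    Y = N - X
    if n > Y:
        return 0.0
    return comb(Y, n) / comb(N, n)


def min_sample_size(N: int, X: int, confidence: float) -> int:
    """Smallest n such that an error-free sample of n would occur with
    probability at most 1 - confidence if X or more tests were bad."""
    for n in range(N + 1):
        if prob_no_errors(N, X, n) <= 1 - confidence:
            return n
    return N  # only reached when X == 0: every test must be inspected
```

    For example, `min_sample_size(1000, 10, 0.95)` gives the number of
    error-free packets needed from a 1000-packet population to be 95 percent
    confident that fewer than 10 packets (an error rate below 1 percent) are
    bad; the probability falls as X grows, so a sample that rules out X = 10
    also rules out any larger X.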
    

    -------
    [Table for the case of no errors among n sampled tests; entries are not
    legible in the scanned original.]
    -------
     DATA VALIDATION TECHNIQUES USED IN MOBILE
                  SOURCE TESTING
                         by
                  C. Don Paulsell
    Office of Mobile Source Air Pollution Control
        U.S. Environmental Protection Agency
             Ann Arbor, Michigan  48105
    

    -------
    INTRODUCTION
    
    The EPA laboratory at Ann Arbor is the primary government facility
    responsible for certification testing of engine-driven vehicles to
    determine compliance with the standards for emission levels and fuel
    economy. Approximately 2500 to 3000 vehicles of foreign and domestic
    manufacture are tested annually. This testing is performed in 10
    dynamometer test cells using the constant-volume sampling (CVS) method
    to collect emission samples from vehicle exhausts. The samples are
    analyzed at seven analyzer sites, each equipped with all of the
    instruments necessary for sample analysis. As the vehicles are operated
    through a prescribed simulated driving cycle, sufficient data are also
    recorded to determine fuel economy. A complete data set for a vehicle
    includes information such as vehicle identification data, test
    specifications, instrument calibrations, calibration data correlations,
    test data, calculated (reduced) test data, the vehicle manufacturer's
    test results, EPA test results, and quality control data. After these
    data have been collected and/or generated, they are subjected to quality
    control procedures to assess overall accuracy, precision, uniformity,
    and validity.
    
    QUALITY CONTROL SYSTEM
    
    The "products" of our test process are the data which represent the
    intangible exhaust emissions of the vehicles tested. The quality control
    system assesses the acceptability of this product in terms of accuracy,
    uniformity, and validity.
    
    Accuracy is important since these data are used to decide whether a
    vehicle meets federal standards. Moreover, a financial penalty may be
    applied to any manufacturer for not meeting the standard for fuel
    economy. This assessment is five dollars per vehicle produced for each
    tenth of a mile per gallon below the standard. Thus, the question of
    accuracy could potentially involve millions of dollars.
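
    To make the stakes concrete, the sketch below applies the assessment as
    stated in the text: five dollars per vehicle for each tenth of a mile per
    gallon below the standard. Rounding the shortfall to the nearest tenth is
    an assumption, and the fleet figures in the usage line are hypothetical.

```python
def fuel_economy_penalty(standard_mpg: float, achieved_mpg: float,
                         vehicles_produced: int, rate_per_tenth: float = 5.0) -> float:
    """Dollars owed: rate_per_tenth per vehicle for each 0.1 mpg of shortfall."""
    # Shortfall in tenths of a mile per gallon, rounded to the nearest tenth
    # (assumed); a fleet that meets or beats the standard owes nothing.
    shortfall_tenths = max(0, round((standard_mpg - achieved_mpg) * 10))
    return shortfall_tenths * rate_per_tenth * vehicles_produced


# Hypothetical fleet: 0.7 mpg short of the standard on 1,000,000 vehicles.
print(fuel_economy_penalty(20.0, 19.3, 1_000_000))  # prints 35000000.0
```

    A measurement error of only a few tenths of a mile per gallon therefore
    moves the assessment by millions of dollars for a large manufacturer.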
    
    Since accuracy can be a relative attribute, the data are also checked
    for precision and uniformity to determine whether measurements can be
    repeated at an analyzer site and whether results from each of the seven
    analyzer sites are essentially equivalent. Finally, since the "data
    product" depends on the test process used, the validity of that process
    must be verified. Data validation techniques are a very important part
    of the total quality control system.
    
    TYPES OF DATA VALIDATION
    
    Data validation begins long before the vehicle test is performed and
    continues after the vehicle has been returned to its manufacturer.

    This broad application of data validation is illustrated in the five
    parts of the overall process:
    

    -------
            1.  Calibration Acceptance
            2.  Operational Verifications
            3.  Procedural Checks
            4.  Test Data Review
            5.  Comparative Measures
    
    The following paragraphs discuss each of these areas and provide
    examples of the methods used.
    
    CALIBRATION ACCEPTANCE
    
    A wide variety of instruments and equipment is used in the measurement
    process. It is obvious that each unit must be calibrated, but what is
    not obvious is how to validate that a calibration has normal
    characteristics. The calibration procedures are often more complicated
    than the test procedures, and an erroneous calibration can only produce
    erroneous test data.
    
    The QC methods used for calibration validation emphasize the
    quantitative aspects of the equipment characteristics. For example, a
    dynamometer is calibrated to establish residual bearing frictions. These
    frictional values tend to have predictable magnitudes across all
    dynamometers. Use of this characteristic can provide confidence in both
    the accuracy and the uniformity of dynamometer calibrations.
    
    The analyzer and constant volume sampler also have unique
    characteristics. An analyzer curve can be assessed in terms of
    nonlinearity, curve fit deviations, and the absence of inflections. A
    CVS utilizes a critical flow venturi which has a characteristic
    discharge coefficient of 0.985 to 0.995. This coefficient, the ratio of
    actual to theoretical flow for a given throat diameter, can be used to
    assess flow metering accuracy and long-term stability.
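
    A check of this kind reduces to comparing the computed coefficient with
    the expected band. The minimal sketch below assumes the actual and
    theoretical flows have already been determined for the same conditions
    and in consistent units; how the theoretical flow is computed for a given
    throat diameter is not covered, and the function name is hypothetical.

```python
def check_discharge_coefficient(actual_flow: float, theoretical_flow: float,
                                low: float = 0.985,
                                high: float = 0.995) -> tuple[float, bool]:
    """Return (Cd, within_band) for a critical flow venturi.

    Cd = actual / theoretical flow; the default band is the characteristic
    range cited in the text.
    """
    if theoretical_flow <= 0:
        raise ValueError("theoretical flow must be positive")
    cd = actual_flow / theoretical_flow
    return cd, low <= cd <= high


cd, ok = check_discharge_coefficient(actual_flow=345.2, theoretical_flow=349.0)
print(f"Cd = {cd:.4f}, within band: {ok}")
```

    Logging Cd from each calibration would also give the time series needed
    to judge long-term stability.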
    
    The dynamometer, constant volume sampler (CVS), and analyzer represent
    the three major components of the measurement process. A proper
    calibration is a necessary condition for getting valid test results, but
    the operational verification is equally important.
    
    OPERATIONAL VERIFICATIONS
    
    This phase of the process is used to assure that the equipment can
    measure and produce a known result. Special tests are conducted at
    daily, weekly, or biweekly intervals to produce a QC parameter which can
    be normalized relative to all systems. These parameters are manipulated
    statistically or plotted graphically to assess control of the process
    accuracy and precision.
    

    -------
    For example, the CVS is checked by injecting a known mass of pure
    propane as though it were auto exhaust. All measurements and
    calculations are performed as in a test, and the result must be within
    plus or minus 2 percent of the known value. Leaks, calibration drift,
    erroneous analyzer span gas values, and many other parameters can cause
    the verification to fail.
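
    The acceptance test is a percent-recovery comparison. In this sketch the
    masses may be in any consistent unit and the example values are
    invented; the plus or minus 2 percent tolerance is the only figure taken
    from the text.

```python
def propane_recovery_check(injected_mass: float, recovered_mass: float,
                           tolerance_pct: float = 2.0) -> tuple[float, bool]:
    """Compare the propane mass computed from the CVS test with the mass injected.

    Returns (percent difference, passed).
    """
    if injected_mass <= 0:
        raise ValueError("injected mass must be positive")
    pct_diff = 100.0 * (recovered_mass - injected_mass) / injected_mass
    return pct_diff, abs(pct_diff) <= tolerance_pct


for recovered in (10.15, 9.70):
    pct, passed = propane_recovery_check(injected_mass=10.00, recovered_mass=recovered)
    print(f"recovered {recovered:.2f}: {pct:+.1f}% -> {'pass' if passed else 'FAIL'}")
```

    The first case (+1.5 percent) passes; the second (-3.0 percent) fails
    and would prompt a search for a leak, drift, or span gas error.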
    
    The analyzer is verified daily by analyzing a bag of blended gases at
    each of the seven analyzer sites. The deviation of each site from the
    overall average serves as the normalized parameter. A site which is
    consistently high, consistently low, or inconsistent will be obvious
    from the automated control chart analysis. Runs of more than five
    consecutive positive or negative deviations, or excessive data scatter,
    are automatically flagged and noted by a QC message on the analysis
    printout.
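
    The run rule can be sketched as follows: compute each site's deviation
    from that day's all-site average, then flag any site whose deviations
    keep the same sign for more than five consecutive days. The data layout
    is an assumption, and the excessive-scatter test mentioned in the text is
    not implemented here.

```python
from statistics import mean


def site_deviations(daily_results: list[dict[str, float]]) -> dict[str, list[float]]:
    """daily_results: one dict per day mapping site id -> measured value.
    Returns, per site, its deviations from each day's all-site average."""
    deviations: dict[str, list[float]] = {}
    for day in daily_results:
        avg = mean(day.values())
        for site, value in day.items():
            deviations.setdefault(site, []).append(value - avg)
    return deviations


def longest_same_sign_run(devs: list[float]) -> int:
    """Longest run of consecutive deviations with the same sign (zero breaks a run)."""
    longest = current = 0
    prev_sign = 0
    for d in devs:
        sign = (d > 0) - (d < 0)
        if sign != 0 and sign == prev_sign:
            current += 1
        else:
            current = 1 if sign != 0 else 0
        prev_sign = sign
        longest = max(longest, current)
    return longest


def flag_sites(daily_results: list[dict[str, float]], max_run: int = 5) -> list[str]:
    """Sites with a run of positive or negative deviations longer than max_run."""
    return sorted(site for site, devs in site_deviations(daily_results).items()
                  if longest_same_sign_run(devs) > max_run)
```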
    
    The dynamometer gets a short version of the full calibration to verify
    its stability. Control charts of flywheel frictional values will
    graphically show a deteriorating bearing or load control problem.
    
    Finally, a repeatable car is tested on each site utilizing all the
    normal components of the system. The emission results are statistically
    analyzed to show significant differences between sites. These
    operational verifications each address a specific part of the process
    and, when assessed in total, provide assurance that the system is
    capable of producing valid emission results from a properly conducted
    test procedure.
    
    PROCEDURAL CHECKS
    
    A complete emissions test can require a total of about eighteen hours,
    including the twelve-hour overnight "soak" period. The specifications
    and criteria are so numerous that a set of checklists has been developed
    to document that each one has been done properly. Test times,
    temperatures, shift patterns, horsepowers, special procedures, and many
    other conditions are noted or checked off as each phase progresses. In
    some cases, such as fueling, the operation must be witnessed by two
    people, since the type of fuel can greatly affect emissions.
    
    Although the test equipment has been previously verified, several checks
    are performed as part of the test. An open valve or an improper
    horsepower setting would cause the test to be voided.
    
    At the end of the test process, all the strip charts, checksheets,
    datasheets, and driving traces are consolidated, reviewed, and sent for
    computer processing.
    

    -------
    TEST DATA REVIEW
    
    The test processing office validates that all necessary data have been
    obtained. The data sheet is then batch processed by computer to generate
    a printout of input data, calculated results, QC checks, and pass/fail
    criteria. The computer program has been designed to audit the various
    test data for omissions or unrepresentative values.
    
    For example, a 40,000-pound automobile would likely be a 4,000-pound
    value improperly keypunched. Other data, such as ambient background
    concentrations, can be compared to a normal distribution of values
    obtained at EPA to flag high levels. Higher values may indicate improper
    analyzer parameters or a leaking vehicle exhaust system. Since some of
    the test sequence is repeated, a ratio of two flowrates or distances
    travelled can be very useful in highlighting abnormalities. A normalized
    ratio has become a valuable tool because it is not affected by the
    magnitudes of the parameters, which may normally differ; the ratio of
    these different magnitudes produces a value which lies within a narrow
    band. The ratio of highway to city fuel economy is an example of this
    application.
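
    The two audit ideas in this paragraph, a plausibility range on keyed
    values and a normalized ratio that should fall within a narrow band, can
    be sketched as below. All limits shown (the vehicle weight range and the
    highway-to-city ratio band) are hypothetical placeholders, not EPA
    criteria, and the record layout is assumed.

```python
from dataclasses import dataclass


@dataclass
class TestRecord:
    vehicle_weight_lb: float
    city_mpg: float
    highway_mpg: float


# Hypothetical limits, for illustration only.
WEIGHT_RANGE_LB = (1_500.0, 7_000.0)
HWY_CITY_RATIO_BAND = (1.1, 1.7)


def audit(record: TestRecord) -> list[str]:
    """Return QC messages for values outside their plausible range or band."""
    messages = []
    lo, hi = WEIGHT_RANGE_LB
    if not lo <= record.vehicle_weight_lb <= hi:
        messages.append(f"weight {record.vehicle_weight_lb:,.0f} lb out of range; check keypunching")
    if record.city_mpg <= 0:
        messages.append("city fuel economy missing or non-positive")
    else:
        # Normalized ratio: insensitive to the absolute fuel economy level.
        ratio = record.highway_mpg / record.city_mpg
        lo, hi = HWY_CITY_RATIO_BAND
        if not lo <= ratio <= hi:
            messages.append(f"highway/city ratio {ratio:.2f} outside {lo}-{hi}")
    return messages


print(audit(TestRecord(vehicle_weight_lb=40_000, city_mpg=18.0, highway_mpg=26.0)))
```

    The 40,000-pound entry is flagged, while its fuel economy ratio (about
    1.44) passes.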
    
    If all data have been validated and all acceptance criteria met, the
    documentation is stored and the results are updated as valid in the
    computer data base. While this completes the processing of one test, it
    is not the end of the data validation process.
    
    COMPARATIVE MEASURES
    
    Each test alone has certain characteristics, and all tests combined have
    other useful measures. Comparative tests on large populations of vehicle
    results can highlight differences and trends that an individual test
    does not show.
    
    The manufacturer has normally tested the vehicle prior to EPA's test, so
    an independent set of data is available for comparison. The MFR/EPA
    emission differences and percent differences are calculated and stored
    in a "paired data" file. These normalized values can then be
    statistically summarized for each manufacturer group. The results of
    this analysis show the relative agreement between EPA and all individual
    manufacturers. If EPA is consistently higher or lower, a systematic bias
    may be indicated. Diagnostic tests or correlation programs can be
    performed to identify and correct the cause.
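
    A sketch of the paired-data summary: for each manufacturer, compute the
    EPA-minus-manufacturer differences and percent differences, plus a paired
    t statistic for the mean difference as one possible significance measure.
    The sign convention, the choice of the t statistic, and the record layout
    are assumptions; the text does not specify the statistics used.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def paired_summary(pairs: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize (manufacturer_result, epa_result) pairs for one manufacturer.

    Differences are EPA minus manufacturer; percent differences are relative
    to the manufacturer's value. 't' is the paired t statistic for the mean
    difference (compare |t| with a Student t critical value for n - 1 df).
    """
    if len(pairs) < 2:
        raise ValueError("need at least two paired tests")
    diffs = [epa - mfr for mfr, epa in pairs]
    pct = [100.0 * (epa - mfr) / mfr for mfr, epa in pairs]
    d_bar, s_d = mean(diffs), stdev(diffs)
    t = d_bar / (s_d / sqrt(len(diffs))) if s_d > 0 else float("nan")
    return {"n": len(diffs), "mean_diff": d_bar, "mean_pct_diff": mean(pct), "t": t}


def summarize_by_manufacturer(
        records: list[tuple[str, float, float]]) -> dict[str, dict[str, float]]:
    """records: (manufacturer, manufacturer_result, epa_result) per vehicle test."""
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for name, mfr_value, epa_value in records:
        groups[name].append((mfr_value, epa_value))
    return {name: paired_summary(p) for name, p in groups.items() if len(p) >= 2}
```

    A consistently positive mean difference with a large |t| would point to
    EPA reading systematically higher than that manufacturer.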
    
    Statistical analysis of all these data can provide the upper and lower
    limits which are used to assess the significance of a bias. Test
    conditions and equipment identifiers can be used to stratify the
    analysis for assessing whether such things as altitude differences or
    specific test sites correlate with the paired data differences.
    Finally, the data validation loop can be refined by the statistical
    determination of QC limits.
    

    -------
    QUALITY CONTROL REFINEMENTS
    
    A strong data validation program can be developed by automating many of
    the checks being made. Computerized validation and acceptance tests
    require that the data be pertinent and accessible. Use of an integrated
    data base structure can minimize manual operations, improve security,
    and assure the integrity of the data.
    
    A computerized data base can also enable the automation of screening
    programs, plotting routines, and statistical summaries. It will permit
    rapid development of more precise tools and tests which can be used in
    the data validation process.
    
    Finally, a computer data base can provide a trail for audits or requests
    for documentation.
    
    CLOSURE
    
    This paper has shown that the data validation process is not simply an
    inspection of results at the end of a test. Rather, it is a combination
    of specific individual tests and checks which, when taken as a whole,
    form the foundation for a quality control system which can provide
    documented, quantitative assurance that the "data product" of the EPA
    mobile source program is fit for use in our regulatory process.
    

    -------
    VALIDATION OF CONTINUOUS STACK MONITORING
                       DATA
                         by
                 Joseph  E. McCarley
    Emission Standards and Engineering Division
        U.S. Environmental Protection Agency
    Research Triangle Park, North Carolina  27711
    

    -------
                                        SUMMARY
           The Emission Standards and Engineering Division is currently developing
    a revised standard of performance for new steam generators. As part of this
    study, the feasibility of continuous regulation of sulfur dioxide emissions,
    as well as of a percentage reduction of sulfur from fossil fuels, is being
    evaluated. In support of this study, the Emission Measurement Branch is
    conducting continuous sulfur dioxide monitoring projects at five coal-fired
    power plants equipped with flue-gas desulfurization units. When data are
    being collected to support regulations, validation of these data is an
    important consideration.
           Prior to collecting emission data, the continuous monitoring systems
    are validated by following the procedures described in the Performance
    Specifications of 40 CFR 60, Appendix B (Performance Specification 2,
    "Performance Specifications and Specification Test Procedures for Monitors
    of SO2 and NOx from Stationary Sources").
           The monitoring data are then collected and recorded continuously from
    each emission point at least once every 15 minutes. In this study, the data
    are placed in a computer bank, printed, and then edited or validated
    manually. When data are collected during instrument malfunction,
    calibration, or plant upset conditions, plant personnel record the time
    periods of these conditions. These data are purged from the computer bank,
    and the remaining data are averaged over 1-hour, 3-hour, 8-hour, 24-hour,
    and 30-day periods. If more than one 15-minute data point in any 1-hour
    period has been determined to be invalid, the entire hour is considered
    invalid and is not included in the longer averaging periods. In summary,
    the data are edited for actual known errors, and no
    
    

    -------
    statistical validation procedures are performed.
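
    The invalidation rule lends itself to a short sketch: four 15-minute
    readings per hour, readings from malfunction, calibration, or upset
    periods purged (represented here as None), and any hour with more than one
    purged reading excluded from the longer averages. The data layout and
    function names are assumptions, and because the text does not say whether
    the 3-, 8-, and 24-hour averages are rolling or consecutive, consecutive
    block averages are used here.

```python
from statistics import mean
from typing import Optional, Sequence


def hourly_averages(readings: Sequence[Optional[float]]) -> list[Optional[float]]:
    """readings: consecutive 15-minute SO2 values, None where purged as invalid.
    Returns one value per hour, or None if more than one reading is invalid."""
    if len(readings) % 4:
        raise ValueError("expected whole hours of 15-minute readings")
    hours: list[Optional[float]] = []
    for i in range(0, len(readings), 4):
        block = readings[i:i + 4]
        valid = [r for r in block if r is not None]
        hours.append(mean(valid) if len(block) - len(valid) <= 1 else None)
    return hours


def block_averages(hourly: Sequence[Optional[float]], period_h: int) -> list[Optional[float]]:
    """Average consecutive blocks of period_h hours (e.g. 3, 8, 24, or 720),
    skipping invalid hours. Any minimum number of valid hours per period is
    not stated in the text, so any block with one valid hour is averaged."""
    out: list[Optional[float]] = []
    for i in range(0, len(hourly) - period_h + 1, period_h):
        valid = [h for h in hourly[i:i + period_h] if h is not None]
        out.append(mean(valid) if valid else None)
    return out
```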
          Further details of these monitoring projects are contained in the
    following report and the references therein: Kelly, W., and C. Sedman.
    First Interim Report: Continuous Sulfur Dioxide Monitoring at Steam
    Generators. EMB Project No. 77SPP23A, Emission Standards and Engineering
    Division, Office of Air Quality Planning and Standards, U.S. Environmental
    Protection Agency, Research Triangle Park, North Carolina 27711, June 1978.
    54 pp.
          Future plans for evaluating validation procedures include (1)
    application of more automatic recording and data validation
    instrumentation, and (2) quality control steps to assure the accuracy of
    long-term emission monitoring.
    

    -------
             SCREENING CHECKS USED BY THE
               NATIONAL CLIMATIC CENTER
                           by
                    William E.  Klint
    National Oceanic and Atmospheric Administration
               National Climatic Center
                    Federal Building
           Asheville, North Carolina  28801
    

    -------
    
    
                                       ABSTRACT
    
    Current processing is discussed with emphasis on validation checks and
    the manual interface. The need for an automated quality control program
    is recognized, and plans for one are presented. Plans for a new modular
    surface edit are also presented, along with a new quality control
    procedure using an interactive graphics system. Data management is
    addressed through a Data Dictionary/Data Base Management system.
    

    -------
    The National  Climatic Center is:
    
         Responsible for receipt, processing, archiving, and publication of
         climatological data. Coordinates the analysis of past meteorological
         data for NOAA, other Government agencies, and the public to
         accommodate user requirements for climatological data through
         special studies and statistical analyses. Manages the national
         program of climatological data recall and works closely with the
         military in meeting this special requirement. Provides facilities,
         data processing support, and expertise, as requested, for World
         Meteorological Organization programs (e.g., GARP and GATE). Assists
         in training programs to familiarize the representatives of
         developing countries with modern meteorology and coordinates
         (through World Data Center-A) the international exchange of
         climatic data.
    
    Of the various types of incoming data, paper forms predominate. These
    must then be keyed to digital form for processing, an effort that
    entails keying approximately 37 million bytes of data per month. Because
    of a funding cutback some years ago, only three-hourly surface
    observations (eight observations per day) are digitized.
    
    At the present time, processing is done in two modes: a machine edit and
    a manual interface.
    
    The machine edit consists of data verification, a range limit check, a
    cross-field consistency check, a continuity check, and appropriate flags
    to "verifiers."
    
    Data verification is a simple machine check of whether data have in fact
    been keyed into the appropriate field and, if the field is coded, whether
    the entry is a legitimate code.
    
    The range limit check determines whether the value in a particular field
    falls within an appropriate range. At present, however, there is only
    one range limit per field, which in and of itself causes many
    unnecessary "kickouts."
    
    The cross-field consistency check looks at the entries in related fields
    for consistency (e.g., clouds and precipitation).
    
    The continuity check applies a range limit to the change in certain
    fields between the previous observation and the one being checked.
    
    Finally, if any of the above checks fails, the appropriate flags are
    printed out for return to the "verifiers" and appropriate action.
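
    The four machine-edit steps described above can be sketched as one
    function applied to an observation and its predecessor. Everything
    specific in it is hypothetical: the field names, the sky codes, the single
    range per field (mirroring the current one-range-per-field limitation),
    the continuity limits, and the clear-sky/precipitation consistency rule.

```python
from typing import Optional

# Hypothetical field definitions, for illustration only.
VALID_SKY_CODES = {"CLR", "SCT", "BKN", "OVC"}
RANGE_LIMITS = {"temp_f": (-60.0, 130.0), "pressure_mb": (900.0, 1070.0)}
MAX_CHANGE_3H = {"temp_f": 30.0, "pressure_mb": 10.0}


def machine_edit(obs: dict, previous: Optional[dict] = None) -> list[str]:
    """Return flags for one observation; an empty list means it passes."""
    flags = []
    # 1. Data verification: field present and, if coded, a legitimate code.
    for field in ("sky", "temp_f", "pressure_mb", "precip_in"):
        if obs.get(field) is None:
            flags.append(f"{field}: missing")
    if obs.get("sky") is not None and obs["sky"] not in VALID_SKY_CODES:
        flags.append(f"sky: illegal code {obs['sky']!r}")
    # 2. Range limit check: one fixed range per field.
    for field, (lo, hi) in RANGE_LIMITS.items():
        value = obs.get(field)
        if value is not None and not lo <= value <= hi:
            flags.append(f"{field}: {value} outside {lo}..{hi}")
    # 3. Cross-field consistency: precipitation reported with a clear sky.
    if obs.get("sky") == "CLR" and (obs.get("precip_in") or 0) > 0:
        flags.append("sky/precip: precipitation with clear sky")
    # 4. Continuity: change since the previous observation within limits.
    if previous is not None:
        for field, max_change in MAX_CHANGE_3H.items():
            a, b = previous.get(field), obs.get(field)
            if a is not None and b is not None and abs(b - a) > max_change:
                flags.append(f"{field}: change {b - a:+} exceeds {max_change}")
    return flags
```

    Any non-empty result would correspond to the printed flags returned to
    the "verifiers."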
    
    The manual interface, because of the volume of data, consists primarily
    of a visual scan of all forms. A random sample of stations receives a
    close check of all observations. Problems with the data requiring
    corrections are handled as follows: first, the erroneous entry is
    crossed through with a blue pencil and the "correct" entry is made
    directly above the erroneous one. Second, if the observation is one
    which is normally digitized, a change form is routed to key entry.
    

    -------
    The "kickouts" from the machine edit, which were returned for action, are
    scrutinized, a decision on validity is made, and, if necessary, a
    correction is made both on the original paper form and on a change form
    to key entry.
    
    The change forms are routed to key entry for digitizing and the changes
    are again run through the machine edit.
    
    This procedure is repeated until no more errors appear.
    Once all the data "pass" the edit, they are formatted into the surface
    observation file and entered into the data bank.2
    
    It is fairly obvious that, because of the rather limited nature of these
    checks, some erroneous data slip through and are placed in the data
    bank. This fact, together with the growing volume of incoming data in
    digital form and the recognition that a closer to "real time" edit is
    both possible and needed, is forcing changes upon NCC.
    
    Although the basic processing stages of machine edit and manual interface
    will remain the same, the nature of each will take on a new and challenging
    meaning.
    
    With the introduction of the new National Weather Service Automation of
    Field Operations and Services (AFOS) system, the NCC will acquire a
    near-real-time capability to collect data in digital form. These data,
    together with manuscript forms, create a real need for dual processing.
    
    The edit computer program is being completely rewritten, because in its
    present form it is difficult to maintain. The new program is modular, and
    many functions previously performed manually are designed into it.
    
    The creation of a Master Station Inventory (MSI) will completely change
    the complexion of the edit program. The basic edit routines remain the
    same, with the following changes:
    
         1.  The verification step will now be checked against the MSI for
    validity. Previously, some entries were flagged to a "verifier" as
    missing regardless of whether they were actually missing or simply not
    observed at that particular station. The MSI will now be checked for
    proper disposition before an error flag is returned, thus relieving the
    "verifier" of this task.
    
         2.  The MSI will provide a complete set of range limits for every
    field at every individual station, preventing unnecessary "kickouts" of
    "good" data and allowing a narrower range check on each field.
    
         3.  Cross-field consistency checks will remain basically the same,
    but with the checks mentioned above they should be more reliable.  They
    have been "beefed up" to include tighter checks and checks previously
    left to the "verifiers."
    
         4.  If an error is isolated and a flag is called for, a check is
    first made with the MSI to see if a mathematical  relationship exists.
    If one does, a new value is calculated and entered beside the original
    with an appropriate flag.
    
    If an error is isolated and no mathematical relationship exists,  the
    appropriate flag is issued and the observation queued for scrutiny by a
    "verifier."  All observations changed by a "verifier" are automatically
    re-entered into the edit program.
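    The MSI-driven edit in items 1, 2, and 4 can be sketched as follows.  This
    is a minimal illustration under stated assumptions, not NCC's edit program.
    The `StationInventory` record, its field names, and the example
    relationship (Fahrenheit temperature recomputed from Celsius) are all
    hypothetical.  The cross-field checks of item 3 are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical Master Station Inventory (MSI) entry; the real MSI layout is not
# described in the text, so these field names are illustrative only.
@dataclass
class StationInventory:
    observed_fields: Tuple[str, ...]          # elements this station actually observes
    limits: Dict[str, Tuple[float, float]]    # station-specific (low, high) range limits
    derivations: Dict[str, Callable[[dict], float]] = field(default_factory=dict)


def edit_observation(obs: dict, msi: StationInventory):
    """Edit one observation against its station's MSI entry.

    Returns (corrections, flags, needs_verifier).  The original values in
    `obs` are never modified; corrections are kept beside them.
    """
    corrections: Dict[str, float] = {}
    flags: List[Tuple[str, str]] = []
    needs_verifier = False
    # Elements not in observed_fields are skipped, so a blank for an element
    # the station does not observe never raises a flag (item 1).
    for name in msi.observed_fields:
        value: Optional[float] = obs.get(name)
        low, high = msi.limits[name]
        if value is not None and low <= value <= high:
            continue  # passes the station's own range check (item 2)
        # Missing or out of range: try a mathematical relationship first (item 4).
        derive = msi.derivations.get(name)
        if derive is not None:
            try:
                corrections[name] = derive(obs)
                flags.append((name, "computed"))
                continue
            except (KeyError, TypeError):
                pass  # the relationship's own inputs are unavailable
        # No usable relationship: flag it and queue the observation for a verifier.
        flags.append((name, "missing" if value is None else "out_of_range"))
        needs_verifier = True
    return corrections, flags, needs_verifier


# Example: a keying error in temp_f is recomputed from temp_c; visibility is out
# of range with no relationship, so the observation goes to a verifier.
msi = StationInventory(
    observed_fields=("temp_f", "temp_c", "visibility"),
    limits={"temp_f": (-40, 115), "temp_c": (-40, 46), "visibility": (0, 100)},
    derivations={"temp_f": lambda o: o["temp_c"] * 9 / 5 + 32},
)
corrections, flags, needs_verifier = edit_observation(
    {"temp_f": 320, "temp_c": 20.0, "visibility": 150}, msi)
print(corrections, flags, needs_verifier)
# -> {'temp_f': 68.0} [('temp_f', 'computed'), ('visibility', 'out_of_range')] True
```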
    
    The verifier will work with the data through an interactive graphics
    system.  Previously the "verifier" had only the manuscript forms on which
    to base his decision.  Now he will be able to display the data in any of
    several ways, including contoured map analyses of the surrounding area.
    With this input the verifier will be able to make a better-informed
    decision on the proper disposition of questionable data.
    
    Up to this point we have discussed only a superficial edit of the
    incoming data; we have not yet looked at the inherent quality of the data
    themselves.  Until now, NCC has not been able to perform relational
    checks on the data.  With the acquisition of the Asymptotic Singular
    Decomposition (ASD) model, developed by Dr. John Jalickee of CEDDA
    (Ref. 3), NCC now has this capability.
    
    In its simplest terms the ASD model uses the method of least squares on
    a data matrix.
    
    The first step is to calculate a "characteristic" vector for the matrix.
    Next, the differences between the data matrix and the fit given by that
    "characteristic" are calculated.  These differences then replace the
    matrix, and the process is repeated.
    
    When plotted, the first component (of vector magnitudes) gives a graph of
    the dominant features of the field; the second component shows the
    features of the first difference matrix; and so on.  We have found that
    for most data fields the second and third component plots are the most
    useful for validation.  By the fifth component we have usually reached
    the noise level.
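    The text does not give the ASD equations, so the following is only a
    sketch of the general idea, under one assumption.  Each "characteristic"
    is taken to be the least-squares rank-one fit of the current matrix,
    found by alternating least squares (which converges to the leading
    singular component).  The differences from that fit then become the
    matrix for the next pass.  The function names are hypothetical, and this
    is not Dr. Jalickee's ASD code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rank_one_fit(m: Matrix, iterations: int = 200) -> Tuple[List[float], List[float]]:
    """Least-squares rank-one fit m[i][j] ~ u[i] * v[j] by alternating updates."""
    rows, cols = len(m), len(m[0])
    # Start from the row with the largest norm; unlike a fixed vector, this
    # cannot be orthogonal to every row of a nonzero matrix.
    v = [float(x) for x in max(m, key=lambda row: sum(x * x for x in row))]
    u = [0.0] * rows
    for _ in range(iterations):
        vv = sum(x * x for x in v)
        if vv == 0.0:
            return [0.0] * rows, [0.0] * cols  # zero matrix: nothing left to fit
        # Best u for fixed v: u[i] = sum_j m[i][j] v[j] / |v|^2
        u = [sum(m[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        uu = sum(x * x for x in u)
        if uu == 0.0:
            return u, [0.0] * cols
        # Best v for fixed u: v[j] = sum_i m[i][j] u[i] / |u|^2
        v = [sum(m[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
    return u, v


def decompose(m: Matrix, k: int):
    """Peel off k successive components; return them and the final residual."""
    rows, cols = len(m), len(m[0])
    residual = [list(map(float, row)) for row in m]
    components = []
    for _ in range(k):
        u, v = rank_one_fit(residual)
        components.append((u, v))
        # The differences from this component become the matrix for the next pass.
        residual = [[residual[i][j] - u[i] * v[j] for j in range(cols)]
                    for i in range(rows)]
    return components, residual
```

    With real fields, the second and third components would then be plotted
    for validation, as described above.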
    
    Plotted this way, the data can be expected to show "continuity": the
    physical relationships within the field should be apparent in the graph.
    If a relationship breaks down at any point in the graph, we can assume
    bad data at that point.  This model will give NCC the capability to
    perform qualitative (relational) checks on all incoming data.
    
    A side benefit of this model is the ability to build a "normal" for each
    station.  Against this normal, problems such as instrument drift,
    miscalibration, and erroneous launch data become readily apparent in a
    trend analysis.  Once isolated, these "bad" data can be adjusted "toward"
    the normal with at least some degree of accuracy.
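    The sketch below shows one way to carry out such a trend analysis.  It
    fits an ordinary least-squares line to a station's departures from its
    "normal."  If the slope exceeds a tolerance, it subtracts the fitted line,
    removing both the drift and any constant offset.  The linear-trend model,
    the tolerance, and the function names are assumptions for illustration;
    the text does not say how NCC intends to perform the analysis.

```python
from typing import List, Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares (intercept, slope) of values against index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    slope = num / den
    return mean_y - slope * mean_t, slope


def check_drift(observed: Sequence[float], normal: Sequence[float],
                max_slope: float) -> Tuple[bool, List[float]]:
    """Flag drift and return values adjusted toward the station normal.

    Departures from the normal are fitted with a straight line; if its slope
    exceeds max_slope (units per sample), the fitted line is subtracted.
    """
    departures = [o - m for o, m in zip(observed, normal)]
    intercept, slope = linear_trend(departures)
    if abs(slope) <= max_slope:
        return False, list(observed)
    adjusted = [o - (intercept + slope * t) for t, o in enumerate(observed)]
    return True, adjusted


# Hypothetical sensor reading 0.5 units higher each day against a flat normal.
drifting, adjusted = check_drift([10.0, 10.5, 11.0, 11.5, 12.0], [10.0] * 5, max_slope=0.1)
print(drifting, adjusted)  # -> True [10.0, 10.0, 10.0, 10.0, 10.0]
```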
    
    The "verifier's" job changes somewhat under this new approach.  The
    computer edit will now do much of what the verifier did previously, most
    of which was scanning "good" data.  On arriving for duty, he sits down at
    a KCRT (keyboard cathode ray tube) console or terminal and calls up the
    flag file for his particular area of interest.  He then makes those
    judgments too delicate or volatile to have been programmed into the edit
    routine.  Once he has ruled on them, the observations are returned to the
    edit queue to be run again.  Only after an observation "passes" the edit
    program is it allowed to continue into the ASD model.
    
    The results of the ASD run are displayed in one of several graphic modes
    for verification.  Because the normal range of this display is only
    +0.5 to -0.5 units, its power and usefulness become apparent.  Remember
    that this is a display of the second or third component of the data
    field and, since we are working with differences, it should fit nicely
    within this range.  The "outliers" stand out here with striking
    notoriety.  The verifier's task is now to "replace"* the "outliers" with
    more reasonable values.  This can be done simply by aligning his cursor
    or light pen by eye with the trend of the curve, or by having the
    computer do a best fit.  Although this sight alignment appears to be a
    rather gross correction, the error becomes quite tolerable once the
    component is "blown up" back into the original field.
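    The "computer best fit" option might look something like the sketch
    below.  It flags component values outside the ±0.5 display range and
    proposes replacements by straight-line interpolation between the nearest
    unflagged neighbors.  Interpolation is a stand-in chosen for simplicity;
    the text does not specify the fitting method.  In keeping with the
    footnote, the original values are left untouched and returned alongside
    the proposals.

```python
from typing import List, Sequence, Tuple


def replace_outliers(component: Sequence[float], limit: float = 0.5
                     ) -> Tuple[List[float], List[int]]:
    """Propose replacements for values outside [-limit, +limit].

    Returns (proposed, flagged).  `component` itself is not modified; each
    flagged value is interpolated linearly between the nearest unflagged
    neighbors, or copied from the one neighbor available at either end.
    """
    good = [i for i, x in enumerate(component) if abs(x) <= limit]
    flagged = [i for i, x in enumerate(component) if abs(x) > limit]
    proposed = list(component)
    for i in flagged:
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None and right is None:
            continue  # nothing to fit against: leave it for the verifier
        if left is None:
            proposed[i] = component[right]
        elif right is None:
            proposed[i] = component[left]
        else:
            w = (i - left) / (right - left)
            proposed[i] = component[left] + w * (component[right] - component[left])
    return proposed, flagged


second_component = [0.10, 0.20, 3.00, 0.40, 0.30]
proposed, flagged = replace_outliers(second_component)
print(flagged)  # -> [2]
```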
    
    All original data are kept, with corrections and appropriate flags
    entered beside them, before being incorporated into the NCC data bank.
    This lets the user choose either value.
    
    The NCC is currently planning a database environment.  This quality
    control process will allow us to place only QC'd data of high
    reliability into the database, assuring the user of quality data.
    Another side benefit of ASD is its potential for compacting storage.  The
    set of components for a data field can be "blown up" to explain 99% of
    the original field, so NCC can store just the components and expand them
    back to the "original" field on output.  This will reduce the necessary
    storage by many orders of magnitude.
     *"Replace" does not imply that the original value is destroyed.  It
     will be maintained and output along with the corrected value.
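    The storage saving from keeping components rather than the full field is
    simple arithmetic.  Assume each component is stored as a pair of vectors,
    one value per row and one per column.  Then an m x n field needs k(m + n)
    stored numbers for k components instead of mn.  The sketch below computes
    that ratio and expands stored components back into a field.  The field
    sizes are hypothetical and are not NCC figures.

```python
from typing import List, Sequence, Tuple

Component = Tuple[Sequence[float], Sequence[float]]  # (row vector u, column vector v)


def expand(components: Sequence[Component], rows: int, cols: int) -> List[List[float]]:
    """Expand ("blow up") stored components into a field: the sum of u[i] * v[j]."""
    grid = [[0.0] * cols for _ in range(rows)]
    for u, v in components:
        for i in range(rows):
            for j in range(cols):
                grid[i][j] += u[i] * v[j]
    return grid


def compaction_ratio(rows: int, cols: int, k: int) -> float:
    """Numbers stored for the full field divided by numbers stored for k components."""
    return (rows * cols) / (k * (rows + cols))


# Hypothetical example: 1,000 stations x 365 daily values, keeping 5 components.
print(round(compaction_ratio(1000, 365, 5), 1))  # -> 53.5
```

    The ratio grows as the field becomes larger relative to the number of
    components kept.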
    
                                     REFERENCES
    For your convenience, copies of the following three references are
    included herein, starting on the next page.  The generosity of Mr. Walter
    James Koss, Primary Data Branch, EDS, Asheville, NC 28801, in supplying
    these references for publication in these Proceedings is appreciated.
    
    1.  Barton, G.  and Saxton, D.   The  Role of Interactive Computer Systems in
        Data Processing at CEDDA.   Environmental  Data Service  (EDS) Magazine,
        pp. 10-14.
    
    2.  Edit Procedures -  Surface Observational  Data.  Surface Section, Primary
        Data Branch, National  Climatic  Center, Asheville, NC 28801.  August 1975,
        31  pp.
    
    3.  Jalickee, J., et al.  Validation, Compaction, and Analysis of Large
        Environmental Data Sets.  Environmental Data Service (EDS) Magazine,
        pp. 3-9.
    

    -------
    The Role of Interactive Computer Systems in Data Processing at CEDDA
    By Gerald Barton and David Saxton
    Introduction
    The Environmental Data Service's Center for Experiment Design and Data
    Analysis (CEDDA) processes enormous volumes of interdisciplinary
    environmental data collected in major field research programs and
    projects, such as the recent GARP (Global Atmospheric Research Program)
    Atlantic Tropical Experiment (GATE).  As an example, CEDDA received 1,700
    miles of magnetic tape data from the four U.S. ships (Researcher,
    Oceanographer, Dallas, and Gillis) in GATE's primary array.
      CEDDA's goal is rapid processing, to provide the data to the scientific
    community as soon as possible after the completion of a field experiment.
    One necessary step is editing the data to remove invalid readings.
    CEDDA's current turnaround time for interactive editing of a data file is
    1 to 3 weeks.  It is hoped that a new interactive computer system CEDDA is
    currently assembling will cut this time to 1/2 hour or less.
    Data Collection
    During field experiments, environmental data are recorded continuously
    by instruments on ships, towers, buoys, balloons, and other platforms at
    sample rates from 10/second to 4/second.  A wide variety of specially
    calibrated sensors measure such variables as temperature, dewpoint,
    pressure, wind, radiation, salinity, and rainfall.  The outputs are
    processed and stored on multitrack magnetic tapes.  One track is used
    exclusively for time, so that the exact Julian date, hour, minute,
    second, and 1/10 second of each sample are known.
      To augment this high-resolution taped data, each major sensor subsystem
    output is supplemented by logs, stripcharts, and optical mark cards that
    record calibration checks, sensor changes (with all serial numbers), and
    special events, such as the beginning or end of an instrument cast.
      The completeness of the data sets and their security are matters of
    prime concern.  At the end of a phase of a field experiment, or at other
    convenient intervals, all tapes, logs, cards, etc., are shipped to CEDDA
    using the safest methods available.  During GATE, CEDDA had a data
    manager on each of the 4 U.S. ships in the primary (B-scale) array and
    also at the GATE Operations Control Center to ensure the completeness and
    security of the transfer process.
    
    Current Processing  System
    At CEDDA, the incoming analog data tapes are first checked for recording
    quality and completeness.  Next, a minicomputer converts the analog data
    to digital form, producing a digital tape.  Playback is 32 times faster
    than field recording, so an 8-hour field tape is transcribed in about 15
    minutes.  During the minicomputer processing, an additional computer time
    word is added to each sample to control subsequent data processing
    programs and to preclude the loss of any sensor data due to malfunction
    or noise in the field time system.
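    The 15-minute figure follows directly from the 32:1 playback ratio:

```latex
t_{\text{playback}} = \frac{t_{\text{field}}}{32}
  = \frac{8\ \text{h} \times 60\ \text{min/h}}{32}
  = \frac{480\ \text{min}}{32}
  = 15\ \text{min}
```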
      Processing next proceeds to one of NOAA's larger computer systems,
    where data sets are organized by the component systems used on the data
    collection platform, e.g., Oceanographic Data Set or Rawinsonde Data Set.
    Analysis of these data sets requires graphical display of the data as
    time-series plots and graphs and as frequency distribution plots.
      The editing features of the current computer processing system can be
    thought of as an interactive graphics system in which the time required
    for interaction varies up to a week or more.  For optical mark cards,
    reaction is rapid, since all event cards may be listed in chronological
    order and cards may be inserted, deleted, or corrected using a list-edit
    program in the minicomputer.  However, for high-resolution meteorological
    or oceanographic data, which must be transformed to engineering units and
    properly scaled, display for editorial review is currently limited to a
    microfilm graphics subsystem located in nearby Suitland, Md.  For these
    data sets the time required for interaction includes the transport of
    data tapes, generation of microfilm graphics in batch mode at the remote
    site, transport of the microfilm on the return loop, review using
    microfilm readers, testing of automated corrections when required, and
    the recycling to display.
    
    New Processing  System
    CEDDA is currently assembling the hardware and software necessary to
    implement an interactive computer system that will allow the data
    editing and updating functions to be performed in a single processing
    step (real time).  The main components of the system will remain a
    Digital Equipment Corporation (DEC) PDP-11/50 minicomputer and an IBM
    360/65.  It will be possible to access data on the IBM 360/65 through the
    PDP-11 or through terminals.  The PDP-11 will have a graphics subsystem
    that will take less than 30 minutes to perform the functions of the
    current microfilm subprogram.

    [Figure: CEDDA's proposed interactive computer configuration.  A
    PDP-11/50 (184K bytes; floating-point hardware, line frequency clock,
    programmable real-time clock, RSX-11D operating system) with a DECwriter
    terminal, an optical reader, decommutation interfaces, auto-answer
    acoustic coupler interfaces, a high-speed paper tape reader/punch, a
    Versatec printer-plotter, 9-track tape drives (800/1600 BPI), two
    40-million-byte disks, a SUPERBEE keyboard cathode ray tube terminal, a
    Centronics printer, laboratory peripheral A/D hardware, and a future link
    to the IBM 360.]

    [Figure: CEDDA's proposed interactive graphics subsystem configuration.
    The PDP-11/50, linked to the IBM 360/65, drives a RAMTEK display (256
    levels) with a color TV monitor, two monochrome TV monitors, pictorial
    hard copy (16 shades), two keyboards, two track balls, two pencil and
    tablet units, and a video tape recorder.]
      The major features and components of the interactive system are:
    (1)  Access to the IBM 360/65 time-sharing facilities via a keyboard
    cathode ray tube (KCRT) terminal, an ASR-33 teletype terminal, or the
    PDP-11/50 minicomputer.
    (2)  Input terminals to the PDP-11/50, including an LA-30 DECwriter
    terminal, a KCRT, and two dial-in terminal interfaces for use with
    remote terminals.
    (3)  A graphics subsystem for the PDP-11/50.
    (4)  DEC's RSX-11D real-time, priority-driven, multitasking executive
    system for the PDP-11/50.
      With these features, a user can access the 360 to perform mathematical
    computations or generate data sets.  He can look at the data and analyze
    them in real time on the interactive graphics subsystem.  When he finds
    errors, he can immediately correct the data and display them again on
    the graphics system to validate the corrections.  He can then archive the
    updated data set for future use.
    
    Interactive Graphics Capabilities
    The interactive graphics subsystem, designed and assembled by Operating
    Systems Incorporated of Tarzana, California, consists of a RAMTEK
    graphics display system interfaced with CEDDA's PDP-11/50 computer by an
    appropriate switching network.  Features of the full system (only part
    of which is required for the data editing job) include two black and
    white TV monitors, one color TV monitor, two data entry keyboards, two
    pencil and tablet systems, two track ball cursor controls, a television
    tape recorder with microphone input, a TV camera with zoom lens, an
    analog to digital converter, eight planes of memory that allow up to 256
    shades of gray or color, and a crosspoint switching network that allows
    mixing control of inputs and outputs.
      A simple use of an interactive graphics system is the editing of raw
    data displayed as a time-series analysis or plot.  For example, a single
    parameter, such as temperature, is plotted at its highest resolution in a
    time sequence covering several hours or days.  Visual inspection of the
    data may reveal large errors where the sensor or telemetry system failed.
    To correct these larger errors, a window edit program might be tested,
    with all "good" values of the parameter constrained to fit between the
    upper and lower limits of the window.  Diurnal and other trends might be
    superimposed on the data plot.  The limits and trends can be displayed
    with the raw data to show which data points should be edited out.
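    A window edit of the kind described can be sketched in a few lines.  The
    window limits and the optional trend function (here a hypothetical
    diurnal curve supplied by the analyst) are assumptions of the sketch.
    Samples falling outside the trend-shifted window are reported so they can
    be shown against the raw data.

```python
import math
from typing import Callable, List, Optional, Sequence


def window_edit(values: Sequence[float], lower: float, upper: float,
                trend: Optional[Callable[[int], float]] = None) -> List[int]:
    """Return indices of samples outside [lower, upper], shifted by trend(i) if given."""
    flagged = []
    for i, x in enumerate(values):
        offset = trend(i) if trend is not None else 0.0
        if not (lower + offset <= x <= upper + offset):
            flagged.append(i)
    return flagged


def diurnal(hour: int) -> float:
    """Hypothetical 24-hour temperature cycle, 5 deg C in amplitude."""
    return 5.0 * math.sin(2 * math.pi * (hour - 9) / 24)


# Hypothetical hourly temperatures (deg C) with one telemetry spike at hour 3.
temps = [16.0, 15.5, 15.2, 55.0, 15.4, 16.1]
print(window_edit(temps, 10.0, 25.0, trend=diurnal))  # -> [3]
```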
      A slightly more sophisticated version of this time-series plot would
    compute running means over minutes or hours and show which of the
    high-resolution points fall outside two or three standard deviations.
    Complex curves using higher order polynomials can be fitted to
    time-series data, both before and after various editing passes, to
    eliminate, insofar as possible, "noise" from the data.  Various filters
    and smoothing functions also can be tested and evaluated before going
    into an Automatic Data Processing (ADP) production mode.
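    The running-mean check might be sketched as below.  Each sample is
    compared with the mean and standard deviation of the samples around it.
    The sample itself is left out of its own statistics so that a single
    spike does not inflate the deviation it is tested against.  That choice,
    the window size, and the default of three standard deviations are all
    design choices of this sketch.

```python
import statistics
from typing import List, Sequence


def sigma_flags(values: Sequence[float], window: int, k: float = 3.0) -> List[int]:
    """Flag samples more than k standard deviations from the running mean.

    The running statistics use up to window // 2 samples on each side of the
    sample being tested, excluding the sample itself.
    """
    half = window // 2
    flagged = []
    for i, x in enumerate(values):
        neighbors = list(values[max(0, i - half):i]) + list(values[i + 1:i + 1 + half])
        if len(neighbors) < 2:
            continue  # too few neighbors for a standard deviation
        mean = statistics.mean(neighbors)
        sd = statistics.stdev(neighbors)
        if sd > 0 and abs(x - mean) > k * sd:
            flagged.append(i)
    return flagged


# Hypothetical high-resolution temperatures with one bad sample.
values = [10.0, 10.2, 9.9, 10.1, 30.0, 10.0, 9.8, 10.1, 10.2]
print(sigma_flags(values, window=6))  # -> [4]
```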
      In general, CEDDA's new interactive graphics system will make it
    possible to display two or more curves simultaneously, using color,
    intensity, or blinking characteristics to distinguish, for example,
    between a standard and a trial edit scheme or between different
    parameters.  It will provide the capability to produce hard-copy
    documentation of both the trial programs as they progress during a test
    and the data sets used.
      A more demanding requirement of an interactive graphics system is the
    ability to display and operate on digitized field data.  An example of
    this type of data is a digitized radar picture.  Under the control of an
    interactive graphics system, the analyst should be able to select and
    display a radar picture, to rotate and rescale it to a standard grid
    size, to enhance the digitized increments by contours or false color
    transfers, to overlay and compare it with the previous picture, and to
    display only those points from the two pictures whose change exceeds
    some threshold value.  Similarly, the analyst should be able to display
    the overlap portion of digitized radar pictures from two locations and
    to scale and normalize these independently so that compatibility is
    established on common echo systems.
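    Showing only the points whose change exceeds a threshold reduces to a
    cell-by-cell comparison once both pictures are on the same standard grid.
    The sketch below assumes the rotation and rescaling have already been
    done, and that each picture is a list of rows of echo intensities.  The
    function name and data layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def changed_points(previous: Grid, current: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of cells whose value changed by more than threshold."""
    if len(previous) != len(current) or any(
            len(p) != len(c) for p, c in zip(previous, current)):
        raise ValueError("pictures must be on the same standard grid")
    return [(i, j)
            for i, (row_p, row_c) in enumerate(zip(previous, current))
            for j, (p, c) in enumerate(zip(row_p, row_c))
            if abs(c - p) > threshold]


earlier = [[0, 1, 2], [2, 3, 3]]
later = [[0, 5, 2], [2, 0, 3]]
print(changed_points(earlier, later, threshold=2))  # -> [(0, 1), (1, 1)]
```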
      A further refinement is the addition of a TV-type scanner so that
    analog material can be rapidly digitized at high resolution and then
    handled with all the capabilities of the interactive graphics system.
    For example, a satellite visual range photograph could be scanned and
    digitized and then displayed with a radar picture covering the same
    area.  Specific rainfall rates from surface observations could be
    overlaid on the same display so that some integration of areal rainfall
    amounts would be immediately available.
      An interactive graphics system provides the ability to overlay data
    from different platforms or different systems.  For example, the
    temperature and vertical velocity from sensors at several levels on a
    tethered balloon system could be compared by an analyst for coherence
    and lags as convective plumes are sampled.  Dropsondes (atmospheric
    soundings from aircraft) could be graphically superimposed on
    simultaneous radiosonde soundings from ships.  Spectra taken by
    instrumented aircraft during ship flybys can be compared with
    high-resolution data recorded on board each ship.
      CEDDA plans to have the new interactive computer system in operation
    by late 1975.  By that time, implementation of the graphics subsystem
    should include the work done in the current COM cycle.  Future CEDDA
    applications of the graphics system will include programs that allow
    display of radar or satellite pictures in multicolors or up to 256
    shades of gray using the tape-recording features of the graphics system.
    It should be possible to construct time-motion pictures of changing
    weather features.  Also envisioned is the capability to display slices
    through 3-D models of weather systems.  CEDDA currently has analysis
    programs that allow an analyst to change parameters in a weather model.
    The real time operation of the graphical display should allow the
    scientist to experiment with parameters that he may never have had the
    opportunity to look at previously.
      It can be seen from the above examples that an interactive graphics
    system has broad applicability, extending from program design and test
    through all stages of data reduction and processing to scientific data
    analysis.  In addition, interactive graphics provides programmers and
    analysts with the ability to see the data move through programs from
    recorded voltages on multiple channel tapes until they become validated
    meteorological or oceanographic data suitable for permanent archival and
    dissemination to the user community.

    The Authors

    GERALD BARTON, Chief of CEDDA's Computer Systems Branch, has a B.S.
    degree in Geophysics from Pennsylvania State University and an M.A. in
    Geological Science from the University of Texas.  Before coming to
    CEDDA, he worked for ten years with the U.S. Naval Oceanographic Office
    as a geophysicist.  His early association with the Oceanographic Office
    included gravity survey cruises in the USS Archerfish, a research
    submarine, in the Western Pacific and off the east and west coasts of
    the United States.  From 1967 through January of 1974, when he joined
    CEDDA, Gerry spent most of his time working in computer programming,
    systems design, and the processing of gravity and geodetic data, to
    determine, among other things, the deflection of the vertical, or
    "which way is up."

    DAVID SAXTON joined CEDDA as Chief of the Operations Division in April
    1974, following a 30-year career in the Air Weather Service which took
    him to England, France, Germany, and Japan.  Dave has a B.S. degree from
    the University of Michigan and an M.S. from the University of Chicago.
    During World War II, he served as an Air Force weather forecaster in
    Europe.  After the war and a year of civilian/student life, he was
    recalled to active duty and assigned to the joint
    Weather Bureau/Army/Navy Weather Central in Washington, D.C.
    Subsequently, he was posted to the Tokyo Weather Central, then to the
    USAF Weather Central in Suitland, Md., later moving with that
    organization to Offutt AFB, Nebraska.  In 1961 he was assigned as Chief
    of the Strategic Air Command Weather Support Center in High Wycombe,
    England.  Four years later he was assigned to Air Weather Service Hqs.,
    Scott AFB, Illinois, as Chief of AWS' Computer Techniques Division.  In
    1967, Dave returned to Offutt, now the Air Force's Global Weather
    Central, as Chief, Development Division, and later Chief of Operations.
    In 1971 he went to Hickam AFB, Hawaii, as Chief of Operations Division,
    Headquarters, First Weather Wing.  Retiring from the military in March
    1974 (with the Legion of Merit), he joined CEDDA the following month.
    

    -------
                        EDIT PROCEDURES
                  SURFACE OBSERVATIONAL DATA
    Contents                                                     Page
    
    Card Images Keyed                                              1
    Procedures                                                     2
    No. 1 Card Edit                                                4
        Psychrometric Check                                        4
        Limiting Range of Variability                              6
        Wind, Weather, Temperature, and Visibility                 7
        Cloud Coding                                               9
        Clouds and Obscuring Phenomena                            11
        Explanation of Edit Flags                                 16
    Visual Checking of Records                                    19
    No. 3 Card Edit                                               19
        Machine Computations                                      24
    Precipitation Data Card Images                                26
        Checking Procedure - Hourly Precipitation                 27
        Checking Procedure - Extreme Precipitation                28
        Maximum Short Period Precipitation                        29
                        Surface Section
                      Primary Data Branch
                   National Climatic Center
                    Asheville, N. C. 28801
                          August 1975
    

    -------
    [Figure: Surface observation records processing (NWS, FAA, Navy, and
    land stations).  Flowchart of records passing among TSB, FOB, and ADPSD.
    TSB receives manuscript forms and charts from NWS and FAA stations,
    pre-edits them, and indicates keying instructions.  ADPSD Data Entry keys
    the data on tape, and the Operations Section organizes and edits it.  FOB
    reviews the edit, corrects listings and forms, and prepares discrepancy
    reports.  ADPSD keys the corrections and re-edits as often as necessary
    to obtain clean data, then runs the LCD COM copy for printing.  Annual
    LCD data tables are then prepared, printed, and distributed, and the
    records are microfilmed and archived.]
    -------
                               EDIT PROCEDURES
                          SURFACE OBSERVATIONAL DATA
     I.     Introduction
    
           A.  Surface records are received at NCC for processing and quality
               control to produce several routine summaries by machine methods
               from taped data.  Processing includes keying, verification,  and
               quality control procedures.  After processing, records and sum-
               mary products are archived at the NCC.
    
       B.  A joint machine edit program for a portion of the hourly obser-
           vations was developed by EDS and AWS.  Where the two versions
           differ, only the EDS version is described in this outline.
    
           C.  Data are keyed on magnetic tape.  If wet bulb temperature and
               relative humidity values are not in the basic data as keyed,
               machine computations of these values are entered on tape.
    
           D.  The taped data are machine edited, corrected, and used in a
               number of machine programs producing various monthly and annual
               summaries.
    
    II.     Card Images Keyed
    
       A.  WBAN No. 1 card - Hourly Surface Observations.  This image is
           keyed only for the hours corresponding to 3- and 6-hourly
           synoptic times in LST for NWS, FAA, and Navy stations.
    
           B.  WBAN No. 3 card - Summary of Day.  This image is keyed from the
               summary blocks of Form MF1-10B, the B-16 or, in a few cases,
           from the F-6.  For FAA stations, the form is MF1-10C.  In
           general, this image is not used when the station program is
           such that a summary from approximately midnight to midnight is
           not possible.
    
           C.  The precipitation card series is:
    
               1.  Hourly precipitation - 2 images ( 1 & 2 keyed in col. 12)  for
                   each day of the month having precipitation and for the last
                   day of the month with or without precipitation.
    
               2.  Maximum short period precipitation, per month - 2 images
                (1 & 2 keyed in col. 10) for each station per month, showing
                   maximum amounts for time intervals of 5 to 180 minutes.
    
               3.  Maximum 24-hour amounts, per month - 1 image (4 in col.  12)
                   is keyed showing the greatest precipitation and date(s),
                   greatest snowfall and date(s), and the maximum snow depth and
                   date(s) .
    
                    (a)  When the value is zero  (0), date is left blank.
    
    

    -------
    III.   Procedures
       A.   A scan edit of the forms is made, and keying instructions
           applicable to the station program are indicated on the
           station folder.
    
           B.   Data Entry Section keys data on tape.
    
           C.   Operations Section transfers keyed data to computer tapes by
               record type.
    
               WBAN No. 1 images, hourly observations,  are  placed on two tapes.
    
               1.  Tape No. 1 includes NWS (except Antarctica)  and FAA stations.
    
               2.  Tape No. 2 includes NWS Antarctica and Navy stations.
    
           3.  The edit program provides for priority editing of tape No. 1,
               whose stations are divided into two groups.
    
                   a.  Group No. 1, stations in the LCD program, is edited in
                       two lots - first and second cutoffs.  The first cutoff
                       is made at the discretion of the Chief, Surface Section,
                       when 75 to 90 percent of the records for the month are
                       available; remaining records constitute the second cut-
                       off.  Records received unduly late can be held for
                       processing with data for the next month.
                   b.  Group No. 2, stations not in the LCD program, is usually
                       processed after completion of group No. 1.
    
           D.   WBAN No. 1 records are edited according to the station's observa-
               tional program using a reference tape containing the station WBAN
               Number, Name, Elevation, Psychrometric  Pressure Table, and Obser-
               vational Pattern.
    
               1.  The observational pattern is designated by assignment of nu-
                   meric values to fields in the card image and use of the sum of
                   the field values applicable to the station for each hour as a
                   control of the machine tests to be made.
    
                Field                   Value       Card Image Columns
                Ceiling                     1              14-16
                Sky Condition               2              17-20
                Visibility                  4              21-23
                Wea. & Obstruction          8              24-31
                S. L. Pressure             16              32-35
                Dry Bulb Temp.             32              47-49
                Dew Point Temp.            64              36-38
                Wind Dir. & Speed         128              39-42
                Station Pressure          256              43-46
                Wet Bulb Temp.            512              50-52
                Relative Humidity        1024              53-55
    
    

    -------
    Field                Value       Card Image Columns
    
    Total Sky Cover       2048                56
    Cloud Layers          4096               57-58
    Total Opaque          8192                79
    
2.  The observational pattern is keyed in two cards, as illustrated
    in Fig. 1:  the station WBAN number in cols. 1-5, the first
    12 hours LST of the day in cols. 7-11, 13-17, etc., in the card
    identified by a 2 in col. 80, and the last 12 hours in the card
    identified by a 3 in col. 80.  The observations are sorted from
    the original tape into chronological day and hour order and
    edited, and only one observation for each hour (the first on the
    original tape if there are multiple entries) is transferred to
    another tape (called the sorted tape).
    
3.  Only the records questioned in the edit are listed.  Complete
    data, keyed and computed, in a questioned record are listed
    on format paper (Fig.Sa) with triple spacing.  Appropriate
    flags appear on the line above the data in the first column
    of the field(s) questioned.  Field corrections are entered
    on the second line above the data for keying.
    a.  An asterisk "*" indicates inconsistency.
    b.  An ampersand "&" indicates data not in the station's
        program, except that, if there is an inconsistency, the
        "*" flag instead of the "&" will appear.
        Observations not in the station's program are edited
        as though all fields were required.
c.  "DUP1," "DUP2," etc., are listed to indicate duplicates
        up to three.  All duplicates are edited, but only the
        first observation on the original tape is transferred to
        the sorted tape.
    d.  "MSG" above the day and hour indicates an observation in
        the edit pattern is missing.  "- -" for the hour indicates
        the entire day is missing.
    
4.  An inventory listing (Fig. 3g) at the end of the edit listing
    for each station shows all hours for which observations are
    on tape with the total number of observations on the tape
    for the month at the end of the inventory listing.
    a.  "01" printed under an hour indicates an observation with
        the cloud field keyed.
    b.  "02" indicates an observation without the cloud field
        keyed.
    

    -------
[Fig. 1: punched-card images of the two observational-pattern cards, with the pattern value for each hour keyed in successive five-column fields.]
    -------
               5.  Corrections for updating the tape are keyed on a "correction
                   card" image by fields or by keying a complete No. 1 card image
                   for missing observations or those having numerous field errors.
                   Following the updating of the tape, another edit is made in-
                   cluding the inventory listing.
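A minimal Python sketch of the pattern-controlled sorting and flagging described above follows. It is an illustration under stated assumptions, not the NCC program: observations are modeled as (day, hour, record) tuples, the function names are hypothetical, the second entry for an hour is assumed to be the one flagged DUP1, and the name of the value-1 field (ceiling) is inferred from its card columns 14-16. Because every field value is a distinct power of two, an hour's pattern sum can be decoded with bitwise tests.

```python
from collections import defaultdict

# Field values from the observational-pattern table. The name of the
# value-1 field ("Ceiling") is inferred from its card columns 14-16.
FIELD_VALUES = {
    "Ceiling": 1,
    "Sky Condition": 2,
    "Visibility": 4,
    "Wea. & Obstruction": 8,
    "S. L. Pressure": 16,
    "Dry Bulb Temp.": 32,
    "Dew Point Temp.": 64,
    "Wind Dir. & Speed": 128,
    "Station Pressure": 256,
    "Wet Bulb Temp.": 512,
    "Relative Humidity": 1024,
    "Total Sky Cover": 2048,
    "Cloud Layers": 4096,
    "Total Opaque": 8192,
}

def required_fields(pattern_value):
    """Decode an hour's pattern sum into the fields that must be present."""
    return [name for name, bit in FIELD_VALUES.items() if pattern_value & bit]

def sort_and_flag(observations, hourly_pattern, days_in_month):
    """observations: (day, hour, record) tuples in original-tape order.
    hourly_pattern: hour (LST) -> pattern sum, 0 meaning not in program.
    Returns the sorted tape and a list of (day, hour, flag) entries."""
    by_slot = defaultdict(list)
    for day, hour, record in observations:
        by_slot[(day, hour)].append(record)     # keeps original-tape order

    sorted_tape, flags = [], []
    for day, hour in sorted(by_slot):           # chronological day/hour
        first, *duplicates = by_slot[(day, hour)]
        sorted_tape.append((day, hour, first))  # first on original tape
        for n in range(1, min(len(duplicates), 3) + 1):
            flags.append((day, hour, f"DUP{n}"))  # assumed: up to DUP3

    days_present = {day for day, _ in by_slot}
    for day in range(1, days_in_month + 1):
        if day not in days_present:
            flags.append((day, None, "- -"))    # entire day missing
            continue
        for hour in sorted(hourly_pattern):
            if hourly_pattern[hour] and (day, hour) not in by_slot:
                flags.append((day, hour, "MSG"))  # pattern hour missing
    return sorted_tape, flags
```

For example, a pattern value of 35 (32 + 2 + 1) decodes to ceiling, sky condition, and dry-bulb temperature.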
    
IV.    Details of No. 1 Card Image Edit
    
           A.  Major check groups
           1.  Psychrometric check:  relationships between T, Tw, Tdp, & RH.
               2.  Limiting ranges of variability.
               3.  Wind, weather, temperature, and visibility and certain
                   interrelationships.
               4.  Cloud coding.
               5.  Cloud, ceiling, and sky cover relationships.
    
           B.  Psychrometric Check
    
               Psychrometric relationships.  The program is designed to accept
               and check the interrelationships between the four psychrometric
           parameters if all are keyed, or to compute Tw and RH and then
           check the interrelationships if only T and Tdp are keyed.  The
               notations are in terms of whole degrees Fahrenheit for tempera-
               ture and whole percentages for relative humidity.  If there is a
               suspected error in these relationships, the observation is printed
               out complete, including an appropriate error flag.
    
               The empirical formulas used to compute Tw and RH (with respect to
               water)  are:
    
               1.  Computation of wet bulb (Tw):
                   If the dry-bulb temperature (T) is zero and above:
    
               Tw = T - (.034N - .00072N(N - 1))(T + Tdp - 2P + 108)
    
                   a.  If T is less than 100°F., rounding of Tw follows this
                       scheme:
    
                       Tw rounded = Tw + .9 if the tens position of T is 0, 1, 2.
    
                       Tw rounded = Tw + .9 -.01(T +  .9)  if tens position of T is
                       3, 4.
    
                       Tw rounded = Tw + .4 if the tens position of T is 5 thru 9.
    
                   b.  If T is 100°F. or higher:
    
                       Tw rounded = Tw + .9.
    
    
    

    -------
     If the dry-bulb temperature (T) is less than zero:
    
         TV = T  -  (,03'tf -  .oo6;;?)(,6rr -f Tdpl - 2? -v-  108)
    
     Tw rounded = Tw - .01Tdp
    
     N = (T - Tdp)/10 in the above equations.
    2.    Computation of relative humidity:
    
         RH
                173 + ,9T
    
     The checking procedures print out the error flag if Tdp is
     greater than Tw, if Tw is greater than T, or if the following
     conditions are not satisfied:
    
     a.   In the range of temperature from -60° to +139°, the
          dew point range may be -60° to +90°.  For individual
          observations, the dew point check requires that a
          maximum Tdp be computed taking T - 0.5°F and Tw + 0.4°F,
          and a minimum Tdp taking T + 0.4°F and Tw - 0.5°F.  (If
          T = Tw, maximum Tdp = Tw.)  Saturation vapor pressures
          from tables stored in memory, Table A (vapor pressure
          over water for the range -60° to +140°F) or Table B (vapor
          pressure over ice for the range -60° to +31°F), are taken
          for the above values of Tw.  The vapor pressures for each
          end of the allowable dew point range are then computed
          using

               e = ew - 0.000367P(T - Tw)(1 + (Tw - 32)/1571)

          (e, ew, and P are in inches of mercury.  Pressure may be
          taken from the individual observation, or from the pressure
          applicable to the elevation range in which the station is
          located.)

          From the vapor pressure tables in memory, the dew point
          temperatures corresponding to the vapor pressures at
          each end of the range, which are for the air at
          temperatures T + 0.4°F, Tw - 0.5°F and T - 0.5°F, Tw + 0.4°F,
          are taken in terms of whole degrees of dew point.  If the
          dew point being checked falls within 2° above or 2° below
          this range, it is accepted as correct.  If it falls outside
          this range, an indication of psychrometric error is printed.
          Note that if station pressure values are not recorded in the
          observations, computation of Tw should still always be
          possible, since the program will take an appropriate pressure
          value that corresponds to the station elevation.
    
    

    -------
        b.   Relative humidity values are accepted if they are in the
            range of 4% to 100% and are within 2% above or 2% below
            the computed humidity range described below.  All values less
                than 4% are flagged for review.  For hygrothermometer sta-
                tions, the relative humidity will have been computed by
                the formula in 2 above; for other stations it will have
                been keyed from the original record.
    
            The range for relative humidity is determined in the same
            way as for the dew point check.  Maximum and minimum vapor
            pressures are obtained from the taped tables for each end
            of the range, and the computation at each end of the range
            is by this formula:

                 RH = e/es

            where e is the vapor pressure at the dew point and es is
            the saturation vapor pressure of the air at the observed
            temperature plus 0.4°F or minus 0.5°F.
    
            If liquid fog is reported in present weather and the
            temperature is 31°F. or less, Tdp = Tw = T is acceptable.
    
            If T is less than -35°F., no formula is applied.  In that
            case, when T = -36° or -37°, an error is listed if the dew
            point does not fall within the range T - 6° (plus or minus
            1°).  An error is also listed if the temperature is within
            the range -38° through -53° and the dew point is not in the
            range T - 7° (plus or minus 1°).
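The wet-bulb formula for T of zero and above, its rounding scheme, the dew point range check, and the rule for temperatures below -35°F can be sketched in Python as follows. Several assumptions are flagged in the code: a Magnus-type approximation over water stands in for Table A (Table B, over ice, is not modeled), "Tw rounded" is taken to mean keeping the whole-degree part after the adjustment is added, and vapor pressures below the table floor are clamped at -60°F. The function names are hypothetical.

```python
import math

HPA_TO_INHG = 0.0295300  # inches of mercury per hectopascal

def sat_vp_inhg(t_f):
    """Assumed stand-in for Table A: Magnus-type saturation vapor
    pressure over water, in inches of mercury, for t_f in deg F."""
    t_c = (t_f - 32.0) / 1.8
    return 6.1094 * math.exp(17.625 * t_c / (t_c + 243.04)) * HPA_TO_INHG

def dew_point_f(e_inhg):
    """Inverse of sat_vp_inhg: temperature (deg F) at which e is saturation."""
    x = math.log(e_inhg / HPA_TO_INHG / 6.1094)
    return 243.04 * x / (17.625 - x) * 1.8 + 32.0

def wet_bulb(t, tdp, p):
    """Tw for T >= 0 deg F; p is station pressure in inches of mercury."""
    n = (t - tdp) / 10.0
    tw = t - (0.034 * n - 0.00072 * n * (n - 1)) * (t + tdp - 2 * p + 108)
    tens = int(t) // 10
    if t >= 100 or tens <= 2:
        tw += 0.9
    elif tens <= 4:
        tw += 0.9 - 0.01 * (t + 0.9)
    else:
        tw += 0.4
    return math.floor(tw)  # assumption: keep the whole-degree part

def dew_point_range(t, tw, p):
    """Whole-degree (minimum, maximum) Tdp allowed for the observed T and Tw."""
    def tdp_from(ta, twa):
        e = sat_vp_inhg(twa) - 0.000367 * p * (ta - twa) * (1 + (twa - 32) / 1571)
        e = max(e, sat_vp_inhg(-60.0))  # assumption: clamp at the table floor
        return round(dew_point_f(e))
    high = tw if t == tw else tdp_from(t - 0.5, tw + 0.4)
    low = tdp_from(t + 0.4, tw - 0.5)
    return low, high

def dew_point_ok(t, tw, tdp, p):
    """Accept a dew point within 2 degrees of the computed range."""
    low, high = dew_point_range(t, tw, p)
    return low - 2 <= tdp <= high + 2

def cold_dew_point_ok(t, tdp):
    """Rule used when T is below -35 deg F and no formula is applied."""
    if t in (-36, -37):
        return abs(tdp - (t - 6)) <= 1
    if -53 <= t <= -38:
        return abs(tdp - (t - 7)) <= 1
    return True
```

With T = 70°F, Tdp = 50°F, and P = 29.92 inches, `wet_bulb` returns 59, and a dew point of 50°F then falls inside the widened acceptance range.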
    
    C.          Limiting Range of Variability
    
            Limiting values, some absolute and some dependent on other
            elements within an observation, are incorporated into the
            machine edit program for checking purposes.  Items whose
            values fall outside the limits, appear inconsistent with
            other elements in the observation, or approach extreme
            conditions are flagged for technical review as follows:
    
                1.  Sea-level pressure:  above 1060.0 or below 940.0 mb.
    
            2.  Station pressure:  if pressure in inches and hundredths
                plus 10^-3 times the elevation (H) in feet is less than
                27.75 or greater than 31.30 inches.
    
            3.  Change of sea-level pressure from one observation to the
                next greater than 6.0 mb, or change of station pressure
                from one observation to the next greater than 0.20 inch,
                when the observations are 3 hours apart.  For observations
                1 hour apart, limits of 3.0 mb and 0.10 inch apply.
    
            4.  Temperature:  T above 125° or below -60°; Tw above 125°
                or below -60°; Tdp above 90° or below -60°; and if Tw
                and Tdp are present and T is -53° or colder.
    
    

    -------
    5.   Temperature fluctuation from one 3-hourly observation to the
        next:  if T or Tdp changes 20° or more from one 3-hourly
        observation to the next, the observation that varies 20° or
        more from the preceding one is flagged for review.  For hourly
        observations, changes of 10° or more are flagged.
    
        6.   Relative humidity:  below 4%.
    
    7.   Winds:  when a wind speed of 20 knots or more at one 3-hourly
        observation doubles by the next 3-hourly observation, or when
        the speed reaches 50 knots, the wind speed is flagged for
        review.  (In the AWS version of this edit, all winds of 30
        knots and higher are flagged for review.)
    
    8.   Visibility:  15 miles or less at one observation and 70 miles
        or more at the next.
    
        9.   Obscuration and cloud heights,  as follows:
    
            a.  Obscuration                        greater than 4,000 ft.
            b.  Fog                                greater than 1,500 ft.
            c.  Stratus, stratocumulus,            greater than 9,000 ft.
                stratus fractus, cumulus
            fractus, cumulus mammatus
        d.  Cumulus, cumulonimbus              greater than 12,000 ft.
        e.  Altostratus, altocumulus,          less than 4,500 ft. or
            nimbostratus, and altocumulus      greater than 20,000 ft.
                castellanus
            f.  Cirrus, cirrostratus, and          less than 15,000 ft.
                cirrocumulus
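The limit checks above translate directly into comparisons. The sketch below is illustrative only: it assumes observations arrive as dictionaries with hypothetical key names, checks only the dry-bulb temperature for the fluctuation test, and groups the cloud types of item 9 under labels of its own choosing.

```python
CLOUD_HEIGHT_LIMITS = {  # (lowest, highest) acceptable height in feet
    "obscuration": (None, 4000),
    "fog": (None, 1500),
    "stratus group": (None, 9000),    # St, Sc, stratus/cumulus fractus, Cu mammatus
    "cumulus group": (None, 12000),   # cumulus, cumulonimbus
    "middle": (4500, 20000),          # As, Ac, Ns, Ac castellanus
    "cirrus group": (15000, None),    # Ci, Cs, Cc
}

def cloud_height_flag(kind, height_ft):
    """True if a layer height is outside the limits for its cloud group."""
    low, high = CLOUD_HEIGHT_LIMITS[kind]
    return (low is not None and height_ft < low) or (high is not None and height_ft > high)

def limit_flags(cur, prev=None, interval_hours=3, elevation_ft=0.0):
    """Names of the limit checks that flag `cur` for technical review.

    cur/prev: dicts with hypothetical keys slp_mb, stn_in, t, tw, tdp, rh,
    wind_kt, vsby_mi; absent keys are skipped. prev is the preceding
    observation, interval_hours apart (3 or 1)."""
    def has(obs, *keys):
        return obs is not None and all(obs.get(k) is not None for k in keys)

    hourly = interval_hours == 1
    flags = []
    if has(cur, "slp_mb") and not 940.0 <= cur["slp_mb"] <= 1060.0:
        flags.append("sea-level pressure")
    if has(cur, "stn_in") and not 27.75 <= cur["stn_in"] + 1e-3 * elevation_ft <= 31.30:
        flags.append("station pressure")
    for key, top in (("t", 125), ("tw", 125), ("tdp", 90)):
        if has(cur, key) and not -60 <= cur[key] <= top:
            flags.append(key + " range")
    if has(cur, "t", "tw", "tdp") and cur["t"] <= -53:
        flags.append("tw/tdp present at -53 or colder")
    if has(cur, "rh") and cur["rh"] < 4:
        flags.append("relative humidity")
    if has(cur, "wind_kt") and (
        cur["wind_kt"] >= 50
        or (not hourly and has(prev, "wind_kt")
            and prev["wind_kt"] >= 20 and cur["wind_kt"] >= 2 * prev["wind_kt"])
    ):
        flags.append("wind speed")
    if has(cur, "slp_mb") and has(prev, "slp_mb"):
        if abs(cur["slp_mb"] - prev["slp_mb"]) > (3.0 if hourly else 6.0):
            flags.append("sea-level pressure change")
    if has(cur, "stn_in") and has(prev, "stn_in"):
        if abs(cur["stn_in"] - prev["stn_in"]) > (0.10 if hourly else 0.20):
            flags.append("station pressure change")
    if has(cur, "t") and has(prev, "t"):
        if abs(cur["t"] - prev["t"]) >= (10 if hourly else 20):
            flags.append("temperature change")
    if has(cur, "vsby_mi") and has(prev, "vsby_mi"):
        if prev["vsby_mi"] <= 15 and cur["vsby_mi"] >= 70:
            flags.append("visibility")
    return flags
```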
    
    D.  Wind, Weather, Temperature, Visibility:
    
        1.   Wind:  direction is recorded and keyed in tens of degrees from
            north  (00 = Calm), and wind speed in knots  (00 = Calm).  If
            speed is 00, direction must be 00.  Legal directions other than
            00 are 01 through 36.  The wind error indication is printed
            with illegal directions, for speed of 01 or more with direction
            00, for direction of 01 - 36 with speed 00, and for exceeding
        limits mentioned in item C.7 above.
    
        Wind speed also enters the check of blowing dust, sand, blowing
        spray, and blowing snow:  observations in which these items
        appear with wind speed less than 9 knots are flagged.
    
        2.   Weather:  the following items and observations containing them
            are flagged for review:
    
            a.  Tornado.
    
            b.  Ice crystals with intensity indication or in combination
                with any other element.
    
    

    -------
    c.  Fog or any form of precipitation with clear sky (0 cloud
        amount) except ice crystals.
    
    d.  Fog with dew point depression greater than 8°F.
    
    e.  Fog with less than 1/10 cloud cover.
    
    f.  Weather types below with visibilities other than those listed:
    
        Weather                        Visibility range
        S+, SP+,SW+, L+, ZL+, SG+      000-004 (0 - 1/4 mile)
    S, SP, SW, L, ZL, SG, IC*      005-007 (5/16 - 1/2 mile)

    (* Note:  IC may be reported with higher than 1/2 mile visibility)

    S-, SP-, SW-, L-, ZL-, SG-     008     (3/4-unlimited)
        F, IF, GF, BD, BN, K, H,KH,    000-060 (0 to 6 miles)
        D, BS, BY
    
    g.  Weather types (all intensities)  with temperatures other than
        within ranges below:
    
        Weather                        Range of temperature
    
        R, RW, L                       28°F. or higher
        ZR, ZL                         No lower limit, to 39°F.
    IP                             10°F. through 44°F.
    SP, SG, S, SW                  -40°F. through 44°F.
    IC                             -40°F. through 15°F.
        IF                             -40°F. through 15°F.
    
    h.  100% relative humidity reported without liquid fog or liquid
        precipitation in the weather fields and wind speed > 6 knots.
    
    i.  Illegal visibility codes are flagged for correction.  The legal
        visibility codes are:
    
    VSBY      Code     VSBY      Code     VSBY                 Code
    0         000      1 5/8     018      8                    080
    1/16      001      1 3/4     019      9                    090
    1/8       002      2         020      10                   100
    3/16      003      2 1/4     024      11                   110
    1/4       004      2 1/2     027      12                   120
    5/16      005      3         030      13                   130
    3/8       006      4         040      14                   140
    1/2       007      5         050      15                   150
    5/8       008      6         060      20                   200
    3/4       009      7         070      and by 5-mile
    1         010                         increments on to
    1 1/8     012                         95                   950
    1 1/4     014
    1 3/8     016
    1 1/2     017                         > 100                990
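A sketch of the wind, weather, and visibility tests follows, assuming decoded numeric fields and weather strings with a trailing + or - for intensity. The legal visibility codes are copied from the table as printed (including 024 and 027), the weather-temperature ranges from item g, and the helper names are hypothetical.

```python
BLOWING = {"BD", "BN", "BS", "BY"}  # blowing dust, sand, snow, spray

TEMP_RANGES = {  # weather type -> (lowest, highest) deg F; None = no limit
    "R": (28, None), "RW": (28, None), "L": (28, None),
    "ZR": (None, 39), "ZL": (None, 39),
    "IP": (10, 44),
    "SP": (-40, 44), "SG": (-40, 44), "S": (-40, 44), "SW": (-40, 44),
    "IC": (-40, 15), "IF": (-40, 15),
}

LEGAL_VSBY_CODES = (
    set(range(0, 11))                                  # 0 through 1 mile
    | {12, 14, 16, 17, 18, 19, 20, 24, 27, 30, 40, 50, 60, 70}
    | {80, 90, 100, 110, 120, 130, 140, 150}
    | set(range(200, 951, 50))                         # 20 to 95 miles by 5
    | {990}                                            # more than 100 miles
)

def wind_ok(direction, speed):
    """Direction in tens of degrees (00 = calm), speed in knots (00 = calm).
    Calm direction and calm speed must occur together."""
    if not 0 <= direction <= 36:
        return False
    return (direction == 0) == (speed == 0)

def blowing_flag(weather, speed):
    """Blowing phenomena reported with wind speed under 9 knots."""
    return weather in BLOWING and speed < 9

def weather_temp_ok(weather, t):
    """Temperature ranges apply to all intensities, so +/- is stripped."""
    base = weather.rstrip("+-")
    if base not in TEMP_RANGES:
        return True
    low, high = TEMP_RANGES[base]
    return (low is None or t >= low) and (high is None or t <= high)

def vsby_code_ok(code):
    return code in LEGAL_VSBY_CODES
```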
    
    

    -------
    E.  Cloud Coding
    
        Ceiling, sky condition, and clouds are interrelated.  Three
        columns are keyed for ceiling height.  The valid codes are as
        indicated below and any others are flagged for correction.
    
        Ceiling height                        Card code
    
        Unlimited                             XXX
        Zero                                  000
        100 ft. - 5000 ft.                    001 - 050
        (every hundred feet)
        5000 ft. - 10,000 ft.                 050 - 100
        (every five hundred feet)
        10,000 ft. and higher                 100 - 250, etc.
        (every thousand feet)
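The ceiling code rules can be checked with a short function. The table ends with "etc.", so the upper bound used here (250, that is 25,000 ft) is a hypothetical parameter rather than a documented limit.

```python
def ceiling_code_ok(code, highest=250):
    """Check a three-column ceiling code. `highest` (hundreds of feet) is a
    hypothetical cap standing in for the open-ended "etc." in the table."""
    if code == "XXX":                   # unlimited
        return True
    if len(code) != 3 or not code.isdigit():
        return False
    value = int(code)                   # hundreds of feet
    if value <= 50:                     # zero, then every 100 ft to 5000 ft
        return True
    if value <= 100:                    # every 500 ft to 10,000 ft
        return value % 5 == 0
    return value % 10 == 0 and value <= highest   # every 1000 ft above
```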
    
        Sky condition is a four-position  (4 card columns) field, with
        provision for keying four sky condition symbols, as may be
        recorded in the MF1-10A Sky column.  Heights of clouds are not
        keyed in this field  (ceiling is keyed in the ceiling field and
        cloud heights in the "layer" fields discussed below).  If less
        than 4 symbols are reported, keying begins at the left of the
        field, with "0" keyed in each column at the right of the field
        for which no sky symbol is reported.  The lowest sky symbol is
        keyed first, the next highest second, etc., until the 4-column
        field is coded completely, either with sky condition symbols
        (including blanks) or zeros.
    
        If more than 4 sky condition symbols are reported,  the highest
        is keyed in column 20, and the first three in columns 17-19, un-
        less this excludes the ceiling symbol.  In the latter case the
        ceiling symbol is keyed in column 19, the first two in columns
        17 and 18, and the highest in column 20.
    
        For a partial obscuration  (-X) the first column of  the sky con-
        dition field is left blank.  The  succeeding three columns are
        keyed for reported sky conditions.
    
        No clouds or obscurations  (clear) is keyed 0000.
    
        An obscuration  (not partial) requires an X key in the first or
        second column of  the sky condition field.  If the obscuration
        is the lowest sky condition, the  X will be in the first  column.
        If a cloud layer  is  reported below the obscuration  it will be
        keyed in  the first  column  in the  normal manner, and the  X in the
        second column of  the field.  In this situation, the last two
        columns of the field would be 00.
    

    -------
The table below presents the valid codes of the Sky Condition
field.  In the table, p = punch, b = blank, and - = X.

Card   Card column punching possibilities                   Description
code   Col. 17      Col. 18      Col. 19      Col. 20
0000   0            0            0            0            Clear sky (less than 1/10).
p000   1,2,4,5,7,8  0            0            0            One symbol only, not an obscuration
                                                           or partial obscuration.
pp00   1,2,4,5,7    1,2,4,5,7,8  0            0            Two symbols reported, no obscuration
                                                           or partial obscuration.
ppp0   1,2,4,5,7    1,2,4,5,7    1,2,4,5,7,8  0            Three symbols reported, no obscuration
                                                           or partial obscuration.
pppp   1,2,4,5,7    1,2,4,5,7    1,2,4,5,7    1,2,4,5,7,8  Four symbols, no obscuration or
                                                           partial obscuration.
-000   X            0            0            0            Obscuration, 10/10 sky hidden, no
                                                           layer below obscuration.
b000   Blank        0            0            0            Partial obscuration, no other symbols.
bp00   Blank        1,2,4,5,7,8  0            0            Partial obscuration, one other symbol.
bpp0   Blank        1,2,4,5,7    1,2,4,5,7,8  0            Partial obscuration, two other symbols.
bppp   Blank        1,2,4,5,7    1,2,4,5,7    1,2,4,5,7,8  Partial obscuration, three other symbols.
p-00   1,2,4        X            0            0            Obscuration above one layer of cloud.
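Because each row of the table fixes the allowed punches for columns 17-20, the sky condition check can be table-driven. In this sketch a blank column is represented as a space character, and the row and variable names are assumptions.

```python
INNER = frozenset("12457")    # a symbol with another symbol keyed after it
LAST = frozenset("124578")    # the highest symbol keyed; may also be 8
ZERO, X, BLANK = frozenset("0"), frozenset("X"), frozenset(" ")

# One row per line of the table: allowed characters for columns 17-20.
VALID_SKY_ROWS = [
    (ZERO, ZERO, ZERO, ZERO),            # 0000 clear
    (LAST, ZERO, ZERO, ZERO),            # p000
    (INNER, LAST, ZERO, ZERO),           # pp00
    (INNER, INNER, LAST, ZERO),          # ppp0
    (INNER, INNER, INNER, LAST),         # pppp
    (X, ZERO, ZERO, ZERO),               # -000 obscuration
    (BLANK, ZERO, ZERO, ZERO),           # b000 partial obscuration
    (BLANK, LAST, ZERO, ZERO),           # bp00
    (BLANK, INNER, LAST, ZERO),          # bpp0
    (BLANK, INNER, INNER, LAST),         # bppp
    (frozenset("124"), X, ZERO, ZERO),   # p-00 layer below obscuration
]

def sky_condition_ok(field):
    """field: the four characters of columns 17-20, blank keyed as a space."""
    return len(field) == 4 and any(
        all(ch in allowed for ch, allowed in zip(field, row))
        for row in VALID_SKY_ROWS
    )
```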
    

    -------
    F.  Clouds and Obscuring Phenomena.
    
        Provision for keying as many as four layers of clouds and/or
        obscuring phenomena, total sky cover, and opaque sky cover
        amount is made in this field.  Cloud layers are keyed in ascend-
        ing order.  If more than four layers are reported, the four
        lowest are keyed.  The lowest layer is always keyed in the left
        hand cloud field of the card.  For each layer, amount, type, and
        height are keyed.  For the second and third layers (if reported),
        the summation amount(s) is keyed at the level(s) involved.
    
        If a complete cloud layer section is reported unknown, "U", on
        MF1-10B, the corresponding card field for the entire layer is
        left blank.
    
        When fog or any other obscuring phenomenon is reported, it will
        be handled in a manner similar to a cloud layer, and an amount,
        type, and height will be keyed.  Obscuring phenomena other than
        fog (smoke, for example) are keyed X for type.  Heights of clouds
        and vertical visibility into obscurations are keyed in hundreds
        of feet.  Where vertical visibility is unlimited  (dash in height
        column of MF1-10B) height is keyed XXX.  If cloud field is re-
        ported clear or none, height will be keyed XXX.  If cloud height
        is reported unknown (U), height is left blank if type is unknown.
    
        Summation totals may not exceed 10/10, but the  first summation
        (card col. 67) may be 1 greater (not exceeding  10/10) than the
        sum of card columns 57 and 62; and card column  73 may be 1 greater
        than the sum of card columns 67 and 68, not to  exceed 10/10.
        Total cloud amount  (card col. 56) should be the same as col. 57
        if only one layer is reported, the same as col. 67 if only two
        layers are reported, the same as col. 73 if only three layers are
    reported, and equal to, or not more than 1 greater than, the sum
    of cols. 73 and 74 (not exceeding 10/10) if four layers are reported.
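The summation rules can be expressed as a small check on amounts already decoded to integer tenths (how 10/10 is punched in a single column is not stated here, so that decoding is assumed). When a sum exceeds 10, the sketch expects the keyed summation to be 10; that reading of "not exceeding 10/10" is an interpretation, and the function names are hypothetical.

```python
def summation_ok(layer_sum, keyed):
    """A keyed summation may equal the sum of the amounts or be 1 greater,
    never exceeding 10 (tenths)."""
    low = min(layer_sum, 10)
    high = min(layer_sum + 1, 10)
    return low <= keyed <= high

def cloud_summation_flags(c56, c57, c62=None, c67=None, c68=None, c73=None, c74=None):
    """Arguments are amounts in tenths, named by card column. Layers are
    assumed to be reported contiguously from the lowest, with the
    summation columns (67, 73) present whenever their layer is."""
    flags = []
    layers = 1 + sum(v is not None for v in (c62, c68, c74))
    if layers >= 2 and not summation_ok(c57 + c62, c67):
        flags.append("col 67")
    if layers >= 3 and not summation_ok(c67 + c68, c73):
        flags.append("col 73")
    if layers == 1:
        total_ok = c56 == c57
    elif layers == 2:
        total_ok = c56 == c67
    elif layers == 3:
        total_ok = c56 == c73
    else:
        total_ok = summation_ok(c73 + c74, c56)
    if not total_ok:
        flags.append("col 56")
    return flags
```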
    
        1.  Legal codes in the card field for "Clouds and Obscuring
            Phenomena" are related to the Ceiling, Sky Condition, and
            Weather and/or Obstruction to Vision fields.  These
            relationships are discussed below.
    
            a.  If sky condition is  reported clear, ceiling must be un-
                limited.  Summation  of all clouds must  be zero.  Type and
                height in the  cloud  layer fields may be keyed for zero
                amount  (less than 1/10).
    

    b.  The ceiling height must be consistent with the height of
        the lowest cloud layer whose corresponding symbol in the
        sky condition field is broken or overcast, or with the
        height of an obscuration.  The total (if one layer)  or
        the summation amount at the layer constituting the ceiling
        must be equal to or greater than 6/10.  The cloud type must
        be coded either 2 through 9, X/2, X/4 through X/7, X/9, or,
        if an obscuration, 1 or X.
    
    c.  If the ceiling is not XXX (unlimited), one of the following sky
        symbols must be keyed: broken (5), overcast (8), or obscured
        (X in the 1st or 2nd column of the sky condition field).  Only
        one X may be keyed.  It goes in the first column of the sky
        condition field if the lowest layer is the obscuration, and in
        the second column if a layer lies below a portion of the
        surface-based obscuration.
    
    d.  If the first cloud layer contains 10/10 F or IF (not GF),
        the ceiling height must equal the height of the first layer,
        and the sky must be obscured.
    
    e.  If fog is keyed as an Obstruction to Vision with clear sky
        or with partial obscuration of less than 5/10, with no
        clouds above, the fog must be GF or IF (not F).  If the
        partial obscuration is 6/10 or greater, with no clouds above,
        or with obscuration (10/10), the fog may not be classified
        as GF.
    
    f.  If total opaque is zero, all sky symbols must be thin or
        clear, and ceiling must be unlimited.
    
    g.  If any sky symbol is thin, the total opaque amount must not
        be more than half the summation amount of that layer and
        all higher layers (not always an error for higher layers,
        but such cases should be flagged for review).  If the ratio of
        total opaque to total amount is 1/2 or less, the highest sky
        symbol must be thin.
    
    h.  Sky condition symbols must, with increasing height, reflect
        equal or increasing sky cover.  Only the highest sky symbol
        may be overcast, except that below an overcast there may
        be a thin overcast.
    
    i.  The highest sky condition symbol must be compatible with the
        amount of total sky cover.
    
    j.  If obscuration  (X) is reported as the second sky condition,
        the second cloud layer type must be obscuring phenomena
        (fog, ice fog, smoke, rain,  snow, for example) keyed X;
        total amount, total opaque, and first summation total must
        be 10/10.  The third and fourth cloud layers and the second
        summation total columns should be blank (may be keyed if
        an aircraft report has been received, but should be flagged
        for review).  The third and fourth sky cover symbols must
        be zero  (0).  Ceiling must not exceed the second cloud layer
        height, and that height should be 4,000 ft. or less.  Normally
        fog will be questioned in such a situation.
    
    

    k.  If obscuration is reported as the first sky symbol, the
        type of the first cloud layer must be fog (code 1)  or
        other obscuring phenomena (code X), total amount and total
        opaque must be 10/10, and height must correspond with ceil-
        ing height.  The second, third, and fourth cloud layer and
        first and second summation total columns should be blank
        (may be keyed if an aircraft report has been received, but
        should be flagged for review).  The second, third, and
        fourth sky cover symbols must be zero (0).  Height and
        ceiling should be the same.  Height should be 1,500 ft. or
        less if fog (code 1) or 4,000 ft. or less if other obscur-
        ations (X) are encoded.
    
    l.  When fog (code 1) is reported in the first cloud layer with
        an amount not coded 0, the sky condition must reflect an
        obscuration or partial obscuration.
    
    m.  When fog is reported as the only cloud field (code 1), it
        should be coded in Obstructions to Vision as GF if amount
        is 1 to 5 tenths, or F if 5 to 10 tenths (prevailing
        visibility being 6 miles or less).
    
    n.  The corresponding cloud and summation total columns for a
        sky cover symbol reported (code ) above an overcast
        (code 8 in Sky Condition) should be:

        (1)  Blank if total opaque is 10/10.
        (2)  Zero (0) in amount and type columns, 10 in summation
             total columns, and XXX in height columns whenever
             total opaque is less than 10/10.  (Additional layers
             may be keyed if an aircraft report has been received,
             but should be flagged for review.)
    
    o.  Partial obscuration  (blank in first position of Sky Condi-
        tion) must have a first-layer amount from 1 to 9,  type
        must be fog  (code 1) or obscuration  (code X), and  height
        must be unlimited.
    
    p.  Some stations (FAA) do not observe cloud layer values,
        but do enter total cloudiness and total opaque.  If the
        ratio of total opaque to total amount is 1/2 or less,
        there should be no codes 5, 8, or obscuration (X) in Sky
        Condition.  If the ratio is greater than 1/2, there must
        be 5, 8, or X in Sky Condition, and if the ratio is not 1:1,
        X is invalid.  (The valid blanks in the cloud layer field will
        cause "2's" to print in the inventory listing for these
        stations.)
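Several of the relationships in item 1 lend themselves to simple checks. The sketch below covers rules b and c (ceiling layer), e (fog versus amount of obscuration), and p (FAA total-opaque ratio). It assumes decoded values: amounts in tenths, heights in hundreds of feet, `None` for an unlimited (XXX) ceiling, and X-overpunched cloud types given in their letter form (K, M through P, R). The `Layer` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ceiling-forming type codes; letters stand for X/2, X/4 through X/7, and X/9.
CLOUD_CEILING_TYPES = set("23456789") | set("KMNOPR")
OBSCURATION_TYPES = {"1", "X"}
CEILING_SYMBOLS = {"5", "8", "X"}  # broken, overcast, obscured


@dataclass
class Layer:  # hypothetical record for one keyed layer
    sky_symbol: str  # sky condition symbol for this layer
    cloud_type: str  # cloud type code
    height: int      # hundreds of feet
    summation: int   # total (one layer) or summation amount at this layer, tenths


def ceiling_problems(ceiling: Optional[int], layers: List[Layer]) -> List[str]:
    """Rules b and c; layers are listed lowest first, ceiling None means XXX."""
    for layer in layers:
        if layer.sky_symbol in CEILING_SYMBOLS:
            problems = []
            if ceiling != layer.height:
                problems.append("ceiling differs from ceiling layer height")
            if layer.summation < 6:
                problems.append("summation at ceiling layer below 6/10")
            allowed = OBSCURATION_TYPES if layer.sky_symbol == "X" else CLOUD_CEILING_TYPES
            if layer.cloud_type not in allowed:
                problems.append("cloud type cannot constitute a ceiling")
            return problems
    return [] if ceiling is None else ["ceiling keyed but no 5, 8, or X sky symbol"]


def fog_problem(fog_code: str, obscuration: int, clouds_above: bool) -> Optional[str]:
    """Rule e: fog code (F, GF, IF) against the amount of obscuration in tenths."""
    if clouds_above and obscuration < 10:
        return None  # rule e covers only fog with no clouds above, or 10/10
    if obscuration < 5 and fog_code == "F":
        return "F with clear sky or partial obscuration under 5/10"
    if obscuration >= 6 and fog_code == "GF":
        return "GF with obscuration of 6/10 or more"
    return None


def faa_sky_problems(total_opaque: int, total_amount: int, symbols: List[str]) -> List[str]:
    """Rule p: stations keying only total cloudiness and total opaque."""
    has_ceiling_symbol = any(s in CEILING_SYMBOLS for s in symbols)
    if 2 * total_opaque <= total_amount:  # ratio 1/2 or less, integer arithmetic
        return ["5, 8, or X with opaque ratio 1/2 or less"] if has_ceiling_symbol else []
    problems = []
    if not has_ceiling_symbol:
        problems.append("opaque ratio above 1/2 but no 5, 8, or X")
    if "X" in symbols and total_opaque != total_amount:
        problems.append("X with opaque ratio other than 1:1")
    return problems
```

Comparing `2 * total_opaque` with `total_amount` keeps the ratio test in whole tenths and avoids dividing by a zero total.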
    

    2.  The testing procedure to flag errors or suspected conditions
        in Clouds and Obscuring Phenomena, Sky Condition, Total Clouds,
        and Total Opaque is systematic.  Missing fields are indicated
        (except the valid condition for FAA coded "2" in the inventory)
        in the usual manner.  In general, the system follows these
        steps:
    
        In the cloud fields, the valid codes for cloud amounts are
        0 = no clouds or less than 1/10; 1 - 9 = 1/10 to 9/10 clouds;
        X = more than 9/10 or 10/10.
    
        Valid codes for cloud types are 0 = NONE; b (Blank)  when a
        cloud type is reported UNKNOWN; 1 = Fog; 2 = Stratus;
        3 = Stratocumulus; 4 = Cumulus; 5 = Cumulonimbus; X/2 (K)  =
        Stratus fractus;  X/4 (M)  = Cumulus fractus; X/5 (N) = Cumulus
        mammatus; 6 = Altostratus; 7 = Altocumulus; X/6 (O) =
        Nimbostratus; X/7 (P) = Altocumulus castellanus; 8 = Cirrus; 9 =
        Cirrostratus; X/9 (R) = Cirrocumulus; X = Obscuration other
        than fog.
    
        The valid codes for cloud heights in the cloud layer fields
        are the same as for ceiling heights (in number of hundreds of
        feet); XXX indicates NONE or (in the first layer) a surface-
        based partial obscuration; and bbb (Blanks) indicates cloud
        height unknown with type unknown.
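Decoding the amount and type codes is a table lookup. In the Hollerith card code, an X (row-11) punch over the digits 1 through 9 reads as the letters J through R, which is why the list pairs X/2 with K and X/4 with M. The sketch below assumes codes arrive as strings, either in `"X/2"` notation or as the printed letter, treats a blank type as UNKNOWN, and reads an X amount as 10; the names are illustrative.

```python
from typing import Dict

# Cloud type codes as listed above; letter keys are the X-overpunched forms.
CLOUD_TYPES: Dict[str, str] = {
    " ": "UNKNOWN", "0": "NONE", "1": "Fog", "2": "Stratus",
    "3": "Stratocumulus", "4": "Cumulus", "5": "Cumulonimbus",
    "6": "Altostratus", "7": "Altocumulus", "8": "Cirrus", "9": "Cirrostratus",
    "K": "Stratus fractus", "M": "Cumulus fractus", "N": "Cumulus mammatus",
    "O": "Nimbostratus", "P": "Altocumulus castellanus", "R": "Cirrocumulus",
    "X": "Obscuration other than fog",
}


def overpunch(digit: str) -> str:
    """Letter read from an X (row-11) punch over digit 1-9: J, K, ..., R."""
    if len(digit) != 1 or digit not in "123456789":
        raise ValueError(f"no X overpunch for {digit!r}")
    return "JKLMNOPQR"[int(digit) - 1]


def decode_type(code: str) -> str:
    """Accept either the 'X/2' notation or the letter it prints as ('K')."""
    if code.startswith("X/"):
        code = overpunch(code[2:])
    if code not in CLOUD_TYPES:
        raise ValueError(f"invalid cloud type code {code!r}")
    return CLOUD_TYPES[code]


def decode_amount(code: str) -> int:
    """Cloud amount in tenths; 0 also covers less than 1/10, and X is read as 10."""
    if code == "X":
        return 10
    if len(code) == 1 and code in "0123456789":
        return int(code)
    raise ValueError(f"invalid cloud amount code {code!r}")
```

A code such as X/3 decodes to L, which is not a valid cloud type, so it raises `ValueError` and would be listed as an invalid code.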
    
        Errors are listed for invalid codes, and if:
    
        a.  Any cloud field element is keyed and the Total Clouds left
            blank.
        b.  Total opaque is not keyed, unless indicated in the station's
            observation pattern (III. D. 1.).
        c.  Total opaque is greater than total cloud amount.
        d.  Total opaque is less than 10/10, and any blanks occur in
            Clouds and Obscuring Phenomena fields, including summations,
            FAA excepted.
        e.  Any element within a cloud layer is keyed (amount, type,
            or height), and any other element is left blank.
        f.  Total Cloud amount is keyed from 0/10 to 9/10 inclusive,
            and amounts and types of fields above highest reported layer
            are not coded "0" and heights are not coded "XXX."
        g.  Each summation amount does not equal or exceed the next lower
            summation amount, or a succeeding summation amount is more
            than 1 greater than the preceding summation plus the amount(s)
            of the additional layer(s), or exceeds 10/10.
    
            1.  In the case of partial obscuration in the first layer,
                the second summation is not greater than the amount of
                the first layer if a cloud layer is also reported.
    
    

        2.  The summation amount is less than any lower individual
            cloud layer amount.
    
        3.  Blanks in a summation amount are not preceded by 10/10
            in the last summation amount (which is not blank), or by
            blank cloud amounts in fields with blank summation
            amounts, and 10/10 Total Opaque is not reported.
    
    h.  Height ranges by cloud types are in disagreement with those
        listed under Obscuration and Cloud heights.
    
    i.  Fog (code 1) is coded in layers above the first.
    
    j.  Height in cloud layers reporting height does not increase
        from one layer to the next.
    k.  The ceiling height does not agree with the height of the lowest
        layer constituting a ceiling, or the highest sky symbol
        is not compatible with total sky cover.
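A few of the listed error conditions (a, c, e, i, and j) can be sketched directly over the keyed strings. This is an illustrative Python sketch under stated assumptions: each layer is an (amount, type, height) tuple with "" for a blank column, an X amount counts as 10, and a layer keyed with an amount but with blank type and height is accepted as an unknown layer, following the blank conventions given above. The function names and messages are hypothetical.

```python
from typing import List, Tuple

Layer = Tuple[str, str, str]  # (amount, type, height) as keyed; "" = blank


def tenths(code: str) -> int:
    """Decode an amount column; X is treated as 10/10 (assumption)."""
    return 10 if code == "X" else int(code)


def cloud_field_errors(total: str, opaque: str, layers: List[Layer]) -> List[str]:
    errors = []
    if any(any(layer) for layer in layers) and not total:
        errors.append("a: cloud field keyed but Total Clouds blank")
    if total and opaque and tenths(opaque) > tenths(total):
        errors.append("c: total opaque greater than total cloud amount")
    for n, (amount, ctype, height) in enumerate(layers, start=1):
        unknown_layer = bool(amount) and not ctype and not height
        fields = (amount, ctype, height)
        if any(fields) and not all(fields) and not unknown_layer:
            errors.append(f"e: layer {n} partly blank")
        if n > 1 and ctype == "1":
            errors.append(f"i: fog coded in layer {n}")
    # Only numeric heights take part; XXX and blanks are skipped.
    heights = [int(h) for _, _, h in layers if h.isdigit()]
    if any(upper <= lower for lower, upper in zip(heights, heights[1:])):
        errors.append("j: heights do not increase from one layer to the next")
    return errors
```

For example, a second layer keyed as fog at 2,000 ft above a first layer at 3,000 ft trips both rule i and rule j.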
    

    G.  Explanation of Edit Flags
    
        Beginning on the following page are numbered explanations of edit
        flags appearing in the correspondingly numbered observations printed
        out in the Surface Weather Observations format.  An asterisk (*)
        prints over fields to be checked.
    
        The edit is designed for No. 1 card images keyed every 3 hours (at
        local standard time corresponding to 00, 03, 06, 09, 12, 15, 18 and
        21 hours GMT) and is readily adaptable to programs under which all
        record observations are keyed.
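The 3-hourly program translates into a fixed set of local standard hours for each station. The sketch below derives those hours from a whole-hour offset to GMT and runs a simple monthly inventory for off-program hours, duplicate cards, and missing observations. It is a hypothetical helper: cards are reduced to (day, local hour) pairs, and the date shift that occurs when a program hour falls on the previous local day is ignored.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PROGRAM_HOURS_GMT = (0, 3, 6, 9, 12, 15, 18, 21)


def program_hours_lst(utc_offset: int) -> List[int]:
    """Local standard hours of the 3-hourly program (utc_offset = -5 for EST)."""
    return sorted((hour + utc_offset) % 24 for hour in PROGRAM_HOURS_GMT)


def inventory(cards: Iterable[Tuple[int, int]], days_in_month: int,
              utc_offset: int) -> Dict[str, List[Tuple[int, int]]]:
    """Flag off-program hours, duplicate (day, hour) cards, and missing cards."""
    program = program_hours_lst(utc_offset)
    counts = Counter(cards)
    return {
        "off_program": sorted(key for key in counts if key[1] not in program),
        "duplicates": sorted(key for key, n in counts.items() if n > 1),
        "missing": [(day, hour) for day in range(1, days_in_month + 1)
                    for hour in program if (day, hour) not in counts],
    }
```

For an Eastern Standard Time station the program hours come out as 01, 04, 07, 10, 13, 16, 19, and 22 LST.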
    
    
    
    
    
    
    [Figure: punched-card layout showing the station number, date, day, ceiling, sky, weather and/or obstruction to vision, dew point, wind, dry-bulb and wet-bulb temperature, pressure (inches), and clouds and obscuring phenomena fields, with columns marked "do not punch".]
                                      Fig. 2
    [Figure: 80-column card layout with the WBAN station number, year, month, and day, followed by numbered data fields through column 80.]
                                      Fig. 2a
    

           Explanations  of Reasons  for  Flags  on Correspondingly
               Numbered  Observations  in the Edit Listings  **
    
    
     1.   Dew point incorrectly keyed.
     2.   Hour other than that in the  normal program.
     3.   Card missing for hour in station program.
     4.   Dew point temperature higher than dry-bulb temperature.
     5.   Ceiling height  differs from  cloud layer height.
     6.   Ceiling height  differs from  cloud layer height.
     7.   Obscuration under Weather  is snow with type of obscuration fog.
     8.   Obscuration with less than 10/10 sky cover.
     9.   First sky symbol scattered with corresponding layer over 5/10,
          and second cloud group and ceiling non-reportable values.
    10.   Ceiling, sky symbol, and summation total not in agreement with
         total sky cover.
    11.   Two opaque overcast symbols.
    12.   Lower layer is  not opaque.
    13.   Incorrect relationship of  ceiling, sky condition, and total
         opaque sky cover.
    14.   Dry-bulb temperature incorrectly keyed.
    15.   Review of wind  speeds over 50  knots.
    16.   Wind direction  value over  36 (360°).
    17.   Partial obscuration with lowest layer a cloud type.
    18.   Wind direction  with calm wind  speed.
    19.   Wind speed with no wind direction.
    20.   Ceiling height  missing.
    21.   Amount of partial obscuration  is greater than total opaque.
    22.   Total opaque missing.
    23.   Amount of obscuration and  total sky  cover differ.
    24.   Amount of obscuration less than 10/10.
    25.   Partial obscuration due to fog omitted from weather, and in-
         complete keying of cloud and obscuring phenomena.
    26.   Amount of partial obscuration  is greater than total opaque.
         Also, incomplete keying of clouds and obscuring phenomena.
    27.   Third layer summation is missing.
    28.   Total sky cover omitted.
    29.   Visibility omitted.
    30.   First two columns of weather omitted.
    31.   Fog not shown as obstruction to vision.  Clouds and obscuring
         phenomena layers less than total sky cover.
    32.   Illegal visibility.
    33.   Ground fog with obscuration greater than 5/10.
    34.   Fog reported with less than 6/10 obscuration and lowest cloud
         layer greater than 5000 feet.
    35.   Ceiling not a reportable height.
    36.   Ground fog with over 5/10 obscuration.
    37.   Sky symbol and first cloud layer not in agreement.
    38.   Total opaque cloudiness and cloud layer data in error; or
         ceiling, sky and cloud layer relationships in error.
                 **  See pages 168 through 174.
    
    

    39.  Blowing dust with wind speed less than 7 knots.
    40.  Flagged for review - no increase in 2nd layer summation amount.
    41.  Ceiling height not a reportable value.
    42.  Fog reported as obstruction to vision with visibility greater
         than six miles.
    43.  Visibility value not reportable.
    44.  Visibility value not reportable.
    45.  Ground fog reported as obstruction to vision with visibility
         greater than six miles.
    46.  Visibility reduced to less than seven miles and no obstruction
         to vision.
    47.  Illegal keying in weather and obstruction to vision columns.
    48.  Illegal keying in weather and obstruction to vision columns.
    49.  Squalls reported with wind speed less than 16 knots.
    50.  Fog with less than 6/10 obscuration and lowest cloud layer
         greater than 5000 feet and psychrometric error.
    51.  Sea level pressure flagged for non-reportable value.
    52.  Dry bulb and dew point sequence check.
    53.  Station pressure flagged for improbable value.
    54.  Dew point incorrectly keyed.
    55.  Cloud height incorrectly keyed.
    56.  Illegal punch in weather & obstruction to vision columns.
    57.  Flagged for intensity of snow with 1/4 visibility.
    58.  Station pressure sequence check.
    59.  Duplicate cards, date and hour 1st card.
    60.  Duplicate cards, date and hour 2nd card.
    61.  Visibility sequence check.  Change in values up.  Sea level and
         station pressure check.
    62.  Visibility sequence check.  Change in values down.
    63.  Missing observation.
    64.  Dry bulb sequence check.
    65.  Flagged for intensity of snow with 1/4 mile visibility.
    66.  Flagged for liquid precipitation with 24-degree temperature.
    67.  Station and sea level pressure sequence check.
    68.  Station and sea level pressure sequence check.
    69.  Sea level pressure sequence check, station pressure flagged for
         review.
    70.  Frozen precipitation with 45-degree temperature, station pressure
         flagged for review.
    71.  Station pressure flagged for review.
    72.  Station pressure flagged for review.
    73.  Snow intensity not in agreement with visibility.
    74.  Snow intensity not in agreement with visibility.
    75.  Missing observation.
    76.  Sea level pressure sequence check.
    77.  Sea level pressure sequence check.
    78.  Duplicate cards, date and hour 1st card.
    79.  Duplicate cards, date and hour 2nd card.  Dry bulb sequence check.
    80.  Sky condition symbols missing.
    81.  Weather and obstruction to vision symbols missing.
    82.  Observations for the 29th day missing.
    83.  Observations for the 30th day missing.
    84.  Observations for the 31st day missing.
    85.  Monthly inventory check.
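Several of the explanations above reduce to single-observation consistency checks. The sketch below restates a handful of them: dew point above dry bulb, wind speed over 50 knots, direction over 36, direction with calm, speed with no direction, blowing dust under 7 knots, and squalls under 16 knots. It assumes direction in tens of degrees with 0 for calm or no direction, speed in knots, and boolean inputs for blowing dust and squalls in place of the actual weather-column codes; the function itself is hypothetical.

```python
from typing import List


def wind_temperature_flags(dry_bulb: int, dew_point: int, direction: int, speed: int,
                           blowing_dust: bool = False, squalls: bool = False) -> List[str]:
    """Return review messages for one observation.

    direction: tens of degrees (0 = none/calm); speed: knots.
    blowing_dust / squalls: assumed already decoded from the weather columns.
    """
    flags = []
    if dew_point > dry_bulb:
        flags.append("dew point higher than dry-bulb temperature")
    if speed > 50:
        flags.append("review wind speed over 50 knots")
    if direction > 36:
        flags.append("wind direction over 36 (360 degrees)")
    if speed == 0 and direction != 0:
        flags.append("wind direction with calm wind speed")
    if speed > 0 and direction == 0:
        flags.append("wind speed with no wind direction")
    if blowing_dust and speed < 7:
        flags.append("blowing dust with wind speed less than 7 knots")
    if squalls and speed < 16:
        flags.append("squalls with wind speed less than 16 knots")
    return flags
```

Each check is independent, so one observation can carry several messages, much as some numbered observations in the listing carry more than one explanation.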
    
    

    [Figure: edit listing for WBAN EDIT TEST #1 printed in the Surface Weather Observations format (National Oceanic and Atmospheric Administration, Environmental Data Service). Asterisks mark fields flagged for checking. Legend: "+" denotes "heavy"; "-" denotes "partial," "light," "thin," or "minus" as appropriate; "X" in the cloud type columns denotes obscuring phenomena other than fog.]
                                      Fig. 3a
    

    -------
    1 33IAB3S*.l\rO-|VJ.N3MJOHmN3 CM/-M 1 W A M3Car\ V3UIW3AA 3^WJVnt- Ert) 1
    | Noavtij.stNirvav3iii=w"«*« von J
    NATIONAL CUMATIC CENTER |
    
    
    
    
    X
    u
    £
    O1
    (M
    
    STATION NAME
    WBAN EDIT TEST #1
    to
    rf 0
    « 0
    u •"
    STATION NO.
    00001
    a
    s
    n
    «
    s
    a
    R
    ~
    
    "
    i
    to
    £
    O
    -
    •
    -
    -
    '
    n
    "
    O
    
    ano*dQ
    K
    u« *••*•«
    «A1
    IHTOW
    NOIiYNHnt
    D3AY1 Odt
    3RD LATER
    '""„"?"
    IKflOKY
    a 3 xvi OWE
    K
    *
    J
    a
    z
    *
    w
    >•
    <
    UJ
    *
    O
    _»•
    ' " ™£"
    IdAi
    iHnoiw
    en ID tew
    WAI
    INnOWT
    Mii?ioi>$
    (M
    3AUV13H
    aina
    13«
    ains ma
    (S3HDHO
    NOUV.IS
    Q u
    J «
    ll
    |
    0 JJJ
    P. S
    sft
    T3A3T f 35
    WEATHER AND/OR OBSTRUCTION TO VISION
    51
    S>
    FROZEN
    PRCCIF.
    ,__J
    LIQUID
    PRECIP.
    «",r,°.v«
    >-
    >
    * SKY
    SYMBOLS
    (MILES)
    „, "
    "
    „ "
    «. -
    'IJ JO'SQH
    ONI1133
    I
    
    
    
    
    
    
    
    
    
    
    o
    
    *
    CM
    O
    o
    o
    CO
    CM
    m
    o
    X
    o
    o
    o
    o
    o
    o
    c
    o
    m
    o
    
    
    **
    * i
    CM
    2
    O
    *
    t
    
    
    
    
    
    
    
    
    
    
    f-
    0
    o
    to
    •-)
    -
    o
    !•-
    O
    r-
    en
    *
    *n
    •o
    0
    S
    «
    P-I
    cf>
    o
    o
    o
    o
    o
    o
    0
    o
    o
    m
    o
    
    — — o
    o
    o
    o
    0
    
    	 r
    i
    i
    o
    0
    CO
    1
    1
    o
    <_>
    CO
    *
    1
    1
    o
    o
    1
    1
    1
    D
    U.
    •ft
    
    CO
    o
    in
    in
    in
    CM
    •o
    0
    z:
    i
    •o
    (N
    O
    O
    a.
    CO
    o
    o
    o
    o
    0
    o
    ^
    	 — X
    1
    t
    
    O
    s
    
    o
    0
    CM
    •O
    O
    Z
    m
    fM
    o
    o
    u.
    o
    o
    o
    o
    1
    a.
    0
    o
    0
    G
    
    
    * **
    O
    O
    o
    o
    
    
    
    
    
    »
    
    
    *
    *
    
    
    *
    1
    1
    a
    u.
    * C'
    o-
    
    m
    CO
    o
    CM
    -t
    CO
    CO
    CM
    m
    o
    CO
    o
    m
    o
    o
    o
    o
    o
    o
    o
    * o
    o
    o
    0
    	 o-
    
    
    1
    1
    r-
    o
    o
    
    
    
    
    
    *
    
    
    *
    *
    
    
    «
    1
    1
    1
    o
    u.
    
    •^
    
    o
    CO
    o
    *
    
    t>
    CO
    CM
    m
    0
    l
    r-t
    
    -«
    CO
    in
    o
    s
    CM
    CM
    O
    CO
    0
    m
    r-t
    CM
    O
    O
    o
    o
    o
    o
    o
    o
    o
    
    0
    
    0
    
    » CM
    0
    -•
    
    
    
    
    
    
    
    
    
    
    
    
    o
    CO
    o
    <
    W
    *
    CO
    in
    o
    CM
    (M
    CM
    tM
    CO
    o
    r-
    0
    (M
    r-4
    o
    o
    o
    o
    o
    0
    o
    *
    o
    
    — ^
    CO
    o
    o
    r-l
    
    
    
    
    
    
    -—-
    o
    * -•
    o
    o
    «
    *
    1
    1
    1
    D
    u.
    *
    -
    CO
    f-
    o
    fvJ
    (M
    r-t
    PJ
    o
    rsj
    O
    OJ
    o
    O
    O
    a.
    0
    0
    0
    1
    a:
    K|
    * C
    o
    0
    1
    o
    o
    ft
    r*
    
    _ —
    
    
    
    
    
    
    
    
    
    m
    O
    o
    I/I
    r-l
    
    CO
    o
    
    -------
    
    
    [Fig. 3c. NOAA Form (3-73), Surface Weather Observations (National Oceanic
    and Atmospheric Administration, Environmental Data Service, National
    Climatic Center), printed for station WBAN EDIT TEST #1, station no. 00001.
    Column headings include sky symbols, visibility (miles), weather and/or
    obstruction to vision, precipitation, and the lowest, 2nd, and 3rd cloud
    layers. Form legend: S = scattered layer; X = obscuration ("X" appearing in
    cloud type columns denotes obscuring phenomena other than fog); further
    symbols denote a broken layer, an overcast layer, and no cloud; "+" denotes
    "heavy"; "-" denotes "partial," "light," "thin," or "minus" as appropriate.]

    -------
    [Fig. 3d. Surface Weather Observations form (NOAA Form, 3-73) for station
    WBAN EDIT TEST #1, station no. 00001, continued; same layout and legend as
    the preceding sheet.]

    -------
    [Fig. 3e. Surface Weather Observations form (NOAA Form, 3-73) for station
    WBAN EDIT TEST #1, station no. 00001, continued; same layout and legend as
    the preceding sheets.]
    

    -------
    [Fig. 3f. Surface Weather Observations form (NOAA Form, 3-73), section
    "Clouds and Obscuring Phenomena," for station WBAN EDIT TEST #1, station
    no. 00001; same legend as the preceding sheets.]
    

    -------
    [Fig. 3g. Surface Weather Observations form (NOAA Form, 3-73) for station
    no. 00001. Caption (beginning illegible): ...inventory of No. 1 cards
    containing cloud layer information. Note that the 0100 card on the 24th
    and the 0400 card on the 27th are missing. The card count, 222, is 2 less
    than the program count (8 observations per day) for a non-leap-year
    February.]
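The inventory check described in the Fig. 3g caption is simple arithmetic. The program count is the number of days in the month times the scheduled observations per day, so a non-leap February at 8 observations per day gives 28 x 8 = 224. Each missing card lowers the card count by one, and two missing cards give 222. A minimal Python sketch follows. It assumes a hypothetical `card_inventory` function, cards represented as (day, hour) pairs, and a 3-hourly schedule starting at 0100 LST (inferred from the 0100 and 0400 cards named in the caption). The year 1979 in the example is arbitrary.

```python
import calendar


def card_inventory(cards, year, month, hours):
    """Compare one month's No. 1 card inventory with the program count.

    Hypothetical helper. cards: iterable of (day, hour) pairs, one per card
    found. hours: scheduled observation hours in LST (e.g. 100 for 0100).
    Returns (program_count, card_count, missing), where missing is a sorted
    list of scheduled (day, hour) pairs that have no card.
    """
    ndays = calendar.monthrange(year, month)[1]  # 28 for a non-leap February
    expected = {(day, hour) for day in range(1, ndays + 1) for hour in hours}
    found = set(cards)
    missing = sorted(expected - found)
    return len(expected), len(found & expected), missing


# Assumed schedule: every 3 hours from 0100 LST (8 observations per day).
schedule = range(100, 2400, 300)
absent = {(24, 100), (27, 400)}  # the two cards named in the caption
cards = [(d, h) for d in range(1, 29) for h in schedule if (d, h) not in absent]
print(card_inventory(cards, 1979, 2, schedule))
# (224, 222, [(24, 100), (27, 400)])
```

Listing the missing (day, hour) pairs, and not just the shortfall, points the reviewer directly to the cards that have to be located or reconstructed.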
    

    -------
    VI.  Visual Checks
    
    Data that are not keyed onto tape are given a limited visual scan
    as a random check for quality control and consistency of
    climatological data entries.
    
    WBAN No. 3 Card Edit and Listing
    
    The No. 3 WBAN card images (Fig. 4) contain varying daily climatological
    data for individual stations for the period midnight to midnight LST.
    [Fig. 4. WBAN No. 3 card image. Legible field headings and card columns:
        Station number           cols. 1-5
        Date (year, month)       cols. 6-9, partly illegible
        Day                      cols. 10-11
        Max temp (°F)            cols. 12-14
        Min temp (°F)            cols. 15-17
        Precip (in.)             cols. 18-21
        Snowfall                 columns illegible
    A block of columns on the card is marked "Do not punch in these columns."]
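Each No. 3 card carries one day's values in fixed columns, so a program reading the card images needs only the column positions. The sketch below is a hypothetical Python parser. Its positions come from the partly legible card image: station number 1-5, day 10-11, maximum temperature 12-14, minimum temperature 15-17, and precipitation 18-21. The year and month positions (6-7 and 8-9) are an inference. The snowfall field is left out because its columns are not legible.

```python
# Field positions (1-based, inclusive) read from the partly legible Fig. 4
# card image. The year and month positions are an inference, not confirmed.
NO3_FIELDS = {
    "station": (1, 5),
    "year": (6, 7),
    "month": (8, 9),
    "day": (10, 11),
    "max_temp": (12, 14),
    "min_temp": (15, 17),
    "precip": (18, 21),
}
TEXT_FIELDS = {"station"}  # keep leading zeros in station numbers


def parse_no3(card):
    """Slice one 80-column No. 3 card image into fields (hypothetical).

    Blank numeric fields become None. Sign conventions (e.g. below-zero
    temperatures) and decimal scaling (e.g. precipitation units) are not
    handled in this sketch.
    """
    card = card.ljust(80)
    record = {}
    for name, (first, last) in NO3_FIELDS.items():
        raw = card[first - 1:last].strip()
        if name in TEXT_FIELDS:
            record[name] = raw
        else:
            record[name] = int(raw) if raw else None
    return record


print(parse_no3("00001790214045028" + "0012"))
# {'station': '00001', 'year': 79, 'month': 2, 'day': 14, 'max_temp': 45, 'min_temp': 28, 'precip': 12}
```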
    
    -------
     VII.  Program and Tape Control Procedures
    
       To provide appropriate edit information and to meet publication
       deadlines, the records are placed on two separate tapes each month,
       preceded by certain station program and priority information.
    
       A.  Tape No. 1 contains data for all stations in the LCD and CD
           programs, and Tape No. 2 contains data for all others.
    
       B.  A thirteen-digit Program and Priority Editing Code is provided
           in the station's header on the name tape. Positions 1-11 indicate
           the programs in which the station participates: "1" in a position
           indicates that the station has that program, and "0" indicates
           that it does not. Positions 12 and 13 use the codings given below.
    
               1.  Sunshine data keyed in cols. 54-58.
               2.  Fastest mile in compass points.
               3.  Station in 1009 program.
               4.  Station has monthly temperature normals.
               5.  Station has mid-monthly temperature normals.
               6.  Station has monthly precipitation normals.
               7.  Station has degree day normals.
               8.  Station in Extended Forecast program.
               9.  "Days With" are keyed in cols. 41-51.
              10.  "Water Equivalent" keyed in cols. 63-65
                   when snow depth is 002 or greater.
              11.  Station has CD number.
              12.  Station has LCD, coded 1; no LCD, coded 2.
              13.  Station is operating, coded 1; closed, coded zero
                   (a convenience in using the  name tape as a reference
                   in other programs).
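The editing code is a fixed-position string, so decoding it is a table lookup. A hypothetical Python sketch follows. The short field names are illustrative labels for the thirteen positions listed above. `tape_number` assumes that "stations in the CD program" are those with position 11 set ("Station has CD number"); the text does not state this outright.

```python
# Illustrative labels for positions 1-11 of the Program and Priority Editing
# Code, in order. The names are hypothetical; the meanings follow the list.
PROGRAM_POSITIONS = [
    "sunshine_cols_54_58",       # 1
    "fastest_mile_compass",      # 2
    "program_1009",              # 3
    "monthly_temp_normals",      # 4
    "mid_monthly_temp_normals",  # 5
    "monthly_precip_normals",    # 6
    "degree_day_normals",        # 7
    "extended_forecast",         # 8
    "days_with_cols_41_51",      # 9
    "water_equiv_cols_63_65",    # 10
    "has_cd_number",             # 11
]


def decode_editing_code(code):
    """Decode a 13-digit editing code into a dict of booleans."""
    if len(code) != 13 or not code.isdigit():
        raise ValueError(f"expected 13 digits, got {code!r}")
    bad = (any(c not in "01" for c in code[:11])
           or code[11] not in "12"
           or code[12] not in "01")
    if bad:
        raise ValueError(f"digit out of range in {code!r}")
    flags = {name: c == "1" for name, c in zip(PROGRAM_POSITIONS, code[:11])}
    flags["lcd"] = code[11] == "1"        # position 12: 1 = LCD, 2 = no LCD
    flags["operating"] = code[12] == "1"  # position 13: 1 = operating, 0 = closed
    return flags


def tape_number(flags):
    """Tape No. 1 for LCD and CD stations, Tape No. 2 for all others.

    Assumption: CD-program membership is read from position 11.
    """
    return 1 if flags["lcd"] or flags["has_cd_number"] else 2


flags = decode_editing_code("1100001100111")
print(flags["degree_day_normals"], flags["lcd"], tape_number(flags))  # True True 1
```

The range checks reject codes such as a "2" in positions 1-11, because position 12 is the only one that uses 1/2 instead of 1/0.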
    
    VIII.  Edit and Listing
    
       The machine edit is designed to detect various inconsistencies in the
       data. The corrected (updated) listings provide the computations used
       in the climatological programs: sums, averages, departures from
       normal, counts of the number of occurrences, and so on.
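The kinds of computations the corrected listings carry can be sketched as follows. This is a hypothetical Python illustration, not the listing program. It assumes daily maximum and minimum temperatures and precipitation are plain numbers and takes the daily mean as the average of maximum and minimum. The departure is the observed monthly mean minus the normal. A count of days at or above a chosen maximum-temperature threshold serves as the example occurrence count.

```python
from statistics import mean


def monthly_summary(max_temps, min_temps, precip, normal_mean, threshold=90):
    """Monthly sums, averages, departure from normal, and an occurrence count.

    Hypothetical sketch: daily mean = (max + min) / 2; normal_mean is the
    station's monthly normal; threshold is an illustrative count criterion.
    """
    daily_means = [(hi + lo) / 2 for hi, lo in zip(max_temps, min_temps)]
    avg = mean(daily_means)
    return {
        "avg_max": mean(max_temps),
        "avg_min": mean(min_temps),
        "avg_temp": avg,
        "departure": avg - normal_mean,      # positive = warmer than normal
        "total_precip": sum(precip),
        "days_max_at_or_above": sum(1 for t in max_temps if t >= threshold),
    }


print(monthly_summary([50, 60, 70], [30, 40, 50], [0, 12, 3],
                      normal_mean=48, threshold=60))
```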
    
       Sample listings appear on pages 20a and 20b.
    
       Each field found inconsistent by the edit is flagged with an
       appropriate symbol in the column(s) to the right of the questioned
       field(s).
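A minimal Python sketch of the flagging convention follows, with some assumptions. The `questioned_fields` check (minimum temperature above maximum) and the "*" flag symbol are hypothetical stand-ins, because the text does not list the edit's checks or symbols here. The field widths are also illustrative. Each field is printed right-justified and followed by a one-column flag slot, so a questioned value is marked in the column immediately to its right.

```python
def questioned_fields(rec):
    """Hypothetical inconsistency check: minimum above maximum temperature."""
    if rec["min_temp"] > rec["max_temp"]:
        return {"max_temp", "min_temp"}
    return set()


def listing_line(rec, layout, questioned, flag="*"):
    """Format one listing line. Each field is followed by a one-column flag
    slot holding `flag` (a hypothetical symbol) when the field is questioned."""
    parts = []
    for name, width in layout:
        mark = flag if name in questioned else " "
        parts.append(f"{rec[name]:>{width}}{mark}")
    return " ".join(parts)


layout = [("day", 2), ("max_temp", 3), ("min_temp", 3)]
rec = {"day": 14, "max_temp": 28, "min_temp": 45}
print(listing_line(rec, layout, questioned_fields(rec)))  # prints: 14   28*  45*
```

Reserving the flag column even for unflagged fields keeps every line the same width, so flags line up vertically and are easy to scan.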
    
       Checks are made, and fields are flagged for review, according to
       the outline below:
    
           A.  All columns 1-80
    
               "12" overpunch.
    
                                       176
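In Hollerith card code, a "12" overpunch is a zone punch in row 12 combined with a digit punch in the same column. For example, 12 over 1 through 9 reads as the letters A through I. The edit outlined above applies this check to all 80 columns. A minimal hypothetical sketch follows. It assumes each card image is available as 80 sets of punched row numbers. Leaving a lone 12 punch unflagged is also an assumption, because the outline does not say how that case is handled.

```python
DIGIT_ROWS = frozenset(range(10))  # rows 0-9; the zone rows are 12 and 11


def twelve_overpunch_columns(card):
    """Return 1-based card columns where a 12 punch shares the column with a
    digit punch. `card` is a list of 80 sets of punched row numbers."""
    if len(card) != 80:
        raise ValueError(f"expected 80 columns, got {len(card)}")
    return [col for col, rows in enumerate(card, start=1)
            if 12 in rows and rows & DIGIT_ROWS]


card = [set() for _ in range(80)]
card[0] = {0}       # column 1: plain digit 0
card[13] = {12, 4}  # column 14: 12 over 4 (reads as "D" on a keypunch)
card[20] = {12}     # column 21: lone 12 punch, not treated as an overpunch
print(twelve_overpunch_columns(card))  # [14]
```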
    

    -------
    [Sample edit listing of daily climatological data for one station, with
    columns that include sunshine hours and occurrences of weather types
    (F, T, IP, A, R, S, Z, D, KH, BS, HF). Notes printed on the listing:
    (1) Water equivalent of snow and ice on ground.
    (2) If no / (solidus) appears, speeds are gusts. Figures for directions
        are tens of degrees from true North; i.e., 9 = East, 18 = South,
        27 = West, and 36 = North. When directions are in tens of degrees,
        speeds are fastest observed 1-minute values.
    (3) S-S indicates sunrise to sunset and M-M midnight to midnight.
    (4) Entry of 1 indicates occurrence, 0 indicates no occurrence. Weather
        types are: F = fog, visibility more than 1/4 mile; T = thunderstorm;
        IP = ice pellets; A = hail; R = rain; S = snow; Z = glaze; D = dust,
        visibility 1/2 mile or less; KH = smoke or haze or both; BS = blowing
        snow; and HF = heavy fog (visibility 1/4 mile or less due to fog).]
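The direction convention in the listing notes is a fixed scaling. The two-digit code times 10 gives degrees from true North, so 9, 18, 27, and 36 are East, South, West, and North. A hypothetical Python helper follows. It applies the scaling and, for comparison with stations that report fastest mile in compass points, rounds to the nearest of 16 points. The 16-point rounding and the rejection of codes outside 1-36 (including any calm code) are assumptions, not rules stated in the text.

```python
POINTS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def decode_direction(code):
    """Convert a direction in tens of degrees (1-36) to degrees from true
    North and to the nearest 16-point compass direction (assumed rounding)."""
    if not 1 <= code <= 36:
        raise ValueError(f"direction code out of range: {code}")
    degrees = code * 10
    # Each compass point spans 22.5 degrees, centered on its nominal bearing.
    point = POINTS_16[int((degrees + 11.25) // 22.5) % 16]
    return degrees, point


print([decode_direction(c) for c in (9, 18, 27, 36)])
# [(90, 'E'), (180, 'S'), (270, 'W'), (360, 'N')]
```

Because every code is a multiple of 10 degrees, no bearing falls exactly on a boundary between two compass points, so the rounding is never ambiguous.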
    

    -------
    [Sample edit listing, continued; not legible in this copy.]
    
                                                                                                O:
                                                                                                u O
                                                                                                fS
                                                                                           o*   u
                                                                                           t-f    - 5
                                                                                                  o
                                                                                                2 0
                XDQ ai^t jo XjDiuu
                                                                            jtY puo jiu
                                                 178
    

    -------
    B.  Day (col. 10-11)
    
        1.  > number of days possible for month
        2.  Missing
    
    C.  Max. Temp. (cols. 12-14)
    
        Legal punches are:  X, 0, or 1 in col. 12 and 0-9 in cols. 13-14.
    
        1.  Illegal punches
        2.  < Min. Temp  (cols. 15-17)
        3.  < Min. Temp. (cols.  15-17)  of previous day
            Print negative values (X punch in col. 12)  with a minus (-)
            preceding numerical values in cols. 13-14.
    
    D.  Min. Temp. (cols. 15-17)
    
        Legal punches are:  0 or X for col. 15 and 0-9 for cols. 16-17.
    
        1.  Illegal punches
        2.  > Max. Temp. (cols.  12-14)  of previous day
            Print negative values (X punch in col. 15)  with a minus (-)
            preceding numerical values in cols. 16-17.
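The temperature checks in C and D lend themselves to a short sketch. The Python below is an illustration only, not the original edit program: it assumes each three-column field arrives as a string in which a literal "X" stands for the X punch (negative value), and the function names and flag texts are hypothetical.

```python
from typing import List, Optional


def decode_temp(field: str, lead_chars: str) -> Optional[int]:
    """Decode a 3-column temperature field; return None for illegal punches.

    lead_chars lists the punches allowed in the first column:
    "X01" for max. temp. (cols. 12-14), "0X" for min. temp. (cols. 15-17).
    """
    if len(field) != 3 or field[0] not in lead_chars or not field[1:].isdigit():
        return None
    if field[0] == "X":          # X punch marks a negative value
        return -int(field[1:])
    return int(field)            # a leading "0" or "1" is part of the number


def check_temps(max_field: str, min_field: str,
                prev_max: Optional[int], prev_min: Optional[int]) -> List[str]:
    """Return edit flags for one day's max./min. temperature fields."""
    flags: List[str] = []
    t_max = decode_temp(max_field, "X01")
    t_min = decode_temp(min_field, "0X")
    if t_max is None:
        flags.append("MAX ILLEGAL PUNCH")
    if t_min is None:
        flags.append("MIN ILLEGAL PUNCH")
    if t_max is not None and t_min is not None and t_max < t_min:
        flags.append("MAX < MIN")
    if t_max is not None and prev_min is not None and t_max < prev_min:
        flags.append("MAX < PREVIOUS DAY MIN")
    if t_min is not None and prev_max is not None and t_min > prev_max:
        flags.append("MIN > PREVIOUS DAY MAX")
    return flags
```

Previous-day values are passed in already decoded, so the first day of a month can be checked by passing None for both.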
    
    E.  Precipitation (cols. 18-21)
    
        Legal punches are 0-9 or BBBX
    
        1.  Illegal punches
        2.  "0000" with cols. 22-24 other than "000" or "BBB"
        3.  Col. 21 "X" with cols. 18-20 other than B
        4.  Any of cols. 18-20 B with col. 21 "0-9"
        5.  > 1000
            Print BBBX as "T."  Also print "000X" as "T" but flag as
            error as indicated above.
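The precipitation rules can be expressed as a small field editor. This is a sketch under stated assumptions: the field is a fixed-width string, the letter "B" stands for a blank column (following the notation above), and `check_precip` and its flag texts are hypothetical names, not those of the original program.

```python
from typing import List, Tuple


def check_precip(pcpn: str, snowfall: str) -> Tuple[str, List[str]]:
    """Edit the precipitation field (cols. 18-21) of one daily record.

    pcpn is the 4-column field, snowfall the 3-column field in cols. 22-24;
    "B" stands for a blank column.  Returns (printed value, flags).
    """
    flags: List[str] = []
    head, last = pcpn[:3], pcpn[3:]
    legal = (len(pcpn) == 4
             and all(c in "0123456789B" for c in head)
             and last in "0123456789X")
    if not legal:                                          # rule 1
        flags.append("ILLEGAL PUNCH")
    if pcpn == "0000" and snowfall not in ("000", "BBB"):  # rule 2
        flags.append("ZERO PCPN WITH SNOWFALL")
    if last == "X" and head != "BBB":                      # rule 3
        flags.append("X IN COL 21 WITH COLS 18-20 NOT BLANK")
    if "B" in head and last.isdigit():                     # rule 4
        flags.append("BLANK IN COLS 18-20 WITH DIGIT IN COL 21")
    if pcpn.isdigit() and int(pcpn) > 1000:                # rule 5
        flags.append("> 1000")
    # Both BBBX and 000X print as a trace; 000X has been flagged by rule 3.
    printed = "T" if pcpn in ("BBBX", "000X") else pcpn
    return printed, flags
```

For example, `check_precip("000X", "000")` prints "T" but still carries the rule 3 flag, matching the instruction to print the value and flag it.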
    
    F.  Snowfall  (cols. 22-24)
    
        Legal punches are 0-9 or BBX
    
        1.  Illegal punches
        2.  Col. 24 "X" with cols. 22 and 23 other than B
        3.  Cols. 22 or 23 B with col. 24 "0-9"
        4.  > 200
            Print BBX as "T."  Also print "00X" as "T," but flag as
            error as indicated above.
    
    G.  Snow Depth (cols. 25-27)
    
        Legal punches are 0-9 or BBX.  May also be B for entire field.
    
        1.  Illegal punches
        2.  Other than "000" with cols. 22-27 for preceding day and cols.
            22-24 for same day punched all 0's.
    
                                  179
    

    -------
        3.  Col. 27 "X" with cols.  25 and 26 other than "B"
    
        4.  Cols. 25 or 26 B with col. 27 "0-9"
            Print "BBX" as "T."  Also print "00X" as "T," but flag as error
            as indicated above.
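Snowfall (F) and snow depth (G) share the same trace convention, so one helper can serve both. This is a minimal sketch, assuming fixed-width string fields with "B" for a blank column; all function names and flag texts are hypothetical.

```python
from typing import List, Tuple


def check_trace_field(field: str, allow_blank: bool = False) -> Tuple[str, List[str]]:
    """Edit a 3-column snowfall or snow-depth field ("B" = blank column).

    Legal forms are three digits or "BBX" (trace); snow depth may also be
    blank in its entirety ("BBB").  Returns (printed value, flags).
    """
    flags: List[str] = []
    if allow_blank and field == "BBB":
        return field, flags
    head, last = field[:2], field[2:]
    legal = (len(field) == 3
             and all(c in "0123456789B" for c in head)
             and last in "0123456789X")
    if not legal:
        flags.append("ILLEGAL PUNCH")
    if last == "X" and head != "BB":
        flags.append("X IN LAST COL WITH LEADING COLS NOT BLANK")
    if "B" in head and last.isdigit():
        flags.append("BLANK LEADING COL WITH DIGIT IN LAST COL")
    # BBX and 00X both print as a trace; 00X was flagged just above.
    printed = "T" if field in ("BBX", "00X") else field
    return printed, flags


def check_snowfall(field: str) -> Tuple[str, List[str]]:
    """Snowfall, cols. 22-24."""
    printed, flags = check_trace_field(field)
    if field.isdigit() and int(field) > 200:
        flags.append("> 200")
    return printed, flags


def check_snow_depth(depth: str, prev_snowfall: str, prev_depth: str,
                     snowfall: str) -> Tuple[str, List[str]]:
    """Snow depth, cols. 25-27, including the previous-day check (G.2)."""
    printed, flags = check_trace_field(depth, allow_blank=True)
    if prev_snowfall == prev_depth == snowfall == "000" and depth != "000":
        flags.append("DEPTH WITHOUT SNOW")
    return printed, flags
```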
    
    H.  Peak Gusts, Direction and Time (cols. 28-35)
    
        Legal punches are:
    
        0-9 for cols. 28-30,
        The "Alpha" Compass Point Code for cols. 31-32, and
        000 - 239 for cols. 33-35 or entire field may be "B."
        An "X" in col. 31 is programmed to convert peak gust speeds from
        knots to mph and publish under fastest mile heading with "/" following
        the direction as an indicator of peak gust speed.  Omission of "X" in
        col. 31 will be flagged by "$" following the direction on the edit
        listing.
        A "#" following the speed and direction spaces on the edit calls atten-
        tion to entry of speed with direction omitted.
    
        The Alpha Compass Point punches are:
    
        00  C   (calm)  22   NE    44   SE    66   SW
        11  N          32  ENE    54  SSE    76  WSW
        12  NNE        33    E    55    S    77    W
        18  NNW        34  ESE    56  SSW    78  WNW
    
        1.  Illegal punches
        2.  Cols. 28-30 > 050
            Print in Alpha Code Letters.
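The peak gust edit amounts to a table lookup plus a range check. The sketch below copies only the compass codes listed in the table above; the function names and flags are hypothetical, and detecting the X punch in col. 31 is not modeled (the knots-to-mph conversion is shown separately, using 1 knot = 1852/1609.344 mph).

```python
from typing import List, Optional, Tuple

# Alpha Compass Point punches, copied from the table above.
ALPHA_COMPASS = {
    "00": "C",   "11": "N",   "12": "NNE", "18": "NNW",
    "22": "NE",  "32": "ENE", "33": "E",   "34": "ESE",
    "44": "SE",  "54": "SSE", "55": "S",   "56": "SSW",
    "66": "SW",  "76": "WSW", "77": "W",   "78": "WNW",
}

KNOTS_TO_MPH = 1852 / 1609.344   # about 1.15078 mph per knot


def knots_to_mph(knots: int) -> int:
    """Convert a gust speed from knots to whole miles per hour."""
    return round(knots * KNOTS_TO_MPH)


def check_peak_gust(speed: str, direction: str) -> Tuple[Optional[str], List[str]]:
    """Edit peak gust speed (cols. 28-30) and direction code (cols. 31-32).

    Returns the direction in Alpha Code letters (None if illegal) and flags.
    """
    flags: List[str] = []
    if len(speed) != 3 or not speed.isdigit():
        flags.append("ILLEGAL SPEED PUNCH")
    elif int(speed) > 50:
        flags.append("SPEED > 050")
    letters = ALPHA_COMPASS.get(direction)
    if letters is None:
        flags.append("ILLEGAL DIRECTION PUNCH")
    return letters, flags
```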
    
    I.  "Days With"  (cols. 41-51)
    
        Legal punches  are 0 or 1 if in station's program, otherwise all cols.
        should be B.   (If punched, all columns should be punched.)
    
        1.  Illegal punches
        2.  Col. 41 "0" with "1" in col. 51
        3.  Col. 43 "1" with either or both fields (cols. 18-21, 22-24)
            all 0's.
        4.  Col. 43 "1" with min. temp. (cols. 15-17) > 044
        5.  Col. 44 "1" with cols. 18-21 "0000"
        6.  Col. 44 "1" with cols. 43 & 46 "0" & cols. 22-24 other than "000"
        7.  Col. 45 "1" with cols. 18-21 "0000"
        8.  Col. 46 "1" with either or both fields (cols. 18-21, 22-24) all 0's.
        9.  Col. 46 "1" with min. temp. (cols. 15-17) > 044
       10.  Col. 47 "1" with col. 45 "0" (some exceptions, but flag)
       11.  Col. 47 "1" with min. temp. (cols. 15-17) > 039
       12.  Col. 50 "1" with either cols. 28-30 or 59-60 (if punched 010 or 10
            respectively).
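Several of the "Days With" rules are simple cross-field conditions. The sketch below implements only rules 1-5, 7, 10 and 11, identifies columns by number because the weather type behind each column is not stated here, and uses hypothetical names and flag texts.

```python
from typing import Dict, List


def check_days_with(dw: Dict[int, str], pcpn: str, snowfall: str,
                    t_min: int) -> List[str]:
    """Apply a subset of the "Days With" checks (cols. 41-51).

    dw maps card column to its punch: "0", "1" or "B" (blank).
    pcpn is the cols. 18-21 field, snowfall the cols. 22-24 field.
    """
    punches = [dw.get(col, "B") for col in range(41, 52)]
    if all(p == "B" for p in punches):
        return []                                   # not in station's program
    flags: List[str] = []
    if any(p not in ("0", "1") for p in punches):
        flags.append("ILLEGAL PUNCH")               # rule 1: all cols. punched

    def on(col: int) -> bool:
        return dw.get(col) == "1"

    def off(col: int) -> bool:
        return dw.get(col) == "0"

    if off(41) and on(51):
        flags.append("COL 41 0 WITH COL 51 1")                 # rule 2
    if on(43) and (pcpn == "0000" or snowfall == "000"):
        flags.append("COL 43 1 WITH PCPN OR SNOWFALL ZERO")    # rule 3
    if on(43) and t_min > 44:
        flags.append("COL 43 1 WITH MIN > 044")                # rule 4
    if on(44) and pcpn == "0000":
        flags.append("COL 44 1 WITH PCPN 0000")                # rule 5
    if on(45) and pcpn == "0000":
        flags.append("COL 45 1 WITH PCPN 0000")                # rule 7
    if on(47) and off(45):
        flags.append("COL 47 1 WITH COL 45 0")                 # rule 10
    if on(47) and t_min > 39:
        flags.append("COL 47 1 WITH MIN > 039")                # rule 11
    return flags
```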
    
    
                                        180
    

    -------
    J.  Sky Cover (cols. 52, 53)
    
        Legal punches are 0-9 and X for both cols, or "B" for col. 53
        if cols. 41-51 are B.
    
        1.  Col. 52 B with other than B in col. 53
        2.  Col. 52 > 3 greater than col. 53
        3.  Col. 53 other than B with cols. 41-51 B
        4.  Col. 53 > 2 greater than col. 52
    
            Print "X" punches as "10"
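The sky cover rules compare the two columns after an X punch is read as 10. The following is a minimal sketch with hypothetical function names; it takes single-character punches and a boolean saying whether cols. 41-51 are blank.

```python
from typing import List, Optional


def sky_value(punch: str) -> Optional[int]:
    """Decode one sky-cover column; an X punch counts (and prints) as 10."""
    if punch == "X":
        return 10
    if len(punch) == 1 and punch.isdigit():
        return int(punch)
    return None


def check_sky_cover(col52: str, col53: str, days_with_blank: bool) -> List[str]:
    """Edit the two sky-cover columns, 52 and 53."""
    flags: List[str] = []
    if col52 == "B" and col53 != "B":
        flags.append("COL 52 B WITH COL 53 NOT B")             # rule 1
    v52, v53 = sky_value(col52), sky_value(col53)
    if v52 is not None and v53 is not None:
        if v52 - v53 > 3:
            flags.append("COL 52 > 3 GREATER THAN COL 53")     # rule 2
        if v53 - v52 > 2:
            flags.append("COL 53 > 2 GREATER THAN COL 52")     # rule 4
    if col53 != "B" and days_with_blank:
        flags.append("COL 53 NOT B WITH COLS 41-51 B")         # rule 3
    return flags
```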
    
    K.  Sunshine and Percent of Possible (cols. 54-58)
    
        Legal punches are:  000-199 for cols. 54-56, 0-9 or X for col. 57
        and 0-9 or B for col. 58.  Also entire field may be B.
    
        1.  Illegal punches
        2.  Col. 57 "X" with underpunch
        3.  Cols. 54-58 are blank
        4.  Col. 57 "X" with other than B in col. 58
    
            Print as "100" when cols. 57-58 punched "XB."  Also print
            "100" when col. 57 has an "X" punch regardless of other illegal
            punching in either or both cols. 57 or 58, but flag as error
            as indicated above.
    
        5.  With cols. 54-56 punched 000, cols. 57-58 with other than zeros
        6.  With cols. 57-58 punched 00, cols. 54-56 with other than zeros
        7.  With cols. 54-56 punched greater than 000, cols. 57-58 will be
            greater than 00.
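The sunshine edit can be sketched as follows, assuming fixed-width string fields with "B" for a blank column; the names and flag texts are hypothetical. Rules 6 and 7 describe the same inconsistency from opposite sides, so a single check covers both here. The "X with underpunch" case (rule 2) is a multi-punch column that a plain string cannot represent, so it is not modeled.

```python
from typing import List, Optional, Tuple


def check_sunshine(hours: str, pct: str) -> Tuple[Optional[int], List[str]]:
    """Edit sunshine (cols. 54-56) and percent of possible (cols. 57-58).

    "B" stands for a blank column.  Returns (percent to print, flags).
    """
    if len(hours) != 3 or len(pct) != 2:
        return None, ["ILLEGAL PUNCH"]
    if hours == "BBB" and pct == "BB":
        return None, ["COLS 54-58 BLANK"]                          # rule 3
    flags: List[str] = []
    legal = (hours.isdigit() and int(hours) <= 199
             and pct[0] in "0123456789X" and pct[1] in "0123456789B")
    if not legal:
        flags.append("ILLEGAL PUNCH")                              # rule 1
    if pct[0] == "X" and pct[1] != "B":
        flags.append("COL 57 X WITH COL 58 NOT B")                 # rule 4
    if hours == "000" and pct != "00":
        flags.append("HOURS 000 WITH PERCENT NOT 00")              # rule 5
    if pct == "00" and hours != "000":
        flags.append("PERCENT 00 WITH HOURS NOT 000")              # rules 6-7
    # An X in col. 57 prints as 100 even when the rest is miskeyed.
    if pct[0] == "X":
        return 100, flags
    return (int(pct) if pct.isdigit() else None), flags
```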
    
    L.  Fastest Mile and Direction (cols. 59-62)
    
        Legal punches are:  0-9 for cols. 59-60 with an X overpunch permitted
        in col. 59 for speeds of 100 or more, 00-36 for cols. 61-62 if neither
        col. has an "X" overpunch, and the "Alpha" Compass Point Code (see
        VIII, H above) if either or both (cols. 61-62) have an "X" overpunch.
    
        Illegal punches:
    
        1.  Cols. 59-60 "00" (without "X" overpunch in col. 59) with
            other than "00" in cols. 61-62.
        2.  Cols. 59-60 > 50.
        3.  Cols. 59, 61 or 62 punched "X" without an underpunch 0-9.
        4.  Col. 62 "X" overpunched with no  "X" overpunch in col. 61.
                                    181
    

    -------
              Print:
    
              1.  "1" preceding speed punched in cols. 59-60 when col. 59 has an
                  "X" overpunch.
    
              2.  Direction in the "Alpha" code letters when either or both cols.
                  61, 62 have an "X" overpunch.  (See VIII, H above.)
    
              3.  A dash (-) in the col. following the direction when col. 61
                  has an "X" overpunch.
    
              4.  A plus (+)  in 2nd col. following the direction when col. 62 has
                  an "X" overpunch.
    
          M.  Water Equivalent  (cols. 63-65)
    
              Legal punches are:  0-9 or B.  Water Equivalent is in inches & tenths.
    
              Illegal punches:  B in any of cols. 63-64 with col. 65 punched 0-9.
    
              Other cols. (36-40, 66, 68-80) should be B.
    
    IX.   Machine Computations
    
          Various sums, means, departures  (from pre-programmed normals), frequency
          counts, summary cards, etc., necessary in the verification program and
          used in the preparation of formats for the LCD, CDNS, and Table J are
          made by the computer.
    
          Print the sums, averages, etc.,  from the data available when some days
          and/or items are missing.
    
          A.  Daily Computations are made for:
    
              1.  Average temperature
              2.  Departure from normal
              3.  Degree days
    
          B.  Monthly Sums are  computed and listed for:
    
              1.  Max. temperature
              2.  Min. temperature
              3.  Mean temperature
              4.  Degree days,  heating and cooling
              5.  Precipitation
              6.  Snowfall
              7.  Sunshine
              8.  "Days With" (if in  station's program)
              9.  Sky Cover  (SR-SS & Mid-Mid)
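The daily computations and the skipping of missing days in the monthly sums can be sketched as below. Two assumptions are not stated in the text and are labeled in the code: a 65 deg F degree-day base and rounding of half degrees upward. The function names are hypothetical.

```python
import math
from typing import Dict, List, Optional

BASE_F = 65  # assumed degree-day base temperature (deg F); not given in the text


def daily_computations(t_max: int, t_min: int, normal: int) -> Dict[str, int]:
    """Daily average temperature, departure from normal, and degree days."""
    avg = math.floor((t_max + t_min) / 2 + 0.5)   # assumed: halves round up
    return {
        "max": t_max,
        "min": t_min,
        "avg": avg,
        "departure": avg - normal,
        "heating_dd": max(0, BASE_F - avg),
        "cooling_dd": max(0, avg - BASE_F),
    }


def monthly_sums(days: List[Optional[Dict[str, int]]]) -> Dict[str, int]:
    """Sum each item over the days present; missing days (None) are skipped."""
    present = [d for d in days if d is not None]
    keys = ("max", "min", "avg", "heating_dd", "cooling_dd")
    return {key: sum(d[key] for d in present) for key in keys}
```

Passing None for a missing day mirrors the instruction to print sums from the data available when some days are missing.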
                                           182
    

    -------
    C.  Monthly Averages are computed and listed for:
    
        1.  Max. temperature
        2.  Min. temperature
        3.  Mean temperature (this is 1/2 the sum of the average
            max. and min., C, 1 & 2 above).
        4.  Average percent of possible sunshine (sum of daily
            percentages divided by the number of days).
    
            Monthly percent of possible sunshine is computed from
            total sunshine recorded and the pre-programmed possible
            amount, sunrise to sunset.
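Note that the monthly mean in C.3 is built from the two monthly averages, and that the average of daily sunshine percentages (C.4) generally differs from the monthly percent computed from totals. A sketch with hypothetical names, assuming the possible amount for the month is supplied as one pre-computed number:

```python
from typing import Dict, List, Optional


def average(values: List[Optional[float]]) -> Optional[float]:
    """Mean of the values present; None entries (missing days) are ignored."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def monthly_averages(max_temps: List[Optional[float]],
                     min_temps: List[Optional[float]],
                     daily_pct: List[Optional[float]],
                     sunshine: List[float],
                     possible: float) -> Dict[str, Optional[float]]:
    avg_max = average(max_temps)                     # C.1
    avg_min = average(min_temps)                     # C.2
    mean = None if avg_max is None or avg_min is None else (avg_max + avg_min) / 2
    return {
        "avg_max": avg_max,
        "avg_min": avg_min,
        "mean": mean,                                # C.3: half the sum of C.1 and C.2
        "avg_pct_possible": average(daily_pct),      # C.4: mean of daily percentages
        "monthly_pct_possible": 100 * sum(sunshine) / possible,  # total vs. possible
    }
```

With two days of 5 of 10 possible hours (50 %) and 14 of 14 (100 %), the average of daily percentages is 75, while the monthly figure is 100 x 19 / 24, about 79.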
                                 183
    

    -------
    D.  Monthly Departures are computed and listed for:
    
        1.  Mean Temperature
        2.  Degree days, heating and cooling
        3.  Precipitation
    
    E.  Seasonal Departure for Degree Days (from seasonal totals carried
        forward from preceding month and current month's total) are com-
        puted and listed.  Season begins with July for heating and January
        for cooling.
    
    F.  Extremes and Dates are selected and listed for:
    
        1.  Highest temperature
        2.  Lowest temperature
        3.  Greatest precipitation
        4.  Greatest Snowfall
        5.  Greatest Snow Depth
        6.  Greatest Wind Speed and Direction
    
        (When the same value occurs on two or more dates, the date of
        the last occurrence followed by a plus (+) is  listed.  Also,
        the direction of the last occurrence of multiple "Greatest Wind
        speed" is printed.)
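The tie rule (last date, followed by "+") can be sketched as follows; `select_extreme` is a hypothetical name, and daily values are given as a list indexed from day 1 with None for missing days.

```python
from typing import List, Optional, Tuple


def select_extreme(values: List[Optional[float]],
                   highest: bool = True) -> Optional[Tuple[float, str]]:
    """Return (extreme value, date) for one month of daily values.

    values[i] holds the value for day i + 1 (None = missing).  When the
    extreme occurs on two or more dates, the last date is returned with "+".
    """
    present = [(day, v) for day, v in enumerate(values, start=1) if v is not None]
    if not present:
        return None
    pick = max if highest else min
    best = pick(v for _, v in present)
    days = [day for day, v in present if v == best]
    return best, f"{days[-1]}{'+' if len(days) > 1 else ''}"
```

For the greatest wind speed, the direction printed would be the one recorded on that same last date.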
    
    G.  Frequency Counts are made and listed for:
    
        1.  Temperature
    
            a.  Max. ≤ 32
            b.  Max. ≥ 90, except ≥ 70 for Alaskan stations
            c.  Min. ≤ 32
            d.  Min. ≤ 00

        2.  Precipitation

            a.  Trace (BBBX)
            b.  ≥ 0001
            c.  ≥ 0010
            d.  ≥ 0050
            e.  ≥ 0100

        3.  Snowfall

            a.  ≥ 010
    
        4.  Character  of Day  (SR-SS)
    
            a.  Clear  (Avg. 0-3)
            b.  Partly Cloudy  (Avg. 4-7)
            c.  Cloudy  (Avg. 8-10)   (Punched 8,  9, or  X)
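The frequency counts are threshold tallies over the month's daily values. In this sketch (hypothetical names), precipitation and snowfall are passed as the keyed card fields and compared as keyed, sky cover is assumed to be already decoded to 0-10, and missing days are assumed to have been removed beforehand.

```python
from typing import Dict, List


def frequency_counts(max_t: List[int], min_t: List[int], pcpn: List[str],
                     snowfall: List[str], sky_sr_ss: List[int],
                     alaska: bool = False) -> Dict[str, int]:
    """Count the days meeting each threshold listed above."""
    hot = 70 if alaska else 90
    amounts = [int(p) for p in pcpn if p.isdigit()]     # trace fields skipped
    snow = [int(s) for s in snowfall if s.isdigit()]
    return {
        "max <= 32": sum(t <= 32 for t in max_t),
        "max >= hot": sum(t >= hot for t in max_t),
        "min <= 32": sum(t <= 32 for t in min_t),
        "min <= 0": sum(t <= 0 for t in min_t),
        "pcpn trace": sum(p == "BBBX" for p in pcpn),
        "pcpn >= 0001": sum(a >= 1 for a in amounts),
        "pcpn >= 0010": sum(a >= 10 for a in amounts),
        "pcpn >= 0050": sum(a >= 50 for a in amounts),
        "pcpn >= 0100": sum(a >= 100 for a in amounts),
        "snowfall >= 010": sum(s >= 10 for s in snow),
        "clear": sum(0 <= c <= 3 for c in sky_sr_ss),
        "partly cloudy": sum(4 <= c <= 7 for c in sky_sr_ss),
        "cloudy": sum(8 <= c <= 10 for c in sky_sr_ss),
    }
```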
    
    
                                    184
    

    -------
    X.  Precipitation Data Card Images
           A.  Program Involved.  Hourly precipitation,  monthly extremes, and
               maximum precipitation.
    
               1.  Hourly precipitation, greatest amounts of precipitation,
                   snowfall, and snow depth and maximum precipitation are con-
                   tained in a series of tape formats currently known as the
                   HPD Deck.  These are identified as to station, year, and
                   month in the same manner as the WBAN #1 and #3 cards.
              [Precipitation data card form: station number, year, month, day,
              and card number (1-4) fields, followed by hourly and short-period
              precipitation fields. Card layout not reproducible from this copy.]
                                      Fig.  5
                2.  Hourly precipitation, HPD card format  1 or 2 in col. 12
                   as  identifier.
    
                   For each station in the LCD program, #1 and #2 HPD data are
                   keyed for each day with precipitation and, for the last day
                   of the month, whether or not precipitation has occurred.  If
                   stations are not equipped with recording gages, amounts are
                   keyed only at 6-hourly synoptic times.  In this case, the
                   daily total is not keyed in the second format; the monthly
                   total, however, is keyed in the last format of the month for
                   all stations.
                                       185
    

    -------
    B.  Checking Procedure - Hourly Precipitation
    
        1.  The checking is accomplished by a computer cross-foot listing
            to ensure internal compatibility.  A second check is made
            between the daily totals and the monthly total.  The cross-foot
            for each station is begun by building in the memory of the
            computer a grid of zeros for all days in the month, i. e.,
            28 days, 30 days, etc., as the calendar requires.  The keyed
            data are read into the grid and then edited.  Information
            concerning missing record, blank fields, erroneous keying,
            and arithmetic mistakes is listed to the right of the data
            field.  If a record is missing, the computer will list all
            hourly fields as having zero precipitation with indication
            to the right that the record is missing.  In the case of
            duplicates, only the last presented to the computer will be
            used and duplication indicated to the right.
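The grid-of-zeros procedure just described can be sketched as below. Assumptions, all illustrative: each record is a (day, card number, twelve hourly amounts, daily total) tuple with traces already decoded to 0; card 1 carries hours 01-12 and card 2 hours 13-24; `crossfoot` and its note texts are hypothetical, loosely echoing the sample listing.

```python
import calendar
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (day, card no. 1 or 2, twelve hourly amounts, daily total keyed on card 2 or None)
Record = Tuple[int, int, List[int], Optional[int]]


def crossfoot(year: int, month: int,
              records: List[Record]) -> Tuple[Dict[int, List[int]], Dict[int, List[str]]]:
    n_days = calendar.monthrange(year, month)[1]
    grid = {day: [0] * 24 for day in range(1, n_days + 1)}   # grid of zeros
    notes: Dict[int, List[str]] = defaultdict(list)
    seen = set()
    totals: Dict[int, Optional[int]] = {}
    for day, card, hours, total in records:
        if day not in grid:
            notes[day].append("INVALID DAY")
            continue
        if (day, card) in seen:
            notes[day].append(f"DUPLICATE NO. {card} CARD")
        seen.add((day, card))
        start = 0 if card == 1 else 12          # assumed: card 1 = hours 01-12
        grid[day][start:start + 12] = hours     # the last duplicate presented wins
        if card == 2:
            totals[day] = total
    keyed_days = sorted({day for day, _ in seen})
    if n_days not in keyed_days:                # last-day (dummy) record required
        notes[n_days].append("NO RECORD FOR LAST DAY")
    for day in keyed_days:
        for card in (1, 2):
            if (day, card) not in seen:         # missing hours stay zero in the grid
                notes[day].append(f"MISSING NO. {card} CARD")
        total = totals.get(day)
        if total is not None and total != sum(grid[day]):
            notes[day].append(f"CROSSFOOT ERROR {abs(total - sum(grid[day])):04d}")
    return grid, dict(notes)
```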
    
            Since the presence of the HPD 1 & 2 record is a controlling
            factor, stations not having hourly precipitation must have
            "dummy" records for the last day of the month, containing
            only identification, date, and card number data.
    
            The HPD #4 card image  (4 in col. 12) has the greatest 24
            hour precipitation and date, snowfall and date, and great-
            est depth of snow on the ground and date.  There is only
            one #4 per station month.
    
        2.  The edit checks of the HPD 1, 2, and 4 card images are
            as follows (sample shown on page 187):
    
            Column   Data                          Edit Check
    
             1- 5    Station No.      Sequence checked by number with a
                                      4 punched in column 12 of 1st image.
    
             6- 9    Year & Month     Values are checked and must be the
                                      same for the entire edit.  Month
                                      must be  in range of 01-12 in cols.
                                      8-9.
    
            10-12    Day Card No.     Only days with pcpn. are keyed ex-
                                      cept for the last day of the month.
                                      Each day will have only  two images
                                      identified as 1 and 2 in col. 12.
                                      No. 2 has the daily total in cols.
                                      49-52.   The #4 in  col. 12 will not
                                      have day punched in cols. 10-12.
                                  186
    

    -------
    [Figure: sample HPD edit listings. An hourly edit (station 00001) and a
    6-hourly edit (station 23237) for June 1970 list the keyed values of each
    day's #1 and #2 cards, with flags such as "ERROR HR 3", "MISSING NO. 1 CARD",
    "MISSING NO. 2 CARD", and "CROSSFOOT ERROR", followed by the card monthly
    total compared with the computed total.]
                                                  HPD   EDIT   LISTING
                                                         187
    

    -------
            Column   Data             Edit Check

            13-48    Hourly Values    Each hour has three cols. for data,
                                      i.e., hour 0100 is cols. 13-15, etc.
                                      Zero pcpn. is keyed "000."  BB0, B00,
                                      0B0, 00B & BBB are flagged.  All cols.
                                      are keyed with zeros placed to fill
                                      the col.  Blanks are flagged.  Trace
                                      amounts are indicated by an X in the
                                      right col. of the hour, preceded by
                                      two blank columns.  Punched data of
                                      00X, 0BX, B0X and overpunches are
                                      flagged as errors.  A trace is
                                      indicated by an X and accumulation
                                      by a Y punch.

            49-52    Daily Total      Flagged for error when these data are
                                      omitted from the HPD #2 or keyed in
                                      the #1.  When entered, the field is
                                      fully keyed and errors are indicated
                                      for blank columns.  Trace is BBBX.
                                      Data are flagged for 000X, B00X, etc.
                                      The daily total is checked against
                                      the values in cols. 13-48 of the HPD
                                      #1 & #2 cards.  If the values do not
                                      agree, a "cross-foot error" is
                                      indicated and the amount of error is
                                      shown.  The cross-foot does not
                                      function if there are illegal punches.

            53-56    Monthly Total    Keyed in the #2 card of the last
                                      calendar day of the month.  This
                                      datum is listed at the bottom of the
                                      edit as the card total.  It is
                                      compared to the computed total taken
                                      from all daily #2 cards.  If the
                                      totals are the same, the word "agree"
                                      is printed; if not, the word "error"
                                      appears.  If the total is omitted
                                      from the last #2 card, the card total
                                      is blank and an error is indicated.
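The hourly-value and monthly-total rules in the table can be sketched in two small functions. Assumptions: fields are strings with "B" for a blank column, the Y (accumulation) punch and overpunches are not modeled, and the names and return strings are hypothetical.

```python
from typing import Dict, Optional, Tuple


def decode_hourly(field: str) -> Tuple[Optional[int], bool]:
    """Decode one 3-column hourly value ("B" = blank column).

    Returns (amount as keyed, is_trace); (None, False) means the field
    must be flagged (blanks, BB0, 00X, 0BX, B0X, and so on).
    """
    if field == "BBX":
        return 0, True
    if len(field) == 3 and field.isdigit():
        return int(field), False
    return None, False


def check_monthly_total(daily_totals: Dict[int, int],
                        card_total: Optional[int]) -> Tuple[int, str]:
    """Compare the keyed monthly total with the sum of the daily #2 totals."""
    computed = sum(daily_totals.values())
    if card_total is None:             # total omitted from the last #2 card
        return computed, "ERROR"
    return computed, ("AGREE" if card_total == computed else "ERROR")
```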
    C.  Checking Procedure - Extreme Precipitation

        1. The remaining data are contained in the HPD #4 card image.  This
           card contains no date in cols. 10-11, and cols. 13-56 are blank.
           The card is listed on the edit below the last day of the month.

           Column   Data                Edit Check

           57-60    Greatest pcpn.      The value is checked for illegal
                    in 24 hours         punching.
                                188
    

    -------
           Column   Data                Edit Check

           61-65    Date of 24-hour     Col. 61 is keyed zero or X.  Other
                    amount              values are flagged.  When the value
                                        in 57-60 is 0000, these cols. will be
                                        blank and are flagged if not.  The
                                        field is fully keyed if there is a
                                        value for 57-58 and listed as an
                                        "error pcpn. date" if miskeyed.

           66-68    Greatest 24-hr.     Datum is keyed the same as the hourly
                    snowfall            pcpn. and has the same error check.

           69-73    Date of 24-hr.      Same check as in cols. 61-65, with 69
                    snowfall            keyed 0 or X with data in cols. 66-68.

           74-75    Greatest snow       Zero is keyed for no snow.  2" = 02;
                    depth               110 = X/10.  Note:  The snow depth is
                                        keyed in two cols., but prints to
                                        three places.  This is to accommodate
                                        the overpunching for values greater
                                        than "99."

           76-78    Date of snow        These cols. are blank if the value
                    depth               in cols. 74-75 is 00.  76 is keyed X
                                        for + dates or zero.  Other values
                                        are flagged.

           79-80    None                HPD cards 1, 2, and 4 are blank in
                                        these fields.
        2. The edit contains a "Computed High - 24 Hour Precipitation"
           with dates.  This is a guide to checking this value on the LCD.
           There is no check by the computer between this value and the
           one keyed in the HPD #4 card.
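       The blank-field and leading-column rules in the table above can be
       sketched as a check on an 80-column card image. This is a
       hypothetical illustration, not the original edit program: the helper
       names and flag texts are invented, a blank amount is treated like a
       zero amount (an assumption), and the illegal-punch checks are not
       covered.

```python
from __future__ import annotations


def field(card: str, first: int, last: int) -> str:
    """Return columns first..last (1-based, inclusive) of an 80-column card image."""
    return card.ljust(80)[first - 1:last]


def check_date_field(card: str, data_cols: tuple[int, int],
                     date_cols: tuple[int, int]) -> list[str]:
    """Apply the date-field rule used for cols. 61-65, 69-73, and 76-78."""
    data = field(card, *data_cols)
    date = field(card, *date_cols)
    if data.strip("0 ") == "":
        # Zero amount (or, by assumption, a blank one): the date must be blank.
        if date.strip():
            return [f"cols. {date_cols[0]}-{date_cols[1]} should be blank"]
        return []
    if date[0] not in ("0", "X"):
        # The leading date column must be keyed zero or X.
        return [f"col. {date_cols[0]} not keyed 0 or X"]
    return []


def check_hpd4(card: str) -> list[str]:
    """Return edit flags for an HPD #4 card image (hypothetical flag texts)."""
    flags: list[str] = []
    flags += check_date_field(card, (57, 60), (61, 65))  # greatest 24-hr pcpn.
    flags += check_date_field(card, (66, 68), (69, 73))  # greatest 24-hr snowfall
    flags += check_date_field(card, (74, 75), (76, 78))  # greatest snow depth
    if field(card, 79, 80).strip():
        flags.append("cols. 79-80 should be blank")
    return flags
```

       For example, a card with 0000 in cols. 57-60 but a date keyed in
       cols. 61-65 returns the single flag "cols. 61-65 should be blank".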
    
    D.  Correction of HPD Data
    
           Data contained in the HPD 1, 2, and 4 card forms are corrected
           by submitting to the computer a new card punched in its entirety
           containing the information to be updated.
    
    E.  Maximum Short Period Precipitation.
    
           For each month, maximum precipitation is keyed as two records:
           1 in col. 10 with data for 5, 10, 15, 20, 30 and 45-minute
           periods, and 2 in col. 10 with data for 60, 80, 100, 120, 150
           and 180-minute periods.  See page 31 for the keying format.
           Day and time entries designate the end of the time period in
           which the amount of precipitation occurred.  Day and time are
           omitted when the amount is zero, trace, or missing.
    A computer edit program checks completeness and consistency
    of the data and produces an edit listing with flags indicating
    the deficiencies. The flags and associated deficiencies are
    as follows:

    A  = Record #1 missing
    B  = Record #2 missing
    C  = Month < 01 or > 12
    D  = Day < 01 or > 31
    E  = Hour < 00 or > 23
    F  = Minutes > 59
    R  = Amount zero or trace
    S  = Missing "M"
    T  = Pcpn. ≥ 0.01 with day or time missing
    W  = 10.00 or greater
    AA = 10 MIN > 2 X 5 MIN
    AB = 15 MIN > 5 MIN + 10 MIN
    AC = 20 MIN > 5 MIN + 15 MIN
    AD = 20 MIN > 2 X 10 MIN
    AE = 30 MIN > 10 MIN + 20 MIN
    AF = 30 MIN > 2 X 15 MIN
    AG = 45 MIN > 15 MIN + 30 MIN
    AH = 80 MIN > 20 MIN + 60 MIN
    AI = 100 MIN > 20 MIN + 80 MIN
    AJ = 120 MIN > 20 MIN + 100 MIN
    AK = 60 MIN > 2 X 30 MIN
    AL = 150 MIN > 30 MIN + 120 MIN
    AM = 120 MIN > 2 X 60 MIN
    AN = 180 MIN > 60 MIN + 120 MIN
    Corrections for updating the tape are keyed in the format shown
    on page 31, entering data for the time period involved only. The
    updated tape produces printer's copy for use in the CDNS Annual.
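    Flags AA through AN rest on one idea: the greatest amount for a
    longer period cannot exceed the sum of the greatest amounts for
    shorter periods that together span it. A minimal sketch of those
    consistency checks plus flag W, assuming amounts are integers in
    hundredths of an inch keyed by period length in minutes, with a
    missing period simply left out; the names RULES and
    short_period_flags are hypothetical.

```python
from __future__ import annotations

# (flag, longer period, shorter periods) in minutes, from the flag list.
RULES = [
    ("AA", 10, (5, 5)), ("AB", 15, (5, 10)), ("AC", 20, (5, 15)),
    ("AD", 20, (10, 10)), ("AE", 30, (10, 20)), ("AF", 30, (15, 15)),
    ("AG", 45, (15, 30)), ("AH", 80, (20, 60)), ("AI", 100, (20, 80)),
    ("AJ", 120, (20, 100)), ("AK", 60, (30, 30)), ("AL", 150, (30, 120)),
    ("AM", 120, (60, 60)), ("AN", 180, (60, 120)),
]


def short_period_flags(amounts: dict[int, int]) -> list[str]:
    """Return W and AA-AN flags for one month's maximum short-period amounts.

    amounts maps period length (minutes) to hundredths of an inch; a period
    that is missing is left out of the dict and its rules are skipped.
    """
    flags = []
    if any(amount >= 1000 for amount in amounts.values()):
        flags.append("W")  # 10.00 or greater
    for flag, longer, parts in RULES:
        if longer in amounts and all(p in amounts for p in parts):
            if amounts[longer] > sum(amounts[p] for p in parts):
                flags.append(flag)
    return flags


month = {5: 30, 10: 45, 15: 60, 20: 70, 30: 90, 45: 100,
         60: 110, 80: 120, 100: 125, 120: 130, 150: 135, 180: 140}
print(short_period_flags(month))              # []
print(short_period_flags({**month, 10: 65}))  # ['AA']
```

    Keeping the rules as data rather than as fourteen separate
    comparisons makes it easy to check the table against the flag list.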
    -------
                       Validation,
                       Compaction,  and
                       Analysis of  Large
                       Environmental
                       Data  Sets
    
                       By John Jalickee
                          Jerry Sullivan
                          Richard Rozett
                       EDS scientists have developed a technique which,
                       among other benefits, allows them to compact a
                       data set of 184,000 values into an equivalent data
                       set of fewer than 6,000 values, while retaining 90
                       to 98 percent of the variability of the original
                       data fields. Moreover, much of the remaining
                       variability appears to be sensor noise.
    Introduction
    Large-scale environmental field experiments such as the Barbados
    Oceanographic and Meteorological Experiment (BOMEX), the
    International Field Year for the Great Lakes (IFYGL), and the GARP
    (Global Atmospheric Research Program) Atlantic Tropical Experiment
    (GATE) produce huge data sets and attendant large-scale problems in
    data validation, analysis, and synthesis. New and more sophisticated
    techniques are needed to extend and complement traditional methods
    when working with such large data sets.
      The failure of conventional smoothing techniques to adequately
    remove noise from an IFYGL rawinsonde (atmospheric sounding) wind
    data set and still retain meaningful, though highly variable,
    natural fluctuations led the authors and other scientists of EDS'
    Center for Experiment Design and Data Analysis (CEDDA) to try a new
    method, called the asymptotic singular decomposition method, or ASD
    for short. The resulting computer program eliminated the noise and
    retained the essential data. It also greatly reduced the size of the
    original data base and, through intermediate graphics, provided a
    quick and efficient method of error detection, while isolating
    physical relationships and characteristic patterns.
    
    The  ASD Method
    The idea behind this data decomposition technique is to extract
    meaningful information in the form of characteristic patterns. As an
    example, consider a meteorologist studying daily maximum temperature
    data for the east coast. Station by station, he observes that, in
    general, it is warmer in summer than in winter; from this he
    abstracts a typical seasonal variation. On the other hand, studying
    station-to-station variations, he notes that temperatures are
    generally colder in the north than in the south at almost any time
    of year. With these two characteristic variations (space and time)
    he can qualitatively explain the main features of the entire data
    set. And by retention of a relatively few significant temperature
    values he could quantitatively describe perhaps 90 percent of the
    east coast maximum temperature field.
      The ASD data decomposition method adapted by CEDDA formalizes this
    process and provides a technique to calculate characteristic
    patterns for small and large data sets. Using the ASD method,
    dominant patterns within the data are easily extracted in an
    objective, repeatable fashion. In many respects, the science of ASD
    is much akin to the art of the caricaturist: the major features of
    the subject are quickly shown with a few sure, deft strokes.
       CEDDA scientists have used ASD to reduce the quantity of data
    needed for a sufficient representation of a physical situation;
    often the equivalent data set is an order of magnitude smaller than
    the original one. Data generated by the method also are used in
    calculations that require relatively noise-free data: random noise
    is smoothed out, while real discontinuities or sharp changes are
    relatively unchanged. An unexpected bonus of the method is its
    error-detection capabilities: keeping with the caricaturist analogy,
    distorted (erroneous) features stand out sharply. Physical
    relationships within the data, often buried by the volume of
    numbers, are also highlighted by the method.
       The ASD method is related to other statistical techniques such as
    principal component analysis, Lorenz's (1) empirical orthogonal
    functions in meteorology, and the factor analysis method of
    psychologists, political scientists, and sociologists; however, ASD
    has the advantages of simplicity and accuracy. A factor analysis
    computer program might fill over a thousand punched cards, while ASD
    would use a hundred. And ASD is almost immune to computer roundoff
    error, an important consideration when large data sets are involved.
    Figure 1. Upper-air temperature data for Stony Point, New York.
    Data Compaction
    
    Data from IFYGL for 1972-73 provide some vivid illustrations of the
    benefits of ASD applications. To demonstrate the data-compacting
    capabilities of the ASD method (plus the method itself), consider
    12 successive IFYGL rawinsonde launches from station Stony Pt.,
    N.Y., for the period 1800 GMT Nov. 2, 1972, to 0300 GMT Nov. 4, 1972
    (fig. 1). Temperature values are given for each 10-mbar pressure
    level, so that up to the 590-mbar level we have 12 x 60 = 720
    values. (The pressure variable used in all figures is P*, the
    difference between surface pressure and observed pressure, i.e.,
    P* = P(surface) - P.) The particular time period was chosen because
    a sharp upper-air trough was passing over Lake Ontario, producing
    the characteristic temperature variations represented by the solid
    lines in figure 2.
      The object of ASD application in this instance is to replace the
    12 columns of 60 numbers with 1 column of 60 numbers and 1 row of 12
    numbers, as in figure 3. In the latter illustration, the column
    represents the pressure dependence of the temperature soundings,
    while the row represents the time variation. To obtain the 350-mbar
    temperature for 0000 GMT on November 3, one would multiply the 36th
    number from the bottom of the column by the 3d number of the row (as
    shown in fig. 3), or,
    to get the 150-mbar temperature at 1500 GMT on November 3, multiply
    the 16th column number from the bottom by the 8th number in the row.
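    Each lookup just described is a single product of one column entry
    and one row entry. A small sketch, assuming the levels run from
    P* = 0 to 590 mbar in 10-mbar steps (so P* = 350 is the 36th value
    from the bottom) and using a hypothetical list LAUNCHES of (day,
    hour GMT) pairs for the 12 launches at the 3-hour spacing implied
    by the text:

```python
# Launch (day of November 1972, hour GMT) for the 12 Stony Point soundings,
# 1800 GMT Nov. 2 to 0300 GMT Nov. 4, at 3-hour spacing (assumed).
LAUNCHES = [(2, 18), (2, 21), (3, 0), (3, 3), (3, 6), (3, 9),
            (3, 12), (3, 15), (3, 18), (3, 21), (4, 0), (4, 3)]


def compacted_value(column, row, p_star_mbar, day, hour_gmt):
    """Recover one temperature from the compacted column and row."""
    i = p_star_mbar // 10                # 0-based position from the bottom
    j = LAUNCHES.index((day, hour_gmt))  # 0-based position in the row
    return column[i] * row[j]
```

    With this indexing, compacted_value(column, row, 350, 3, 0) reads the
    36th column entry and the 3d row entry, and the 150-mbar, 1500 GMT
    case reads the 16th and 8th entries, matching the text.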
       Where did the column and row come from? Any column and row of
    numbers can be multiplied together to generate a temperature field.
    The best choice is one that minimizes the sum of squared differences
    between the generated field and the original field. In practice, the
    ASD computer program begins with a trial column and row, then
    generates successive values until there is no further minimization
    of differences between the two temperature fields.
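    One way to carry out that trial-and-improve search is alternating
    least squares: hold the row fixed and solve for the best column,
    then hold the column fixed and solve for the best row, and repeat
    until the sum of squared differences stops decreasing. The text does
    not give the ASD program's exact update rule, so the sketch below is
    an assumed scheme, not the CEDDA code; the function name and
    tolerance are hypothetical.

```python
import numpy as np


def rank_one_fit(field: np.ndarray, tol: float = 1e-12, max_iter: int = 1000):
    """Fit field (levels x times) by the outer product of one column and one row.

    Each half-step is an exact least-squares solve, so the sum of squared
    differences never increases; iteration stops once it no longer falls
    by more than tol (relative).
    """
    field = np.asarray(field, dtype=float)
    row = np.ones(field.shape[1])        # trial row
    col = field @ row / (row @ row)      # best column for the trial row
    prev_sse = np.inf
    for _ in range(max_iter):
        if not np.any(col):
            break                        # degenerate case: nothing to fit
        row = col @ field / (col @ col)  # best row for the current column
        if not np.any(row):
            break
        col = field @ row / (row @ row)  # best column for the new row
        sse = np.sum((field - np.outer(col, row)) ** 2)
        if np.isfinite(prev_sse) and prev_sse - sse <= tol * prev_sse:
            break                        # no further minimization
        prev_sse = sse
    return col, row
```

    This iteration converges to the leading singular vectors of the
    field, so np.outer(col, row) agrees with the first term of NumPy's
    singular value decomposition; only the product is unique, since the
    column can be scaled up and the row down by the same factor.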
    Figure 2. Time-height temperature analyses for Stony Point. The
    solid lines are based on the original data set, the dashed lines on
    a reconstituted data set.

      In the example at hand, the original 720 numbers have been
    replaced by 60 + 12 = 72 numbers, a 10-fold reduction. The new field
    generated by the row and column explains approximately 90 percent of
    the variation about the mean of the original field. The ASD method
    now may be used again to describe the residuals of the original
    field minus the first generated field, producing another row and
    column. Usually, three rows and columns cover nearly all of the
    original temperature field variation. The broken lines in figure 2
    show a reconstituted temperature field using three rows and columns.
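    Repeating the fit on the residuals, as just described, gives further
    row-column pairs. For brevity the sketch below takes each best pair
    from NumPy's singular value decomposition of the current residual
    (by the Eckart-Young theorem this is the least-squares optimum that
    an iterative fit converges to) and reports the share of variation
    about the mean that is reproduced. Taking the grand mean of the
    whole field as "the mean" is an assumption, and the function names
    are hypothetical.

```python
import numpy as np


def asd_components(field: np.ndarray, k: int):
    """Return (columns, rows) for k successive rank-one fits to the residuals."""
    residual = np.asarray(field, dtype=float).copy()
    cols, rows = [], []
    for _ in range(k):
        u, s, vt = np.linalg.svd(residual, full_matrices=False)
        col, row = s[0] * u[:, 0], vt[0]  # best single column and row
        cols.append(col)
        rows.append(row)
        residual -= np.outer(col, row)    # field minus the generated field
    return np.column_stack(cols), np.vstack(rows)


def fraction_explained(field: np.ndarray, cols: np.ndarray, rows: np.ndarray) -> float:
    """Share of the variation about the grand mean reproduced by cols @ rows."""
    field = np.asarray(field, dtype=float)
    sse = np.sum((field - cols @ rows) ** 2)
    about_mean = np.sum((field - field.mean()) ** 2)
    return 1.0 - sse / about_mean
```

    For a 60-level by 12-launch field, cols has shape (60, k) and rows
    has shape (k, 12), and cols @ rows is the reconstituted field.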
      Overall, CEDDA scientists were able to compact 60 levels of
    temperature, humidity, and wind values from 768 IFYGL upper-air
    soundings (6 stations, 128 launches each), a total of 184,000
    values, into an equivalent data set containing fewer than 6,000
    values. From 90 to 98 percent of the characteristics of the original
    fields are retained, and much of the unexpected variability appears
    to be sensor noise.
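    The bookkeeping behind these figures is simple: a field of m levels
    by n launches holds m x n numbers, while k row-column pairs need
    k(m + n). The sketch below checks the Stony Point case and the
    overall total, assuming the 184,000 figure counts four variables per
    level (temperature, humidity, and two wind components); that split
    is an assumption, not stated in the text.

```python
def stored_numbers(levels: int, launches: int, pairs: int) -> int:
    """Numbers kept when a levels x launches field is replaced by row-column pairs."""
    return pairs * (levels + launches)


# Stony Point example: 60 levels x 12 launches, one pair.
print(60 * 12, stored_numbers(60, 12, 1))  # 720 72

# Overall IFYGL set: 6 stations x 128 launches, 60 levels,
# 4 variables per level (assumed: temperature, humidity, u and v wind).
soundings = 6 * 128
print(soundings, soundings * 60 * 4)       # 768 184320
```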
    Error Checking
    Figure 4 illustrates ASD's error-detection capability. Obviously,
    the sounding for station 2 differs greatly from the soundings for
    the other five stations. Figure 5 shows the time components
    corresponding to the pressure component of figure 4. Once again, a
    strong anomaly (circled values) shows up. The six soundings
    indicated were checked and did prove to be erroneous. Thus, a
    10-second scan of these two ASD graphs isolated an error that
    previously had escaped detection.
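    The screening above was done by eye. A program could flag the same
    kind of outlier in a time component automatically; the sketch below
    uses the modified z-score (median and median absolute deviation,
    with the usual 0.6745 scaling and 3.5 cutoff). That is a substituted
    technique, not part of the ASD method as described, and the function
    name and threshold are assumptions.

```python
import numpy as np


def flag_anomalies(values, threshold: float = 3.5) -> np.ndarray:
    """Return indices whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), so a few bad
    soundings do not hide themselves by inflating the spread estimate.
    """
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        # No spread at all: flag anything that differs from the median.
        return np.flatnonzero(x != med)
    modified_z = 0.6745 * (x - med) / mad
    return np.flatnonzero(np.abs(modified_z) > threshold)


# Hypothetical time-component values for six launches; the last is anomalous.
print(flag_anomalies([1.0, 1.1, 0.9, 1.05, 0.95, 5.0]).tolist())  # [5]
```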
    
    Physical  Relationships
    Three station pairings stand out clearly in the lower levels of the
    soundings shown in figure 6. These station pairs (1-2, 3-6, and 4-5)
    are geographically related. Stations 4-5 are on the western end of
    Lake Ontario, 3-6 on the middle shoreline, and 1-2 on the eastern
    end. Figure 7, a plot of the corresponding time component, shows
    that the effect is most pronounced for launches number 20 through
    27. A detailed check of the soundings from all stations for this
    period revealed a large east-west wind velocity gradient, which
    varied from 2 m/s in the west to 6 m/s in the middle to 14 m/s in
    the east.
    
    Other Uses
    With ASD, new data can be compared quickly with older data obtained
    by the same measuring system. Drastic differences in the ASD plots
    will suggest instrument drift and/or mistaken assumptions about
    experimental background conditions. The same approach can be used
    where different types of instruments are supposedly measuring the
    same physical phenomenon. This type of application allows CEDDA
    scientists to study the very large data sets associated with
    ecosystems and often, through simultaneous analysis of many
    different kinds of variables, uncover hidden interactions.
    
    
    
    
    
    
    Figure 3. An illustration of the ASD data compaction technique. The
    single column of 60 numbers and single row of 12 numbers replace the
    12 columns of data appearing in figure 1, yet retain approximately
    90% of the details of the original data set.
    
    Modeling  and  Experiment
    Design
    CEDDA scientists are pursuing other potential applications of the
    ASD method, including its use in modeling and experiment design. The
    characteristic patterns obtained provide important clues as to the
    physical realities underlying the data. We hope that the
    pattern-detection capabilities of the ASD method may lead to an
    empirical, data-oriented form of system modeling.
       Another promising path leads towards the economical design of
    field experiments and data collection systems, based on
    characteristic patterns derived from preliminary survey data. Much
    redundant data and information are often collected in large-scale
    field experiments. If the redundancies could be eliminated, all
    subsequent data collection, processing, analyses, archival, and
    dissemination activities would be greatly simplified and more
    cost-effective. The ASD method, by highlighting significant patterns
    of preliminary survey data sets, could suggest which data contribute
    most to the definition of the patterns, and which are dispensable.
    
    Reference
    1 Lorenz, E. N., Empirical Orthogonal Functions,
    Figure 4. Composite printout of U-components of the wind for 48
    upper-air soundings taken at each of six IFYGL observation stations.

    Figure 5. Time analysis of data from figure 4 isolates six anomalous
    soundings (circled).

    Figure 6. Composite of V-components of the wind for 48 upper-air
    soundings taken at each of six IFYGL observation stations.

    Figure 7. Time analysis of data from figure 6 indicates that the
    pairing pattern is most pronounced in soundings 20-27.

    About the Article
    and the Authors
    
    JACK JALICKEE was thumbing through a scientific journal in the
    spring of 1973 when he came across an article on the mathematical
    theorem of singular decomposition. It was evident that the theorem
    was adaptable to the analysis of the large data sets the EDS Center
    for Experiment Design and Data Analysis (CEDDA) was working with.
    This was the origin of the ASD (Asymptotic Singular Decomposition)
    method.
      CEDDA analysis of atmospheric data from the International Field
    Year for the Great Lakes (IFYGL) began in the autumn of 1974.
    Problems arose almost immediately. Divergence calculations derived
    from upper air winds did not make physical sense. (The calculation
    is a very sensitive one, involving small differences of large
    numbers which contain noise.) The data themselves appeared
    reasonable and consistent with observed weather conditions, which
    were highly variable. Traditional analysis techniques could not
    resolve the problem; ASD did.
       A native of Washington, D.C., Jack Jalickee worked his way through
    Catholic University (in D.C.), receiving a B.A. in 1962 and a
    Doctor's degree in 1966, both in Physics. Subsequently, he worked as
    a research associate and teacher at Northwestern University in
    Evanston, Illinois. A Presidential Internship appointment brought
    him to EDS/CEDDA in 1972.
    JERRY SULLIVAN was the man having problems with IFYGL data
    divergence calculations. His in-house paper on the subsequent
    resolution of those problems through ASD applications provided the
    nucleus of the current article.
       Jerry received a Bachelor's degree in Physics from Holy Cross
    College, Worcester, Mass., and his Doctor's Degree from Catholic
    University. He joined EDS/CEDDA in the fall of 1970.
    
    Fr. RICHARD ROZETT, S.J., is on a year's sabbatical from Fordham
    University in New York. His previous work and interest in the
    application of statistical techniques to large data sets led
    Fr. Rozett to come to CEDDA, where he heads up its MESA (Marine
    Ecosystem Analysis) Project. Since September 1974, he has been
    working with Jack Jalickee in collecting, devising, and developing
    ASD and similar techniques to analyze ecosystems data sets.
       Ecosystems data sets are very large, complex, and highly
    redundant. They include physical measurements such as temperature,
    depth, pressure, and the particle size of sand; chemical measures of
    oil, lead, phosphate, acidity, salinity, nitrate, and carbonate
    concentrations (not to mention garbage and sewer sludge); and
    biological measurements such as the number of barnacles per square
    meter, or the percent of flounder with fin rot. ASD and similar
    techniques make it possible to massage the original data into a
    simpler, concentrated, and more meaningful data set.
      Dick Rozett earned a B.S. degree in chemistry from Spring Hill
    College in Spring Hill, Alabama, and an M.S. degree from St. Louis
    University, then studied chemical physics at Johns Hopkins in
    Baltimore, Md., where he received his Ph.D. in 1967.
      Ordained a priest in 1962, Fr. Rozett was an Assistant Professor
    of Chemistry at Fordham from 1967 to 1972, when he was made an
    Associate Professor. He is the author of more than 30 scientific
    papers on chemistry and the statistical analysis of large data sets,
    and has participated in international scientific conferences in
    Leningrad, Lisbon, Kifissia (Greece), and Kyoto.
    -------
     DATA VALIDATION FOR UPPER AIR SOUNDING DATA
             AND EMISSION INVENTORY DATA
                          by
                     J.H.  Novak
     Environmental Sciences Research Laboratory
        U.S. Environmental Protection Agency
    Research Triangle Park, North Carolina 27711
         A systematic approach to data validation requires that several
    steps be taken during the design of a validation scheme. For any set
    of data it is essential to be familiar enough with the data
    collection and data handling procedures to be able to locate all
    possible sources of error and to define criteria for distinguishing
    good and bad data at those critical points. The next task is to
    determine which techniques can be used most effectively in error
    checking, and what course of action should be taken if an error is
    detected. Finally, after the validation scheme has been implemented,
    the quality of the validated data should be assessed in some manner.

     Accordingly, the first step in the validation of the RAPS (Regional
    Air Pollution Study) upper air data was to determine all possible
    sources of error in the data handling system. The upper air data
    consists of two types of observations: pibals and radiosondes.
    
    
    
    
     A pibal is a pilot balloon that is filled with helium to an exact
    pressure to ensure that it will rise at a known ascension rate when
    released into the atmosphere. An observer uses a mechanical device
    known as a theodolite to track the balloon, recording azimuth and
    elevation angles at 30-second intervals. These angles are then used to
    calculate wind speed and direction at various heights above ground.
    There are two possible sources of error during this phase of data
    collection: first, the observer may read the angles incorrectly during
    the sounding, and second, transcription errors may occur when coding
    the data onto forms for keypunching.
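    To make clear why a misread angle matters, the sketch below reduces
    single-theodolite pibal readings to layer winds. It is an illustration
    under stated assumptions, not the UASN reduction program: it assumes a
    constant ascent rate, flat terrain, release at the theodolite, and one
    reading every 30 seconds; the function name pibal_winds is hypothetical.

```python
import math

def pibal_winds(azimuths_deg, elevations_deg, ascent_rate_ms, interval_s=30.0):
    """Reduce single-theodolite pibal angles to layer winds (sketch).

    Assumptions (not from the paper): constant ascent rate, flat terrain,
    release at the theodolite, reading i (1-based) taken at i * interval_s.
    Elevation angles must be > 0. Returns (mid-layer height m, speed m/s,
    direction deg) per layer; direction is where the wind blows FROM.
    """
    positions = [(0.0, 0.0, 0.0)]  # (east m, north m, height m) at release
    for i, (az, el) in enumerate(zip(azimuths_deg, elevations_deg), start=1):
        height = ascent_rate_ms * interval_s * i
        ground_dist = height / math.tan(math.radians(el))
        positions.append((ground_dist * math.sin(math.radians(az)),
                          ground_dist * math.cos(math.radians(az)),
                          height))
    winds = []
    for (x0, y0, h0), (x1, y1, h1) in zip(positions, positions[1:]):
        u = (x1 - x0) / interval_s  # eastward drift = eastward wind component
        v = (y1 - y0) / interval_s  # northward drift = northward wind component
        speed = math.hypot(u, v)
        # the balloon drifts downwind, so the wind comes from the opposite bearing
        direction = (math.degrees(math.atan2(u, v)) + 180.0) % 360.0
        winds.append(((h0 + h1) / 2.0, speed, direction))
    return winds
```

    Because each computed position enters two adjacent layers, a single
    misread angle shows up as a pair of suspicious winds, one just below
    and one just above that reading.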
    
    
    
    
     The radiosonde is similar to a pibal in that it is also a balloon;
    however, a package of instruments containing various meteorological
    sensors is attached to the balloon, which is tracked electronically
    instead of manually. In addition to the azimuth and elevation angles,
    pressure, temperature, and relative humidity readings are recorded. A
    variety of thermodynamic parameters can be determined from these
    measurements. There are several potential sources of error associated
    with the soundings: electronic difficulties, sensor malfunction,
    calibration errors, misinterpretation of the strip charts,
    interpolation of the adiabatic charts, and transcription errors.
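    The paper derives thermodynamic parameters with the help of adiabatic
    charts. As a hedged illustration of the arithmetic involved, the sketch
    below computes vapor pressure, dew point, and mixing ratio from
    temperature, relative humidity, and pressure using a Magnus-type
    saturation vapor pressure approximation. That formula and its constants
    (6.112 hPa, 17.67, 243.5 degC) are an assumption standing in for the
    chart-based procedure, and the function names are hypothetical.
    Recomputing a derived value and comparing it with the coded value is
    one way to catch chart-interpolation and transcription errors.

```python
import math

def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure over water in hPa (Magnus-type fit, assumed)."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))

def derived_moisture(t_c, rh_pct, p_hpa):
    """Return (vapor pressure hPa, dew point degC, mixing ratio g/kg).

    rh_pct must be > 0, otherwise the logarithm below is undefined.
    """
    e = rh_pct / 100.0 * saturation_vapor_pressure(t_c)
    # invert the Magnus-type fit to get the temperature at which e saturates
    gamma = math.log(e / 6.112)
    dew_point = 243.5 * gamma / (17.67 - gamma)
    # 0.622 is the ratio of the molar masses of water vapor and dry air
    mixing_ratio = 1000.0 * 0.622 * e / (p_hpa - e)
    return e, dew_point, mixing_ratio
```

    At 100 percent relative humidity the computed dew point equals the air
    temperature, so a coded dew point above the temperature can be flagged
    as physically impossible.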
    
    
    
    
     Once all possible sources of error have been determined and the ranges
    of good and bad data defined, various techniques can be chosen to
    search the data for possible errors. The Upper Air Sounding Network
    (UASN) preliminary quality control program contained the following
    tests on the raw data:
    
    
    
    
      1. Routine data checks: data was checked for completeness and
         compared with known data (e.g., station date and time vs. a
         performance matrix, station number vs. station height, balloon
         weight vs. release time).
      2. Consistency checks with alternate data sources (e.g., wind data
         vs. station log books; doubtful data vs. weather maps and the
         recording barograph).
      3. Intra-station checks with previous and following soundings.
      4. Inter-station checks with simultaneous soundings.
      5. Checks with known meteorological relationships (e.g., comparison
         of temperature and relative humidity with adiabatic charts, shape
         of the pressure-altitude curve).
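    The paper does not state how the intra-station comparison was scored.
    A minimal sketch, assuming each sounding is reduced to a mapping from
    height to value and that a level is suspicious when it departs from
    the average of the previous and following soundings by more than a
    chosen tolerance (both the representation and the tolerance are
    assumptions; intra_station_flags is a hypothetical name):

```python
def intra_station_flags(prev, current, following, tolerance):
    """Flag levels where the current sounding departs from the mean of the
    previous and following soundings at the same station by more than
    `tolerance`.

    Each sounding is a dict mapping height (m) -> value. Levels missing
    from either neighbouring sounding are skipped rather than flagged.
    Returns a list of (height, observed value, expected value) tuples.
    """
    flagged = []
    for height, value in sorted(current.items()):
        if height in prev and height in following:
            expected = (prev[height] + following[height]) / 2.0
            if abs(value - expected) > tolerance:
                flagged.append((height, value, expected))
    return flagged
```

    Passing simultaneous soundings from two neighbouring stations in place
    of the previous and following soundings turns the same routine into a
    rough inter-station check.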
    
    
    
    
    
    
    
     The actual keypunching of the data forms introduces another source of
    error. At this point, however, the data checks can be computerized, so
    that all data will routinely undergo the same tests. The UASN data
    validation programs test the data for order, range, missing values,
    station height, and special conditions such as calms or wind speeds
    greater than 40 meters/second. Again, additional checks can be
    performed on the radiosonde data when special relationships exist
    (e.g., the inverse relationship between pressure and time: pressure
    must decrease as the balloon rises). The advantage of computerized
    error checking is that the entire data set can be objectively
    evaluated.
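    The checks listed above lend themselves to a simple per-level pass
    over each sounding. The sketch below is not the UASN program; the
    record layout, the 10 m station-height tolerance, and the 0 to 100 m/s
    gross range limit are assumptions, and check_sounding is a
    hypothetical name. Only the 40 m/s special condition, the calm
    condition, and the pressure-decreasing-with-time relationship come
    from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Level:
    time_s: float                   # seconds since release
    height_m: float                 # height above mean sea level
    pressure_hpa: Optional[float]   # None marks a missing value
    wind_speed_ms: Optional[float]  # None marks a missing value

def check_sounding(levels: List[Level], station_height_m: float,
                   height_tol_m: float = 10.0) -> List[str]:
    """Return human-readable flags; an empty list means no check fired."""
    if not levels:
        return ["empty sounding"]
    flags = []
    # station height: the surface level should match the known elevation
    if abs(levels[0].height_m - station_height_m) > height_tol_m:
        flags.append(f"surface height {levels[0].height_m} "
                     f"!= station height {station_height_m}")
    for i, lv in enumerate(levels):
        if lv.pressure_hpa is None or lv.wind_speed_ms is None:
            flags.append(f"level {i}: missing value")
            continue
        if not 0.0 <= lv.wind_speed_ms <= 100.0:  # gross range (assumed)
            flags.append(f"level {i}: wind speed out of range")
        elif lv.wind_speed_ms == 0.0:
            flags.append(f"level {i}: calm (special condition)")
        elif lv.wind_speed_ms > 40.0:
            flags.append(f"level {i}: speed > 40 m/s (special condition)")
        if i > 0:
            prev = levels[i - 1]
            if lv.time_s <= prev.time_s:
                flags.append(f"level {i}: out of order")
            if prev.pressure_hpa is not None and lv.pressure_hpa >= prev.pressure_hpa:
                flags.append(f"level {i}: pressure not decreasing with time")
    return flags
```

    Flags for special conditions such as calms mark values for review
    rather than declaring them wrong, which matches the paper's
    distinction between flagging and correcting.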
    
    
    
    
     Once the known data errors have been flagged and corrected, the next
    step is to archive the data. During this phase both printouts and
    printer plots are produced to provide additional information for error
    detection. The printer plots of speed, direction, temperature, dew
    point, and atmospheric pressure can be scanned quickly for remaining
    inconsistencies.
    
    
    
    
     In an effort to further validate the UASN radiosonde data, dew point,
    relative humidity, and vapor pressure calculated at sites 141 and 142
    were compared with corresponding data recorded by the National Weather
    Service at Lambert Field. Correlation and incidence matrices were also
    calculated for the same three parameters on a seasonal basis. Both
    types of preliminary analysis proved very effective in isolating some
    remaining data problems.
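    The paper does not define its incidence matrices. The sketch below
    assumes they were cross-tabulations of paired observations (UASN vs.
    National Weather Service) binned into value classes, so that counts
    off the diagonal point to disagreement; a Pearson correlation
    coefficient is computed alongside. The class edges and the function
    names are hypothetical, and a seasonal analysis would simply run both
    functions once per season's pairs.

```python
import math
from bisect import bisect_right

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def incidence_matrix(xs, ys, edges):
    """Cross-tabulate paired values into classes defined by sorted `edges`.

    Class 0 holds values below edges[0], class k holds values with
    edges[k-1] <= v < edges[k], and the last class holds v >= edges[-1].
    Rows index the first series (UASN), columns the second (NWS).
    """
    k = len(edges) + 1
    matrix = [[0] * k for _ in range(k)]
    for x, y in zip(xs, ys):
        matrix[bisect_right(edges, x)][bisect_right(edges, y)] += 1
    return matrix
```

    A high correlation can hide a few gross errors; the off-diagonal cells
    of the matrix identify the specific pairs worth pulling for review.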
    
    
    
    
     The final quality assurance effort produced Calcomp plots of wind
    speed and direction, temperature, potential wet-bulb temperature, and
    mixing ratio for each of the 5,717 UASN radiosondes. Each of these
    plots was scanned for data errors and used to determine mixing depths
    for the St. Louis area.
    
    
    
    
     In summary, the important concepts to be derived from the previous
    discussion of validation techniques used with the RAPS Upper Air
    Sounding Network data are:
      1) Determination of all possible sources of error in data collection
         and data handling.
      2) Use of alternate sources of data for consistency checks.
      3) Use of intra- and inter-station comparisons.
      4) Use of known relationships (meteorological in this case) for
         comparisons.
      5) Completeness and objectivity of computerized comparisons.
      6) Use of preliminary analysis routines in error detection.
      7) Use of computer graphics.
    
    
    
    
    
    
     The second topic for discussion is the validation of the RAPS emission
    inventory. The main objective of the RAPS program is to provide a body
    of data (emissions, meteorological, air quality, etc.) that could be
    used to develop, improve, and validate air quality simulation models.
    The first priority is to determine what accuracy is required in any
    data base to achieve the objectives of RAPS; the second is to determine
    what accuracy, precision, and bias currently exist in the RAPS emission
    inventory. The answers to these questions are too complex to be
    addressed in this paper, but they are essential to the design of a good
    validation scheme; therefore I have included (as references) a list of
    papers that discuss this important question of accuracy in detail.
    From this point, I will limit the discussion to the procedures that
    were chosen to verify the accuracy of the acquired and estimated data.
    
    
     The RAPS emission inventory is composed of three separate data bases:
    (1) point, (2) area, and (3) line source. The choice of validation
    technique depends on the amount and form of the data in each data
    base. The point source data base contains hourly, daily, monthly, and
    annual raw process data; no emissions are stored in the data base. The
    methodologies used to calculate emissions and determine temporal
    resolution are applied at data retrieval time. In contrast, the area
    and line source data bases contain annual emissions. The methodologies
    used to calculate these emissions had already been applied before the
    data was entered into the data base, and temporal apportionment is
    accomplished through the retrieval software.
     As usual, checks must be performed on raw data at their entrance into
    the data handling system. For the area source data base, this implies
    checking the raw data inputs to the methodology programs. There are
    seven source categories for the area source inventory: river vessels,
    fugitive dust, highways, railroads, stationary residential and
    commercial sources, off-highway mobile sources, and stationary
    industrial sources and airports. The software for these source
    categories was developed by several different contractors and
    therefore must be reviewed independently. Area source data is mainly
    checked for internal consistency within each grid. Parameters such as
    population, number of homes, amount of water area per grid,
    agricultural acreage, etc., are compared with each other in terms of
    overall land use per grid. Typical errors that were found include a
    1-km square grid that contained over 2 million acres of tilled farm
    land (a 1-km square contains only about 247 acres) and a grid with a
    population of 180 and only 11 single-family homes.
     Calcomp graphics were used heavily in the validation of line source
    data. Line sources and associated characteristics such as average
    daily traffic, functional class, etc., were plotted on gridded maps to
    the same scale as county road and DOT maps. Overlaying these maps
    provided an excellent means of checking the raw line source data.
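    The two errors quoted above can be caught by simple internal-consistency
    rules applied to each grid cell. A minimal sketch, assuming a
    hypothetical GridCell record and a hypothetical ceiling of 8 persons per
    single-family home (a real rule would also count multi-family housing);
    neither the field names nor the threshold come from the RAPS software:

```python
from dataclasses import dataclass

ACRES_PER_KM2 = 247.105  # 1 km^2 = 100 ha = about 247.1 acres

@dataclass
class GridCell:
    grid_id: str
    area_km2: float
    population: int
    single_family_homes: int
    water_acres: float
    farm_acres: float

def land_use_flags(cell: GridCell, max_persons_per_home: float = 8.0):
    """Return a list of internal-consistency problems for one grid cell."""
    flags = []
    total_acres = cell.area_km2 * ACRES_PER_KM2
    # land uses cannot add up to more than the area of the cell itself
    if cell.water_acres + cell.farm_acres > total_acres:
        flags.append("land use exceeds grid area")
    # population should be roughly supported by the housing count
    # (hypothetical threshold; cells with no homes are compared with one home)
    if cell.population > max_persons_per_home * max(cell.single_family_homes, 1):
        flags.append("population inconsistent with number of homes")
    return flags
```

    Rules like these only point to a cell worth examining; whether the
    population or the housing count is wrong still has to be settled from
    the source statistics.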
    
    
    
    
     In contrast with the area source data base, the point source data
    base contains all the raw data for emission calculations. Because the
    raw point data includes temporally distributed process data for the
    entire study period, in contrast to annual county statistics for area
    data, the amount and type of point data must be taken into account
    when choosing a technique for raw data validation. Parameters that
    apply at the stack level, and therefore do not have a temporal
    association, can be manually verified against original plant data.
    These parameters include stack and fuel characteristics, operating
    patterns, stack test data, and the applicability of the SCC (Source
    Classification Code) to a given stack. Monthly process data, because
    of its small volume, was also verified manually.
    
    
    
    
     To perform a reasonable check on the remaining data, a random
    selection of representative sources was chosen. The prime determinants
    in the selection of test sources were the method of emission
    calculation and the time interval of reporting the data. One source
    from each combination of these two factors was chosen to ensure that
    all paths in the software would be exercised. The following tests were
    performed on the selected sources: 1) manual verification of process
    data, 2) verification of diurnal, weekly, and/or seasonal variations,
    and 3) hourly and annual retrievals. Computer software was developed to
    check all process data in the point data base for consistency and
    continuity.
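    The selection rule above (one randomly chosen source per combination of
    calculation method and reporting interval) is a form of stratified
    sampling. A minimal sketch, assuming sources are stored as dicts with
    hypothetical keys 'id', 'method', and 'interval'; the function name and
    the method labels in the usage are illustrative only:

```python
import random
from itertools import product

def select_test_sources(sources, methods, intervals, seed=None):
    """Pick one source at random for every (calculation method, reporting
    interval) combination so each retrieval path is exercised once.

    Combinations with no matching source are returned in `uncovered`, so a
    software path that cannot be tested is reported rather than skipped.
    """
    rng = random.Random(seed)
    chosen, uncovered = {}, []
    for combo in product(methods, intervals):
        pool = [s for s in sources if (s["method"], s["interval"]) == combo]
        if pool:
            chosen[combo] = rng.choice(pool)
        else:
            uncovered.append(combo)
    return chosen, uncovered
```

    Reporting the uncovered combinations makes explicit which software
    paths the test sample cannot exercise.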
    
    
    
    
     Finally, all test software runs were compared with hand calculations,
    and the retrieval programs themselves were compared with the documented
    methodologies.
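    The paper states that temporal apportionment happens in the retrieval
    software and that hourly and annual retrievals were checked, but it
    does not give the apportionment factors. The sketch below assumes
    multiplicative month, day-of-week, and hour-of-day weights (all
    hypothetical inputs) and shows one cross-check such a comparison could
    include: the hourly retrieval, summed over the year, must reproduce the
    hand-calculated annual total. The function names are illustrative.

```python
import datetime as dt

def hourly_profile(annual_tons, year, monthly, weekday, diurnal):
    """Apportion an annual emission total to every hour of `year` (sketch).

    monthly: 12 relative weights (Jan..Dec); weekday: 7 weights (Mon..Sun);
    diurnal: 24 weights (hour 0..23). The weights are normalised here so
    the hourly values always sum to the annual total.
    """
    hours = []
    t = dt.datetime(year, 1, 1)
    while t.year == year:
        hours.append(t)
        t += dt.timedelta(hours=1)
    weights = [monthly[h.month - 1] * weekday[h.weekday()] * diurnal[h.hour]
               for h in hours]
    total = sum(weights)
    return {h: annual_tons * w / total for h, w in zip(hours, weights)}

def retrievals_agree(hourly, annual_hand_calc, rel_tol=1e-6):
    """True if the hourly retrieval summed over the year reproduces the
    independently hand-calculated annual emission within rel_tol."""
    return abs(sum(hourly.values()) - annual_hand_calc) <= rel_tol * annual_hand_calc
```

    A check of this kind catches dropped or double-counted hours in the
    retrieval path; comparing individual hours against hand calculations
    is still needed to verify the shape of the temporal profile itself.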
    
    
    
    
     In summary, the important concepts to be derived from the above
    discussion of the validation of RAPS emission inventory data are:
       1) Preliminary determination of required accuracy.
       2) Analysis of current accuracy.
       3) Selection of validation techniques by:
          a) amount of data
          b) form of data
          c) availability of supporting data
          d) significance of data to the overall accuracy
          e) availability of time and personnel
                         REFERENCES
    Koch, R.C. et al., "Validation and Sensitivity Analysis
      of the Gaussian Plume Multiple-Source Urban Diffusion
      Model", NTIS Publication Number PB-206951, Geomet Inc.,
      Rockville, Maryland (1971).
    Ditto, F.H. et al., "Weighted Sensitivity Analysis of
      Emission Data", Final Report, EPA Contract # 68-01-0398 (1973).
    Littman, F.E., S. Rubin, K.T. Semrau, and W.F. Dabberdt,
      "A Regional Air Pollution Study (RAPS) Preliminary Emission
      Inventory", SRI Project 2579 Final Report, EPA Contract
      # 68-02-1026 (1974).
    Gibbs, L.L., C.E. Zimmer, and J.M. Zoller, "Source
      Inventory and Emission Factor Analysis", Volumes I and II,
      Final Report, EPA Contract # 68-02-1350 (September 1974).
    Ruff, R.E., and P.B. Simmon, "Evaluation of Emission Inventory
      Methodologies for the RAPS Program", SRI Project 4331,
      Final Report, EPA Contract # 68-02-2047 (1977).
      VALIDATION OF BIOMEDICAL DATA THROUGH AN
               ON-LINE COMPUTER SYSTEM
                         by
                   Larry  D. Claxton
         Health Effects Research Laboratory
        U.S.  Environmental Protection Agency
    Research  Triangle Park, North Carolina  27711
                                 INTRODUCTION
         Within the biomedical disciplines a variety of testing procedures
    are used routinely in many separate laboratories. Since health,
    research, and regulatory decisions are being based upon the results
    from many laboratories, there is a basic need for assuring the quality
    of the data. In the area of microbial mutagenesis, Salmonella
    typhimurium is employed as an indicator organism for mutational events
    by many laboratories across the country. The various procedures
    available are rapid, relatively simple, and sensitive, and are used in
    a variety of laboratory situations, including private industry,
    government, and university laboratories. Presently, a great deal of
    emphasis is placed upon these types of tests as prescreens for
    substances that may be human mutagens and potential carcinogens.
    Therefore, a system involving Salmonella typhimurium could provide an
    excellent pilot study for methods involved in data validation. Data
    validation is used in this context to mean the process by which
    generated data is filtered and accepted or rejected by objective
    criteria. Likewise, computerization provides a potential means of
    systematically applying a predetermined set of objective criteria in a
    rapid, unbiased manner. With the use of TSO (Time Sharing Option),
    portions of the data validation can be conducted during the
    performance of a biological test. This article describes the design of
    a pilot system for on-line computer assistance with testing protocols
    and data validation. The scientific protocols and initial
    computerization have been completed, and the system will be tested in
    a laboratory situation in the near future by the National Institute of
    Environmental Health Sciences.
    Also published separately as EPA-600/1-78-038, "Biomedical Data
    Validation Through an On-Line Computer System," May 1978.
    DESCRIPTION OF TEST:
         From a variety of microbial mutation test systems, the suspension
    test using a mammalian activation system was chosen because it is well
    defined and is a quantitative test system.(1) The more commonly used
    Ames plate incorporation method is only semiquantitative. We also chose
    to compare three strains of Salmonella typhimurium and a forward
    mutation strain of E. coli K-12.(2) In simple terms, the test involves
    combining the bacterial strain with a compound and a mammalian
    activation system in an Erlenmeyer flask, which is incubated at 37°C
    for 30 minutes to 2 hours. The bacteria are then separated, and
    aliquots are plated on minimal media for the detection of mutants and
    on supplemented media for relative survival. Figure 1 provides a
    representation of the pilot test presently used. Pilot tests are used
    to define more appropriate testing conditions, and definitive tests
    provide data from which mutagenicity is judged. For complete testing,
    the substance must be tested in several strains of bacteria to monitor
    for a variety of different types of genetic alteration.
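    The plate counts described above are usually turned into per-ml
    figures before any comparison is made. The paper does not give its
    formulas, so the sketch below shows only the standard plate-count
    arithmetic: mutants and survivors per ml of the original suspension,
    the mutant frequency (mutants per surviving cell), and relative
    survival against an untreated control. The plated volume, the dilution
    factors, and the function names are hypothetical defaults.

```python
def cells_per_ml(colonies, volume_plated_ml, dilution_factor):
    """Viable cells per ml of the original suspension from one plate count.

    dilution_factor is the fold dilution applied before plating (e.g. 1e6).
    """
    return colonies * dilution_factor / volume_plated_ml

def suspension_result(mut_colonies, surv_colonies, control_surv_colonies,
                      volume_ml=0.1, mut_dilution=1.0, surv_dilution=1e6):
    """Mutant frequency and relative survival for one treated flask (sketch)."""
    mutants = cells_per_ml(mut_colonies, volume_ml, mut_dilution)      # minimal media
    survivors = cells_per_ml(surv_colonies, volume_ml, surv_dilution)  # supplemented media
    control = cells_per_ml(control_surv_colonies, volume_ml, surv_dilution)
    return {
        "mutant_frequency": mutants / survivors,   # mutants per surviving cell
        "relative_survival": survivors / control,  # fraction of control survival
    }
```

    Normalizing by survivors keeps a toxic dose, which kills most cells,
    from masking an increase in mutation rate.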
    SYSTEMS OVERVIEW
         This program uses TSO and was written in COBOL, with some
    additional FORTRAN integrated into the final program. All programming
    was accomplished on an IBM System/370 at the Division of Computer
    Research and Technology within the National Institutes of Health,
    Bethesda, Maryland.
    [Figure 1: representation of the pilot test (graphic not recoverable
    from the scanned original)]
         For ease of programming, the task was divided into three individual
    programs (Figure 2). Information needed prior to testing of a
    particular substance is stored with the use of Program 1. This program
    also supplies a number for the blind coding of the compound. The second
    program provides the technician with the proper form of the basic
    protocol, performs certain "within-experiment" calculations, accepts
    the input of data from the tests, and evaluates the test by
    predetermined objective criteria. Program 3 provides the ability for
    the central laboratory to monitor the accomplished work and recall any
    pertinent data. A more detailed description of the program is
    available.(3)
    [Figure 2: organization of the three programs (graphic not recoverable
    from the scanned original; legible labels include "EXPERIMENT
    PERFORMED" and "TERMINAL")]
      Quality Control Through Interactive Computerization
         One of the basic premises of quality control is that good data
    yields good decisions. By monitoring the quality of data during an
    experiment and providing feedback to the technical personnel, both
    personal bias and technical variation can be reduced, and an
    interactive computer network makes this possible. This pilot project
    demonstrates these capabilities in several ways. First, the compound
    to be tested is coded, and only information essential for the test is
    provided. Second, certain other variables, e.g., concentrations of
    various components, are predetermined for both the pilot tests and the
    definitive test. Within this testing system, two pilot tests are
    conducted to determine levels of toxicity and potential mutagenicity.
    From these data, a narrower range of concentrations for the definitive
    tests is calculated by predetermined rules, so that a limited number of
    definitive concentrations are used across all laboratories. Next, the
    computer performs any needed calculations during the performance of a
    test, thus lessening the occurrence of potential computational errors.
    Some of the calculations performed for this system are: (1) bacteria
    per ml of solution based on a standardized spectrophotometer curve,
    (2) variance for the weights of animals used in the microsomal S-9
    preparation (if outside normal limits, these will be rejected),
    (3) liver weights and amounts of buffers to be used in microsome
    preparation, and (4) dilution of samples.
         Final data validation is also performed automatically upon the
    final data output. The computer's ability to store and retrieve data
    is very important in this regard. For example, in this system, final
    results are recorded as number of colonies per plate. The software
    compares the average number of colonies per plate for the controls
    with the past 100 accumulated controls to determine statistically
    whether the controls are within normal limits. After the statistical
    examination of the controls, the test is either accepted or rejected.
    If the test is a pilot, the data is also used to determine the
    concentrations of test substance to be used in further testing. All
    data are, however, recorded permanently; rejected data are recorded so
    that problems can be analyzed as they are encountered. A flow diagram
    of the decision processes is shown in Figure 3. TSO is the component
    that allows immediate technician/program interaction, and thus rapid
    and constant quality control.
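    The paper says only that the control mean is compared "statistically"
    with the past 100 accumulated controls. One plausible reading, sketched
    here under assumptions, is a control-chart style rule: accept the run if
    the current mean colonies per control plate lies within k standard
    deviations of the historical control means. The choice k = 2, the
    provisional acceptance when fewer than two historical values exist, and
    the class name ControlHistory are all hypothetical.

```python
import statistics
from collections import deque

class ControlHistory:
    """Rolling record of the most recent control plate means (sketch)."""

    def __init__(self, maxlen=100):
        # deque with maxlen silently drops the oldest mean once full
        self.means = deque(maxlen=maxlen)

    def within_limits(self, plate_counts, k=2.0):
        """True if the mean control count per plate lies within k standard
        deviations of the historical control means."""
        current = statistics.mean(plate_counts)
        if len(self.means) < 2:
            return True  # too little history to judge; accept provisionally
        centre = statistics.mean(self.means)
        spread = statistics.stdev(self.means)
        return abs(current - centre) <= k * spread

    def record(self, plate_counts):
        """Add one run's control mean to the rolling baseline."""
        self.means.append(statistics.mean(plate_counts))
```

    Whether rejected runs should enter the baseline is a design choice the
    paper does not address; in this sketch the caller decides by choosing
    when to call record().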
    [Figure 3: flow diagram of the decision processes (graphic not
    recoverable from the scanned original)]
         This prototype system demonstrates that interactive computer
    programs can be used to effectively increase the quality control of
    rapid in vitro tests. However, it is also apparent that the simpler
    in vitro microbial mutagenesis tests, such as spot tests and simple
    plate incorporation tests, do not require such extensive
    computerization if well-documented and detailed protocols are
    available. Since most in vivo mammalian systems have extended
    experimental time periods, the time sharing option would be of little
    benefit due to cost factors and experimental design. However, even
    with the simpler in vitro tests and mammalian cell culture tests, this
    system can serve as a model for data storage and test evaluation for
    the purpose of quality control.
         This paper was extracted from an EPA report(4) which is available
    through the National Technical Information Service, Springfield,
    Virginia 22161.
                                  REFERENCES
    
    
    
    1.    Frantz, C. N. and Malling, H. V.  1975.  The Quantitative Microsomal
          Mutagenesis Assay Method.  Mutation Research 31:365-380.

    2.    Mohn, G., Ellenberger, J. and McGregor, D.  1974.  Development of
          Mutagenicity Tests Using Escherichia coli K-12 As Indicator Organism.
          Mutation Research 25:187-196.

    3.    Claxton, L. and Baxter, R.  1978.  The Computer Assisted Bacterial
          Test for Mutagenesis.  Mutation Research (In Press).

    4.    Claxton, L.  Biomedical Data Validation Through An On-Line Computer
          System.  EPA-600/1-78-038, U.S. Environmental Protection Agency,
          Research Triangle Park, North Carolina 27711, May 1978.  10 pp.
    

    -------
    REGIONAL VALIDATION OF STATE AND LOCAL AIR
                  POLLUTION DATA
                         by
                  Thomas H. Rose
                    Region  IV
       U.S. Environmental Protection Agency
              Athens, Georgia  30605
    

    -------
                     REGIONAL VALIDATION OF  STATE AND LOCAL  AIR
                                   POLLUTION DATA
                                   Thomas H. Rose
                                       SUMMARY
          Two types of data auditing are performed on state and local data in the
    region.  One is directed: the goal of a directed audit is to verify a specific
    value, such as a violation of a standard.  The other is undirected: the goal
    of an undirected audit is to determine the quality of the data being
    generated.  Both are systems audits, but the undirected audit has wider
    ramifications.  For the most part, I will address the undirected audit.
          Each measurement system requires a different auditing path.  I point
    this out not to make the job sound complicated, but to emphasize the
    importance of having an auditor who is knowledgeable in the area being
    audited.  The path of auditing will be determined by:
          • the quantity and quality of records,
          • the existence of an agency SOP,
          • the availability of records,
            •• on a macro geographic scale,
            •• on a micro geographic scale,
          • the system itself,
          • the time frame allowed.
          Thus, the auditing process is tailored to the specific system being
    audited.
          In Region IV, where every funded state and local agency is audited at
    least once a year, we take the following approach:
          1.  Establish the flow of samples and data through the system (from
    the agency SOP).
          2.  Trace each parameter of the measurement process (volume, time, flow
    rate, etc.) back to the base standard and verify the quality of that standard.
    
    

    -------
          3.  Verify that all measurements and transfers of data are documented
    and follow reference methods.
          4.  Verify that all measurements, calculations, and data transfers
    are accurate.
          5.  Provide feedback to the agency being audited on improvements that
    could be made in the measurement process as well as in data handling.
          One of the most important aspects of this audit is that the agency
    itself has to participate and will itself determine the best corrective
    action for its own system.
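Step 4 asks the auditor to confirm that reported values can be recomputed from the raw measurement parameters. As a minimal sketch, assume a filter-based particulate sampler whose concentration is net filter mass divided by sampled air volume (flow rate times sampling time). The function names, units, and the 1% relative tolerance below are illustrative assumptions, not Region IV audit criteria.

```python
import math


def recomputed_concentration(net_mass_mg: float, flow_m3_per_min: float,
                             minutes: float) -> float:
    """Concentration in micrograms per cubic meter, recomputed from raw parameters.

    Assumes a filter sampler: concentration = net mass / (flow rate x time).
    """
    volume_m3 = flow_m3_per_min * minutes        # total air volume sampled
    if volume_m3 <= 0:
        raise ValueError("sampled volume must be positive")
    return net_mass_mg * 1000.0 / volume_m3      # mg -> ug, then per m3


def calculation_agrees(reported_ug_m3: float, net_mass_mg: float,
                       flow_m3_per_min: float, minutes: float,
                       rel_tol: float = 0.01) -> bool:
    """True if the agency's reported value matches the recomputation.

    rel_tol (1% by default) is a hypothetical audit tolerance.
    """
    expected = recomputed_concentration(net_mass_mg, flow_m3_per_min, minutes)
    return math.isclose(reported_ug_m3, expected, rel_tol=rel_tol)


# Example: 100 mg net mass, 1.2 m3/min for 24 hours (1,440 min).
print(round(recomputed_concentration(100.0, 1.2, 1440), 1))   # 57.9
print(calculation_agrees(57.9, 100.0, 1.2, 1440))              # True
print(calculation_agrees(65.0, 100.0, 1.2, 1440))              # False
```

A relative tolerance is used so that one rule works across widely different concentration ranges; an auditor would substitute whatever acceptance criterion the agency SOP specifies.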
    

    -------
        DATA VALIDATION FOR THE LOS ANGELES
                CATALYST STUDY  (LACS)
                          by
                   Charles E.  Rodes
      Environmental Monitoring Systems Laboratory
        U.S.  Environmental Protection Agency
    Research  Triangle Park, North Carolina  27711
    

    -------
                       DATA VALIDATION FOR THE LOS ANGELES
    
                              CATALYST STUDY (LACS)
    
                                  C.E. Rodes
    
                                  INTRODUCTION
    
         The Environmental  Monitoring and Support Laboratory (EMSL) is very
    concerned with the quality of data generated in its field studies.  This
    is reflected in the quality control measures employed by EMSL during
    sampling and analysis,  and the data validation performed before data are
    released.
    
         Data validation, like other aspects of quality control, requires
    resources, especially the manpower needed to complete the final
    validation.  The amount or degree of validation required depends on the
    end use of the data.  In a study such as the Los Angeles Catalyst Study
    (LACS), which is primarily concerned with long-term trends, the emphasis
    in data validation is on detecting extreme outliers that would affect
    monthly averages.  Since we do not report maximums, our data validation
    philosophy for this study is concerned primarily with those values that
    may affect long-term averages.
    
    
                                  CONCLUSIONS
    
         As the project officer responsible for the study, I initially chose
    an acceptance error band of ±10% on individual measured values; hence,
    this is also the error band of the averages generated from these numbers.
    
         Given this requirement, one should be able to assess statistically
    the amount and types of validation required to keep data reduction and
    transfer errors from contributing more than 1 to 2% to this overall
    ±10% error.  Unfortunately, this has not been examined in any detail for
    this study, nor, I expect, for many other studies.  The present
    validation levels used for the LACS probably examine more data than are
    necessary to maintain the desired error level, but in regard to
    validation, I would much rather be conservative than embarrassed after
    the data are released.
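The statistical assessment described above was not carried out for the LACS. One common way to frame it, offered here only as an illustrative sketch, is acceptance sampling: if transfer errors occur at a rate of at least p, how many randomly chosen values must be verified to find at least one error with a given confidence? Assuming independent random draws, the probability of missing every error in n checks is (1 - p)^n, so n is the smallest integer with (1 - p)^n <= 1 - confidence. The function name and the parameter values used below are hypothetical.

```python
import math


def checks_needed(max_error_rate: float, confidence: float) -> int:
    """Smallest number of randomly chosen values to verify so that, if the
    true fraction of erroneous values is at least max_error_rate, at least
    one error is found with probability >= confidence.

    Assumes independent draws (sampling with replacement), a good
    approximation when the data set is large.
    """
    if not (0 < max_error_rate < 1 and 0 < confidence < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    # P(no error found in n checks) = (1 - p)^n <= 1 - confidence
    n = math.log(1 - confidence) / math.log(1 - max_error_rate)
    return math.ceil(n)


print(checks_needed(0.01, 0.95))   # 299
print(checks_needed(0.02, 0.95))   # 149
```

This only tells whether errors are likely to be present; bounding their contribution to the ±10% error band would also require an estimate of typical error magnitudes.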
    

    -------
                                  PROCEDURES
    
         The main objective of the Los Angeles Catalyst Study (LACS) is to
    develop ambient air data bases for sulfate (SO4), carbon monoxide (CO),
    lead (Pb), and other mobile-source-related pollutants before and after
    the introduction of 1975-model automobiles equipped with catalytic
    converters.  The data from this study are being analyzed to determine
    whether the catalytic converter has significantly increased ambient
    sulfate levels and/or simultaneously decreased ambient CO and Pb levels
    near the San Diego Freeway in Los Angeles.
    
         The Environmental Monitoring and Support Laboratory (EMSL) is
    responsible for all study-related functions, including instrumentation,
    operation, sample analyses, quality control, and data validation and
    analyses.  Since January 1976, instruments have been operated and samples
    analyzed under contract by Rockwell International or through an
    interagency agreement with the Lawrence Berkeley Laboratory.  To assure
    the quality of the data supplied by these two organizations, EMSL maintains
    a comprehensive quality assurance program covering all aspects of the
    study.  EMSL issues periodic reports that discuss the trends and the
    interrelationships among the various pollutant patterns.
    
         The site locations in Los Angeles and the site layouts in relation
    to the San Diego Freeway are shown in Figure 1.  Because the sites were
    selected where the prevailing wind is perpendicular to the freeway, the
    cross-freeway contribution to ambient pollutant levels can be determined
    from concurrent upwind and downwind measurements.
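With the wind crossing the freeway, the upwind/downwind design reduces to a simple difference for each sampling period in which both sites reported. The sketch below assumes measurements are keyed by a period label (a hypothetical format) and treats the downwind-minus-upwind difference as the cross-freeway contribution; periods missing at either site are skipped rather than filled.

```python
from typing import Dict


def freeway_contribution(upwind: Dict[str, float],
                         downwind: Dict[str, float]) -> Dict[str, float]:
    """Downwind minus upwind concentration for each period reported at both sites."""
    common = sorted(upwind.keys() & downwind.keys())   # concurrent periods only
    # Rounding suppresses floating-point noise in the printed differences.
    return {period: round(downwind[period] - upwind[period], 4) for period in common}


upwind = {"0800": 2.0, "0900": 3.0, "1000": 1.5}
downwind = {"0800": 5.5, "0900": 4.0}                  # 1000 missing downwind
print(freeway_contribution(upwind, downwind))          # {'0800': 3.5, '0900': 1.0}
```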
    
         The data collected are classed as either continuous or integrated,
    depending on the measurement method.  Continuous data are reduced to
    hourly averages; integrated data are collected over either a 4-hour or a
    24-hour period.  The total data volume generated by the LACS is shown
    in Table 1.  Since the sites are usually shut down each December for
    routine maintenance, the data volumes are based on an 11-month year.
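Reducing continuous data to hourly averages amounts to grouping digitized readings by clock hour and taking the mean of each group. A minimal sketch follows, assuming readings arrive as (elapsed minutes, concentration) pairs; the input format is hypothetical, and no data-completeness rule for accepting an hourly average is applied because the paper does not state one.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def hourly_averages(readings: Iterable[Tuple[float, float]]) -> Dict[int, float]:
    """Reduce (elapsed_minutes, concentration) readings to hourly means.

    Hour h covers elapsed minutes [60*h, 60*(h+1)).
    """
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for minute, value in readings:
        by_hour[int(minute // 60)].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


readings = [(0, 1.0), (30, 3.0), (60, 2.0), (119, 4.0), (150, 5.0)]
print(hourly_averages(readings))   # {0: 2.0, 1: 3.0, 2: 5.0}
```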
    
         The flow of samples and data is shown in Figure 2.  All block
    items except "Data Processing at RTP" and "Final Data Validation" are
    performed by the contractor.  Data validation steps taken by the contractor
    are referred to as "pre-validation", while validation performed at RTP
    under more direct EPA control is referred to as "final validation".
    

    -------
                 [Figure 1: LACS site locations and site layouts in relation
                  to the San Diego Freeway]
    -------
               Table 1. LACS YEARLY* DATA VOLUME

                              SUMMER      WINTER       TOTAL
    CONTINUOUS (HOURLY)       70,080      58,560     128,640
    INTEGRATED (4-HR)         20,160      11,100      31,260
    INTEGRATED (24-HR)         9,000       4,050      13,050
    INTEGRATED (WEEKLY)          816         680       1,496
    INTEGRATED (MONTHLY)         120         100         220
    TOTAL                    100,176      74,490     174,666

    *ASSUMES OPERATION FOR 11 MONTHS/YEAR.
    

    -------
    [Flow diagram. Continuous sampler output: strip charts, digitizer,
    printout, digitizer data QC checks, prevalidated data; strip charts to
    RTP. Integrated sampler output: samples/data cards, sample analyses
    (samples archived), printout, laboratory data QC checks, prevalidated
    data; data cards to RTP. Both streams: data processing at RTP, final
    data validation.]
                               Figure 2. LACS Data Flow.
    

    -------
         Pre-validation by the contractor is performed in two areas: electronic
    digitization of the strip charts and compilation of the analysis data in
    the laboratory.  A portion of the data generated by the electronic
    digitizer is checked against manually read strip charts to verify
    scaling and digitizer performance; at present, 5% of the data are spot
    checked in this way.  The laboratory analysis results on the contractor's
    computer listing are compared against the data cards completed manually
    during the analyses.  All (100%) of the laboratory data are checked in
    this way because of the importance of individual integrated values.  We
    do not at present require the contractor to keep records of the amount
    of data corrected during pre-validation.
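The digitizer spot check can be sketched as two steps: draw a random 5% of the digitized hourly values, then flag any whose manual strip-chart reading disagrees by more than a tolerance. The paper does not give the contractor's tolerance or sampling mechanics, so the key format, the absolute tolerance, and the seeded random draw below are assumptions for illustration.

```python
import random
from typing import Dict, Iterable, List, Optional


def select_spot_checks(keys: Iterable[str], percent: int = 5,
                       seed: Optional[int] = None) -> List[str]:
    """Randomly choose ceil(percent% of the keys) values to re-read manually."""
    pool = sorted(keys)
    k = (len(pool) * percent + 99) // 100          # integer ceiling of the percentage
    return random.Random(seed).sample(pool, k)


def digitizer_discrepancies(digitized: Dict[str, float], manual: Dict[str, float],
                            selected: Iterable[str], tolerance: float) -> List[str]:
    """Keys whose digitized value differs from the manual reading by more than tolerance."""
    return [key for key in selected if abs(digitized[key] - manual[key]) > tolerance]


digitized = {f"h{i:03d}": float(i % 24) for i in range(168)}   # one week of hours
manual = dict(digitized)
manual["h007"] = 9.5                                             # simulated misread scale
chosen = select_spot_checks(digitized, seed=1)
print(len(chosen))                                               # 9
print(digitizer_discrepancies(digitized, manual, ["h007", "h010"], tolerance=0.5))  # ['h007']
```

Integer arithmetic is used for the sample size so that a percentage of, say, 168 values rounds up exactly rather than depending on floating-point rounding.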
    
         Final data validation is performed at RTP following the general
    procedure in Figure 3.  This step is concerned primarily with data
    transfer errors, but it also examines data that are inconsistent with
    the rest of the data base (outliers).  In general, all values in
    approximately the highest and lowest 1 percent are verified, and
    approximately 5% of the remaining data are checked at random.  These
    validation levels were initially selected somewhat arbitrarily by the
    project officer as a compromise between data quality and the resources
    required for validation.
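The final-validation sampling rule can be expressed as a selection over one listing: every value in the lowest and highest 1 percent, plus a random 5% of everything else. This is a sketch under stated assumptions: the tail size is rounded to the nearest whole value (at least one), ties are broken by position, and the random draw is seeded only so the result is reproducible.

```python
import random
from typing import List, Optional, Sequence


def select_for_final_validation(values: Sequence[float], tail_percent: float = 1.0,
                                random_percent: int = 5,
                                seed: Optional[int] = None) -> List[int]:
    """Indices of values to verify: all values in the lowest and highest
    tail_percent of the listing, plus a random random_percent of the rest."""
    n = len(values)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: values[i])        # indices, low to high
    tail = max(1, round(n * tail_percent / 100))              # values per tail
    extremes = set(order[:tail]) | set(order[-tail:])
    remaining = [i for i in range(n) if i not in extremes]
    k = (len(remaining) * random_percent + 99) // 100         # ceiling of the percentage
    spot = random.Random(seed).sample(remaining, k)
    return sorted(extremes | set(spot))


values = [float(v) for v in range(200)]
picked = select_for_final_validation(values, seed=7)
print(len(picked))                                            # 14
print([i for i in (0, 1, 198, 199) if i in picked])           # [0, 1, 198, 199]
```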
    
         The final data validation procedures are based on the output
    formats used to list the individual data values.  The three formats are:
    (1) an hourly listing for continuous data such as CO and NO, (2) an
    integrated data listing for samples averaged over 4-hour or 24-hour
    periods, and (3) a summary listing comparing simultaneously collected
    upwind and downwind data to show the freeway contribution.  The general
    instructions given to the data clerks are shown in Figure 4.  A sample
    printout of hourly data is shown in Figure 5, and the outlier limits
    used in validating the hourly data are given in Table 2.  A sample of a
    24-hour integrated data printout is shown in Figure 6, with its associated
    validation limits given in Table 3.  A study is presently being made of
    the frequency distributions of the LACS data to reassess the validation
    limits listed in Tables 2 and 3.  The starred (*) values on the printouts
    are values determined to be outside ±3 standard deviations of the monthly
    means.  A sample of the summary format is shown in Figure 7.
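The starring rule is a standard z-score screen against each month's own mean and standard deviation. The sketch below assumes the sample standard deviation is used; the paper does not say which form the printout program used, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def starred_indices(month_values: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices of values lying outside the monthly mean +/- k sample standard deviations."""
    if len(month_values) < 2:
        return []                       # standard deviation undefined
    m = mean(month_values)
    s = stdev(month_values)
    return [i for i, v in enumerate(month_values) if abs(v - m) > k * s]


month = [1.0] * 20 + [10.0]
print(starred_indices(month))           # [20]
```

Because an extreme value inflates the standard deviation it is tested against, a single value can lie at most (n - 1)/√n sample standard deviations from the mean; with ten or fewer values in a month, nothing can be starred at ±3 standard deviations. This matters little for hourly data but is worth keeping in mind for sparse integrated series.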
    
         For possible future validation requirements, portions of integrated
    samples are stored at the contractor's laboratory, and the strip charts,
    data cards, and final validated printouts are stored by EPA at RTP.
    

    -------
    [Flow diagram. Boxes: raw data from contractor (strip charts);
    prevalidated data from contractor (data cards); data processing;
    hourly, 24-hr, and summary printouts; final data validation;
    verification of outliers; final validated printouts.]
    Figure 3. Final data validation.
    

    -------
              Figure 4. LACS PRINTOUT VALIDATION INSTRUCTIONS
    GENERAL
    (1)  VERIFY THAT BLANK SPACES ON PRINTOUT MEAN THAT NO DATA EXISTS.
    (2)  VERIFY THAT ALL ZERO VALUES (0.0) ARE REAL.
    CONTINUOUS
    (1)  CHECK ALL HOURLY PRINTOUT VALUES THAT EXCEED THE OUTLIER
        LIMITS AGAINST THE STRIP CHART.
    (2)  SPOT CHECK 5 RANDOM HOURLY VALUES ON EACH STRIP CHART (ONE
        WEEK/CHART) OTHER THAN THE MAXIMUM VALUES.
    INTEGRATED
    (1)  CHECK ALL 4-HOUR AND 24-HOUR PRINTOUT VALUES THAT EXCEED THE
        OUTLIER LIMITS AGAINST THE SAROAD CARDS.
    (2)  SPOT CHECK 2 RANDOM VALUES FOR EACH POLLUTANT AND TIME
        INTERVAL PER MONTH.
    (3)  CHECK ALL STARRED VALUES ON THE SUMMARY PRINTOUT IN COLUMNS
        A, B, C, D, AND (C-A). IF ONLY (C-A) IS STARRED, IN ADDITION CHECK A, B,
        C, AND D.
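Continuous rules (1) and (2) above amount to a per-chart selection: every hourly value over its outlier limit, plus five randomly chosen values from the rest of that week's chart. This is a sketch under assumptions: hours are keyed by integer offset within the chart, blank (missing) hours are simply absent from the mapping, and "other than the maximum values" is read as "not among the values over the limit".

```python
import random
from typing import Dict, List, Optional


def hours_to_check(chart: Dict[int, float], limit: float, n_random: int = 5,
                   seed: Optional[int] = None) -> List[int]:
    """Hours on one weekly strip chart to compare against the chart trace."""
    over = {hour for hour, value in chart.items() if value > limit}       # rule (1)
    others = sorted(hour for hour in chart if hour not in over)
    spot = random.Random(seed).sample(others, min(n_random, len(others)))  # rule (2)
    return sorted(over | set(spot))


chart = {hour: 1.0 for hour in range(168)}      # 168 hourly values = one week
chart[10], chart[50] = 20.0, 16.0               # two values above a 15.0 ppm limit
selected = hours_to_check(chart, limit=15.0, seed=3)
print(len(selected), 10 in selected, 50 in selected)   # 7 True True
```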
    

    -------
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
                 [Figure 5: sample printout of hourly data]
    

    -------
    Table 2. LACS CONTINUOUS SAMPLER OUTLIER LIMITS
    (ppm)

                                SITE 008    ALL OTHERS
    CO (CARBON MONOXIDE)            25.0          15.0
    NO (NITRIC OXIDE)                  -           0.5
    NO2 (NITROGEN DIOXIDE)             -           0.3
    O3 (OZONE)                         -           0.3
    TS (TOTAL SULFUR)                             0.05
    WS (WIND SPEED)                                15*

    *MILES/HOUR
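Table 2 can be carried into a validation program as a small lookup. The sketch below transcribes the table; reading the dashes for site 008 as "no limit listed" (and returning None for them) is an assumption, as are the function names and the example site "001".

```python
from typing import Optional

# Limits transcribed from Table 2 (ppm; wind speed in miles/hour).
ALL_OTHER_SITES = {"CO": 15.0, "NO": 0.5, "NO2": 0.3, "O3": 0.3, "TS": 0.05, "WS": 15.0}
# Assumption: site 008 lists only a CO limit; its dashes are read as "no limit listed".
SITE_008 = {"CO": 25.0}


def outlier_limit(site: str, parameter: str) -> Optional[float]:
    """Outlier limit for a site and parameter, or None if Table 2 lists none."""
    table = SITE_008 if site == "008" else ALL_OTHER_SITES
    return table.get(parameter)


def exceeds_limit(site: str, parameter: str, value: float) -> bool:
    """True if the value must be checked against the strip chart."""
    limit = outlier_limit(site, parameter)
    return limit is not None and value > limit


print(exceeds_limit("008", "CO", 20.0))   # False
print(exceeds_limit("001", "CO", 20.0))   # True (site "001" is hypothetical)
print(outlier_limit("008", "NO"))         # None
```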
    

    -------
                 [Figure 6: sample 24-hour integrated data printout]
    
    
    
    
    ^ oC
    
    
    
    
    1 C C
    •s >•
    c c
    
    
    
    
    
    
    
    C' -
    
    
    
    
    
    
    
    
    
    
    o c
    * *
    
    
    
    
    
    & tr
    
    
    
    
    IsT ?s
    xC
    M  3-
    
    rs —
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    1/1 (S
    •~* ^<-
    
    
    
    ec (*•
    X X
    fsi rs,
    C' C
    X X
    fs- Js.
    
    
    
    
    
    1
    
    
    
    
    1
    
    
    
    1
    
    
    
    
    
    
    
    
    
    
    
    
    f" f
    Cr C
    -c.
    
    
    
    
    
    
    
    t c
    fv *J
    cc -t
    
    
    
    
    
    
    
    
    
    — .
    
    
    
    
    
    
    
    
    
    I^-L f
    t_' t
    
    
    
    
    0- O-
    
    
    
    
    c -
    (N (S
    X X
    IS-  c
    X X
    rs. i^
    fs, fs
    
    
    
    
    
    j
    i
    
    
    
    
    
    
    
    
    
    
    
    c o-
    r rj
    
    
    
    
    
    
    
    *
    cr -t
    C (S
    r c
    
    
    
    
    
    
    
    r c
    o -
    <*: u'
    
    
    
    
    
    
    
    0- C
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    00 sO
    
    
    
    
    (V IS
    
    r -i c
    TS |s.
    rs ^^
    
    
    
    
    
    
    
    
    ;
    
    
    
    
    
    
    
    
    
    r .
    
    
    
    
    
    
    
    -c is
    f ^
    t r
    
    
    
    
    
    
    
    C C
    — 0
    o r-
    
    
    
    
    
    
    
    ^ iT
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    U i.
    
    
    
    
    ^5 rs
    X X
    fsi r\i
    t, c
    x x
    t-s rs
    r. rs
    
    
    
    
    
    
    i
    I
    
    ''
    
    
    
    i
    
    
    1
    
    
    
    
    
    
    
    
    
    c
    f
    L,
    
    
    
    
    
    
    
    c
    -t.
    ^
    
    
    
    
    
    
    
    c.
    
    
    
    
    
    
    
    
    
    
    ts
    
    
    
    
    
    
    a
    _•
    
    
    
    cr
    ISI
    IN
    r
    x
    rs
    r~.
    
    
    
    
    
    I
    i
    
    
    
    
    
    
    
    
    
    
    
    -
    
    
    
    
    
    
    
    r
    -
    c
    
    
    
    
    
    
    
    c
    (Si
    
    
    
    
    
    
    
    •c
    
    
    
    
    
    
    
    
    
    {£
    IT
    *
    ISi
    
    
    
    
    _
    *-,
    
    
    
    O
    ?"
    X.
    ^
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    +•
    C
    C
    *^
    c
    L
    c
    s.
    ^
    r
    (
    1
    
    I
    
    
    C
    
    
    
    L
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    1
    234
    

    -------
    Table 3. LACS INTEGRATED SAMPLER OUTLIER LIMITS

    PARAMETER                       24 HR     4 HR
    TSP (SUSPENDED PARTICULATES)      200      300
    NO3 (NITRATE)                      30       30
    SO4 (SULFATE)                      30       50
    NH4 (AMMONIUM)                    3.0      3.0
    Pb (LEAD)                         8.0     12.0
    SO2 (SULFUR DIOXIDE)               50        -
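The limits in Table 3 can be applied as a simple lookup. The sketch below makes several assumptions. It treats each limit as an upper bound, so a sample above the limit is flagged as an outlier. It assumes values arrive in the same (unstated) units as the table. The function name `is_outlier` and the period keys `"24HR"` and `"4HR"` are hypothetical.

```python
from typing import Dict, Optional

# Outlier limits transcribed from Table 3, keyed by parameter and sampling
# period. The table gives no units, so values are used exactly as listed.
OUTLIER_LIMITS: Dict[str, Dict[str, float]] = {
    "TSP": {"24HR": 200.0, "4HR": 300.0},
    "NO3": {"24HR": 30.0, "4HR": 30.0},
    "SO4": {"24HR": 30.0, "4HR": 50.0},
    "NH4": {"24HR": 3.0, "4HR": 3.0},
    "Pb": {"24HR": 8.0, "4HR": 12.0},
    "SO2": {"24HR": 50.0},  # Table 3 lists no 4-hour limit for SO2
}


def is_outlier(parameter: str, period: str, value: float) -> Optional[bool]:
    """Return True if value exceeds the tabulated limit (assumed to be an
    upper bound), False if it does not, or None if no limit is listed."""
    limit = OUTLIER_LIMITS.get(parameter, {}).get(period)
    if limit is None:
        return None
    return value > limit
```

For example, a 24-hour TSP value of 250 is flagged. The same value from a 4-hour sample is not.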
    


    -------
      VALIDATION TECHNIQUES USED IN CONTINUOUS
                   AIR MONITORING
                         by
                  Marvin B. Hertz
         Health Effects Research Laboratory
        U.S. Environmental Protection Agency
    Research Triangle Park, North Carolina  27711
    
         The Community Health Air Monitoring Program (CHAMP) is a network of air
    monitoring stations used to acquire reliable air quality data for use in
    epidemiologic health effects studies.
         The CHAMP network has remote air monitoring stations located in each of
    the selected health study communities across the country. The focal point
    of the CHAMP network is the central computer facility located at the
    Environmental Protection Agency, Research Triangle Park, North Carolina. A
    minicomputer at each of the remote stations controls the aerometric and
    meteorologic instrumentation and acquires data and associated system status
    information from it. It then transmits the data by phone lines to the
    central computer facility. The central controller for the CHAMP network is
    a dual processor system with a full complement of input, storage, and
    display peripherals. One processor was selected to perform the tasks
    associated with the management of the large data base to be generated by
    the network. The telecommunications and real-time processing tasks are
    handled by the other processor.
         Two fundamental system objectives were: (1) to provide machine
    validation of the data, and (2) to develop a management information system
    for use in the quality assurance, field logistics, and field maintenance
    tasks associated with system operations.
    Remote Data Acquisition System
         Basically, the minicomputer in the remote station serves as an interface
    between the pollutant analyzers and associated systems, magnetic tape data
    storage, the remote field service operator, and the telecommunications
    network. The data generated and recorded at the remote stations and
    transmitted to central include not only the actual meteorologic and
    pollutant sensor responses, but also associated analog signals and digital
    status signals. These signals supply information about the performance and
    status of each instrument. For example, if an instrument is switched from
    the ambient sampling mode to the calibration mode, a status bit is recorded
    which reflects this change.
    Telecommunication
         Data is retrieved from each of the remote stations at the request of the
    central computer system, via a dial-up phone line, at two-hour intervals.
    The central and remote computers converse via a voice-grade
    telecommunications system. It consists of modems operating in full duplex
    mode at 1200 baud from remote to central and 150 baud in the reverse
    direction. Polling is under the complete control of the central
    controller. A disk file at central contains the phone number of each
    station in the format required by the calling software. An alterable
    polling queue is also disk resident. A rigid protocol has been established
    to guarantee accurate transmission and retrieval of data. Central makes
    several attempts to establish contact with a remote station. If all of them
    fail, it abandons the attempt and places the station at the bottom of the
    polling queue. A hardware carrier-detect protocol establishes the
    communications link. Each frame is checked for parity and framing errors
    by the modem controller. Checksums are computed for each 512-frame record
    and compared by the computer. An acknowledge character is then exchanged
    indicating correct receipt of the record. Should any of the tests fail,
    several transmission retries are made. Communications are terminated in
    one of two ways: by receipt of a character from the remote system
    indicating the end of data, or by failure of the remote to transmit within
    the required period of time.
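The polling and record-verification rules above can be sketched as follows. This is an illustration under stated assumptions, not the CHAMP implementation. The paper does not describe the checksum algorithm, so a 16-bit additive checksum stands in for it. The retry counts (`MAX_CONNECT_TRIES`, `MAX_RECORD_RETRIES`) are hypothetical values for the text's "several" tries. Connection and record transfer are passed in as callables so the logic runs without modems.

```python
from typing import Callable, List, Optional, Tuple

RECORD_FRAMES = 512      # frames (characters) per record, as stated in the text
MAX_CONNECT_TRIES = 3    # hypothetical value for "several tries" to connect
MAX_RECORD_RETRIES = 3   # hypothetical value for "several transmission retries"


def checksum(record: bytes) -> int:
    """Hypothetical 16-bit additive checksum; the CHAMP algorithm is not given."""
    return sum(record) & 0xFFFF


def receive_record(read_record: Callable[[], Tuple[bytes, int]]) -> Optional[bytes]:
    """Read one record and its transmitted checksum, retrying on mismatch.

    Returns the record once its length and checksum agree (the point at which
    the acknowledge character would be exchanged), or None if every retry fails.
    """
    for _ in range(MAX_RECORD_RETRIES):
        record, sent_checksum = read_record()
        if len(record) == RECORD_FRAMES and checksum(record) == sent_checksum:
            return record
    return None


def poll_cycle(
    queue: List[str], try_connect: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Poll each station in queue order.

    A station that cannot be reached in MAX_CONNECT_TRIES attempts is moved to
    the bottom of the queue. Returns (new_queue, stations_reached).
    """
    reached: List[str] = []
    unreachable: List[str] = []
    for station in queue:
        # any() stops calling try_connect at the first successful attempt
        if any(try_connect(station) for _ in range(MAX_CONNECT_TRIES)):
            reached.append(station)
        else:
            unreachable.append(station)
    return reached + unreachable, reached
```

The station link is passed in as callables. This keeps the retry policy separate from the modem handling, so the same logic can be exercised with simulated stations.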
    The Central Controller Hardware Configuration
         The focal point of the CHAMP network is the central computer facility
    located at the Environmental Protection Agency Complex, Research Triangle
    Park, North Carolina. The central controller for the CHAMP network is a
    dual processor system with a full complement of input, storage, and display
    peripherals. Telecommunications and real-time processing of large
    quantities of data place a heavy burden on processor time, which justified
    the choice of a dual processor system. A PDP-11/40 with 40K of core was
    selected to perform the tasks associated with the management of the large
    data base generated by the network. The telecommunications and real-time
    processing tasks are handled by a PDP-11/05 computer with 16K of core. The
    two processors are interconnected by a Unibus window. The window takes
    advantage of the unified asynchronous data path architecture of the PDP-11
    system, and it allows each processor to address the core and peripherals of
    the other processor as if they were its own. In addition, the DEC memory
    management option was added to the PDP-11/40 to handle addressing above 32K
    in the 16-bit system. An extensive complement of peripherals was initially
    selected: two 1.2M-word cartridge-type disks, three tape drives, an
    electrostatic printer-plotter, a line printer, and a CRT display. The
    requirement for rapid retrieval of large quantities of data later
    necessitated the addition of a Telefile dual-spindle, quad-density,
    removable 20-surface pack disk system capable of storing 98M words.
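The 32K figure follows from the 16-bit address width. This calculation assumes two things. First, as was usual for PDP-11 core sizes, "K" here counts 16-bit words. Second, addresses select individual bytes. Under those assumptions the directly addressable range is:

```latex
2^{16}\ \text{bytes} = 65\,536\ \text{bytes}
  = \frac{65\,536}{2}\ \text{words} = 32\,768\ \text{words} = 32\text{K words}
```

Part of the PDP-11/40's 40K words of core therefore lies beyond what a 16-bit address can reach by itself. This is consistent with the need for the memory management option.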
    The Central Controller Software Requirements
         As mentioned previously, the PDP-11/05 processor is dedicated to the
    system telecommunication tasks. It also stores the data, as received,
    simultaneously on magnetic tape and on the Telefile disk. The data so
    stored (Level 1) are an image of the tapes recorded at the remote station.
    These data include three kinds of information:
    primary data, which are the data that actually represent parameters of
    interest, such as pollution levels; secondary data, which are required to
    validate primary data or are used only to ensure proper station operation;
    and the status bits. For flexibility, any channel at the remote stations
    can be selectively assigned a primary or secondary function as required.
    Furthermore, the number of primary and secondary channels is arbitrary.
    All of the data at the remote stations, whether primary or secondary, are
    assigned a remote station data slot (RSDS). The complement of instruments,
    and the RSDS number corresponding to a given instrument, may differ from
    station to station. It is therefore necessary to append a "map" at the
    front of each set of station data, giving the correspondence between
    instrument and RSDS number. At central, each parameter is assigned a
    mnemonic (2 to 4 letters) which describes the parameter (NOS, O3, TOUT,
    etc.). The map must therefore contain the mnemonic and the link between
    the mnemonic and the corresponding RSDS. As the complement of instruments
    in a station changes, a new map will be created and appended to future
    data from that station. All operations at central will therefore refer
    only to the mnemonic names, and not to any "channel," "data slot," or
    other number.
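The station map decouples processing at central from each station's slot numbering. This is a minimal sketch. It assumes each station data set carries its map as a dictionary from mnemonic to RSDS number, and that each record is a list indexed by RSDS. The `StationDataSet` structure and the `to_mnemonics` function are hypothetical, and the numbers in the usage example are invented for illustration.

```python
from typing import Dict, List, TypedDict


class StationDataSet(TypedDict):
    """Hypothetical layout of one set of station data with its map in front."""
    station: str
    map: Dict[str, int]          # mnemonic -> RSDS number
    records: List[List[float]]   # each record is indexed by RSDS number


def to_mnemonics(data: StationDataSet) -> List[Dict[str, float]]:
    """Re-key every record by mnemonic so later processing never sees slots."""
    slot_map = data["map"]
    return [
        {mnemonic: record[rsds] for mnemonic, rsds in slot_map.items()}
        for record in data["records"]
    ]


# Two stations with the same instruments in different slots.
station_a: StationDataSet = {"station": "A", "map": {"O3": 0, "TOUT": 1},
                             "records": [[0.05, 21.5]]}
station_b: StationDataSet = {"station": "B", "map": {"TOUT": 0, "O3": 2},
                             "records": [[19.0, 7.7, 0.04]]}
print(to_mnemonics(station_a))  # [{'O3': 0.05, 'TOUT': 21.5}]
print(to_mnemonics(station_b))  # [{'TOUT': 19.0, 'O3': 0.04}]
```

A new map accompanies the data whenever a station's instrument complement changes. As a result, the same function handles old and new data without any change at central.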
         The 11/40 processor is devoted to data validation tasks and to tasks
    associated with the management information system requirements for quality
    assurance, field maintenance, and field logistics.
    CHAMP Software Features
         As mentioned previously, fundamental system objectives were:
    1.   To develop and implement a computer-based management information
         system for use in system quality assurance, field logistics, and field
         maintenance tasks.
    2.   To provide for machine (computer) validation of the data.
         The current CHAMP software, which will now be described, represents
    the composite of original system programs plus those that were subsequently
    developed in response to needs recognized after the system became operational
    and to inadequacies in the original software.
    Current CHAMP Central Software
    1.   FILMAP - This program creates the air quality data base station map
         files.  The station map files contain station location information,
         instrument complement by station, status bit configuration indicating
         hardware failure, calibration data, and correct operating ranges of
         the hardware. In addition, the map files identify validation criteria
         such as secondary parameter limits, dependencies between primary
         parameters, filtering and interpolation techniques to be applied, and
         the format of the final data output.
    2.   FILSET - This program sorts polled data and mailed station data by
         type and writes the data in the correct format into the appropriate
         files in the data base. Six types of data are sorted: (1) primary
         parameter data, (2) calibration constants, (3) status words,
         (4) journal entries, (5) secondary parameter data, and (6) calibration
         data. Table 1 presents a sample of secondary data.
    3.   TIMSTN - This program processes the remote station data tapes and
         checks for remote station data time anomalies. The program enables
         the operator to edit the reported times to resolve the time jumps.
         The edited station data are recorded on magnetic tape.
    4.   PDAILY - This program produces a printed report summarizing daily
         station performance. PDAILY invalidates primary parameter data in
         cases where bits associated with hardware failures are set.
         Calibration data, and data collected on the wrong instrument range,
         are also invalidated. For each primary parameter, the program tallies
         the number of invalid five-minute averages in each hour, as well as
         the number of times each status bit is set in each hour. The amount of
         data found invalid, valid, missing, or in calibration mode for each
         primary parameter is summarized over the day. The journal entries,
         which are operator comments entered at the remote stations, are also
         listed.
    5.   PSUMRY - This program summarizes daily station performance and produces
         printed summaries of station performance. The performance summary
         contains the following:
         1.  Percentage of valid data by parameter by day.
         2.  Percentage of valid, missing, and calibration data by day.
         3.  Percentage of valid, missing, and calibration data over the
             days shown on the summary.
         4.  Logs of data processing progress.
         5.  Primary and secondary parameter calibration occurrences, and
             occurrences of control chart samples which exceed upper or
             lower control chart limits.
    6.   PSCHRT - This program samples the air quality data base secondary
         parameters and generates control chart files from the sampled
         secondary parameter file data.
    7.   REVMAP - This program allows manual editing of the data by processing
         validation actions entered on punched cards in the standard "Review
         Change Request" format.
    8.   PCHEMS - This program generates the calibration, performs
         pre-established validation tests, and produces a printed report
         summarizing chemical analysis data (i.e., hi-vol data, bubbler data).
    

    -------
    9.   PPCHRT - This program samples the air quality data base primary
         calibration parameters (A and B constants) and generates control
         chart files for these data.
    10.  PCALIB - This program generates the calibration coefficients used
         to convert the air quality data collected as a voltage by the remote
         station into a concentration.  The calibration constants are calculated
         from the raw calibration mode data recorded at the remote station on
         tape.
    11.  FAUT - This program performs automatic calibration for selected
         equipment under control of the Central Computer and/or the remote
         station operator.
    12.  FILCAN - This program sorts the chemical analysis data by station and
         by date and creates chemical analysis data files.
    13.  PLOTCC - This program takes input from the control chart files and
         plots control charts for the primary parameter zeroes, spans and B
         coefficients and for the secondary parameter ranges and values.
    14.  FLMRG - This program sorts station data (mailed) by type and writes
         the data in the correct format into the appropriate files in the data
         base. These files are then compared with those created by the FILSET
         program (polled data), and the gaps in the data base are filled.
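TIMSTN's time-anomaly check (item 3) can be illustrated with a short sketch. It assumes one record per five-minute average, the averaging interval that PDAILY reports. It flags any record whose time does not follow its predecessor by exactly that step. The function name and the fixed step are hypothetical, and the operator's editing of the flagged times is not shown.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed spacing between consecutive records (one five-minute average each).
EXPECTED_STEP = timedelta(minutes=5)


def find_time_anomalies(times: List[datetime]) -> List[Tuple[int, timedelta]]:
    """Return (index, step) for every record whose reported time does not
    follow the previous record by EXPECTED_STEP. Forward jumps, repeated
    times, and backward jumps are all reported."""
    anomalies: List[Tuple[int, timedelta]] = []
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        if step != EXPECTED_STEP:
            anomalies.append((i, step))
    return anomalies


# Hypothetical reported times with a forward jump and a backward jump.
start = datetime(1977, 8, 10, 0, 0)
reported = [start + timedelta(minutes=m) for m in (0, 5, 10, 30, 25)]
anomalies = find_time_anomalies(reported)
```

Here the record at index 3 is reported 20 minutes after its predecessor, and the record at index 4 goes back 5 minutes. Both would be presented to the operator for editing.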
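PDAILY's invalidation and tallying rules (item 4) amount to bit tests on the status word of each five-minute average. In CHAMP, the bit configuration indicating hardware failure was held in the station map files built by FILMAP. The sketch below uses hypothetical bit assignments in its place and assumes a 16-bit status word. It keeps calibration data as a separate category, as in the daily summary, rather than counting it with the invalid data.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical status-word bit assignments, for illustration only.
HARDWARE_FAILURE = 0x0001
CALIBRATION_MODE = 0x0002
WRONG_RANGE = 0x0004
STATUS_BITS = 16  # assumed width of the status word


def classify(value: Optional[float], status: int) -> str:
    """Label one five-minute average as missing, calibration, invalid, or valid."""
    if value is None:
        return "missing"
    if status & CALIBRATION_MODE:
        return "calibration"
    if status & (HARDWARE_FAILURE | WRONG_RANGE):
        return "invalid"
    return "valid"


def daily_tallies(averages: List[Tuple[int, Optional[float], int]]):
    """averages holds (hour, value or None, status word) for one parameter.

    Returns the invalid averages per hour, the number of times each status
    bit was set per (hour, bit), and the day's totals for each category.
    """
    invalid_by_hour: Counter = Counter()
    bits_by_hour: Counter = Counter()
    day_totals: Counter = Counter()
    for hour, value, status in averages:
        label = classify(value, status)
        day_totals[label] += 1
        if label == "invalid":
            invalid_by_hour[hour] += 1
        for bit in range(STATUS_BITS):
            if status & (1 << bit):
                bits_by_hour[(hour, bit)] += 1
    return invalid_by_hour, bits_by_hour, day_totals
```

Percentages of the kind PSUMRY reports (item 5) follow directly from `day_totals`. For example, dividing the count of valid averages by the total count gives the percentage of valid data.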
    
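PSCHRT and PLOTCC (items 6 and 13) build and plot control charts, but the paper does not say how the limits are set. This sketch assumes a common choice, a Shewhart chart. Its limits sit at the mean of a baseline set of samples, plus or minus three sample standard deviations. The function names, the factor `k`, and the readings are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float, float]:
    """Return (lower, center, upper): the baseline mean and the mean plus or
    minus k sample standard deviations. k = 3 is an assumed convention."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread


def out_of_control(samples: Sequence[float], lower: float, upper: float) -> List[int]:
    """Indices of samples falling outside the control limits."""
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical span-check readings for one analyzer.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
lower, center, upper = control_limits(baseline)
print(out_of_control([10.0, 10.6, 9.4, 10.3], lower, upper))  # [1, 2]
```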

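PCALIB (item 10) converts voltages to concentrations using calibration coefficients, which PPCHRT (item 9) refers to as A and B constants. The paper does not give the calibration equation. This sketch assumes a linear response, concentration = A + B * voltage. A and B are fitted by ordinary least squares to the (voltage, known concentration) pairs recorded in calibration mode. The function names are hypothetical.

```python
from typing import List, Tuple


def fit_calibration(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of concentration = A + B * voltage.

    points are (voltage, known concentration) pairs from calibration mode.
    Returns (A, B).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two calibration points")
    mean_v = sum(v for v, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    sxx = sum((v - mean_v) ** 2 for v, _ in points)
    if sxx == 0:
        raise ValueError("calibration voltages must not all be equal")
    sxy = sum((v - mean_v) * (c - mean_c) for v, c in points)
    b = sxy / sxx
    a = mean_c - b * mean_v
    return a, b


def to_concentration(voltage: float, a: float, b: float) -> float:
    """Apply the fitted calibration to one raw voltage reading."""
    return a + b * voltage


# Hypothetical zero, mid-point, and span readings.
a, b = fit_calibration([(0.1, 0.0), (2.6, 0.25), (5.1, 0.5)])
print(round(a, 4), round(b, 4))  # -0.01 0.1
```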
    -------
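FLMRG's gap filling (item 14) can be sketched as a keyed merge. The sketch makes three assumptions: both data sets are keyed by a common time slot, a polled value is kept whenever one exists, and `None` marks a missing value. These conventions and the name `fill_gaps` are hypothetical.

```python
from typing import Dict, Optional

Series = Dict[int, Optional[float]]  # time slot -> value (None = missing)


def fill_gaps(polled: Series, mailed: Series) -> Series:
    """Keep every polled value and fill slots that are absent or missing in
    the polled data from the mailed data. Returns a new dict in slot order."""
    merged: Series = dict(polled)
    for slot, value in mailed.items():
        if merged.get(slot) is None and value is not None:
            merged[slot] = value
    return dict(sorted(merged.items()))


polled = {0: 1.0, 1: None, 3: 4.0}
mailed = {0: 1.1, 1: 2.0, 2: 3.0, 3: 4.0}
print(fill_gaps(polled, mailed))  # {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
```
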
    Table 1. SECONDARY PARAMETERS, 77222 TO 77237
    Q
    
    
    
    
    
    03
    
    
    CD
    E
    
    t_
    
    
    
    CU
    4_>
    03
    Q
    
    
    CD
    
    >
    
    
    
    QJ
    £
    •p.
    f—
    
    
    Qj^
    ^_>
    03
    Q
    
    
    
    CD
    E
    i 	
    i
    
    
    CU
    -4_>
    (O
    Q
    
    
    <~O 00 •=*• i— CO CO
    «=!••* i— Ln CM co
    r~. Ln Ln Ln Ln Ln
    
    CM CM
    CO CO
    1 1
    en Ln vo *d~ en
    CO CM CO CO i — i —
    co p*-» tn en en en
    o en co co co co
    en en oo Ln vo 10
    O O r— O O O
    
    
    i — O «D P~- Ln CM
    Ln Ln oo o CM «*
    ID Ln co en oo co
    p^s r*^ j*«» ^^ r^^ r^^
    
    Ln co "^ r^. cr> en
    «* «5t O CM O «*
    CM o co CM CM en
    O r- CM r— r— O
    
    
    «* co en o co vo
    CM CM CM CO CO CO
    CM CM CM CM CM CM
    
    
    
    co en en co o Ln
    co «* Ln en i — r—
    co co r- o en o
    r— r— CM CM r— CM
    
    
    r~- r— co co o vo
    Ln co Ln CD ^~ ^"
    CO CO CM CO CO CO
    •— •— O i— O r-
    
    
    
    Ltt VO CO r— -=3- •^•
    CM CM CM CO CO CO
    CM CM CM CM CM CM
    
    
    O i— Ln rv. o ^t
    «* en oo vo tn co
    «* CM «e- 1^ r- ID
    i*» h» r~ r^ r^ r->
    
    
    
    ^* co r*** en *^ '^
    r— Ln Ln o ^3" Ln
    CM co i— CM o en
    i— •— t— i— r— O
    
    
    ^O CO O CM *^J" VO
    CM CM CO CO CO CO
    CM CM CM CM CM CM
    
    
    
    
    CO CM CO CM «* r-
    CO ^3- CM CM CO i—
    •=d- CM *d- CM CM r—
    
    
    
    CM ID OO O CM «*
    CM CM CM CO CO CO
    CM CM CM CM CM CM
    
    246
    

    -------
       USE OF PRECISION AND ACCURACY ESTIMATES
                FOR VALIDATION OF DATA
                          by
                    David T. Mage
      Environmental Monitoring Systems Laboratory
        U. S. Environmental Protection Agency
    Research Triangle Park, North Carolina  27711
                                    INTRODUCTION
          A basic need in the presentation of a data set is a description of the
    precision and accuracy associated with the measurements.  The definition of
    valid data for a given study is then determined by the precision and accuracy
    claimed for the data set.  For example, a statement may be made that the data
    have a precision of 10%.  Assuming that the errors are independent and
    normally distributed, one expects about 68% of the measurements to be within
    ±10% of the true (unknown) value and about 95% to be within ±20%.  By this
    definition, the 5% of the data in error by more than ±20% are not invalid,
    since such errors are expected from probability alone, and runs of positive
    or of negative errors can also occur by chance.  It is not productive, and
    probably impossible, to examine each datum point and determine the individual
    error associated with it.  The approach being taken for the Community Health
    Air Monitoring Program (CHAMP) data base is to determine the precision and
    accuracy of the entire data set and not to invalidate data except for a known
    cause, such as instrument failure.  The alternative, eliminating data
    suspected of higher uncertainty in order to improve the precision of the
    remaining data set, is counterproductive in the context of a health study.
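The ±10% and ±20% figures above are the one- and two-standard-deviation coverage of a normal distribution. The short Python sketch below, an illustration added here and not part of the CHAMP procedure, computes that coverage from the error function. Strictly, ±2σ covers about 95.4% of normal errors; exactly 95% corresponds to ±1.96σ.

```python
import math


def fraction_within(k: float) -> float:
    """Probability that a normally distributed error lies within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    precision = 0.10  # the 10% precision used in the example above
    for k in (1, 2):
        inside = fraction_within(k)
        print(f"within +/-{k * precision:.0%}: {inside:.1%} inside, {1 - inside:.1%} outside")
    # within +/-10%: 68.3% inside, 31.7% outside
    # within +/-20%: 95.4% inside, 4.6% outside
```

The roughly 4.6% of readings expected beyond ±2σ is the fraction that, under this definition, must not be invalidated merely for being large.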
          In a health study, the aerometric data are paired with health data.
    When aerometric data are invalidated, the associated health statistics are
    also removed from the analysis.  Because the health indicator, such as an
    asthmatic attack, occurs relatively infrequently, the loss of this information
    significantly reduces the validity of the overall study.  For this reason, an
    approach that provides a large aerometric data base of moderate precision and
    accuracy is preferable to one that provides a reduced data base of higher
    precision and accuracy.  The following sections describe the system of data
    validation currently being used by CHAMP.
    
                                   ERROR ANALYSIS
    
          When an aerometric analyzer is continuously monitoring a pollutant,
    several sources of potential error can influence the measurement.  The ten
    sources of error listed in Table 1 are discussed below.
          1.  Span Gas Analysis.  This error covers the uncertainty in the process
    of preparing a known concentration of pollutant to provide an upscale reading.
    This process may contain errors associated with preparation of a primary
    standard and subsequent analysis of a secondary or transfer standard to be
    used in the field.  When dilution is necessary to attain the desired
    concentration, the errors in the flow measurements of standard and diluent
    air also contribute to the overall error.  The author estimates this
    uncertainty to be on the order of 5% (σ1 = .05).
          2.  Zero Gas Impurity.  The gas used to zero the instrument may contain
    some impurity.  The presence of 0.1 ppm as opposed to 0.0 ppm represents a
    potential error of 10% at the 1 ppm level and 1% at the 10 ppm level.  For
    the purpose of this analysis a low error of 2% (σ2 = .02) is chosen, since
    the greatest concern is at or above the National Ambient Air Quality Standard
    (NAAQS), where this error is minimized.
          3.  Instrument Drift, Electronic.  When the sample and reagent flows to
    the instrument are held constant, a constant input concentration produces a
    signal that fluctuates about a mean value.  This "noise" in the output signal
    may be caused by electronic noise in the photomultiplier tube and other
    electrical components due to voltage, frequency, and temperature
    fluctuations.  The estimated error for this effect is 3% (σ3 = .03).
          4.  Instrument Drift, Flow Variations.  After an instrument is
    calibrated, and with the input concentration held constant, an increase or
    decrease in the sample flow rate will tend to cause the instrument to drift
    away from the equilibrium point.  When reagent flows are also being mixed in a
    reaction chamber, such as the ethylene flow in a chemiluminescent ozone
    analyzer, fluctuations in the reagent flows also influence the output signal.
    These flows may be influenced by fluctuations in atmospheric pressure and in
    the vacuum of the flow system.  The overall effect of these variations of
    flow from the calibration condition is taken to be 6% (σ4 = .06).
          5.  Operator Imprecision.  In performing calibrations, the station
    operator must adjust potentiometers and rotameters and perhaps read the mean
    of a fluctuating signal.  A different operator repeating these procedures
    will arrive at a slightly different result for each of them.  The resulting
    uncertainty, due to the human element, is estimated at 4% (σ5 = .04).
          6.  Non-Linearities of Scale.  A linear relation is usually assumed
    between voltage output and pollutant input.  Slight non-linearities of scale
    are usually masked by the uncertainties in the measurements themselves.
    Where the scale appears to be linear, an error of 2% in the linearity is
    almost indistinguishable; consequently, this error of 2% is treated as a
    possibility that cannot be ignored (σ6 = .02).
          7.  Response Time.  Because of the finite response time of the
    instruments, a rising signal will lag and tend to be underestimated, and a
    falling signal will lag and tend to be overestimated.  These errors are
    estimated to be on the order of 2% of the measurements (σ7 = .02).
          8.  Interferences.  Differences between the composition of the ambient
    atmosphere and that of the gas used to calibrate the instrument can cause
    errors.  Common gases, other than the pollutants themselves, that fluctuate
    in the atmosphere are CO2 and H2O.  These fluctuations can cause variations
    in the output signal on the order of 2% (σ8 = .02).
          9.  Pressure-Temperature Correction.  When data are corrected to
    standard conditions (25°C and 760 mm Hg), uncertainties in the measured
    pressure and temperature can cause a slight error on the order of 1%
    (σ9 = .01).
          10.  Data Processing and Round Off.  Analog-to-digital and
    digital-to-analog conversions introduce an error, and a round-off error also
    occurs when data are output.  These errors are quite small, probably less
    than 1% (σ10 = .01).

                           TABLE 1.  SOURCES OF ERROR

          Source of error                              Expected error, σ
          Span gas analysis                                   .05
          Zero gas impurity                                   .02
          Instrument drift, electronic                        .03
          Instrument drift, flow variations                   .06
          Operator imprecision                                .04
          Non-linearities of scale                            .02
          Response time                                       .02
          Interferences                                       .02
          Pressure-temperature correction                     .01
          Data processing and round off                       .01

          The net result of all of these effects, assuming that the variances are
    additive, is an overall uncertainty of about ±10% (the square root of the sum
    of the ten variances, √0.0104 ≈ 0.10).  This is interpreted as follows.  If
    the atmospheres at 10 stations were all 10 ppm of some arbitrary pollutant,
    one would expect the mean of all 10 station measurements to be 10 ppm, about
    7 of them to be between 9 and 11 ppm, about 2 to be in the range 8-9 ppm or
    11-12 ppm, and about 1 to be over 12 ppm or less than 8 ppm.
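Assuming, as the text does, that the ten error sources are independent so that their variances add, the combined uncertainty is the square root of the sum of the squared σ values. The Python sketch below reproduces that arithmetic (summed variance 0.0104, combined σ about 0.102) and the expected split of 10 station readings into the ±1σ, 1σ-to-2σ, and beyond-2σ bands. The exact expectation beyond ±2σ is about 0.46 of 10 stations; the text rounds the three counts to 7, 2, and 1 so that they total 10. The names `ERROR_SOURCES`, `combined_sigma`, and `expected_band_counts` are illustrative only.

```python
import math

# (source, sigma) pairs: the fractional standard errors estimated in items 1-10
ERROR_SOURCES = [
    ("span gas analysis", 0.05),
    ("zero gas impurity", 0.02),
    ("instrument drift, electronic", 0.03),
    ("instrument drift, flow variations", 0.06),
    ("operator imprecision", 0.04),
    ("non-linearities of scale", 0.02),
    ("response time", 0.02),
    ("interferences", 0.02),
    ("pressure-temperature correction", 0.01),
    ("data processing and round off", 0.01),
]


def combined_sigma(sources):
    """Root-sum-square of the individual sigmas (valid only for independent errors)."""
    return math.sqrt(sum(sigma ** 2 for _, sigma in sources))


def expected_band_counts(n):
    """Expected counts of n normal errors within 1 sigma, between 1 and 2 sigma, and beyond 2 sigma."""
    p1 = math.erf(1.0 / math.sqrt(2.0))  # P(|z| <= 1)
    p2 = math.erf(2.0 / math.sqrt(2.0))  # P(|z| <= 2)
    return n * p1, n * (p2 - p1), n * (1.0 - p2)


if __name__ == "__main__":
    total = combined_sigma(ERROR_SOURCES)
    print(f"summed variance = {total ** 2:.4f}, combined sigma = {total:.3f}")
    # With sigma of about 10% at 10 ppm, the bands are 9-11, 8-9 or 11-12, and outside 8-12 ppm.
    inner, middle, outer = expected_band_counts(10)
    print(f"expected stations: {inner:.1f}, {middle:.1f}, {outer:.2f}")
```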
                              CHAMP VALIDATION CRITERIA
          The CHAMP aerometric system is unique in that it measures and records
    the secondary flow parameters within the instrument simultaneously with the
    measurement of the primary parameter (pollutant concentration).
          The fluctuations in the secondary parameters produce uncertainties in
    the measurements, as described previously.  When the fluctuations exceed their
    expected range, or limit of normal operation, two possible causes must be
    investigated.  The first is pure chance, in which case the data are valid.
    The second is a component failure, such as a clogged capillary tube, which
    produces a bias in the data.  Besides signaling the need to repair the
    instrument, a component failure gives the analyst a cause for invalidating
    the portion of the data set during which the instrument was operating outside
    its normal range.
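A minimal sketch of that screening step follows, assuming each secondary parameter has a fixed band of normal operation. It reports runs of consecutive out-of-range readings so that the analyst can decide whether a run reflects chance or a failure such as a clogged capillary. The `Limits` type, the `min_run` threshold, and the example readings and limits are all hypothetical; the paper does not state the limits CHAMP used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical band of normal operation for one secondary parameter."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def out_of_range_runs(values, limits, min_run=1):
    """Return (start, stop) index pairs, stop exclusive, of consecutive readings outside limits.

    Runs shorter than min_run are ignored (a hypothetical rule: isolated excursions are
    more likely chance fluctuations than a component failure).
    """
    runs = []
    start = None
    for i, value in enumerate(values):
        if not limits.contains(value):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values)))
    return runs


if __name__ == "__main__":
    # Hypothetical ethylene-flow (FETH) readings in cc/min; the drop mimics a clogged line.
    feth = [25.0, 25.3, 24.8, 15.2, 14.9, 15.1, 25.1, 29.0, 25.0]
    print(out_of_range_runs(feth, Limits(22.0, 28.0), min_run=2))  # [(3, 6)]
```

Flagged runs are candidates for investigation, not automatic invalidation: only a confirmed cause, as the text requires, justifies removing that portion of the data.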
          As an example of the use of secondary parameter data for determining
    precision, accuracy, and validity, the effect of ethylene flow within the
    Bendix chemiluminescent ozone analyzer is chosen for discussion.  Figure 1 is
    a typical plot of the ozone flow variations within a CHAMP station.  On the
    day shown, the ethylene flow (FETH) varied from 25 cc/min to 26.3 cc/min.
    Figure 2 is a histogram of the fluctuations of FETH about the previous
    calibration setting of FETH during the 11-day period August 15 - August 25,
    1977.  The histogram has a mean of -0.03 cc/min and a standard deviation of
    0.29 cc/min.  A mean this close to zero confirms, as expected, that the
    fluctuations are not producing an appreciable bias.  The standard deviation
    can be related to a standard error by examining how the output of an ozone
    analyzer varies with FETH.
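The histogram statistics quoted above are the mean and sample standard deviation of the differences between each FETH reading and the calibration setting. A minimal Python sketch follows; the function name `deviation_stats` and the example readings are hypothetical, and only the 25 cc/min setting comes from the text.

```python
import statistics


def deviation_stats(readings, setpoint):
    """Mean and sample standard deviation of readings about a calibration setpoint."""
    deviations = [r - setpoint for r in readings]
    return statistics.fmean(deviations), statistics.stdev(deviations)


if __name__ == "__main__":
    setpoint = 25.0  # FETH calibration setting, cc/min
    readings = [25.2, 24.7, 25.1, 24.9, 25.4, 24.6, 25.0, 25.3]  # hypothetical readings
    mean, sd = deviation_stats(readings, setpoint)
    print(f"mean deviation = {mean:+.2f} cc/min, standard deviation = {sd:.2f} cc/min")
```

A mean near zero relative to the standard deviation is the signature the text uses to conclude that the flow fluctuations introduce no appreciable bias.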
          Figure 3 shows the instrumental bias (Δ ppm O3) as a function of ΔFETH
    at various ozone levels.  In this case the instruments were on the 0.5 ppm
    scale and were calibrated with a sample flow of 1000 cc/min and an FETH of
    25 cc/min (bias = 0 by definition).  When these data are normalized by
    dividing the bias by the original ppm value, the percentage changes at all
    four concentrations follow a common curve (Figure 4).  In this case a linear
    fit to these data is not justifiable.  At 25 cc/min the slope of the curve is
    +1.6% per cc/min.  The standard deviation of 0.3 cc/min therefore
    corresponds to a standard error in ozone of about 0.5% (1.6% per cc/min ×
    0.3 cc/min ≈ 0.48%), as shown in Table 2.  The biases of the measurements
    caused by changes in vacuum (VAC), sample flow of ozone (SFO3), sample flow
    of NO (SFNO), and flow of air through the ozone generator (FO3) are shown in
    Figures 5-8.  These biases were used to compute the other standard errors
    given in Table 2.

    [Figure 1: typical plot of the ozone flow variations within a CHAMP station]

    [Figure 2: histogram of FETH fluctuations about the previous calibration
    setting, August 15-25, 1977]

    [Figure 3: instrumental bias (Δ ppm O3) as a function of ΔFETH at various
    ozone levels]

    [Figure 4: bias (% change) as a function of ΔFETH, normalized to the
    original ozone value]

    [Table 2: standard errors in ozone associated with variations in FETH, VAC,
    SFO3, SFNO, and FO3]
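The 0.5% figure is a first-order (linearized) error propagation: the standard error in ozone is approximately the magnitude of the local slope of the bias curve times the standard deviation of the flow. Because the curve in Figure 4 is not linear, the slope has to be taken locally at the 25 cc/min operating point. The sketch below uses a hypothetical helper, `local_slope`, that estimates that slope by a central difference from tabulated (ΔFETH, % bias) points, and then applies the propagation; only the slope of 1.6% per cc/min and the 0.3 cc/min spread come from the text.

```python
def local_slope(points, x0):
    """Central-difference slope at x0 from the nearest tabulated points on either side.

    points: (x, y) pairs, e.g. (delta FETH in cc/min, bias in %); x0 must lie strictly
    inside the tabulated range so that both neighbors exist.
    """
    left = max((p for p in points if p[0] < x0), key=lambda p: p[0])
    right = min((p for p in points if p[0] > x0), key=lambda p: p[0])
    return (right[1] - left[1]) / (right[0] - left[0])


def propagated_error(slope, sd):
    """First-order standard error |dy/dx| * sd(x); adequate when sd is small relative to curvature."""
    return abs(slope) * sd


if __name__ == "__main__":
    # Values from the text: slope +1.6% per cc/min at 25 cc/min, FETH spread about 0.3 cc/min.
    print(f"standard error in ozone = {propagated_error(1.6, 0.3):.2f}%")  # 0.48%
```

The same propagation, applied with the slopes read from the bias curves for VAC, SFO3, SFNO, and FO3, gives the other entries of Table 2.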
    
                                     DISCUSSION
          The assumption developed in the preceding section is that measurement
    errors are independent from instrument to instrument and normally
    distributed.  When the same standard is used for repeated calibration of an
    instrument to produce a time series of measurements, an error in the analysis
    of that standard introduces a bias.  One of the functions of an audit, using
    an independent set of standards and observers, is to disclose whether a
    significant bias exists.  In September 1977, the contract operator of the
    stations performed an audit of 46 instruments located at seven CHAMP
    stations.  The audit procedure was not truly independent, since the transfer
    standards in station use had originally been compared with the primary
    standard used in the audit.  Of the 46 instruments, seven of one type showed
    a consistent bias, indicating a problem with the audit procedure.  The
    remaining 39 instruments showed the expected positive and negative scatter
    about the audit values.
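The normality check applied to the 39 audit averages in the next paragraph can be sketched in Python. This is an illustration under stated assumptions, not the procedure actually used for CHAMP, and the function names are hypothetical. `plot_deviation` mimics the probability-paper construction (plotting positions rank/(n + 1) against a normal fitted with the sample mean and standard deviation), whereas `ks_statistic` computes the classical one-sample Kolmogorov-Smirnov D from the empirical step function, so the two can differ slightly. The critical values are large-sample approximations: 1.36/√n when the normal's parameters are specified in advance, and Lilliefors' 0.886/√n when, as here, μ and σ are estimated from the same data. For n = 39 these are about 0.22 and 0.14, and a statistic of 0.06 is below both.

```python
import math
from statistics import NormalDist, fmean, stdev


def fitted_normal(data):
    """Normal distribution with the mean and sample standard deviation of the data."""
    return NormalDist(fmean(data), stdev(data))


def plot_deviation(data):
    """Max |rank/(n + 1) - F(x)| over the sorted data, as read from normal probability paper."""
    xs = sorted(data)
    n = len(xs)
    dist = fitted_normal(xs)
    return max(abs(i / (n + 1) - dist.cdf(x)) for i, x in enumerate(xs, start=1))


def ks_statistic(data):
    """One-sample Kolmogorov-Smirnov D against the fitted normal (empirical CDF steps at i/n)."""
    xs = sorted(data)
    n = len(xs)
    dist = fitted_normal(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_5pct(n):
    """Large-sample 5% critical value when the normal's parameters are specified in advance."""
    return 1.36 / math.sqrt(n)


def lilliefors_critical_5pct(n):
    """Large-sample (n > 30) 5% critical value when mean and sigma are estimated from the data."""
    return 0.886 / math.sqrt(n)


if __name__ == "__main__":
    n = 39
    print(f"n = {n}: KS 5% = {ks_critical_5pct(n):.3f}, Lilliefors 5% = {lilliefors_critical_5pct(n):.3f}")
```

Given the 39 audit deviations as a list, `ks_statistic(deviations) < lilliefors_critical_5pct(39)` is the stricter form of the test reported below.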
          To test the hypothesis that these 39 results are normally distributed,
    the results are plotted on normal probability paper at frequencies equal to
    their rank, lowest to highest, divided by the total (39) plus one, that is,
    rank/40.  Each datum point represents the average of four span results at
    approximately 20%, 40%, 50%, and 80% of full scale.  These four deviations
    are not independent, since the same operator used the same standards for each
    one; however, the 39 averages are mutually independent.  The mean deviation,
    μ, is -1.55% and the standard deviation, σ, is 5.5%.  The maximum difference
    between the frequencies predicted by the normal distribution N(μ, σ) and the
    data points is 6%.  This corresponds to a Kolmogorov-Smirnov statistic of
    0.06, which indicates that the hypothesis of normality cannot be rejected at
    the 5% level.  If an independent auditor performed the audit with
    independent standards, a standard deviation larger than 5.5% would be
    expected, probably on the order of 10 to 15%.  The results of independent
    CHAMP audits are being analyzed in the manner described above, and the
    results of the analyses will be reported with
    
                                                                                                                                                            OJ
                                                                                                                                                            o
    
                                                                                                                                                            O)
                                                                                                                                                           Q.
                                                                                                                                                           IO-
                                                                                                                                                           CS
                                                                                                                                                           O
    

    -------
    the data elsewhere.
         In conclusion, the approach being taken for the CHAMP data validation
    procedure is to accept data from the instruments when they are known to be
    operating properly and to make a probability statement for each data set
    as a whole.  For example, the data set for one pollutant at a station may
    have a standard deviation of 10%, while the data set for another pollutant
    at the same station may have a standard deviation of 15%.
         These differing uncertainties allow the statistical analyst to weight
    the data more heavily when the expected error is low and to adjust for the
    relative uncertainties when correlating air pollution with health effects.
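The weighting idea can be illustrated with inverse-variance weights, in which each data set's weight is proportional to 1/sigma^2. This is a minimal sketch under that assumption: the text does not state which weighting rule the CHAMP analysts used, and the function names are hypothetical.

```python
# Minimal sketch, assuming inverse-variance weighting (weight = 1 / sigma**2).
# The CHAMP paper does not specify a weighting rule; this is one common choice.


def inverse_variance_weights(sigmas):
    """Return weights proportional to 1/sigma**2, normalized to sum to 1."""
    raw = [1.0 / (s * s) for s in sigmas]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, sigmas):
    """Weighted mean that favors data sets with smaller expected error."""
    weights = inverse_variance_weights(sigmas)
    return sum(w * v for w, v in zip(weights, values))


# Relative standard deviations from the example in the text: 10% and 15%.
weights = inverse_variance_weights([0.10, 0.15])
print([round(w, 3) for w in weights])  # → [0.692, 0.308]
```

The data set with the 10% standard deviation receives a little more than twice the weight of the one with 15%, since (0.15/0.10)^2 = 2.25.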
    

    -------
     VALIDATION SYSTEM USED IN THE ST. LOUIS
       REGIONAL AIR MONITORING STUDY  (RAMS)
                        by
                 Robert B. Jurgens
     Environmental Sciences Research Laboratory
        U.S. Environmental Protection Agency
    Research Triangle Park, North Carolina  27711
    

    -------
                    VALIDATION SYSTEM USED IN THE ST. LOUIS

                      REGIONAL AIR MONITORING STUDY (RAMS)*

                                R.B. Jurgens+
    
    
                                   Abstract

         This paper describes the RAMS measurement system, the screening
    categories of data validation, the RAMS automated validation system, the
    current status of two special validation studies (visual validation and
    successive differences), and updates to the RAMS data base.  The
    conclusion presents a generalized measurement system that includes quality
    control, data validation and feedback.
    *Portions of this paper have been discussed in detail elsewhere (1) and
     are mentioned here only for completeness.

    +On assignment from the National Oceanic and Atmospheric Administration,
     U.S. Department of Commerce, Rockville, Maryland 20852.
    

    -------
                                 Introduction
    
         The Regional Air Monitoring System (RAMS) is the ground-based
    aerometric network of the St. Louis Regional Air Pollution Study (RAPS).
    See references (2,3) for a discussion of the objectives, scope and
    accomplishments of RAPS.
         The locations of the 25 RAMS stations within the St. Louis metropolitan
    area are shown in Figure 1.  The air quality, meteorological and solar
    radiation measurements made within the RAMS network are listed in Table 1;
    not all measurements are made at every station.  Recording began in
    midsummer 1974 and continued through June 1977; from April to June 1977
    only stations 104, 106, 107, 111, 115, 121 and 125 were in operation.  The
    network recorded approximately 500 million values over this period.
    Figures 2 and 3 show the data flow through the RAMS stations and through
    the central facility at Creve Coeur.  Rockwell International Corporation
    was the prime contractor for the installation, operation and maintenance
    of the RAMS network.  A detailed description of RAMS can be found in
    references (4,5).
    

    -------
    Quality Control & Data Screening
         Data validity results from: 1) a quality control program designed
    to provide accurate data as they are measured, and 2) a screening process
    to detect spurious values that persist despite the quality control process.
    The quality control program for the RAMS network is reviewed in (1) and
    (5).  Detailed definitions and discussion of the elements of quality
    control for air pollution measurement systems have been published in (6).
    The specific quality control activities relating to calibration, zero/span
    checks, and status and analog checks associated with the gas analyzers are
    quite similar to those of the CHAMP program, which are discussed in the
    preceding paper by Dr. Marvin Hertz.
         Based on our experience managing the data validation activities
    of RAMS, we have developed a summary of screening techniques that is
    applicable to any continuous automated monitoring network (air pollution,
    water quality, etc.).  These tests (Table 2) have been divided into
    three categories: 1) Operational, 2) Continuity and Relational, and 3) A
    Posteriori.  A discussion of the screening tests within each category and
    their application in RAMS follows.
         The first category, "Operational," contains checks that document
    the network instrument configuration and the operating mode of the
    recording station.  These checks, which in RAMS are part of the quality
    control program, include checks for station instrumentation, missing data,
    system analog and status sense bits, and instrument calibration mode.
    In addition to documenting system performance, the checks are used to
    flag data in the RAMS archive.  As designed, the RAMS data bank contains
    space for every potential measurement.  For example, if an instrument is
    in calibration mode, the corresponding data slots will contain a
    "calibration" flag.
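A minimal sketch of the operational flagging described above, assuming one archive slot per parameter per minute. The flag codes (NOT_INSTALLED, MISSING, CALIBRATION, STATUS_FAULT), the order in which the checks are applied, and the function name are hypothetical; the actual RAMS flags are listed in Table 5.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag codes for illustration; the actual RAMS flags are in Table 5.
NOT_INSTALLED = "NOT_INSTALLED"
MISSING = "MISSING"
CALIBRATION = "CALIBRATION"
STATUS_FAULT = "STATUS_FAULT"


@dataclass
class Slot:
    """One potential measurement in the archive: a value and an optional flag."""
    value: Optional[float]
    flag: Optional[str] = None


def operational_slot(value, installed, in_calibration, status_ok):
    """Fill one data slot from the operational checks, in a fixed order."""
    if not installed:
        return Slot(None, NOT_INSTALLED)  # instrument not configured at this station
    if value is None:
        return Slot(None, MISSING)        # nothing was recorded for this minute
    if in_calibration:
        return Slot(value, CALIBRATION)   # instrument was in calibration mode
    if not status_ok:
        return Slot(value, STATUS_FAULT)  # analog/status sense bits reported a fault
    return Slot(value)                    # passes the operational checks


print(operational_slot(0.045, installed=True, in_calibration=True, status_ok=True))
# → Slot(value=0.045, flag='CALIBRATION')
```

Keeping a slot for every potential measurement, even when it only carries a flag, means a downstream program can always tell "not measured" apart from "measured but suspect".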
    

    -------
         The second category, "Continuity and Relational," contains temporal
    and spatial continuity checks and relational checks between parameters
    that are based on physical and instrumental considerations or on
    statistical patterns in the data.  A natural subdivision can be made
    between intrastation checks, which apply only to data from one station,
    and interstation checks, which test the measured parameters for uniformity
    across the network.
         Intrastation checks include tests for calibration drift (gas
    analyzers in RAMS), lower detectable limits, gross limits, aggregate
    frequency distributions, relationships, and temporal continuity.
         The drift calculations, which are part of the quality control
    program, are discussed in the references cited above.  Many measurement
    instruments have a threshold, or lower detection limit (LDL), below which
    their output is obscured by instrument noise.  A standard practice adopted
    in RAMS is to replace values in this range (0.0 ± LDL) with +1/2 LDL.  The
    LDLs for the gas analyzers and the wind speed sensor are the lower
    instrument limits listed in Table 3.
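The LDL substitution can be sketched as follows. The LDL of 0.005 ppm used in the example is hypothetical (the actual limits are in Table 3), and treating the band edges as inclusive is an assumption of this sketch.

```python
def apply_ldl(value, ldl):
    """Replace a reading within 0.0 +/- LDL by +LDL/2, as in the RAMS practice.

    Inclusive band edges are an assumption. Readings outside the band are
    returned unchanged; impossible values are left to the gross-limit checks.
    """
    if -ldl <= value <= ldl:
        return 0.5 * ldl
    return value


# Hypothetical LDL of 0.005 ppm for a gas analyzer (see Table 3 for real values).
readings = [-0.003, 0.004, 0.020]
print([apply_ldl(v, 0.005) for v in readings])  # → [0.0025, 0.0025, 0.02]
```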
         Gross limits, which in RAMS are used to screen impossible values,
    are based on the ranges of the recording instruments.  These limits,
    together with the parametric relationships that check for internal
    consistency between values, are listed in Table 3.  Setting limits for
    relationship tests requires a working knowledge of the noise levels of the
    individual instruments.  The relationships used are based on meteorology,
    atmospheric chemistry, or the principle of chemical mass balance.  For
    example, at a station for any given minute, total sulfur (TS) cannot be
    less than SO2 + H2S, with allowances for the noise limits of the
    instruments.
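A sketch of a gross-limit check and the total sulfur mass-balance relationship. The 0 to 0.5 ppm range and the 0.002 ppm noise allowance are hypothetical values chosen for illustration; the RAMS limits and relationships are listed in Table 3.

```python
def within_gross_limits(value, lower, upper):
    """Gross-limit check: the value must lie within the instrument's range."""
    return lower <= value <= upper


def sulfur_balance_ok(ts, so2, h2s, noise):
    """Mass-balance check for one station and minute: total sulfur (TS) should
    not be less than SO2 + H2S, allowing for combined instrument noise.

    `noise` is a hypothetical tolerance; in practice it would be derived from
    the noise levels of the individual analyzers.
    """
    return ts >= so2 + h2s - noise


print(within_gross_limits(0.6, 0.0, 0.5))             # → False (hypothetical 0-0.5 ppm range)
print(sulfur_balance_ok(0.050, 0.040, 0.005, 0.002))  # → True
print(sulfur_balance_ok(0.030, 0.040, 0.005, 0.002))  # → False
```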
    

    -------
         A refinement of the gross limit checks can be made using aggregate
    frequency distributions.  With knowledge of the underlying distribution,
    statistical limits can be found that have narrower bounds than the
    gross limits and that represent measurement levels that are rarely
    exceeded.  A method for fitting a parametric probability model to the
    underlying distribution has been developed by Dr. Wayne Ott of EPA's
    Office of Research and Development (7).  B.E. Suta and G.V. Lucha (8)
    have extended Dr. Ott's program to estimate parameters, perform
    goodness-of-fit tests, and calculate quality control limits for the normal
    distribution, the 2- and 3-parameter lognormal distributions, the gamma
    distribution, and the Weibull distribution.  These programs have been
    implemented on the OSI computer in Washington and tested on water quality
    data from STORET.  This technique has not been implemented within RAMS.
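As a rough illustration of statistical limits derived from a fitted distribution, the sketch below fits a 2-parameter lognormal by the mean and standard deviation of the logarithms and returns limits at a chosen probability. This is a simple substitute method, not Ott's procedure or the Suta and Lucha program; the function name, the p = 0.999 default and the sample data are assumptions.

```python
import math
import statistics


def lognormal_limits(values, p=0.999):
    """Fit a 2-parameter lognormal by the mean and standard deviation of ln(x)
    and return (lower, upper) limits, each exceeded with probability 1 - p.

    Only positive values can be log-transformed; after the LDL substitution
    all readings are positive.
    """
    logs = [math.log(v) for v in values if v > 0]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    z = statistics.NormalDist().inv_cdf(p)  # standard normal quantile
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)


# Hypothetical hourly ozone values (ppm) for one station.
ozone = [0.021, 0.034, 0.045, 0.052, 0.038, 0.029, 0.061, 0.048]
low, high = lognormal_limits(ozone)
```

Values outside (low, high) would be flagged as rarely occurring rather than rejected outright, since the limits are narrower than the physical gross limits.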
         Also among the intrastation checks are specific tests that examine
    the temporal continuity of the output from each sensor.  It is useful to
    consider, in general, the types of atypical or erratic responses that can
    occur in sensors and data acquisition systems.  Figure 4 illustrates
    examples of such behavior, all of which have occurred to some extent
    within RAMS.  Physical causes for these responses include sudden discrete
    changes in component operating characteristics, component failure, noise,
    telecommunication errors and outages, and errors in the software
    associated with the data acquisition system or data processing.  For
    example, it was recognized early in the RAMS program that a constant
    voltage output from a sensor indicated mechanical or electrical failure in
    the sensor instrumentation.  One of the first screens implemented was a
    check for 10 minutes of constant output from each sensor.  Barometric
    pressure is not among the parameters tested, since it can remain constant
    (to the number of digits recorded) for periods much longer than 10
    minutes.  The test was modified for other parameters that reach a low
    constant background level during night-time hours.  SO2 was generally at
    zero, and no persistence check was applied to it.
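The 10-minute constant-output screen, with the exemptions for barometric pressure and SO2, might look like this. The parameter names and helper functions are hypothetical; only the 10-minute run length and the two exemptions come from the text.

```python
# Hypothetical parameter names; barometric pressure and SO2 are exempt, as in RAMS.
EXEMPT = {"barometric_pressure", "so2"}


def constant_runs(values, min_length=10):
    """Return (start, end) index pairs, end exclusive, for runs in which the
    sensor output is unchanged for at least min_length consecutive minutes."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or where the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs


def persistence_flags(parameter, values, min_length=10):
    """Apply the 10-minute constant-output screen unless the parameter is exempt."""
    if parameter in EXEMPT:
        return []
    return constant_runs(values, min_length)


minutes = [21.3] * 12 + [21.4, 21.6]
print(persistence_flags("temperature", minutes))  # → [(0, 12)]
```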
         A technique that can detect any sudden jump in the response of an
    instrument, whether it is an individual outlier, a step function or a
    spike, is the comparison of successive differences of a measurement with
    predetermined control limits.  These limits are determined for each
    parameter from the distribution of successive differences for that
    parameter.  When taken over a sufficiently long time series of
    measurements, these differences will be approximately normally distributed
    with mean zero and a variance that can be computed from the data.
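A sketch of successive-difference screening, assuming the standard deviation is estimated about a mean of zero from a long reference series and the control limit is k standard deviations. The default k = 4 mirrors the 4-sigma limits used later in this paper; the function names and data are illustrative.

```python
import math


def successive_differences(values):
    """x[i] - x[i-1] for consecutive minute values."""
    return [b - a for a, b in zip(values, values[1:])]


def control_limit(reference_series, k=4.0):
    """Control limit k * sigma, with sigma estimated from a long reference
    series of successive differences assumed to have mean zero."""
    d = successive_differences(reference_series)
    sigma = math.sqrt(sum(x * x for x in d) / len(d))
    return k * sigma


def flag_jumps(values, limit):
    """Indices i at which |x[i] - x[i-1]| exceeds the control limit."""
    return [i + 1 for i, d in enumerate(successive_differences(values)) if abs(d) > limit]


limit = control_limit([0.0, 1.0, 0.0, 1.0, 0.0])    # sigma = 1, so limit = 4.0
print(flag_jumps([10.0, 10.5, 20.0, 10.4], limit))  # → [2, 3]
```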
         The type of "jump" can easily be identified.  A single outlier
    produces a large successive difference followed by another of about the
    same magnitude but opposite sign.  A step function has no return, and a
    spike produces a succession of large successive differences of one sign
    followed by a succession of the opposite sign.
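The sign patterns above can be turned into a simple classifier. This sketch assumes that a jump and its return occur in consecutive minutes and that both need only exceed the control limit (it does not check that their magnitudes match); the function name and labels are hypothetical.

```python
def classify_jumps(values, limit):
    """Label each jump as 'outlier', 'step' or 'spike' from the sign pattern of
    successive differences whose magnitude exceeds `limit`."""
    d = [b - a for a, b in zip(values, values[1:])]
    big = [(i, 1 if x > 0 else -1) for i, x in enumerate(d) if abs(x) > limit]

    def run_end(j, sign):
        # Extend from big[j] while the next large difference is adjacent and has `sign`.
        while j + 1 < len(big) and big[j + 1][0] == big[j][0] + 1 and big[j + 1][1] == sign:
            j += 1
        return j

    events, j = [], 0
    while j < len(big):
        start, sign = big[j]
        k = run_end(j, sign)              # run of large differences of one sign
        m = k
        if (k + 1 < len(big) and big[k + 1][0] == big[k][0] + 1
                and big[k + 1][1] == -sign):
            m = run_end(k + 1, -sign)     # following run of the opposite sign
        up, down = k - j + 1, m - k
        if down == 0:
            kind = "step"                 # no return
        elif up == 1 and down == 1:
            kind = "outlier"              # one value out, then straight back
        else:
            kind = "spike"                # several large moves one way, then back
        events.append((start + 1, kind))  # index of the first affected value
        j = m + 1
    return events


print(classify_jumps([0, 0, 10, 0, 0, 0, 10, 10], 4))  # → [(2, 'outlier'), (6, 'step')]
```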
         Although this technique has not been implemented in Rockwell's data
    processing and validation program, TAPGEN (partly because of an expected
    large increase in processing time on the PDP-11/40), or in EPA's data
    archiving programs (version 6.4), strong consideration has been given to
    it for potential applications in data screening and quality control
    checks.  In 1976, the Data Management and Systems Analysis Section (DMSAS)
    awarded a contract to RTI to study validation procedures for the RAPS data
    bank (9), in which a major area of investigation was the use of minute
    successive differences.  The use of successive differences in an ongoing
    special validation study is discussed in a later section.
         A number of interstation checks on meteorological parameters are
    implemented in Rockwell's TAPGEN program.  However, they have been used
    only for quality control of the RAMS system and not for validation
    (flagging) of RAMS data.  These tests, which are shown in Table 4, are
    performed on hourly average data.
         Another interstation check, the Dixon ratio test, has been examined
    to determine its applicability for screening RAMS network outliers
    (1,9).  Dr. Ty Hartwell of RTI presented, in an earlier session, some
    results he has obtained using the Dixon ratio test on RAMS data.  This
    test was never implemented in the RAMS data validation system.
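For reference, a Dixon-type ratio test on one hour's values across stations can be sketched as below, using the simplest ratio, r10 (the gap between an extreme value and its nearest neighbor divided by the range). The text does not say which Dixon ratio was studied. The critical value must be taken from published Dixon tables for the number of stations and the chosen risk level; 0.71 is an assumed value for five stations, and the ozone values are hypothetical.

```python
def dixon_r10(values):
    """Dixon's r10 ratios (gap / range) for the lowest and highest values."""
    x = sorted(values)
    if len(x) < 3 or x[-1] == x[0]:
        return 0.0, 0.0  # ratio undefined: too few values or no spread
    spread = x[-1] - x[0]
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread


def dixon_outliers(values, critical):
    """Return the extreme values whose ratio exceeds the tabulated critical value."""
    x = sorted(values)
    low, high = dixon_r10(values)
    flagged = []
    if low > critical:
        flagged.append(x[0])
    if high > critical:
        flagged.append(x[-1])
    return flagged


# Hypothetical hour-average ozone (ppm) at five stations for one hour.
ozone = [0.061, 0.058, 0.064, 0.060, 0.140]
CRITICAL = 0.71  # assumed; look up the tabulated value for n = 5
print(dixon_outliers(ozone, CRITICAL))  # → [0.14]
```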
         Referring again to Table 2, the third screening category, "A
    Posteriori," was established to provide a mechanism for overriding the
    automated flagging schemes implemented in the instrumentation at the
    remote sites and in the data screening module.  From a review of station
    logs and preventive maintenance records, knowledge of unusual events, or
    visual inspection of the data, it may be determined that previously valid
    data should be flagged as questionable.  Conversely, it may be determined
    that previously invalid data should be validated by removing existing
    flags.  Data would be invalidated, for example, when an instrument such
    as a wind direction indicator becomes misaligned or uncalibrated for some
    non-linear or unknown reason.  Flags may be removed (revalidation), for
    example, when the recording instruments function properly but either the
    sense bit or the analog status circuitry is known to have malfunctioned.
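The a posteriori overrides can be represented as a list of decisions applied to the archived flags. The record layout, the action names and the station numbers in the example are hypothetical.

```python
# Hypothetical override record: (station, parameter, first_minute, last_minute, action, flag).
# `flags` maps (station, parameter, minute) -> flag, with None meaning valid.


def apply_overrides(flags, overrides):
    """Apply a posteriori decisions: invalidate (set a flag) or revalidate
    (remove the flag) for every archived minute in the inclusive range."""
    for station, parameter, first, last, action, flag in overrides:
        for minute in range(first, last + 1):
            key = (station, parameter, minute)
            if key not in flags:
                continue  # no archived slot for this minute
            if action == "invalidate":
                flags[key] = flag
            elif action == "revalidate":
                flags[key] = None
            else:
                raise ValueError(f"unknown action: {action!r}")
    return flags


flags = {(103, "wd", 0): None, (103, "wd", 1): None, (110, "so2", 5): "STATUS_FAULT"}
overrides = [
    (103, "wd", 0, 1, "invalidate", "MISALIGNED"),  # misaligned wind vane
    (110, "so2", 5, 5, "revalidate", None),         # status circuit fault, data good
]
apply_overrides(flags, overrides)
```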
    

    -------
    RAMS Automated Validation System
         The screening tests used in validating RAMS data were largely
    developed and tested at RTP and then implemented in the St. Louis central
    facility computer for on-site, near-real-time processing.  Through
    continued testing and modification, the validation system evolved to its
    final version, version 6.4.  All data archived by previous versions
    have been rearchived to this standard.  Table 5 lists the causes and
    flags of the screening tests, while Figure 5 shows a flow diagram of the
    order in which the tests are applied.
    
    Special Validation Studies
         Specific known problems have occurred on certain parameters from
    time to time.  The origin of these problems can be traced to sensor
    failure, electrical transients, software bugs at RAMS stations and at
    the central facility, data acquisition hardware, etc.  Despite the
    automated validation program, these problems have led to the archival of
    erroneous data.  It should be noted that these problems have affected
    only a small percentage of all data, estimated at less than 1 percent of
    the total.
         In an effort to locate, review and flag any remaining suspect data
    (whether known a priori or not), several studies have been initiated
    within DMSAS.  Two major efforts involve a graphical review of hour
    average data and a computer study of minute successive differences.
         RAMS Hour Average Graphical Review.  Table 6 lists the volume of
    data from the RAMS networks, the number of minute and hour plots, and the
    number of microfiche (24x) required for plotting all RAMS data.  The very
    large number (70,000) of minute plots precludes a graphical review at this
    time interval.  Therefore, a graphical review system using hour average
    data was developed.  This system combines computer graphics, interactive
    programs and computer files (lists) wherever possible to reduce the manual
    labor associated with the various tasks.
         The steps in this study are shown in Figure 6.  Computer-generated
    notebooks of hour average plots are reviewed by trained personnel for
    any suspect data (see Figure 7 for an example).  The plots are also
    reviewed by a second individual.  A consolidated list of dates and times
    is entered into the computer as input for the automatic retrieval of
    minute plots using the RAPS*GRAPHICS program.  These plots from the RAMS
    minute archive are reviewed by DMSAS personnel.  From the original
    review file, a second computer disc file (preliminary update file)
    containing dates, time periods, and suggested changes and flags is
    prepared.  This list, with the corresponding minute/hour plots, is
    forwarded to Rockwell for review and investigation of cause.  With the
    concurrence of DMSAS and Rockwell, the final output of the graphical
    review process is reached: an update file for the RAMS minute/hour
    archive.
         Minute Successive Difference Study.  Visual inspection of hour
    data will detect large discontinuities in time series plots of a
    measurement or uncorrelated traces between stations.  However, if a few
    minutes of "spiky" data were recorded during an hour, the hour average may
    change by only a few percent.  Since hour-to-hour variations in almost all
    RAMS parameters can normally be much larger than a few percent, small
    changes caused by errors in minute data will not be detected by visual
    observation.
    

    -------
         To determine the quality of the minute archival data, we have been
    applying a flagging procedure based on distribution functions of minute
    successive differences.  This technique is based on the assumption that
    minute successive differences approximate a normal distribution with mean
    zero.  See Figure 8 for an example of a distribution function of ozone
    data from a five-hour period.
         The RTI study (9) has shown that, for a given parameter, sample
    standard deviations of minute successive differences are not constant
    over stations, time of day, or seasons.  The functional form of Σ,
    which can be expressed as

              $\Sigma = \Sigma(\text{parameter}, \text{date}, \text{time}, \text{station})$

    is not known, however.  Therefore, in this study the flagging limits have
    been chosen as $4\Sigma_{max}$, where $\Sigma_{max}$ is the largest sample
    standard deviation found in the RTI study for a given parameter.  These
    4-sigma limits are listed below.
    
               RAMS Variable                       4-Sigma Limit
          Wind speed (m/s)                           ±3.0
          Temperature (°C)                           ±0.7
          Ozone (ppm)                                ±0.010
          CO (ppm)                                   ±1.97
          Methane (ppm)                              ±0.32
          THC (ppm)                                  ±0.84
          NO (ppm)                                   ±0.028
          NOx (ppm)                                  ±0.035
          Total Sulfur (ppm)                         ±0.022
          SO2 (ppm)                                  ±0.015
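The rule used to set these limits, four times the largest sample standard deviation of minute successive differences, can be sketched as follows. Grouping by (station, season, hour) is an assumption suggested by the RTI finding that the standard deviation varies over stations, time of day and season; the data are illustrative.

```python
import statistics


def sample_sigmas(strata):
    """Sample standard deviation of minute successive differences per stratum.

    `strata` maps a hypothetical (station, season, hour) key to that stratum's
    minute values; at least three values (two differences) are needed.
    """
    return {
        key: statistics.stdev([b - a for a, b in zip(v, v[1:])])
        for key, v in strata.items()
        if len(v) >= 3
    }


def four_sigma_max(strata):
    """Flagging limit 4 * Sigma_max: four times the largest stratum sigma."""
    return 4.0 * max(sample_sigmas(strata).values())


strata = {
    (101, "summer", 14): [0.0, 1.0, 0.0, 1.0],
    (102, "summer", 14): [0.0, 2.0, 0.0],
}
print(round(four_sigma_max(strata), 3))  # → 11.314
```

Taking the maximum over strata makes a single limit per parameter conservative: it rarely flags good data in quiet periods, at the cost of missing some smaller errors there.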
    

    -------
         Flagged data (dates, times, station, etc.) are stored on the Univac
    1110 and can be used as input to further analysis.  Programs exist to
    automatically print and plot suspect data.  An example of minute
    temperature data with a succession of outliers is shown in Figure 9; the
    corresponding hour average is circled in Figure 7.
         Application of the minute successive difference technique in a RAMS
    data base update module will permit recalculation of hour averages that
    contain significant amounts of erroneous spiky data.
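A sketch of recalculating an hour average after the successive difference screen has flagged spiky minutes. The 45-minute completeness threshold and the data are hypothetical.

```python
import statistics


def hour_average(minutes, flagged, min_valid=45):
    """Recompute an hour average from the minute values not flagged as spiky.

    `flagged` holds minute indices (0-59) flagged by the successive difference
    screen. The 45-minute completeness threshold is a hypothetical choice.
    Returns None when too few valid minutes remain.
    """
    good = [v for i, v in enumerate(minutes) if i not in flagged]
    if len(good) < min_valid:
        return None
    return statistics.fmean(good)


# Two spiky minutes of 35.0 in an otherwise steady hour at 20.0 (hypothetical data).
minutes = [20.0] * 58 + [35.0, 35.0]
print(hour_average(minutes, set()))     # → 20.5  (spikes included)
print(hour_average(minutes, {58, 59}))  # → 20.0  (spikes removed)
```

In this example the two spiky minutes raise the hour average by only 2.5 percent, small enough to pass unnoticed in a review of hour plots.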
    
    RAMS Data Base Update
         The visual validation and successive difference studies are part of
    a review of the RAMS data being conducted by DMSAS and Rockwell.  The
    results of this review process will be an update file (dates, times,
    changes, flags, etc.) and separate modules for an update program.  Figure
    10 shows this update process, including the review studies and specific
    known problem areas.  Underlying this review is the requirement that all
    changes be documented and that concurrence be reached on the probable
    cause of suspect data.
    
    Monitoring System with Quality Control and Data Screening
         Figure 11 illustrates the data validation process within the
    framework of a generalized monitoring system.  Associated with the sensor
    instruments and the data acquisition system are quality control blocks
    that contain the elements required for acquiring acceptable data:
    calibration, system status and sense bits, preventive maintenance,
    training, and operation and maintenance documentation and records.  Data
    processing and screening should take place soon after data acquisition to
    permit system feedback in the form of corrective maintenance, changes to
    control processes and even changes in system design.
         A control data set (or sets) should be created for use in software
    verification.  When software changes are made, the control data set is
    processed and the output compared with that of previous versions.  This
    is analogous to the recalibration of gas analyzers after maintenance.  The
    control data set used in RAMS is one day's data from all sites.
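Comparing the processed control data set between software versions amounts to a difference report. A minimal sketch, assuming each version's output maps (station, parameter, minute) to a (value, flag) pair; the function name and the tolerance argument are hypothetical.

```python
def changed_slots(previous, current, tol=0.0):
    """Compare control-data-set output from two software versions.

    Both arguments map (station, parameter, minute) -> (value, flag), where
    value may be None for a missing measurement. Returns the sorted keys whose
    value (beyond `tol`) or flag differs, or that appear in only one output.
    """
    changed = []
    for key in sorted(set(previous) | set(current)):
        if key not in previous or key not in current:
            changed.append(key)
            continue
        (v0, f0), (v1, f1) = previous[key], current[key]
        if v0 is None or v1 is None:
            value_changed = (v0 is None) != (v1 is None)
        else:
            value_changed = abs(v0 - v1) > tol
        if value_changed or f0 != f1:
            changed.append(key)
    return changed


old = {(101, "o3", 0): (0.05, None), (101, "o3", 1): (0.06, None)}
new = {(101, "o3", 0): (0.05, None), (101, "o3", 1): (0.06, "SPIKE"),
       (101, "o3", 2): (None, "MISSING")}
print(changed_slots(old, new))  # → [(101, 'o3', 1), (101, 'o3', 2)]
```

An empty report means the new version reproduces the old output exactly; any listed slot must be explained by the intended software change before the new version is used for archiving.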
         The effectiveness of the data review process can be greatly enhanced
    by the use of graphics.  Review of graphical displays of raw data permits
    a rapid continuity check of individual time series and a visual
    correlation of network data.  Graphics naturally augment automated data
    validation techniques, which are necessarily based on a priori knowledge
    of system performance characteristics, expected magnitudes and variations
    in recorded levels, etc.
         A monitoring system is dynamic in nature, responding to changing
    hardware/software requirements and to variations in operating and
    maintenance procedures.  On-site, near-real-time data review and allowance
    for feedback in system design can minimize the amount of lost or
    marginally acceptable data.
    

    -------
    References
    
    
    (1)  Jurgens, R.B. and R.C. Rhodes, 1976: Quality Assurance and Data
         Validation for the Regional Air Monitoring System of the St. Louis
         Regional Air Pollution Study. Proc. of the Conference on Environmental
         Modeling and Simulation, EPA 600/9-76-016, 730-735.

    (2)  Burton, C.S. and G.M. Hidy, 1974: Regional Air Pollution Study Program
         Objectives and Plans, EPA 630/3-75-009, 53 pp.

    (3)  Browning, R.H., 1977: (RAPS) Description and Status of the Data
         Measurements, Quality Assurance and Data Base Management System
         (unpublished), 72 pp.

    (4)  Meyers, R.L. and J.A. Reagan, 1975: Regional Air Monitoring System at
         St. Louis, Missouri, International Conference on Environmental Sensing
         and Assessment, Paper 8-2, Lofc #75-37494, 4 pp.

    (5)  Hern, D.H. and M.H. Taterka, 1977: Regional Air Monitoring System Flow
         and Procedures Manual, EPA Contract DU 68-02-2093, 177 pp.

    (6)  Quality Assurance Handbook for Air Pollution Measurement Systems, 1976:
         Volume I, Principles, EPA 600/9-76-005, 365 pp.

    (7)  Ott, W.R., 1974: Selection of Probability Models for Determining Quality
         Control Data Screening Range Limits, presented at the 88th Meeting of
         the Association of Official Analytical Chemists, Washington, D.C.,
         6 pp.

    (8)  Suta, B.E. and G.V. Lucha, 1975: A Statistical Approach for Quality
         Assurance of STORET-Stored Parameters, SRI, EPA Control No.
         68-01-2940, 8 pp.

    (9)  Hartwell, T. and F. Smith, 1977: Study of Two Data Validation
         Procedures for the RAPS Data Bank, RTI Project 43U-1291-2, EPA Contract
         68-02-2407, 46 pp.
    

    2
    O
    H-
    
    Q
    <
    QC
    QC
    _i
    O
    00
    
    
    
    
    
    
    
    
    
    
    
    >
    j-
    _
    <
    
    
    »•
    •N
    J
    t
    3
    C
    3
    QC
    <
    
    I
    
                                                                id cc
                                                                < 22
                                                                a _,
                                                   281
    

    -------
              282
    

    -------
          TABLE 2.   SCREENING CATEGORIES FOR AUTOMATED RECORDING NETWORKS

I.    OPERATIONAL
          NO INSTRUMENT
          MISSING MEASUREMENT
          STATUS
          CALIBRATION

II.   CONTINUITY AND RELATIONAL
     A.    INTRA-STATION
               CALIBRATION DRIFT
               LOWER DETECTABLE LIMITS
               GROSS LIMITS
               AGGREGATE FREQUENCY DISTRIBUTIONS
               RELATIONSHIP AMONG PARAMETERS
               TEMPORAL CONTINUITY
                    CONSTANT OUTPUT
                    SUCCESSIVE DIFFERENCE
     B.    INTER-STATION
               METEOROLOGICAL NETWORK UNIFORMITY
               STATISTICAL OUTLIERS
               DIXON RATIO

III.  A POSTERIORI
          REVIEW OF STATION LOG
          UNUSUAL EVENTS OR CONDITIONS
          VISUAL INSPECTION OF DATA
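Table 2 names the Dixon ratio as an inter-station outlier screen. The sketch below is illustrative only and assumes the simplest (r10) form of Dixon's ratio, computed across all stations' readings for one averaging period: the gap between an extreme reading and its nearest neighbour, divided by the range of all readings. The ratio variant and critical values actually used in RAMS are not given here, so the critical value is a caller-supplied parameter; the 0.71 in the example and the station IDs are hypothetical placeholders.

```python
def dixon_r10(values):
    """Return (ratio_low, ratio_high) for the smallest and largest readings.

    r10 = gap / range, where gap is the distance from the extreme value
    to its nearest neighbour. Requires at least 3 readings.
    """
    x = sorted(values)
    if len(x) < 3:
        raise ValueError("Dixon ratio needs at least 3 readings")
    spread = x[-1] - x[0]
    if spread == 0:
        return 0.0, 0.0  # all stations agree; nothing can be an outlier
    return (x[1] - x[0]) / spread, (x[-1] - x[-2]) / spread


def dixon_outliers(readings, critical):
    """Flag station IDs whose reading is a Dixon outlier.

    readings: dict mapping station ID -> value for one averaging period.
    critical: critical ratio for this network size (assumed, caller-supplied).
    """
    low_ratio, high_ratio = dixon_r10(readings.values())
    flagged = []
    if low_ratio > critical:
        flagged.append(min(readings, key=readings.get))
    if high_ratio > critical:
        flagged.append(max(readings, key=readings.get))
    return flagged


# Hypothetical hourly temperatures (deg C) at five stations.
readings = {"101": 21.0, "102": 21.4, "103": 20.8, "104": 21.2, "105": 27.5}
print(dixon_outliers(readings, critical=0.71))  # ['105']
```

Only the lowest and highest readings are tested, which is how Dixon-type ratios work; a second outlier hiding next to the first would need a different ratio form.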
    
    
    
    
    
                                      283
    

    -------
                   TABLE 3.   GROSS LIMITS AND RELATIONAL CHECKS

                        INSTRUMENTAL LIMITS
PARAMETER               LOWER               UPPER               INTERPARAMETER CONDITION
Ozone                   .005 ppm            5 ppm               NO × O3 ≤ 0.01
Nitric Oxide            .005 ppm            5 ppm               NO - NOx ≤ .002 (NO)
Oxides of Nitrogen      .005 ppm            5 ppm               NO - NOx ≤ .002 (NOx)
Carbon Monoxide         .1 ppm              50 ppm
Methane                 .1 ppm              50 ppm              CH4 - THC ≤ .1 (CH4)
Total Hydrocarbons      .1 ppm              50 ppm              CH4 - THC ≤ .1 (THC)
Sulfur Dioxide          .005 ppm            1 ppm               SO2 - TS ≤ .002 (SO2)
Total Sulfur            .005 ppm            1 ppm               SO2 - TS ≤ .002 (TS)
Hydrogen Sulfide        .005 ppm            1 ppm               H2S - TS ≤ .002 (H2S)
Aerosol Scatter         0.00001 m^-1        0.00099 m^-1
Wind Speed              .27 m/s             22.2 m/s
Wind Direction          0°                  360°
Temperature             -20°C               45°C
Dew Point               -30°C               45°C                DP - 1.0 ≤ T
Temperature Gradient    -5°C                5°C
Barometric Pressure     950 mb              1050 mb
Pyranometers            -0.50 Langleys/min  2.50 Langleys/min
Pyrgeometers            0.30 Langleys/min   0.75 Langleys/min
Pyrheliometers          -0.50 Langleys/min  2.50 Langleys/min
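The range and relational checks in Table 3 can be sketched as follows. This is an illustrative Python sketch, not the RAMS code: the parameter keys (O3, NO, NOX, and so on) and function names are hypothetical, only a subset of the table's rows is transcribed, and how a value below the lower limit should be treated (for example, the 1/2 LDL substitution in Table 5) is left to the caller.

```python
# Subset of the Table 3 instrumental (gross) limits, in the table's units.
GROSS_LIMITS = {
    "O3": (0.005, 5.0),     # ppm
    "NO": (0.005, 5.0),     # ppm
    "NOX": (0.005, 5.0),    # ppm
    "CO": (0.1, 50.0),      # ppm
    "WS": (0.27, 22.2),     # m/s
    "TEMP": (-20.0, 45.0),  # deg C
    "DP": (-30.0, 45.0),    # deg C
}


def range_status(param, value):
    """Return "low", "high", or None if value lies within the Table 3 limits."""
    lower, upper = GROSS_LIMITS[param]
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return None


def relational_failures(obs):
    """Return the Table 3 interparameter conditions violated by one hour's data.

    obs maps parameter keys (as in GROSS_LIMITS) to hourly values; each
    condition is evaluated only when all the parameters it needs are present.
    """
    failures = []
    if "NO" in obs and "O3" in obs and obs["NO"] * obs["O3"] > 0.01:
        failures.append("NO*O3 <= 0.01")
    if "NO" in obs and "NOX" in obs and obs["NO"] - obs["NOX"] > 0.002:
        failures.append("NO - NOX <= .002")
    if "DP" in obs and "TEMP" in obs and obs["DP"] - 1.0 > obs["TEMP"]:
        failures.append("DP - 1.0 <= T")
    return failures


print(range_status("WS", 30.0))  # high
print(relational_failures({"NO": 0.2, "O3": 0.1, "NOX": 0.15}))
```

The NO × O3 test reflects the fact that nitric oxide and ozone react quickly, so high hourly values of both at once are physically implausible.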
    
    
    
    
    
    
    
    
                                      284
    

    -------
                 IRREGULAR INSTRUMENT RESPONSE
[Figure panels: A) single outlier; B) step function; C) spike; D) stuck; E) missing; F) calibration; G) drift.]
         Figure 4. Irregular instrument response.
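Panels C and D of Figure 4 (spike, stuck) correspond to the successive-difference and constant-output screens listed in Table 2. A minimal sketch, assuming an evenly spaced series with no missing values; the thresholds `min_run` and `max_step` are hypothetical and would have to be chosen per parameter and averaging interval.

```python
def stuck_runs(series, min_run):
    """Return (start, end) index pairs where one value repeats at least min_run times."""
    runs = []
    start = 0
    for i in range(1, len(series) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(series) or series[i] != series[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


def spikes(series, max_step):
    """Return indices that jump away from BOTH neighbours by more than max_step
    in the same direction (successive-difference test)."""
    out = []
    for i in range(1, len(series) - 1):
        up = series[i] - series[i - 1]    # change into point i
        down = series[i] - series[i + 1]  # change out of point i, sign-flipped
        same_direction = (up > 0) == (down > 0)
        if abs(up) > max_step and abs(down) > max_step and same_direction:
            out.append(i)
    return out


print(spikes([10, 11, 10, 30, 11, 12], max_step=5))   # [3]
print(stuck_runs([1, 2, 2, 2, 2, 3, 4], min_run=3))   # [(1, 4)]
```

Requiring a large change on both sides separates a spike (panel C) from a step function (panel B), where the series jumps and stays at the new level.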
                            285
    

    -------
            TABLE 4.   MAXIMUM ALLOWABLE DEVIATIONS  FROM NETWORK MEAN
                 UNDER MODERATE WINDS (NETWORK MEAN > 4 m/sec)
             WIND SPEED                    2 m/sec OR mean/3
                                            (WHICHEVER IS LARGER)
    
             WIND DIRECTION                30°
    
             TEMPERATURE                    3°C
    
             TEMPERATURE DIFFERENCE       0.5°C
    
             DEW POINT                      3°C
    
             ADJUSTED PRESSURE            5.0 millibars
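Table 4 can be applied as a per-period screen across the network. A minimal sketch, assuming the network mean is the plain arithmetic mean of all reporting stations (including the one being tested) and that wind speed is in m/sec; the parameter keys are hypothetical. Wind direction is omitted because its 30° tolerance needs circular averaging, which this sketch does not attempt.

```python
# Table 4 tolerances other than wind speed, in the table's units.
TOLERANCE = {
    "TEMP": 3.0,   # deg C
    "DTEMP": 0.5,  # deg C, temperature difference
    "DP": 3.0,     # deg C, dew point
    "PRESS": 5.0,  # millibars, adjusted pressure
}


def network_deviations(param, readings, network_mean_ws):
    """Return stations deviating from the network mean by more than Table 4 allows.

    readings maps station ID -> value for one averaging period.
    network_mean_ws is the network-mean wind speed (m/sec) for that period.
    """
    if network_mean_ws <= 4.0:
        return []  # Table 4 tolerances apply only under moderate winds
    mean = sum(readings.values()) / len(readings)
    if param == "WS":
        tol = max(2.0, mean / 3.0)  # 2 m/sec or mean/3, whichever is larger
    else:
        tol = TOLERANCE[param]
    return [station for station, value in readings.items() if abs(value - mean) > tol]


temps = {"A": 20.0, "B": 21.0, "C": 20.5, "D": 27.0}
print(network_deviations("TEMP", temps, network_mean_ws=5.0))  # ['D']
```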

                    TABLE 5.   RAMS DATA VALIDATION VERSION 6.4

     CAUSE                                   FLAG

1.   MISSING DATA                            10^37
2.   CALIBRATION DATA                        10^35
3.   EXCESS DRIFT                            -VALUE
4.   FAILED RANGE TESTS                      10^34
5.   LDL CHECKS                              1/2 LDL
6.   STATUS ERROR                            VALUE X 10^-25
7.   FAILED RELATIONAL TESTS                 VALUE X 10^32
8.   FAILED TIME CONSTANT TESTS              VALUE X 10^24
9.   FAILED NETWORK TESTS                    Q. A. REVIEW
10.  DATA MANAGEMENT OVERRIDE                VALUE X 10^-15
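The Table 5 flags can be applied with a small lookup. A sketch under stated assumptions: the cause names are hypothetical labels for the table rows, the exponents are read as 10^37, 10^-25, and so on, and failed network tests (cause 9) get no numeric flag because the table routes them to Q.A. review.

```python
# Fixed flag values from Table 5: the stored value is replaced outright.
SENTINELS = {
    "missing": 1e37,
    "calibration": 1e35,
    "failed_range": 1e34,
}

# Multipliers from Table 5: the stored value becomes VALUE x 10^n.
MULTIPLIERS = {
    "status_error": 1e-25,
    "failed_relational": 1e32,
    "failed_time_constant": 1e24,
    "data_management_override": 1e-15,
}


def flag_value(cause, value, ldl=None):
    """Return the value to archive for a datum that failed the given screen."""
    if cause in SENTINELS:
        return SENTINELS[cause]
    if cause in MULTIPLIERS:
        return value * MULTIPLIERS[cause]
    if cause == "excess_drift":
        return -value  # Table 5: -VALUE
    if cause == "ldl":
        if ldl is None:
            raise ValueError("LDL substitution needs the instrument's LDL")
        return ldl / 2.0  # Table 5: 1/2 LDL
    # Cause 9 (failed network tests) is referred to Q.A. review, not encoded.
    raise ValueError(f"no numeric flag for cause {cause!r}")


print(flag_value("excess_drift", 0.04))  # -0.04
```

Because the multipliers push flagged values many orders of magnitude outside any physical range, a flagged datum is easy to spot, while the original reading can still be recovered by dividing the multiplier back out.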
                                    286
    

    -------
    

    -------
    -------
[Flowchart boxes: HOUR AVERAGE PLOTS; IDENTIFICATION OF SUSPECT DATA; REVIEW PLOTS OF MINUTE DATA; INITIAL ANALYSIS OF QUESTIONABLE DATA; VERIFICATION; OUTPUT: FILE OF TIME PERIODS AND CHANGES.]
Figure 6.  Visual review of RAMS hour average data.
    
                           289
    

    -------
            
    -------
                                     291
    

    -------
[Figure residue: plot with a TEMPERATURE (°C) axis; the image is not recoverable from this copy.]
          292
    

    -------
[Flowchart boxes: HOUR ARCHIVE VERSION 6.4; MINUTE ARCHIVE VERSION 6.4; VISUAL VALIDATION STUDY; SUCCESSIVE DIFFERENCES; UPDATE FILE (WIND SPEED SPIKES, CO SPIKES (AFTER CALIB), NEGATIVE POLLUTANTS, LDL); UPDATE PROGRAM; HOUR ARCHIVE VERSION 7.0.]
                       Figure 10.  RAMS update.
    
                                      293
    

    -------
                                  294
    

    -------
                          NAMES  AND  ADDRESSES:
                                 PROGRAM
    
    
    
    EPA/RTP INTERLABORATORY QUALITY ASSURANCE COORDINATING COMMITTEE
    
    
    
                  DATA VALIDATION CONFERENCE, SPEAKERS
    
    
    
                 DATA VALIDATION CONFERENCE, ATTENDEES
                                    295
    

    -------
                            Conference on Data Validation
                        Research Triangle Park, North Carolina
                                  November 4, 1977
    
The program distributed before the meeting is presented first. It is followed
by the alternate schedule that was actually used on the day of the meeting.
                                         296
    

    -------
             CONFERENCE
                    ON
         DATA VALIDATION
      Environmental Research Center Auditorium
          Highway 54 and Alexander Drive
        Research Triangle Park, North Carolina
    
                November 4, 1977
                  Sponsored by
    
           ERC/RTP Interlaboratory Quality
          Assurance Coordinating Committee
    U.S. ENVIRONMENTAL PROTECTION AGENCY
         Office of Research and Development
        Research Triangle Park, North Carolina
                  297
    

    -------
                         PROGRAM
      8:00
     Registration
      8:25
     Welcome
           Dr. J.K. Burchard
Senior ORD Official, RTP
      8:30
     Opening Remarks
              S. Hochheiser
                     EMSL
    GENERAL SESSION
                                     D.J. von Lehmden
                                               EMSL
                                           Moderator
     8:35
    What is Data Validation?
               R.C. Rhodes
                    EMSL
      8:45      Validation Procedures Applied to
               In-Use Motor Vehicle Emission Data
                                         M.E. Williams
                                       EPA, Ann Arbor
     9:15      Computer Graphics In Data
               Validation
                                        Dr. R.H. Allen
                                           COMP-AID
     9:45
             COFFEE BREAK
     10:00      Engineering Computations and Data
               Collection Formats Useful in
               Data Validation
                                       A.C. Nelson, Jr.
                                              PEDCo
     10:30      Regional Validation of State and
               Local Air Pollution Data
                                            T.H. Rose
                                       EPA, Region IV
     11:00      Use of Precision and Accuracy
               Estimates for Validation of Data
                                        Dr. D.T. Mage
                                               HERL
     11:30
     12:30
                LUNCH
                               298
    

    -------
                                         299
    

    -------
                          AUDITORIUM
    Special Environmental
    Monitoring Studies
                                      Dr. M.M. Bufalini
                                               ESRL
                                           Moderator
2:50       Data Validation for the Los Angeles
           Catalyst Study (LACS)
                                           C.E. Rodes
                                               EMSL

3:25       Validation Techniques Used in
           Continuous Air Monitoring
           Network (CHAMP)
                                        Dr. M.B. Hertz
                                               HERL

3:50       Validation System Used in the
           St. Louis Regional Air Monitoring
           Study (RAMS)
                                          R.B. Jurgens
                                                ESRL
    4:15
    Closing Comments
    S. Hochheiser
           EMSL
                              300
    

    -------
                           CONFERENCE ON DATA VALIDATION
                       Research Triangle Park,  North Carolina
                                 November 4,  1977
    
    
                               (Alternate Schedule)
                                    AUDITORIUM
8:00     Registration

8:25     Welcome
                  Dr. J.K. Burchard,
                  Senior ORD Official, RTP

8:30     Opening Remarks
                  Seymour Hochheiser, EMSL

8:35     What is Data Validation?
                  R.C. Rhodes, EMSL
                                        301
    

    -------
                                                                       o
                                                                       O)
                                                                       Q.
                                                                      co
     00      CO
    
     U      (O
     co      e
    
    O  CO r—
        ^ o
    
     C     r—
    
     C  Z3  C  S-
     CO      O  CO
     co  -o ••-  +•>
     S-  OJ +J  C
     u  co  (O  co
    CO  3 Z  CJ
    
    o
    co
     • •
    CM
                                                                302
    

    -------
                                                                303
    

    -------
                            Conference on Data Validation
                       Research Triangle Park, North Carolina
                                  November 4, 1977
    Members of the EPA/RTP* Interlaboratory Quality Assurance
    Coordinating Committee:
    Mr. Seymour Hochheiser, Chairman
    Assistant to the Director
    EMSL
    MD-75
    RTP, NC 27711
    Telephone:  (919) 541-2106
                FTS:  629-2106
    
    Mr. Raymond C. Rhodes, Secretary
    Quality Assurance Specialist
    STAB/EMSL
    MD-75
    RTP, NC 27711
    Telephone:  (919) 541-2293
                FTS:  629-2293
    
    Mr. Ferris B. Benson
    Quality Assurance Coordinator
    HERL
    MD-52
    RTP, NC 27711
    Telephone:  (919) 541-2545
                FTS:  629-2545

    Dr. Marijon M. Bufalini
    TPRO/ESRL
    MD-59
    RTP, NC 27711
    Telephone:  (919) 541-2949
                FTS:  629-2949
    
    Mr. William B. Kuykendal
    Mechanical Engineer
    IERL
    MD-62
    RTP, NC 27711
    Telephone:  (919) 541-2557
                FTS:  629-2557
    
    Mr. Darryl von Lehmden
    Chemical Engineer
    QAB/EMSL
    MD-77
    RTP, NC 27711
    Telephone:  (919) 541-2415
                FTS:  629-2415
     *Acronyms used in this and the subsequent two sections, arranged
     alphabetically:
         EMSL - Environmental Monitoring and Support Laboratory
         EPA  - Environmental Protection Agency
         ESRL - Environmental Sciences Research Laboratory
         HERL - Health Effects Research Laboratory
         IERL - Industrial Environmental Research Laboratory
         MD   - Management Division
         NC   - North Carolina
         QAB  - Quality Assurance Branch
         RTP  - Research Triangle Park
         STAB - Statistical and Technical Analysis Branch
         TPRO - Technical Planning and Review Office
                                         305
    

    -------
                            Conference on Data Validation
                       Research Triangle Park, North Carolina
                                  November 4, 1977
                                  List of Speakers
    Dr. Rod Allen
    COMP-AID, Inc.
    Box 12327
    RTP, NC* 27709
    Telephone:  (919) 967-6376
    
    Ms. Carolyn P. Chamblee
    EPA/HERL
    MD-55
    RTP, NC 27711
    Telephone:  (919) 541-2348
                FTS:  629-2348
    
    Mr. Larry Claxton
    EPA/HERL
    MD-68
    RTP, NC 27711
    Telephone:  (919) 541-2518
                FTS:  629-2518
    
    Dr. Harold Crutcher
    Consultant
    35 Westall Ave.
    Asheville, NC 28804
    Telephone:  (919) 253-2539
                FTS:  672-0961
    
    Dr. Thomas Curran
    EPA/OAQPS
    MD-14
    RTP, NC 27711
    Telephone:  (919) 541-5351
                FTS:  629-5351
    
    Dr. Tyler Hartwell
    RTI
    Box 12194
    RTP, NC 27709
    Telephone:  (919) 541-6453

    Dr. Marvin Hertz
    EPA/HERL
    MD-56
    RTP, NC 27711
    Telephone:  (919) 541-3124
                FTS:  629-3124
    
    Mr. William F. Hunt
    EPA/OAQPS
    MD-14
    RTP, NC 27711
    Telephone:  (919) 541-5351
                FTS:  629-5351
    
    Mr. Robert B. Jurgens
    EPA/ESRL
    MD-80
    RTP, NC 27711
    Telephone:  (919) 541-4545
                FTS:  629-4545
    
    Mr. William E. Klint
    NOAA
    Federal Building
    Asheville, NC 28801
    Telephone:  (704) 258-2850, ext. 755
                FTS:  672-0755
    
    Dr. David T. Mage
    EPA/HERL
    MD-56
    RTP, NC 27711
    Telephone:  (919) 541-3121
                FTS:  629-3121
                                         306
    

    -------
    Mr. Joseph E. McCarley, Jr.
    EPA/ESED
    MD-13
    RTP, NC 27711
    Telephone:  (919) 541-5245
                FTS:  629-5245
    
    Mr. A. Carl Nelson
    PEDCo
    Suite 201
    5055 Duke Street
    Durham, NC 27701
    Telephone:  (919) 688-6338
    
    Ms. Joan Novak
    EPA/ESRL
    MD-80
    RTP, NC 27711
    Telephone:  (919) 541-4545
                FTS:  629-4545
    
    Mr. C. Don Paulsell
    EPA
    2565 Plymouth Road
    Ann Arbor, MI 48105
    Telephone:   (313) 668-4342
                FTS:  374-8342

    Mr. Charles E. Rodes
    EPA/EMSL
    MD-76
    RTP, NC 27711
    Telephone:  (919) 541-3076
                FTS:  629-3076
    
    Mr. Thomas H. Rose
    EPA/SB
    College Station Road
    Athens, GA 30605
    Telephone:  (404) 546-3489
                FTS:  250-3489
    
    Ms. Marcia Williams
    EPA
    2565 Plymouth Road
    Ann Arbor, MI 48105
    Telephone:  (313) 688-4342
                FTS:  374-8323
     *See previous section for definition of acronyms. Acronyms not used
     previously are defined as follows:
         NOAA  - National Oceanic and Atmospheric Administration
         OAQPS - Office of Air Quality Planning and Standards
         RTI   - Research Triangle Institute
         GA    - Georgia
         MI    - Michigan
                                         307
    

    -------
                         Conference on Data Validation
                    Research Triangle Park, North Carolina
                               November 4, 1977
                               List of Attendees
    Gerald G. Akland
    EPA/EMSL/STAB
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2346
          FTS:  629-2346
    
    Rod Allen
    COMP-AID
    P.O. Box 12327
    RTP, NC  27709
    Tel:  (919) 967-6376
    
    Joseph S. All
    EPA/HERL
    MD-55
    RTP, NC  27711
    Tel:  (919) 541-2240
          FTS:  629-2240
    
    J. Anderson
    Rockwell International
    5529 Chapel Hill Blvd.
    Durham, NC  27707
    Tel:  (919) 942-2407
    
    D. W. Armentrout
    PEDco
    1499 Chester Road
    Cincinnati, OH  45246
    Tel:  (513) 782-4700
    
    James D. Ashworth
    U.S. Army Corps of Engineers
    P.O. Box 2127
    Huntington, WV  25721
    Tel:  (FTS) 924-5694

    Andy Berlin
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    John Boston
    EPA/SDMO
    MD-55
    RTP, NC  27711
    Tel:  (919) 541-2337
    
    Frank Briden
    EPA/IERL
    MD-60
    RTP, NC  27711
    Tel:  (919) 541-2557
          FTS:  629-2557
    
    T. G. Brna
    EPA/IERL
    MD-61
    RTP, NC  27711
    Tel:  (919) 541-2915
          FTS:  629-2915
    
    Steve Bromberg
    EPA/QAB
    MD-77
    RTP, NC  27711
    Tel:  (919) 541-2273
          FTS:  629-2273
                                       308
    

    -------
    Robert Browning
    EPA/ESRL
    MD-80
    RTP, NC  27711
    Tel:  (919) 541-4545
          FTS:  629-4545
    
    Sam Bryan
    EPA
    Chapel Hill, NC  27514
    Tel:  (919) 541-2872
          FTS:  629-2872
    
    Marijon M. Bufalini
    EPA/ESRL
    MD-59
    RTP, NC  27711
    Tel:  (919) 541-2949
          FTS:  629-2949
    
    Bob Burton
    EPA/HERL
    MD-52
    RTP, NC  27711
    Tel:  (919) 541-1394
          FTS:  629-1394
    
    D. Calafiore
    EPA/HERL
    MD-54
    RTP, NC  27711
    Tel:  (919) 541-2674
          FTS:  629-2674
    
    Don Carpenter
    EPA
    Ann Arbor, MI  48105
    Tel:  (FTS) 374-4293
    
    Tom Caldwell
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080

    Susan S. Casada
    Northrop Services, Inc.
    P.O. Box 12313
    RTP, NC  27709
    Tel:  (919) 549-0611

    Carolyn Chamblee
    EPA/HERL
    MD-55
    RTP, NC  27711
    Tel:  (919) 541-2518
          FTS:  629-2518
    
    Ronald Chambler
    NCHS - DPB
    Box 12214
    RTP, NC  27709
    Tel:  (919) 541-4422
          FTS:  629-4422
    
    Jonn Chavy
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Larry Claxton
    EPA/HERL
    MD-68
    RTP, NC  27711
    Tel:  (919) 541-2518
          FTS:  629-2518
    
    John Clements
    EPA/EMSL
    MD-77
    RTP, NC  27711
    Tel:  (919) 541-2196
          FTS:  629-2196
    
    Wayne Clements
    TVA
    345 Evans Bldg.
    Knoxville, TN  37902
    Tel:  (615) 632-4579
    
    William M. Cox
    EPA/OAQPS
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5312
          FTS:  629-5312
                                       309
    

    -------
    C. L. Cox, Jr.
    EPA/ADM
    MD-30
    RTP, NC  27711
    Tel:  (919) 541-2296
          FTS:  629-2296
    
    Tom Curran
    EPA/MDAD
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5351
          FTS:  629-5351
    
    Bob Currin
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Harold Crutcher
    35 Westall
    Asheville, NC  28801
    Tel:  (919) 253-2539
    
    Robin Davis
    EPA/CH/HERL
    MD-73
    RTP, NC  27711
    Tel:  (919) 541-2872
          FTS:  629-2872

    Davis Davis
    P.O. Box 12313
    RTP, NC  27711
    Tel:  (919) 549-2333
    
    Robert Denny
    EPA/QAB
    MD-77
    RTP, NC  27711
    Tel:  (919) 541-2785
          FTS:  629-2785
    
    O. L. Dowler
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3126
          FTS:  629-3126

    Ronald Drago
    EPA/MDAD
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5486
          FTS:  629-5486
    
    Cary Eaton
    RTI
    P.O. Box 12194
    RTP, NC  27709
    Tel:  (919) 541-6920
    
    Foy W. Edwards
    TVA
    345 EB
    Knoxville, TN  37902
    Tel:  (615) 632-2071
    
    Susan B. Edwards
    NRCD - Air Quality
    P.O. Box 27687
    Raleigh, NC  27611
    Tel:  (919) 733-5125
    
    Gardner Evans
    EPA/STAB
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2292
          FTS:  629-2292
    
    Gary Evans
    EPA/STAB
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2294
          FTS:  629-2294
    
    B. E. Edmonds
    EPA/EMSL
    MD-76
    RTP, NC  27711
    
    Donald H. Fair
    EPA/STAB
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2732
          FTS:  629-2732
                                       310
    

    -------
    Bob Faoro
    EPA/OAQPS
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5351
          FTS:  629-5351
    
    Paul Feder
    NIEHS - EBB
    P.O. Box 12237
    RTP, NC  27709
    Tel:  (919) 541-5402
          FTS:  629-5402
    
    H. L. Fisher
    EPA/HERL
    MD-74
    RTP, NC  27711
    Tel:  (919) 541-2631
          FTS:  629-2631
    
    R. Fisher
    EPA/ESRL
    MD-80
    RTP, NC  27711
    Tel:  (919) 541-4551
          FTS:  629-4551
    
    Nancy Gaskins
    RTI
    P.O. Box 12194
    RTP, NC  27709
    Tel:  (919) 541-6915
    
    Gerald Gipson
    EPA/OAQPS
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5486
          FTS:  629-5486
    
    Maurice E. Graves
    Northrop Services, Inc.
    P.O. Box 12313
    RTP, NC  27709
    Tel:  (919) 549-0411

    D. Glover
    Rockwell International
    5529 Chapel Hill Blvd.
    Durham, NC  27707
    Tel:  (919) 942-2407
    
    Bonnee Gryder
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Ed Hanks
    EPA/MDAD
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5474
          FTS:  629-5474
    
    F. Hageman
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Martin Hamilton
    NIEHS - EBB
    P.O. Box 12237
    RTP, NC  27709
    Tel:  (919) 541-5402
    
    Tyler Hartwell
    RTI
    P.O. Box 12194
    RTP, NC  27709
    Tel:  (919) 541-6453

    Tom Heiderscheit
    EPA/HERL
    MD-55
    RTP, NC  27711
    Tel:  (919) 541-2468
          FTS:  629-2468

    Marvin Hertz
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3124
          FTS:  629-3124
                                       311
    

    -------
    David O. Hinton
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3126
          FTS:  629-3126
    
    Seymour Hochheiser
    EPA/EMSL
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2106
          FTS:  629-2106
    
    William F. Hunt
    EPA/OAQPS
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5351
          FTS:  629-5351
    
    R. C. Jordan
    Northrop Services, Inc.
    P.O. Box 12313
    RTP, NC  27709
    Tel:  (919) 541-2766
    
    Robert B. Jurgens
    EPA/ESRL
    MD-80
    RTP, NC  27711
    Tel:  (919) 541-4545
          FTS:  629-4545
    
    Robert Jungers
    EPA/ESRL
    MD-78
    RTP, NC  27711
    Tel:  (919) 541-2456
          FTS:  629-2456
    
    William E. Klint
    NOAA
    Federal Building
    Asheville, NC  28801
    Tel:  (704) 258-2850
          FTS:  672-0755

    William B. Kuykendal
    EPA/IERL
    MD-62
    RTP, NC  27711
    Tel:  (919) 541-2557
          FTS:  629-2557
    
    Ralph I. Larsen
    EPA/ESRL
    MD-80
    RTP, NC  27711
    Tel:  (919) 541-4565
          FTS:  629-4565
    
    William D. Lee
    EPA/QAB
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2293
          FTS:  629-2293
    
    Robert E. Lee
    EPA/HERL
    MD-51
    RTP, NC  27711
    Tel:  (919) 541-2283
          FTS:  629-2283
    
    Barry Levene
    EPA - Region VIII
    1860 Lincoln Street
    Denver, CO  80203
    Tel: (303) 837-2226
          FTS:  327-2226
    
    Dan Litton
    EPA/HERL
    MD-73
    RTP, NC  27711
    Tel:  (919) 541-2873
          FTS:  629-2873
    
    Raymond Michie, Sr.
    RTI
    P.O. Box 12194
    RTP, NC 27709
    Tel:  (919) 541-6492
                                        312
    

    -------
    Randell Morgan
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Gerald K. Moss
    EPA/MDAD
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5335
          FTS:  629-5335
    
    George C. Murray, Jr.
    NCAQ
    P.O. Box 27687
    Raleigh, NC  27611
    Tel:  (919) 733-5125
    
    J. E. McCarley, Jr.
    EPA/ESED
    MD-13
    RTP, NC  27711
    Tel:  (919) 541-5243
          FTS:  629-5243
    
    Linda J. McDay
    TVA
    345-EB
    Knoxville, TN  37902
    Tel:  (615) 632-2071
    
    John S. Nader
    EPA/ESRL
    MD-46
    RTP, NC  27711
    Tel:  (919) 541-0385
    
    A. Carl Nelson
    PEDco
    5055 Duke Street
    Durham, NC  27701
    Tel:  (919) 688-6338
    
    William C. Nelson
    EPA/HERL
    MD-53
    RTP, NC  27711
    Tel:  (919) 541-2330
          FTS:  629-2330

    W. Norris
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Joan Novak
    EPA/ESRL
    MD-80
    RTP, NC  27711
    Tel:  (919) 541-4545
          FTS:  629-4545
    
    Barbara Nye
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3125
          FTS:  629-3125
    
    Blaine F. Parr
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3123
          FTS:  629-3123
    
    C. Don Paulsell
    EPA
    2565 Plymouth Road
    Ann Arbor, MI  48105
    Tel:  (313) 668-4342
          FTS:  374-8342
    
    Debora R. Pizer
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3124
          FTS:  629-3124
    
    Francis Pooler
    EPA/ESRL
    MD-59
    RTP, NC  27711
    Tel:  (919) 541-2857
          FTS:  629-2857
                                       313
    

    -------
    James Reagan
    EPA/ESRL
    MD-59
    RTP, NC  27711
    Tel:  (919) 541-4486
          FTS:  629-4486
    
    Joan Reece
    EPA/HERL
    MD-55
    RTP, NC  27711
    Tel:  (919) 541-2466
          FTS:  629-2466
    
    Raymond C. Rhodes
    EPA/STAB
    MD-75
    RTP, NC  27711
    Tel:  (919) 541-2293
          FTS:  629-2293
    
    Wilson Riggan
    EPA/HERL
    MD-54
    RTP, NC  27711
    Tel:  (919) 541-2674
          FTS:  629-2674
    
    Charles D. Robson
    EPA/HERL
    MD-67
    RTP, NC  27711
    Tel:  (919) 541-2625
          FTS:  629-2625
    
    Charles E. Rodes
    EPA/EMSL
    MD-76
    RTP, NC  27711
    Tel:  (919) 541-3076
          FTS:  629-3076
    
    Tom Rose
    EPA - Region IV
    College Station Road
    Athens, GA  30601
    Tel:  (404) 546-3111

    Glenn Ross
    NCAQ
    P.O. Box 27687
    Raleigh, NC  27611
    Tel:  (919) 549-8941
    
    Bill Sensing
    EPA/IERL
    MD-62
    RTP, NC  27711
    Tel:  (919) 541-2557
          FTS:  629-2557
    
    Frank D. Slaveter
    EPA
    401 M Street, S.W.
    EN 340
    Washington, DC  20460
    Tel:  (202) 755-1572
    
    Ben Smith
    EPA/IERL
    MD-62
    RTP, NC  27711
    Tel:  (919) 541-2557
          FTS:  629-2557
    
    Paul E. Smith
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 549-8941
    
    Ralph Sullivan
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 549-8941
    
    Jake Summers
    EPA/MDAD
    MD-14
    RTP, NC  27711
    Tel:  (919) 541-5395
          FTS:  629-5395
    
    Jose Sune
    EPA/HERL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3127
          FTS:  629-3127
                                       314
    

    -------
    Richard Symonds
    Catalytic, Inc.
    P.O. Box 240232
    Charlotte, NC  28224
    Tel:  (704) 542-4107
    
    Charles Tate
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    C. E. Tatsch
    RTI
    P.O. Box 12194
    RTP, NC  27709
    Tel:  (919) 541-5945
    
    Lawrence E. Truppi
    EPA/HERL
    MD-54
    RTP, NC  27711
    Tel:  (919) 541-2861
          FTS:  629-2861
    
    John Van Bruggen
    EPA/HERL
    MD-55
    RTP, NC  27711
    Tel:  (919) 541-2465
          FTS:  629-2465

    Darryl vonLehmden
    EPA/QAB
    MD-77
    RTP, NC  27711
    Tel:  (919) 541-2415
          FTS:  629-2415

    Betty Wagman
    EPA/EMSL
    MD-56
    RTP, NC  27711
    Tel:  (919) 541-3125
          FTS:  629-3125

    Kim Wattenbarger
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080

    J. E. Whitney
    EPA/WA
    RD-680
    401 M Street, S.W.
    Washington, DC  20460
    Tel:  (202) 426-4477
    
    Cindy Wingarden
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
    
    Mack Wilkins
    EPA/EMSL
    MD-45
    RTP, NC  27711
    Tel:  (919) 541-3119
          FTS:  629-3119
    
    Marcia Williams
    EPA
    2565 Plymouth Road
    Ann Arbor, MI  48105
    Tel:  (313) 688-4342
          FTS:  374-8323
    
    Max Woodbury
    Rockwell International
    5529 Chapel Hill Blvd.
    Durham, NC  27707
    Tel:  (919) 493-2471

    Chris Woodbury
    Xonics, Inc.
    P.O. Box 12415
    RTP, NC  27709
    Tel:  (919) 541-3080
                                       315
    

    -------
    TECHNICAL REPORT DATA
    (Please read Instructions on the reverse before completing)
    1. REPORT NO.
    EPA-600/9-79-042
    2.
    3. RECIPIENT'S ACCESSION NO.
    4. TITLE AND SUBTITLE
    DATA VALIDATION CONFERENCE, Proceedings
    5. REPORT DATE
    September 1979
    6. PERFORMING ORGANIZATION CODE
    7. AUTHOR(S)
    Raymond C. Rhodes and Seymour Hochheiser, Editors
    8. PERFORMING ORGANIZATION REPORT NO.
    9. PERFORMING ORGANIZATION NAME AND ADDRESS
    Office of Research and Development
    Environmental Monitoring and Support Laboratory
    Research Triangle Park, N.C. 27711
    10. PROGRAM ELEMENT NO.
    11. CONTRACT/GRANT NO.
    12. SPONSORING AGENCY NAME AND ADDRESS
    13. TYPE OF REPORT AND PERIOD COVERED
    14. SPONSORING AGENCY CODE
    EPA 600/08
    15. SUPPLEMENTARY NOTES
    16. ABSTRACT
    The proceedings document technical presentations made at a 1-day
    conference on Data Validation for environmental data. The conference
    was hosted and sponsored by the U.S. Environmental Protection Agency,
    Research Triangle Park Interlaboratory Quality Assurance Coordinating
    Committee on November 4, 1977, at Research Triangle Park. Various
    approaches and techniques used for data validation are presented.
    17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS
    Data Validation
    Data Screening
    Data Editing
    Quality Assurance
    Outliers
    Statistics
    Environmental Data
    b. IDENTIFIERS/OPEN ENDED TERMS
    Environmental monitoring
    Data management
    c. COSATI Field/Group
    43F
    68A
    18. DISTRIBUTION STATEMENT
    Release to public
    19. SECURITY CLASS (This Report)
    Unclassified
    20. SECURITY CLASS (This page)
    Unclassified
    21. NO. OF PAGES
    315
    22. PRICE
    EPA Form 2220-1 (9-73)
                                             316
    

    -------