United States
Environmental Protection
Agency
Environmental Monitoring
Systems Laboratory
Research Triangle Park, NC 27711
EPA-600/9-79-042
September 1979
Research and Development
Data Validation
Conference Proceedings
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the MISCELLANEOUS REPORTS series. This
series is reserved for reports whose content does not fit into one of the other specific
series. Conference proceedings, annual reports, and bibliographies are examples
of miscellaneous reports.
EPA REVIEW NOTICE
This report has been reviewed by the U.S. Environmental Protection Agency, and
approved for publication. Approval does not signify that the contents necessarily
reflect the views and policy of the Agency, nor does mention of trade names or
commercial products constitute endorsement or recommendation for use.
This document is available to the public through the National Technical Information
Service, Springfield, Virginia 22161.
-------
DATA VALIDATION CONFERENCE
Proceedings
Hosted and Sponsored by
The U.S. Environmental Protection Agency
RTP Interlaboratory Quality Assurance Coordinating Committee
November 4, 1977
Edited by
Raymond C. Rhodes
and
Seymour Hochheiser
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
DISCLAIMER
This report is a compilation of the papers presented at the Data
Validation Conference. Each individual paper may not have received peer
technical review. Technical review and clearance of these proceedings
was based primarily on the review of the executive summary and on the
general merits of the proceedings as a total entity.
The contents of these proceedings do not necessarily reflect the
views and policies of the U.S. Environmental Protection Agency, nor does
mention of trade names or commercial products constitute endorsement or
recommendation for use.
ii
-------
FOREWORD
Measurement and monitoring research efforts are designed to anticipate
potential environmental problems, to support regulatory actions by developing
an in-depth understanding of the nature and processes that impact health and
the ecology, to provide innovative means of monitoring compliance with regu-
lations and to evaluate the effectiveness of health and environmental pro-
tection efforts through the monitoring of long-term trends. The Environmental
Monitoring Systems Laboratory, Research Triangle Park, North Carolina, has
the responsibility for: assessment of environmental monitoring technology
and systems; implementation of agency-wide quality assurance programs for
air pollution measurement systems; and supplying technical support to other
groups in the Agency including the Office of Air, Noise and Radiation, the
Office of Toxic Substances and the Office of Enforcement.
Data validation, an element of quality assurance, is necessary to provide
accurate and reliable environmental data. Data of known and acceptable
quality are needed for measuring compliance with regulations, assessing
health effects, and developing optimum strategies to cope with environmental
pollution situations. A unified treatment of validation of particular types
of data bases is needed to support broad-scale uses of these data. Current
in-use data validation procedures were presented at the conference to promote
a better understanding of available techniques. Hopefully, the conference
and these proceedings will provide an impetus toward the development of more
unified and systematic approaches to data validation.
Thomas R. Hauser, Ph.D.
Director
Environmental Monitoring Systems Laboratory
Research Triangle Park, North Carolina
-------
ABSTRACT
These proceedings are a record for future reference of the technical
presentations made at a conference on Data Validation for environmental
data. The conference was hosted and sponsored by the U. S. Environmental
Protection Agency, Research Triangle Park Interlaboratory Quality Assurance
Coordinating Committee on November 4, 1977, at the Research Triangle Park.
Various data validation approaches and techniques were presented and are
documented in this publication.
iv
-------
CONTENTS
FOREWORD iii
ACKNOWLEDGEMENTS vii
INTRODUCTION 1
EXECUTIVE SUMMARY AND RECOMMENDATIONS 3
Seymour Hochheiser
Raymond C. Rhodes
WHAT IS DATA VALIDATION? 7
Raymond C. Rhodes
THE SHEWHART CONTROL CHART TEST FOR SCREENING
24-HOUR AIR POLLUTION MEASUREMENTS 17
William F. Hunt
DISTRIBUTION GAP TEST FOR HOURLY AIR POLLUTION
DATA 25
Thomas C. Curran
USE OF STATISTICAL SAMPLING IN VALIDATING
HEALTH EFFECTS DATA 31
Carolyn P. Chamblee
USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
RATIO TEST FOR DATA VALIDATION 39
Tyler Hartwell
CLUSTER ANALYSIS AS A DATA VALIDATION
TECHNIQUE 71
Harold L. Crutcher
ENGINEERING COMPUTATIONS AND DATA COLLECTION
FORMATS USEFUL IN DATA VALIDATION 81
A. Carl Nelson, Jr.
VALIDATION PROCEDURES APPLIED TO IN-USE MOTOR
VEHICLE EMISSION DATA 99
Marcia E. Williams
-------
DATA VALIDATION TECHNIQUES USED IN MOBILE
SOURCE TESTING 125
C. Don Paulsell
VALIDATION OF CONTINUOUS STACK MONITORING DATA 131
Joseph E. McCarley
SCREENING CHECKS USED BY THE NATIONAL CLIMATIC
CENTER 135
William E. Klint
DATA VALIDATION FOR UPPER AIR SOUNDING DATA
AND EMISSION INVENTORY DATA 199
J. H. Novak
VALIDATION OF BIOMEDICAL DATA THROUGH AN ON-LINE
COMPUTER SYSTEM 209
Larry D. Claxton
REGIONAL VALIDATION OF STATE AND LOCAL AIR
POLLUTION DATA 219
Thomas H. Rose
DATA VALIDATION FOR THE LOS ANGELES CATALYST
STUDY (LACS) 223
Charles E. Rodes
VALIDATION TECHNIQUES USED IN CONTINUOUS AIR
MONITORING 237
Marvin B. Hertz
USE OF PRECISION AND ACCURACY ESTIMATES FOR
VALIDATION OF DATA 247
David T. Mage
VALIDATION SYSTEM USED IN THE ST. LOUIS REGIONAL
AIR MONITORING STUDY (RAMS) 265
Robert B. Jurgens
NAMES AND ADDRESSES:
PROGRAM 295
EPA/RTP INTERLABORATORY QUALITY ASSURANCE
COORDINATING COMMITTEE 305
DATA VALIDATION CONFERENCE, SPEAKERS 306
DATA VALIDATION CONFERENCE, ATTENDEES 308
-------
ACKNOWLEDGMENTS
The cooperation of all participants in the conference is gratefully
acknowledged. Particular appreciation is due to the participants who
prepared written copy of their presentations for post-documentation of
the conference.
vii
-------
SECTION I
INTRODUCTION
A conference on data validation was held on November 4, 1977, at
Research Triangle Park, North Carolina. The conference was sponsored,
organized, and hosted by the Environmental Research Center - Research
Triangle Park (ERC-RTP) Interlaboratory Coordinating Committee. Con-
ference participants represented (a) EPA's RTP Research Laboratories and
Program Offices, (b) an EPA Regional Office, (c) EPA Contractors, and (d)
the National Climatic Center, Asheville, North Carolina.
Welcoming remarks were made by Dr. John K. Burchard, Director of the
Industrial Environmental Research Laboratory, and Senior ORD official at
RTP. Each of the speakers presented their current practices for data vali-
dation. The conference provided an opportunity for a free exchange of
viewpoints and techniques and was intended to enhance the state-of-the-art
of Data Validation.
-------
EXECUTIVE SUMMARY AND RECOMMENDATIONS
by
Seymour Hochheiser
and
Raymond C. Rhodes
Environmental Monitoring and Support Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
SECTION II
EXECUTIVE SUMMARY AND RECOMMENDATIONS
The nature and scope of data validation activities vary considerably
among those involved with the collection, analysis, review, and use of
environmental data for research and monitoring purposes. The objective of
the conference was to review and discuss the various current practices of
data validation, and to provide a forum for the exchange of information.
It was intended that as a result of this conference, the function of data
validation might become more specifically defined and more uniformly and
widely implemented.
Following is a general review of the contents of the papers presented
at the conference.
DEFINITIONS AND SCOPE OF DATA VALIDATION
The conference authors implied differing definitions of data validation
and used various words relating to data validation activities. The scope of
activities extended from that of simple checks for data transfer errors to
that of a total quality assurance program. Words used in relation to data
validation activities included: editing, screening, verification checking,
auditing, and qualification.
TYPES OF DATA INVOLVED
The type of data involved in most of the papers was ambient air
pollution concentrations. However, other types of data -- including epidemi-
ological, meteorological, stationary source emission, mobile source emission,
and in vitro and in vivo bioassays -- were discussed.
Data were validated in a variety of forms, including strip charts, hand-
written forms, computer printouts, magnetic tape, and optical sensing records.
In some cases the activities of data validation, as indicated by the
authors, were performed by those producing the data; in other cases, by
independent reviewers; and in still other cases, by the users.
SIZE OF DATA BLOCKS CONSIDERED
In most cases, the data being validated were reviewed in definable
blocks, varying from one day to one year. In one case, a real-time
-------
computerized system was used, and in another, the results of single tests
were reviewed individually. The number of data values considered as a
block, or group of data, varied from only three results for stack sampling
tests to over 30,000 for pibals (meteorological balloons).
TYPES OF TECHNIQUES EMPLOYED
Both manual and computer techniques were used, usually depending upon
the amount of data involved. Some systems employed both manual and computer
methods. Only a few of the papers included graphical techniques for reviewing
data. In less than half of the systems described, specific checks were
made of the identification (or coding) of the data. In about half of the
data validation systems, statistical techniques were used. Some of the
techniques included were Dixon outlier tests, Shewhart control chart limits,
exponential distributions, and asymptotic singular decomposition. In several
instances, statistical sampling plans were utilized to select specific data
sets for checking.
WRITTEN DATA VALIDATION PROCEDURES
In only one case were detailed procedures written describing the data
validation activities and criteria.
FLAGGING AND REJECTION OF DATA
In most cases the questionable results of the data validation process
were flagged, i.e., identified for more detailed evaluation and/or identified
as questionable values in the data records. In most cases, as a result of
data validation, questionable data were invalidated (rejected) or were
corrected as a result of further investigation.
RECOMMENDATIONS
Although no official conference recommendations were made, the following
recommendations were generally expressed:
1. The functions and scope of data validation should be more specifically
defined.
2. Data validation techniques should be presented and summarized in some
logical manner.
3. Data validation systems should be recommended or specified for use in
certain situations.
The above recommendations could be pursued in several possible ways.
One would be for a task group to be formed to develop standardized nomencla-
ture, to summarize in a systematic way various activities and techniques of
data validation, and to recommend data validation systems for specific
-------
situations. The above tasks could also be performed by a knowledgeable
contractor.
SUMMARY
It is evident that the current practices of data validation vary
widely in nature and scope. The conference provided an excellent opportunity
for an open exchange of information concerning data validation practices
and should result in a broader utilization of data validation techniques.
In addition, the conference discussions and these proceedings should promote
a greater awareness of the need to develop a more organized and unified
approach to this important element of quality assurance for environmental
data.
-------
WHAT IS DATA VALIDATION?
by
Raymond C. Rhodes
Environmental Monitoring and Support Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
WHAT IS DATA VALIDATION?
R.C. Rhodes
Just what is data validation?
Many of us are involved in activities which, we feel, constitute data
validation, or at least, a part of a data validation process. My first
encounter with the term "data validation" in connection with air pollution
monitoring data occurred about five years ago. Since then my concepts of
the function and scope of data validation have expanded considerably, and
in fact they are still changing.
I'm sure that each person attending this conference has his or her own
concept—probably different from anyone else's—of data validation. Whatever
these concepts are, we're here to exchange our ideas, thoughts, and techni-
ques on the subject. I feel sure that each of us will learn something new
and useful for our own particular area of application.
Before we hear the other speakers, let's think a little bit about this
subject of "data validation." Webster defines "validation" as follows:
VALIDATION
— THE ACT OR PROCESS OF VALIDATING*
That doesn't help us very much, does it? So we might look at the definition
of the word "valid" itself.
VALID
-- HAVING LEGAL EFFICACY OR FORCE
— SUPPORTED BY OBJECTIVE TRUTH
*The capitalized items in this paper were used as visual aids for the
presentation.
8
-------
This definition is getting a little closer to our desired meaning in the
data validation sense. The word "valid" does have some connotation of a
"stamp of approval," indicating that things are "right."
In the "Quality Assurance Handbook for Air Pollution Measurement
Systems," EPA 600/9-76-005, the following definition is given:
DATA VALIDATION
-- THE PROCESS WHEREBY DATA ARE FILTERED AND ACCEPTED OR
REJECTED BASED ON A SET OF CRITERIA
There is a short section on "data validation" in the Handbook, which you
may be interested in reading. My own definition, which I use in the "Data
Validation" lecture of Air Pollution Training Institute (APTI) Course 470,
"Quality Assurance for Air Pollution Measurement Systems," is somewhat more
detailed:
DATA VALIDATION
— A SYSTEMATIC PROCEDURE OF REVIEWING A BODY OF DATA
AGAINST A SET OF CRITERIA TO PROVIDE ASSURANCE OF
ITS VALIDITY PRIOR TO ITS INTENDED USE
The above definition says, in other words, that "a body of data" is reviewed
according to some previously defined plan in a rather comprehensive and
extensive way using all available expertise and knowledge at hand to assure
that the data are technically consistent, correctly identified, and contain
no obvious errors before the data are used.
Following are a number of terms which seem to involve functions or
activities related to data validation.
RELATED TERMS
— DATA EDITING
— DATA SCREENING
-- DATA AUDITING
-- DATA VERIFICATION
-- DATA EVALUATION
— DATA QUALIFICATION
-- DATA QUALITY ASSESSMENT
9
-------
During the remaining presentations today, you will hear further references
or usages of some of these terms. Since some of these terms are used inter-
changeably, I believe we need more specific definitions for each of the
above to better understand how each one is, or is not, involved in data
validation. As I define data validation and the above terms, I would
include data editing, data screening, data auditing and data verification
as part of data validation. However, according to my definitions, data
evaluation, data qualification and data assessment are not parts of the data
validation process.
Before considering some of the aspects of data validation, let us
consider the obvious need for data validation and its relation to quality
assurance. EPA and other organizations need good data from which to make
good decisions. This truism applies equally well to research studies as well
as to monitoring programs although data validation is not usually considered
as a separate activity in research efforts.
GOOD DATA > GOOD DECISIONS
-- RESEARCH STUDIES
— MONITORING PROGRAMS
Since data validation is concerned with an assurance of having obtained good
data, one might think that data validation includes everything that is done
to get valid, or good, data. But that is the concern of quality assurance.
Whereas quality assurance is concerned with all activities which may affect
data quality, the activities of data validation involve an after-the-fact
review of the data, along with related information, to assure that valid
data have, in fact, been obtained. As such, data validation is considered
as only one element of quality assurance.
DATA VALIDATION
IS
AN ELEMENT OF
QUALITY ASSURANCE
In the APTI Course 470, data validation is one of 23 elements of quality
assurance as shown by the following "Q.A. Wheel" in the Quality Assurance
Handbook.
10
-------
QUALITY ASSURANCE ELEMENTS AND RESPONSIBILITIES
(THE QUALITY ASSURANCE WHEEL)
[Wheel graphic: the individual quality assurance elements are not legible in this
reproduction.]
-------
What are some of the attributes of a data validation system? With no
intent to restrict the other speakers concerning their views of data valida-
tion, following are some key features, in my opinion, of a data validation
system.
After-the-Fact Review. Data validation is an after-the-fact review of
data to assure that good data have been obtained. Many activities of
quality assurance are concerned with the planning and acquiring of data, but
these activities are accomplished before or during the acquisition of the
data. Data validation activities (a part of quality assurance) are accom-
plished after the data are obtained.
Applied to Blocks of Data. Data validation is applied to incremental
blocks of data. For air monitoring data sent to the National Air Data Bank
(NADB), the block could be the quarterly submittal to the NADB. For source
emissions testing, the block would most likely be the run of three individual
tests of a test set. In automotive emissions monitoring, the block may be
the data from a single test. Thus, a block depends upon what is logical for
a particular type of data-gathering. In any case, the data would be given a
validation review as a defined block of data.
Systematic and Uniform Application. Data validation should not be
conducted on an occasional or spot-check basis. Once the procedure is
defined it should be applied systematically and uniformly to all sequential
blocks of data acquired. This is not to say that the procedure should not
be continually improved. It is helpful for details of the procedure to be
written to assure uniform application of the procedure in case of change of
personnel and to avoid "reinventing the wheel."
A Set of Criteria. A set of criteria ought to be developed and docu-
mented as a part of the written procedure, to be used during data validation
to determine whether the data are valid, questionable, or invalid. If the
causes of questionable or invalid data are not evident from the data validation
activity itself, their detection could trigger an investigation into possible
causes, with appropriate corrective action implemented to preclude recurrence
of questionable or invalid data.
12
-------
Checks for Internal Consistency. Data validation might include checks
for internal consistency, such as relationships among pollutants, or rela-
tionships between pollutants and meteorology.
Checks for Temporal and Spatial Continuity. Data validation might
include checks for continuity with respect to time, as might be evaluated by
having a chronological plot of the data, to look for discontinuities, spikes,
gaps, etc. The data may also have some spatial continuity if the data are
from a network within some relatively small region, such as a local air
monitoring network.
Checks for Proper Identification. To be useful, data must be properly
identified. Improperly identified data may well be considered "no data."
Although identification may seem to be a trivial thing, the Regions, for
example, have difficulties with such improper identifications as (a) one
state reporting data identified for another state, (b) data for October 35,
and (c) duplicate data from one site and none from another. For medical
history questionnaires of health effects studies, checks may be made to
make sure that children are not older than their mothers!
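As a simple illustration of such identification checks, a small Python sketch
follows; the record fields are hypothetical, invented for the example:

    from datetime import date

    def identification_problems(rec):
        """Return a list of identification problems found in one record.
        Field names here are hypothetical, for illustration only."""
        problems = []
        try:
            date(rec["year"], rec["month"], rec["day"])   # rejects "October 35"
        except ValueError:
            problems.append("impossible calendar date")
        if "child_age" in rec and "mother_age" in rec:
            if rec["child_age"] >= rec["mother_age"]:
                problems.append("child is not younger than mother")
        return problems

    # Example: identification_problems({"year": 1977, "month": 10, "day": 35,
    #                                   "child_age": 12, "mother_age": 34})
    # returns ["impossible calendar date"].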
Checks for Transmittal Errors. For paperwork systems, simple checks
may be made to assure that the data have not been incorrectly transferred
from one paper to another. With more sophisticated electronic and computer
data handling and with telemetry of data, checks could be made to assure
that the data have not been changed in the process.
Flagged or Rejected Data. A data validation system might include a
scheme for flagging questionable data and may make provision for outright
rejection of data for use. It may be desirable, however, to retain such
data in the data system with proper indication of its status.
In summary, some of the aspects I consider as parts of a Data Validation
System are as indicated below:
DATA VALIDATION
— AFTER-THE-FACT REVIEW
— APPLIED TO BLOCKS OF DATA
-- SYSTEMATICALLY AND UNIFORMLY APPLIED
13
-------
— A SET OF CRITERIA
— CHECKS FOR INTERNAL CONSISTENCY
-- CHECKS FOR TEMPORAL AND SPATIAL CONTINUITY
— CHECKS FOR PROPER IDENTIFICATION
— CHECKS FOR TRANSMITTAL ERRORS
— DATA FLAGGED OR REJECTED
Techniques of Data Validation. Obviously, because the methods of data
gathering are so varied, the particular techniques that are to be used for
data validation for a particular program will depend upon many things.
Following are mentioned a few of the factors which need to be considered.
The nature of the response output, that is, whether you get a response on a
strip chart recorder, or whether it is generated on paper tape, magnetic
tape, or is fed directly into a computer will determine the technique of
data validation. The techniques will depend upon the method of data
reduction, i.e., whether it is a manual-type method or a computer system.
The form of the data transmittal, i.e., whether data are transmitted by some
handwritten form, typewritten form, computer printout, or magnetic tape will
determine the types of data checks to use. The techniques will also depend
upon the amount of data. As we get involved with larger studies and larger
blocks of data involved, such as NADB, different techniques must be used
from those utilized for small sets of data. The techniques will depend upon
the type and amount of ancillary (related) data that can be used for evalua-
tion, comparison, or for correlation purposes. Techniques will depend upon
what computing capability is available for use. The extent of available
plotting capability is an important consideration, particularly for large
blocks of data. Personally, I would like to see graphical presentations
used in data validation. Much more can be learned by graphical representa-
tion that would be very difficult—almost impossible—to learn from visual
review of large masses of data. Finally, the nature and extent of data
validation techniques would depend on the intended use of the data.
Different criteria may be used for validating data from which long term
trends are estimated as compared to data for three-hour peak values, for
example. To summarize:
14
-------
TECHNIQUES WILL DEPEND ON
-- NATURE OF RESPONSE
— METHOD OF DATA REDUCTION
-- FORM OF DATA TRANSMITTAL
-- AMOUNT OF DATA
— AMOUNT OF ANCILLARY DATA
— COMPUTING CAPABILITY
— PLOTTING CAPABILITY
— USE OF THE DATA
Lastly, there are two key principles of data validation that I want to
mention. First, data validation ought to occur as close in time and
location as possible to the originating location of the data. If question-
able values are discovered, and corrective actions need to be made to the
system, they must be made in a very timely and effective manner. For
example, NADB may be validating data for as much as two years after the
initial generation of the data. That is much too late to get effective
corrective action at the local level. Therefore, data validation techniques
should be located as closely as possible to the source of the data. Second,
where possible, the persons having data validation responsibilities should
not be the persons directly responsible for acquiring the data. Ideally,
the person or persons responsible for data validation should be independent
of the data acquisition activities and should be the most knowledgeable and
experienced technical individual available to perform the function.
Thus,
DATA VALIDATION SHOULD BE
— CLOSE TO THE ORIGINATION OF THE DATA
— INDEPENDENT
Perhaps I have raised a number of questions in your mind concerning the
subject of data validation. Hopefully, the other speakers will answer some
of these questions, or will raise further questions, and will promote bene-
ficial discussions and interchange of ideas and techniques of data validation.
15
-------
THE SHEWHART CONTROL CHART TEST FOR
SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS
by
William F. Hunt
Office of Air Quality Planning and Standards
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
17
-------
THE SHEWHART CONTROL CHART TEST FOR
SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS
W.F. Hunt
INTRODUCTION
A quality control program is being developed for the U.S. Environmental Protection
Agency's (EPA) National Aerometric Data Bank (NADB). The initial phases of the work
were reported previously.(1,2) The purpose of the program is to develop and apply
quality control tests to check ambient air quality data for anomalies, such as trans-
cription and keypunch errors, as well as to detect erroneous data resulting from the
periodic malfunctioning of air monitoring instruments. For the sake of completeness, it
is worth reviewing some aspects involved in the collection and uses of air quality data.
To begin with, air quality data are primarily collected to measure the success of
emission control plans in achieving the National Ambient Air Quality Standards.
National Ambient Air Quality Standards (NAAQS) have been established by EPA for
five pollutants: total suspended particulate (TSP), sulfur dioxide (S02), carbon monox-
ide (CO), photochemical oxidants (Ox), and nitrogen dioxide (N02). These standards are
intended to protect both human health and welfare. They may be stated as annual means
or as upper limit values that may not be exceeded more than once per year. Although
different averaging times are used for various standards, this paper is primarily con-
cerned with the examination of 24-hour average values for TSP, S02, and N02 concentra-
tions. While only TSP and S02 standards are in terms of 24-hour averages, all three
pollutants have standards expressed in terms of annual averages. Because of the impor-
tance that is attached to violations of the NAAQS, a quality control program to ensure
the validity of the measurement of both short- and long-term concentrations is extremely
important.
The application of the Dixon Ratio Test and Shewhart Control Chart Test to
measured levels of three major pollutants—TSP, S02, and N02—is examined. The tests
apply to data from monitoring instruments which generate one measurement per 24-hour
period and are operated on a systematic sampling schedule of approximately once every 6
days. In the cases of S02 and N02, there are also continuous monitoring instruments,
which monitor the pollutants constantly; but our discussion here is concerned only
with 24-hour data. The application of the tests results in flagged data, which need to
be verified as either valid or invalid.
These statistical tests are presently being applied to data collected in EPA's
Region V. Region V encompasses the states of Illinois, Indiana, Michigan, Minnesota,
Ohio, and Wisconsin. In terms of population, it is the largest of EPA's regions, and
there is extensive monitoring of the above pollutants. The purpose of the Region V
evaluation is to determine whether the data flagged by the tests are valid or invalid
and to identify, if possible, the source of the error.
This paper will discuss the flow of data from the state and local government; the
data-editing process; the basic characteristics of the data; and the application and
evaluation of the two tests; it will conclude with our recommendations.
DATA FLOW
Most ambient air quality data are collected by state and local air pollution con-
trol agencies and are forwarded via EPA's Regional Offices to the NADB. A considerable
amount of data is forwarded. For example, the minimum legal requirements for air
pollution monitoring across the nation will result in the annual submittal of over 20
million air quality measurements to the NADB. The data are sent quarterly in a standard
Copyright ©1977 American Society For Quality Control, Inc. Reprinted by permission.
18
-------
format that specifies the site location; the year, month, and day of sampling; and
the measurement itself (24-hour or 1-hour value) in micrograms or milligrams per cubic
meter (µg/m3 or mg/m3) or parts per million (ppm). A corresponding site file contains
descriptive information on the sampling-site environment. EPA edits the submitted
data, checking for consistency with acceptable monitoring methods, and other identify-
ing parameters. In the data-editing program, air quality data with extremely high
values are flagged. Data that do not pass these checks or that have values exceeding
certain predetermined limits are returned to the originating agency via the Regional
Office for correction and resubmittal.
As might be expected with data sets this large, there are still anomalous measure-
ments that slip through the existing editing and validation procedures. Therefore,
there is a need for a simple cost-effective statistical test that can be applied to
the air quality data by which to detect, primarily, obvious transcription, keypunch, and
measurement errors. Statistical tests do not eliminate, however, the need for more
intensive quality assurance at the local level. For example, inadequate calibration
procedures or similar problems that result in measurement bias will not be detected by
our statistical procedures, which are intended primarily for macroanalysis.
BASIC CHARACTERISTICS OF TSP, S02, AND N02 DATA
Basic characteristics of the TSP, S02, and N02 data were considered in selecting
the quality control tests being used. To begin with, the tests were applied to data
which were obtained from monitoring instruments that generate one measurement per 24-
hour period. For such monitoring methods, EPA recommends that a systematic sampling
procedure of once every 6 days, or 61 samples per year, be used at a minimum to collect
the data.(8) Such a sampling procedure generates data which, for our purposes, may be
considered as approximately independent.
In examining the distributional properties of the data, past research has shown
that ambient TSP concentrations are approximately lognormally distributed.(9,10) This
is sometimes true for S02 and N02, also, but is not always the case. Current work
suggests that these pollutants may follow an exponential or Weibull distribution.
In selecting the quality control tests, the averaging times which correspond to
the NAAQS are important. The values of interest are the peak concentrations (24-hour
average measurements) for TSP and S02, and the annual means for TSP, S02, and N02.
The final data characteristic of importance is the seasonality of the pollutants.
As an example, in some areas of the country, TSP and S02 measurements are highest in
the winter months and lowest in the summer months. Therefore, the factor of seasonality
had to be considered in the selection of the quality control test to minimize this as a
possible source of error.
THE QUALITY CONTROL TESTS
Two quality control tests are presently being applied and the results of the appli-
cation evaluated: the Dixon Ratio Test and the Shewhart Control Chart Test. The
output of the quality control tests is a listing of the suspicious data, including the
site and the time of occurrence. The tests are discussed below.
Dixon Ratio Test
The use of the Dixon Ratio Test was discussed in an earlier paper. The test
was applied to TSP quarterly data and was found to work reasonably well in detecting a
single anomalous value. Problems occurred when there were multiple transcription errors
within a quarter, such as the miscoding of an entire month of data. This problem was
corrected when the test was applied to monthly averages.
As part of the evaluation of quality control of Region V data, the Dixon Test
was applied to all 1974 monthly averages of TSP, S02, and N02 on a site-by-site basis
to examine the data for possible multiple transcription, keypunch, or measurement
errors occurring within a month. By applying the test to the monthly averages, the
assumption of normality can be satisfied, although the monthly averages are not entire-
ly independent because of the seasonality in the data. This must be considered in
examining the flagged data.
The Dixon Ratio Test requires that the monthly averages be ordered in increasing
levels of magnitude. The test basically constructs an "r" ratio that compares the
distance of the maximum (minimum) observation from its neighbors with the range of all
19
-------
but one or two of the observations. Let us assume that Y_i equals the ith-order pollu-
tant monthly average, where Y_N is the highest monthly average and N equals the number
of months within the year for which there are data. The test procedure is as follows:
1. Choose α, the probability (risk) of rejecting an observation that really belongs in
   the group.
2. Order the monthly averages from Y_1 through Y_N, where Y_N is the highest value.
3. If 3 ≤ N ≤ 7, compute r_10 = (Y_N - Y_(N-1)) / (Y_N - Y_1);
   if 8 ≤ N ≤ 10, compute r_11 = (Y_N - Y_(N-1)) / (Y_N - Y_2);
   if 11 ≤ N ≤ 12, compute r_21 = (Y_N - Y_(N-2)) / (Y_N - Y_2);
   where Y_N is the highest value.
4. Look up r_(1-α), the critical value for r_ij, from a table of critical values.
5. If r_ij is greater than r_(1-α), print out a list showing the suspect monthly averages,
   the remaining monthly averages, and the site location.
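For illustration only, the steps above translate into a short program. The following
Python sketch is not the program used in this work; the critical values shown are the
commonly tabulated 5-percent points and should be verified against a published table
(e.g., reference 4) before any real use.

    # Sketch of the Dixon ratio test for monthly averages (illustrative only).
    # Critical values are commonly tabulated 5-percent points for r10, r11, r21;
    # verify against a published table before use.
    R_CRIT_05 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,   # r10
                 8: 0.554, 9: 0.512, 10: 0.477,                      # r11
                 11: 0.576, 12: 0.546}                               # r21

    def dixon_ratio(monthly_averages):
        """Steps 2-3: order the averages and form the appropriate r ratio."""
        y = sorted(monthly_averages)
        n = len(y)
        if not 3 <= n <= 12:
            raise ValueError("test defined here for 3 <= N <= 12")
        if n <= 7:                                    # r10
            return (y[-1] - y[-2]) / (y[-1] - y[0])
        if n <= 10:                                   # r11
            return (y[-1] - y[-2]) / (y[-1] - y[1])
        return (y[-1] - y[-3]) / (y[-1] - y[1])       # r21

    def suspect_high_value(monthly_averages):
        """Steps 4-5: flag the highest average if r exceeds the critical value."""
        r = dixon_ratio(monthly_averages)
        return r > R_CRIT_05[len(monthly_averages)]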
Shewhart Control Chart Test
The Shewhart Control Chart Test can be used to examine shifts in the monthly
averages as well as shifts in the monthly range. From the former it can detect
possible multiple errors and from the latter, single anomalous values. In this test
the data can be divided up into what Shewhart called rational subgroups.(12) In a
manufacturing process the subgroups would most likely relate to the order of production.
Ambient air quality measurements can be viewed in the same way because they are col-
lected by a monitoring instrument over time. A month of data was selected as the
rational subgroup because the air quality data are recorded by the state and local
agencies on a monthly basis in a standard format. The monthly subgroup generally
consists of five measurements based on EPA's recommended sampling schedule(8) of 61
observations per year; five also is the common subgroup size found in industrial use.
Using a subgroup size of five, it can be assumed that the distribution of the monthly
means is nearly normal, even though the samples are taken from a nonnormal universe.
The test was applied to the 1974 Region V data on a moving 4-month basis: that is,
the averages and range of values in the month in question were compared with the overall
averages of the three previous monthly averages and monthly ranges. The moving 4-month
comparison was used to minimize the effect of the seasonality of the pollutants. The
formulas for calculating the trial limits are as follows:
For the monthly range:   UCL_R = D4 * Rbar, and
                         LCL_R = D3 * Rbar.

For the monthly means:   UCL_xbar = xbarbar + A2 * Rbar, and
                         LCL_xbar = xbarbar - A2 * Rbar,

where R = the monthly range; Rbar = the average of the three previous monthly ranges;
xbar = the monthly average in question; xbarbar = the average of the three previous
monthly averages; and D3, D4, and A2 are factors for determining from Rbar the 3-sigma
control limits for xbar and R. (See Table C on page 562, reference number 5.)
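As a minimal sketch of this computation (assuming the standard 3-sigma control chart
factors for subgroups of five, A2 = 0.577, D3 = 0, and D4 = 2.114; in practice the
factors would be taken from the cited table for the actual subgroup size):

    # Sketch of the moving 4-month Shewhart check (illustrative only).
    # Factors below are the standard control chart factors for subgroups of five.
    A2, D3, D4 = 0.577, 0.0, 2.114

    def shewhart_limits(prev_means, prev_ranges):
        """Trial limits from the three previous monthly means and ranges."""
        xbarbar = sum(prev_means) / len(prev_means)   # average of previous means
        rbar = sum(prev_ranges) / len(prev_ranges)    # average of previous ranges
        return {"UCL_x": xbarbar + A2 * rbar, "LCL_x": xbarbar - A2 * rbar,
                "UCL_R": D4 * rbar, "LCL_R": D3 * rbar}

    def out_of_control(xbar, rng, prev_means, prev_ranges):
        """True if the month in question falls outside any trial limit."""
        lim = shewhart_limits(prev_means, prev_ranges)
        return (xbar > lim["UCL_x"] or xbar < lim["LCL_x"] or
                rng > lim["UCL_R"] or rng < lim["LCL_R"])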
RESULTS OF APPLICATION OF QUALITY CONTROL TESTS
During 1974, TSP, S02, and N02 were being monitored in Region V at 855, 366, and
302 sites, respectively. Both the Dixon and Shewhart Tests were applied to all 1974
TSP, S02, and N02 data from Region V. Still in progress, an extensive effort is being
made on the part of EPA personnel in Region V, in conjunction with state air pollution
control officials, to evaluate the air quality data flagged by both the Dixon and
Shewhart Tests. As an initial phase of this evaluation, examination was made of those
data in which the flagged monthly mean or range exceeded one of the pollutant-specific
NAAQS. For TSP and S02, appropriate cutoffs were thought to be 260 µg/m3 and 365 µg/m3,
which are their respective primary short-term 24-hour standards. In the case of N02,
the annual primary NAAQS of 100 µg/m3 was used because N02 has no short-term primary
standard. Although their choice was somewhat arbitrary, the NAAQS were used as cutoffs
because their violation results in reexamination of the overall adequacy of local air
20
-------
pollution control measures in effect. Thus, high values must be verified because they
can result in significant impact on the original control strategy designed to achieve
the NAAQS.
Table 1 indicates the number of Region V sites reporting TSP, S02, and N02 data
which were flagged by the Dixon Test, by the Shewhart Control Test, and by both tests.
As would be expected, there are more sites flagged by the Shewhart Control Test as
having anomalous data than the Dixon Test, because it looks at both shifts in the
TABLE 1. Comparison of Dixon Ratio and Shewhart Control Chart
         Tests as Applied to Sites in Region V Monitoring TSP,
         S02, and N02 in 1974

                                            Pollutant
                                     TSP       S02       N02
High value in question(a) (µg/m3)   ≥260      ≥365      ≥100
Total sites, no.                     855       366       302
Dixon test
  Flagged sites, no.                  35         1        25
  Flagged sites, no. with errors      31         1        11
Shewhart test
  Flagged sites, no.                  38         4        36
  Flagged sites, no. with errors      31         3        16
Both tests
  Flagged sites, no.                  32         1        19
  Flagged sites, no. with errors      31         1        10

(a) The high value in question is the monthly mean in the case of the Dixon Test and
the monthly mean or range in the Shewhart Control Chart Test. The National Ambient
Air Quality Standards (NAAQS) were used as high-value cutoffs: 260 µg/m3 and
365 µg/m3 are the 24-hour primary NAAQS for TSP and S02, respectively, while
100 µg/m3 is the annual primary NAAQS for N02.
monthly mean and range while the Dixon Test examines only the monthly means. The pre-
liminary evaluation of the flagged sites is also given as the number of flagged sites
which were found to have one or more erroneous 24-hour measurements.
Of the 855 sites in Region V measuring TSP in 1974, 35 were flagged by the Dixon
Test, 38 by the Shewhart Control Test, and 32 by both tests. The flagged sites report-
ed at least one monthly mean and/or range equal to or greater than 260 µg/m3. The
preliminary evaluation indicates that data from 31 sites, which were flagged by both
tests, were found to have multiple transcription or keypunch errors. In the case of
S02, 1 of the 366 sites was flagged by the Dixon Test, 4 by the Shewhart Test, and 1
by both tests. The monthly means and ranges in question were equal to or greater than
365 µg/m3. Data from the site flagged by both tests were found to have multiple tran-
scription errors, while data from the remaining two sites flagged by the Shewhart
Test had single transcription errors. Finally, of the 302 sites measuring N02, 25
were flagged by the Dixon Test, 36 by the Shewhart Test, and 19 by both tests. The
monthly means and ranges in question equalled or exceeded 100 µg/m3. Transcription
and keypunch errors were found at 11 of the sites flagged by the Dixon Test, 16 of the
sites flagged by the Shewhart Test, and 10 of the sites flagged by both.
An example of a site flagged by both tests was one that measured TSP for 11 months
in 1974. The monthly means (xbar), ranges (R), and subgroup sizes (n) are indicated
below by month:
21
-------
        Jan   Feb   Mar   Apr   May   June   Jul   Aug   Sept   Oct   Nov   Dec
xbar     --    67    60    56    70     67    66    73     59    591    82    41
R        --    74    25    71    44    102    37    64     68    595    68    30
n         0     4     5     5     5      3     5     5      5      5     3     4
The Dixon Ratio Test was applied to the entire year of data; the ratio of the largest
monthly mean, 591, minus the third largest mean, 73, was compared with the difference
of the largest mean and the second smallest monthly mean, 56. The test statistic is

    r_21 = (591 - 73) / (591 - 56) = 0.97,

which is significant at the 0.005 level.
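Using the dixon_ratio sketch given earlier (January, which had no observations, is
excluded), this computation can be reproduced directly:

    means = [67, 60, 56, 70, 67, 66, 73, 59, 591, 82, 41]   # Feb-Dec, N = 11
    r21 = dixon_ratio(means)            # (591 - 73) / (591 - 56)
    print(round(r21, 2))                # 0.97, well above the tabled critical value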
The Shewhart Control Chart Test was applied on a moving 4-month basis. When the
monthly average and range for October became the values in question, they were com-
pared with the overall averages of the July, August, and September averages and ranges.
The test results are shown in Figure 1 for both the monthly mean and range. In both
[Graphic: (a) R chart for monthly range and (b) xbar chart for monthly mean, July
through October, with the October point far above the upper control limit in each
chart.]
Figure 1. Example of Shewhart Control Chart Test applied to data with
multiple transcription errors in month of October.
cases the air quality data are "out of control" for the month of October, with both
the October average and range way above their respective upper control limits. The
problem was later identified as multiple transcription errors in which all numbers in
the month of October were off by a factor of 10.
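The October excursion can likewise be reproduced with the shewhart_limits sketch
given earlier; the upper control limits computed from the July-September subgroups
come out to roughly 98 for the mean and 119 for the range:

    prev_means, prev_ranges = [66, 73, 59], [37, 64, 68]      # Jul, Aug, Sep
    print(shewhart_limits(prev_means, prev_ranges))           # UCL_x ~ 98.5, UCL_R ~ 119.1
    print(out_of_control(591, 595, prev_means, prev_ranges))  # True: October is flagged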
22
-------
CONCLUSION
From the initial results of the Region V evaluation, it appears that both the
Dixon and Shewhart Tests work well on the TSP, S02, and N02 data and are in reasonably good
agreement. Ideally, both tests should be used in the screening process. However, if
an air pollution control agency wanted to employ only one test, the Shewhart Control
Chart Test would be preferable, because it has the advantage that it can simultaneously
examine shifts in both the monthly mean and range and can be presented graphically.
Further, in the case of S02 and N02, the Shewhart Test flagged sites with a single
transcription or keypunch error—identified by shifts in the range—which were not
identified by the Dixon Test.
The second phase of the Region V evaluation will cover those sites whose highest
measured value did not exceed one of the pollutant-specific NAAQS. This phase will be
examined in a later paper, along with the development of quality control tests for data
generated by the continuous monitoring methods.
ACKNOWLEDGMENTS
The authors wish to express their appreciation to the state air pollution control
agencies in Region V for their help in the evaluation of the tests, to Mrs. Ann Rogers
and Mrs. Aline Rolaff for providing the computer programming support, to Mrs. Joan
Bivins, Miss Hazel Browning, and Mr. Willie Tigs for their clerical support, and to
Dr. Thomas Curran for his many helpful comments on earlier drafts of the paper.
REFERENCES
1. Hunt, W. F., Jr., and T. C. Curran. An Application of Statistical Quality Control
Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality
Standards. Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts,
May 1974.
2. Hunt, W. F., Jr., T. C. Curran, N. H. Frank, and R. B. Faoro. Use of Statistical
Quality Control Procedures in Achieving and Maintaining Clean Air. Transactions
of the Joint European Organization for Quality Control/International Academy for
Quality Conference, Venice Lido, Italy, September 1975.
3. Title 40 - Protection of Environment. National Primary and Secondary Ambient Air
Quality Standards. Federal Register. 36(84):8186-8201, April 30, 1971.
4. Dixon, W. J. Processing Data for Outliers. Biometrics. 9:74-89, 1953.
5. Grant, E. L. Statistical Quality Control. New York, McGraw Hill Book Co.
p. 122-128. 1964.
6. SAROAD Users Manual. U. S. Environmental Protection Agency, Research Triangle
Park, N.C. Publication No. APTD-0663. July 1971.
7. Hoffman, A. J., T. C. Curran, T. B. McMullen, W. M. Cox, and W. F. Hunt, Jr.
EPA's Role in Ambient Air Quality Monitoring. Science. 190(4211):243-248,
October 1975.
8. Title 40 - Protection of Environment. Requirements for Preparation, Adoption, and
Submittal of Implementation Plans. Federal Register. 36(158):15490, August 14,
1971.
9. Larsen, R. I. A Mathematical Model for Relating Air Quality Measurement to Air
Quality Standards. U. S. Environmental Protection Agency, Research Triangle Park,
N.C. Publication No. AP-89. 1971.
10. Hunt, W. F., Jr. The Precision Associated with the Sampling Frequency of Lognor-
mally Distributed Air Pollutant Measurements. J. Air Poll. Control Assoc. 22(9):
687, 1972.
11. Curran, T. C. and N. H. Frank. Assessing the Validity of the Lognormal Model When
Predicting Maximum Air Pollutant Concentrations. Presented at the 68th Annual
Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.
12. Shewhart, W. A. Economic Control of Quality of Manufactured Product. Princeton,
D. Van Nostrand Company, Inc. 1931. p. 299.
23
-------
DISTRIBUTION GAP TEST FOR HOURLY
AIR POLLUTION DATA
by
Thomas C. Curran
Office of Air Quality Planning and Standards
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
25
-------
DISTRIBUTION GAP TEST FOR HOURLY
AIR POLLUTION DATA
T.C. Curran
Previous papers have discussed techniques for screening air pollution data sets
with particular attention given to 24-hour measurements. The present paper focuses
upon the use of screening procedures for hourly ambient air quality measurements. As
with any quality control procedure, it is useful to consider the nature and intended
use of the data before discussing the screening technique.
Hourly air pollution data sets present some interesting practical problems when one
considers the use of a screening procedure. The most obvious feature is the volume of
data. For example, 24-hour air pollution measurements are usually obtained by every-
sixth-day sampling resulting in approximately 60 samples per year. In contrast, hourly
measurements are obtained from continuous monitors that operate every day and, therefore,
may produce as many as 8,760 values per year. Thus, hourly data sets are commonly 100
times larger than those for daily measurements. The reason that the volume of data is
important becomes apparent when the use of the data is examined. For the most part, air
pollution data is collected to determine status with respect to certain legal standards,
4
such as the National Ambient Air Quality Standards. These standards specify upper
limits for air pollution concentrations. Of particular interest for this paper are the
standards for oxidants or carbon monoxide which indicate hourly values "not to be
exceeded more than once per year."(4) In these situations it is the second highest
value from a data set of 8,760 observations that becomes the decision-making value.
Obviously, this places a premium on ensuring data quality.
From a practical viewpoint, maintaining a data bank for air pollution measurements
involves the basic conflict of having to routinely process large volumes of data and
yet at the same time ensure an almost zero defect level of data quality. Many sites
monitor for several pollutants so that on the national level, thousands of sites are
routinely submitting tens of thousands of data points each year. However, because of
the nature of the standards, many users may only be interested in the two highest values
at each site for each pollutant. It should be noted that two values from a data set of
8,760 observations constitutes 0.023 percent of the data. This means that the user's
perception of data quality may be entirely different from the true data quality. For
example, if only 0.05 percent of the data points were too high due to errors, this
would still be sufficient to have the user complain that "the data are useless." On
the other hand, if elaborate editing checks are introduced, the sheer volume of data
may result in high costs or processing delays, and the user may now complain that the
data are not sufficiently current for him to make timely decisions.
With this background in mind, it is apparent that an air quality data screening
program must be able to process large volumes of data in an inexpensive fashion while
flagging virtually every error. Also, because it is frequently difficult and time con-
suming to verify suspect data points, every flagged value should be a genuine error.
Unfortunately, while these characteristics are obviously desirable, they are also almost
impossible to attain. The approach presented here is primarily intended to eliminate
the more glaring errors from these hourly data sets. The major emphasis is on screening
the higher concentration values to check for general internal consistency within the
data set.
RATIONALE FOR SCREENING PROCEDURE
In our initial development of a screening procedure for hourly data, a computer
program was developed that checked for departures from typical patterns. These typical
patterns were selected on the basis of experience with various types of air pollution
data. Basically, the values were flagged on a yes-no decision, and there was no proba-
bility statement associated with the rejected values. One stage in this development was
Copyright ©1977 American Society For Quality Control, Inc. Reprinted by permission.
26
-------
to give sample data sets to experienced air pollution data analysts to see what values
they would reject. There were two reasons for this step. The most obvious was to en-
sure that the computerized screening procedure was consistent with so-called expert
judgment. However, another reason was the need for a test that would mimic the decisions
made by an experienced analyst. The reason for this was an attempt to avoid a black-
box approach where the screening procedure was viewed as a mysterious oracle delivering
arbitrary decisions. The point here is that it can be quite time consuming for the data
analyst to check flagged data points. Values that appear to be quite unlikely from a
statistical viewpoint may actually be quite likely in the real world. For example,
massive traffic jams do happen and may result in high carbon monoxide levels. Windstorms
can mean high total suspended particulate levels. Sudden shifts in wind direction can
mean that a monitor near a point source goes from a zero reading to almost full scale
and back in a few hours. The high variability associated with peak air pollution values
makes it almost impossible to develop a screening procedure that does not occasionally
flag real values. But it seemed desirable to avoid the situation where an air pollution
analyst would tire of repeatedly checking flagged values that turned out to be correct.
Therefore, emphasis was given to developing a test that would flag values that an air
pollution analyst would want to investigate. An effective way to accomplish this was to
develop a test that would mimic experienced human judgment so that the analyst would
understand why the value was flagged.
To a large degree the preliminary test on patterns was successful. Experienced
analysts used the same basic approach of looking for unusual jump discontinuities between
successive hourly values or departures from expected diurnal or seasonal patterns. How-
ever, there were two main deficiencies in this computerized procedure based upon depart-
ures from suspected patterns. One was the lack of a probabilistic framework. The
second, and probably the more serious from a practical standpoint, was the need to vary
the amount of allowable departure from site to site. The probabilistic framework could
be provided by a time series model, and the parameters varied from site to site. However,
it became apparent during the preliminary investigation that many of the outliers could
be detected by a much simpler approach. In most cases, unusually high values could be
detected by examining the frequency distribution of the hourly data for a given period
of time, such as a month, quarter, or year. Suspect values would be associated with
large gaps in the frequency distribution. The length of the gap and the number of
values above the gap afforded a convenient means of detecting possible errors. With this
simplification of the problem, it becomes possible to develop a probabilistic framework
for the problem as discussed below.
PROBABILITY OF A GAP
In order to compute the probability of a gap in the empirical frequency distribu-
tion, it is necessary to assume some type of underlying distribution. Although this
involves an oversimplification because it ignores dependency between successive hourly
values, such approaches have traditionally been used with success in air pollution data
analysis. The lognormal distribution has customarily been used for this purpose. How-
ever, the exponential distribution has also been found to provide a reasonable approxi-
mation for the upper tail, or higher concentrations, of hourly air pollution data.
Because the higher concentration values were of primary interest and the exponential
distribution is mathematically convenient, it was used as the underlying distribution.
As with any measurements, although the approximating distribution is continuous, the air
pollution values are discrete valued. For simplicity, they may be assumed to be integers
because this involves merely a change of scale. A gap in the frequency distribution may
then be described in terms of its length, the number of values above the gap, and at
what concentration the gap begins. Therefore, if a monthly empirical frequency distri-
bution of hourly values has n values greater than concentration c but no values between
c and c+k, this would be a gap of length k starting at c with n observations above the
gap. To compute the probability of this event, consider the following:
Let X be an exponential random variable. Then

    Pr(X ≤ c) = 1 - e^(-λ(c-θ)),   where λ > 0 and c ≥ θ.

Thus, Pr(X > c) = e^(-λ(c-θ)).
27
-------
The probability that X is greater than c+k given that X is greater than c is

    Pr(X > c+k) / Pr(X > c) = e^(-λ(c+k-θ)) / e^(-λ(c-θ)) = e^(-λk).

Because X is distributed exponentially, this expression is independent of the concen-
tration c. Assuming independence, the probability that n values are greater than c+k,
given that these n values are greater than c, is

    (e^(-λk))^n = e^(-nλk).

Thus, the probability of a gap of length k with n values above the gap is e^(-nλk).
This probability then becomes the criterion for rejecting suspect data.
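In code, the rejection criterion derived above reduces to a one-line computation (a
sketch; estimation of λ is discussed in the next section):

    import math

    def gap_probability(n, k, lam):
        """Probability, under the exponential assumption, of a gap of
        length k with n values above the gap: e^(-n*lam*k)."""
        return math.exp(-n * lam * k)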
APPLICATION
A relatively simple FORTRAN program was written to process hourly data, compute
the empirical frequency distribution, and examine any gaps. Because of the manner in
which the data is routinely submitted to the U.S. Environmental Protection Agency's
National Aerometric Data Bank, the program was written to check the data on a monthly
basis (744 hourly values). The parameter λ obviously varies from one data set to
another. For simplicity, λ was determined from the 50th and 95th percentiles of the
data. This was computationally convenient and also emphasized the fit for the upper
tail. Results to date in evaluating this test indicate that this approach is adequate.
Past experience has indicated that an occasional source of error is the miscoding
of units so that an entire month of data would be internally consistent yet too high
by some scale factor. To account for this, a second estimate of λ was computed using
an assumed value for the 99.9th percentile, i.e., a value that historically should not
be exceeded more than one time in a thousand.
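A minimal sketch of such a monthly screening pass follows. Solving the exponential
tail at the two percentiles gives λ = ln(10)/(p95 - p50); the flagging cutoff and the
percentile indexing are assumptions for illustration, and the second λ estimate from
the 99.9th percentile is omitted here:

    import math

    def estimate_lambda(values):
        """Fit lambda from the 50th and 95th percentiles of the data:
        exp(-lam*(p95 - p50)) = 0.05/0.50 gives lam = ln(10)/(p95 - p50)."""
        v = sorted(values)
        p50 = v[int(0.50 * (len(v) - 1))]
        p95 = v[int(0.95 * (len(v) - 1))]
        return math.log(10.0) / max(p95 - p50, 1e-9)

    def scan_gaps(hourly, cutoff=0.0001):
        """Flag gaps in a month of hourly values whose probability e^(-n*lam*k)
        falls below an assumed cutoff. Values are treated as integers."""
        lam = estimate_lambda(hourly)
        flags = []
        levels = sorted(set(int(x) for x in hourly))
        for lower, upper in zip(levels, levels[1:]):
            k = upper - lower                            # gap length, starting at lower
            if k <= 1:
                continue                                 # adjacent values: no gap
            n = sum(1 for x in hourly if x >= upper)     # values above the gap
            p = math.exp(-n * lam * k)
            if p < cutoff:
                flags.append({"start": lower, "length": k, "above": n, "prob": p})
        return flags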
RESULTS
In order to provide a realistic test of this screening procedure, actual data sets
were used. One of particular interest involved carbon monoxide data that had been
quickly key-punched and then manually edited for a specific study. This provided a pre-
liminary and corrected version of the file. The preliminary file had known errors and
the corrected file was presumably valid. The first test run on the preliminary file
processed 21,362 hourly values from 40 monthly data sets. Eight of these monthly data
sets were flagged. Hourly carbon monoxide values would be expected to mostly fall in
the range of 0 to 50 ppm. In this first test, values of 900, 800, 700, and 500 were
found resulting in gap lengths greater than 100 and associated probabilities of less
than 1 in 10,000. These results are shown in Table 1. Of the eight flagged data sets,
TABLE 1. Rejected Site Months From Sample Data Set

                 Number of            2nd    Gap    Starting   Values
Site  Month/year   values   Maximum   high   length    at     above gap  Probability
 33   Oct. 1974     530       500      13     >100     14         1         .0006
 33   Nov. 1974     604       800     500     >100     15         3        <.0001
 33   Dec. 1974     671       900     800     >100     41         4        <.0001
 33   Jan. 1975     653        33      30      14      20         2        <.0001
 33   Feb. 1975     510       300      18     >100     19         1        <.0001
 39   June 1975     707       500     500     >100     27         3        <.0001
901   July 1974     620        16      15       3      11         3         .0056
901   Aug. 1974     334       800     700      14      14         5         .0001
28
-------
seven had keypunch errors. The one remaining month was flagged on the basis of a gap of
length 3 and the data appeared to be reasonable. This presented no difficulty for the
analyst because the computer printout was sufficient to indicate that these data were in
an intuitively acceptable range and probably did not warrant further investigation.
It took less than 30 seconds on EPA's UNIVAC 1110 to process these 21,362 hourly
values, and the total cost was approximately $1. It should be noted that the program
does several other editing checks so that this cost includes more than the screening
procedure for gaps.
CONCLUSIONS
Using gaps in monthly frequency distributions appears to be a convenient means of
screening hourly air pollution data sets for outliers. Results to date indicate that it
satisfies the criteria of being easy and economical to implement while producing output
that is intuitively understandable to an air pollution data analyst. The test success-
fully spots the more obvious errors. As expected, the initial results also suggest that
these types of data sets have a much lower error rate than the user perceives, because
attention falls on only the few highest values.
There are certain refinements that can be made in screening these types of data sets.
Time series models and the use of associated data, such as meteorological variables,
would be expected to increase sensitivity and possibly result in even better data quality.
However, it remains to be seen if these more elaborate approaches are cost effective
when processing vast quantities of data from locations throughout the nation.
As a final comment, it should be noted that once a value is flagged as a possible
anomaly, it cannot be arbitrarily dropped from the data set. It must first be verified
that the data point actually is incorrect. The fact that the data point is statistically
unusual does not necessarily mean that it did not occur.
REFERENCES
1. Hunt, W. F., Jr., and T. C. Curran. An Application of Statistical Quality Control
Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality
Standards. Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts,
May 1974.
2. Hunt, W. F., Jr., T. C. Curran, N. H. Frank, and R. B. Faoro. Use of Statistical
Quality Control Procedures in Achieving and Maintaining Clean Air. Transactions of
the Joint European Organization for Quality Control/International Academy for Quality
Conference, Venice Lido, Italy, September 1975.
3. Hunt, W. F., Jr., R. B. Faoro, and S. K. Goranson. A Comparison of the Dixon Ratio
Test and Shewhart Control Chart Test Applied to the National Aerometric Data Bank.
Presented at the 30th Annual Conference of the American Society for Quality Control,
Toronto, Ontario, Canada, June 1976.
4. Title 40 - Protection of Environment. National Primary and Secondary Ambient Air
Quality Standards. Federal Register. 36:(84):8186-8201, April 30, 1971.
5. Larsen, R. I. A Mathematical Model for Relating Air Quality Measurements to Air
Quality Standards. U.S. Environmental Protection Agency, Research Triangle Park,
N.C. Publication No. AP-89. 1971.
6. Curran, T. C. and N. H. Frank. Assessing the Validity of the Lognormal Model when
Predicting Maximum Air Pollutant Concentrations. Presented at the 68th Annual
Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.
29
-------
USE OF STATISTICAL SAMPLING IN VALIDATING
HEALTH EFFECTS DATA
by
Carolyn P. Chamblee
Health Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
31
-------
USE OF STATISTICAL SAMPLING IN
VALIDATING HEALTH EFFECTS DATA
Carolyn P. Chamblee
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, North Carolina 27711
ABSTRACT
A quality control plan has been adopted for large computer data files of
health effects research studies. The Dodge-Romig acceptance sampling technique
was selected. This procedure has the capability of guaranteeing within specific
tolerance limits the agreement between the information on the computer files
and the information on the original data documents. The method is easy to use
and is adaptable to a wide range of files and to a varying quantity of documents.
The type of plan chosen utilizes a file as a lot, a single document as a
characteristic, a single sampling procedure, and a 2% Lot Tolerance Per Cent
Defective (LTPD). Our experience with this acceptance sampling plan has been
positive enough that we have extended its use to most of our current studies.
32
-------
USE OF STATISTICAL SAMPLING IN
VALIDATING HEALTH EFFECTS DATA
Carolyn P. Chamblee
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, North Carolina 27711
The Statistics and Data Management Office supplies statistical and
data processing support to the Health Effects Research Laboratory (HERL) as
required. One of the principal responsibilities of the Laboratory, as the
name suggests, is to research and assess the effects of air pollution on
human health. One method HERL uses to carry out its responsibility is to
conduct nationwide epidemiological research to establish the relationship
between human health and community air quality. This research includes field
studies that examine the health of population groups residing in communities
exposed to definable air pollutants. Exposure-response relationships and
injury thresholds are estimated and the studies document changes in health
that accompany changes in environmental quality.
Questionnaires are designed for the field studies to allow more uniform
collection of data. These questionnaires are usually designed as keypunch
entry or optical scanning documents or a combination of both. As one would
imagine for nationwide studies, the collection effort results in a large
volume of information which is then processed through various steps and
results in a computerized master file or files.
During the period from 1970 to 1975, HERL conducted a large number of
epidemiological studies commonly referred to as CHESS or the Community Health
and Environmental Surveillance System. There are approximately 83 of these
studies, covering five different study types across five areas. While
successfully completing this intensive data collection effort during a period
of in-house personnel limitations, computer conversion from an IBM 360/50 to a
Univac 1110 and high contractor personnel turnover, HERL developed a backlog
of raw data. Although these data were computerized and computer-edited, no one
could make a definitive statement regarding the accuracy of the files versus the
original source documents. The emphasis on and importance of quality control
procedures, and our inability to certify the data files, made it clear that
a quality control program for these files had to be developed. I was assigned
to develop the quality control plan.
The plan selected had to be able to guarantee that when properly followed
the contents of the computer file reflected data reported on the forms within
a small error tolerance. Also, each file must meet the error tolerance.
That is to say, a statement that over the 83 files the error rate is less than
the specified tolerance is not sufficient. Each individual file must satisfy
the limit. Lastly, the quality control plan had to minimize the verification
effort but must also be simple to use, easy to understand and adaptable over a
wide range of data files and for a varying number of data forms.
33
-------
A number of statistical and quality control references were reviewed
before it was decided that the Dodge-Romig acceptance sampling technique was
the most desirable. This approach is discussed in detail by Harold F. Dodge
and Harry G. Romig (1). Their book is very easy to read and understand and
offers a twelve step procedure for selecting a specific sampling plan. To
explain how we decided on the plan we currently use, I will briefly describe
the steps and discuss how we implemented Dodge-Romig.
1. Decide what characteristics to include. For example, a characteristic
could be considered a variable or a data field, which could lead to
distinctly different error rates. In our case, we considered all
information on a single questionnaire form as a group so that one form
equals one record.
2. Decide what is to constitute a lot. A lot is defined as a homogeneous
material unit from a common source. In choosing the lot unit we
balanced the fact that a small number of large lots can shorten inspec-
tion time against the additional difficulty of processing the rejected
lots. In our case a lot equals one file.
3. Choose the type of protection. There are two types of protection:
Lot Tolerance Per Cent Defective (LTPD) and Average Outgoing Quality
Limit (AOQL). The AOQL applies to the average level of quality over
all lots being inspected. It is appropriate for a continuing supply
of a product. The LTPD applies to the quality level of each
individual lot. We chose the LTPD type of protection.
4. Choose a suitable level of LTPD or AOQL. For LTPD choose the value of
per cent defective you are willing to accept not more than 10 per cent
of the time, that is, reject at least 90 per cent of the time. We
balance the inspection costs against the consequences of accepting a
file of bad quality. We considered rates in the range of 1% to 3%
LTPD. We decided that 1% error rate was too costly and selected a rate
of 2% LTPD.
5. Choose between single sampling and double sampling. For better economy
in an overall inspection effort, double sampling is usually preferable.
However, for minimum variation in the workload, single sampling should
be used. We selected single sampling as a more straightforward and
preferable method in our case.
6. Select the proper sampling table on the basis of the preceding choices.
We selected the Single Sampling Table for LTPD = 2 per cent (Figure 1
reproduced from reference 1).
7. Obtain an estimate of the Process Average Per Cent Defective (PA). Use
previous data to obtain the PA. Even a rough estimate should be used
if little prior data are available. A poor estimate will only decrease
the economy of the plan but maintains the same LTPD protection. After
some initial examination of HERL data, the column entitled "Process
Average 0.61% to 0.80%" was used.
34
-------
[Figure 1. Single Sampling Table for Lot Tolerance Per Cent Defective (LTPD) = 2.0%, reproduced from reference 1. For each of 18 lot-size ranges (1-75 up to 50,001-100,000) and six process-average columns (0-0.02% through 0.81-1.00%), the table gives the sample size n, the acceptance number c, and the resulting AOQL in per cent; most individual entries are not legible in this copy. The entry used in the text (lot size 7,001-10,000 at process average 0.61-0.80%) is n = 760, c = 10, AOQL = 0.79%.]
Figure 1 - Reference 1
Reproduced by permission of John Wiley & Sons, Inc. and Copyright (1959)
Bell Telephone Laboratories from Sampling Inspection Tables Single and
Double Sampling, 2nd Edition, by Harold F. Dodge and Harry G. Romig.
35
-------
8. Choose a sampling plan for the given lot size and estimated PA.
Since the sampling plan is designed as a function of the PA, use
the estimated PA as the table entry. Remember to obtain revised
PA estimates from new data and if possible to select a more
economical plan. For one HERL study, there were 7800 source
documents. Based on our estimated PA of 0.61% to 0.80%, we would
go to the 2% LTPD single sample table, locate the correct PA
column and find the sample size for 7800 forms. This would result
in the row corresponding to 7001 to 10000 forms being used. A
sample size of 760 forms with no more than 10 errors would be used
for the study (this lookup is sketched just after this list). For the
purposes of our plan, the original source form was considered correct
and any code difference on the computer file was considered an error.
9. Find the OC curve of the sampling plan. If the operating charac-
teristic (OC) curve is satisfactory, choose the plan. The OC curve
for our plan is shown in Figure 2.
10. Select sample units from the lot by a random procedure. A preferred
method for accomplishing randomization is the use of random numbers.
11. Follow the prescribed procedure for single sampling. Inspect each
unit for the characteristics adopted in step one and in accordance
with sampling procedures.
12. Keep a running check of the PA. Change the sampling plan as necessary
to match shifts in the PA. Adopt a definite time period for making
new estimates such as every month or every quarter. In our experience,
the PA did not change significantly over 6 to 7 months.
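The table lookup in step 8 can be sketched as follows (an illustrative Python sketch; the dictionary structure and names are hypothetical, and only the entry quoted above is filled in):

    # Excerpt of the LTPD = 2.0% single sampling table, keyed by lot-size
    # range and process-average column; only the row used in the text
    # (7,001-10,000 forms, PA 0.61%-0.80%) is filled in here.
    PLAN_LTPD_2PCT = {
        (7001, 10000): {"0.61-0.80%": (760, 10)},  # (sample size n, acceptance number c)
    }

    def select_plan(lot_size, pa_column):
        for (lo, hi), columns in PLAN_LTPD_2PCT.items():
            if lo <= lot_size <= hi:
                return columns[pa_column]
        raise KeyError("lot size not in this table excerpt")

    n, c = select_plan(7800, "0.61-0.80%")  # -> (760, 10), as in step 8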
The Dodge-Romig acceptance sampling plan described is not only
being used on the past CHESS studies but is also used on current studies.
For each study undertaken by the data processing staff, a data processing
protocol is prepared in addition to the normal study protocol. The protocol
describes what is to be done including manual and computer steps and the
expected timeframe. Edit checks to be performed usually include
1. Valid-code checks
2. Range checks
3. Field-type checks (numeric, alpha, and/or alphanumeric)
4. Consistency checks, such as date of birth versus age.
Edits may be accomplished by an individually designed program or one or more
SPSS runs. SPSS frequency distributions are principally used to identify out
of range and other unacceptable codes. Audit trails are maintained throughout
the processing.
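Such edit checks can be sketched as follows (an illustrative Python sketch; the field names, code lists, and ranges are hypothetical, and in practice SPSS frequency runs carried much of this work):

    def edit_check(record, study_year):
        """Apply the four kinds of edit checks named above to one record."""
        errors = []
        if record["sex"] not in ("M", "F"):          # valid-code check
            errors.append("invalid sex code")
        if not (0 <= record["age"] <= 110):          # range check
            errors.append("age out of range")
        if not str(record["zip"]).isdigit():         # field-type check
            errors.append("zip code not numeric")
        # consistency check: age must agree with date of birth
        if record["birth_year"] + record["age"] not in (study_year, study_year - 1):
            errors.append("age inconsistent with date of birth")
        return errors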
In conclusion, we believe we have a successful operational quality control
program for our current needs relative to processing of large computer data files.
For similar applications, I would recommend reviewing these procedures as described
by Dodge and Romig and investigating more usage of SPSS as a quick evaluation
of the contents of the data files.
36
-------
PROBABILITY OF ACCEPTING A FILE WITH TRUE ERROR
RATE θ USING DODGE-ROMIG LTPD (2.0%) PLAN FOR
n = 760, c = 10; ~10,000 DIARIES WITH ESTIMATED
ERROR IN RANGE .61 - .80%

[The plotted OC curve is not reproduced; its tabulated values follow.]

    True error rate θ (%)    Probability of accepting (%)
          0.00                        100.0
          0.25                        100.0
          0.50                         99.0
          0.75                         97.0
          1.00                         85.0
          1.25                         65.0
          1.50                         42.0
          1.75                         22.0
          2.00                         10.0

    AOQ = 0.79

Figure 2 - Operating Characteristics Curve
Reproduced by permission of John Wiley & Sons, Inc. and Copyright
(1959) Bell Telephone Laboratories from Sampling Inspection Tables
Single and Double Sampling, 2nd Edition, by Harold F. Dodge and
Harry G. Romig.
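The tabulated curve can be reproduced from the acceptance probability of the plan (a minimal sketch in illustrative Python, assuming the binomial approximation to sampling without replacement):

    from math import comb

    def prob_accept(n, c, p):
        """P(accept lot) = P(at most c errors among n sampled records)."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

    # Plan from the text: n = 760, c = 10. At the 2.0% LTPD a file that
    # bad is accepted only about 10% of the time, i.e., rejected about 90%.
    for pct in (0.5, 1.0, 1.5, 2.0):
        print(pct, round(100 * prob_accept(760, 10, pct / 100), 1))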
-------
REFERENCE
1. Dodge, Harold F. and Romig, Harry G., Sampling Inspection Tables Single and
Double Sampling, 2nd Edition, John Wiley and Sons, Inc., New York, 1959.
38
-------
USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
RATIO TEST FOR DATA VALIDATION
by
Tyler Hartwell
Research Triangle Institute
Research Triangle Park, North Carolina 27709
39
-------
USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
RATIO TEST FOR DATA VALIDATION
Tyler Hartwell*
ABSTRACT
This paper describes preliminary work on two statistical data
editing procedures designed to flag suspect minute and hourly data from
the Regional Air Pollution Study (RAPS) computer data bank which contains
data from the Regional Air Monitoring System (RAMS) network of monitor-
ing stations in St. Louis, Missouri. In particular, the data editing
procedures are: (i) an intraparameter check where the differences of
successive minute averages for a given variable and station are evaluated,
and (ii) an intraparameter check where hourly averages for a given hour
and variable are compared across the RAMS network or across a selected
subset of stations by use of the Dixon ratio. The paper describes how
the procedures were developed for their current application and gives
results of applying the procedures to actual data on the RAPS data bank.
In addition, suggestions for future research on the two procedures are
presented. It is concluded that at the present time the two data edit-
ing procedures should be useful to EPA in flagging suspect minute and
hourly data from the RAPS data bank.
* Dr. Hartwell is a senior statistician, Statistical Methodology and
Analysis Center, Research Triangle Institute, Research Triangle Park,
North Carolina 27709.
40
-------
I. INTRODUCTION
The RAMS network of 25 monitoring stations in and around St. Louis,
Missouri collects data on a large number of pollutant (e.g., O3, CO,
THC, CH4, NO, NOx, SO2, TS, H2S) and meteorological variables (e.g.,
wind speed, wind direction, temperature, dew point, delta temperature,
barometric pressure). Figure 1 presents a map of the location of the 25
RAMS stations. The figure indicates that the urban stations (nos. 101-
108) may be as much as 8 miles apart while the rural stations (e.g.,
nos. 122-125) may be as much as 35 miles apart. The RAPS Data Bank
contains data from the RAMS network of stations.
The purpose of the two statistical data editing rules (i.e., minute
successive differences and the Dixon ratio) examined in this paper is
only to flag suspect RAPS data, not to delete it from the data bank.
That is, because of the vast amount of data collected by the RAMS
network, data editing rules are needed to limit the amount of suspect
data that meteorologists and atmospheric chemists need to examine in
detail. Thus, the purpose of this paper is to examine two data editing
rules that indicate data that should be examined in more detail by EPA
personnel who have an intimate knowledge of the data that the RAMS
network collects.
In addition, it is important to note that the work presented
here is only preliminary. Because of the complexity of trying to obtain
data editing rules that apply to a large network of monitoring stations,
additional work needs to be done on refining the two rules. However, at
this point in time, it is felt that the two data editing rules presented
should prove to be useful in flagging suspect data from the RAMS network.
41
-------
[Figure 1. Map of the locations of the 25 RAMS monitoring stations in and around St. Louis, Missouri (not legible in this copy)]
42
-------
II. MINUTE SUCCESSIVE DIFFERENCES
The RAMS data received at Research Triangle Park, North Carolina,
contain minute data on several air pollution and meteorological vari-
ables. Several computerized range validation checks are performed on
this data by the prime contractor, prior to forwarding it to the RAPS
Data Bank. The RAPS Data Bank was interested in determining if a
statistical procedure could be used in further validation of the data,
to flag minute data values which appeared to be outliers. In particular,
there was a need to develop and evaluate a procedure (i) which could be
applied to each station's data for one variable at a time and (ii) was
easy to compute and only required one pass through the data. Accord-
ingly, this study was limited to a simple statistical procedure that
required little computation. After discussions between EPA and RTI
staff members, it was decided to examine a statistical data editing rule
based on minute successive differences.
In general, the editing rule examined is designed to flag minute
values which are relatively much higher or lower than the preceding
minute value; i.e.,
[sketch: variable level plotted against time in minutes, with a single spike marked as the flagged value]
Thus, the editing rule is designed to detect large spikes in the minute
values of a variable at a station.
43
-------
In particular, the data editing rule is the following: at a par-
ticular station compute successive differences between minute values of
a particular variable and if a successive difference is "too large" then
flag this value. This rule is extremely simple to apply and requires
only one pass through the data base.
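This one-pass rule can be sketched as follows (an illustrative Python sketch; the limit is the ±4 s.d. value developed below and tabulated in Table 3, and the data are hypothetical):

    def flag_successive(minute_values, limit):
        """Flag minute i when |x[i] - x[i-1]| exceeds the limit for
        this variable; one pass, one station, one variable."""
        return [i for i in range(1, len(minute_values))
                if abs(minute_values[i] - minute_values[i - 1]) > limit]

    # e.g., ozone with the Table 3 limit of 0.0096 ppm
    print(flag_successive([0.031, 0.030, 0.052, 0.031], 0.0096))  # -> [2, 3]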
In order to determine when a successive difference was "too large",
the (i) distributions, (ii) sample means, and (iii) sample standard
deviations (s.d.) of minute successive differences for several stations,
times of the day, and air pollution and meteorological variables from
the RAMS network were examined. For example, Figures 2, 3, and 4 pre-
sent three of these distributions for the variables windspeed, ozone,
and NO2. In all, over 200 of these minute successive difference plots
were examined.
After examining these distributions and the corresponding sample
means and standard deviations in detail, it appeared reasonable to
assume that in general the minute successive differences were approxi-
mately normally distributed with a mean of zero. However, it was also
clear that the standard deviation of minute successive differences was
not constant over stations, times of the day, seasons of the year, and
pollutant or meteorological variables. For example, Table 1 presents
s.d.s of minute successive differences for CO and methane by the factors
time of the day (0-4 a.m., 4-8 a.m. and 8-12 a.m.), season of the year,
and two rural and two urban stations. It is obvious from the table that
the s.d.s vary a great deal over the various factors.
Accordingly, it was decided to assume that the distribution of
minute successive differences for variables in the RAMS network was
normally distributed with a mean of zero and a standard deviation that
44
-------
FIGURE 2
DISTRIBUTION OF WINDSPEED MINUTE SUCCESSIVE DIFFERENCES
[histogram not legible in this copy]
45
-------
FIGURE 3
DISTRIBUTION OF OZONE MINUTE SUCCESSIVE DIFFERENCES
FOR STATION 122; DAY 180, 1976; TIME 0 TO 4 A.M.
[histogram not reproduced; mean illegible, STDEV = 0.00084, SAMPLE SIZE = 236]
46
-------
FIGURE 4
DISTRIBUTION OF NO2 MINUTE SUCCESSIVE DIFFERENCES
FOR STATION 122; DAY 180, 1976; TIME 4 TO 8 A.M.
[histogram not legible in this copy]
-------
TABLE 1
S.D.'S OF MINUTE SUCCESSIVE DIFFERENCES FOR CO AND METHANE BY TIME OF DAY
(0-4 A.M., 4-8 A.M., AND 8-12 A.M.), SEASON OF THE YEAR, AND STATION
(TWO RURAL AND TWO URBAN STATIONS)
[table values not legible in this copy]
48
-------
may vary over time of the day, season, and type of station (urban or
rural). This implied that a minute successive difference would be
flagged when it was greater than a function of the appropriate standard
deviation (e.g., a standard procedure for detecting outliers for the
normal distribution with mean zero is to flag observations which are
greater (or less) than 4 s.d.s). The probability of an observation
being greater (or less) than 4 s.d.s for the normal distribution is less
than .0001. Thus, the problem reduced to determining for each variable
of interest the appropriate standard deviation which might depend on
time of the day, season of the year, and type of station.
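That tail probability is easy to confirm (a one-line check in illustrative Python):

    from math import erf, sqrt

    # two-sided probability of exceeding 4 standard deviations for a
    # standard normal variate: about 6.3e-05, i.e., less than .0001
    p = 2 * (1 - 0.5 * (1 + erf(4 / sqrt(2))))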
To examine this problem, standard deviations of minute successive
differences for each variable of interest for approximately 4 days per
season in 1976, 4 stations (2 urban and 2 rural), and 3 times of the day
(0-4 a.m., 4-8, 8-12) were computed. Thus for each variable between 100
and 192 (4 days x 4 seasons x 4 stations x 3 times = 192) standard
deviations were computed. A standard deviation was only computed if at
least 60 minute successive differences were available during the 4-hour
time period being considered. The results of some of these computations
are summarized in Table 2 for 10 of the variables measured in the RAMS
network. These 10 variables were chosen not only because there was
interest in editing their minute values but also because sufficient data
was available on them for the days selected for computing standard
deviations. The s.d.s presented in Table 2 are average s.d.s (i.e.,
averaged over several stations and days).
Using the s.d.s in Table 2 a statistical technique referred to as
the analysis of variance (ANOVA) was used to test if these average
standard deviations were significantly different (in a statistical
49
-------
TABLE 2
SUMMARY OF AVERAGE S.D.'S OF MINUTE SUCCESSIVE DIFFERENCES BY STATION,
TIME OF DAY, AND SEASON OF THE YEAR FOR 10 RAMS VARIABLES:
WINDSPEED (METERS/SEC.), TEMPERATURE (°C), OZONE (PPM), CO (PPM),
METHANE (PPM), THC (PPM), NO (PPM), NOX (PPM), TOTAL SULFUR (PPM), AND SO2 (PPM)
[individual values not legible in this copy; the text cites, for example,
average s.d.'s for ozone of .0020 (urban stations), .0012 (rural stations),
.0024 (summer), .0007 (winter), and .0016 overall]
50
-------
sense). In particular, a separate ANOVA was carried out for each of the
10 variables. In each of the 10 ANOVAs, statistical tests were used to
determine if average standard deviations by time of day, type of station,
and season of the year were significantly different. The results of
running the 10 ANOVAs indicated that in the majority of cases the average
s.d.s in Table 2 were significantly different. For example, in column 3
of the table (i.e., ozone by station) the average s.d.s of minute suc-
cessive differences for ozone for two urban stations was .0020 and two
rural stations was .0012. The test of significance of these two averages
was significantly different at the .01 level. Note that Table 2 also
presents the average s.d. for each variable over time of day, season of
the year, and type of station (e.g., .0016 for ozone).
The average standard deviations in Table 2 clearly indicate that
from a statistical point of view the s.d.s of minute successive dif-
ferences are significantly different for several of the variables for
one or more of the factors examined (i.e., station type, time of day,
and season of the year). Thus, to be strictly correct (in a statistical
sense) in applying the data editing rule based on minute successive
differences, it would be necessary to base the rule on varying s.d.s by
season of the year, etc. (i.e., the rule would be ±4 s.d.s where the
s.d.s are given in Table 2).
Due to the fact that the above data editing rule might prove to be
somewhat confusing, a more conservative and easier to program rule has
been initially examined (of course, only actual application of a data
editing rule can determine its practical usefulness). This rule is
based upon using, for all minute successive differences for each vari-
able in Table 2, ±4 times the largest average s.d. across station type,
51
-------
time of day, and season of the year. Thus, for ozone the rule would be
based on ±4 times the s.d. = .0024 (i.e., the average s.d. for summer).
This rule is extremely easy to apply requiring only one value to be
exceeded by each minute successive difference of a variable regardless
of station, time of day, or season. Of course, it is conservative in
the sense of having a limit which is somewhat high in many cases (e.g.,
for ozone in winter a more exact rule would be based on a s.d. = .0007).
Accordingly, using the largest average s.d. for each variable in
Table 2 and basing the data editing rule on ±4 s.d. limits, Table 3
gives possible limits for flagging minute successive differences by
variable for the RAMS network of stations. In deriving the limits in
Table 3, it was noted for the RAMS network that CO, methane, and THC
were only measured every 5 minutes and total sulfur (with the exception
of Station 117) and SO2 were only measured every 3 minutes. Therefore,
the s.d.s given to RTI based on minute successive differences of these
variables were underestimates for detecting spikes at five (or three)
minute intervals (i.e., the average s.d.s in Table 2 are too small for
these five variables). In an attempt to compensate for this under-
estimate, Table 3 includes an adjustment factor for these five vari-
ables. The adjustment factor multiplies the ±4 s.d. limits for CO,
methane, and THC by √5 and the ±4 s.d. limits for total sulfur and SO2
by √3. These factors were derived by assuming that on the RAMS data
file the minute successive differences for CO, methane, and THC are zero
except for every 5th minute (and, similarly, every 3rd minute for total
sulfur and SO2).
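The factor follows from the variance: if only every 5th recorded difference is a real 5-minute change and the rest are zero, the s.d. computed over all minutes understates the 5-minute s.d. by a factor of √5. As an arithmetic check (working backward from Table 3 rather than quoting a Table 2 value):

    limit(CO) = ±4 × s.d. × √5 = ±1.97 ppm,  so  s.d. = 1.97 / (4√5) ≈ 0.22 ppm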
Using the limits given in Table 3, RTI then examined the percentage
of minute successive differences that would be flagged for Stations 101
and 122 for 8 days in 1976 for 10 RAMS variables. The results of these
52
-------
TABLE 3
POSSIBLE MINUTE SUCCESSIVE DIFFERENCE LIMITS
ON 10 RAMS VARIABLES 1/

    VARIABLE                     LIMIT
    WINDSPEED (METERS/SEC.)      ±3.0
    TEMPERATURE (°C)             ±.660
    OZONE (PPM)                  ±.0096
    CO (PPM)                     ±1.97
    METHANE (PPM)                ±.316
    THC (PPM)                    ±[illegible]
    NO (PPM)                     ±.028
    NOX (PPM)                    ±.035
    TOTAL SULFUR (PPM)           ±.022
    SO2 (PPM)                    ±.015

1/ BASED ON ±4 STANDARD DEVIATION LIMITS. IN ADDITION,
FOR CO, METHANE, AND THC, THE ±4 S.D. LIMITS HAVE BEEN
MULTIPLIED BY √5 TO ADJUST FOR THE FACT THAT THESE
VARIABLES ARE ONLY MEASURED EVERY 5 MINUTES. SIMILARLY,
FOR TOTAL SULFUR AND SO2 THE ±4 S.D. LIMITS HAVE BEEN
MULTIPLIED BY √3 SINCE THESE VARIABLES ARE ONLY MEASURED
EVERY 3 MINUTES.
53
-------
computations are given in Table 4. The table shows that except for
ozone the percent flagged per variable was less than .6 percent. For
ozone it appears that entirely too many minute differences were flagged.
For example, Figure 5 presents a plot over two days of minute values of
ozone for Station 105. Examination of the figure indicates that a
relatively large percentage of the minute values would be flagged using
the limits given in Table 3, although EPA personnel have indicated that
the data in Figure 5 are not atypical. In addition, discussions with EPA
personnel have indicated that in the RAMS network, five of the stations
(101, 104, 105, 107, and 115) are heavily affected by traffic. Thus, it
may be necessary for these stations to have higher limits for flagging
minute values for ozone than those given in Table 3. Also, it has been
suggested that for these traffic-affected stations it may be necessary
to examine both ozone and NOx minute values simultaneously before flagging
ozone values (e.g., if a minute ozone value jumps significantly from one
minute to the next but the NOx reading does not jump, then and only then
should the ozone value be flagged).
In addition to the results given in Table 4, the percentage of
minute successive differences that would be flagged for Station 105 for
8 days in 1976 using the limits given in Table 3 were examined. The
results were the following:
Percentage Flagged for Station 105

    Variable    Percent flagged
    WS                .04
    Temp.             .12
    O3               7.9
    CO                .56
    CH4               .41
    THC               .90
    NO               1.5
    NOx              1.4
    TS               1.5
    SO2              1.8
54
-------
The above table clearly indicates that additional work needs to be done
for ozone since entirely too many values are being flagged. In addition,
for the other pollutant variables many more values are being flagged
than in Table 4 (Stations 101 and 122 combined). Thus, it would seem
that the variables at Station 105 are probably being affected by auto-
mobile traffic.
Accordingly, at the present time it appears that the minute suc-
cessive differences limits in Table 3 except for ozone are probably
reasonable for a majority of the RAMS monitoring stations. However, for
Stations 101, 104, 105, 107, and 115 which are heavily affected by
traffic, additional work needs to be done on the limits for the pollutant
variables (the limits for windspeed and temperature appear reasonable
for all stations). Of course, since only Stations 101, 104, 105, 116,
117, and 122 were examined in this analysis to date, it may be that
additional work is needed for other specific stations. For Stations 101,
104, 105, 107, and 115, wider minute successive difference limits should
be examined for the pollutant variables, particularly for ozone. As
mentioned previously, it may be that for ozone simple minute successive
difference limits are impractical (i.e., for ozone at the traffic-
affected stations it may be necessary to have a minute data editing rule
which is tied to another pollutant such as NOx).
Another refinement of the limits given in Table 3 that might be
examined is to have them vary by time (see Table 2). For example, for
ozone the s.d. of minute successive differences is much higher for 8-
12 a.m. than from 0-8 a.m.
Finally, before proceeding, Figures 6 and 7 present two plots of
minute values for CO from the RAPS Data Bank for Stations 101 and 105,
55
-------
respectively. These plots indicate minute CO values which were flagged
by the CO limits given in Table 3. The plots indicate that the data
editing rule based on minute successive differences may be quite useful
in detecting minute outliers for CO.
56
-------
TABLE 4
PERCENTAGE OF MINUTE SUCCESSIVE DIFFERENCES FLAGGED FOR
STATIONS 101 AND 122 FOR 8 DAYS IN 1976, BY VARIABLE 1/

    VARIABLE       PERCENT FLAGGED
    WINDSPEED            .53
    TEMPERATURE          .17
    OZONE               3.0
    CO                   .06
    CH4                  .11
    THC                  .24
    NO                   .12
    NOX                  .12
    TS                   .51
    SO2                  .34

1/ TOTAL NUMBER OF MINUTE SUCCESSIVE DIFFERENCES COMPUTED
= 23,040 (8 DAYS × 2 STATIONS × 1,440 MINUTES/DAY).
THE DAYS WERE 1, 2, 96, 97, 231, 232, 286, AND 287.
57
-------
[Figure 5. Ozone (ppm) versus time by minute for Station 105 over two days (plot not legible in this copy)]
58
-------
[Figure 6. CO (ppm) versus time by minute for Station 101, with values flagged by the Table 3 CO limits marked (plot not legible in this copy)]
59
-------
[Figure 7. CO versus time by minute for days 2 and 3 in 1976, Station 105 (RAMS retrieval plot using default labels); flagged minute CO values are marked (plot not legible in this copy)]
60
-------
III. DIXON RATIO
This study also examined the possibility of flagging hourly data
across the RAMS network by use of the Dixon Ratio. That is, for a
particular variable and hour of the day this ratio will flag stations
whose hourly averages are "too high" or "too low" as compared with the
other stations in the network for that hour and variable. The Dixon
criterion was examined as a potential validation procedure for hourly
averages because it was a simple procedure which was easy to compute and
only required one pass through the data.
In brief the Dixon criterion is the following:
For a particular hour and variable (e.g., O3) rank the N
hourly values X_i over stations such that X_1 ≤ X_2 ≤ ... ≤ X_N. Then
compute the criterion (if N is between 11 and 13)

    R_H = (X_N − X_{N−2}) / (X_N − X_2)      (to check largest hourly value)
    R_L = (X_3 − X_1) / (X_{N−1} − X_1)      (to check smallest hourly value)   (1)

If R_H (or R_L) is too large, reject the largest (smallest) hourly value
(e.g., if the number of stations = 12 and R_H > .642, reject X_N at the
.01 level of significance under normal distribution assumptions).
Note: see [1] for the Dixon criterion when N is less than 11 or
greater than 13.
In general, the Dixon criterion is designed to reject the following
type of station hourly values:
[1] Dixon, W. J., "Processing Data for Outliers," Biometrics, Vol. 9
(1953), p. 74.
61
-------
[sketch: ranked station hourly values with a single outlying high value marked as flagged]
Thus, the criterion is designed to flag hourly values which contribute a
relatively large percentage of the range of the hourly values across the
network. Note that the criterion as given in (1) only flags one high
and one low hourly value, not multiple hourly values. For the
present study RTI did not examine the flagging of multiple hourly values
for a particular hour of the day.
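The high-side check can be sketched as follows (an illustrative Python sketch of the form in Equation (1); the example values are hypothetical):

    def dixon_r21_high(values):
        """Dixon ratio R_H = (X_N - X_{N-2}) / (X_N - X_2),
        for 11 <= N <= 13 station values."""
        x = sorted(values)
        return (x[-1] - x[-3]) / (x[-1] - x[1])

    # hourly ozone (ppm) across 12 stations; reject the maximum at the
    # .01 level if the ratio exceeds .642
    hourly = [0.031, 0.028, 0.035, 0.030, 0.029, 0.033,
              0.032, 0.027, 0.031, 0.030, 0.034, 0.163]
    if dixon_r21_high(hourly) > 0.642:
        print("flag the highest station value")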
The Dixon criterion given in Equation (1) was first applied across
the entire RAMS network of 25 stations. The results of these calcula-
tions indicated that entirely too many hourly values were being flagged
for several of the RAMS variables (e.g., ozone, CO, and NO2). Accord-
ingly, the reason so many hourly values were being flagged was deter-
mined by examining the sample means of several RAMS variables for urban,
residential, and rural stations in the RAMS network. Table 5 gives a
summary, for six RAMS variables, of some of these computations. The
table gives the sample means and standard deviations for the three types
of stations by season of the year. In general the table shows that the
sample means of hourly values for a particular variable are not the same
for urban, residential, and rural stations in the RAMS network. (This
is particularly true of the pollutant variables; whereas, for the
meteorological variables the means are much more similar across the
network.) Statistical tests of these station type means were found to
be significantly different in several cases. Thus, it became clear that
one of the underlying assumptions of applying the Dixon criterion was
being violated; namely, that the hourly station values come from a
62
-------
TABLE 5
HOURLY SAMPLE MEANS AND STANDARD DEVIATIONS FOR URBAN, RESIDENTIAL, AND RURAL
STATIONS 1/ IN THE RAMS NETWORK BY SEASON OF THE YEAR AND VARIABLE 2/

                             URBAN               RESIDENTIAL             RURAL
VARIABLE       SEASON    MEAN    STD.DEV.     MEAN    STD.DEV.     MEAN    STD.DEV.
O3             SPRING    .030     .013        .034     .013        .041     .015
(PPM)          SUMMER    .052     .023        .055     .018        .063     .021
               FALL      .017     .021        .018     .011        .025     .013
               WINTER    .008     .006        .009     .006        .016     .008
CO             SPRING    .699     .756        .634     .670        .333     .474
(PPM)          SUMMER    .999     .917       1.072     .837        .329     .638
               FALL      .837     .987        .793    1.092        .249     .307
               WINTER    .451     .526        .418     .327        .215     .163
NO2            SPRING    .030     .017        .025     .016        .011     .012
(PPM)          SUMMER    .046     .087        .047     .094        .011     .010
               FALL      .026     .015        .023     .012        .011     .010
               WINTER    .024     .012        .017     .010        .010     .008
CH4            SPRING   1.781     .257       1.604     .202       1.480     .294
(PPM)          SUMMER   2.068     .486       1.905     .279       1.778     .297
               FALL     1.834     .391       1.787     .244       1.621     .195
               WINTER   1.768     .275       1.436     .235       1.604     .210
WINDSPEED      SPRING   4.122    1.775       3.928    1.525       4.020    1.839
(METERS/SEC.)  SUMMER   2.799    1.082       2.710      --        2.249     .898
               FALL     4.518    1.385       4.122      --        3.967    1.483
               WINTER   5.255    1.656       4.871      --        5.277    2.097
TEMPERATURE    SPRING  13.993    4.337      13.864      --       13.367    4.408
(°C)           SUMMER  26.474    1.879      25.767      --       25.497    2.026
               FALL    11.924    6.271      11.277      --       10.786    6.335
               WINTER  -5.180    6.648      -4.612      --       -5.535    6.722

1/ URBAN = STATIONS 101 TO 108; RESIDENTIAL = STATIONS 111 TO 113, 119,
AND 120; RURAL = STATIONS 109, 110, 114 TO 118, 121 TO 125.
2/ MEANS AND STANDARD DEVIATIONS ARE BASED UPON 4 HOURS PER DAY FOR 5 DAYS
OVER THE VARIOUS STATIONS (URBAN, RESIDENTIAL, OR RURAL).
(-- = value not legible in this copy.)
63
-------
normal distribution with the same mean and variance. Instead, for
example, the hourly station values for ozone have different means for
urban and rural stations (the means are higher for rural stations). The
consequence of groups of stations having different means is illustrated
below:
[sketch: station hourly values grouped as urban and rural, with the rural group centered higher than the urban group]
The above figure shows, using the Dixon criterion, that some rural
stations may be flagged simply because their means are always higher
than the means for urban stations. Accordingly, after examining sample
means such as those presented in Table 5, RTI decided to apply the Dixon
criterion separately to rural stations (Stations 109, 110, 114 to 118,
121 to 125) and urban-residential stations (101-108, 111 to 113, 119,
and 120).
After applying the Dixon criterion to the two types of stations
separately, RTI then examined the results and again found that too many
hourly values were being flagged. Accordingly, after discussions with
meteorologists and air chemists it was decided that the Dixon criterion
for certain pollutant variables should only be applied across the rural
or urban-residential stations if the following criteria were met:
(i) the high station value > twice the low station value, and
(ii) the high station value > some constant (e.g., constant = .03 ppm
for ozone).
The first criterion simply reflects that a factor of 2 between hourly averages
across the network is not uncommon. Criterion (ii) limits the applica-
tion of the Dixon ratio to situations where most of the measurements are
64
-------
well above minimum detectable. In addition, it was decided that the
Dixon criterion could not be used for hourly NO, NOx, TS, and SO2
values. Furthermore, it was felt that the use of the Dixon criterion
for CO was questionable due to the heavy influence of traffic on this
variable.
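The guarded application for a single variable can be sketched as follows (an illustrative Python sketch reusing dixon_r21_high from the sketch above; the constants are the ozone example from the text):

    def flag_high_ozone_hour(hourly_values):
        """Apply the high-side Dixon check only when criteria (i) and (ii) hold."""
        hi, lo = max(hourly_values), min(hourly_values)
        if hi > 2 * lo and hi > 0.03:                   # criteria (i) and (ii)
            return dixon_r21_high(hourly_values) > 0.7  # flagging level for R_H
        return False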
With the above restrictions, the Dixon criterion for detecting high
hourly values only (i.e., R_H in Equation (1)) was then applied to 7
variables in the RAPS Data Bank for both urban-residential and rural
stations. In applying the rule an hourly value was flagged if R_H was
greater than .7 (except for dew point, where R_H > .6 was flagged). The
results of these computations are presented in Tables 6 and 7. In
addition, Tables 8 and 9 present examples of flagged hourly values for
several different RAMS variables. Tables 6 and 7 indicate that the
percent flagged is usually 5% or less. Thus, the Dixon rule as applied
does not seem to be impractical. In addition, the examples given in
Tables 8 and 9 clearly indicate hourly values for several variables on
the data bank that should be examined in more detail by knowledgeable
meteorologists and air chemists.
Accordingly, as with minute successive differences, RTI feels that
the Dixon criterion will be useful in flagging hourly data across the
RAMS network. However, further refinement of the rule may be required.
For example, two points that need further examination are:
(i) can the rule be applied to flagging low hourly values (R_L in
Equation (1)), and
65
-------
(ii) can the rule be applied in practice to the stations which are
heavily influenced by traffic (101, 104, 105, 107, and 115),
particularly for the variable, CO? An alternative here would
be to only use the Dixon rule for the 20 stations not heavily
influenced by traffic.
66
-------
TABLE 6
RESULTS OF APPLYING DIXON RATIO TO 12 URBAN STATIONS
IN RAMS NETWORK 1/2/3/

    VARIABLE       PERCENT OF TIME RATIO > .7    NUMBER FLAGGED
    OZONE                    1.0                        5
    CO                       2.9                       14
    CH4                      2.1                       10
    THC                      5.2                       25
    TEMPERATURE              2.7                       13
    DEW POINT 4/             4.4                       21
    WINDSPEED                4.0                       19

1/ RATIO APPLIED TO ALL 24 HOURS ON 20 DIFFERENT DAYS FOR THE
VARIOUS POLLUTANT AND METEOROLOGICAL VARIABLES (= 480 RATIOS
FOR EACH VARIABLE).
2/ RATIO ONLY USED TO FLAG HIGH VALUES.
3/ FOR OZONE, RATIO APPLIED ONLY IF HIGH STATION VALUE > .03 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CO, RATIO APPLIED ONLY IF HIGH STATION VALUE > 3.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CH4, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR THC, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR WINDSPEED, RATIO APPLIED ONLY IF HIGH STATION
VALUE > 3.0 METERS/SEC.
4/ FOR DEW POINT, PERCENT OF TIME RATIO > .6.
67
-------
TABLE 7
RESULTS OF APPLYING DIXON RATIO TO 13 RURAL
STATIONS IN RAMS NETWORK 1/2/3/

    VARIABLE       PERCENT OF TIME RATIO > .7    NUMBER FLAGGED
    OZONE                    3.8                       18
    CO                       6.0                       29
    CH4                      4.2                       20
    THC                      5.0                       24
    TEMPERATURE              8.5 5/                    41
    DEW POINT 4/             6.5                       31
    WINDSPEED                4.4                       21

1/ RATIO APPLIED TO ALL 24 HOURS ON 20 DIFFERENT DAYS FOR THE
VARIOUS POLLUTANT AND METEOROLOGICAL VARIABLES (= 480 RATIOS
FOR EACH VARIABLE).
2/ RATIO ONLY USED TO FLAG HIGH VALUES.
3/ FOR OZONE, RATIO APPLIED ONLY IF HIGH STATION VALUE > .03 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CO, RATIO APPLIED ONLY IF HIGH STATION VALUE > 3.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CH4, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR THC, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR WINDSPEED, RATIO APPLIED ONLY IF HIGH STATION
VALUE > 3.0 METERS/SEC.
4/ FOR DEW POINT, PERCENT OF TIME RATIO > .6.
5/ FOR TEMPERATURE THE DIXON RATIO FLAGGED EVERY HOURLY VALUE FOR
TWO CONSECUTIVE DAYS IN WINTER WHERE ONE STATION IN THE RURAL
NETWORK READ APPROXIMATELY 8°C AND ALL OTHER STATIONS READ LESS
THAN -2°C; SEE TABLE 9. (THIS COULD PERHAPS BE DUE TO A SIGN
MISTAKE AT THE HIGH STATION.)
68
-------
TABLE 8
EXAMPLES OF FLAGGED VALUES FOR SEVERAL VARIABLES
USING DIXON RATIO ON 12 URBAN STATIONS IN THE RAPS DATA BANK 1/

                        DIXON                     STATION VALUES
VARIABLE                RATIO   HIGHEST    2 HI    3 HI    3 LO    2 LO   LOWEST
WINDSPEED               .895      15.7      4.5     3.6     2.4     2.1     2.0
(METERS/SEC.)           .821      14.4      4.3     4.2     2.6     1.9     1.7
                        .776       6.4      2.7     2.6     1.5     1.5     1.3
TEMPERATURE             .766       6.5      3.2     1.0    -.30    -.65    -1.6
(°C)                    .768      32.1     29.2    28.4    27.3    27.2    27.0
                        .769      18.7     15.9    15.2    14.2    14.2    13.9
DEW POINT               .724      -2.8     -6.2    -9.6   -11.8   -12.1   -13.8
                        .723      15.7      9.6     7.0     4.0     3.7    -2.2
                        .889      18.4      7.1    -6.5    -8.7    -9.6   -10.1
OZONE                   .853      .163     .037    .033    .012    .011    .003
(PPM)                   .755      .133     .100    .089    .080    .075    .002
                        .797      .041     .040    .010    .002    .002    .002
CO                      .755      3.81     1.41    1.04     .16     .14     .13
(PPM)                   .996      43.5      .51     .24     .07     .06     .05
                        .932       4.2      .42     .37     .13     .09     .07
CH4                     .789       4.2      1.8     1.7     1.3     1.0     .06
(PPM)                   .941       5.6      1.8     1.8     1.6     1.5     1.4
                        .931       4.1      1.7     1.7     1.5     1.5     1.4
THC                     .743      3.10     1.97    1.97    1.63    1.58    1.40
(PPM)                   .734      4.19     1.91    1.78    1.28     .91     .05
                        .924      4.31     1.80    1.59    1.50    1.36    1.25

1/ RATIO ONLY USED TO FLAG HIGH VALUES.
69
-------
TABLE 9
EXAMPLES OF FLAGGED VALUES FOR SEVERAL VARIABLES
USING DIXON RATIO ON 13 RURAL STATIONS IN THE RAPS DATA BANK 1/

                        DIXON                     STATION VALUES
VARIABLE                RATIO   HIGHEST    2 HI    3 HI    3 LO    2 LO   LOWEST
WINDSPEED               .815       6.4      3.9     3.8     3.3     3.2     3.1
(METERS/SEC.)           .808       4.1      2.2     1.5     .95     .84     .49
                        .772       5.5      3.1     3.1     2.7     2.3     1.9
TEMPERATURE             .940       8.0     -3.1    -3.6    -4.3    -4.4    -4.9
(°C)                    .957       8.5    -11.4   -11.5   -12.3   -12.4   -12.6
                        .921       6.1     -7.0    -7.2    -8.3    -8.4    -8.5
DEW POINT               .845      31.4      6.7     .44    -5.1    -5.2    -7.8
                        .782      18.1      5.0     4.6     2.1     .82    -2.9
                        .701      33.2     17.6    17.3    13.4    10.6     6.9
OZONE                   .933      .206     .018    .016    .006    .002    .002
(PPM)                   .900      .252     .051    .048    .028    .026    .015
                        .771      .045     .015    .012    .003    .002    .002
CO                      .929       8.7      .96     .77     .21     .17     .12
(PPM)                   .973      28.9      1.7     .94     .43     .17     .17
                        .889       8.4      1.3     1.1     .28     .21     .14
CH4                     .944      10.1      2.4     2.3     1.9     1.8     1.7
(PPM)                   .873       7.6      3.9     2.7     2.1     2.0     1.9
                        .791       5.0      2.9     2.7     2.1     2.1     2.0
THC                     .842       4.2      2.3     2.3     2.0     1.9     1.7
(PPM)                   .904       3.9      2.4     1.9     1.7     1.7     1.3
                        .855       5.4      3.0     2.2     1.9     1.7     1.4

1/ RATIO ONLY USED TO FLAG HIGH VALUES.
70
-------
CLUSTER ANALYSIS AS A DATA VALIDATION
TECHNIQUE
by
Harold L. Crutcher
(Consultant)
35 Westall Avenue
Asheville, North Carolina 28804
71
-------
CLUSTER ANALYSIS AS A DATA VALIDATION TECHNIQUE
H.L. Crutcher
INTRODUCTION
In any study the collection, processing, and storage of data are fun-
damental. Contaminated, adulterated, or "noisy" data confuse the investi-
gator. Data do not necessarily fall into neat categories. Usually there
are mixtures. Some of these are determinant; some are not.
There are many techniques used to cluster and to classify data. This
paper discusses one. This technique separates mixed data sets into subsets.
Each subset will exhibit homogeneous characteristics. The investigator can
then assess the relative importance and the nature of the subsets.
Outlying subsets may indicate anomalous true conditions or may indicate mal-
functioning of some part of the observational program.* Thus some idea of
data quality may be obtained.
The techniques used here require the assumption of the normality of
distribution of the data. If the data are not normally distributed, then
some transformation to approximate normality should be made. The loga-
rithmic transformation is often used. Where this is known to be inapplica-
ble, then another transformation is needed. For example, cloud cover is not
well represented by the normal nor the log-normal distribution.
The clustering program discussed here was initially developed by Wolfe
(1) and modified by Crutcher and Joiner (2). It will accept any input data.
However, the criteria selected by the user to enable the computer to make
decisions are based on the assumption of normality of distribution. Any de-
parture from this assumption introduces some uncertainty in the results.
In particular, the outlying subsets may be examined for their validity.
The minimum number in a set is determined by the number of elements being
examined simultaneously. With five elements, the minimum subset will be
one more than five, or six.
72
-------
There is always enough uncertainty without introducing more. For example,
although the 0.05 probability level is selected for decision, departure from
normality may actually cause the decision to be made at some other level, but
this level will never be known.
ESSENTIAL PHILOSOPHY
Many elements and many observations may be treated. Computer capacity,
time, and money will be the controlling factors. Within these constraints
the user may wish to randomly select a representative sample of the data for
processing.
Most investigators choose to standardize their data. This produces
dimensionless numbers with means of zero and a variance of one for each el-
ement. A mean of zero and a variance equal to the square root of n, the
number of variables, are obtained in the multivariate case.
If the elements are uncorrelated and are homogeneous, a spherical clus-
ter of data points is obtained. If the data are correlated, the original
element axes are rotated so that along the new axes obtained, the new com-
ponents are not correlated. The new system will then be spherical in shape
if the data are homogeneous.
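These two steps can be sketched as follows (an illustrative Python sketch; the rotation here uses the eigenvectors of the correlation matrix, one common way to realize the axis rotation described above):

    import numpy as np

    def standardize(X):
        """Column-wise z-scores: each element gets mean 0 and variance 1
        (X is an observations-by-elements array)."""
        X = np.asarray(X, dtype=float)
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    def decorrelate(Z):
        """Rotate standardized data onto axes along which the new
        components are uncorrelated."""
        _, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
        return Z @ vecs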
CLUSTERING
If the data are clustered, even though the data are standardized, tests
of normality will be rejected. Therefore, the usual procedure is to cluster
data into probable groups. Then null hypotheses are established to compare
two groups against one, three against two, and so on until the null hypothesis
is not rejected.
Initial
The computer program may be set to establish any number of initial clus-
ters. Here, the first 40 entry data serve to establish 40 clusters, but
arbitrary clusters could have been inserted.
The number of elements is n so each datum is an n-vector with its point
in n-space. The 40 clusters represent 40 centroidal points in n-space. The
distances between the centroids are computed. The two closest are merged to
a new centroid which is a mean or average of the two. After merging, there
73
-------
are now 39 clusters. A new datum enters from storage to again fill out the
40 spaces reserved. This procedure is repeated over and over again until all
data have entered and have been assigned to one of the clusters. Variance
considerations or other distance measurements as well as distances between
the centroids can be used.
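The initial pass can be sketched as follows (an illustrative Python sketch; the Wolfe and Crutcher-Joiner program also supports variance-based merge criteria, which are omitted here):

    import numpy as np

    def initial_clusters(data, k=40):
        """Hold k centroids; repeatedly merge the two closest and admit
        the next datum into the freed slot until all data are assigned."""
        it = iter(data)
        cents = [np.asarray(next(it), dtype=float) for _ in range(k)]
        sizes = [1] * k
        for x in it:
            # find the closest pair of centroids
            i, j = min(((a, b) for a in range(k) for b in range(a + 1, k)),
                       key=lambda p: np.linalg.norm(cents[p[0]] - cents[p[1]]))
            # merge j into i as a size-weighted mean; slot j takes the new datum
            tot = sizes[i] + sizes[j]
            cents[i] = (sizes[i] * cents[i] + sizes[j] * cents[j]) / tot
            sizes[i] = tot
            cents[j] = np.asarray(x, dtype=float)
            sizes[j] = 1
        return cents, sizes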
Intermediate
Forty clusters were obtained initially. These forty clusters are com-
pared on an argument of the distance between centroids or on variance consid-
erations as before. The two most nearly alike are merged into one cluster.
The procedure continues until one final cluster remains.
Figure 1, taken from Figure 1 of Crutcher and Joiner (3), illustrates
in an abbreviated way the flow of 74 observations until final coalescence in-
to the final group. The final group is made up of the initial data but the
sequence is altered to show the entrance into the group. The data are 4-space
upper air observations at the Canton Island 30-mb surface (1960-1964). The
four elements of the subspace at the 30-mb surface are:
1. Height of the surface;
2. Temperature of the surface;
3. East-west wind component;
4. North-south wind component (orthogonal to east-west component).
Final
The user is required to provide the number of clusters wanted for review
and the probability level of rejection for the null hypothesis. The sequen-
tial tests are for k+1 groups versus k groups where k runs from 1 to 40. The
tests will continue until the null hypothesis is not rejected or until the
requested number of clusters have been examined.
Output
At the completion of the computer program, output is presented as:
1. The initial set of data in some sequence established by the user.
2. Matrix of observational setup and preliminary comparisons.
3. Forty groups (clusters) with identified input data and means.
4. Coalescence, step by step, into fewer and fewer groups until the
unlike final group is obtained.
5. Statistics with means and standard deviations for the main group
74
-------
[Figure 1. Schematic flow of the 74 observations through the initial 40 clusters and their stepwise coalescence into one final group, Canton Island 30-mb data (not legible in this copy)]
75
-------
and each subset. Each set comparison has the actual probability
level printed. An option in the program permits either the selec-
tion of the eigenvector-eigenvalue output or the correlation matrix
output.
6. Discriminant function scores for each datum.
7. In case the eigenvector-eigenvalue output is selected, computer
print-plots of discriminant scores are shown with appropriate
assignment of each datum to a cluster. The clusters are numbered.
8. The final printing shows the data in order of the input identified
by the cluster configuration assignment. This permits easy review
of the classification of each individual datum.
EXAMPLES
Table 1 is taken from Table 5, Crutcher and Joiner (3). The statistics
are for two clusters derived from a January data set, Canton Island 30 mb
surface data, 1953-67. There are 434 data. The set is separated into two
clusters which comprise 31 and 69 percent of the total set. The zonal com-
ponent of the wind speed, which is an average of -6.4 m/s, is separated into
two groups whose means are -28.1 m/s and 3.5 m/s.
Figure 2 is taken from Figure 3, Crutcher and Joiner (3). The figure
exemplifies the separation of the Canton Island January data in the 2-space
of the orthogonal components of the wind. The mean height and temperature
data are shown. The variances may be compared with data of Table 1.
APPLICATIONS
Data of any type may be examined to determine whether there are reason-
able subsets of homogeneous characteristics. As the standardization tech-
niques remove the dimensionality of the data, i.e., degrees, mph, meters,
grams, etc., any measurements may be used. Thus, application may be made to
environmental data ensembles which include measurements of elements such as
particulates, pollutants (gaseous), precipitation, wind, temperatures, pres-
sures, or changes of any of the above.
Pollutant source or likely deposition areas may be identified or sug-
gested. Extension from one observational point to several will permit
76
-------
TABLE 1. CANTON ISLAND, 30-MB DATA AND THEIR
SEPARATION INTO TWO CLUSTERS, 1953-67
                      January (N = 434)                   April (N = 509)
              Group 1    Group 2    Group 3       Group 1    Group 2    Group 3
              (total)    (easterly) (westerly)    (total)    (easterly) (westerly)
Data fraction   1.000      0.310      0.690         1.000      0.427      0.573
H (gpm)       23764.9    23720.6    23785.0       23801.3    23773.8    23821.9
sH (gpm)         94.9       65.6      100.0          81.8       71.0       84.5
T (°C)          -57.0      -57.4      -56.9         -54.9      -56.1      -54.0
sT (°C)           2.7        2.3        2.9           3.0        2.7        3.0
u (m/s)          -6.4      -28.1        3.5          -4.6      -20.9        7.6
su (m/s)         16.2        5.8        7.5          16.7       10.7        7.6
v (m/s)           0.4        1.1        0.0          -0.0       -0.2        0.1
sv (m/s)          4.3        4.7        4.1           4.0        3.7        4.2
ruv              -0.1        0.2        0.1           0.0        0.0        0.0

                      July (N = 558)                      October (N = 476)
              Group 1    Group 2    Group 3       Group 1    Group 2    Group 3
              (total)    (easterly) (westerly)    (total)    (easterly) (westerly)
Data fraction   1.000      0.470      0.530         1.000      0.330      0.670
H (gpm)       23938.0    23879.6    23989.7       23886.6    23829.3    23915.0
sH (gpm)         94.0       78.5       75.6         104.4       67.6      108.4
T (°C)          -53.6      -55.4      -52.1         -55.1      -55.9      -54.7
sT (°C)           3.1        2.2        3.0           2.8        2.2        2.9
u (m/s)          -4.3      -23.4       12.6          -3.8      -30.2        9.3
su (m/s)         19.6        5.8        9.3          19.7        4.7        7.3
v (m/s)           0.1        0.3       -0.2           0.2        0.0        0.3
sv (m/s)          3.9        3.3        4.3           4.2        3.7        4.4
ruv              -0.0        0.1        0.1           0.0       -0.1        0.0

(H, T, u, and v are means; sH, sT, su, and sv are standard deviations;
ruv is the correlation between the wind components.)
77
-------
Figure 2. Separation of the Canton Island 30-mb data in the 2-space of
the orthogonal wind components; mean heights and temperatures are shown
for groups 1-3.
78
-------
mapping in a topographical sense. These techniques can be applied to data
sets derived from topographical studies made with trigonometric, orthogonal,
or other types of polynomials. Recent advances in the computation of
orthogonal polynomials, known as asymmetric singular decomposition (ASD)
procedures, make these polynomials easier to obtain and use.
REFERENCES
1. Wolfe, J.H. NORMIX 360 Computer Program. Research Memorandum SRM 72-4,
Naval Personnel and Training Research Laboratory, San Diego, CA (1971)
125 pp.
2. Crutcher, H.L. and Joiner, R.L. Separation of Mixed Data Sets into Homo-
geneous Sets. NOAA Technical Report EDS 19, National Oceanic and Atmos-
pheric Administration, Asheville, North Carolina 28804 (1977) 165 pp.
3. Crutcher, H.L. and Joiner, R.L. Another Look at the Upper Winds of the
Tropics. J. Applied Meteorology, 16(5), (May 1977) pp. 462-476.
79
-------
ENGINEERING COMPUTATIONS AND DATA COLLECTION
FORMATS USEFUL IN DATA VALIDATION
by
A. Carl Nelson, Jr.
PEDCo Environmental, Incorporated
505 South Duke Street
Durham, North Carolina 27701
81
-------
ENGINEERING COMPUTATIONS AND DATA COLLECTION
FORMATS USEFUL IN DATA VALIDATION
A.C. Nelson, Jr.
A considerable number of "after-the-fact" data
validation techniques will be given during this one-day
conference. No attempt is made here to summarize all of these
techniques, but rather to indicate some of the important
validation procedures from the viewpoint of the
laboratory and field experts. The approach taken is as
follows. The question was asked of laboratory and field
experts: What are some of the important areas of data
validation in order to yield data of good quality? Some of
the areas listed are briefly described below for both ambient
air monitoring and source testing.
Data Validation - Ambient Air Monitoring Data
1. Audits
The purpose of audits should not be to point
a finger at the organization/team being audited.
In some cases the auditor can be in error. The
value of an audit is that it can identify a gross
bias or inaccuracy in reported data. One recent
example of a problem was in the use of an incor-
rect method. The audit pointed out the problem,
special instruction was given, and the condition
was hopefully corrected. It is possible that no
after-the-fact data validation techniques could
have identified the error in this case since there
was a bias throughout the region as the same
procedure was taught to all operators. The audit
has served its purpose well in this example.
82
-------
In EPA-sponsored audits the auditor is almost
always checked out at the EPA Quality Assurance
Branch (QAB) prior to conducting a field audit.
This provides a traceability to a common standard
and method.
2. Knowledge of the instrument
This is certainly one of the most important
considerations in obtaining good quality data.
The operator must know the sensitivities/inter-
ferences of the instrument. An example would be
the interference of CO2 concentration on an SO2
analyzer employing a Flame Photometric Detector.
A test was designed to check the possible sensi-
tivity, and the results indicated a definite and
reproducible relationship.
This particular type of error detected for a
particular analyzer could be very difficult to
detect by after-the-fact validation procedures.
Some independent and accurate check must be made
using an instrument which has been tested for
possible interferences/sensitivities. If the
sensitivity of an instrument to an interference has
been precisely determined and is reproducible,
then a correction can be made to obtain the result
which would have been observed if no sensitivity
existed; this was true in the case mentioned.
Another means of gaining information about
the instrument/method is to design a ruggedness
test to check out possible gross factors/steps
which may have a significant effect on the results
or measurements if appropriate control is not
exercised.
83
-------
3. Interlaboratory tests
Participation in these tests provides a means
of checking the laboratory analysis methods and of
validating the current multipoint calibration
curve. The feedback of information from the
laboratory performing the overall analysis of the
results from all participating laboratories is
most important. For example, if a laboratory is
consistently in error for a particular analysis or
range of concentrations, then this laboratory must
have some means of correcting this problem through
communication with the overall test laboratory or
some representative thereof.
The performance survey is a very good means
of validating data. Furthermore, it is also a
good source of information about what can be
expected from a particular analysis procedure.
In a conversation with a supervisor in one
laboratory, he indicated that they were performing
their CO analysis incorrectly and that the per-
formance survey helped them to identify a problem
which they did not know they had.
On a small scale, three or four laboratories
could set up their own interlab test for a par-
ticular analysis for which no performance survey
data are being obtained by EPA, NIOSH or some
other agency.
4. Standards traceable to an NBS standard
Often one hears that calibration gases are
not accurately analyzed. It is thus necessary
that the user check the calibration gases prior to
their use in developing new calibration curves.
All measurements must be traceable ultimately to a
primary standard.
84
-------
Significant errors in calibration gases can
usually be determined by a check against the pre-
vious calibration curve obtained using the most
recent gas. EPA's QAB has developed a protocol for
traceability of gases.
5. Data reduction
The raw data must be recorded legibly and
completely on appropriate data formats. The
calculations should be checked either completely
or on a sampling basis. In this manner the equations
used, the substitution of the correct values in
these equations, and the calculated results are
all checked. This should be an internal audit as
well as a part of an external audit.
6. Other considerations
Some other considerations are the more routine
types of quality control and assurance techniques
which are primarily internal functions. Some of
these techniques are the use of blind reference
samples, quality control limits for internal
checks of reference samples, comparison checks of
two or more calibrators, ruggedness tests, inter-
nal audits by an independent operator, and chain
of custody procedures.
Data Validation - Source Tests
The results of the first series of collaborative source
tests clearly showed that more quality control and data
validations were needed to ensure good quality data. The
results of the first Method 5 (particulate) collaborative
test, using average testing teams and no special quality
control, produced a relative standard deviation for each run
in excess of 50 percent, with the outliers thrown out. As a result
85
-------
of this poor reproducibility, several quality control and
data checks were incorporated into the collaborative test
series. These controls and the use of selected testing
firms produced results that were repeatable to within about
10 percent for each run. Most of these additional quality
control checks are now detailed or implied in the revised
methods contained in the Federal Register, August 18, 1977.
The collaborative test series showed two other areas of
concern with respect to quality control and data validation.
The first was that the methods and additional written proce-
dures were thought to be clearly written as to their exe-
cution. This was found not to be the case in the early
collaborative tests as many variations were noted in the
performance of the methods. It became obvious early in the
program that the performance of the average testing team
should be observed by a qualified observer. Also, the nature
of most of the errors was such that they could not or would
not have been detected by any data validation on the emis-
sion test report.
The second area of concern was that most of the quality
control techniques were executed prior to the performance of
the field test and the assumption was made that all com-
ponents remained unchanged during testing. Two examples are
dry gas meter calibration and the pretest leak check. If
the dry gas meter calibration changed during testing or if a
leak developed in the sampling train, this would not be
detected.
The collaborative testing program has clearly demon-
strated that to properly perform the needed data validations,
controls must be clearly defined and observed before, during
and after each test series. Data validation of uncontrolled
and unobserved sampling is not effective as a general rule
and usually will not clearly determine acceptability or
unacceptability of data.
86
-------
The revised methods published August 18, 1977 contain
many equipment performance calibrations and validations.
The examples used before, a dry gas meter calibration and
leak check only prior to testing, have now been changed to
include a post-test leak check and a post-test meter cali-
bration.
The best method for data validation has become equipment
performance validation. If the equipment is operating
properly the data should be accurate/precise within the
determined limits for that method.
Source Test Report Review
One aspect of source testing report review involves
checking the results, not only that the correct equations
were used and there were no mathematical errors, but also
that the correct values were used as inputs into the equations.
The latter requirement can be checked quickly if all of the
required data were measured and recorded legibly, and the
raw data sheets are submitted with the report. Any report
which includes a computer listing of the raw data, instead
of the original data sheets, should be rejected.
The degree to which the calculations should be checked
is generally a function of the consistency of the results
and the reviewer's confidence in the tester's ability. The
various levels of review possible for the calculations would
be (1) none at all, (2) random spot checks, (3) complete
review of results which seem inconsistent, with respect to
each other or to typical results, (4) complete review of one
randomly chosen run, and (5) complete review of all runs.
There are some empirical techniques that can be used to
check or validate process and sampling data provided by the
tester and the source. In some cases, the sampling data
from the tester can be used to check process data supplied
DSSE Workshop - draft report.
87
-------
by the source. Some of the available techniques are given
herein. The experienced reviewer will ultimately develop
his own list of short cuts, cross checks, and rules of
thumb.
1. Barometric Pressure
Incorrect barometric pressure measurement
will not generally cause errors of more than 10 to
15 percent, but it is a very common error. The
value reported by the tester can be checked in two
separate ways: (1) At sea level, the barometric
pressure is almost always between 29 and 31 inches
of mercury, and usually close to 30. For every
1000 feet above sea level, the value will decrease
by 1.1 in. Hg. Therefore, if a test is run in
Denver, with an elevation of 5000 feet above sea
level, the barometric pressure reported should be
from 23.5 to 25.5 inches of mercury. (2) The
reviewer can call the airport closest to the test
site and ask for the "station" pressure (not
corrected to sea level) for the date of the test.
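As a sketch, the first check reduces to a few lines of Python; the
function name and the plus-or-minus 1 in. Hg tolerance are illustrative.

    def expected_station_pressure(elev_ft, tol_in_hg=1.0):
        # Sea-level barometric pressure is normally 29-31 in. Hg (about 30),
        # and station pressure falls roughly 1.1 in. Hg per 1000 ft.
        center = 30.0 - 1.1 * elev_ft / 1000.0
        return center - tol_in_hg, center + tol_in_hg

    # Denver, about 5000 ft: expected range is roughly 23.5-25.5 in. Hg.
    lo, hi = expected_station_pressure(5000.0)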
2. Leak Tests
If the report claims that leak tests were
performed either before each test or after filter
changes, the dry gas meter readings on the data
sheet would indicate this. In other words, it is
unlikely that a leak test was done before run #2
if the final volume reading for run #1 is the same
as the initial volume reading on run #2. If a
leak test was made in the middle of the run (because
of a filter change, for example), the volume
readings before and after the leak test would be
-------
shown on the data sheet, so that the computed
meter volume could be adjusted accordingly.
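A minimal sketch of this meter-reading consistency check, assuming each
run is recorded as an (initial, final) pair of dry gas meter volumes in
test order (names illustrative):

    def runs_without_leak_check(runs):
        # A pre-test leak check consumes some metered volume, so a run
        # whose initial reading exactly equals the previous run's final
        # reading suggests the claimed leak check was not performed.
        suspect = []
        for k in range(1, len(runs)):
            if runs[k][0] == runs[k - 1][1]:
                suspect.append(k + 1)   # run numbers are 1-based
        return suspect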
3. Moisture Data
The results presented in the report for the
volume percent of water vapor in the gases sampled
can be checked in several different ways. For any
combustion source, the moisture content can be
approximated by use of nomographs if the reviewer
calculates the excess air and can estimate the
ambient temperature, ambient humidity, and the
free water in the fuel. Hopefully, the process
data will include an analysis of the fuel. If
not, use zero for gas and oil, 10 percent for
bituminous coal, and 25 percent for lignite, bark,
wood, and refuse unless the fuel has been rained
on recently. If the best estimates available are
ranges, use the high and low estimates to bracket
the moisture content.
Entrained droplets of liquid water in the
stack gases can yield an erroneously high moisture
content. All moisture data should be checked
(even if there are no entrained water droplets) to
ensure that the reported value is not higher than
the saturation moisture content. Nomographs
provide moisture content at saturation as a function
of stack absolute pressure and stack gas temperature.
If the reported value is higher than the maximum
read from the nomograph, the data are suspect.
Generally, if the high reading was caused by
entrained water droplets, the value is adjusted to
the saturation moisture content.
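A hedged sketch of this saturation check, using the Antoine equation for
water in place of a nomograph (the equation is valid only to about 100°C,
and the names are illustrative):

    def saturation_moisture_fraction(stack_temp_c, stack_press_in_hg):
        # Antoine equation for water, giving saturation vapor pressure
        # in mm Hg for temperatures of roughly 1-100 C.
        p_sat_mm = 10 ** (8.07131 - 1730.63 / (233.426 + stack_temp_c))
        p_stack_mm = stack_press_in_hg * 25.4   # in. Hg to mm Hg
        # Volume (mole) fraction of water vapor at saturation.
        return p_sat_mm / p_stack_mm

    # Data are suspect when the reported volume fraction exceeds
    # saturation_moisture_fraction(stack_temp, stack_pressure).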
89
-------
In sources where the process involves drying
(removing water) from a raw material or product, a
water balance across the process should validate
the moisture data in the report. Remember to
include the water introduced as humidity in the
ambient air.
4. Orsat Data
For any combustion source, the relative
amounts of oxygen and carbon dioxide in the flue
gases can be predicted by the use of a nomograph.
When a report is submitted containing orsat data
(or C02 and 02 data from any other instrument),
the data can be checked by aligning the type of
fuel with the %CO2 and checking the %O2 from the
nomograph with the reported value. If the results
do not check, it indicates that there is a problem
with the reported data.
This nomograph also gives the percent excess
air based on the type of fuel and orsat analysis.
The reviewer should be cautioned that if the orsat
data were taken after a water scrubber, the nomo-
graph will not work, since the scrubber will
remove an indeterminate amount of carbon dioxide.
5. Volumetric Flow Rate Data
The volumetric flow rate is difficult to
cross-check accurately, but there are several ways
of determining if the reported values are in the
"ball-park". In any duct or stack where the air
is moved by a blower, the design criteria generally
result in a gas velocity of 25-40 feet per second.
90
-------
The idea is that higher velocities cause prohi-
bitive pressure losses, and lower velocities are
uneconomical due to the cost of the duct work.
Since the size of many stacks is dependent on
structural strength or future needs, the check
works best for the duct work leading to the stack.
If the velocity measurements are made in the
stack, and the stack cross-sectional area is much
larger than that of the duct work, apply the
25-40 feet per second check by dividing the
volumetric flow rate by the duct area. If there
is no fan or blower in the process, such as with a
natural draft boiler or incinerator, the flow will
generally be 5-15 feet per second. Keep in mind
that the ranges given here are not theoretical
limits, but merely commonly encountered values.
If the test results presented do not fall within
these ranges, it is only a signal to look at the
velocity data more closely.
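As a sketch (names and units illustrative), this ball-park velocity
check reduces to:

    def velocity_in_ball_park(flow_acfm, duct_area_ft2, fan_present=True):
        # Convert actual cubic feet per minute to feet per second and
        # compare with the commonly encountered ranges quoted above
        # (not hard limits).
        v_fps = flow_acfm / duct_area_ft2 / 60.0
        lo, hi = (25.0, 40.0) if fan_present else (5.0, 15.0)
        return lo <= v_fps <= hi, v_fps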
In reviewing test results, it is always
desirable to have available the results from any
previous tests on the same source, previous tests
on any similar sources (such as an identical unit
at the same plant), or tests performed at the
inlet to the control device. If the inlet tests
were done simultaneously with the outlet tests,
the volumetric flow rates (corrected to standard
conditions) should match from inlet to outlet. If
the control device uses water, the checks should
be made on a dry basis. Air leakage in or out of
the control device can occur, which would lessen
the value of this check, but air leakage can
generally be identified by a change in the moisture,
91
-------
temperature, or CO2 content from inlet to outlet.
Since inlet tests are not usually done for com-
pliance, they are often performed in ducts with
little or no straight run, which can cause higher
than real velocity data, a factor to consider when
making inlet-outlet comparisons.
Many sources have fan performance curves for
the fans used in the process, and these can be
used as a check against the reported flow rate
data. The gas flow moved by the fan is a function
of the pressure head produced (or induced, or
both), the gas temperature, the gas composition,
and the fan speed (rpm). Unless all of these
factors are controlled or quantified (which is a
rare situation) the fan curves can only be used to
estimate or roughly check the flow rates.
When process equipment and/or control devices
are designed, there is generally a design speci-
fication on volumetric flow rate. If these speci-
fications are available (from the source or from
permit forms) they can be used to check the tester's
results.
6. Process Data
There are probably as many different ways to
check process data as there are types of processes.
Some are so much a part of a particular process
that they could not all be discussed here. There-
fore, if the checks mentioned herein are not
adequate for the process in question, then that
process should be studied (using the literature
and communications with the source) to determine
if some additional checks are available for use.
92
-------
For many processes, the production rate is
relatively constant from day to day. In this
case, the production rate reported should compare
favorably with the annual production rate (or
annual raw material usage rate) divided by the
number of operating days.
In a case where the reviewer wants to compute
the production rate from the raw material rate, or
compute the raw material rate to check the produc-
tion rate, the principle of material balances
should be employed. If one ignores nuclear
reactions, then it can be stated that in any
process, matter will be neither destroyed nor
created. This means that any materials entering
the process must either accumulate or leave the
process (in minus out equals accumulation). The
material balance can be done on all components of
the process stream, or it can be limited to a
single component such as water or carbon dioxide.
Drying operations, like grain dryers, are
good examples of sources which adapt readily to a
water balance. Water enters the process from the
grain itself, from the drying air (which is
generally ambient air), and from the combustion of
fuels containing hydrogen; it leaves as water
vapor in the exhaust and as residual water in the
grain. The stack test data provide the total
water vapor leaving the dryer, by multiplying the
total gas flow rate by the percent water vapor,
and converting the result to a mass rate. From
the amount of fuel burned, one can compute the
water vapor produced by the combustion. If the
93
-------
ambient temperature and relative humidity are
known, the water supplied by the drying air can be
computed. From the symbols shown in Figure 1, the
     A - water from ambient air --+
     B - water from grain        --+-->  DRYER  -->  D - water vapor in exhaust
     C - water from combustion   --+                 E - residual water in grain

                     Figure 1. MATERIAL BALANCE
water balance would be:

    A + B + C = D + E,

and A, C, and D have been computed. If F is the
weight of grain dried, W is the inlet moisture
fraction for the grain, and W' is the outlet
moisture fraction for the grain, then

    B = WF  and  E = W'F.

Substituting these expressions in the above equation
and solving for F yields:

    F = (D - A - C) / (W - W').
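In Python the balance solves directly (names illustrative; all water
terms must be in the same mass-rate units):

    def grain_dried_weight(D, A, C, W_in, W_out):
        # Water balance A + B + C = D + E with B = W_in*F and E = W_out*F,
        # solved for F, the weight of grain dried.
        return (D - A - C) / (W_in - W_out)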
7. Emission Results
Unfortunately, the most difficult data to
validate are the emission results, which also are
the most important data to validate. Emission
rates for gaseous pollutants, such as SO2 and NOx,
can often be checked against process parameters,
but this is because these pollutants are rarely
controlled. For example, since essentially all
the sulfur present in coal or oil will be liber-
ated as S02 during combustion, a sulfur balance
94
-------
should yield a good check of SO2 emission results.
For a specific design of boiler or incinerator,
the amount of NOx produced can be estimated from
the emission factor2 published by EPA.
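A sketch of such a sulfur balance (illustrative; it assumes all fuel
sulfur leaves as SO2):

    def so2_from_sulfur_balance(fuel_rate_lb_hr, sulfur_mass_fraction):
        # Each pound of sulfur (MW 32) liberated during combustion
        # yields two pounds of SO2 (MW 64).
        return fuel_rate_lb_hr * sulfur_mass_fraction * 64.0 / 32.0

    # Example: coal fired at 10,000 lb/hr with 2 percent sulfur gives
    # about 400 lb/hr of SO2 as a bench mark for the reported rate.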
The generation of particulate pollutants is a
function of a large number of process parameters,
many of which cannot be measured. Emission factors
are available for many sources of particulates,
but most of these sources use some type of control
device to remove the bulk of particulates prior to
the stack exhaust. As an example, consider the
particulate emissions from a utility boiler with
an electrostatic precipitator, or from an asphalt
batch plant with a baghouse collector. The emission
factors for these sources, prior to the control
device, are listed in the emission factor book and
the literature, and for now it will be assumed
that they are accurate. The control devices that
are used would have a design efficiency of 99 to
99.5 percent, but the actual efficiency could
range from 50 to 99.9 percent. What this is
saying is that if the uncontrolled emissions are
100 pounds per hour, the design efficiency would
yield emissions of 0.5 to 1.0 pounds per hour, but
the actual emissions could range from 0.1 to 50
pounds per hour. The emission factor book lists
factors for various types of control devices, but
these are design efficiencies, and the reviewer
should resist paying much attention to them. They
only reflect the emissions if the control equip-
ment is operating at its design efficiency, and if
that is assumed to be true, then there is no need
to perform a compliance test in the first place.
2 "Compilation of Air Pollution Emission Factors," U.S. EPA,
Publication No. AP-42.
95
-------
One approach which is often suggested (and
used) by control agencies is the idea of comparing
the three runs to one another. In other words,
the validity of the data can be measured by the
proximity of the three results to the average.
This would work if all of the variation in the
results was a function of random sampling errors.
Under this assumption, data could be handled as
in the following examples: (1) the three results
are 2, 3, and 4, and the reported emission rate is
3, and (2) the three results are 2, 4, and 15, the
15 value is thrown out as an outlier, and 3 is
reported as the emission rate. The second example
says that since 2 and 4 are close to one another,
and 15 is not close to 2 or 4, that the 15 must
represent a gross sampling error, and only the 2
and 4 should be averaged together to get the
emission rate.
Several additional considerations should
discourage the reviewer from applying this vali-
dation technique. There is no question that three
nearly identical results will instill confidence
in the reviewer's mind, and that three widely
different results will reduce that confidence.
Using the example above, however, with the results
of 2, 4, and 15, how can the observer tell what
the rest of the "population" looks like? Had four
samples been taken instead of three, with results
of 2, 4, 15, and 15, and had the last three been
reported instead of the first three, the 4 would
have been thrown out as an outlier and 15 would
have been reported as the emission rate. Process variations
can occur during testing that could produce ten-
to-one variations in the actual emission rates,
96
-------
and these variations can occur at any time,
without any warning, and often without being
noticed.
As a final note, any statistician can supply
dozens of methods for evaluating a set of results,
including ways to calculate confidence limits and
eliminate outliers. Any statistician will also
tell you, however, that a single set of three
results is really too small to study statistically.
And all the statistics in the world cannot replace
common sense.
Summary
In summary, data validation must be an integral part of
the data collection, analysis, reduction, and reporting
process. Several useful data validation techniques which
will aid in detecting large inaccuracies in the reported
results are described. However, there are obvious limitations
to the types of data inaccuracies which can be identified in
both ambient and source tests as pointed out in this paper.
It is hoped that the techniques suggested herein, from the
viewpoint of laboratory and field experts, will stimulate
further discussion.
Acknowledgement
Lawrence Elfers and William DeWees of PEDCo Environmental
provided much of the information which is briefly summarized
herein. In addition, appreciation is due to Entropy Environ-
mentalists, Incorporated, because the draft copy of the Division
of Stationary Source Enforcement Workshop report contains information
provided by this organization. I hope that this paper does not
misinterpret their written and verbal suggestions. These inputs
are greatly appreciated.
97
-------
VALIDATION PROCEDURES APPLIED TO IN-USE
MOTOR VEHICLE EMISSION DATA
by
Marcia E. Williams
Office of Mobile Source Air Pollution Control
U.S. Environmental Protection Agency
Ann Arbor, Michigan 48105
99
-------
VALIDATION PROCEDURES APPLIED TO IN-USE
MOTOR VEHICLE EMISSION DATA
M.E. Williams
ABSTRACT
One of the functions of the Office of Mobile Source Air Pollu-
tion Control is the collection and subsequent assessment of data on
the emission performance of in-use vehicles. On an annual basis,
over 4.5 million fields of data are collected. These data must be
carefully validated before they are used by EPA, by other government
agencies, and by private citizens. The current data editing proce-
dures are designed to be fairly routine but quite complete. Systematic
problems are eliminated with thorough laboratory facility check-
outs, frequent calibration checks, and correlation programs with the
EPA laboratory.
The data editing procedure is divided into two parts: manual
editing of the large number of supporting data forms and strip
charts, and computer editing of all data cards. This edit procedure
has detected error rates of 14 to 32 percent in the manual phase and
5 to 50 percent in the computerized phase. Most of these errors are cor-
rectable and less than five percent of tests are invalidated.
Although some of the errors can be found in either the computer or
manual phase, many errors can only be detected in one of the two
phases. To avoid needless effort, the phases are performed in
series rather than in parallel.
Editing costs are less than two percent of the total program
cost and the current edit program is estimated to achieve a final
error rate of about one percent. Future changes to the editing
procedure focus on reducing the EPA manpower requirements without
sacrificing the current quality level. An effort will be undertaken
to determine how much effect various types of errors have on the
ultimate uses of the data so that the question of "How good do the
data have to be?" can be factored into the design of data valida-
tion methodology.
100
-------
BACKGROUND
The Office of Mobile Source Air Pollution Control is responsible
for generating a data base on the in-use emission and fuel economy
performance of all mobile sources. These data are used by many
groups within EPA as well as by other Federal agencies, state and
local governments, private industry, and private citizens. A fairly
complete list of uses for emission factor data is given in Table 1.
Different degrees of data accuracy are needed for different data
applications. Most applications are concerned with having an accurate
estimate of average emissions or fuel economy. However, those items
in Table 1 which are notated with an asterisk require that the data
on every vehicle be completely accurate. At this point in time, the
data edit procedure is geared to ensure that all fields of data are
correct.
Table 2 provides a list of typical ongoing test programs. On
an annual basis, OMSAPC spends between 2.5 and 4.5 million contract
dollars on characterizing the performance of in-use vehicles. Each test
program involves the procurement and subsequent testing of consumer
owned vehicles. Vehicle owners are given incentives such as a U.S.
savings bond, a loaner car, and a free tank of gas to participate in
the EPA test program. Vehicles are then tested over a variety of
different test sequences. In some cases, entire test sequences are
repeated with vehicles in different states of tune or with ambient
test conditions varied. Table 3 lists the types of variables which
are collected for each vehicle test. For each vehicle test sequence,
there are 150 to 600 pieces of information gathered. For each
vehicle tested, there are from one to six test sequences performed.
Thus, on an annual basis, approximately 4.5 million fields of data
are collected and must be validated.
GENERAL APPROACH
The data validation procedure begins with the assumption that
there is no systematic bias or error in the data. Systematic errors
are prevented by the development of detailed test procedures, record-
keeping procedures, and mandatory recording formats. Each contractor
must undergo a rigorous facility check-out at the beginning and end
of each test program. The check-out includes performance tests for
all contractor personnel including equipment operators, drivers,
test technicians, and data handlers. In addition, EPA personnel
specify frequent calibration checks on all equipment, carry out
101
-------
reference gas and vehicle correlation testing against the EPA
laboratory in Ann Arbor, and implement both announced and unan-
nounced contractor inspections throughout the duration of each test
program. Table 4 details the procedures which are implemented to
prevent systematic bias.
All EPA contractors are required to perform data validation
before submitting any data to EPA. EPA contracts specify that the
contractor must use some form of computerized edit procedure in
addition to a manual procedure. However, the exact contractor
procedure is not specified. Since contractors are not paid for
tests until EPA accepts the tests as valid, an incentive exists to
submit correct data as soon as possible after the completion of a
vehicle test.
The EPA edit procedure is diagramed in Figure 1. The manual
and computer aspects of data validation are carried out in series to
avoid needless manpower effort and to ensure that at no time will
data files contain any data that are not validated. The manual edit
procedure is performed first and many of the steps in that procedure
are listed in Table 5. The manual procedure concentrates on strip
chart data including the driving trace and the emission concentrations.
Although there is a trend toward contractor computerization of these
items, a fully computerized data acquisition system is expensive and
is not required by EPA due to the short term (annual), fixed-price
nature of EPA contracts. Thus, errors in following the appropriate
driving trace and properly zeroing and calibrating analyzers can
only be detected in the manual phase.
Table 6 presents the types of checks which are performed in the
computer edit procedure. As shown in Figure 1, the computer editing
does not occur until the manual edit checks indicate a potentially
valid test. Table 7 summarizes the types and the severity of errors
which have been detected. Tests are invalidated only in cases where
test procedure errors are uncovered or in cases where key data are
missing. Table 8 lists typical reasons that complete test sequences
have been invalidated.
The one type of error which the current edit procedure is not
specifically designed to detect is discrepancies between the computer
cards and the supporting documentation. If each computer card entry
is within range and consistent with other computer data fields, it
will not be flagged. For example, if a highway emission result is
incorrectly keypunched as 4.52 instead of 4.92, it would not be
detected since both numbers could be equally valid. One would have
to examine the analyzer trace and the data packet notation to know
which value was correct. Since data are double keypunched, these
102
-------
types of errors are assumed to be minimal. However, consideration
is being given to the implementation of an acceptance sampling
procedure to ensure that information on the data cards matches
information in the supporting documentation.
RESOURCES AND RESULTS
Table 9 indicates the EPA detected error rates in three recent
test programs. In each case, the error rate is the percentage of
vehicles with at least one detected error; some vehicles may have
multiple errors. The range of detected error rates clearly indicates
that not all contractors employ the same levels of quality control.
However, despite the high detected error rates, less than five
percent of total tests are invalidated; most errors can be corrected.
Table 10 summarizes the required EPA editing resources for two
recent test programs. These resources are examined as a fraction of
total contract cost in Table 11. Assuming that EPA manpower used to
perform data editing costs $20,000 per manyear, the EPA cost for
data validation is about two percent of total contract cost. With
this resource effort, it is estimated that the undetected error rate
is less than one percent.
FUTURE APPROACHES
Table 12 lists a number of additional data validation approaches.
More automated data acquisition is being implemented in one ongoing
contract. The system has taken considerable time to debug and it is
too early to judge the cost-effectiveness of this approach. In
recent contracts, EPA has increased the contractor data validation
requirements. Again, it is too early to determine whether this
action will prove to be a cost-effective way to achieve a low final
error rate. Finally, in one large test program, EPA has stationed
personnel at the contractor's site on a full time basis. Again, the
effect on final error rate is not yet known.
Table 12 lists three additional approaches which have not yet
been implemented. If improved contractor error rates can be achieved,
EPA will attempt to reduce dedicated edit manpower by implementing a
general spot check procedure. Such procedures are based on statistical
principles and one such procedure is outlined in detail in Tables 13
and 14.
Before edit procedures can be implemented which attempt to
lower the cost of editing by applying statistical procedures to key
data fields, two important philosophical questions must be answered.
First, it must be determined how good the data have to be. Variables
103
-------
of maximal interest need to be specified and in each case, the
confidence and range within which the variable needs to be known
must be determined. The second major area of uncertainty requires a
determination of the impact that various errors make on the vari-
ables of maximal interest. Answers to these questions are
currently being pursued so that statistical edit procedures can be
considered in more detail.
104
-------
Table 1
USES FOR TEST DATA
EPA
 1. ASSESSMENT OF EMISSION AND DETERIORATION RATES FOR AP-42
    (HANDBOOK OF AIR POLLUTION EMISSION FACTORS)
 2. DEVELOPMENT OF EMISSION AND FUEL ECONOMY CORRECTION FACTORS
    FOR AP-42
 3. COMPARISON OF IN-USE LEVELS WITH CERTIFICATION/ASSEMBLY
    LINE LEVELS
 4. DETERMINATION OF REASONS FOR POOR IN-USE VEHICLE PERFORMANCE
 5. ASSESSMENT OF SHORT TEST/FTP CORRELATABILITY FOR SEC. 207(B)
    (APPLICABLE SECTION OF THE CLEAN AIR ACT)
 6. ASSESSMENT OF INSPECTION/MAINTENANCE BENEFITS
*7. EVALUATION OF IN-USE VEHICLE COMPLIANCE WITH STANDARDS -
    SUPPORT FOR AGENCY RECALL PROGRAM
*8. COMPARISON OF PRODUCTION/PROTOTYPE FUEL ECONOMY LEVELS
 9. SUPPORT FOR REGULATION DEVELOPMENT PACKAGES - ENVIRONMENTAL
    IMPACT ANALYSES
10. PRIORITIZATION OF AGENCY REGULATION/COMPLIANCE PROGRAMS
OTHER USERS
 1. HIGHWAY ENVIRONMENTAL IMPACT STATEMENT (EIS) WORK
 2. INDIRECT SOURCE REVIEW
 3. REGION/STATE EMISSION INVENTORY WORK
 4. EVALUATION OF IMPROVED PUBLIC TRANSPORTATION SYSTEMS
 5. EVALUATION OF VEHICLE MILES TRAVELED (VMT) REDUCTION STRATEGIES
 6. GENERAL TRANSPORTATION CONTROL PLAN (TCP) EVALUATION
 7. FUEL AVAILABILITY STUDIES
 8. AIR QUALITY MODEL INPUTS
 9. STATE IMPLEMENTATION PLAN (SIP) CONFORMANCE WITH AMBIENT
    STANDARDS
10. HEALTH ASSESSMENT STUDIES
105
-------
Table 2
TYPICAL ONGOING TEST PROGRAMS
 1. ANNUAL IN-USE AUTOMOBILE TESTING PROGRAM
    A. 7 CITIES
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 32000 VEHICLES PER YEAR
 2. ANNUAL AUTOMOBILE RESTORATIVE MAINTENANCE TESTING PROGRAM
    A. 3-4 CITIES
    B. PRIMARILY NEW MODEL-YEAR VEHICLES
    C. EXTENSIVE DIAGNOSTIC AND MAINTENANCE WORK PERFORMED
    D. 3400 VEHICLES PER YEAR
 3. IN-USE LIGHT DUTY TRUCK TESTING PROGRAM
    A. MULTIPLE CITIES
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 3200 VEHICLES PER YEAR
 4. IN-USE HEAVY DUTY TRUCK TESTING PROGRAM
    A. SINGLE CITY
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 3200 VEHICLES IN FY78
 5. IN-USE MOTORCYCLE TESTING PROGRAM
    A. 2 CITIES (HIGH AND LOW ALTITUDE)
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 3250 VEHICLES IN CURRENT TEST PROGRAM
 6. INSPECTION/MAINTENANCE DEMO PROJECT
    A. PORTLAND, OREGON
    B. 1972-1977 MODELS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS AND
       AMBIENT CONDITIONS
    D. 33000 VEHICLES AND 6000 TESTS
 7. OTHER SMALL TESTING PROGRAMS
    A. MOPEDS
    B. DIAL-A-RIDE BUSES
    C. AMBIENT TEMPERATURE TESTING
    D. GOOD TECHNOLOGY VEHICLES
106
-------
Table 3
TYPES OF VARIABLES COLLECTED FOR EACH TEST
 1. IDENTIFICATION DATA
    MODEL YEAR
    MAKE
    MODEL
    ENGINE DISPLACEMENT
    CARBURETOR VENTURIS
    CATALYTIC CONVERTER
    AIR PUMP
    NUMBER OF CYLINDERS
    TRANSMISSION TYPE
    VEHICLE IDENTIFICATION NUMBER (VIN)
    ENGINE FAMILY CODE
 2. EMISSION DATA
    IDLE DATA (2 POLLUTANTS)
    SHORT TEST DATA (3 POLLUTANTS, UP TO 5 TESTS)
    FTP DATA (4 POLLUTANTS, 3 BAG VALUES, COMPOSITE VALUE)
    OTHER CYCLES (4 POLLUTANTS)
    FUEL ECONOMY DATA
    EVAPORATIVE EMISSION DATA
    SULFATE EMISSION TESTS
    PARTICULATE EMISSION TESTS
    METHANE - NON-METHANE MEASUREMENTS
    MODAL TESTING
 3. AMBIENT CONDITIONS
    TEMPERATURE
    HUMIDITY
    BAROMETRIC PRESSURE
 4. TEST CONDITIONS
    ROAD LOAD HORSEPOWER
    INERTIA WEIGHT
    SOAK TIME
    PRE-CONDITIONING SCHEDULE
 5. PARAMETRIC DATA
    ENGINE IDLE SPEED
    ENGINE TIMING
    MANUFACTURER SPEC VALUES
    COMPLETE DIAGNOSTIC CHECKS (9 MAJOR VEHICLE SYSTEMS,
    5-10 COMPONENTS PER SYSTEM)
    TAMPERING DATA
    DRIVEABILITY DATA
 6. OWNER QUESTIONNAIRE
    NUMBER OF TRIPS PER DAY
    NUMBER OF MILES PER YEAR
    TYPE OF DRIVING
    TYPICAL PASSENGER LOADING
    LAST MAINTENANCE PERFORMED - TYPE AND COST
    FUEL ECONOMY ESTIMATE
    TYPE OF GASOLINE USED
107
-------
Table 4
PROCEDURES TO PREVENT SYSTEMATIC BIAS
 1. SPECIFICATION OF GAS ANALYZER, DYNAMOMETER, AND CVS*
    EQUIPMENT (WITH PROVISIONS MADE FOR EQUIVALENT SUBSTITU-
    TIONS).
 2. EPA NAMES REFERENCE GASES.
 3. EPA SPECIFIES ANALYZER CALIBRATION PROCEDURES INCLUDING
    GAS TYPES, CYLINDER FITTINGS, IMPURITY LEVELS, CURVE-
    FITTING PROCEDURE, AND REQUIRED ACCURACY.
 4. SPAN GASES, SIMILAR TO CALIBRATION GASES, ARE SPECIFIED.
 5. SYSTEM PLUMBING MATERIALS ARE SPECIFIED.
 6. EQUIPMENT CHECKS ARE SPECIFIED
    A. DAILY LEAK CHECKS - CVS AND ANALYTICAL SYSTEM
    B. WEEKLY COMPLETE CURVE CHECK - ANALYTICAL SYSTEM
    C. DYNAMOMETER WARM-UP PROCEDURES
    D. COMPLETE CURVE CHECKS AFTER ANY SYSTEM MAINTENANCE
    E. DYNAMOMETER CALIBRATED BI-WEEKLY
    F. SAMPLE BAGS LEAK CHECKED BEFORE EACH TEST
    G. DAILY NOx ANALYZER CONVERTER EFFICIENCY TEST
    H. COMPLETE DAILY LOGS OF ALL GASES, CALIBRATIONS,
       MAINTENANCE, ETC.
 7. MAXIMUM BACKGROUND LEVELS SPECIFIED.
 8. COMPLETE EPA FACILITY CHECK-OUT INCLUDING TESTS OF
    CONTRACTOR PERSONNEL.
 9. UNANNOUNCED EPA VISITS.
10. CORRELATION TESTING WITH EPA LAB.

* CONSTANT VOLUME SAMPLER
108
-------
Data cards and supporting data arrive at EPA, and their arrival is
logged into the record book. The supporting data are screened; if
they are not acceptable, the contractor is called for additional
data or clarification, and if no solution is found the vehicle is
eliminated and the log book updated. Acceptable data cards go to
the computer group. Data card corrections are received and, if
acceptable, the vehicle is added to the file and the log book is
updated.

Figure 1. Flow Diagram of Edit Procedure
109
-------
Table 5
STEPS IN MANUAL EDIT PROCEDURE
[Table printed sideways across pages 110-114 of the source; the
individual steps are not legible.]
114
-------
Table 6
TYPES OF COMPUTER CHECKS
 1. RANGE CHECKS ON ALL VARIABLES
    EXAMPLE: 68°F ≤ DRY BULB TEMP ≤ 86°F (VALID TEST RANGE)
 2. ID CHECK FOR ALL TESTS WHICH SHOULD BE INCLUDED FOR
    EACH VEHICLE
    EXAMPLE: A SET OF 1/0 CODES IS BUILT INTO THE VEHICLE
    ID INDICATING WHETHER FTP, HFET, EVAP., SULFATE, MODAL,
    SHORT TESTS ... ARE RUN. THEN THE EDIT PROGRAM CHECKS
    FOR APPROPRIATE DATA CARDS.
 3. ID INFO CHECKED AGAINST VIN
    EXAMPLE: MODEL YEAR CHECKED AGAINST VIN CODE.
 4. FUNCTIONAL RELATIONSHIPS ARE DEVELOPED WHEREVER POSSIBLE
    EXAMPLES: MODEL YEAR RELATED TO MILEAGE; ROADLOAD
    HORSEPOWER RELATED TO INERTIA WEIGHT; ENGINE SETTINGS
    COMPARED WITH MANUFACTURER SPECIFICATIONS; ALLOWABLE
    EMISSION LEVELS DEPENDENT UPON MODEL YEAR; NUMBER OF
    CYLINDERS RELATED TO ENGINE DISPLACEMENT; FUEL ECONOMY
    RELATED TO ENGINE DISPLACEMENT.
 5. COMPOSITE VALUES COMPUTED FROM COMPONENTS AND COMPARED
    TO CARD VALUE
    EXAMPLES: COMPOSITE FTP COMPUTED FROM INDIVIDUAL BAGS;
    FUEL ECONOMY COMPUTED FROM HC, CO, CO2 DATA USING CARBON
    BALANCE.
 6. RANKING COMPARISON OF RELATED VARIABLES
    EXAMPLES: IDLE MODE EMISSIONS LESS THAN HIGH SPEED MODE
    EMISSIONS; HIGHWAY FUEL ECONOMY GREATER THAN FTP FUEL
    ECONOMY; COLD EMISSIONS GREATER THAN STABILIZED EMISSIONS.
 7. CHECK THAT EXPECTED BLANK COLUMNS ARE BLANK TO ENSURE
    PROPER COLUMN ALIGNMENT
 8. DATA ON VEHICLE COMPARED TO PREVIOUS DATA ON SAME VEHICLE
    EXAMPLE: EMISSIONS TAKEN AT TWO DIFFERENT LOCATIONS OR AT
    TWO DIFFERENT TIMES ARE COMPARED.
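A few of these check types can be sketched in Python; the record field
names and the one-percent composite tolerance below are illustrative,
not those of the actual edit program.

    def edit_checks(rec):
        # rec: dict holding one vehicle-test record; returns flag messages.
        flags = []
        # Range check (valid test range for dry bulb temperature).
        if not (68.0 <= rec["dry_bulb_f"] <= 86.0):
            flags.append("dry bulb temperature outside valid test range")
        # Ranking comparison of related variables.
        if rec["hwy_mpg"] <= rec["ftp_mpg"]:
            flags.append("highway fuel economy not greater than FTP")
        # Composite value recomputed from components and compared to card.
        composite = sum(w * bag for w, bag in zip(rec["bag_weights"], rec["bag_hc"]))
        if abs(composite - rec["ftp_hc"]) > 0.01 * rec["ftp_hc"]:
            flags.append("composite FTP HC disagrees with bag values")
        return flags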
115
-------
Table 7
TYPES/SEVERITY OF DETECTED ERRORS
 1. ERRORS IN TEST PROCEDURE
    A. DETECTED IN MANUAL AND/OR COMPUTER EDIT
    B. TEST IS INVALIDATED
    C. EXAMPLES: DRIVING TRACE OUT OF SPECS (MANUAL);
       WRONG INERTIA WEIGHT SETTING (COMPUTER)
 2. ERRORS IN CALCULATION METHODOLOGY
    A. DETECTED IN MANUAL AND/OR COMPUTER EDIT
    B. ALL DATA ARE CORRECTED BY LOOKING AT PACKET
    C. EXAMPLES: USED WRONG SCALE TO READ EMISSIONS (MANUAL);
       COMPOSITE FTP INCORRECTLY CALCULATED (COMPUTER)
 3. KEYPUNCH ERRORS
    A. DETECTED IN COMPUTER EDIT
    B. ALL DATA ARE CORRECTED BY LOOKING AT PACKET
    C. EXAMPLE: ENGINE DISPLACEMENT DISAGREES WITH VIN
       AND/OR IS OUT OF RANGE
 4. MISSING DATA
    A. DETECTED IN MANUAL AND/OR COMPUTER EDIT
    B. TEST INVALIDATED UNLESS MISSING DATA CAN BE FOUND
    C. EXAMPLES: DRIVING TRACE MISSING FROM PACKET (MANUAL);
       BLANK FIELD FOR ENGINE DISPLACEMENT (COMPUTER)
 5. DISCREPANCY BETWEEN DATA CARD AND DATA PACKET
    A. DETECTED IN COMPUTER EDIT CHECK-OUT PHASE
    B. PACKET VALUE ASSUMED CORRECT
    C. EXAMPLE: RLHP READING ON DATA CARD IS OUT OF RANGE
       AND DISAGREES WITH WHAT IS RECORDED IN THE PACKET
116
-------
Table 8
TYPICAL REASONS TESTS HAVE BEEN REJECTED
 1. WRONG CVS COUNTS, EITHER TOO HIGH OR TOO LOW
 2. EXCESSIVE CRANKING TIME, OVER 10 SECONDS WITHOUT REGARD
    FOR PRESCRIBED PROCEDURES FOR RESTART
 3. WRONG INERTIA WEIGHT SETTING ON DYNAMOMETER
 4. WRONG HORSEPOWER SETTING ON DYNAMOMETER
 5. EMISSIONS CONCENTRATIONS READ OFF-SCALE OF ANALYTICAL
    EQUIPMENT
 6. LABORATORY BACKGROUND EMISSION LEVELS TOO HIGH
 7. VEHICLE HAS WRONG AXLE RATIO
 8. SAMPLE BAGS NOT ANALYZED WITHIN 10 MINUTES OF TEST
    COMPLETION
 9. DRIVER'S TRACE NOT FOLLOWED AS PRESCRIBED
10. RECORDING MALFUNCTION, 110°F DURING TEST
11. INITIAL FUEL TEMP. TOO HIGH (63°F OR HIGHER)
12. SOAK AREA TEMPERATURE TOO HIGH FOR PRESCRIBED PORTION
    OF VEHICLE SOAK PERIOD
13. TEST AREA TEMPERATURE TOO HIGH FOR VEHICLE TEST PERIOD
14. CVS TEMPERATURE NOT WITHIN ±10° OF SET POINT
15. ANALYTICAL INSTRUMENT(S) SPANNED INCORRECTLY
16. TEST ITEM(S) NOT DOCUMENTED AS REQUIRED
17. ENGINE TIMING NOT CHECKED
18. ENGINE TIMING SET INCORRECTLY
19. ENGINE IDLE CO NOT CHECKED
20. ENGINE IDLE CO SET INCORRECTLY
21. ENGINE IDLE RPM SET INCORRECTLY
117
-------
Table 9
CURRENT DETECTED ERROR RATES*

MANUAL EDIT
PROGRAM       CONTRACTOR 1   CONTRACTOR 2   CONTRACTOR 3
   1              14%            21%            32%
   2              17%            26%             -
   3              17%             -              -

COMPUTER EDIT (PROGRAM 1)
              CONTRACTOR 2   CONTRACTOR 3
                  10%            50%

* PERCENTAGE OF VEHICLES WITH AT LEAST ONE ERROR DETECTED.
  LESS THAN 5% OF TESTS ARE INVALIDATED - MOST ERRORS
  CAN BE CORRECTED.
118
-------
Table 10
EDITING EFFORT PER CAR (MANHOURS)
                                          PROGRAM 1   PROGRAM 2
LOGGING, FILING, SCOREKEEPING                .1
INITIAL REVIEW                               .3
REVIEW AND SCOREKEEPING, RETURNED
  AND RESUBMITTED PACKETS                    .1           .2
SUPPLEMENTAL TESTS                           .2           .4
COMPUTER EDIT                                .05          .15
PRO-RATED COMPUTER PROGRAM
  DEVELOPMENT*                               .05          .3
PRO-RATED MANUAL EDIT
  PROCEDURES DEVELOPMENT**                   .10          .5
TOTAL                                        .90         2.35

*  BASED ON 100 HOURS OF EFFORT SPREAD OVER 2000 VEHICLES
   FOR PROGRAM 1 AND 120 HOURS OF EFFORT SPREAD OVER 400
   VEHICLES FOR PROGRAM 2.
** BASED ON 200 HOURS OF EFFORT SPREAD OVER 2000 VEHICLES
   FOR PROGRAM 1 AND 200 HOURS OF EFFORT SPREAD OVER 400
   VEHICLES FOR PROGRAM 2.
119
-------
Table 11
EDITING COSTS AS A FRACTION OF TOTAL CONTRACT COST
[Table printed sideways in the source; the entries are not legible.]
120
-------
Table 12
FUTURE APPROACHES - CURRENTLY BEING TESTED ON A TRIAL BASIS

1. MORE AUTOMATED DATA ACQUISITION
   A. DEDICATED COMPUTER ON SITE (GENERATE DRIVING
      TRACE, SET DYNAMOMETERS, AUTOMATIC DATA RECORDING).
   B. CENTRALIZED COMPUTER.
   C. COST IS HIGH ($75K - 125K FOR A DEDICATED SYSTEM)
      AND DIFFICULT TO JUSTIFY FOR YEAR-AT-A-TIME CONTRACTS.
2. REQUIRE STRICTER CONTRACTOR DATA EDIT PROCEDURES
   A. ALL KEYPUNCHED DATA MUST BE ENTERED AND THEN
      VERIFIED BY TWO DIFFERENT PEOPLE.
   B. REQUIRE CONTRACTOR TO APPLY MANUAL AND COMPUTER
      EDITING TECHNIQUES.
   C. CHECK-OUT OF ALL CONTRACTOR DATA HANDLING PERSONNEL.
   D. CONTRACTOR WILL SUBMIT ERROR PRINT-OUT WITH EACH
      GROUP OF TEST PACKETS.
3. STATION EPA PERSONNEL FULL-TIME AT EACH TEST SITE
4. ASSUMING CONTRACTOR ERROR RATE DECREASES, USE SPOT
   INSPECTION OF PACKETS
5. USE STRATIFIED SAMPLING SPOT INSPECTION OF PARAMETERS
   TO MINIMIZE COST OF ERROR TIMES VARIANCE IN VARIABLE J.
   STRATA CAN BE DIFFERENT DRIVERS, ETC. THE SAMPLE IN EACH
   STRATUM FOR EACH VARIABLE IS INVERSELY PROPORTIONAL TO THE
   SQUARE ROOT OF ERROR COST AND DIRECTLY PROPORTIONAL TO THE
   VARIANCE OF THE MEASUREMENT IN THE STRATUM. THIS IS
   ONLY GOOD FOR CORRECTLY ESTIMATING THE MEAN OF VARIABLE
   J. (A SKETCH OF THIS ALLOCATION FOLLOWS THE TABLE.)
6. SEQUENTIAL LIKELIHOOD RATIO TEST, BASED ON MINIMIZING
   THE COST TO ASSURE A GIVEN OVERALL ERROR RATE. START BY
   EDITING THE VARIABLE WITH THE HIGHEST IMPACT ON THE
   EMISSION RATE OR THE SMALLEST COST/BENEFIT RATIO IF
   COSTS OF EDITING ARE DIFFERENT.
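A minimal sketch of the item 5 allocation rule, reading the slide literally:
each stratum's share of inspections is proportional to its measurement
variance and inversely proportional to the square root of its error cost.
(Classical Neyman-type allocation would use the standard deviation rather
than the variance; the slide's wording is followed here.) All names and
numbers are illustrative, not from the paper.

    # Stratified spot-inspection allocation per item 5 above.
    def allocate_inspections(total_inspections, strata):
        # strata: list of (name, variance, error_cost) tuples.
        weights = [var / cost ** 0.5 for _, var, cost in strata]
        total_w = sum(weights)
        # Rounded shares may differ from the requested total by a unit or two.
        return {name: round(total_inspections * w / total_w)
                for (name, _, _), w in zip(strata, weights)}

    # Example: strata defined by driver, as the slide suggests.
    strata = [("driver A", 4.0, 1.0), ("driver B", 1.0, 1.0), ("driver C", 2.0, 4.0)]
    print(allocate_inspections(100, strata))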
121
-------
Table 13
SPOT TESTING

QUESTION: GIVEN A TOTAL TEST POPULATION OF N CARS, IF
WE EDIT A SAMPLE OF n CARS AND FIND NO ERRORS, WHAT
IS THE LIKELIHOOD THAT THE ERROR RATE IN THE ENTIRE
POPULATION IS LESS THAN X%?

ASSUME AN UNKNOWN ERROR RATE OF p%, POPULATION SIZE N,
TOTAL NUMBER OF BAD TESTS X = pN, TOTAL NUMBER OF GOOD
TESTS N - X = Y.

THE PROBABILITY THAT A SAMPLE OF n PACKETS CONTAINS ONLY GOOD PACKETS IS

    P = (Y/N)((Y-1)/(N-1))((Y-2)/(N-2)) ... ((Y-n+1)/(N-n+1)) = Y!(N-n)!/((Y-n)!N!)

SET P (CONFIDENCE LEVEL), N, p (ERROR RATE);
DETERMINE n
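The expression above is the hypergeometric probability that an n-packet
sample is error-free. A minimal sketch (illustrative code and numbers, not
from the paper) of the "determine n" step: find the smallest sample whose
chance of showing zero errors, if the true error rate were p, falls below
1 - P.

    # Spot-testing sample size from the hypergeometric expression above.
    # P0 = probability of no bad packets in a sample of n from N,
    # where X = p*N are bad and Y = N - X are good.
    def prob_no_errors(N, n, p):
        X = round(p * N)          # bad tests
        Y = N - X                 # good tests
        p0 = 1.0
        for i in range(n):        # product form: Y/N * (Y-1)/(N-1) * ...
            p0 *= (Y - i) / (N - i)
        return p0

    def required_sample(N, p, confidence):
        for n in range(1, N + 1):
            if prob_no_errors(N, n, p) <= 1.0 - confidence:
                return n
        return N

    # Example: 2000-car program, 2% assumed error rate, 95% confidence.
    print(required_sample(2000, 0.02, 0.95))   # about 145 cars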
122
-------
Table 14
[Sample sizes for spot testing: probability of finding no errors among n
sampled packets, tabulated by confidence level and error rate; the
tabulated values are illegible in the source scan.]
-------
DATA VALIDATION TECHNIQUES USED IN MOBILE
SOURCE TESTING
by
C. Don Paulsell
Office of Mobile Source Air Pollution Control
U.S. Environmental Protection Agency
Ann Arbor, Michigan 48105
125
-------
INTRODUCTION
The EPA laboratory at Ann Arbor is the primary government facility
responsible for certification testing of engine-driven vehicles to
determine compliance with the standards for emissions levels and fuel
economy. Approximately 2500 to 3000 vehicles of foreign and domestic
manufacture are tested annually. This testing is performed in 10
dynamometer test cells using the constant-volume sampling (CVS) method
to collect emission samples from vehicle exhausts. The samples are
analyzed on seven analyzer sites each equipped with all of the various
instruments necessary for sample analysis. As the vehicles are
operated through a prescribed simulated driving cycle, sufficient data
are also recorded to determine fuel economy. A complete data set for
a vehicle includes information such as vehicle identification data,
test specifications, instrument calibrations, calibration data corre-
lations, test data, calculated (reduced) test data, vehicle manufac-
turer's test results, EPA test results, and quality control data.
After these data have been collected and/or generated, they are
subjected to quality control procedures to assess overall accuracy,
precision, uniformity, and validity.
QUALITY CONTROL SYSTEM
The "products" of our test process are the data which represent the
intangible exhaust emissions of the vehicles tested. The quality
control system assesses the acceptability of this product in terms of
accuracy, uniformity, and validity.
Accuracy is important since these data are used to decide whether a
vehicle meets federal standards. Moreover, a financial penalty may be
applied to any manufacturer for not meeting the standard for fuel
economy. This assessment is five dollars per vehicle produced for
each tenth of a mile per gallon less than the standard. Thus, the
question of accuracy could potentially involve millions of dollars.
Since accuracy can be a relative attribute, the data are also checked
for precision and uniformity to determine whether measurements can be
repeated at an analyzer site and whether results from each of the
seven analyzer sites are essentially equivalent. Finally, since the
"data product" is dependent upon the test process used, the validity
of that process must be verified. Data validation techniques comprise
a very important part of the total quality control system.
TYPES OF DATA VALIDATION
Data validation begins long before the vehicle test is performed and
continues after the vehicle has been returned to its company.
This broad application of data validation is illustrated in the five
parts of the overall process.
126
-------
1. Calibration Acceptance
2. Operational Verifications
3. Procedural Checks
4. Test Data Review
5. Comparative Measures
The following paragraphs discuss each of these areas and provide
examples of the methods used.
CALIBRATION ACCEPTANCE
A wide variety of instruments and equipment are used in the measure-
ment process. It is obvious that each unit must be calibrated, but
what is not obvious is how to validate that a calibration has normal
characteristics. The calibration procedures are often more compli-
cated than the test procedures, and an erroneous calibration can only
produce erroneous test data.
The QC methods used for calibration validation emphasize the quanti-
tative aspects of the equipment characteristics. For example, a
dynamometer is calibrated to establish residual bearing frictions.
These frictional values tend to have predictable magnitudes across all
dynamometers. Use of this characteristic can provide confidence in
both accuracy and uniformity for dynamometer calibrations.
The analyzer and constant volume sampler also have unique character-
istics. An analyzer curve can be assessed in terms of nonlinearity,
curve fit deviations, and the absence of inflections. A CVS utilizes
a critical flow venturi which has a characteristic discharge coeffi-
cient of .985 to .995. This coefficient, the ratio of actual to
theoretical flow for a given throat diameter, can be used to assess
flow metering accuracy and long term stability.
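As a concrete illustration of such a quantitative acceptance criterion,
the sketch below (illustrative names and example flows; only the 0.985 to
0.995 band comes from the text) flags a venturi calibration whose
discharge coefficient falls outside the characteristic range.

    # Calibration-acceptance sketch for the CVS critical-flow venturi.
    # The discharge coefficient is the ratio of actual to theoretical
    # flow; per the text it characteristically lies between .985 and .995.
    def check_discharge_coefficient(actual_flow, theoretical_flow,
                                    low=0.985, high=0.995):
        cd = actual_flow / theoretical_flow
        return cd, low <= cd <= high

    cd, ok = check_discharge_coefficient(actual_flow=349.1,
                                         theoretical_flow=352.5)
    print(f"Cd = {cd:.4f}  {'ACCEPT' if ok else 'FLAG FOR REVIEW'}")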
The dynamometer, constant volume sampler (CVS), and analyzer represent
the three major components of the measurement process. A proper
calibration is a necessary condition for getting valid test results,
but the operational verification is equally important.
OPERATIONAL VERIFICATIONS
This phase of the process is used to assure that the equipment can
measure and produce a known result. Special tests are conducted at
daily, weekly, or bi-weekly intervals to produce a QC parameter which
can be normalized relative to all systems. These parameters are
manipulated statistically or plotted graphically to assess control of
the process accuracy and precision.
127
-------
For example, the CVS is checked by injecting a known mass of pure
propane as though it were auto exhaust. All measurements and calcu-
lations are performed as in a test and the result must be within
plus or minus 2 percent of the known value. Leaks, calibration
drift, erroneous analyzer span gas values, and many other parameters
can cause the verification to fail.
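A sketch of this verification (illustrative names and numbers; the plus
or minus 2 percent tolerance is from the text):

    # Operational-verification sketch: propane injection check on the CVS.
    # A known mass of pure propane is injected and processed as if it
    # were exhaust; the recovered mass must agree within +/- 2 percent.
    def propane_check(injected_grams, recovered_grams, tolerance=0.02):
        error = (recovered_grams - injected_grams) / injected_grams
        return error, abs(error) <= tolerance

    error, ok = propane_check(injected_grams=50.0, recovered_grams=50.7)
    print(f"recovery error = {error:+.1%}  {'PASS' if ok else 'FAIL'}")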
The analyzer is verified daily by analyzing a bag of blended gases
at each of the seven analyzer sites. The deviation of each site
from the overall average serves as the normalized parameter. A site
which is consistently high or low or inconsistent will be obvious from
the automated control chart analysis. Positive or negative consecutive
runs greater than five, or excessive data scatter are automatically
flagged and noted by a QC message on the analysis printout.
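The automated run and scatter rules might look like the sketch below; the
run limit of five is from the text, while the two-sigma scatter limit is
an assumption, since no numeric scatter rule is stated.

    # Control-chart sketch for the daily analyzer-site check: each site's
    # deviation from the seven-site average is the normalized parameter.
    def qc_messages(deviations, run_limit=5, sigma_limit=2.0):
        msgs = []
        mean = sum(deviations) / len(deviations)
        sigma = (sum((d - mean) ** 2 for d in deviations)
                 / len(deviations)) ** 0.5
        run_sign, run_len = 0, 0
        for day, d in enumerate(deviations, start=1):
            sign = (d > 0) - (d < 0)
            run_len = run_len + 1 if sign == run_sign and sign != 0 else 1
            run_sign = sign
            if run_len > run_limit:   # consecutive runs greater than five
                msgs.append(f"day {day}: run of {run_len} on one side of zero")
            if sigma > 0 and abs(d - mean) > sigma_limit * sigma:
                msgs.append(f"day {day}: excessive scatter ({d:+.3f})")
        return msgs

    print(qc_messages([0.02, 0.03, 0.01, 0.04, 0.02, 0.05, 0.03, -0.01]))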
The dynamometer gets a short version of the full calibration to verify
its stability. Control charts of flywheel frictional values will
graphically show a deteriorating bearing or load control problem.
Finally, a repeatable car is tested on each site utilizing all the
normal components of the system. The emission results are statis-
tically analyzed to show significant differences between sites. These
operational verifications each address a specific part of the process,
and when assessed in total, provide assurance that the system is
capable of producing valid emission results from a properly conducted
test procedure.
PROCEDURAL CHECKS
A complete emissions test can require a total of about eighteen hours,
including the twelve hour overnight "soak" period. The specifications
and criteria are so numerous that a set of checklists has been
developed to document that each one has been done properly. Test
times, temperatures, shift patterns, horsepowers, special procedures,
and many other conditions are noted or checked off as each phase
progresses. In some cases, such as fueling, the operation must be
witnessed by two people, since the type of fuel can greatly affect
emissions.
Although the test equipment has been previously verified, several
checks are performed as part of the test. An open valve or improper
horsepower setting would cause the test to be voided.
At the end of the test process all the stripcharts, checksheets,
datasheets, and driving traces are consolidated, reviewed, and sent
for computer processing.
128
-------
TEST DATA REVIEW
The test processing office validates that all necessary data have been
obtained. The data sheet is then batch processed by computer to
generate a printout of input data, calculated results, QC checks, and
pass/fail criteria. The computer program has been designed to audit
the various test data for omissions or unrepresentative values.
For example, a 40,000 pound automobile would likely be a 4,000 pound
value improperly keypunched. Other data, such as ambient background
concentrations can be compared to a normal distribution of values
obtained at EPA to flag high levels. Higher values may indicate
improper analyzer parameters or a leaking vehicle exhaust system.
Since some of the test sequence is repeated, a ratio of two flowrates
or distances travelled can be very useful in highlighting abnormal-
ities. A normalized ratio has become a valuable tool because it is
not affected by the magnitudes of parameters, which may normally be
different. It is the ratio of these different magnitudes that pro-
duces a value which lies within a narrow bandwidth. The ratio of
highway to city fuel economy is an example of this application.
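A minimal sketch of these two audits (all limits are illustrative
assumptions; the paper quotes none): a plausibility range that would catch
the 40,000 pound keypunch error, and a narrow band on the highway/city
fuel economy ratio.

    # Test-data-review sketch: range audit plus normalized-ratio audit.
    def audit_record(rec):
        flags = []
        # Range check: a 40,000 lb car is likely 4,000 lb mis-keypunched.
        if not 1000 <= rec["inertia_weight_lb"] <= 10000:
            flags.append(f"weight {rec['inertia_weight_lb']} lb out of range")
        # Normalized ratio: highway/city fuel economy falls in a narrow
        # band even though the individual magnitudes differ car to car.
        ratio = rec["highway_mpg"] / rec["city_mpg"]
        if not 1.0 <= ratio <= 1.8:
            flags.append(f"hwy/city mpg ratio {ratio:.2f} outside band")
        return flags

    print(audit_record({"inertia_weight_lb": 40000,
                        "city_mpg": 17.0, "highway_mpg": 24.0}))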
If all data have been validated and all acceptance criteria met, the
documentation is stored and the results are updated as valid in the
computer data base. While this completes the processing of one test,
it is not the end of the data validation process.
COMPARATIVE TESTS
Each test alone has certain characteristics, and all tests combined
have other useful measures. Comparative tests on large populations of
vehicle results can highlight differences and trends that an indivi-
dual test does not show.
The manufacturer has normally tested the vehicle prior to EPA's test,
so an independent set of data is available for comparison. The
MFR/EPA emission differences and percent differences are calculated
and stored in a "paired data" file. These normalized values can then
be statistically summarized for each manufacturer group. The results
of this analysis show the relative agreement between EPA and all
individual manufacturers. If EPA is consistently higher or lower, a
systematic bias may be indicated. Diagnostic tests or correlation
programs can be performed to identify and correct the cause.
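The paired-data summary might be sketched as follows (illustrative data;
the two-standard-error screening rule is an assumption):

    # Comparative-test sketch: MFR vs. EPA paired emission results.
    # Percent differences are summarized per manufacturer; a mean far
    # from zero suggests a systematic bias worth a diagnostic program.
    def paired_summary(pairs):
        diffs = [100.0 * (epa - mfr) / mfr for mfr, epa in pairs]
        n = len(diffs)
        mean = sum(diffs) / n
        sd = (sum((d - mean) ** 2 for d in diffs) / (n - 1)) ** 0.5
        se = sd / n ** 0.5
        return mean, se, abs(mean) > 2 * se   # assumed 2-se rule

    # Illustrative HC results (g/mi) as (manufacturer, EPA) pairs.
    pairs = [(0.39, 0.41), (0.50, 0.52), (0.44, 0.47),
             (0.61, 0.60), (0.35, 0.38)]
    mean, se, biased = paired_summary(pairs)
    print(f"mean diff = {mean:+.1f}% (se {se:.1f}%)  bias suspected: {biased}")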
Statistical analysis of all these data can provide the upper and lower
limits which are used to assess the significance of a bias. Test
conditions and equipment identifiers can be used to stratify the
analysis for assessing whether such things as altitude differences or
specific test sites correlate with the paired data differences.
Finally, the data validation loop can be refined by the statistical
determination of QC limits.
129
-------
QUALITY CONTROL REFINEMENTS
A strong data validation program can be developed by automating many
of the checks being made. Computerized validation and acceptance
tests require that the data be pertinent and accessible. Use of an
integrated data base structure can minimize manual operations, improve
security, and assure the integrity of the data.
A computerized data base can also enable the automation of screening
programs, plotting routines, and statistical summaries. It will
permit rapid development of more precise tools and tests which can be
used in the data validation process.
Finally, a computer data base can provide a trail for audits or
requests for documentation.
CLOSURE
This paper has shown that the data validation process is not simply an
inspection of results at the end of a test. Rather, it is a combi-
nation of specific individual tests and checks which when taken as a
whole, form the foundation for a quality control system which can
provide documented, quantitative assurance that the "data product" of
the EPA mobile source program is fit for use in our regulatory process.
130
-------
VALIDATION OF CONTINUOUS STACK MONITORING
DATA
by
Joseph E. McCarley
Emission Standards and Engineering Division
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
131
-------
VALIDATION OF CONTINUOUS STACK MONITORING
DATA
J.E. McCarley
SUMMARY
The Emission Standards and Engineering Division is currently developing
a revised standard of performance for new steam generators. As part of this
study, the feasibility of continuous regulation of sulfur dioxide emissions,
as well as a percentage of sulfur reduction from fossil fuels, is being
evaluated. In support of this study, the Emission Measurement Branch is con-
ducting sulfur dioxide continuous monitoring projects at five coal-fired
power plants equipped with flue-gas desulfurization units. When data are be-
ing collected for supporting regulations, validation of these data is an
important consideration.
Prior to collecting emission data, the continuous monitoring systems
are validated by following the procedures described in Performance Specifi-
cations—Appendix B 40 CFR 60. (Performance Specification 2--Performance
Specifications and Specification Test Procedures for Monitors of SO2 and NOx
from Stationary Sources.)
The monitoring data are then collected and recorded continuously from
each emission point at least once every 15 minutes. In this study, the data
are then placed in a computer bank, printed and then edited or validated man-
ually. When data are collected during periods of instrument malfunction,
calibration, or plant upset, the time periods for these conditions are
recorded by plant personnel. These data are purged from the computer bank
and the remaining data are averaged over 1-hour, 3-hour, 8-hour, 24-hour,
and 30-day periods. If more than one 15-minute data point has been
determined to be invalid in any one-hour period, the entire hour of data
is considered invalid and not included in the longer averaging
periods. In summary, the data are edited for actual known errors and no
132
-------
statistical validation procedures are performed.
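A sketch of this editing rule (field names are illustrative; the
more-than-one-invalid-point rule and the averaging periods are from the
text):

    # 15-minute SO2 readings carry a valid/invalid mark set from plant
    # logs of malfunction, calibration, or upset periods.  An hour with
    # more than one invalid 15-minute point is discarded entirely; only
    # valid hourly means feed the longer (3-hour, 24-hour, ...) averages.
    def hourly_averages(points):
        # points: list of (value, is_valid), four per hour, in time order.
        hours = []
        for i in range(0, len(points), 4):
            good = [v for v, ok in points[i:i + 4] if ok]
            hours.append(sum(good) / len(good) if len(good) >= 3 else None)
        return hours

    def long_average(hourly):
        valid = [h for h in hourly if h is not None]
        return sum(valid) / len(valid) if valid else None

    pts = [(410, True), (420, True), (415, False), (405, True),   # keep hour
           (500, True), (980, False), (990, False), (510, True)]  # drop hour
    hrs = hourly_averages(pts)
    print(hrs, long_average(hrs))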
Further details of these monitoring projects are contained in the fol-
lowing report and references therein: Kelly, W. and Sedman, C. First Interim
Report: Continuous Sulfur Dioxide Monitoring at Steam Generators. EMB Project
No. 77SPP23A, Emission Standards and Engineering Division, Office of Air
Quality Planning and Standards, U.S. Environmental Protection Agency, Re-
search Triangle Park, North Carolina 27711, June 1978. 54pp.
Future plans for evaluating validation procedures include (1) applica-
tion of more automatic recording and data validation instrumentation, and
(2) quality control steps to assure the accuracy of long-term emission
monitoring.
133
-------
SCREENING CHECKS USED BY THE
NATIONAL CLIMATIC CENTER
by
William E. Klint
National Oceanic and Atmospheric Administration
National Climatic Center
Federal Building
Asheville, North Carolina 28801
135
-------
SCREENING CHECKS USED BY THE
NATIONAL CLIMATIC CENTER
W.E. KLINT
ABSTRACT
Current processing is discussed with emphasis on validation checks and manual
interface. The need for an automated quality control program is recognized
and plans for such are presented. Plans for a new modular surface edit are
presented along with a new quality control procedure using an interactive
graphics system. Data management is addressed through a Data Dictionary/Data
Base Management system.
136
-------
The National Climatic Center is:
Responsible for receipt, processing, archiving and publication of
climatological data. Coordinates the analysis of past meteorological
data for NOAA, other Government agencies and the public to accommodate
user requirements for climatological data through special studies
and statistical analyses. Manages the national program of climatolog-
ical data recall and works closely with the military in meeting
this special requirement. Provides facilities, data processing
support, and expertise, as requested, for World Meteorological
Organization programs (e.g., GARP and GATE). Assists in training
programs to familiarize the representatives of developing countries
with modern meteorology and coordinates (through World Data Center-
A) international exchange of climatic data.
Of the various types of incoming data, paper forms predominate. These
then must be keyed to digital form for processing. This effort entails
keying approximately 37 million bytes of data per month. Because of a
cutback in funding years ago only three-hourly surface observations, or
eight observations per day, are digitized.
At the present time, processing exists in two modes: a machine edit and
a manual interface.
The machine edit consists of data verification, a range limit check, a
cross-field consistency check, a continuity check, and appropriate flags
to "verifiers."
Data verification is a simple machine check to see whether data have
indeed been keyed into the appropriate field and, if the field is
coded, whether the entry is a legitimate code.
The range limit checks to see if the value in a particular field falls
into an appropriate range. However, at the present time there is only
one range limit per field. This, in and of itself, causes many unnecessary
"kickouts."
The cross-field consistency check looks at the entries in related fields
for consistency, e.g., clouds and precipitation.
The continuity check does a range limit check on certain fields between
the previous observation and the one being checked.
Finally, if any of the above checks fails, the appropriate flags are
printed out for return to the "verifiers" and appropriate action.
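A sketch of the four machine-edit steps (all field names, limits, and
codes are illustrative stand-ins for NCC's actual tables):

    # NCC machine-edit sketch: verification, range limit, cross-field
    # consistency, and continuity checks, each returning flags for the
    # "verifiers".  Limits and rules here are illustrative only.
    RANGE_LIMITS = {"temp_f": (-60, 125), "wind_kt": (0, 150)}
    LEGAL_PRECIP_CODES = {"", "R", "S", "ZR"}

    def machine_edit(obs, prev_obs=None):
        flags = []
        # 1. Verification: field present and, if coded, a legitimate code.
        if obs.get("precip_code") not in LEGAL_PRECIP_CODES:
            flags.append("verification: illegal precip code")
        # 2. Range limit: one limit pair per field (the current scheme).
        for field, (lo, hi) in RANGE_LIMITS.items():
            if field in obs and not lo <= obs[field] <= hi:
                flags.append(f"range: {field}={obs[field]}")
        # 3. Cross-field consistency, e.g. precipitation needs clouds.
        if obs.get("precip_code") and obs.get("cloud_tenths", 0) == 0:
            flags.append("consistency: precipitation with clear sky")
        # 4. Continuity: range-limit the change since the last observation.
        if prev_obs and abs(obs["temp_f"] - prev_obs["temp_f"]) >= 20:
            flags.append("continuity: temperature jump >= 20 deg")
        return flags

    print(machine_edit({"temp_f": 72, "wind_kt": 8, "precip_code": "R",
                        "cloud_tenths": 0}, prev_obs={"temp_f": 49}))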
The manual interface, due to the magnitude of data, consists primarily
of a visual scan of all forms. A random sampling of stations receives a
closely scrutinized check of all observations. Problems with the data
requiring corrections are handled as follows: first, the erroneous
entry is crossed through with a blue pencil and the "correct" entry is
made directly above the erroneous one. Second, if the observation is
one which is normally digitized, a change form is routed to key entry.
137
-------
The "kickouts" from the machine edit, which were returned for action,
are scrutinized, a decision on validity is made, and, if necessary, a
correction is made both on the original paper form and on a change form
to key entry.
The change forms are routed to key entry for digitizing and the changes
are again run through the machine edit.
The above procedure is a recurring one until no more errors appear.
Once all the data "pass" the edit, they are formatted into the surface
observation file and entered into the data bank.2
It is fairly obvious that, due to the rather limited nature of these
checks, some erroneous data slip through and are placed into the data
bank. This fact, coupled with the realization that the magnitude of
incoming data in digital form is on the increase, and with the fact that
a more closely "real time" edit is both possible and needed, is forcing
changes upon NCC.
Although the basic processing stages of machine edit and manual interface
will remain the same, the nature of each will take on a new and challenging
meaning.
With the innovation of the new National Weather Service Automation of
Field Operations and Services (AFOS) system, the NCC will acquire near
real time collection capabilities of data in digital form. These, plus
manuscript forms, create a real need for dual processing of data.
The edit computer program is being completely rewritten, as in its
present form it is difficult to maintain. It is designed in a modular
form and many previously manual functions are designed into the program.
The creation of a Master Station Inventory (MSI) will completely change
the complexion of the edit program. The basic edit routines remain the
same, with the following changes:
1. The verification step will now be checked against the MSI for
validity. Previously some missing entries were flagged to a "verifier"
whether they were missing or simply not observed at that particular
station. The MSI will now be checked for proper disposition before an
error flag is returned, thus alleviating the "verifier" of this task.
2. The creation of the MSI will allow for a complete set of range
limits for every field of every individual station, thus preventing
unnecessary "kickouts" for "good" data, and providing for a narrower
range limit check of each field.
3. Cross-field consistency checks will remain basically the same,
with the provision that, with the above-mentioned changes, they should be
more reliable. They have been "beefed-up" to contain closer checks and
checks previously left to the "verifiers."
138
-------
4. If an error is isolated and a flag is called for, a check is
first made with the MSI to see if a mathematical relationship exists.
If one does, a new value is calculated and entered beside the original
with an appropriate flag.
If an error is isolated and no mathematical relationship exists, the
appropriate flag is issued and the observation queued for scrutiny by a
"verifier." All observations changed by a "verifier" are automatically
re-entered into the edit program.
The manual interface by the verifier will consist of interacting with
the data through use of an interactive graphics system. The "verifier"
previously had only manuscript forms as input to his decision. Now he
will be able to present the data in any of several displays including
contoured map analyses of a surrounding areal coverage. With this input
the verifier will be able to make a more intelligent decision as to
proper disposition of questionable data.
Up to this point we have discussed only a superficial edit of the incoming
data. We have not, as yet, looked at the inherent quality of the data
itself. NCC, at the present time, does not have the capability of doing
relational checks on the data. With the acquisition of the Asymptotic
Singular Decomposition (ASD) model, developed by Dr. John Jalickee,3
CEDDA, NCC now has this capability.
In its simplest terms the ASD model uses the method of least squares on
a data matrix.
The first step is to calculate a "characteristic" vector for the matrix.
Next, the differences between the data matrix and the appropriate "charac-
teristic" are calculated. The matrix is now overlayed with these dif-
ferences and the process is iterated.
The first component of vector magnitudes, when plotted, results in a
graph of the dominant features; the second component, the features of
the difference matrix, etc. We have found that with most data fields,
the second and third component plots prove to be the most useful for
validation. By the time the fifth component plot is made, we have
usually reached the noise level.
The data, thus plotted, can be expected to show "continuity." The
physical relationship of the field should be apparent in the graph. If
that relationship breaks down at any point in the graph, we can assume
bad data. This model will give NCC the capability to perform quali-
tative (relational) checks on all incoming data.
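The iteration just described, fit a "characteristic" vector by least
squares, subtract, and repeat, is in effect a one-rank-at-a-time singular
value decomposition. The sketch below uses numpy's SVD as a stand-in for
the actual ASD code (an assumption; the ASD model itself is not reproduced
here) and shows how a bad datum breaks the continuity of the residual
field:

    # Decompose a station-by-time data matrix into successive rank-one
    # least-squares components; numpy's SVD stands in for ASD here.
    import numpy as np

    def components(data, n_comp=5):
        u, s, vt = np.linalg.svd(data, full_matrices=False)
        return [s[k] * np.outer(u[:, k], vt[k]) for k in range(n_comp)]

    rng = np.random.default_rng(0)
    field = np.outer(np.sin(np.linspace(0, 3, 12)), np.ones(30))  # smooth
    field += 0.05 * rng.standard_normal(field.shape)              # noise
    field[5, 17] += 4.0                                           # bad datum

    comps = components(field, n_comp=3)
    residual = field - comps[0]   # what the 2nd/3rd components describe
    # The bad datum breaks the continuity of the residual field:
    i, j = np.unravel_index(np.abs(residual).argmax(), residual.shape)
    print(int(i), int(j))         # 5 17

The same truncated set of components, stored in place of the full field,
is also the compaction device mentioned later: a few rank-one terms
typically reconstruct nearly all of a smooth field.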
A side effect benefit of this model is the capability of building a
station "normal" situation. Based on this, such things as instrument
drift, miscalibration, and erroneous launch data become readily apparent
when exposed to a trend analysis. Once isolated, these "bad" data can
be adjusted "toward" the normal with at least some degree of accuracy.
139
-------
The concept of the "verifier's" job changes somewhat under this new
approach. The computer edit now will do much of the job the verifier
did previously, thus relieving him of that task; the bulk of which was
scanning "good" data. Upon his arrival for duty, he sits down at a KCRT
console or terminal and calls up the flag file for his particular area
of interest. He sits in the seat of judgment and makes those decisions
too delicate or volatile to have been programmed into the edit routine.
Once made, these observations are returned to the edit queue to be run
once again. Only after an observation "passes" the edit program is it
allowed to continue into the ASD model.
The results of the ASD run are displayed in one of several graphic modes
for verification. Realizing that the normal range of this display is
from +0.5 to -0.5 units, its power and usefulness become apparent.
Remember here that this is a display of the second or third component of
the data field, and, as we are working with differences, should nicely
fit within this range. The "outliers" will stand out here with striking
notoriety. The verifier now has the task of "replacing"* the "outliers"
with a more reasonable value. This can be done simply by sight align-
ment of his cursor or light pen with the trend of the curve or by having
the computer do a best fit. Although this sight alignment appears to
be a rather gross correction, when it is "blown up" into the initial
state it becomes very tolerable.
All original data are kept, with corrections and appropriate flags being
entered adjacent to them before being incorporated into the NCC data
bank. This will allow use of either datum by the user.
The NCC is currently planning a database environment. This quality
control process will allow us to place only QC'd data of a high relia-
bility into our database, thus assuring the user of quality data.
Another side effect of ASD is its compaction possibilities for storage.
The set of components for a data field can be "blown up" to explain 99%
of the original field; thus NCC can store components and blow them up to
the "original" field on output. This will result in many orders of
magnitude reduction of the necessary storage facilities.
*Note here that "replace" does not imply that we destroy the original
value. It will be maintained and output along with the corrected value.
140
-------
REFERENCES
For your convenience, copies of the following three references are included
herein, starting on the next page. The generosity of Mr. Walter James Koss,
Primary Data Branch, EDS, Asheville, NC 28801, for supplying these references
for publication in these Proceedings is appreciated.
1. Barton, G. and Saxton, D. The Role of Interactive Computer Systems in
Data Processing at CEDDA. Environmental Data Service (EDS) Magazine,
pp. 10-14.
2. Edit Procedures - Surface Observational Data. Surface Section, Primary
Data Branch, National Climatic Center, Asheville, NC 28801. August 1975,
31 pp.
3. Jalickee, J., et al. Validation, Compaction, and Analysis of Large
Environmental Data Sets. Environmental Data Service (EDS) Magazine,
pp. 3-9.
141
-------
The Role of
Interactive Computer
Systems in Data
Processing at CEDDA
By Gerald Barton and David Saxton
Introduction
The Environmental Data Service's Center for Experiment Design and
Data Analysis (CEDDA) processes enormous volumes of interdisciplinary
environmental data collected in major field research programs and
projects, such as the recent GARP (Global Atmospheric Research Program)
Atlantic Tropical Experiment (GATE). As an example, CEDDA received
1,700 miles of magnetic tape data from the four U.S. ships (Researcher,
Oceanographer, Dallas, and Gillis) in GATE's primary array.
CEDDA's goal is rapid processing to provide the data to the scientific
community as soon as possible after the completion of a field experiment.
One necessary step is editing the data to remove invalid readings.
CEDDA's current turnaround time for interactive editing of a data file
is 1 to 3 weeks. It is hoped that a new interactive computer system
CEDDA is currently assembling will cut this time to 1/2 hour or less.
Data Collection
During field experiments, environ-
mental data are recorded continuously
by instruments on ships, towers,
buoys, balloons, and other platforms
at sample rates from 10/second to
4/second. A wide variety of specially
calibrated sensors measure such vari-
ables as temperature, dewpoint, pres-
sure, wind, radiation, salinity, and
rainfall. The outputs are processed
and stored on multitrack magnetic
tapes. One track is used exclusively
for time so that the exact Julian date,
hour, minute, second, and 1/10 sec-
ond for each sample are known.
To augment this high-resolution
taped data, each major sensor sub-
system output is supplemented by
logs, stripcharts, and optical marked
cards that record calibration checks,
sensor changes (with all serial num-
bers), and special events, such as the
beginning or end of an instrument
cast.
The completeness of the data sets
and their security are matters of
prime concern. At the end of a phase
of a field experiment, or at other
convenient intervals, all tapes, logs,
cards, etc., are shipped to CEDDA
using the safest methods available.
During GATE, CEDDA had a data
manager on each of the 4 U.S. ships
in the primary (B-scale) array and
also at the GATE Operations Control
Center to ensure the completeness and
security of the transfer process.
Current Processing System
At CEDDA, the incoming analog data
tapes are first checked for recording
quality and completeness. Next, a
minicomputer converts the analog
data to digital form, producing a
digital tape. Playback time is 32
times faster than field recording
speed, so an 8-hour field tape is tran-
scribed in about 15 minutes. During
the minicomputer processing, an ad-
ditional computer time word is added
to each sample to control subsequent
data processing programs and to pre-
clude the loss of any sensor data due
to malfunction or noise in the field
time system.
Processing next proceeds to one of
NOAA's larger computer systems,
where data sets are organized by com-
ponent systems used on the data col-
lection platform, e.g., Oceanographic
Data Set or Rawinsonde Data Set.
Graphical display of the data as time-
series plots and graphs, and fre-
quency distribution plots, is required
for the analysis of these data sets.
The editing features of the current
computer processing system can be
thought of as an interactive graphics
system, with the time required for in-
teraction varying up to a week or
more. For optical mark cards, reac-
tion is rapid since all event cards may
be listed in chronological order and
cards may be inserted, deleted, or
corrected using a list-edit program in
the minicomputer. However, for high-
resolution meteorological or ocean-
ographic data which must be trans-
formed to engineering units and
properly scaled, display for editorial
review is currently limited to a micro-
film graphics subsystem located in
nearby Suitland, Md. For these data
sets the time required for interaction
includes the transport of data tapes,
generation of microfilm graphics in a
batch mode at the remote site, trans-
port of microfilm on the return loop,
review using microfilm readers, test-
ing of automated corrections when
required, and the recycling to display.
New Processing System
CEDDA is currently assembling the
hardware and software necessary to
implement an interactive computer
system that will allow the data editing
and updating functions to be per-
formed in a single processing step
I real time). The main components of
the system will remain a Digital
Equipment Corporation (DEC) PDP-
11/50 minicomputer and an IBM
360/65. It will be possible to access
data on the IBM 360/65 through the
PDP-11 or through terminals. The
142
-------
[Diagram: a PDP-11/50 (184K bytes, floating-point hardware, line-frequency
and programmable real-time clocks, RSX-11D operating system) linked to an
interactive graphics interface, a DECwriter terminal, a SUPERBEE keyboard
cathode ray tube terminal, an optical reader, decommutation interfaces,
auto-answer acoustic-coupler interfaces, a high-speed paper tape
reader/punch, a Versatec printer-plotter, a Centronics 100 char/sec
printer, 9-track tape drives (800 and 1600 BPI), two 40-million-byte
disks, a laboratory peripheral system with A/D hardware, and a future
synchronous-communications link to the IBM 360.]
CEDDA's proposed interactive
computer configuration.
143
-------
[Diagram: the PDP-11/50 (184K bytes) and IBM 360/65 driving a RAMTEK
graphics display (256 levels) with one color and two monochrome TV
monitors, pictorial hard copy (16 shades), two track balls, two keyboards,
two pencil-and-tablet units, and a video tape recorder.]
CEDDA's proposed interactive
graphics subsystem configuration.
144
-------
PDP-11 will have a graphics sub-
system that will take less than 30 min-
utes to perform the functions of the
current microfilm subprogram.
The major features and components
of the interaction system are:
(1) Access to the IBM 360/65 time-
sharing facilities via keyboard cath-
ode ray tube (KCRT) terminal, ASR-
33 teletype terminal, or PDP-11/50
minicomputer.
(2) Input terminals to the PDP-11/
50, including an LA-30 DECwriter
terminal, a KCRT, and two dial-in
terminal interfaces for use with re-
mote terminals.
(3) A graphics subsystem for the
PDP-11/50.
(4) DEC's RSX-11D real-time,
priority-driven, multitasking execu-
tive system for the PDP-11/50.
With these features, a user can ac-
cess the 360 to perform mathematical
computations or generate data sets.
He can look at the data and analyze
them in real time on the interactive
graphics subsystem. When he finds
errors, he can immediately correct
the data, and display them again on
the graphics system to validate the
corrections. He can then archive the
updated data set for future use.
Interactive Graphics
Capabilities
The interactive graphics subsystem,
designed and assembled by Operating
Systems Incorporated of Tarzana,
California, consists of a RAMTEK
graphics display system interfaced
with CEDDA's PDP-11/50 computer
by an appropriate switching network.
Features of the full system (only part
of which is required for the data
editing job) include two black and
white TV monitors, one color TV
monitor, two data entry keyboards,
two pencil and tablet systems, two
track ball cursor controls, a television
tape recorder with microphone input,
a TV camera with zoom lens, an ana-
log to digital converter, eight planes
of memory that allow up to 256
shades of gray or color, and a cross-
point switching network that allows
mixing control of inputs and outputs.
A simple use of an interactive
graphics system is the editing of raw
data displayed as a time-series analy-
sis or plot. For example, a single
parameter, such as temperature, is
plotted at its highest resolution in a
time sequence covering several hours
or days. Visual inspection of the data
may reveal large errors where the
sensor or telemetry system failed. To
correct these larger errors, a win-
dow edit program might be tested
with all "good" values of the param-
eter constrained to fit between the
upper and lower limits of the window.
Diurnal and other trends might be
superimposed on the data plot. The
limits and trends can be displayed
with the raw data to show which data
points should be edited out.
A slightly more sophisticated ver-
sion of this time-series plot would
compute running means over minutes
or hours and show which of the high-
resolution points will fall outside two
or three standard deviations. Complex
curves using higher order poly-
nomials can be fitted to time-series
data, both before and after various
editing passes, to eliminate, insofar
as possible, "noise" from the data.
Various filters and smoothing func-
tions also can be tested and evaluated
before going into an Automatic Data
Processing (ADP) production mode.
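A sketch of the two edit schemes just described, with an assumed window
and an assumed five-point half-width for the running mean:

    # Time-series edit sketch: a window edit with fixed limits, then a
    # running-mean edit flagging points beyond two standard deviations
    # of the local mean.  Limits and window length are assumptions.
    def window_edit(series, lo, hi):
        return [i for i, v in enumerate(series) if not lo <= v <= hi]

    def running_mean_edit(series, half_width=5, n_sigma=2.0):
        flagged = []
        for i, v in enumerate(series):
            window = series[max(0, i - half_width): i + half_width + 1]
            m = sum(window) / len(window)
            s = (sum((x - m) ** 2 for x in window) / len(window)) ** 0.5
            if s > 0 and abs(v - m) > n_sigma * s:
                flagged.append(i)
        return flagged

    temps = [24.9, 25.0, 25.1, 25.0, 99.9, 25.2, 25.1, 25.0, 24.8, 25.1]
    print(window_edit(temps, lo=-5.0, hi=45.0))   # [4]  (sensor failure)
    print(running_mean_edit(temps))               # also isolates index 4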
In general, CEDDA's new interac-
tive graphics system will make it pos-
sible to display two or more curves
simultaneously, using color, intensity,
or blinking characteristics to distin-
guish, for example, between a stand-
ard and trial edit scheme or between
different parameters. It will provide
the capability to produce hard-copy
documentation of both the trial pro-
grams as they progress during a test
and the data sets used.
A more demanding requirement of
an interactive graphics system is the
ability to display and operate on dig-
itized field data. An example of this
type of data is a digitized radar pic-
ture. Under the control of an inter-
active graphics system, the analyst
should be able to select and display
a radar picture, to rotate and rescale
it to a standard grid size, to enhance
the digitized increments by contours
or false color transfers, to overlay
and compare it with the previous pic-
ture, and to display only those points
from the two pictures whose change
exceeds some threshold value. Simi-
larly, the analyst should be able to
display the overlap portion of digi-
tized radar pictures from two loca-
tions and to scale and normalize these
independently so that compatibility is
established on common echo systems.
A further refinement is the addition
of a TV-type scanner so that analog
material can be rapidly digitized at
high resolution and then handled with
all the capabilities of the interactive
graphics system. For example, a satel-
lite visual range photograph could be
scanned and digitized and then dis-
played with a radar picture covering
the same area. Specific rainfall rates
from surface observations could be
overlayed on the same display so that
some integration of areal rainfall
amounts would be immediately avail-
able.
An interactive graphics system pro-
vides the ability to overlay data from
different platforms or different sys-
tems. For example, the temperature
and vertical velocity from sensors at
several levels on a tethered balloon
system could be compared by an
analyst for coherence and lags as con-
vective plumes are sampled. Drop-
sondes (atmospheric soundings) from
aircraft could be graphically super-
imposed on simultaneous radiosonde
soundings from ships. Spectra taken
by instrumented aircraft during ship
fly-bys can be compared with high-
resolution data recorded on board
each ship.
CEDDA plans to have the new
interactive computer system in oper-
ation by late 1975. By that time, im-
plementation of the graphics subsys-
tem should include the work done in
145
-------
the current COM cycle. Future
CEDDA applications of the graphics
system will include programs that al-
low display of radar or satellite pic-
tures in multicolors or up to 256
shades of gray using the tape-record-
ing features of the graphics system.
It should be possible to construct
time-motion pictures of changing
weather features. Also envisioned is
the capability to display slices
through 3-D models of weather sys-
tems. CEDDA currently has analysis
programs that allow an analyst to
change parameters in a weather
model. The real time operation of
the graphical display should allow
the scientist to experiment with
parameters that he may never have
had the opportunity to look at previ-
ously.
It can be seen from the above ex-
amples that an interactive graphics
system has broad applicability, ex-
tending from program design and test
through all stages of data reduction
and processing to scientific data anal-
ysis. In addition, interactive graphics
provides programmers and analysts
with the ability to see the data move
through programs from recorded
voltages on multiple channel tapes
until they become validated meteoro-
logical or oceanographic data suitable
for permanent archival and dissemi-
nation to the user community.
The Authors
GERALD BARTON, Chief of CEDDA's Computer Systems Branch, has a B.S.
degree in Geophysics from Pennsylvania State University and an M.A. in
Geological Science from the University of Texas. Before coming to CEDDA,
he worked for ten years with the U.S. Naval Oceanographic Office as a
geophysicist. His early association with the Oceanographic Office included
gravity survey cruises in the USS Archerfish, a research submarine, in the
Western Pacific and off the east and west coasts of the United States.
From 1967 through January of 1974, when he joined CEDDA, Gerry spent most
of his time working in computer programming, systems design, and the
processing of gravity and geodetic data, to determine, among other things,
the deflection of the vertical, or "which way is up."
DAVID SAXTON joined CEDDA as Chief of the Operations Division in April
1974, following a 30-year career in the Air Weather Service which took him
to England, France, Germany, and Japan. Dave has a B.S. degree from the
University of Michigan and an M.S. from the University of Chicago. During
World War II, he served as an Air Force weather forecaster in Europe.
After the war and a year of civilian/student life, he was recalled to
active duty and assigned to the joint Weather Bureau/Army/Navy Weather
Central in Washington, D.C. Subsequently, he was posted to the Tokyo
Weather Central, then to the USAF Weather Central in Suitland, Md., later
moving with that organization to Offutt AFB, Nebraska. In 1961 he was
assigned as Chief of the Strategic Air Command Weather Support Center in
High Wycombe, England. Four years later he was assigned to Air Weather
Service Hqs., Scott AFB, Illinois, as Chief of AWS' Computer Techniques
Division. In 1967, Dave returned to Offutt, now the Air Force's Global
Weather Central, as Chief, Development Division, and later Chief of
Operations. In 1971 he went to Hickam AFB, Hawaii, as Chief of Operations
Division, Headquarters, First Weather Wing. Retiring from the military in
March 1974 (with the Legion of Merit), he joined CEDDA the following month.
146
-------
EDIT PROCEDURES
SURFACE OBSERVATIONAL DATA
Contents Page
Card Images Keyed 1
Procedures 2
No. 1 Card Edit 4
Psychrometric Check 4
Limiting Range of Variability 6
Wind, Weather, Temperature, and Visibility 7
Cloud Coding 9
Clouds and Obscuring Phenomena 11
Explanation of Edit Flags 16
Visual Checking of Records 19
No. 3 Card Edit 19
Machine Computations 24
Precipitation Data Card Images 26
Checking Procedure - Hourly Precipitation 27
Checking Procedure - Extreme Precipitation 28
Maximum Short Period Precipitation 29
Surface Section
Primary Data Branch
National Climatic Center
Asheville, N. C. 28801
August 1975
147
-------
SURFACE OBSERVATION RECORDS PROCESSING
(NWS, FAA, AND NAVY LAND STATIONS)
[Flowchart, largely illegible in the source scan. Legible steps: records
(manuscript forms and charts) are received from NWS and FAA stations;
forms are pre-edited and keying instructions indicated; copies of
preliminary LCDs are made; ADPSD Data Entry keys the data on tape; the
Operations Section organizes and edits the taped data; edit listings are
reviewed, listings and forms corrected, and discrepancy reports prepared;
corrections are keyed and the tapes updated and re-edited, repeated as
necessary to obtain clean data; the LCD COM copy is run for printing;
LCDs are printed and distributed; records are microfilmed and archived,
and publication stock is maintained. A footnote on Navy Land records is
illegible.]
-------
EDIT PROCEDURES
SURFACE OBSERVATIONAL DATA
I. Introduction
A. Surface records are received at NCC for processing and quality
control to produce several routine summaries by machine methods
from taped data. Processing includes keying, verification, and
quality control procedures. After processing, records and sum-
mary products are archived at the NCC.
B. A joint machine edit program for a portion of the hourly obser-
vations has been made by EDS and AWS. However, where different,
only that which is applicable to EDS is listed in this outline.
C. Data are keyed on magnetic tape. If wet bulb temperature and
relative humidity values are not in the basic data as keyed,
machine computations of these values are entered on tape.
D. The taped data are machine edited, corrected, and used in a
number of machine programs producing various monthly and annual
summaries.
II. Card Images Keyed
A. WBAN No.l card - Hourly Surface Observations. This image is
keyed only for the hours corresponding to 3- and 6-hourly syn-
optic times in LST for NWS, FAA, and Navy stations.
B. WBAN No. 3 card - Summary of Day. This image is keyed from the
summary blocks of Form MF1-10B, the B-16 or, in a few cases,
from the F-6. For FAA stations, the form is MF1-10C. In gen-
eral, this image is not used when the station program is such
that a summary approximately midnight to midnight is not possible.
C. The precipitation card series is:
1. Hourly precipitation - 2 images ( 1 & 2 keyed in col. 12) for
each day of the month having precipitation and for the last
day of the month with or without precipitation.
2. Maximum short period precipitation, per month - 2 images
(1 & 2 keyed in col. 10) for each station per month showing
maximum amounts for time intervals of 5 to 180 minutes.
3. Maximum 24-hour amounts, per month - 1 image (4 in col. 12)
is keyed showing the greatest precipitation and date(s),
greatest snowfall and date(s), and the maximum snow depth and
date(s).
(a) When the value is zero (0), date is left blank.
149
-------
III. Procedures
A. A scan edit of the forms is made and keying instructions appli-
cable to the station program indicated on the station folder.
B. Data Entry Section keys data on tape.
C. Operations Section transfers keyed data to computer tapes by
record type.
WBAN No. 1 images, hourly observations, are placed on two tapes.
1. Tape No. 1 includes NWS (except Antarctica) and FAA stations.
2. Tape No. 2 includes NWS Antarctica and Navy stations.
3. The edit program provides for priority editing on tape No. 1
into two groups.
a. Group No. 1, stations in the LCD program, is edited in
two lots - first and second cutoffs. The first cutoff
is made at the discretion of the Chief, Surface Section,
when 75 to 90 percent of the records for the month are
available; remaining records constitute the second cut-
off. Records received unduly late can be held for
processing with data for the next month.
b. Group No. 2, stations not in the LCD program, is usually
processed after completion of group No. 1.
D. WBAN No. 1 records are edited according to the station's observa-
tional program using a reference tape containing the station WBAN
Number, Name, Elevation, Psychrometric Pressure Table, and Obser-
vational Pattern.
1. The observational pattern is designated by assignment of nu-
meric values to fields in the card image and use of the sum of
the field values applicable to the station for each hour as a
control of the machine tests to be made.
Field                         Value    Card Image Columns
(field name illegible)            1    14-16
Sky Condition                     2    17-20
Visibility                        4    21-23
Wea. & Obstruction                8    24-31
S. L. Pressure                   16    32-35
Dry Bulb Temp.                   32    47-49
Dew Point Temp.                  64    36-38
Wind Dir. & Speed               128    39-42
Station Pressure                256    43-46
Wet Bulb Temp.                  512    50-52
Relative Humidity              1024    53-55
150
-------
Field                         Value    Card Image Columns
Total Sky Cover                2048    56
Cloud Layers                   4096    57-58
Total Opaque                   8192    79
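Because the assigned values are powers of two, the hourly sum is in
effect a bitmask, and the machine edit can recover exactly which fields a
station reports, as the sketch below shows (illustrative code, not the
NCC program; the field whose value is 1 is illegible in the source and is
labeled field_1):

    # Observational-pattern control: each field has a power-of-two value,
    # so the per-hour sum encodes which fields the station reports.
    FIELD_VALUES = {
        "field_1": 1, "sky_condition": 2, "visibility": 4,
        "weather_obstruction": 8, "sea_level_pressure": 16,
        "dry_bulb_temp": 32, "dew_point_temp": 64, "wind_dir_speed": 128,
        "station_pressure": 256, "wet_bulb_temp": 512,
        "relative_humidity": 1024, "total_sky_cover": 2048,
        "cloud_layers": 4096, "total_opaque": 8192,
    }

    def pattern_sum(fields):
        return sum(FIELD_VALUES[f] for f in fields)

    def fields_from_sum(total):
        return [f for f, v in FIELD_VALUES.items() if total & v]

    hour_pattern = pattern_sum(["sky_condition", "visibility",
                                "dry_bulb_temp"])
    print(hour_pattern)                   # 38
    print(fields_from_sum(hour_pattern))  # the same three fields back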
The observational pattern is keyed in two cards as illus-
trated in Fig. 1. The station WBAN number in cols. 1-5,
the first 12 hours LST of the day in cols. 7-11, 13-17, etc.,
in the card keyed 2 in col. 80 as an identifier and the last
12 hours in the card keyed 3 in col. 80 as an identifier.
The observations are sorted from the original tape into
chronological day and hour order, edited, and one observa-
tion only for each hour (first on the original tape if
multiple entries) transferred to another tape (called the
sorted tape).
Only the records questioned in the edit are listed. Complete
data, keyed and computed, in a questioned record are listed
on format paper (Fig. 3a) with triple spacing. Appropriate
flags appear on the line above the data in the first column
of the field(s) questioned. Field corrections are entered
on the second line above the data for keying.
a. An asterisk "*" indicates inconsistency.
b. An ampersand "&" indicates data not in the station's
program, except that, if there is an inconsistency, the
"*" flag instead of the "&" will appear.
Observations not in the station's program are edited
as though all fields were required.
c. "DUP1," "DUP2." etc., are listed to indicate duplicates
up to three. All duplicates are edited, but only the
first observation on the original tape is transferred to
the sorted tape.
d. "MSG" above the day and hour indicates an observation in
the edit pattern is missing. "- -" for the hour indicates
the entire day is missing.
An inventory listing (Fig. 3g) at the end of the edit listing
for each station shows all hours for which observations are
on tape with the total number of observations on the tape
for the month at the end of the inventory listing.
a. "01" printed under an hour indicates an observation with
the cloud field keyed.
b. "02" indicates an observation without the cloud field
keyed.
151
-------
Fig. 1. Observational pattern cards.
[Facsimile of the two punched-card images (WBAN number in cols. 1-5,
hourly pattern sums, identifiers 2 and 3 in col. 80); the card
reproduction is illegible in the source scan.]
-------
5. Corrections for updating the tape are keyed on a "correction
card" image by fields or by keying a complete No. 1 card image
for missing observations or those having numerous field errors.
Following the updating of the tape, another edit is made in-
cluding the inventory listing.
IV. Details of No. 1 Card Image Edit
A. Major check groups
1. Psychrometric check: relationships between T, Tw, Tdp, and RH.
2. Limiting ranges of variability.
3. Wind, weather, temperature, and visibility and certain
interrelationships.
4. Cloud coding.
5. Cloud, ceiling, and sky cover relationships.
B. Psychrometric Check
Psychrometric relationships. The program is designed to accept
and check the interrelationships between the four psychrometric
parameters if all are keyed, or to compute Tw and RH and then
check the interrelationships if only T and Tdp are keyed. The
notations are in terms of whole degrees Fahrenheit for tempera-
ture and whole percentages for relative humidity. If there is a
suspected error in these relationships, the observation is printed
out complete, including an appropriate error flag.
The empirical formulas used to compute Tw and RH (with respect to
water) are:
1. Computation of wet bulb (Tw):
If the dry-bulb temperature (T) is zero and above:
Tw = T - (.034N - .00072N(N - 1))(T + Tdp - 2P + 108)
a. If T is less than 100°F., rounding of Tw follows this
scheme:
Tw rounded = Tw + .9 if the tens position of T is 0, 1, 2.
Tw rounded = Tw + .9 -.01(T + .9) if tens position of T is
3, 4.
Tw rounded = Tw + .4 if the tens position of T is 5 thru 9.
b. If T is 100°F. or higher:
Tw rounded = Tw + .9.
153
-------
If the dry-bulb temperature (T) is less than zero:
Tw = T - (.034N - .006N^2)(.6T + Tdp - 2P + 108)
Tw rounded = Tw - .01 Tdp
N = (T - Tdp)/10 in the above equations.
2. Computation of relative humidity:
RH = (173 - .1T + Tdp) / (173 + .9T)
The checking procedures print out the error flag if
Tdp is greater than Tw, or if Tw is greater than T, or
if the following are not satisfied:
a. In the range of temperature from -60° to +139°, the
dew point range may be -60° to +90°. For individual
observations, the dew point check requires a
maximum Tdp taking T - 0.5°F. and Tw + 0.4°F., and a
minimum Tdp taking T + 0.4°F. and Tw - 0.5°F. (If
T = Tw, maximum Tdp = Tw.) Saturation vapor pressures
from tables stored in memory: Table A (vapor pressure
over water for the range -60° to +140°F.) or B (vapor
pressure over ice for the range -60° to +31°F.) are taken
for the above values of Tw. The vapor pressures for each
end of the allowable dew point range are then computed,
using

e = ew - 0.000367 P (T - Tw) (1 + (Tw - 32)/1571)

(e, ew, and P are in inches of mercury. Pressure may be
taken from the individual observation, or from the pressure
applicable to the elevation range in which the station is located.)
From the vapor pressure tables in memory, the dew point
temperatures corresponding to the vapor pressures at
each end of the range, which are for the air at tempera-
tures T + 0.4°F., Tw - 0.5°F., and T - 0.5°F., Tw + 0.4°F.,
are taken in terms of whole degrees of dew point. If the
dew point being checked falls within 2° above or 2° below,
it is accepted as correct. If outside this range, an in-
dication of psychrometric error is printed. Note that
if station pressure values are not recorded in the obser-
vations, computation of Tw should still always be possible
since the program will take an appropriate pressure value
that corresponds to the station elevation.
154
-------
b. Relative humidity values are accepted if they are in the
range of 4% to 100% and are within 2% above and 2% below
the computed range of humidity below. All values less
than 4% are flagged for review. For hygrothermometer sta-
tions, the relative humidity will have been computed by
the formula in 2 above; for other stations it will have
been keyed from the original record.
The range for relative humidity is determined in the same
way as for the dew point check. Maximum and minimum vapor
pressures are obtained from the taped tables for each end
of the range, and the computation at each end of the range
is by this formula:
e
RH = — , e being the vapor pressure of the dew points
s and e the saturation vapor pressure of the
s
air at the observed temperature plus 0.4°F or
less 0.5°F.
If liquid fog is reported in present weather and the tem-
perature is 31°F. or less, T = T = T is acceptable.
If T is less than -35°F., no formula is applied. In the
latter case, when T = -36° or -37°, an error is listed if
the dew point does not fall within the range T - 6° (plus
or minus 1°). An error also lists if temperature is within
the range -38° through -53° and dew point is not in the
range T - 7° (plus or minus 1°).
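A minimal sketch of the psychrometric dew point check in Python follows. The saturation vapor pressure function is a Magnus-type stand-in for the taped Tables A and B, and dp_from_vp stands in for the inverse table lookup; all names are illustrative assumptions:

import math

def sat_vp_water(temp_f):
    """Approximate saturation vapor pressure over water, inches of mercury
    (a stand-in for Table A, not the taped table itself)."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    e_mb = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    return e_mb * 0.02953                 # millibars to inches of mercury

def vapor_pressure(t_dry, t_wet, press):
    """The psychrometric formula from the text (deg F, inches Hg)."""
    ew = sat_vp_water(t_wet)
    return ew - 0.000367 * press * (t_dry - t_wet) * (1 + (t_wet - 32.0) / 1571.0)

def dew_point_ok(t_dry, t_wet, t_dp, press, dp_from_vp):
    """Accept the keyed dew point if it falls within 2 deg of the range
    computed from the perturbed temperature pairs."""
    dp_hi = dp_from_vp(vapor_pressure(t_dry - 0.5, t_wet + 0.4, press))
    dp_lo = dp_from_vp(vapor_pressure(t_dry + 0.4, t_wet - 0.5, press))
    return (dp_lo - 2.0) <= t_dp <= (dp_hi + 2.0)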
C. Limiting Range of Variability
Limiting values, some absolute and some dependent on other
elements within an observation, are incorporated into the
machine edit program for checking purposes. Items with
values outside the limits, or that appear inconsistent
with other elements in the observation, or that approach extreme
conditions are flagged for technical review as follows (a sketch
of these checks in code appears after the list):
1. Sea-level pressure: above 1060.0 or below 940.0 mb.
2. Station pressure: if pressure in inches and hundredths
plus 10^-3 times the elevation (H) in feet is less than
27.75 or greater than 31.30 inches.
3. Change of sea-level pressure from one observation to the
next is greater than 6.0 mb., change of station pressure
from one observation to the next is greater than 0.20
inches. The interval between observations in both cases
is 3 hours. For 1-hour, 3.0 mbs. & 0.10 inch apply.
4. Temperature: T, above 125° or below -60°; Tw, above 125°
or below -60°; Tdp, above 90° or below -60°; and if Tw
and Tdp are present and T is -53° or colder.
155
-------
5. Temperature fluctuation from one 3-hourly observation to the
next: if T or Tdp changes 20° or more from one 3-hourly obser-
vation to the next, the observation which varies 20° or more
from the preceding is flagged for review. Changes of 10° are
flagged for hourly observations.
6. Relative humidity: below 4%.
7. Winds: When wind at one 3-hourly observation of 20 knots or
more doubles at the next 3-hourly observation, or reaches 50
knots, the wind speed is flagged for review. (In AWS version
of this edit, all winds 30 knots and higher are flagged for
review.)
8. Visibility: 15 miles or less at one observation and 70
or above at the next.
9. Obscuration and cloud heights, as follows:
a. Obscuration                        greater than 4,000 ft.
b. Fog                                greater than 1,500 ft.
c. Stratus, stratocumulus,            greater than 9,000 ft.
   stratus fractus, cumulus
   fractus, cumulus mammatus
d. Cumulus, cumulonimbus              greater than 12,000 ft.
e. Altostratus, altocumulus,          less than 4,500 ft. and
   nimbostratus, and altocumulus      greater than 20,000 ft.
   castellanus
f. Cirrus, cirrostratus, and          less than 15,000 ft.
   cirrocumulus
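The sketch below illustrates, under an assumed record layout, the absolute-limit checks (items 1, 2, and 4) and the observation-to-observation sequence checks (items 3, 5, and 7); the Obs tuple and its field names are illustrative, not the WBAN card layout itself:

from typing import NamedTuple, Optional, List

class Obs(NamedTuple):
    slp_mb: float          # sea-level pressure, mb
    stp_in: float          # station pressure, inches
    temp_f: float          # dry bulb, deg F
    wind_kt: int           # wind speed, knots

def limit_flags(obs: Obs, elevation_ft: float, prev: Optional[Obs],
                interval_hr: int) -> List[str]:
    flags = []
    if obs.slp_mb > 1060.0 or obs.slp_mb < 940.0:
        flags.append("sea-level pressure out of range")
    if not 27.75 <= obs.stp_in + 1e-3 * elevation_ft <= 31.30:
        flags.append("station pressure improbable for elevation")
    if obs.temp_f > 125 or obs.temp_f < -60:
        flags.append("dry bulb out of range")
    if prev is not None:
        dslp, dstp = (6.0, 0.20) if interval_hr == 3 else (3.0, 0.10)
        if abs(obs.slp_mb - prev.slp_mb) > dslp:
            flags.append("sea-level pressure sequence check")
        if abs(obs.stp_in - prev.stp_in) > dstp:
            flags.append("station pressure sequence check")
        if abs(obs.temp_f - prev.temp_f) >= (20 if interval_hr == 3 else 10):
            flags.append("temperature fluctuation")
        if prev.wind_kt >= 20 and (obs.wind_kt >= 2 * prev.wind_kt
                                   or obs.wind_kt >= 50):
            flags.append("wind speed review")
    return flags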
D. Wind, Weather, Temperature, Visibility:
1. Wind: direction is recorded and keyed in tens of degrees from
north (00 = Calm), and wind speed in knots (00 = Calm). If
speed is 00, direction must be 00. Legal directions other than
00 are 01 through 36. The wind error indication is printed
with illegal directions, for speed of 01 or more with direction
00, for direction of 01 - 36 with speed 00, and for exceeding
the wind limits mentioned above (see the sketch following the
visibility code table below).
Speed is related to the check of blowing dust, sand, blowing
spray and blowing snow. Observations in which these items
appear with wind speed less than 9 knots are flagged.
2. Weather: the following items and observations containing them
are flagged for review:
a. Tornado.
b. Ice crystals with intensity indication or in combination
with any other element.
156
-------
c. Fog or any form of precipitation with clear sky (0 cloud
amount) except ice crystals.
d. Fog with dew point depression greater than 8°F.
e. Fog with less than 1/10 cloud cover.
f. Weather types below with visibilities other than those listed:
Weather Visibility range
S+, SP+, SW+, L+, ZL+, SG+        000-004 (0 - 1/4 mile)
S, SP, SW, L, ZL, SG, IC*         005-007 (5/16 - 1/2 mile)
(* Note: IC may be reported with higher than 1/2 mile visibility)
S-, SP-, SW-, L-, ZL-, SG-        008 and higher (3/4 - unlimited)
F, IF, GF, BD, BN, K, H, KH,      000-060 (0 to 6 miles)
D, BS, BY
g. Weather types (all intensities) with temperatures other than
within ranges below:
Weather Range of temperature
R, RW, L                 28°F. or higher
ZR, ZL                   No lower limit, to 39°F.
IP                       10°F. through 44°F.
SP, SG, S, SW            -40°F. through 44°F.
IC                       -40°F. through 15°F.
IF                       -40°F. through 15°F.
h. 100% relative humidity reported without liquid fog or liquid
precipitation in the weather fields and wind speed > 6 knots.
i. Illegal visibility codes are flagged for correction. The legal
visibility codes are:
VSBY        Code        VSBY        Code        VSBY        Code
0           000         1 5/8       018         8           080
1/16        001         1 3/4       019         9           090
1/8         002         2           020         10          100
3/16        003         2 1/4       024         11          110
1/4         004         2 1/2       027         12          120
5/16        005         3           030         13          130
3/8         006         4           040         14          140
1/2         007         5           050         15          150
5/8         008         6           060         20          200
3/4         009         7           070         and, by 5 mile
1           010                                 increments, on to
1 1/8       012                                 95          950
1 1/4       014
1 3/8       016                                 > 100       990
1 1/2       017
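A sketch of the wind and weather consistency checks above; the temperature-range table is transcribed from item g, while the function and argument names are assumptions for illustration:

# Weather type -> inclusive temperature limits (deg F); None = unbounded.
WEATHER_TEMP_RANGE = {
    "R": (28, None), "RW": (28, None), "L": (28, None),
    "ZR": (None, 39), "ZL": (None, 39),
    "IP": (10, 44),
    "SP": (-40, 44), "SG": (-40, 44), "S": (-40, 44), "SW": (-40, 44),
    "IC": (-40, 15), "IF": (-40, 15),
}

def wind_flags(direction, speed):
    """Direction keyed in tens of degrees (00 = calm), speed in knots."""
    flags = []
    if direction not in range(0, 37):
        flags.append("illegal wind direction")
    if speed == 0 and direction != 0:
        flags.append("direction keyed with calm wind")
    if speed > 0 and direction == 0:
        flags.append("speed keyed with no direction")
    return flags

def weather_temp_flag(weather, temp_f):
    """Flag a weather type reported outside its temperature range."""
    lo, hi = WEATHER_TEMP_RANGE.get(weather, (None, None))
    if lo is not None and temp_f < lo:
        return f"{weather} with temperature below {lo}F"
    if hi is not None and temp_f > hi:
        return f"{weather} with temperature above {hi}F"
    return None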
157
-------
E. Cloud Coding
Ceiling, sky condition, and clouds are interrelated. Three
columns are keyed for ceiling height. The valid codes are as
indicated below and any others are flagged for correction.
Ceiling height Card code
Unlimited XXX
Zero 000
100 ft. - 5000 ft. 001 - 050
(every hundred feet)
5000 ft. - 10,000 ft. 050 - 100
(every five hundred feet)
10,000 ft. and higher 100 - 250, etc.
(every thousand feet)
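A short sketch of the ceiling-code validity rule above (the cap of 250 is left open since the table reads "100 - 250, etc."); the function name is illustrative:

def ceiling_code_valid(code: str) -> bool:
    """Valid card codes for ceiling height, per the table above."""
    if code in ("XXX", "000"):            # unlimited, zero
        return True
    if not code.isdigit():
        return False
    n = int(code)
    if 1 <= n <= 50:                      # 100-5,000 ft, every hundred feet
        return True
    if 50 < n <= 100:                     # 5,000-10,000 ft, every 500 feet
        return n % 5 == 0
    return n % 10 == 0                    # 10,000 ft and up, every 1,000 feet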
Sky condition is a four-position (4 card columns) field, with
provision for keying four sky condition symbols, as may be
recorded in the MF1-10A Sky column. Heights of clouds are not
keyed in this field (ceiling is keyed in the ceiling field and
cloud heights in the "layer" fields discussed below). If less
than 4 symbols are reported, keying begins at the left of the
field, with "0" keyed in each column at the right of the field
for which no sky symbol is reported. The lowest sky symbol is
keyed first, the next highest second, etc., until the 4-column
field is coded completely, either with sky condition symbols
(including blanks) or zeros.
If more than 4 sky condition symbols are reported, the highest
is keyed in column 20, and the first three in columns 17-19, un-
less this excludes the ceiling symbol. In the latter case the
ceiling symbol is keyed in column 19, the first two in columns
17 and 18, and the highest in column 20.
For a partial obscuration (-X) the first column of the sky con-
dition field is left blank. The succeeding three columns are
keyed for reported sky conditions.
No clouds or obscurations (clear) is keyed 0000.
An obscuration (not partial) requires an X key in the first or
second column of the sky condition field. If the obscuration
is the lowest sky condition, the X will be in the first column.
If a cloud layer is reported below the obscuration it will be
keyed in the first column in the normal manner, and the X in the
second column of the field. In this situation, the last two
columns of the field would be 00.
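A sketch of these keying rules in Python; it omits the ceiling-symbol exception for more than four symbols, and the function and argument names are assumptions:

def key_sky_condition(symbols, partial_obscuration=False):
    """Return the 4-character Sky Condition field, zero-filled on the
    right; symbols arrive lowest first."""
    field = []
    if partial_obscuration:
        field.append(" ")                 # first column left blank
    field.extend(symbols)
    if len(field) > 4:
        # More than four symbols: keep the first three plus the highest.
        field = field[:3] + [field[-1]]
    while len(field) < 4:
        field.append("0")                 # "0" keyed where nothing reported
    return "".join(field)

# Examples consistent with the table that follows:
assert key_sky_condition([]) == "0000"                        # clear sky
assert key_sky_condition(["1"]) == "1000"                     # one symbol
assert key_sky_condition([], partial_obscuration=True) == " 000"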
158
-------
The table below presents the valid codes of the Sky Condition
field. In the table, p = punch, b = blank, and - = X; the four
entries under "punching possibilities" give the permissible keys
for card columns 17-20 in order.

Card code   Card column punching possibilities    Description
0000        0 / 0 / 0 / 0                         Clear sky (less than 1/10).
p000        1,2,4,5,7,8 / 0 / 0 / 0               One symbol only, not an obscuration
                                                  or partial obscuration.
pp00        1,2,4,5,7 / 1,2,4,5,7,8 / 0 / 0       Two symbols reported, no obscuration
                                                  or partial obscuration.
ppp0        1,2,4,5,7 / 1,2,4,5,7 /               Three symbols reported, no obscuration
            1,2,4,5,7,8 / 0                       or partial obscuration.
pppp        1,2,4,5,7 / 1,2,4,5,7 /               Four symbols, no obscuration or
            1,2,4,5,7 / 1,2,4,5,7,8               partial obscuration.
-000        X / 0 / 0 / 0                         Obscuration, 10/10 sky hidden, no
                                                  layer below obscuration.
b000        Blank / 0 / 0 / 0                     Partial obscuration, no other symbols.
bp00        Blank / 1,2,4,5,7,8 / 0 / 0           Partial obscuration, one other symbol.
bpp0        Blank / 1,2,4,5,7 /                   Partial obscuration, two other
            1,2,4,5,7,8 / 0                       symbols.
bppp        Blank / 1,2,4,5,7 /                   Partial obscuration, three other
            1,2,4,5,7 / 1,2,4,5,7,8               symbols.
p-00        1,2,4,5,7 / X / 0 / 0                 Obscuration above one layer of cloud.
159
-------
F. Clouds and Obscuring Phenomena.
Provision for keying as many as four layers of clouds and/or
obscuring phenomena, total sky cover, and opaque sky cover
amount is made in this field. Cloud layers are keyed in ascend-
ing order. If more than four layers are reported, the four
lowest are keyed. The lowest layer is always keyed in the left
hand cloud field of the card. For each layer, amount, type, and
height are keyed. For the second and third layers (if reported),
the summation amount(s) is keyed at the level(s) involved.
If a complete cloud layer section is reported unknown, "U", on
MF1-10B, the corresponding card field for the entire layer is
left blank.
When fog or any other obscuring phenomenon is reported, it will
be handled in a manner similar to a cloud layer, and an amount,
type, and height will be keyed. Obscuring phenomena other than
fog (smoke, for example) are keyed X for type. Heights of clouds
and vertical visibility into obscurations are keyed in hundreds
of feet. Where vertical visibility is unlimited (dash in height
column of MF1-10B) height is keyed XXX. If cloud field is re-
ported clear or none, height will be keyed XXX. If cloud height
is reported unknown (U), height is left blank if type is unknown.
Summation totals may not exceed 10/10, but the first summation
(card col. 67) may be 1 greater (not exceeding 10/10) than the
sum of card columns 57 and 62; and card column 73 may be 1 greater
than the sum of card columns 67 and 68, not to exceed 10/10.
Total cloud amount (card col. 56) should be the same as col. 57
if only one layer is reported, the same as col. 67 if only two
layers are reported, the same as col. 73 if only three layers are
reported, and equal to not more than 1 greater than the sum of
cols. 73 and 74 (not exceeding 10/10) if four layers are reported.
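The summation arithmetic above can be collected into one table-driven check; the sketch below (names and the tenths representation are assumptions) covers the four-layer case:

def _sum_ok(keyed, below, added):
    """A summation must cover the amounts beneath it, may exceed their
    arithmetic sum by at most 1, and may never exceed 10/10."""
    return max(below, added) <= keyed <= min(below + added + 1, 10)

def summations_ok(amt1, amt2, sum12, amt3, sum123, amt4, total):
    """Amounts and summations in tenths (0-10)."""
    return (_sum_ok(sum12, amt1, amt2)          # col. 67 vs cols. 57 + 62
            and _sum_ok(sum123, sum12, amt3)    # col. 73 vs cols. 67 + 68
            and _sum_ok(total, sum123, amt4))   # col. 56 vs cols. 73 + 74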
1. Legal codes in the card field for "Clouds and Obscuring
Phenomena" are related to the Ceiling, Sky Condition, and
Weather and/or Obstruction to Vision fields. Accordingly, a
discussion of the several relationships is presented.
a. If sky condition is reported clear, ceiling must be un-
limited. Summation of all clouds must be zero. Type and
height in the cloud layer fields may be keyed for zero
amount (less than 1/10).
160
-------
b. The ceiling height must be consistent with the height of
the lowest cloud layer whose corresponding symbol in the
sky condition field is broken or overcast, or with the
height of an obscuration. The total (if one layer) or
the summation amount at the layer constituting the ceiling
must be equal or greater than 6/10. The cloud type must
be coded either 2 through 9, X/2, X/4 through X/7, X/9, or,
if an obscuration, 1 or X.
c. If the ceiling is not XXX (unlimited), some sky symbol must
be keyed: i. e., broken (5), ovc (8), or obscured (X in
1st or 2nd col. of sky condition field). Only one X may be
keyed. It will be the first column of the sky condition
field if the lowest layer is the obscuration; the second is
a layer below a portion of the surface-based obscuration.
d. If the first cloud layer contains 10/10 F or IF (not GF),
the ceiling height must equal the height of the first layer,
and the sky must be obscured.
e. If fog is keyed as an Obstruction to Vision with clear sky
or with partial obscuration or less than 5/10, with no
clouds above, the fog must be GF or IF (not F). If the
partial obscuration is 6/10 or greater, with no clouds above,
or with obscuration (10/10), the fog may not be classified
as GF.
f. If total opaque is zero, all sky symbols must be thin or
clear, and ceiling must be unlimited.
g. If any sky symbol is thin, the total opaque amount must not
be more than half the summation amount of that layer and
all higher layers (not always in error for higher layers,
but should be flagged for review). If the ratio of total opaque
to total amount is 1/2 or less, the highest sky symbol must be thin.
h. Sky condition symbols must, with increasing height, reflect
equal or increasing sky cover. Only the highest sky symbol
may be overcast, except that below an overcast there may
be a thin overcast.
i. The highest sky condition symbol must be compatible with the
amount of total sky cover.
j. If obscuration (X) is reported as the second sky condition,
the second cloud layer type must be obscuring phenomena
(fog, ice fog, smoke, rain, snow, for example) keyed X;
total amount, total opaque, and first summation total must
be 10/10. The third and fourth cloud layers and the second
summation total columns should be blank (may be keyed if
an aircraft report has been received, but should be flagged
for review). The third and fourth sky cover symbols must
be zero (0). Ceiling must not exceed the second cloud layer
height, and that height should be 4,000 ft. or less. Normally
fog will be questioned in such a situation.
161
-------
k. If obscuration is reported as the first sky symbol, the
type of the first cloud layer must be fog (code 1) or
other obscuring phenomena (code X), total amount and total
opaque must be 10/10, and height must correspond with ceil-
ing height. The second, third, and fourth cloud layer and
first and second summation total columns should be blank
(may be keyed if an aircraft report has been received, but
should be flagged for review). The second, third, and
fourth sky cover symbols must be zero (0). Height and
ceiling should be the same. Height should be 1,500 ft. or
less if fog (code 1) or 4,000 ft. or less if other obscur-
ations (X) are encoded.
l. When fog (code 1) is reported in the first cloud layer,
amount not coded 0, the sky condition must reflect an ob-
scuration or partial obscuration.
m. When fog is reported as the only cloud field (code 1), it
should be coded in Obstructions to Vision as GF if amount
is 1 to 5 tenths, or F if 5 to 10 tenths (prevailing
visibility being 6 miles or less).
n. The corresponding cloud and summation total columns for
sky cover symbol reported (code ) above an overcast
(code 8 in Sky Condition) should be:
-1. Blank if total opaque is 10/10.
-2. Zero (0) in amount and type columns, 10 in summation
total columns, and XXX in height columns whenever
total opaque is less than 10/10. Additional layers
may be keyed if an aircraft report has been received,
but should be flagged for review).
o. Partial obscuration (blank in first position of Sky Condi-
tion) must have a first-layer amount from 1 to 9, type
must be fog (code 1) or obscuration (code X), and height
must be unlimited.
p. Some stations (FAA) do not observe cloud layer values,
but do enter total cloudiness and total opaque. If the
ratio of total opaque to total amount is 1/2 or less,
there should be no codes 5, 8, or obscuration (X) in Sky
Condition. If the ratio is greater than 1/2, there must
be 5, 8, or X in Sky Condition, and if ratio is not 1:1,
X is invalid. (The valid blanks in the cloud layer field will
cause "2's" to print in the inventory listing for these
stations.)
162
-------
2. The testing procedure to flag errors or suspected conditions
in Clouds and Obscuring Phenomena, Sky Condition, Total Clouds,
and Total Opaque is systematic. Missing fields are indicated
(except the valid condition for FAA coded "2" in the inventory)
in the usual manner. The system, in general, follows these
steps:
In the cloud fields, the valid codes for cloud amounts are
0 = no clouds or less than 1/10; 1 - 9 = 1/10 to 9/10 clouds;
X = more than 9/10 or 10/10.
Valid codes for cloud types are 0 = NONE; b (Blank) when a
cloud type is reported UNKNOWN; 1 = Fog; 2 = Stratus;
3 = Stratocumulus; 4 = Cumulus; 5 = Cumulonimbus; X/2 (K) =
Stratus fractus; X/4 (M) = Cumulus fractus; X/5 (N) = Cumulus
mammatus; 6 = Altostratus; 7 = Altocumulus; X/6 (O) = Nimbo-
stratus; X/7 (P) = Altocumulus castellanus; 8 = Cirrus; 9 =
Cirrostratus; X/9 (R) = Cirrocumulus; X = Obscuration other
than fog.
The valid codes for cloud heights in the cloud layer fields
are the same as for ceiling heights (in number of hundreds of
feet); XXX indicates NONE or (in the first layer) a surface-
based partial obscuration; and bbb (Blanks) indicates cloud
height unknown with type unknown.
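Collected as code sets, the valid codes above lend themselves to a table-driven check; a minimal sketch (names are illustrative):

VALID_CLOUD_AMOUNTS = set("0123456789X")   # X = more than 9/10 or 10/10

VALID_CLOUD_TYPES = {
    "0",                          # none
    " ",                          # blank: type reported unknown
    "1",                          # fog
    "2", "3", "4", "5",           # St, Sc, Cu, Cb
    "K", "M", "N",                # X/2 St fra, X/4 Cu fra, X/5 Cu mammatus
    "6", "7", "O", "P",           # As, Ac, X/6 Ns, X/7 Ac castellanus
    "8", "9", "R",                # Ci, Cs, X/9 Cc
    "X",                          # obscuration other than fog
}

def layer_code_flags(amount, ctype, height):
    flags = []
    if amount not in VALID_CLOUD_AMOUNTS:
        flags.append("invalid cloud amount")
    if ctype not in VALID_CLOUD_TYPES:
        flags.append("invalid cloud type")
    if height not in ("XXX", "   ") and not height.isdigit():
        flags.append("invalid cloud height")
    return flags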
Errors are listed for invalid codes, and if
a. Any cloud field element is keyed and the Total Clouds left
blank.
b. Total opaque is not keyed, unless indicated in the station's
observation pattern (III. D. 1.) .
c. Total opaque is greater than total cloud amount.
d. Total opaque is less than 10/10, and any blanks occur in
Cloud and Obscuring Phenomena fields including summations,
FAA excepted.
e. Any element within a cloud layer is keyed (amount, type,
or height), and any other element is left blank.
f. Total Cloud amount is keyed from 0/10 to 9/10 inclusive,
and amounts and types of fields above highest reported layer
are not coded "0" and heights are not coded "XXX."
g. Each summation amount does not equal or exceed the next lower
summation amount, or if a succeeding summation amount is
greater than 1 more than the amount(s) of the additional
layer(s), or exceeds 10/10.
1. In the case of partial obscuration in the first layer,
the second summation is not greater than the, amount of
the first layer if a cloud layer is also reported.
163
-------
2. The summation amount is less than any lower individual
cloud layer amount.
3. Blanks in a summation amount are not preceded by 10/10
in the last summation amount (which is not blank) or by
blank cloud amounts in fields with blank summation
amounts, and 10/10 Total Opaque is not reported.
h. Height ranges by cloud types are in disagreement with those
listed under Obscuration and Cloud Heights (C.9 above).
i. Fog (code 1) is coded in layers above the first.
j. Height in cloud layers reporting height does not increase
from one layer to the next.
k. The ceiling height does not agree with the lowest layer
height constituting a ceiling, or the highest sky symbol
is not compatible with total sky cover.
164
-------
G. Explanation of Edit Flags
Beginning on the following page are numbered explanations of edit
flags appearing in the correspondingly numbered observations printed
out in the Surface Weather Observations format. An asterisk (*)
prints over fields to be checked.
The edit is designed for No. 1 card images keyed every 3 hours (at
local standard time corresponding to 00, 03, 06, 09, 12, 15, 18 and
21 hours GMT) and is readily adaptable to programs under which all
record observations are keyed.
Fig. 2. WBAN No. 1 card layout (punch card image; fields include station
number, date, day, ceiling, sky condition, weather and/or obstruction to
vision, sea-level pressure, dew point, wind, dry bulb, station pressure,
wet bulb, relative humidity, and clouds and obscuring phenomena).
Fig. 2a. WBAN No. 1 card layout, continued (punch card image).
165
-------
Explanations of Reasons for Flags on Correspondingly
Numbered Observations in the Edit Listings **
1. Dew point incorrectly keyed.
2. Hour other than that in the normal program.
3. Card missing for hour in station program.
4. Dew point temperature higher than dry-bulb temperature.
5. Ceiling height differs from cloud layer height.
6. Ceiling height differs from cloud layer height.
7. Obscuration under Weather is snow with type of obscuration fog.
8. Obscuration with less than 10/10 sky cover.
9. First sky symbol scattered with corresponding layer over 5/10,
and second cloud group and ceiling non-reportable value.
10. Ceiling, sky symbol, and summation total not in agreement with
total sky cover.
11. Two opaque overcast symbols.
12. Lower layer is not opaque.
13. Incorrect relationship of ceiling, sky condition, and total
opaque sky cover.
14. Dry-bulb temperature incorrectly keyed.
15. Review of wind speeds over 50 knots.
16. Wind direction value over 36 (360°).
17. Partial obscuration with lowest layer a cloud type.
18. Wind direction with calm wind speed.
19. Wind speed with no wind direction.
20. Ceiling height missing.
21. Amount of partial obscuration is greater than total opaque.
22. Total opaque missing.
23. Amount of obscuration and total sky cover differ.
24. Amount of obscuration less than 10/10.
25. Partial obscuration due to fog omitted from weather, and in-
complete keying of cloud and obscuring phenomena.
26. Amount of partial obscuration is greater than total opaque.
Also, incomplete keying of clouds and obscuring phenomena.
27. Third layer summation is missing.
28. Total sky cover omitted.
29. Visibility omitted.
30. First two columns of weather omitted.
31. Fog not shown as obstruction to vision. Clouds and obscuring
phenomena layers less than total sky cover.
32. Illegal visibility.
33. Ground fog with obscuration greater than 5/10.
34. Fog reported with less than 6/10 obscuration and lowest cloud
layer greater than 5000 feet.
35. Ceiling not a reportable height.
36. Ground fog with over 5/10 obscuration.
37. Sky symbol and first cloud layer not in agreement.
38. Total opaque cloudiness and cloud layer data in error; or
ceiling, sky and cloud layer relationships in error.
** See pages 168 through 174.
166
-------
39. Blowing dust with wind speed less than 7 knots.
40. Flagged for review - no increase in 2nd layer summation amount.
41. Ceiling height not a reportable value.
42. Fog reported as obstruction to vision with visibility greater
than six miles.
43. Visibility value not reportable.
44. Visibility value not reportable.
45. Ground fog reported as obstruction to vision with visibility
greater than six miles.
46. Visibility reduced to less than seven miles and no obstruction
to vision.
47. Illegal keying in weather and obstruction to vision columns.
48. Illegal keying in weather and obstruction to vision columns.
49. Squalls reported with wind speed less than 16 knots.
50. Fog with less than 6/10 obscuration and lowest cloud layer
greater than 5000 feet and psychrometric error.
51. Sea level pressure flagged for non-reportable value.
52. Dry bulb and dew point sequence check.
53. Station pressure flagged for improbable value.
54. Dew point incorrectly keyed.
55. Cloud height incorrectly keyed.
56. Illegal punch in weather & obstruction to vision columns.
57. Flagged for intensity of snow with 1/4 visibility.
58. Station pressure sequence check.
59. Duplicate cards, date and hour 1st card.
60. Duplicate cards, date and hour 2nd card.
61. Visibility sequence check. Change in values up. Sea level and
station pressure check.
62. Visibility sequence check. Change in values down.
63. Missing observation.
64. Dry bulb sequence check.
65. Flagged for intensity of snow with 1/4 mile visibility.
66. Flagged for liquid precipitation with 24-degree temperature.
67. Station and sea level pressure sequence check.
68. Station and sea level pressure sequence check.
69. Sea level pressure sequence check, station pressure flagged for
review.
70. Frozen precipitation with 45-degree temperature, station pressure
flagged for review.
71. Station pressure flagged for review.
72. Station pressure flagged for review.
73. Snow intensity not in agreement with visibility.
74. Snow intensity not in agreement with visibility.
75. Missing observation.
76. Sea level pressure sequence check.
77. Sea level pressure sequence check.
78. Duplicate cards, date and hour 1st card.
79. Duplicate cards, date and hour 2nd card. Dry bulb sequence check.
80. Sky condition symbols missing.
81. Weather and obstruction to vision symbols missing.
82. Observations for the 29th day missing.
83. Observations for the 30th day missing.
84. Observations for the 31st day missing.
85. Monthly inventory check.
167
-------
Fig. 3a. Sample edit listing for station "WBAN EDIT TEST #1" in the
Surface Weather Observations format (printout image; asterisks print over
flagged fields). Column 3 entries: C = no cloud, S = scattered layer,
B = broken layer, 0 = overcast layer, X = obscuration ("X" appearing in
cloud type columns denotes obscuring phenomena other than fog); "+"
denotes "heavy"; "-" denotes "partial," "light," "thin," or "minus" as
appropriate.
168
-------
Fig. 3b. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
169
-------
Fig. 3c. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
170
-------
Fig. 3d. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
171
-------
Fig. 3e. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
172
-------
Fig. 3f. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image; clouds and obscuring phenomena fields).
173
-------
Fig. 3g. Inventory of No. 1 cards containing cloud layer information.
Note the 0100 card on the 24th and the 0400 card on the 27th are missing;
the card count, 222, is 2 less than the program count (at 8 observations
per day) for a non-leap-year February.
174
-------
VI. Visual Checks
Data that are not keyed onto tape are given a limited visual scan
as a random check for quality control and consistency of climatolog-
ical data entries.
WBAN No. 3 Card Edit and Listing
The No. 3 WBAN card images (Fig. 4) contain varying daily climatolog-
ical data for individual stations for the period midnight-midnight LST.
Fig. 4. WBAN No. 3 card layout (punch card image; fields include station
number, date, day, max temp, min temp, precipitation, and snowfall,
among others).
-------
VII. Program and Tape Control Procedures
In order to provide appropriate edit information and to meet publi-
cation deadlines, the records are placed on two separate tapes each
month, preceded by certain station program and priority information.
A. Tape No. 1 contains data for all stations in the LCD and CD
programs and Tape No. 2 contains data for all others.
B. A thirteen-digit Program and Priority Editing Code is provided
in the station's header on the name tape. Positions 1-11 indi-
cate the programs in which the station participates. The figure
"1" in the various positions indicates that the station has that
program; "0" indicates it does not.
1. Sunshine data keyed in cols. 54-58.
2. Fastest mile in compass points.
3. Station in 1009 program.
4. Station has monthly temperature normals.
5. Station has mid-monthly temperature normals.
6. Station has monthly precipitation normals.
7. Station has degree day normals.
8. Station in Extended Forecast program.
9. "Days With" are keyed in cols. 41-51.
10. "Water Equivalent" keyed in cols. 63-65
when snow depth 002, or greater.
11. Station has CD number.
12. Station has LCD, coded 1; no LCD, coded 2.
13. Station is operating, coded 1; closed, coded zero
(a convenience in using the name tape as a reference
in other programs).
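A sketch of decoding this header field (the position meanings are those listed above; function and variable names are assumptions):

PROGRAM_POSITIONS = {
    1: "sunshine data keyed in cols. 54-58",
    2: "fastest mile in compass points",
    3: "station in 1009 program",
    4: "monthly temperature normals",
    5: "mid-monthly temperature normals",
    6: "monthly precipitation normals",
    7: "degree day normals",
    8: "Extended Forecast program",
    9: "'Days With' keyed in cols. 41-51",
    10: "'Water Equivalent' keyed in cols. 63-65",
    11: "station has CD number",
}

def decode_program_code(code: str):
    """code is the 13-character field from the name-tape header."""
    assert len(code) == 13
    programs = [PROGRAM_POSITIONS[i] for i in range(1, 12) if code[i - 1] == "1"]
    has_lcd = code[11] == "1"             # position 12: 1 = LCD, 2 = no LCD
    operating = code[12] == "1"           # position 13: 1 = open, 0 = closed
    return programs, has_lcd, operating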
VIII. Edit and Listing
The machine edit is designed to detect various inconsistencies of
data. The corrected (updated) listings provide various computations
of sums, averages, departures from normal and counts of number of
occurrences, etc., used in the climatological programs.
Sample listings appear on pages 20a and 20b.
The fields for all inconsistencies noted in the edit are flagged
with appropriate symbols in the column(s) to the right of the
field(s) questioned.
Checks are made and field(s) flagged for review, according to the
outline below:
A. All columns 1-80
"12" overpunch.
176
-------
[Sample WBAN No. 3 daily climatological edit listing (printout image).
Footnotes to the listing:
(1) Water equivalent of snow and ice on ground.
(2) If the / (solidus) appears, speeds are gusts. Figures for directions
are tens of degrees from true North; i.e., 9 = East, 18 = South,
27 = West, and 36 = North. When directions are in tens of degrees,
speeds are fastest observed 1-minute values.
(3) S-S indicates sunrise to sunset and M-M midnight to midnight.
(4) Entry of 1 indicates occurrence, 0 indicates no occurrence. Weather
types are: F = fog, visibility more than 1/4 mile; T = thunderstorm;
A = hail; R = rain; S = snow; Z = glaze; D = dust, visibility 1/2 mile
or less; KH = smoke or haze or both; BS = blowing snow; and HF = heavy
fog (visibility 1/4 mile or less due to fog).]
177
-------
[Sample WBAN No. 3 monthly summary edit listing (printout image).]
178
-------
B. Day (col. 10-11)
1. > Possible for month
2. Missing
C. Max. Temp. (cols. 12-14)
Legal punches are: X, 0, or 1 in col. 12 and 0-9 in cols. 13-14.
1. Illegal punches
2. < Min. Temp (cols. 15-17)
3. < Min. Temp. (cols. 15-17) of previous day
Print negative values (X punch in col. 12) with a minus (-)
preceding numerical values in cols. 13-14.
D. Min. Temp. (cols. 15-17)
Legal punches are: 0 or X for col. 15 and 0-9 for cols. 16-17.
1. Illegal punches
2. > Max. Temp. (cols. 12-14) of previous day
Print negative values (X punch in col. 15) with a minus (-)
preceding numerical values in cols. 16-17.
E. Precipitation (cols. 18-21)
Legal punches are 0-9 or BBBX
1. Illegal punches
2. "0000" with cols. 22-24 other than "000" or "BBB"
3. Col. 21 "X" with cols. 18-20 other than B
4. Any of cols. 18-20B with col. 21 "0-9"
5. > 1000
Print BBBX as "T." Also print "000X" as "T," but flag as
error as indicated above.
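A sketch of the precipitation-field checks above, treating the card field as a 4-character string in which "B" marks a blank column and "X" the trace punch; the hundredths-of-an-inch reading of the "> 1000" limit is an assumption:

def precip_flags(field: str):
    """Cols. 18-21; legal keyings are four digits or 'BBBX' (trace)."""
    if field == "BBBX":
        return []                          # trace: prints as "T"
    if field == "000X":
        return ["trace mis-keyed"]         # still prints "T", but flagged
    if not field.isdigit():
        return ["illegal punches"]
    if int(field) > 1000:
        return ["precipitation over 10.00 inches"]
    return []

def precip_display(field: str) -> str:
    return "T" if field in ("BBBX", "000X") else field

Analogous logic applies to the snowfall and snow depth fields that follow, with "BBX" in place of "BBBX".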
F. Snowfall (cols. 22-24)
Legal punches are 0-9 or BBX
1. Illegal punches
2. Col. 24 "X" with cols. 22 and 23 other than B
3. Cols. 22 or 23 B with col. 24 "0-9"
4. > 200
Print BBX as "T." Also print "00X" as "T," but flag as
error as indicated above.
G. Snow Depth (cols. 25-27)
Legal punches are 0-9 or BBX. May also be B for entire field.
1. Illegal punches
2. Other than "000" with cols. 22-27 for preceding day and cols.
22-24 for same day punched all 0's.
179
-------
3. Col. 27 "X" with cols. 25 and 26 other than "B"
4. Cols. 25 or 26 B with col. 27 "0-9"
Print "BBX" as "T." Also print "OCX" as "T," but flag as error
as indicated above.
H. Peak Gusts, Direction and Time (cols. 28-35)
Legal punches are:
0-9 for cols. 28-30,
The "Alpha" Compass Point Code for cols. 31-32, and
000 - 239 for cols. 33-35 or entire field may be "B."
An "X" in col. 31 is programmed to convert peak gust speeds from
knots to mph and publish under fastest mile heading with "/" following
the direction as an indicator of peak gust speed. Omission of "X" in
col. 31 will be flagged by "$" following the direction on the edit
listing.
A "#" following the speed and direction spaces on the edit calls atten-
tion to entry of speed with direction omitted.
The Alpha Compass Point punches are:
00 C (calm) 22 NE 44 SE 66 SW
11 N 32 ENE 54 SSE 76 WSW
12 NNE 33 E 55 S 77 W
18 NNW 34 ESE 56 SSW 78 WNW
1. Illegal punches
2. Cols. 28-30 > 050
Print in Alpha Code Letters.
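A sketch of the peak-gust conversion described above; the output format and names are illustrative:

def format_peak_gust(speed, direction_alpha, knots_flag):
    """knots_flag is True when "X" is punched in col. 31."""
    if knots_flag:
        mph = round(speed * 1.15078)       # knots to statute miles per hour
        return f"{mph} {direction_alpha}/" # "/" marks a peak gust speed
    return f"{speed} {direction_alpha}$"   # "$" flags the omitted "X" punch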
I. "Days With" (cols. 41-51)
Legal punches are 0 or 1 if in station's program, otherwise all cols.
should be B. (If punched, all columns should be punched.)
1. Illegal punches
2. Col. 41 "0" with "1" in col. 51
3. Col. 43 "1" with either or both fields (cols. 18-21, 22-24)
all 0's.
4. Col. 43 "1" with min. temp. (cols. 15-17) > 044
5. Col. 44 "1" with cols. 18-21 "0000"
6. Col. 44 "1" with cols. 43 & 46 "0" & cols. 22-24 other than "0000"
7. Col. 45 "1" with cols. 18-21 "0000"
8. Col. 46 "1" with either or both fields (cols. 18-21, 22-24) all 0's.
9. Col. 46 "1" with min. temp. (cols. 15-17) > 044
10. Col. 47 "1" with col. 45 "0" (some exceptions, but flag)
11. Col. 47 "1" with min. temp. (cols. 15-17) > 039
12. Col. 50 "1" with either cols. 28-30 or 59-60 (if punched 010 or 10
respectively).
180
-------
J. Sky Cover (cols. 52, 53)
Legal punches are 0-9 and X for both cols, or "B" for col. 53
if cols. 41-51 are B.
1. Col. 52 B with other than B in col. 53
2. Col. 52 > 3 greater than col. 53
3. Col. 53 other than B with cols. 41-51 B
4. Col. 53 > 2 greater than col. 52
Print "X" punches as "10"
K. Sunshine and Percent of Possible (cols. 54-58)
Legal punches are: 000-199 for cols. 54-56, 0-9 or X for col. 57
and 0-9 or B for col. 58. Also entire field may be B.
1. Illegal punches
2. Col. 57 "X" with underpunch
3. Cols. 54-58 are blank
4. Col. 57 "X" with other than B in col. 58
Print as "100" when cols. 57-58 punched "XB." Also print
"100 when col. 57 has an "X" punch regardless of other illegal
punching in either or both cols. 57 or 58, but flag as error
as indicated above.
5. With cols. 54-56 punched 000, cols. 57-58 with other than zeros
6. With cols. 57-58 punched 000, cols. 54-56 with other than zeros
7. With cols. 54-56 punched greater than 000, cols. 57-58 will be
greater than 00.
L. Fastest Mile and Direction (cols. 59-62)
Legal punches are: 0-9 for cols. 59-60 with an X overpunch permitted
in col. 59 for speeds of 100 or greater, 00-36 for cols. 61-62 if neither
col. has an "X" overpunch, and the "Alpha" Compass Point Code (see
VIII,H above) if either or both (cols. 61-62) have an "X" overpunch.
Illegal punches:
1. Cols. 59-60 "00" (without "X" overpunch in col. 59) with
other than "00" in cols. 61-62.
2. Cols. 59-60 > 50.
3. Cols. 59, 61 or 62 punched "X" without an underpunch 0-9.
4. Col. 62 "X" overpunched with no "X" overpunch in col. 61.
181
-------
Print:
1. "1" preceding speed punched in cols. 59-60 when col. 59 has an
"X" overpunch.
2. Direction in the "Alpha" code letters when either or both cols.
61, 62 have an "X" overpunch. (See VIII, H above.)
3. A dash (-) in the col. following the direction when col. 61 has an "X"
overpunch.
4. A plus (+) in 2nd col. following the direction when col. 62 has
an "X" overpunch.
M. Water Equivalent (cols. 63-65)
Legal punches are: 0-9 or B. Water Equivalent is in inches & tenths.
Illegal punches: B in any of cols. 63-64 with col. 65 punched 0-9.
Other cols. (36-40, 66, 68-80) should be B.
IX. Machine Computations
Various sums, means, departures (from pre-programmed normals), frequency
counts, summary cards, etc., necessary in the verification program and
used in the preparation of formats for the LCD, CDNS, and Table J are
made by the computer.
Print the sums, averages, etc., from the data available when some days
and/or items are missing.
A. Daily Computations are made for (a sketch in code follows this list):
1. Average temperature
2. Departure from normal
3. Degree days
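A sketch of these daily computations; the conventional 65 deg F degree-day base is an assumption, since the text does not state the base:

def daily_temperature_products(max_f, min_f, normal_f, base_f=65.0):
    avg = (max_f + min_f) / 2.0            # average temperature
    departure = avg - normal_f             # departure from normal
    heating_dd = max(0.0, base_f - avg)    # heating degree days
    cooling_dd = max(0.0, avg - base_f)    # cooling degree days
    return avg, departure, heating_dd, cooling_dd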
B. Monthly Sums are computed and listed for:
1. Max. temperature
2. Min. temperature
3. Mean temperature
4. Degree days, heating and cooling
5. Precipitation
6. Snowfall
7. Sunshine
8. "Days With" (if in station's program)
9. Sky Cover (SR-SS & Mid-Mid)
182
-------
C. Monthly Averages are computed and listed for:
1. Max. temperature
2. Min. temperature
3. Mean temperature (this is 1/2 the sum of the average
max. and min., C, 1 & 2 above).
4. Average percent of possible sunshine (sum of daily
percentages divided by the number of days).
Monthly percent of possible sunshine is computed from
total sunshine recorded and the pre-programmed possible
amount, sunrise to sunset.
183
-------
D. Monthly Departures are computed and listed for:
1. Mean Temperature
2. Degree days, heating and cooling
3. Precipitation
E. Seasonal Departure for Degree Days (from seasonal totals carried
forward from preceding month and current month's total) are com-
puted and listed. Season begins with July for heating and January
for cooling.
F. Extremes and Dates are selected and listed for:
1. Highest temperature
2. Lowest temperature
3. Greatest precipitation
4. Greatest Snowfall
5. Greatest Snow Depth
6. Greatest Wind Speed and Direction
(When the same value occurs on two or more dates, the date of
the last occurrence followed by a plus (+) is listed. Also,
the direction of the last occurrence of multiple "Greatest Wind
speed" is printed.)
G. Frequency Counts are made and listed for:
1. Temperature
a. Max. <= 32
b. Max. >= 90, except >= 70 for Alaskan stations
c. Min. <= 32
d. Min. <= 00
2. Precipitation
a. Trace (BBBX)
b. >= 0001
c. >= 0010
d. >= 0050
e. >= 0100
3. Snowfall
a. >= 010
4. Character of Day (SR-SS)
a. Clear (Avg. 0-3)
b. Partly Cloudy (Avg. 4-7)
c. Cloudy (Avg. 8-10) (Punched 8, 9, or X)
184
-------
X.
Precipitation Data Card Images
A. Program Involved. Hourly precipitation, monthly extremes, and
maximum precipitation.
1. Hourly precipitation, greatest amounts of precipitation,
snowfall, and snow depth and maximum precipitation are con-
tained in a series of tape formats currently known as the
HPD Deck. These are identified as to station, year, and
month in the same manner as the WBAN #1 and #3 cards.
Fig. 5. HPD card layout (punch card image; station number, date, day,
card number, and hourly precipitation amount fields).
2. Hourly precipitation, HPD card format 1 or 2 in col. 12
as identifier.
For each station in the LCD program, #1 and 2 HPD data are
keyed each day with precipitation and for the last day of
the month whether precipitation has occurred or not. If
stations are not equipped with recording gages, amounts are
keyed only at 6-hourly synoptic times. In this case, the
daily total is not keyed in the second format; the monthly
total, however, is keyed in the last format of the month for
all stations.
185
-------
B. Checking Procedure - Hourly Precipitation
1. The checking is accomplished by a computer cross-foot listing
to insure internal compatibility. A second check is made be-
tween the daily totals and the monthly total. The cross-foot
for each station is begun by building in the memory of the
computer a grid of zeros for all days in the month, i. e.,
28 days, 30 days, etc., as the calendar requires. The keyed
data are read into the grid and then edited. Information
concerning missing record, blank fields, erroneous keying,
and arithmetic mistakes is listed to the right of the data
field. If a record is missing, the computer will list all
hourly fields as having zero precipitation with indication
to the right that the record is missing. In the case of
duplicates, only the last presented to the computer will be
used and duplication indicated to the right.
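The cross-foot itself reduces to a small amount of arithmetic. A minimal sketch, assuming each keyed record supplies a day, a card number (1 or 2), twelve 3-digit hourly values, and, on card #2, the daily total; the record layout and names are illustrative, not the actual program:

    import calendar

    def crossfoot_month(records, year, month):
        # Build a grid of zeros for every day the calendar requires.
        ndays = calendar.monthrange(year, month)[1]
        grid = {d: {"hours": [0] * 24, "daily": None}
                for d in range(1, ndays + 1)}
        for rec in records:               # a later duplicate simply overwrites
            day = rec["day"]
            if rec["card"] == 1:          # card #1: hours 0100-1200
                grid[day]["hours"][:12] = rec["values"]
            else:                         # card #2: hours 1300-2400 + daily total
                grid[day]["hours"][12:] = rec["values"]
                grid[day]["daily"] = rec["daily_total"]
        # Cross-foot: each keyed daily total must equal its 24 hourly values.
        for day, g in sorted(grid.items()):
            if g["daily"] is not None and sum(g["hours"]) != g["daily"]:
                print("day %02d CROSSFOOT ERROR %+05d"
                      % (day, sum(g["hours"]) - g["daily"]))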
Since the presence of the HPD 1 & 2 record is a controlling
factor, stations not having hourly precipitation must have
"dummy" records for the last day of the month, containing
only identification, date, and card number data.
The HPD #4 card image (4 in col. 12) has the greatest 24
hour precipitation and date, snowfall and date, and great-
est depth of snow on the ground and date. There is only
one #4 per station month.
2. The edit checks of the HPD 1, 2, and 4 card images are
as follows (sample shown on page 187):
Column Data Edit Check
1- 5 Station No. Sequence checked by number with a
4 punched in column 12 of 1st image.
6- 9 Year & Month Values are checked and must be the
same for the entire edit. Month
must be in range of 01-12 in cols.
8-9.
10-12 Day Card No. Only days with pcpn. are keyed ex-
cept for the last day of the month.
Each day will have only two images
identified as 1 and 2 in col. 12.
No. 2 has the daily total in cols.
49-52. The #4 in col. 12 will not
have day punched in cols. 10-12.
186
-------
[HPD edit listing (sample): an hourly edit for station 00001, June 1970, and a 6-hourly edit for station 23237, June 1970, showing the keyed card images with flags such as "ERROR HR n," "MISSING NO. 1 CARD," and "CROSSFOOT ERROR," and closing with the card monthly total vs. computed total comparison; printout not reproduced.]
HPD EDIT LISTING
187
-------
Column   Data            Edit Check

13-48    Hourly Values   Each hour has three cols. for data,
                         i.e., hour 0100 cols. 13-15, etc.
                         Zero pcpn. is keyed "000." BBO, BOO,
                         OBO, OOB & BBB are flagged. All cols.
                         are keyed with zeros placed to fill
                         the col. Blanks are flagged. Trace
                         amounts are indicated by an X in the
                         right col. of the hour, preceded by
                         two blank columns. Punched data of OOX,
                         OBX, BOX and over-punches are flagged
                         as errors. A trace is indicated by
                         an X and accumulation by a Y punch.

49-52    Daily Total     Flagged for error when these data are
                         omitted from the HPD #2 or keyed in the
                         #1. When entered, the field is fully
                         keyed and errors are indicated for
                         blank columns. Trace is BBBX. Data
                         are flagged for OOOX, BOOX, etc. The
                         daily total is checked with the values
                         in cols. 13-48 of the HPD #1 & #2
                         cards. If the values do not agree it is
                         indicated as a "cross-foot error" and
                         the amount of error is shown. The cross-
                         foot does not function if there are
                         illegal punches.

53-56    Monthly Total   Keyed in the #2 card of the last calen-
                         dar day of the month. This datum is
                         listed at the bottom of the edit as
                         card total. It is compared to the com-
                         puted total taken from all daily #2
                         cards. If the totals are the same, the
                         word "agree" is printed and, if not,
                         the word "error" appears. If the total
                         is omitted from the last #2 card, the
                         card total is blank and error indicated.
C. Checking Procedure - Extreme Precipitation
1. The remaining data are contained in the HPD #4 card image. This
card contains no date in cols. 10-11, and cols. 13-56 are blank.
The card is listed on the edit below the last day of the month.
Column   Data              Edit Check

57-60    Greatest pcpn.    The value is checked for illegal
         in 24 hours       punching.
188
-------
Column   Data              Edit Check

61-65    Date of 24        Col. 61 is keyed zero or X. Other
         hour amount       values are flagged. When the value
                           in 57-60 is 0000 these cols. will be
                           blank and are flagged if not. The
                           field is fully keyed if there is a
                           value for 57-60 and listed as an
                           "error pcpn. date" if miskeyed.

66-68    Greatest 24       Datum is keyed the same as the hourly
         hr. snowfall      pcpn. and has the same error check.

69-73    Date of 24        Same check as in cols. 61-65, with 69
         hr. snowfall      keyed 0 or X with data in cols. 66-68.

74-75    Greatest          Zero is keyed for no snow. 2" = 02;
         snow depth        110 = X/10. Note: The snow depth is
                           keyed in two cols., but prints to
                           three places. This is to accommodate
                           the overpunching for values greater
                           than "99."

76-78    Date of           These cols. are blank if the value
         snow depth        in cols. 74-75 is 00. 76 is keyed X
                           for + dates or zero. Other values
                           are flagged.

79-80    None              HPD cards 1, 2, and 4 are blank in
                           these fields.
2. The edit contains a "Computed High - 24 Hour Precipitation"
with dates. This is a guide to checking this value on the LCD.
There is no check by the computer between this value and the
one keyed in the HPD #4 card.
D. Correction of HPD Data
Data contained in the HPD 1, 2, and 4 card forms are corrected
by submitting to the computer a new card punched in its entirety
containing the information to be updated.
E. Maximum Short Period Precipitation.
For each month, maximum precipitation is keyed as two records:
1 in col. 10 with data for 5, 10, 15, 20, 30 and 45-minute
periods, and 2 in col. 10 with data for 60, 80, 100, 120, 150
and 180-minute periods. See page 31 for the keying format.
Day and time entries designate the end of the time period in
which the amount of precipitation occurred. Day and time are
omitted when the amount is zero, trace, or missing.
189
-------
A computer edit program checks completeness and consistency
of the data and produces an edit listing with flags indicating
the deficiencies. The flags and associated deficiencies are
as follows (a code sketch of the period-consistency checks follows the lists):
A = Record #1 missing
B = Record #2 missing
C = Month < 01 or > 12
D = Day < 01 or > 31
E = Hour < 00 or > 23
F = Minutes < 00 or > 59
R = Amount zero or trace
S = Missing "M"
T = Pcpn. ≥ 0.01 with day
    or time missing
W = 10.00 or greater
AA =  10 MIN > 2 X  5 MIN
AB =  15 MIN >  5 MIN + 10 MIN
AC =  20 MIN >  5 MIN + 15 MIN
AD =  20 MIN > 2 X 10 MIN
AE =  30 MIN > 10 MIN + 20 MIN
AF =  30 MIN > 2 X 15 MIN
AG =  45 MIN > 15 MIN + 30 MIN
AH =  80 MIN > 20 MIN + 60 MIN
AI = 100 MIN > 20 MIN + 80 MIN
AJ = 120 MIN > 20 MIN + 100 MIN
AK =  60 MIN > 2 X 30 MIN
AL = 150 MIN > 30 MIN + 120 MIN
AM = 120 MIN > 2 X 60 MIN
AN = 180 MIN > 60 MIN + 120 MIN
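Read this way, each flag appears to fire when a longer-period maximum exceeds what the shorter-period maxima allow (a 10-minute amount can never exceed twice the 5-minute amount; a 15-minute amount can never exceed the 5-minute plus the 10-minute amount; and so on). A minimal sketch of these checks, assuming a dict of maximum amounts keyed by period length in minutes:

    # A sketch of the period-consistency flags above; input layout is assumed.
    DOUBLE = [("AA", 10, 5), ("AD", 20, 10), ("AF", 30, 15),
              ("AK", 60, 30), ("AM", 120, 60)]            # long > 2 x short
    SUM = [("AB", 15, 5, 10), ("AC", 20, 5, 15), ("AE", 30, 10, 20),
           ("AG", 45, 15, 30), ("AH", 80, 20, 60), ("AI", 100, 20, 80),
           ("AJ", 120, 20, 100), ("AL", 150, 30, 120), ("AN", 180, 60, 120)]

    def consistency_flags(amounts):
        flags = [f for f, lng, sht in DOUBLE if amounts[lng] > 2 * amounts[sht]]
        flags += [f for f, lng, a, b in SUM if amounts[lng] > amounts[a] + amounts[b]]
        return flags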
Corrections are keyed in the format shown on page 31, enter-
ing data for the time period involved only, for updating the
tape. The updated tape produces printer's copy for use in
the CDNS Annual.
190
-------
[Full-page keying form for maximum short period precipitation (rotated form image); not reproduced.]
-------
Validation,
Compaction, and
Analysis of Large
Environmental
Data Sets
By John Jalickee
Jerry Sullivan
Richard Rozett
EDS scientists have developed a technique which, among other benefits,
allows them to compact a data set of
184,000 values into an equivalent
data set of fewer than 6,000 values,
while retaining 90 to 98 percent of
the variability of the original data
fields. Moreover, much of the remain-
ing variability appears to be sensor
noise.
Introduction
Large-scale environmental field ex-
periments such as the Barbados
Oceanographic and Meteorological
Experiment (BOMEX), the Interna-
tional Field Year for the Great Lakes
(IFYGL), and the GARP (Global
Atmospheric Research Program) At-
lantic Tropical Experiment (GATE)
produce huge data sets and attendant
large-scale problems in data valida-
tion, analysis, and synthesis. New and
more sophisticated techniques are
needed to extend and complement
traditional methods when working
with such large data sets.
The failure of conventional smooth-
ing techniques to adequately remove
noise from an IFYGL rawinsonde
(atmospheric sounding) wind data
set and still retain meaningful,
though highly variable, natural fluc-
tuations led the authors and other
scientists of EDS' Center for Experi-
ment Design and Data Analysis
(CEDDA) to try a new method,
called the asymptotic singular decom-
position method, or ASD for short.
The resulting computer program elim-
inated the noise and retained the es-
sential data. It also greatly reduced
the size of the original data base and,
through intermediate graphics, pro-
vided a quick and efficient method
of error detection, while isolating
physical relationships and character-
istic patterns.
The ASD Method
The idea behind this data decomposi-
tion technique is to extract meaning-
ful information in the form of char-
acteristic patterns. As an example,
consider a meteorologist studying
daily maximum temperature data for
the east coast. Station by station, he
observes that, in general, it is warmer
in summer than in winter: from this
he abstracts a typical seasonal variation. On the other hand, studying
station-to-station variations, he notes
that temperatures are generally colder
in the north than in the south at almost any time of year. With these
two characteristic variations (space
and time) he can qualitatively explain
the main features of the entire data
set. And by retention of a relatively
few significant temperature values he
could quantitatively describe perhaps
90 percent of the east coast maximum temperature field.
The ASD data decomposition meth-
od adapted by CEDDA formalizes this
process and provides a technique to
calculate characteristic patterns for
small and large data sets. Using the
ASD method, dominant patterns with-
in the data are easily extracted in an
objective, repeatable fashion. In many
respects, the science of ASD is much
akin to the art of the caricaturist: the
major features of the subject are
quickly shown with a few sure, deft
strokes.
CEDDA scientists have used ASD
to reduce the quantity of data needed
for a sufficient representation of a
physical situation: often the equiva-
lent data set is an order of magnitude
smaller than the original one. Data
generated by the method also are used
in calculations that require relatively
noise-free data: random noise is
smoothed out, while real discontinui-
ties or sharp changes are relatheh
unchanged. An unexpected bonus of
the method is its error-detection ca-
pabilities: keeping with the caricatur-
ist analo<:\. distorted (erroneous^
features stand out sharpK. Physical
relationships within the data, often
buried b\ the volume of numbers, are
also highlighted h\ the method.
The ASD method is related to
other statistical techniques such as
principal component analysis, Lorenz's1 empirical orthogonal functions
in meteorology, and the factor analysis method of psychologists, political
scientists, and sociologists; however,
ASD has the advantages of simplicity
and accuracy. A factor analysis computer program might fill over a thousand punched cards, while ASD
would use a hundred. And ASD is
almost immune to computer roundoff
error, an important consideration
when large data sets are involved.
192
-------
[Tabulated temperature values at each 10-mbar P* level for the twelve 3-hourly soundings, 1800 GMT Nov. 2 through 0300 GMT Nov. 4, 1972; data columns not reproduced.]
Figure 1. Upper-air temperature data for Stony Point, New York.
Data Compaction
Data from IFYGL for 1972-73 pro-
vide some vivid illustrations of the
benefits of ASD applications. To dem-
onstrate the data-compacting capabili-
ties of the ASD method (plus the
method itself), consider 12 successive
IFYGL rawinsonde launches from station Stony Pt., N.Y., for the period
1800 GMT Nov. 2, 1972, to 0300 GMT
Nov. 4, 1972 (fig. 1). Temperature values
are given for each 10-mbar pressure
level, so that up to the 590-mbar level
we have 12 × 60 = 720 values. (The
pressure variable used in all figures
is P*, the difference between surface
pressure and observed pressure, i.e.,
P* = Psurface - P.) The particular
time period was chosen because a
sharp upper-air trough was passing
over Lake Ontario, producing the
characteristic temperature variations
represented by the solid lines in fig-
ure 2.
The object of ASD application in
this instance is to replace the 12 columns of 60 numbers with 1 column
of 60 numbers and 1 row of 12 num-
bers, as in figure 3. In the latter illus-
tration, the column represents the
pressure dependence of the tempera-
ture soundings, while the row repre-
sents the time variation. To obtain
the 350-mbar temperature for 0000
GMT on November 3, one would mul-
tiply the 36th number from the bot-
tom of the column by the 3d number
of the row (as shown in fig. 3), or,
193
-------
Figure 2. Time-height temperature analyses for Stony Point. The solid lines are based on the original data set, the dashed lines on a reconstituted data set. [Contour plot of temperature vs. P* (mbar) and time (GMT); not reproduced.]
to get the 150-mbar temperature at
1800 GMT on November 3, multiply
the 16th column number from the
bottom by the 9th number in the row.
Where did the column and row
come from? Any column and row of
numbers can be multiplied together
to generate a temperature field. The
best choice is one that minimizes the
sum of squared differences between
the generated field and the original
field. In practice, the ASD computer
program begins with a trial column
and row, then generates successive
values until there is no further minimization of differences between the
two temperature fields.
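In modern terms this is a rank-one least-squares fit; a minimal sketch with NumPy, under the assumption that the ASD program's iteration is equivalent to alternating between the best-fit row and best-fit column (the function name is ours):

    import numpy as np

    def asd_pattern(field, tol=1e-8, max_iter=500):
        """Return (column, row) minimizing the sum of squared differences
        between field and the outer product column x row."""
        col = field[:, 0].astype(float).copy()   # trial column (assumed nonzero)
        prev = np.inf
        for _ in range(max_iter):
            row = field.T @ col / (col @ col)    # best row for this column
            col = field @ row / (row @ row)      # best column for this row
            sse = np.sum((field - np.outer(col, row)) ** 2)
            if prev - sse < tol:                 # no further minimization
                break
            prev = sse
        return col, row

    # Successive patterns describe the residuals of the earlier fit:
    # col1, row1 = asd_pattern(temps)                        # temps: 60 x 12
    # col2, row2 = asd_pattern(temps - np.outer(col1, row1))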
In the example at hand, the original 720 numbers have been replaced
by 60 + 12 = 72 numbers, a 10-fold
reduction. The new field generated
by the row and column explains approximately 90 percent of the variation about the mean of the original
field. The ASD method now may be
used again to describe the residuals
of the original field minus the first
generated field, producing another
row and column. Usually, about 98
percent of the original temperature
field variation is covered by three
rows and columns. The broken lines
in figure 2 show a reconstituted temperature field using three rows and
columns.
Overall, CEDDA scientists were
able to compact 60 levels of temperature, humidity, and wind values from
768 IFYGL upper-air soundings (6
stations, 128 launches each), a total
of 184,000 values, into an equivalent
data set containing fewer than 6,000
values. From 90 to 98 percent of the
characteristics of the original fields
are retained, and much of the unexpected variability appears to be sensor
noise.
194
-------
Error Checking
Figure 4 illustrates ASD's error-de-
tection capability. Obviously, the
sounding for station 2 differs greatly
from the soundings for the other five
stations. Figure 5 shows the time
components corresponding to the
pressure component of figure 4. Once
again, a strong anomaly (circled
values) shows up. The six soundings
indicated were checked and did prove
to be erroneous. Thus, a 10-second
scan of these two ASD graphs isolated
an error that previously had escaped
detection.
Physical Relationships
Three station pairings stand out
clearly in the lower levels of the
soundings shown in figure 6. These
station pairs—1-2, 3-6, and 4-5 —are
geographically related. Stations 4-5
are on the western end of Lake On-
tario, 3-6 on the middle shoreline, and
1-2 on the eastern end. Figure 7, a
plot of the corresponding time com-
ponent, shows that the effect is most
pronounced for launches number 20
through 27. A detailed check of the
soundings from all stations for this
period revealed a large east-west wind
velocity gradient which varied from
2 m/s in the west to 6 m/s in the
middle to 14 m/s in the east.
Other Uses
With ASD, new data can be compared quickly with older data obtained by the same measuring system. Drastic differences in the ASD
plots will suggest instrument drift
and/or mistaken assumptions about
experimental background conditions.
The same approach can be used where
different types of instruments are supposedly measuring the same physical
phenomenon. This type of application
allows CEDDA scientists to study the
very large data sets associated with
ecosystems and often, through simultaneous analysis of many different
kinds of variables, uncover hidden
interactions.
[Figure 3 graphic: a single column of 60 numbers beside a single row of 12 numbers, one per sounding time; not reproduced.]
Figure 3. An illustration of the ASD
data compaction technique. The
single column of 60 numbers and
single row of 12 numbers replace
the 12 columns of data appearing in
figure 1, yet retain approximately
90% of the details of the original
data set.
Modeling and Experiment
Design
CEDDA scientists are pursuing other
potential applications of the ASD
method, including its use in modeling
and experiment design. The charac-
teristic patterns obtained provide important clues as to the physical realities underlying the data. We hope that
the pattern-detection capabilities of
the ASD method may lead to an em-
pirical, data-oriented form of system
modeling.
Another promising path leads to-
wards the economical design of field
experiments and data collection sys-
tems, based on characteristic patterns
derived from preliminary survey data.
Much redundant data and informa-
tion are often collected in large-scale
field experiments. If the redundancies
could be eliminated, all subsequent
data collection, processing, analyses,
archival, and dissemination activities
would be greatly simplified and more
cost-effective. The ASD method, by
highlighting significant patterns of
preliminary survey data sets, could
suggest which data contribute most
to the definition of the patterns, and
which are dispensable.
Reference
1. Lorenz, E. N., Empirical Orthogonal Functions.
-------
Figure 4. Composite printout of
U-components of the wind for 48
upper-air soundings taken at each of
six IFYGL observation stations.
Figure 5. Time analysis of data from
figure 4 isolates six anomalous
soundings (circled).
196
-------
Figure 6. Composite of V-components
of the wind for 48 upper-air
soundings taken at each of six
IFYGL observation stations.
Figure 7. Time analysis of data from
figure 6 indicates that the pairing
pattern is most pronounced in
soundings 20-27.
197
-------
About the Article
and the Authors
JACK JALICKEE was thumbing
through a scientific journal in the
spring of 1973 when he came across
an article on the mathematical
theorem of singular decomposition.
It was evident that the theorem was
adaptable to the analysis of the large
data sets the EDS Center for Experi-
ment Design and Data Analysis
(CEDDA) was working with. This
was the origin of the ASD (Asymptotic Singular Decomposition) method.
CEDDA analysis of atmospheric
data from the International Field
Year for the Great Lakes (IFYGL)
began in the autumn of 1974. Problems arose almost immediately. Divergence calculations derived from
upper air winds did not make physical sense. (The calculation is a very
sensitive one, involving small differences of large numbers which contain
noise.) The data themselves appeared
reasonable and consistent with observed weather conditions, which
were highly variable. Traditional
analysis techniques could not resolve
the problem; ASD did.
A native of Washington, D.C., Jack
Jalickee worked his way through
Catholic University (in D.C.), receiving a B.A. in 1962 and a Doctor's
degree in 1966, both in Physics. Subsequently, he worked as a research
associate and teacher at Northwestern
University in Evanston, Illinois. A
Presidential Internship appointment
brought him to EDS/CEDDA in
1972.
JERRY SULLIVAN was the man
having problems with IFYGL data
divergence calculations. His inhouse
paper on the subsequent resolution of
those problems through ASD applica-
tions provided the nucleus of the cur-
rent article.
Jerry received a Bachelor's degree
in Physics from Holy Cross College,
Worcester, Mass., and his Doctor's
Degree from Catholic University. He
Jack Jalickee
Dick Rozett
joined EDS/CEDDA in the fall of
1970.
Fr. RICHARD ROZETT, S.J., is on
a year's sabbatical from Fordham
University in New York. His previous
work and interest in the application
of statistical techniques to large data
sets led Fr. Rozett to come to
CEDDA, where he heads up its
MESA (Marine Ecosystem Analysis)
Project. Since September 1974, he
has been working with Jack Jalickee
in collecting, devising, and developing ASD and similar techniques to
analyze ecosystems data sets.
Jerry Sullivan
Ecosystems data sets are very
large, complex, and highly redundant.
They include physical measurements
such as temperature, depth, pressure,
and the particle size of sand; chemical measures of oil, lead, phosphate,
acidity, salinity, nitrate, and carbonate concentrations—not to mention
garbage and sewer sludge; and biological measurements such as the
number of barnacles per square
meter, or the percent of flounder with
fin rot. ASD and similar techniques
make it possible to massage the original data into a simpler, concentrated, and more meaningful data set.
Dick Rozett earned a B.S. degree
in chemistry from Spring Hill College in Spring Hill, Alabama, and an
M.S. degree from St. Louis University,
then studied chemical physics at
Johns Hopkins in Baltimore, Md.,
where he received his Ph.D. in 1967.
Ordained a priest in 1962, Fr.
Rozett was an Assistant Professor of
Chemistry at Fordham from 1967 to
1972, when he was made an Associate Professor. He is the author of
more than 30 scientific papers on
chemistry and the statistical analysis
of large data sets, and has participated in international scientific conferences in Leningrad, Lisbon, Kifissia (Greece), and Kyoto.
198
-------
DATA VALIDATION FOR UPPER AIR SOUNDING DATA
AND EMISSION INVENTORY DATA
by
J.H. Novak
Environmental Sciences Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
199
-------
DATA VALIDATION FOR UPPER AIR SOUNDING DATA
AND EMISSION INVENTORY DATA
J.H. Novak
A systematic approach to data validation requires that
several steps be taken during the design of a validation scheme.
For any set of data it is essential to be familiar enough with the
data collection and data handling procedures to be able to locate
all possible sources of error and to define criteria for
distinguishing good and bad data at those critical points. The
next task is to determine which techniques can be used most
effectively in error checking, and what course of action should be
taken if an error is detected. Finally, after the validation
scheme has been implemented the quality of the validated data
should be assessed in some manner.
Therefore the first step in the validation of RAPS upper air
data was to determine all possible sources of error in the data
handling system. The upper air data consists of two types of
observations, Pibals and Radiosondes.
A pibal is a pilot balloon which is filled with helium to an
exact pressure in order to insure that it will rise with a known
ascension rate when released into the atmosphere. An observer uses
a mechanical device known as a theodolite to track the balloon by
recording azimuth and elevation angles at 30 second intervals.
These angles are then used to calculate wind speed and direction
at various heights above ground. There are two possible sources of
error during this phase of data collection. First, the observer
200
-------
may read the angles incorrectly during the sounding and second,
transcription errors may occur when coding the data onto forms for
keypunching.
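The wind computation behind the pibal readings is simple trigonometry. A minimal sketch, assuming a constant ascension rate (the 3 m/s value and the function name are illustrative) and elevation angles above the horizon:

    import math

    def pibal_winds(angles, ascent_rate=3.0, dt=30.0):
        """angles: (azimuth_deg, elevation_deg) pairs at dt-second intervals.
        Returns (speed m/s, direction deg the wind blows FROM) per interval."""
        pos = []
        for i, (az, el) in enumerate(angles):
            height = ascent_rate * (i + 1) * dt              # m above ground
            dist = height / math.tan(math.radians(el))       # horizontal range
            pos.append((dist * math.sin(math.radians(az)),   # east
                        dist * math.cos(math.radians(az))))  # north
        winds = []
        for (x0, y0), (x1, y1) in zip(pos, pos[1:]):
            dx, dy = x1 - x0, y1 - y0
            speed = math.hypot(dx, dy) / dt
            bearing = math.degrees(math.atan2(dx, dy))       # direction of motion
            winds.append((speed, (bearing + 180.0) % 360.0))
        return winds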
The radiosonde is similar to a pibal in that it is also a
balloon; however, a package of instrumentation containing various
meteorological sensors is attached to the balloon which is tracked
electronically instead of manually. In addition to the azimuth and
elevation angles, pressure, temperature, and relative humidity
readings are recorded. A variety of thermodynamic parameters can
be determined from these measurements. There are several
potential sources of error associated with the soundings:
electronic difficulties, sensor malfunction, calibration,
misinterpretation of the strip charts, interpolation of the
adiabatic charts and transcription errors.
Once all possible sources of error have been determined and a
range of good and bad data defined, various techniques can be
chosen to search the data for possible errors. The upper air
sounding network's (UASN) preliminary quality control program
contained the following tests on the raw data:
1. Routine data checks - data was checked for completeness and
compared with known data (e.g., station date and time vs. a
performance matrix, station # vs station height, balloon
weight vs release time)
2. Consistency checks with alternate data source (e.g., wind data
vs station log books, doubtful data vs weather maps and
recording barograph)
201
-------
3. Intra-station checks with previous and following soundings.
4. Inter-station checks with simultaneous soundings.
5. Checks with known meteorological relationships (e.g.,
comparison of temperature and relative humidity with adiabatic
charts, shape of the pressure-altitude curve).
The actual key punching of the data forms introduces another
source of error. But at this point the data checks can be
computerized, so that all data will routinely undergo the same
tests. The UASN data validation programs test the data for order,
range, missing values, station height, and special conditions such
as calms or wind speeds greater than 40 meters/second. Again,
additional checks can be performed on the radiosonde data when
special relationships exist (e.g., inverse relationship between
pressure and time). The advantage of computerized error checking
is that the entire data set can be objectively evaluated.
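A minimal sketch of such computerized checks, with the level structure and field names assumed for illustration:

    def check_sounding(levels, station_height, max_speed=40.0):
        """levels: one dict per level, in release order."""
        errors = []
        if levels and levels[0]["height"] != station_height:
            errors.append("surface level does not match station height")
        for i, lev in enumerate(levels):
            if lev["wind_speed"] is None:
                errors.append("level %d: missing wind speed" % i)
            elif lev["wind_speed"] > max_speed:
                errors.append("level %d: speed exceeds %g m/s" % (i, max_speed))
            if i > 0 and lev["height"] <= levels[i - 1]["height"]:
                errors.append("level %d: heights out of order" % i)
            # Radiosonde-only check: pressure must fall as the balloon rises.
            if i > 0 and lev.get("pressure") is not None \
                    and lev["pressure"] >= levels[i - 1]["pressure"]:
                errors.append("level %d: pressure not decreasing with time" % i)
        return errors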
Once the known data errors have been flagged and corrected,
the next step is to archive the data. During this phase both
printouts and printer plots are produced in order to provide
additional information to be used for error detection. The printer
plots of speed, direction, temperature, dew point and atmospheric
pressure can quickly be scanned for remaining inconsistencies.
In an effort to further validate the UASN radiosonde data,
dew point, relative humidity and vapor pressure calculated at
sites 141 and 142 were compared with corresponding data recorded
by the National Weather Service at Lambert Field. Correlation and
202
-------
incidence matrices were also calculated for the same three
parameters on a seasonal basis. Both types of preliminary analysis
proved very effective in isolating some remaining data problems.
The final quality assurance effort produced Calcomp plots of
wind speed, direction, temperature, potential wet bulb temperature
and mixing ratio for each of the 5,717 UASN radiosondes. Each of
these plots was scanned for data errors and used to determine
mixing depths for the St. Louis area.
In summary, the important concepts to be derived from the
previous discussion of validation techniques used with the RAPS
Upper Air Sounding Network data are:
1) Determination of all possible sources of error in the
collection and data handling.
2) Use of alternate sources of data for consistency checks.
3) Use of intra and inter station comparisons.
4) Use of known relationships (meteorological in this case) for
comparisons.
5) Completeness and objectivity of computerized comparisons.
6) Use of preliminary analysis routines in error detection.
7) Use of computer graphics.
The second topic for discussion is the validation of the RAPS
emission inventory. The main objective of the RAPS program is to
provide a body of data (emissions, meteorological, air quality,
etc.) which could be used to develop, improve and validate air
quality simulation models. The first priority is to determine what
203
-------
accuracy is required in any data base to be able to achieve the
objectives of RAPS and secondly, what accuracy, precision, and
bias currently exists in the RAPS emission inventory. The answers
to these questions are too complex to be addressed in this paper,
but they are essential to the design of a good validation scheme;
therefore I have included (as references) a list of papers which
discuss this important question of accuracy in detail. Thus, from
this point I will limit the discussion to the procedures that were
chosen to verify the accuracy of the acquired and estimated data.
The RAPS emission inventory is composed of three separate data
bases: (1) point, (2) area, and (3) line source. The choice of validation
technique depends on the amount and form of the data in each data
base. The point source data base contains hourly, daily, monthly
and annual raw process data; no emissions are stored in the data
base. The methodologies used to calculate emissions and determine
temporal resolution are applied at data retrieval time. In
contrast, the area and line source data bases contain annual
emissions. The methodologies used to calculate emissions have
already been applied before the data was entered into the data
base. Temporal apportionment is accomplished through the retrieval
software. As usual, checks must be performed on raw data at their
entrance into the data handling system. For the area source data
base, this implies checking the raw data inputs to the methodology
programs. There are seven source categories for the area source
inventory - river vessels, fugitive dust, highways, railroads,
stationary residential and commercial sources, off-highway mobile
204
-------
sources, and stationary industrial sources and airports. The software
for these source categories was developed by several different
contractors and therefore must be reviewed independently. Area
source data is mainly checked for internal consistency within each
grid. Parameters such as population, number of homes, amount of
water area per grid, agricultural acreage, etc., are compared with
each other in terms of overall land use per grid. Typical errors
that were found include a 1 KM square grid which contained over 2
million acres of tilled farm land and a grid with population of
180 and only 11 single family homes. Calcomp graphics was heavily
used in the validation of line source data. Line sources and
associated characteristics such as average daily traffic,
functional class, etc. were plotted on gridded maps to the same
scale as county roads and DOT maps. Overlaying these maps provided
an excellent means of checking the raw line source data.
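A minimal sketch of the per-grid internal-consistency idea, with field names and tolerances invented for illustration (the actual RAPS software and criteria are not reproduced here):

    ACRES_PER_KM2 = 247.1          # 1 square kilometer = 247.1 acres

    def check_grid(grid):
        errors = []
        max_acres = grid["area_km2"] * ACRES_PER_KM2
        used = grid["tilled_acres"] + grid["water_acres"] + grid["other_acres"]
        if used > max_acres:       # e.g., 2 million acres in a 1-km grid
            errors.append("land use %.0f acres exceeds grid area %.0f"
                          % (used, max_acres))
        if grid["homes"] and grid["population"] / grid["homes"] > 10:
            # e.g., a population of 180 with only 11 single family homes
            errors.append("implausible persons-per-home ratio")
        return errors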
In contrast with the area source data base, the point source
data base contains all the raw data for emission calculations.
Because the raw point data includes temporally-distributed process
data for the entire study period in contrast to annual county
statistics for area data, the amount and type of point data must
be taken into account when choosing a technique for raw data
validation. Parameters which apply at the stack level and
therefore do not have a temporal association can be manually
verified against original plant data. These parameters include
stack and fuel characteristics, operating patterns, stack test
data, and applicability of the SCC to a given stack. And because
205
-------
of the small amount, monthly process data was verified manually.
In order to perform a reasonable check on the remaining data,
a random selection of representative sources was chosen. The
prime determinants in the selection of test sources were the
method of emission calculation and the time interval of reporting
the data. One source from each combination of these two factors
was chosen to insure that all paths in the software would be
exercised. The following tests were performed on the selected
sources: 1) manual verification of process data, 2) verification
of diurnal, weekly and/or seasonal variations, 3) hourly and
annual retrievals. Computer software was developed to check all
process data in the point data base for consistency and continuity.
Finally, all test software runs were compared with hand
calculations and the retrieval programs themselves were compared
with the documented methodologies.
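The selection of test sources amounts to stratified random sampling over the two factors. A minimal sketch (field names are assumptions):

    import random

    def select_test_sources(sources):
        # Group sources by (emission calculation method, reporting interval).
        combos = {}
        for s in sources:
            combos.setdefault((s["calc_method"], s["interval"]), []).append(s)
        # One random source per combination exercises every software path.
        return [random.choice(group) for group in combos.values()]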
In summary, the important concepts to be derived from the
above discussion of the validation of RAPS emission inventory data
are:
1) Preliminary determination of required accuracy.
2) Analysis of current accuracy.
3) Selection of validation techniques by:
a) amount of data
b) form of data
c) availability of supporting data
d) significance of data to the overall accuracy
e) availability of time and personnel
206
-------
REFERENCES
Koch, R.C., et al., "Validation and Sensitivity Analysis
of the Gaussian Plume Multiple-Source Urban Diffusion
Model", NTIS Publication Number PB-206951, Geomet Inc.,
Rockville, Maryland (1971).
Ditto, F.H., et al., "Weighted Sensitivity Analysis of
Emission Data", Final Report, EPA Contract # 68-01-0398 (1973).
Littman, F.E., S. Rubin, K.T. Semrau, and W.F. Dabberdt,
"A Regional Air Pollution Study (RAPS) Preliminary Emission
Inventory", SRI Project 2579 Final Report, EPA Contract
# 68-02-1026 (1974).
Gibbs, L.L., C.E. Zimmer, and J.M. Zoller, "Source
Inventory and Emission Factor Analysis", Volumes I and II,
Final Report, EPA Contract # 68-02-1350 (September 1974).
Ruff, R.E., P.B. Simmon, "Evaluation of Emission Inventory
Methodologies for the RAPS Program", SRI Project 4331,
Final Report, EPA Contract # 68-02-2047 (1977).
207
-------
VALIDATION OF BIOMEDICAL DATA THROUGH AN
ON-LINE COMPUTER SYSTEM
by
Larry D. Claxton
Health Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
209
-------
VALIDATION OF BIOMEDICAL DATA THROUGH AN
ON-LINE COMPUTER SYSTEM
L.D. Claxton
INTRODUCTION
Within the biomedical disciplines there are a variety of testing pro-
cedures used routinely within many separate laboratories. Since health,
research and regulatory decisions are being based upon the results from
many laboratories, there is a basic need for assuring the quality of the
data. In the area of microbial mutagenesis, the use of Salmonella
typhimurium as an indicator organism for mutational events is employed
by many laboratories across the country. The various procedures available
are rapid, relatively simple, sensitive and are used in a variety of
laboratory situations including private industry, government and university
laboratories. Presently, a great deal of emphasis is placed upon these
types of tests as prescreens for substances that may be human mutagens and
potential carcinogens. Therefore, the use of a system involving Salmonella
typhimurium could provide an excellent pilot study for methods involved in
data validation. Data validation is used in this context to mean the
process by which generated data is filtered and accepted or rejected by
objective criteria. Likewise, computerization provides a potential
means for systematically applying a predetermined set of objective
criteria in a rapid non-biased manner. With the use of TSO (Time
Sharing Option), portions of the data validation can be conducted during
the performance of a biological test. This article will describe the
design of a pilot system for the on-line computer assistance of testing
Also published separately as EPA-600/1-78-038, "Biomedical Data Vali-
dation Through an On-Line Computer System," May 1978.
210
-------
protocols and data validation. The scientific protocols and initial
computerization have been completed and the system will be tested in a
laboratory situation in the near future by the National Institute of
Environmental Health Sciences.
DESCRIPTION OF TEST:
From a variety of microbial mutation test systems, the suspension
test using a mammalian activation system was chosen because it is well
defined and is a quantitative test system.(1) The more commonly used
Ames plate incorporation method is only semiquantitative. We also chose
to compare three strains of Salmonella typhimurium and a forward muta-
tion strain of K-12 E. coli.(2) In simple terms, the test involves the
combining of the bacterial strain with a compound and a mammalian activation
system into an Erlenmeyer flask which is incubated at 37°C for 30 minutes
to 2 hours. The bacteria are then separated and aliquots are plated on
minimal media for the detection of mutants and on supplemented media for
relative survival. Figure 1 provides a representation of the pilot test
presently used. Pilot tests are used to define more appropriate testing
conditions, and definitive tests provide data from which mutagenicity is
judged. For complete testing, the substance must be tested in several
strains of bacteria to monitor for a variety of different types of
genetic alteration.
SYSTEMS OVERVIEW
This program uses TSO and was written in COBOL with some additional
FORTRAN being integrated into the final program. All programming was
accomplished on an IBM System/370 at the Division of Computer Research
and Technology within the National Institutes of Health, Bethesda,
Maryland.
211
-------
Figure 1. [Representation of the pilot suspension test (flow diagram); not reproduced.]
-------
For ease of programming, the task was divided into three individual
programs (Figure 2). Information, needed prior to testing of a parti-
cular substance, is stored with the use of Program 1. This program also
supplies a number for the blind coding of the compound. The second
program provides for the technician the proper form of the basic proto-
col , performs certain "within-experiment" calculations, accepts the
input of data from the tests, and evaluates the test by predetermined
objective criteria. The ability for the central laboratory to monitor
the accomplished work and recall any pertinent data is provided by
Program 3. A more precise description of the program is available.(3)
Quality Control Through Interactive Computerization
One of the basic premises of quality control is that good data
yields good decisions. By monitoring the quality of data during an
experiment and providing feedback to the technical personnel, both
personal bias and technical variation can be reduced. With an inter-
active computer network this can be done. This pilot project demon-
strates these capabilities in several ways. First, the compound to be
tested is coded and only essential information for the test is provided.
Secondly, certain other variables, e.g., concentrations of various
components, are predetermined for both the pilot tests and definitive
test. Within this testing system, two pilot tests are conducted to
determine levels of toxicology and potential mutagenicity. From this
data a narrower range of concentrations for the definitive tests are
calculated by predetermined rules so that there are a limited number of
213
-------
Figure 2. [The three programs of the system and their interaction with the terminal and the experiment (flow diagram); not reproduced.]
214
-------
definitive concentrations used across all laboratories. Next, the
computer performs any needed calculations during the performing of a
test thus lessening the occurrence of potential computational errors.
Some of the calculations performed for this system are: (1) bacteria
per ml solution based on a standardized spectrophotometer curve, (2)
variance for the weights of animals used in microsomal S-9 preparation
(if outside normal limits, these will be rejected), (3) calculation of
liver weights and amounts of buffers to be used in microsome prepara-
tion, and (4) calculations for the dilution of samples. Final data
validation is also performed automatically upon the final data output.
The computer's ability for data storage and retrieval is very important
in this regard. For example, in this system, final results are recorded
as number of colonies per plate. This software program compares the
average number of colonies per plate for the controls to the past 100
accumulated controls to determine statistically if the controls are
within normal limits. After the statistical examination of the controls
the test is either accepted or rejected. If the test is a pilot then
the data is also used to determine the concentrations of test substance
to be used in further testing. All data are, however, recorded per-
manently. Rejected data are recorded so that problems can be analyzed
as they are encountered. A flow diagram for the areas within the de-
cision processes is shown in Figure 3. The TSO is the component that
allows for immediate technician/program interaction, thus allowing for
a rapid and constant quality control.
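The control comparison at the heart of the final validation can be sketched briefly; the 3-standard-deviation criterion below is an assumption for illustration, not the system's documented limit:

    import statistics

    def controls_in_limits(current_plates, past_controls, k=3.0):
        """Accept the test only if the mean colony count of the current
        control plates lies within k standard deviations of the mean of
        the last 100 accumulated controls."""
        history = past_controls[-100:]          # needs at least 2 values
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        return abs(statistics.mean(current_plates) - mu) <= k * sigma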
215
-------
Figure 3. [Flow diagram of the decision processes within the testing system; not reproduced.]
216
-------
This prototype system demonstrates that interactive computer pro-
grams can be used to effectively increase the quality control of rapid
in vitro tests. However, it is also apparent that the more simple
in vitro microbial mutagenesis tests such as spot tests and simple plate
incorporation tests do not require such extensive computerization if well
documented and detailed protocols are available. Since most in vivo
mammalian systems have extended experimental time periods, the time sharing
option would be of little benefit due to cost factors and experimental
design. However, even with the more simple in vitro tests and mammalian
cell culture tests, this system can serve as a model for data storage
and test evaluation for the purpose of quality control.
This paper was extracted from an EPA report(4) which is available
through the National Technical Information Service, Springfield,
Virginia 22161.
217
-------
REFERENCES
1. Frantz, C. N. and Malling, H. V. 1975. The Quantitative Microsomal
Mutagenesis Assay Method. Mutation Research 31:365-380.
2. Mohn, Georges, Ellenberger, J. and McGregor, D. 1974. Development
of Mutagenicity Tests Using Escherichia coli K-12 As Indicator Organism.
Mutation Research 25:187-196.
3. Claxton, Larry and Baxter, Richard. 1978. The Computer Assisted
Bacterial Test for Mutagenesis. Mutation Research (In Press).
4. Claxton, L. Biomedical Data Validation Through An On-Line Computer
System. EPA-600/1-78-038, U.S. Environmental Protection Agency,
Research Triangle Park, North Carolina 27711, May 1978. 10 pp.
218
-------
REGIONAL VALIDATION OF STATE AND LOCAL AIR
POLLUTION DATA
by
Thomas H. Rose
Region IV
U.S. Environmental Protection Agency
Athens, Georgia 30605
219
-------
REGIONAL VALIDATION OF STATE AND LOCAL AIR
POLLUTION DATA
Thomas H. Rose
SUMMARY
Two types of data auditing are performed on state and local data in the
region. One is directed. The goal of a directed audit is to verify a certain
value such as a violation of a standard. The other is undirected. The goal
of the undirected audit is to determine the quality of the data being gener-
ated. Both are systems audits but the undirected audit will have wider ram-
ifications. For the most part, I will address the undirected audit.
Each measurement system requires a different auditing path. I point
this out not to make the job sound complicated, but to emphasize the impor-
tance of having the auditor be knowledgeable in the area of the audit.
The path of auditing will be determined by:
• the quantity and quality of records,
• the existence of an agency SOP,
• the availability of records,
•• on a macro geographic scale,
•• on a micro geographic scale,
• the system itself,
• the time frame allowed.
Thus you can see that the auditing process is tailored to the specific
system being audited.
In Region IV, where every funded state and local agency is audited at
least once a year, this is the approach that we take.
1. Establish the flow of samples and data through the system (from
the agency SOP).
2. Trace each parameter of the measurement process (volume, time, flow-
rate, etc.) back to the base standard and verify the quality of that standard.
220
-------
3. Verify that all measurements and transfers of data are documented
and follow reference methods.
4. Verify that all measurements, calculations, and data transfers
are accurate.
5. Provide feedback to the agency being audited of improvements that
could be made in the measurement process as well as the data handling.
One of the most important aspects of this audit is that the agency
itself has to participate and will itself determine the best corrective
action for their own system.
221
-------
DATA VALIDATION FOR THE LOS ANGELES
CATALYST STUDY (LACS)
by
Charles E. Rodes
Environmental Monitoring Systems Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
223
-------
DATA VALIDATION FOR THE LOS ANGELES
CATALYST STUDY (LACS)
C.E. Rodes
INTRODUCTION
The Environmental Monitoring and Support Laboratory (EMSL) is very
concerned with the quality of data generated in its field studies. This
is reflected in the quality control measures employed by EMSL during
sampling and analysis, and the data validation performed before data are
released.
Data validation like other aspects of quality control requires
resource allocations, especially in terms of the manpower required to
complete the final validation. The amount or degree of validation
required is dependent upon the end use of the data. In a study such as
the Los Angeles Catalyst Study (LACS) which is primarily concerned with
long-term trends, the emphasis in data validation is to detect any
extreme outliers which would affect monthly averages. Since we do not
report maximums, our data validation philosophy for this study is
primarily concerned with those values that may affect long-term averages.
CONCLUSIONS
As the project officer responsible for the study, I initially chose
an acceptance error band of ±10% on individual measured values, hence,
this is also the error band of the averages generated from these numbers.
Given this requirement one should be able to assess statistically
the amount and types of validation required to prevent data reduction
and transfer errors from contributing more than 1 to 2% to this overall
±10% error. Unfortunately this area has really not been examined for
this study in any detail nor, I expect, for many other studies. The
present validation levels used for the LACS probably examine more data
than are necessary to maintain the desired error level, but in regard to
validation, I would much rather be conservative than embarrassed after
the data are released.
224
-------
PROCEDURES
The main objective of the Los Angeles Catalyst Study (LACS) is to
develop ambient air data bases for sulfate (SO4), carbon monoxide (CO),
lead (Pb), and other mobile source related pollutants before and after
introduction of the 1975-model automobiles that employ catalytic converters.
The data from this study are being analyzed to determine whether the
catalytic converter has significantly increased the ambient sulfate
levels and/or simultaneously decreased the ambient CO and Pb levels near
the San Diego Freeway in Los Angeles.
The Environmental Monitoring and Support Laboratory (EMSL) is
responsible for all study-related functions including instrumentation,
operation, sample analyses, quality control, and data validation and
analyses. Since January 1976, the operation of instruments and analyses
of samples were performed under contract to Rockwell International or by
interagency agreement with the Lawrence Berkeley Laboratory. To assure
the quality of the data supplied by these two organizations, EMSL maintains
a comprehensive quality assurance program covering all aspects of the
study. EMSL issues periodic reports which discuss the trends and the
interrelationships among the various pollutant patterns.
The site locations in Los Angeles and the site layouts in relation
to the San Diego Freeway are shown in Figure 1. By selecting sites with
the prevailing wind perpendicular to the freeway, the cross-freeway
contribution to the ambient pollutant levels can be determined using
concurrent upwind and downwind measurements.
The data collected are classed as either continuous or integrated
depending on the measurement method. Continuous data are reduced to
hourly averages and integrated data are collected either over a 4-hour
or 24-hour period. The total data volume generated by the LACS is shown
in Table 1. Since the sites are usually shut down in December of each
year for routine maintenance, the data volumes are based on an 11-month
year.
The flow of samples and data are shown in Figure 2. All block
items except "Data Processing at RTP" and "Final Data Validation" are
performed by the contractor. Data validation steps taken by the contractor
are referred to as "pre-validation", while validation performed at RTP
under more direct EPA control are referred to as "final validation".
225
-------
Figure 1. [Site locations in Los Angeles and site layouts in relation to the San Diego Freeway; not reproduced.]
-------
Table 1. LACS YEARLY* DATA VOLUME

                        SUMMER     WINTER      TOTAL
CONTINUOUS (HOURLY)     70,080     58,560    128,640
INTEGRATED (4-HR)       20,160     11,100     31,260
INTEGRATED (24-HR)       9,000      4,050     13,050
INTEGRATED (WEEKLY)        816        680      1,496
INTEGRATED (MONTHLY)       120        100        220
TOTAL                  100,176     74,490    174,666

*ASSUMES OPERATION FOR 11 MONTHS/YEAR.
227
-------
Figure 2. LACS Data Flow. [Flow diagram showing continuous sampler output (strip charts, digitizer printout, digitizer data QC checks) and integrated sampler output (samples/data cards, sample analyses printout, laboratory data QC checks) each yielding prevalidated data, with strip charts and data cards sent to RTP for data processing and final data validation.]
228
-------
Pre-validation by the contractor is performed in two areas - electronic
digitization of the strip charts and compilation of the analysis data in
the laboratory. A portion of the data generated by the electronic
digitizer are checked against manually read strip charts to verify
scaling and digitizer performance. At present 5% of the data are spot
checked in this procedure. The laboratory analysis results are compared
on the contractor's computer listing against the data cards manually
completed during the analyses. All data (100%) generated in the laboratory
are checked in this procedure because of the importance of single integrated
values. We do not at present require the contractor to keep records of
the amount of data corrected during prevalidation.
Final data validation is performed at RTP following the general
procedure in Figure 3. This step in the validation is concerned primarily
with data transfer errors, but also examines data that are not consistent
(outliers) with the rest of the data base. In general, all of the values
in approximately the highest and lowest 1.0 percentile are verified, with
a check made at random of approximately 5% of the remaining data. These
validation levels were initially selected somewhat arbitrarily by the
project officer as a compromise between data quality and the amount of
resources required for the validation.
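In outline, the selection rule amounts to verifying the ranked extremes and
spot-checking a random fraction of the remainder; a minimal sketch (in
Python, for illustration only, and not part of the LACS processing system)
is:

    import random

    def select_for_verification(values, extreme_pct=0.01, spot_rate=0.05):
        """Flag the highest and lowest 1.0 percentile of values for
        mandatory verification, plus a 5% random spot check of the rest."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        k = max(1, round(extreme_pct * len(values)))
        extremes = set(order[:k] + order[-k:])      # lowest and highest 1%
        rest = [i for i in range(len(values)) if i not in extremes]
        spot = set(random.sample(rest, round(spot_rate * len(rest))))
        return extremes, spot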
The final data validation procedures are based upon the output
formats used to list the individual data values. The three formats are:
(1) an hourly listing for continuous data such as CO and NO, (2) an
integrated data listing for samples averaged over 4-hour or 24-hour
periods, and (3) a summary listing comparing simultaneously collected
upwind and downwind data for freeway contribution. The general instructions
given to the data clerks are shown in Figure 4. A sample printout of
hourly data is shown in Figure 5, followed by the outlier limits in
Table 2 used in validating the hourly data. A sample of a 24-hour
integrated data printout is shown in Figure 6 with its associated validation
limits given in Table 3. A study is presently being made of the frequency
distributions of the LACS data to reassess the validation limits listed
in Tables 2 and 3. The starred (*) values on the printouts are values
determined to be outside ±3 standard deviations of the monthly means.
A sample of the summary format is shown in Figure 7.
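The starring rule itself is a simple test against the monthly statistics; a
minimal sketch, assuming the monthly mean and standard deviation are computed
from the same listing (Python, for illustration only):

    from statistics import mean, stdev

    def star_values(month_values):
        """Mark values outside +/-3 standard deviations of the monthly
        mean, mirroring the starred (*) entries on the printouts."""
        m, s = mean(month_values), stdev(month_values)
        return [abs(v - m) > 3 * s for v in month_values]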
For possible future validation requirements, portions of integrated
samples are stored at the contractor's laboratory, and the strip charts,
data cards, and final validated printouts are stored by EPA at RTP.
229
-------
Figure 3. Final data validation. [Flowchart: raw data (strip charts, data
cards) and prevalidated data from the contractor pass through data processing
to hourly, 24-hr, and summary printouts; final data validation and
verification of outliers yield the final validated printouts.]
230
-------
Figure 4. LACS PRINTOUT VALIDATION INSTRUCTIONS
GENERAL
(1) VERIFY THAT BLANK SPACES ON PRINTOUT MEAN THAT NO DATA EXISTS.
(2) VERIFY THAT ALL ZERO VALUES (0.0) ARE REAL.
CONTINUOUS
(1) CHECK ALL HOURLY PRINTOUT VALUES THAT EXCEED THE OUTLIER
LIMITS AGAINST THE STRIP CHART.
(2) SPOT CHECK 5 RANDOM HOURLY VALUES ON EACH STRIP CHART (ONE
WEEK/CHART) OTHER THAN THE MAXIMUM VALUES.
INTEGRATED
(1) CHECK ALL 4-HOUR AND 24-HOUR PRINTOUT VALUES THAT EXCEED THE
OUTLIER LIMITS AGAINST THE SAROAD CARDS.
(2) SPOT CHECK 2 RANDOM VALUES FOR EACH POLLUTANT AND TIME
INTERVAL PER MONTH.
(3) CHECK ALL STARRED VALUES ON THE SUMMARY PRINTOUT IN COLUMNS
A, B, C, D, AND (C-A). IF ONLY (C-A) IS STARRED, IN ADDITION CHECK A, B,
C, AND D.
231
-------
Figure 5. Sample printout of hourly data. [Computer printout not
reproducible from this copy.]
232
-------
Table 2. LACS CONTINUOUS SAMPLER OUTLIER LIMITS
(ppm)

                           SITE 008    ALL OTHERS
CO (CARBON MONOXIDE)         25.0         15.0
NO (NITRIC OXIDE)             -            0.5
NO2 (NITROGEN DIOXIDE)        -            0.3
O3 (OZONE)                    -            0.3
TS (TOTAL SULFUR)             -            0.05
WS (WIND SPEED)               -           15(a)

(a) MILES/HOUR
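In processing terms, Table 2 is a site-dependent threshold lookup; a sketch
(Python, with the limits transcribed from the table; the dictionary layout is
illustrative and not the actual LACS software) is:

    # Outlier limits in ppm (wind speed in miles/hour), from Table 2.
    LIMITS = {
        "CO": {"008": 25.0, "default": 15.0},
        "NO": {"default": 0.5},  "NO2": {"default": 0.3},
        "O3": {"default": 0.3},  "TS":  {"default": 0.05},
        "WS": {"default": 15.0},
    }

    def needs_strip_chart_check(pollutant, site, hourly_value):
        """True if the hourly value exceeds its outlier limit."""
        by_site = LIMITS[pollutant]
        return hourly_value > by_site.get(site, by_site["default"])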
233
-------
Figure 6. Sample printout of 24-hour integrated data. [Computer printout
not reproducible from this copy.]
234
-------
Table 3. LACS INTEGRATED SAMPLER OUTLIER LIMITS

                                24 HR     4 HR
TSP (SUSPENDED PARTICULATES)     200       300
NO3 (NITRATE)                     30        30
SO4 (SULFATE)                     30        50
NH4 (AMMONIUM)                     3.0       3.0
Pb (LEAD)                          8.0      12.0
SO2 (SULFUR DIOXIDE)              50        -
235
-------
Figure 7. Sample summary printout comparing simultaneously collected upwind
and downwind data. [Computer printout not reproducible from this copy.]
236
-------
VALIDATION TECHNIQUES USED IN CONTINUOUS
AIR MONITORING
by
Marvin B. Hertz
Health Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
237
-------
VALIDATION TECHNIQUES USED IN CONTINUOUS
AIR MONITORING
M.B. Hertz
The Community Health Air Monitoring Program (CHAMP) is a network of air
monitoring stations used to acquire reliable air quality data for use in
epidemiologic health effects studies.
The CHAMP network has remote air monitoring stations located in each of
the selected health study communities across the country. The focal point
of the CHAMP network is the central computer facility located at the Environ-
mental Protection Agency, Research Triangle Park, North Carolina. A mini-
computer at each of the remote stations controls and acquires data and asso-
ciated system status information from aerometric and meteorologic instrumen-
tation and transmits the data by phone lines to the central computer facility.
The central controller for the CHAMP network is a dual processor system with
a full complement of input, storage, and display peripherals. One minicom-
puter was selected to perform the tasks associated with the management of the
large data base to be generated by the network. The telecommunications and
real-time processing tasks are handled by the other processor.
Two fundamental system objectives were: (1) to provide machine valid-
ation of the data, and (2) to develop a management information system for use
in the quality assurance, field logistics, and field maintenance tasks asso-
ciated with system operations.
Remote Data Acquisition System
Basically, the minicomputer in the remote station serves as an interface
between the pollutant analyzers and associated system, magnetic tape data
storage, the remote field service operator, and the telecommunications net-
work. The data generated and recorded at the remotes and transmitted to
central include not only the actual meteorologic and pollutant sensor re-
sponses, but also associated analog signals and digital status signals.
238
-------
These signals supply information about the performance and status of each in-
strument. For example, if an instrument is switched from an ambient sampling
mode to the calibration mode, a status bit is recorded which reflects this
change.
Telecommunication
Data are retrieved at the request of the central computer system from
each of the remote stations via a dial-up phone line at two-hour intervals.
The central and remote computers converse via a voice-grade telecommunications
system consisting of modems operating in a full-duplex mode at the rate of
1200 baud from remote to central and 150 baud in the reverse direction.
Polling is under the complete control of the central controller. A file is
maintained on a disk at central which contains the phone number of each sta-
tion in the format required by the calling software system. An alterable
polling queue is also disk resident. A rigid protocol has been established
to guarantee accurate transmission and retrieval of data. Central makes sev-
eral tries to establish contact with a remote station before abandoning the
attempt and placing the station at the bottom of the polling queue. A hard-
ware carrier detect protocol establishes the communications link. Each frame
is checked for parity and framing errors by the modem controller. Checksums
are computed for each 512 frame record and compared by the computer. An
acknowledge character is exchanged indicating correct receipt of the record.
Should any of the tests fail, several transmission retries are made. Commu-
nications are terminated by receipt of a character from the remote system
indicating the end of data or by failure of the remote to transmit in the
required period of time.
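The record-level integrity check can be sketched as follows (illustrative
Python, not the actual CHAMP code; the checksum algorithm and retry count are
not specified in the text, so a 16-bit byte sum and three retries are
assumed, and link stands for a hypothetical object wrapping the modem
interface):

    ACK, NAK = b"\x06", b"\x15"
    FRAMES_PER_RECORD = 512
    MAX_RETRIES = 3                      # "several" retries; count assumed

    def receive_record(link):
        """Read one 512-frame record, verify its checksum, acknowledge."""
        for _ in range(MAX_RETRIES):
            record = link.read(FRAMES_PER_RECORD)
            claimed = int.from_bytes(link.read(2), "big")
            if sum(record) % 65536 == claimed:   # assumed checksum form
                link.write(ACK)                  # correct receipt
                return record
            link.write(NAK)                      # ask for retransmission
        raise IOError("record failed checksum after retries")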
The Central Controller Hardware Configuration
The focal point of the CHAMP network is the central computer facility
located at the Environmental Protection Agency Complex, Research Triangle
Park, North Carolina. The central controller for the CHAMP network is a
dual processor system with a full complement of input, storage, and display
peripherals. The heavy burden on processor time placed by the telecommuni-
cations and real-time processing of the large quantities of data justified
the choice of a dual processor system. A PDP-11/40 with 40K of core was
239
-------
selected to perform the tasks associated with the management of the large
data base generated by the network. The telecommunications and real time
processing tasks are handled by a PDP-11/05 computer with 16K of core. The
two processors are interconnected by a Unibus window which takes advantage of
the unified asynchronous data path architecture of the 11 system. The window
allows each processor to address the core and peripherals on the other pro-
cessor as if it were its own. In addition, the DEC memory management option
was added to the PDP-11/40 to handle addressing above 32K in the 16-bit sys-
tem. An extensive complement of peripherals, including two 1.2M-word car-
tridge-type disks, three tape drives, an electrostatic printer-plotter, a line
printer, and a CRT display, was initially selected; the rapid retrieval require-
ment for large quantities of data necessitated the addition of a Telefile
dual spindle, quad density, removable 20 surface pack disk system capable of
storing 98M words.
The Central Controller Software Requirements
As mentioned previously, the PDP-11/05 processor is dedicated to the
system telecommunication tasks and the storing of the data simultaneously
on a magnetic tape and the Telefile disk as received. The data (Level 1)
so stored is an image of the tapes recorded at the remote station. These
data include the primary data (those data which actually represent parameters
of interest such as pollution levels), secondary data (that data required to
validate primary data or which are used only to insure proper station oper-
ation), and the status bits. For flexibility any channel at the remote sta-
tions can be selectively assigned a primary or secondary function as required;
furthermore, the number of primary and secondary channels is made arbitrary.
All of the data at the remote stations, whether primary or secondary data, is
assigned a remote station data slot (RSDS). The complement of instruments
and the RSDS number corresponding to a given instrument may be different in
each station. It is, therefore, necessary to append a "map" which gives the
correspondence between instrument and RSDS number at the front of each set of
station data. At the central each parameter is assigned a mnemonic (2 to 4
letters) which describes the parameter (NOS, O3, TOUT, etc.). The map,
therefore, must contain the mnemonic and the link between the mnemonic and the
corresponding RSDS. As the complement of instruments in a station changes,
240
-------
a new map will be created and appended to future data from that station. All
operations at Central will, therefore, refer only to the mnemonic names and
not to any "channel," "data slot," or other number.
The 11/40 processor is devoted to data validation tasks and tasks asso-
ciated with the management information system requirements for quality assur-
ance, field maintenance, and field logistics.
CHAMP Software Features
As mentioned previously, fundamental system objectives were:
1. To develop and implement a computer-based management information
system for use in system quality assurance, field logistics, and field
maintenance tasks.
2. To provide for machine (computer)-validation of the data.
The current CHAMP software, which will now be described, represents
the composite of original system programs plus those that were subsequently
developed in response to needs recognized after the system became operational
and to inadequacies in the original software.
Current CHAMP Central Software
1. FILMAP - This program creates the air quality data base station map
files. The station map files contain station location information,
instrument complement by station, status bit configuration indicating
hardware failure, calibration data, and correct operating ranges of
the hardware. In addition, the map files identify validation criteria
such as secondary parameter limits, dependencies between primary para-
meters, filtering and interpolation techniques to be applied, and the
format of the final data output.
2. FILSET - This program sorts polled and station data (mailed) by
type and writes the data in the correct format into the appropriate
files in the data base. Six types of data are sorted. These are
(1) primary parameter data, (2) calibration constants, (3) status
words, (4) journal entries, (5) secondary parameter data, and
(6) calibration data. Table 1 presents a sample of secondary data.
3. TIMSTN - This program processes the remote station data tapes and
checks for remote station data time anomalies. The program enables
241
-------
the operator to edit the reported times to resolve the time jumps.
The edited station data is recorded on magnetic tape.
4. PDAILY - This program summarizes and produces a printed report of
daily station performance. PDAILY invalidates primary parameter
data in cases where bits associated with hardware failures are set.
Calibration data and data which are collected on the wrong instrument
range are invalidated. The number of invalid five-minute averages for
each primary parameter for each hour is tallied, as well as the number
of times each status bit is set for each hour. The amount of data
found invalid, valid, missing, or in calibration mode for each primary
parameter is summarized over the day. The journal entries, which are
operator comments entered at the remote stations, are also listed.
5. PSUMRY - This program summarizes daily station performance and produces
printed summaries of station performance. The performance summary
contains the following:
1. Percentage of valid data by parameter by day.
2. Percentage of valid, missing, and calibration data by day.
3. Percentage of valid, missing, and calibration data over the
days shown on the summary.
4. Logs of data processing progress.
5. Primary and secondary parameter calibration occurrences, and
occurrence of control chart samples which exceed upper or
lower control chart limits.
6. PSCHRT - This program samples the air quality data base secondary
parameters and generates control chart files from the sampled
secondary parameter file data.
7. REVMAP - This program allows manual editing of the data by processing
validation actions entered on punched cards in the standard "Review
Change Request" format.
8. PCHEMS - This program generates the calibration, performs pre-
established validation tests, and produces a printed report summarizing
chemical analysis data (i.e., hi vol data, bubbler data).
242
-------
9. PPCHRT - This program samples the air quality data base primary
calibration parameters (A and B constants) and generates control
chart files for these data.
10. PCALIB - This program generates the calibration coefficients used
to convert the air quality data collected as a voltage by the remote
station into a concentration. The calibration constants are calculated
from the raw calibration mode data recorded at the remote station on
tape.
11. FAUT - This program performs automatic calibration for selected
equipment under control of the Central Computer and/or the remote
station operator.
12. FILCAN - This program sorts the chemical analysis data by station
by date and creates chemical analysis data files.
13. PLOTCC - This program takes input from the control chart files and
plots control charts for the primary parameter zeroes, spans and B
coefficients and for the secondary parameter ranges and values.
14. FLMRG - This program sorts station data (mailed) by type and writes
the data in the correct format into the appropriate file in the data
base; these files are then compared with those created by the FILSET
program (polled data) to fill the gaps in the data base.
243
-------
Table 1. SAMPLE OF SECONDARY PARAMETER DATA, STATION DAYS 77222 TO 77237.
[Tabulated date, time, and secondary parameter values not recoverable from
this copy.]
246
-------
USE OF PRECISION AND ACCURACY ESTIMATES
FOR VALIDATION OF DATA
by
David T. Mage
Environmental Monitoring Systems Laboratory
U. S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
247
-------
USE OF PRECISION AND ACCURACY ESTIMATES
FOR VALIDATION OF DATA
David T. Mage
INTRODUCTION
A basic need in the presentation of a data set is a description of the
precision and accuracy associated with the measurements. The definition of
valid data for a given study is then determined by the precision and accuracy
claimed for the data set. For example, a statement may be made that the data
have a precision of 10%. Assuming that the errors are independent and normal-
ly distributed, one expects that 68% of the measurements are within ±10% and
95% of the measurements are within ±20% of the true value, which is unknown.
By this definition the 5% of the data in error by more than ±20% are not in-
valid, since one expects from probability alone that runs of positive or runs
of negative errors can occur. It is not productive and probably impossible
to examine each datum point and determine an individual error associated with
it. The approach being taken for the Community Health Air Monitoring Program
(CHAMP) data base is to determine the precision and accuracy of the entire
data set and not invalidate data except for known cause such as instrument
failure. The alternative of eliminating data suspected of higher uncertainty
in order to improve the precision of the remaining data set is counter pro-
ductive in the context of a health study.
In a health study, the aerometric data are paired with health data.
When aerometric data are invalidated, the associated health statistics are
also removed from the analysis. Because the occurrence of the health indica-
tor, such as an asthmatic attack, is relatively infrequent, the loss of the
information significantly reduces the validity of the overall study. For
this reason an approach which provides a large aerometric data base of moder-
ate precision and accuracy is preferable to an approach which provides a re-
duced data base of higher precision and accuracy. The following sections
248
-------
describe the system of data validation currently being used by CHAMP.
ERROR ANALYSIS
When an aerometric analyzer is continually monitoring a pollutant, several
sources of potential error can influence the measurement. The ten sources of
error given in Table 1 are discussed below.
1. Span Gas Analysis—This error covers the uncertainty in the process
of preparing a known concentration of pollutant to provide an upscale reading.
This process may contain errors associated with preparation of a primary
standard and subsequent analysis of a secondary or transfer standard to be
used in the field. When dilution is necessary to attain the desired concen-
tration, the errors in the flow measurements of standard and diluent air also
contribute to the overall error. It is the belief of the author that this
uncertainty is on the order of 5% (σ1 = 0.05).
2. Zero Gas Impurity—The gas used to zero the instrument may contain
some impurity. The presence of 0.1 ppm as opposed to 0.0 ppm represents po-
tential error of 10% at the 1 ppm level and 1% error at the 10 ppm level.
For the purpose of this analysis a low error of 2% (σ2 = 0.02) is chosen
since greatest concern is at or above the National Ambient Air Quality Stan-
dard (NAAQS) where this error is minimized.
3. Instrument Drift, Electronic—When the sample and reagent flows to
the instrument are held constant, a constant input concentration produces a
signal which fluctuates about a mean value. This "noise" in the output sig-
nal may be caused by electronic noise in the photomultiplier tube and other
electrical components due to voltage, frequency and temperature fluctuations.
The estimate of error for this effect is 3% (σ3 = 0.03).
4. Instrument Drift, Flow Variations—After an instrument is calibrat-
ed, and with input concentration held constant, an increase or decrease of
the sample flow rate will tend to cause the instrument to drift away from the
equilibrium point. When reagent flows are also being mixed in a reaction
chamber, such as ethylene flow in a chemiluminescent ozone analyzer, the
fluctuations in reagent flows also influence the output signal. These flows
may be influenced by fluctuations in atmospheric pressure and vacuum in the
flow system. The overall effect of these variations of flow from the
249
-------
Table 1. SOURCES OF ERROR

 1. SPAN GAS ANALYSIS                    σ1  = 0.05
 2. ZERO GAS IMPURITY                    σ2  = 0.02
 3. INSTRUMENT DRIFT, ELECTRONIC         σ3  = 0.03
 4. INSTRUMENT DRIFT, FLOW VARIATIONS    σ4  = 0.06
 5. OPERATOR IMPRECISION                 σ5  = 0.04
 6. NON-LINEARITIES OF SCALE             σ6  = 0.02
 7. RESPONSE TIME                        σ7  = 0.02
 8. INTERFERENCES                        σ8  = 0.02
 9. PRESSURE-TEMPERATURE CORRECTION      σ9  = 0.01
10. DATA PROCESSING AND ROUND OFF        σ10 = 0.01
250
-------
calibration condition is taken to be 6% (σ4 = 0.06).
5. Operator Imprecision—The station operator in performing calibra-
tions must adjust potentiometers and rotometers and perhaps read the mean of
a fluctuating signal. A different operator repeating these procedures will
arrive at a slightly different result for each of them. The resulting uncer-
tainty, due to the human element, is estimated at 4% (σ5 = 0.04).
6. Non-Linearities of Scale—A linear relation is usually assumed be-
tween voltage output and pollutant input. Slight non-linearities of scale
are usually masked by the uncertainties in the measurements themselves. Where
the scale appears to be linear, an error of 2% in the linearity is almost in-
distinguishable; consequently, this error of 2% is treated as a possibility
which cannot be ignored (σ6 = 0.02).
7. Response Time—Due to the finite response time of the instruments,
a rising signal will lag and tend to be underestimated and a falling signal
will lag and tend to be overestimated. These errors are felt to provide an
error on the order of 2% in the measurements (σ7 = 0.02).
8. Interferences—Variations in the atmospheric composition from the
composition of the gas used to calibrate the instrument can cause errors.
Common gases, besides the common pollutants, which fluctuate in the atmosphere
are CO2 and H2O. These fluctuations can cause variations in output signal on
the order of 2% (σ8 = 0.02).
9. Pressure Temperature Correction—When data are corrected to stan-
dard conditions (25°C and 760 mm Hg), uncertainties in measured pressure and
temperature can cause a slight error on the order of 1% (σ9 = 0.01).
10. Data Processing and Round Off—In analog-digital conversions and
vice versa, an error is created. When data are output, a round-off error
also occurs. These errors are quite small and probably less than 1%
(σ10 = 0.01).
The net result of all of these effects, assuming that the variances are
additive, is an overall uncertainty of ±10%. This is interpreted as follows.
If the atmospheres at 10 stations were all 10 ppm of some arbitrary pollutant,
one would expect the mean of all 10 station measurements to be 10 ppm; 7 of
them would be between 9 and 11 ppm, 2 of them would be in the range 8-9 ppm
or 11-12 ppm, and 1 would be over 12 ppm or less than 8 ppm.
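Under the additivity-of-variances assumption, the overall figure is the root
sum of squares of the ten components, which can be verified directly (an
illustrative check in Python):

    # The ten fractional error components from Table 1 (items 1-10 above).
    sigmas = [0.05, 0.02, 0.03, 0.06, 0.04, 0.02, 0.02, 0.02, 0.01, 0.01]
    overall = sum(s * s for s in sigmas) ** 0.5
    print(round(overall, 3))    # 0.102, i.e., approximately 10%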
251
-------
CHAMP VALIDATION CRITERIA
The CHAMP aerometric system is unique in that it measures and records
the secondary flow parameters within the instrument simultaneously with the
measurement of the primary parameter (pollutant concentration).
The fluctuations in the secondary parameters produce uncertainties in
the measurements as described previously. When the fluctuations exceed their
expected range, or limit of normal operation, two possible causes must be in-
vestigated. The first is pure randomness which means that the data are valid.
The second cause may be a component failure, such as a clogged capillary tube,
and this produces a bias in the data. Besides signalling the need to repair
the instrument, the analyst has a cause for invalidating that portion of the
data set where the instrument was operating out of the normal range.
As an example of the usage of secondary parameter data for determining
precision, accuracy, and validity, the effect of ethylene flow within the
Bendix chemiluminescent ozone analyzer is chosen for discussion. Figure 1 is
a typical plot of the ethylene flow variations within a CHAMP station. On the
day shown the ethylene flow (FETH) varied from 25 cc/min to 26.3 cc/min.
Figure 2 is a histogram of the fluctuations of FETH about the previous cali-
bration setting of FETH during the 11-day period August 15 - August 25, 1977.
The histogram has a mean of -0.03 cc/min and a standard deviation of 0.29 cc/
min. The mean close to zero confirms that the fluctuations are not producing
an appreciable bias, as expected. The standard deviation can be related to
a standard error by examining how the output of an ozone analyzer varies with
FETH.
Figure 3 shows the instrumental bias (A ppm 03) as a function of AFETH
at various ozone levels. In this case the instruments were on the 0.5 ppm
scale, and were calibrated with a sample flow of 1000 cc/min and a FETH of
25 cc/min (bias = 0 by definition). When these data are normalized by di-
viding bias by original ppm value, the percentage changes at all four concen-
trations follow a common curve, Figure 4. In this case a linear fit to these
data is not justifiable. At 25 cc/min the slope of the curve is +1.6% per
cc/min. The standard deviation of 0.3 cc/min, therefore, corresponds to a
standard error in ozone of 0.5% as shown on Table 2. The biases of the meas-
urements to changes in vacuum (VAC), sample flow of ozone (SFO3), sample
252
-------
Figure 1. Typical plot of ethylene flow (FETH) variations within a CHAMP
station over one day. [Plot not reproducible from this copy.]
253
-------
Figure 2. Histogram of fluctuations of FETH about the previous calibration
setting, August 15 - August 25, 1977 (mean -0.03 cc/min, standard deviation
0.29 cc/min). [Plot not reproducible from this copy.]
254
-------
Figure 3. Instrumental bias (Δ ppm O3) as a function of ΔFETH at various
ozone levels. [Plot not reproducible from this copy.]
255
-------
"SE
«3-
LU
C£
• • • . H M
BUB . . M - -
N - - W 0 I 1 i
(aBueip %) SVI9
256
B W B W B
N PI PI PI J
I i i i I
-------
Table 2. STANDARD ERRORS IN OZONE RESULTING FROM SECONDARY PARAMETER
VARIATIONS (FETH, VAC, SFO3, SFNO, FO3). [Tabulated values not fully
recoverable from this copy; the FETH entry, derived in the text, is a 0.5%
standard error.]
257
-------
flow of NO (SFNO), and flow of air through the ozone generator (FO3) are
shown on Figures 5-8. These biases were used to compute the other standard
errors given in Table 2.
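The conversion from a secondary-parameter standard deviation to a standard
error in the pollutant measurement is a multiplication by the local slope of
the normalized bias curve; for FETH (a sketch in Python, using values from
the text):

    slope = 1.6      # slope of the Figure 4 curve at 25 cc/min, % per cc/min
    sd_feth = 0.3    # standard deviation of FETH, cc/min
    print(round(slope * sd_feth, 2))   # 0.48, the 0.5% entry in Table 2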
DISCUSSION
The assumption developed in the preceding section is that the measure-
ment errors from instrument-to-instrument are independent and normally dis-
tributed. When the same standard is used for repeated calibration of an in-
strument to provide a time series of measurements, the error in the analysis
provides a bias. One of the functions of an audit, using an independent set
of standards and observers, is to disclose whether a significant bias exists.
In September of 1977, an audit of 46 instruments located at seven CHAMP sta-
tions was performed by the contract operator of the stations. The audit pro-
cedure was not truly independent since the transfer standards in station use
were originally compared to the primary standard used in the audit. Of the
46 instruments, seven of one type showed a consistent bias indicating a prob-
lem with the audit procedure. The remaining 39 instruments showed an ex-
pected positive and negative scatter about the audit values.
In order to test the hypothesis that these 39 results are normally dis-
tributed, the results are plotted on normal probability paper at frequencies
corresponding to their rank, lowest to highest, divided by the total (39)
plus one. Each datum point represents the average of four span results at
approximately 20%, 40%, 50%, and 80% of full scale. These four values of
deviation are not independent since the same operator used the same standards
for each one. However, the set of 39 averages are mutually independent. The
mean of the deviation, μ, is -1.55% and the standard deviation, σ, is 5.5%.
The maximum difference between the frequencies predicted by the normal dis-
tribution, N(μ, σ), and the data points is 6%. This corresponds to a
Kolmogorov-Smirnov statistic of 0.06 which indicates that the hypothesis of
normality for the distribution cannot be rejected at the 5% level. If an
independent auditor performed the audit with independent standards, a stan-
dard deviation larger than 5.5% would be expected, probably on the order of
10 to 15%. The results of independent CHAMP audits are being analyzed in the
manner described above and the results of the analyses will be reported with
258
-------
Figure 5. Instrumental bias (percent change) as a function of vacuum (VAC).
[Plot not reproducible from this copy.]
259
-------
Figure 6. Instrumental bias (percent change) as a function of sample flow of
ozone (SFO3). [Plot not reproducible from this copy.]
-------
Figure 7. Instrumental bias (percent change) as a function of sample flow of
NO (SFNO). [Plot not reproducible from this copy.]
262
-------
Figure 8. Instrumental bias (percent change) as a function of flow of air
through the ozone generator (FO3). [Plot not reproducible from this copy.]
the data elsewhere.
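The same normality check can be reproduced with standard tools; a sketch
(Python with SciPy, assuming the 39 average span deviations, in percent, are
held in a list named devs):

    from scipy import stats

    def test_normality(devs, mu=-1.55, sigma=5.5):
        """Kolmogorov-Smirnov test of the deviations against N(mu, sigma)."""
        d, p = stats.kstest(devs, "norm", args=(mu, sigma))
        return d, p   # a D statistic near 0.06 does not reject normality
                      # at the 5% level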
In conclusion, the approach being taken for the CHAMP data validation
procedure is to accept data from the instruments when they are known to be
operating properly and make a probability statement for the individual data
set as a whole. For example, a pollutant data set for one station may have a
standard deviation of 10% but the standard deviation for another pollutant
at the same station may be 15%.
These different uncertainties allow the statistical analyst to weight
the data higher when the expected error is low and adjust for the relative
uncertainties in making correlations between air pollution and health.
264
-------
VALIDATION SYSTEM USED IN THE ST. LOUIS
REGIONAL AIR MONITORING STUDY (RAMS)
by
Robert B. Jurgens
Environmental Sciences Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
265
-------
VALIDATION SYSTEM USED IN THE ST. LOUIS
REGIONAL AIR MONITORING STUDY (RAMS)*
R.B. Jurgens†
Abstract
This paper describes the RAMS measurement system, screening categories
of data validation, the RAMS automated validation system, the current
status of special validation studies - visual validation and successive
differences, and updates to the RAMS data base. The conclusion presents
a generalized measurement system including quality control, data validation
and feedback.
*Portions of this paper have been discussed in detail elsewhere (1) and
will only be mentioned here for completeness.
†On assignment from the National Oceanic & Atmospheric Administration,
U.S. Department of Commerce, Rockville, Maryland 20852.
266
-------
Introduction
The Regional Air Monitoring System (RAMS) is the ground-based
aerometric network of the St. Louis Regional Air Pollution Study (RAPS).
See references (2-3) for a discussion of the objectives, scope and
accomplishments of RAPS.
The location of the 25 RAMS stations within the St. Louis metropolitan
area is shown in Figure 1. The air quality, meteorological and solar
radiation measurements within the RAMS network are listed in Table 1.
Note that not all measurements are made at each station. Measurements
began being recorded in mid-summer 1974 and continued through June 1977.
From April to June 1977 only stations 104, 106, 107, 111, 115, 121 and
125 were in operation. The approximate volume of data recorded during
the network operation is 500 million values. Figures 2 and 3 show the
data flow through the RAMS stations and through the central facility at
Creve Coeur. Rockwell International Corporation was the prime contractor
for the installation, operation, and maintenance of the RAMS network. A
detailed description of RAMS can be found in references (4,5).
267
-------
Quality Control & Data Screening
Data validity results from: 1) a quality control program designed
to provide accurate data as it is measured and 2) a screening process to
detect spurious values which exist despite the quality control process.
The quality control program for the RAMS network is reviewed in (1) and
(5). Detailed definition and discussion of the elements of quality
control for air pollution measurement systems have been published in (6).
The specific quality control activities relating to calibration, zero/span
checks, status and analog checks associated with the gas analyzers are
quite similar to those of the CHAMP program which are discussed in the
preceding paper by Dr. Marvin Hertz.
Based on the experience of managing the data validation activities
of RAMS we have developed a summary of screening techniques which is
applicable for any continuous automated monitoring network (air pollution,
water quality, etc.). These tests (Table 2) have been divided into
three categories: 1) Operational, 2) Continuity and Relational and 3) A
Posteriori. Discussion of screening tests within each category and
their application in RAMS follows.
The first category, "Operational," contains checks which document
the network instrument configuration and operating mode of the recording
station. These checks, which in RAMS are part of the quality control
program, include checks for station instrumentation, missing data,
system analog and status sense bits, and instrument calibration mode.
In addition to documenting system performance, the checks are used to
flag data in the RAMS archive. As designed, the RAMS data bank contains
space for every potential measurement. For example, if an instrument is
in calibration mode, the corresponding data slots will contain a "calibration"
flag.
268
-------
The second category, "Continuity and Relational," contains temporal
and spatial continuity checks and relational checks between parameters
which are based on physical and instrumental considerations or on statistical
patterns of the data. A natural subdivision can be made between intra-
station checks, checks which apply only to data from one station, and
interstation checks, those which test the measured parameters for uniformity
across the network.
Intrastation checks include tests for calibration drift (gas
analyzers in RAMS), lower detectable limits, gross limits, aggregate
frequency distributions, relationships, and temporal continuity.
The drift calculations, which are part of the quality control
program, are discussed in the above references. Many measurement instruments
have a threshold, or lower detection limit (LDL), below which their
output is obscured by instrument noise. A standard practice adopted in
RAMS is to replace values in this range (0.0 ± LDL) with +1/2 LDL. The
LDL's for the gas analyzers and the wind speed sensor are the lower
instrument limits listed in Table 3.
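The LDL substitution rule is a one-line transformation (an illustrative
sketch, in Python):

    def apply_ldl(value, ldl):
        """Replace readings within the noise band (0.0 +/- LDL) by +1/2 LDL."""
        return 0.5 * ldl if -ldl <= value <= ldl else value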
Gross limits, which in RAMS are used to screen impossible values,
are based on the ranges of the recording instruments. These, together
with the parametric relationships which check for internal consistency
between values, are listed in Table 3. Setting limits for relationship
tests requires a working knowledge of noise levels of the individual
instruments. The relationships used are based on meteorology, atmospheric
chemistry, or on the principle of chemical mass balance. For example,
at a station for any given minute, TS cannot be less than S02 + H2S with
allowances for noise limits of the instruments.
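A relational check of this kind reduces to an inequality with a noise
allowance; for the sulfur balance example (a sketch in Python; the allowance
value is illustrative and not taken from the RAMS specification):

    NOISE_ALLOWANCE = 0.005   # ppm; illustrative value only

    def sulfur_balance_ok(ts, so2, h2s):
        """TS cannot be less than SO2 + H2S, within instrument noise."""
        return ts >= so2 + h2s - NOISE_ALLOWANCE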
269
-------
A refinement of the gross limit checks can be made using aggregate
frequency distributions. With a knowledge of the underlying distribution,
statistical limits can be found which have narrower bounds than the
gross limits and which represent measurement levels that are rarely
exceeded. A method for fitting a parametric probability model to the
underlying distribution has been developed by Dr. Wayne Ott of EPA's
Office of Research and Development (7). B.E. Suta and G.V. Lucha (8)
have extended Dr. Ott's program to estimate parameters, perform goodness-
of-fit tests, and calculate quality control limits for the normal distribution,
2- and 3-parameter lognormal distribution, the gamma distribution, and
the Weibull distribution. These programs have been implemented on the
OSI computer in Washington and tested on water quality data from STORET.
This technique has not been implemented within RAMS.
Also, under intrastation checks are specific tests which examine
the temporal continuity of the data as output from each sensor. It is
useful to consider, in general, the types of atypical or erratic responses
that can occur from sensors and data acquisition systems. Figure 4
illustrates graphically examples of such behavior, all of which have
occurred to some extent within RAMS. Physical causes for these reactions
include sudden discrete changes in component operating characteristics,
component failure, noise, telecommunication errors and outages, and
errors in software associated with the data acquisition system or data
processing. For example, it was recognized early in the RAMS program
that a constant voltage output from a sensor indicated mechanical or
electrical failures in the sensor instrumentation. One of the first
screens that was implemented was to check for 10 minutes of constant
270
-------
output from each sensor. Barometric pressure is not among the parameters
tested since it can remain constant (to the number of digits recorded)
for periods much longer than 10 minutes. The test was modified for
other parameters which reach a low constant background level during
night-time hours. SO^ was generally at zero and no persistency check
was applied against it.
A technique which can detect any sudden jump in the response of an
instrument, whether it is from an individual outlier, step function or
spike, is the comparison of successive differences of a measurement with
predetermined control limits. These limits are determined for each
parameter from the distribution of successive differences for that
parameter. These differences will be approximately normally distributed
with mean zero (and computed variance) when taken over a sufficiently
long time series of measurements.
The type of "jump" can easily be identified. A single outlier will
have a large successive difference followed by another about the same
magnitude but of opposite sign. A step function will not have a return,
and a spike will have a succession of large successive differences of
one sign followed by those of opposite sign.
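A sketch of the flagging step (Python; the control limit would come from the
per-parameter distribution of successive differences described above):

    def flag_jumps(series, limit):
        """Return indices whose successive difference exceeds the limit.
        One large difference followed by one of opposite sign marks an
        outlier; no return marks a step function; a run of one sign
        followed by a run of the other marks a spike."""
        diffs = [b - a for a, b in zip(series, series[1:])]
        return [i + 1 for i, d in enumerate(diffs) if abs(d) > limit]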
Though not implemented in Rockwell's data processing and validation
program, TAPGEN (partly because of an expected large increase in processing
time on the PDP 11/40), or in EPA's data archiving programs (version
6.4), strong consideration has been given to this technique for potential
applications in data screening and quality control checks. In 1976, the
Data Management and Systems Analysis Section (DMSAS) awarded a contract
to RTI to study validation procedures for the RAPS data bank (9) in
271
-------
which a major area of investigations was the use of minute successive
differences. Use of successive differences in an ongoing special validation
study will be discussed in a later section.
A number of interstation checks on meteorological parameters are
implemented in Rockwell's TAPGEN program. However, they have only been
used for quality control of the RAMS system and not for validation
(flagging) of RAMS data. These tests, which are shown in Table 4, are
performed on hourly average data.
Another interstation check, the Dixon ratio test has been examined
to determine its applicability for screening RAMS network outliers
(1,9). Dr. Ty Hartwell, RTI, in an earlier session presented some
results he has obtained using the Dixon ratio test on RAMS data. This
test was never implemented into the RAMS data validation system.
Referring again to Table 2, the third screening category, "A
Posteriori", was established to provide a mechanism for overriding the
automated flagging schemes which have been implemented in the instrumentation
at the remote sites and in the data screening module. From a review of
station logs and preventive maintenance records, a knowledge of unusual
events, or through visual inspection of data, it may be determined that
previously valid data should be flagged as questionable. Conversely, it
may be determined that previously invalid data should be validated by
removing existing flags. An example of when data would be invalidated
is when an instrument, such as a wind direction indicator, becomes
misaligned or uncalibrated because of some non-linear or unknown reason.
Removal of flags or revalidation can occur, for example, when the recording
instruments function properly, but either the sense bit or analog status
circuitry is known to have malfunctioned.
272
-------
RAMS Automated Validation System
The screening tests used in validating RAMS data were largely
developed and tested at RTP and then implemented in the St. Louis central
facility computer for on-site, near-real-time processing. Through
continued testing and modification the validation system evolved to its
final version - version 6.4. All data archived by previous versions
have been rearchived to this standard. Table 5 lists the causes and
flags of screening tests while Figure 5 shows a flow diagram of the
order in which the tests are applied.
Special Validation Studies
Special known problems have occurred on certain parameters from
time to time. The origin of these problems can be traced to sensor
failure, electrical transients, software bugs at RAMS stations and at
the central facility, data acquisition hardware, etc. Despite the
automated validation program, these problems have led to the archival of
erroneous data. It should be noted that these problems have only affected
a small percentage of all data - estimated to be less than 1 percent of the
total.
In an effort to locate, review and flag any remaining suspect data
(known a priori or not) several studies have been initiated within
DMSAS. Two major efforts involve a graphical review of hour average
data and a computer study of minute successive differences:
RAMS Hour Average Graphical Review. — Table 6 lists the volume of
data from the RAMS networks, the number of minute and hour plots and the
number of microfiche (24x) required for plotting all RAMS data. The
tremendously large number (70,000) of minute plots precludes a graphical
273
-------
review at this time interval. Therefore, a graphical review system
using hour average data was developed. This system combines the use of
computer graphics, interactive programs and computer files (lists)
wherever possible to reduce the manual labor associated with the various
tasks.
The steps in this study are shown in Figure 6. Computer-generated
notebooks of hour average plots are reviewed by trained personnel for
any suspect data. See Figure 7 for an example. The plots are also
reviewed by a second individual. A consolidated list of dates and times
is entered into the computer as input for automatic retrieval of
minute plots using the RAPS*GRAPHICS program. These plots from the RAMS
minute archive are reviewed by DMSAS personnel. From the original
review file, a second computer disc file (preliminary update file) containing
dates, time periods, and suggested changes and flags is prepared. This
list, with corresponding minute/hour plots, is forwarded to Rockwell for
review and investigation of cause. With the concurrence of DMSAS and
Rockwell, the final output from the graphical review process is reached:
an update file for the RAMS minute/hour archive.
Minute Successive Difference Study. — Visual inspection of hour
data will detect large discontinuities in time series plots of a measurement
or uncorrelated traces between stations. However, if a few minutes of
"spiky" data were recorded during an hour, the hour average may be
changed by only a few percent. (For example, five minutes of readings
elevated by 0.012 ppm above a 0.030 ppm ozone baseline shift the hour
average by only 0.001 ppm.) Since hour-to-hour variations in almost all
RAMS parameters can normally be much larger than a few percent, small
changes caused by errors in minute data will not be detected by visual
observation.
-------
To determine the quality of the minute archival data we have been
applying a flagging procedure based on distribution functions of minute
successive differences. This technique is based on the assumption that
minute successive differences will approximate a normal distribution
with mean zero. See Figure 8 for an example of a distribution function
of ozone data from a five-hour period.
The RTI study (9) has shown that, for a given parameter, sample
standard deviations of minute successive differences are not constant
over stations, time of day, or seasons. The functional form of the
standard deviation, which can be expressed as

    σ = σ(parameter, date, time, station),

is not known, however. Therefore, for this study, data flags have been
chosen as 4·σmax, where σmax is the largest sample standard deviation
found in the RTI study for a given parameter. These 4-sigma limits are
listed below.
RAMS Variable               4-Sigma Limit
Wind Speed (meters/sec)     ± 3.0
Temperature (°C)            ± 0.7
Ozone (ppm)                 ± 0.010
CO (ppm)                    ± 1.97
Methane (ppm)               ± 0.32
THC (ppm)                   ± 0.84
NO (ppm)                    ± 0.028
NOX (ppm)                   ± 0.035
Total Sulfur (ppm)          ± 0.022
SO2 (ppm)                   ± 0.015
275
-------
Flagged data (dates, times, station, etc.) are stored on the Univac
1110 and can be used as input in further analysis. Programs exist to
automatically print and plot suspect data. An example of minute temperature
data with a succession of outliers is shown in Figure 9. The corresponding
hour average is circled in Figure 7.
Application of the minute successive difference technique in a RAMS
data base update module will permit recalculation of hour averages which
contain significant amounts of erroneous spiky data.
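The sketch below illustrates, under stated assumptions, how such a
4-sigma successive-difference screen could be applied to a list of
archived minute values; the function name, the use of None for missing
minutes, and the sample values are illustrative, not the DMSAS code.

    # A minimal sketch of the 4-sigma successive-difference screen.
    def flag_successive_differences(minutes, limit):
        """Return indices of minute values whose successive difference
        exceeds the 4-sigma limit for the parameter."""
        flagged = []
        for i in range(1, len(minutes)):
            prev, curr = minutes[i - 1], minutes[i]
            if prev is None or curr is None:
                continue              # a missing minute breaks the chain
            if abs(curr - prev) > limit:
                flagged.append(i)
        return flagged

    # Illustrative use with the ozone limit of +/- 0.010 ppm from the table:
    ozone_minutes = [0.031, 0.032, 0.031, 0.055, 0.032, None, 0.031]
    suspect = flag_successive_differences(ozone_minutes, 0.010)
    # suspect == [3, 4]: the jump into the spike and the return to baseline

Note that both the jump into and out of a spike are flagged, which is
consistent with marking the suspect minutes for review rather than
deciding automatically which value is wrong.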
RAMS Data Base Update
The visual validation and successive difference studies are part of
a review of the RAMS data being conducted by DMSAS and Rockwell. The
results of this review process will be an update file (dates, times,
changes, flags, etc.) and separate modules for an update program. Figure
10 shows this update process, including the review studies and specifically
known problem areas. Underlying this review is the requirement that all
changes be documented and that concurrence be reached as to the probable
cause of suspect data.
Monitoring System with Quality Control and Data Screening
Figure 11 illustrates the data validation process within the framework
of a generalized monitoring system. Associated with sensor instruments
and the data acquisition system are quality control blocks which contain
those elements required for acquiring acceptable data: calibration,
system status and sense bits, preventive maintenance, training, and
operation and maintenance documentation and records. Data processing
276
-------
and screening should take place soon after data acquisition to permit
system feedback in the form of corrective maintenance, changes to control
processes, and even changes to system design.
A control data set (or sets) should be created for use in software
verification. When software changes are made, the control data set is
processed and the output compared with that of previous versions. This is
analogous to the recalibration of gas analyzers after maintenance. The
control data set used in RAMS is one day's data from all sites.
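A minimal sketch of such a software verification check follows; the file
names and the line-by-line comparison are illustrative assumptions about
how the outputs of two program versions might be compared.

    # Compare the screening output of a new software version against the
    # archived output of the previous version on the fixed control data set.
    def compare_outputs(reference_path, candidate_path):
        """Print each line that differs between two processing runs."""
        with open(reference_path) as ref, open(candidate_path) as cand:
            for n, (old, new) in enumerate(zip(ref, cand), start=1):
                if old != new:
                    print("line %d differs:" % n)
                    print("  old: %s" % old.rstrip())
                    print("  new: %s" % new.rstrip())
        # (A separate length check would be needed if the two runs can
        # produce different numbers of output lines.)

    # Illustrative use after a software change; silence means the new
    # version reproduces the previous results on the control day:
    # compare_outputs("control_day_v6.out", "control_day_v7.out")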
The effectiveness of the data review process can be greatly enhanced
by the use of graphics. Review of graphical displays of raw data permits
a rapid continuity check of individual time series and a visual correlation
of network data. Graphics naturally augments automated data validation
techniques which are necessarily based on a priori knowledge of system
performance characteristics, expected magnitude and variations in recording
levels, etc.
A monitoring system is dynamic in nature - responding to changing
hardware/software requirements and to variations in operating and maintenance
procedures. On-site, near-real-time data review and allowance for
feedback in system design can minimize the amount of lost or marginally
acceptable data.
277
-------
References
(1) Jurgens, R.B. and R.C. Rhodes, 1976: Quality Assurance and Data Vali-
dation for the Regional Air Monitoring System of the St. Louis
Regional Air Pollution Study. Proc. of the Conference on Environmental
Modeling and Simulation, EPA 600/9-76-016, 730-735.
(2) Burton, C.S. and G. M. Hidy, 1974: Regional Air Pollution Study Program
Objectives and Plans, EPA 630/3-75-009, 53 pp.
(3) Browning, R.H., 1977: (RAPS) Description and Status of the Data Measure-
ments, Quality Assurance and Data Base Management System, (unpublished),
72 pp.
(4) Meyers, R.L. and J.A. Reagan, 1975: Regional Air Monitoring System at
St. Louis, Missouri, International Conference on Environmental Sensing
and Assessment, Paper 8-2, LofC #75-37494, 4 pp.
(5) Hern, D.H. and M.H. Taterka, 1977: Regional Air Monitoring System Flow
and Procedures Manual, EPA Contract DU 68-02-2093, 177 pp.
(6) Quality Assurance Handbook for Air Pollution Measurement Systems, 1976:
Volume I, Principles, EPA 600/9-76-005, 365 pp.
(7) Ott, W.R., 1974: Selection of Probability Models for Determining Quality
Control Data Screening Range Limits, Presented at 88th Meeting of the
Association of Official Analytical Chemists, Washington, D.C., 6 pp.
(8) Suta, B.E. and G.V. Lucha, 1975: A Statistical Approach for Quality
Assurance of STORET-Stored Parameters, SRI, EPA Control No. 68-01-
2940, 8 pp.
(9) Hartwell, T. and F. Smith, 1977: Study of Two Data Validation Procedures
for the RAPS Data Bank, RTI project 43U-1291-2, EPA Contract 68-02-2407,
46 pp.
278
-------
"I
ro I
a '
a
•—I
a
c
o
en
c
o
'+-J
03
o
o
273
-------
00
LU O
CO >— i
Z h-
oo
CM ro no in
i— r— r— CM
in
CM
in
CM
in
CM
in
CM
in
CM
in
CM
in
CM
in
CM
in CM
CM i—
CM
CM
OH —I
•=> et
oo >
01
o •—
C O I
•r- O t
o
ex:
cc
o
CO
CM
O
00
LU
O
X
o
Q
o:
oo
CM
X
o
LU
O
CM
00
eg
U-
I
=3
oo
OO
co x
o o
CD O
O LU i—i
OH Z OH
O O I—
>• M HH
x O z
o
oo
LU
O
X HH
o z
CM
O
O LU
i-i Q
X i—i
O X
1-1 O
O Z
LU
a:
5
C_J
LU
OO
O
CD
u
o
Q. t— •
UJ
t— •
O
00
O
CJ
LU
OH
O
-
Q.
-------
o
QC <
LU
H- 00
II
LU O_
QC
<
I-
<
a
oo
00
00
cc
<
CO
2
O
LU
o
o
-t >
< I-
0
LU <
O LU
x o o
320
5 < <
o
_J
2
O
p
<
K
oo
00
CO LU
2 h-
— CO
>-
CO
o
in
cc
03
•o
CO
I-
2
LU
^
cc
H-
GO
,
^
j
*•
*u.
O
0
0
Mi
J
o
QC
O
LU
t-
LU
^
•
^
2
O
H-
Q
<
QC
QC
_i
O
00
>
j-
_
<
»•
•N
J
t
3
C
3
QC
<
I
id cc
< 22
a _,
281
-------
CJ
CO h-
cc a
co
LU Jrt
CC
*"
1
LU
O
oo
00
CO
_ -
« CO
0
a. J3
a co
in -J LU
CM UJ K
LU CO
CO
LU C3
H- <
= CC
m|*l>
H- co — co" cc
u5£5i =
CC CO of L! 3-
" S rf ^
I- >
< 00 2
a Q o
CO
= < H
2 a co
111
P < o <
co a x DC
oc — u. LU
•M
'o
c
01
o
CO
•H1
nj
T3
CO
CO
O)
O)
LU
CO
o:c
282
-------
TABLE 2. SCREENING CATEGORIES FOR AUTOMATED RECORDING NETWORKS
I. OPERATIONAL
NO INSTRUMENT
MISSING MEASUREMENT
STATUS
CALIBRATION
II. CONTINUITY AND RELATIONAL
A. INTRA-STATION
CALIBRATION DRIFT
LOWER DETECTABLE LIMITS
GROSS LIMITS
AGGREGATE FREQUENCY DISTRIBUTIONS
RELATIONSHIP AMONG PARAMETERS
TEMPORAL CONTINUITY
CONSTANT OUTPUT
SUCCESSIVE DIFFERENCE
B. INTER-STATION
METEOROLOGICAL NETWORK UNIFORMITY
STATISTICAL OUTLIERS
DIXON RATIO
III. A POSTERIORI
REVIEW OF STATION LOG
UNUSUAL EVENTS OR CONDITIONS
VISUAL INSPECTION OF DATA
283
-------
TABLE 3. GROSS LIMITS AND RELATIONAL CHECKS

PARAMETER             INSTRUMENTAL LIMITS          INTERPARAMETER CONDITION
                      LOWER         UPPER
Ozone                 .005 ppm      5 ppm          NO x O3 ≤ 0.01
Nitric Oxide          .005 ppm      5 ppm          NO - NOX ≤ .002   (NO)
Oxides of Nitrogen    .005 ppm      5 ppm          NO - NOX ≤ .002   (NOX)
Carbon Monoxide       .1 ppm        50 ppm
Methane               .1 ppm        50 ppm         CH4 - THC ≤ .1    (CH4)
Total Hydrocarbons    .1 ppm        50 ppm         CH4 - THC ≤ .1    (THC)
Sulfur Dioxide        .005 ppm      1 ppm          SO2 - TS ≤ .002   (SO2)
Total Sulfur          .005 ppm      1 ppm          SO2 - TS ≤ .002   (TS)
Hydrogen Sulfide      .005 ppm      1 ppm          H2S - TS ≤ .002   (H2S)
Aerosol Scatter       0.00001 m-1   0.00099 m-1
Wind Speed            .27 m/s       22.2 m/s
Wind Direction        0°            360°
Temperature           -20°C         45°C
Dew Point             -30°C         45°C           DP - 1.0 ≤ T
Temperature Gradient  -5°C          5°C
Barometric Pressure   950 mb        1050 mb
Pyranometers          -0.50         2.50 Langleys/min
Pyrgeometers          0.30          0.75 Langleys/min
Pyrheliometers        -0.50         2.50 Langleys/min
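A minimal sketch of how the gross-limit and relational checks of Table 3
might be coded follows; the dictionary layout and function names are
illustrative assumptions, not the RAMS screening module itself.

    # Gross (instrumental) limits for a few gas parameters, in ppm.
    GROSS_LIMITS = {
        "O3":  (0.005, 5.0),
        "NO":  (0.005, 5.0),
        "NOX": (0.005, 5.0),
    }

    def passes_gross_limits(param, value):
        """True if the reading lies within its instrumental limits."""
        lower, upper = GROSS_LIMITS[param]
        return lower <= value <= upper

    def passes_no_nox(no, nox, tolerance=0.002):
        """Relational check: NO may not exceed NOX by more than tolerance."""
        return no - nox <= tolerance

    # Illustrative use on one minute's readings:
    ok = passes_gross_limits("NO", 0.042) and passes_no_nox(0.042, 0.051)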
284
-------
IRREGULAR INSTRUMENT RESPONSE
A) SINGLE OUTLIER     B) STEP FUNCTION
C) SPIKE              D) STUCK
E) MISSING            F) CALIBRATION
G) DRIFT
[Example traces not recoverable from OCR.]
Figure 4. Irregular instrument response.
285
-------
TABLE 4. MAXIMUM ALLOWABLE DEVIATIONS FROM NETWORK MEAN
UNDER MODERATE WINDS (NETWORK MEAN > 4 m/sec)
WIND SPEED 2 m/sec OR mean/3
(WHICHEVER IS LARGER)
WIND DIRECTION 30°
TEMPERATURE 3°C
TEMPERATURE DIFFERENCE 0.5°C
DEW POINT 3°C
ADJUSTED PRESSURE 5.0 millibars
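A minimal sketch of how the Table 4 network-uniformity test might be
applied to wind speed follows; the function name and the sample readings
are illustrative assumptions.

    def windspeed_outliers(speeds):
        """Flag stations deviating too far from the network mean wind speed.

        Per Table 4, the test applies under moderate winds (network mean
        > 4 m/sec); the allowable deviation is 2 m/sec or mean/3,
        whichever is larger.
        """
        mean = sum(speeds) / len(speeds)
        if mean <= 4.0:
            return []                 # test defined only for moderate winds
        allowed = max(2.0, mean / 3.0)
        return [i for i, s in enumerate(speeds) if abs(s - mean) > allowed]

    # Illustrative use: the network mean is 6.8 m/sec, so deviations beyond
    # max(2.0, 2.27) = 2.27 m/sec are flagged; station 3 (9.9 m/sec) fails.
    flags = windspeed_outliers([6.1, 5.8, 6.3, 9.9, 5.9])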
TABLE 5. RAMS DATA VALIDATION VERSION 6.4

CAUSE                               FLAG
1.  MISSING DATA                    10^37
2.  CALIBRATION DATA                10^35
3.  EXCESS DRIFT                    -VALUE
4.  FAILED RANGE TESTS              10^34
5.  LDL CHECKS                      1/2 LDL
6.  STATUS ERROR                    VALUE x 10^-25
7.  FAILED RELATIONAL TESTS         VALUE x 10^32
8.  FAILED TIME CONSTANT TESTS      VALUE x 10^24
9.  FAILED NETWORK TESTS            Q.A. REVIEW
10. DATA MANAGEMENT OVERRIDE        VALUE x 10^-15
286
-------
[Figure 5. Flow diagram of the order in which the RAMS data screening
tests are applied; graphic not recoverable from OCR.]
-------
[Table 6. Volume of data from the RAMS network, with the number of
minute and hour plots and the number of microfiche (24x) required for
plotting all RAMS data; contents not recoverable from OCR.]
-------
HOUR AVERAGE PLOTS
  → IDENTIFICATION OF SUSPECT DATA
  → REVIEW PLOTS OF MINUTE DATA
  → INITIAL ANALYSIS OF QUESTIONABLE DATA
  → VERIFICATION
  → OUTPUT: FILE OF TIME PERIODS AND CHANGES
Figure 6. Visual review of RAMS hour average data.
289
-------
[Figure 7. RAMS hour average data plotted by station; plots not
recoverable from OCR (surviving axis label: STATIONS).]
-------
[Figure 8. Distribution function of minute successive differences of
ozone concentration; plot not recoverable from OCR.]
291
-------
[Figure 9. Minute temperature data with a succession of outliers; plot
not recoverable from OCR (surviving axis label: TEMPERATURE (°C)).]
292
-------
HOUR ARCHIVE (VERSION 6.4) and MINUTE ARCHIVE (VERSION 6.4)
    → UPDATE PROGRAM → HOUR ARCHIVE (VERSION 7.0)
UPDATE FILE inputs: VISUAL VALIDATION STUDY; SUCCESSIVE DIFFERENCES;
WIND SPEED SPIKES; CO SPIKES (AFTER CALIB); NEGATIVE POLLUTANTS; LDL
Figure 10. RAMS update.
293
-------
[Figure 11. Generalized monitoring system with quality control and data
screening; diagram not recoverable from OCR.]
294
-------
NAMES AND ADDRESSES:
PROGRAM
EPA/RTP INTERLABORATORY QUALITY ASSURANCE COORDINATING COMMITTEE
DATA VALIDATION CONFERENCE, SPEAKERS
DATA VALIDATION CONFERENCE, ATTENDEES
295
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
The program that was distributed before the meeting is presented first.
Following it is the alternate schedule that was used on the day of the meeting.
296
-------
CONFERENCE
ON
DATA VALIDATION
Environmental Research Center Auditorium
Highway 54 and Alexander Drive
Research Triangle Park, North Carolina
November 4, 1977
Sponsored by
ERC/RTP Interlaboratory Quality
Assurance Coordinating Committee
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Research and Development
Research Triangle Park, North Carolina
297
-------
PROGRAM
8:00
Registration
8:25
Welcome
Dr. J.K. Burchard
Senior ORD Official, RTP
8:30
Opening Remarks
S. Hochheiser
EMSL
GENERAL SESSION
D.J. von Lehmden
EMSL
Moderator
8:35
What is Data Validation?
R.C. Rhodes
EMSL
8:45 Validation Procedures Applied to
In-Use Motor Vehicle Emission Data
M.E. Williams
EPA, Ann Arbor
9:15 Computer Graphics in Data
Validation
Dr. R.H. Allen
COMP-AID
9:45
COFFEE BREAK
10:00 Engineering Computations and Data
Collection Formats Useful in
Data Validation
A.C. Nelson, Jr.
PEDCo
10:30 Regional Validation of State and
Local Air Pollution Data
T.H. Rose
EPA, Region IV
11:00 Use of Precision and Accuracy
Estimates for Validation of Data
Dr. D.T. Mage
HERL
11:30 - 12:30 LUNCH
298
-------
[Program listing (early afternoon sessions); not recoverable from OCR.]
299
-------
AUDITORIUM
Special Environmental
Monitoring Studies
Dr. M.M. Bufalini
ESRL
Moderator
2:50 Data Validation for the Los Angeles
Catalyst Study (LACS)
C.E. Rodes
EMSL
3:25 Validation Techniques Used in
Continuous Air Monitoring
Network (CHAMP)
Dr. M.B. Hertz
HERL
3:50 Validation System Used in the
St. Louis Regional Air Monitoring
Study (RAMS)
R.B. Jurgens
ESRL
4:15 Closing Comments
S. Hochheiser
EMSL
300
-------
CONFERENCE ON DATA VALIDATION
Research Triangle Park, North Carolina
November 4, 1977
(Alternate Schedule)
AUDITORIUM
8:00 Registration
8:25 Welcome
     Dr. J.K. Burchard, Senior ORD Official, RTP
8:30 Opening Remarks
     Seymour Hochheiser, EMSL
8:35 What is Data Validation?
     R.C. Rhodes, EMSL
301
-------
[Pages 302-303: remainder of the alternate schedule (session times,
titles, and speakers); not recoverable from OCR.]
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
Members of the EPA/RTP* Interlaboratory Quality Assurance
Coordinating Committee:
Mr. Seymour Hochheiser, Chairman
Assistant to the Director
EMSL
MD-75
RTP, NC 27711
Telephone: (919) 541-2106
FTS: 629-2106
Mr. Raymond C. Rhodes, Secretary
Quality Assurance Specialist
STAB/EMSL
MD-75
RTP, NC 27711
Telephone: (919) 541-2293
FTS: 629-2293
Mr. Ferris B. Benson
Quality Assurance Coordinator
HERL
MD-52
RTP, NC 27711
Telephone: (919) 541-2545
FTS: 629-2545
Dr. Marijon M. Bufalini
TPRO/ESRL
MD-59
RTP, NC 27711
Telephone: (919) 541-2949
FTS: 629-2949
Mr. William B. Kuykendal
Mechanical Engineer
IERL
MD-62
RTP, NC 27711
Telephone: (919) 541-2557
FTS: 629-2557
Mr. Darryl von Lehmden
Chemical Engineer
QAB/EMSL
MD-77
RTP, NC 27711
Telephone: (919) 541-2415
FTS: 629-2415
*Acronyms, arranged alphabetically, used in this and the subsequent two
sections:
EMSL - Environmental Monitoring and Support Laboratory
EPA - Environmental Protection Agency
ESRL - Environmental Sciences Research Laboratory
HERL - Health Effects Research Laboratory
IERL - Industrial Environmental Research Laboratory
MD - Mail Drop
NC - North Carolina
QAB - Quality Assurance Branch
RTP - Research Triangle Park
STAB - Statistical and Technical Analysis Branch
TPRO - Technical Planning and Review Office
305
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
List of Speakers
Dr. Rod Allen
COMP-AID, Inc.
Box 12327
RTP, NC* 27709
Telephone: (919) 967-6376
Ms. Carolyn P. Chamblee
EPA/HERL
MD-55
RTP, NC 27711
Telephone: (919) 541-2348
FTS: 629-2348
Mr. Larry Claxton
EPA/HERL
MD-68
RTP, NC
Telephone: (919) 541-2518
FTS: 629-2518
Dr. Harold Crutcher
Consultant
35 Westall Ave.
Asheville, NC 28804
Telephone: (919) 253-2539
FTS: 672-0961
Dr. Thomas Curran
EPA/OAQPS
MD-14
RTP, NC 27711
Telephone: (919) 541-5351
FTS: 629-5351
Dr. Tyler Hartwell
RTI
Box 12194
RTP, NC 27709
Telephone: (919) 541-6453
Dr. Marvin Hertz
EPA/HERL
MD-56
RTP, NC 27711
Telephone: (919) 541-3124
FTS: 629-3124
Mr. William F. Hunt
EPA/OAQPS
MD-14
RTP, NC 27711
Telephone: (919) 541-5351
FTS: 629-5351
Mr. Robert B. Jurgens
EPA/ESRL
MD-80
RTP, NC 27711
Telephone: (919) 541-4545
FTS: 629-4545
Mr. William E. Klint
NOAA
Federal Building
Asheville, NC 28801
Telephone: (704) 258-2850, ext. 755
FTS: 672-0755
Dr. David T. Mage
EPA/HERL
MD-56
RTP, NC 27711
Telephone: (919) 541-3121
FTS: 629-3121
306
-------
Mr. Joseph E. McCarley, Jr.
EPA/ESED
MD-13
RTP, NC 27711
Telephone: (919) 541-5245
FTS: 629-5245
Mr. A. Carl Nelson
PEDCo
Suite 201
5055 Duke Street
Durham, NC 27701
Telephone: (919) 688-6338
Ms. Joan Novak
EPA/ESRL
MD-80
RTP, NC 27711
Telephone: (919) 541-4545
FTS: 629-4545
Mr. C. Don Paulsell
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Telephone: (313) 668-4342
FTS: 374-8342
Mr. Charles E. Rodes
EPA/EMSL
MD-76
RTP, NC 27711
Telephone: (919) 541-3076
FTS: 629-3076
Mr. Thomas H. Rose
EPA/SB
College Station Road
Athens, GA 30605
Telephone: (404) 546-3489
FTS: 250-3489
Ms. Marcia Williams
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Telephone: (313) 688-4342
FTS: 374-8323
See previous section for definition of acronyms. Acronyms not used pre-
viously are defined as follows:
NOAA - National Oceanic and Atmospheric Administration
OAQPS - Office of Air Quality Planning and Standards
RTI - Research Triangle Institute
GA - Georgia
MI - Michigan
307
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
List of Attendees
Gerald G. Akland
EPA/EMSL/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2346
FTS: 629-2346
Rod Allen
COMP-AID
P.O. Box 12327
RTP, NC 27709
Tel: (919) 967-6376
Joseph S. All
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2240
FTS: 629-2240
J. Anderson
Rockwell International
5529 Chapel Hill Blvd.
Durham, NC 27707
Tel: (919) 942-2407
D. W. Armentrout
PEDco
1499 Chester Road
Cincinnati, OH 45246
Tel: (513) 782-4700
James D. Ashworth
U.S. Army Corps of Engineers
P.O. Box 2127
Huntington, WV 25721
Tel: (FTS) 924-5694
Andy Berlin
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
John Boston
EPA/SDMO
MD-55
RTP, NC 27711
Tel: (919) 541-2337
Frank Briden
EPA/IERL
MD-60
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
T. G. Brna
EPA/IERL
MD-61
RTP, NC 27711
Tel: (919) 541-2915
FTS: 629-2915
Steve Bromberg
EPA/QAB
MD-77
RTP, NC 27711
Tel: (919) 541-2273
FTS: 629-2273
308
-------
Robert Browning
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4545
FTS: 629-4545
Sam Bryan
EPA
Chapel Hill, NC 27514
Tel: (919) 541-2872
FTS: 629-2872
Marijon M. Bufalini
EPA/ESRL
MD-59
RTP, NC 27711
Tel: (919) 541-2949
FTS: 629-2949
Bob Burton
EPA/HERL
MD-52
RTP, NC 27711
Tel: (919) 541-1394
FTS: 629-1394
D. Calafiore
EPA/HERL
MD-54
RTP, NC 27711
Tel: (919) 541-2674
FTS: 629-2674
Don Carpenter
EPA
Ann Arbor, MI 48105
Tel: (FTS) 374-4293
Tom Caldwell
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Susan S. Casada
Northrop Services, Inc.
P.O. Box 12313
RTP, NC 27709
Tel: (919) 549-0611
Carolyn Chamblee
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2518
FTS: 629-2518
Ronald Chambler
NCHS - DPB
Box 12214
RTP, NC 27709
Tel: (919) 541-4422
FTS: 629-4422
John Chavy
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Larry Claxton
EPA/HERL
MD-68
RTP, NC 27711
Tel: (919) 541-2518
FTS: 629-2518
John Clements
EPA/EMSL
MD-77
RTP, NC 27711
Tel: (919) 541-2196
FTS: 629-2196
Wayne Clements
TVA
345 Evans Bldg.
Knoxville, TN 37902
Tel: (615) 632-4579
William M. Cox
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5312
FTS: 629-5312
309
-------
C. L. Cox, Jr.
EPA/ADM
MD-30
RTP, NC 27711
Tel: (919) 541-2296
FTS: 629-2296
Tom Curran
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5351
FTS: 629-5351
Bob Currin
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Harold Crutcher
35 Westall
Asheville, NC 28801
Tel: (919) 253-2539
Robin Davis
EPA/CH/HERL
MD-73
RTP, NC 27711
Tel: (919) 541-2872
FTS: 629-2872
Davis Davis
P.O. Box 12313
RTP, NC 27711
Tel: (919) 549-2333
Robert Denny
EPA/QAB
MD-77
RTP, NC 27711
Tel: (919) 541-2785
FTS: 629-2785
O. L. Dowler
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3126
FTS: 629-3126
Ronald Drago
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5486
FTS: 629-5486
Cary Eaton
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6920
Foy W. Edwards
TVA
345 EB
Knoxville, TN 37902
Tel: (615) 632-2071
Susan B. Edwards
NRCD - Air Quality
P.O. Box 27687
Raleigh, NC 27611
Tel: (919) 733-5125
Gardner Evans
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2292
FTS: 629-2292
Gary Evans
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2294
FTS: 629-2294
B. E. Edmonds
EPA/EMSL
MD-76
RTP, NC 27711
Donald H. Fair
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2732
FTS: 629-2732
310
-------
Bob Faoro
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5351
FTS: 629-5351
Paul Feder
NIEHS - EBB
P.O. Box 12237
RTP, NC 27709
Tel: (919) 541-5402
FTS: 629-5402
H. L. Fisher
EPA/HERL
MD-74
RTP, NC 27711
Tel: (919) 541-2631
FTS: 629-2631
R. Fisher
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4551
FTS: 629-4551
Nancy Gaskins
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6915
Gerald Gipson
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5486
FTS: 629-5486
Maurice E. Graves
Northrop Services, Inc.
P.O. Box 12313
RTP, NC 27709
Tel: (919) 549-0411
D. Glover
Rockwell International
5529 Chapel Hill Blvd.
Durham, NC 27707
Tel: (919) 942-2407
Bonnee Gryder
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Ed Hanks
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5474
FTS: 629-5474
F. Hageman
Xonics, Inc
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Martin Hamilton
NIEHS - EBB
P.O. Box 12237
RTP, NC 27709
Tel: (919) 541-5402
Tyler Hartwell
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6453
Tom Heiderscheit
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2468
FTS: 629-2468
Marvin Hertz
EPA/HERL, MD-56
RTP, NC 27711
Tel: (919) 541-3124
FTS: 629-3124
311
-------
David O. Hinton
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3126
FTS: 629-3126
Seymour Hochheiser
EPA/EMSL
MD-75
RTP, NC 27711
Tel: (919) 541-2106
FTS: 629-2106
William F. Hunt
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5351
FTS: 629-5351
R. C. Jordan
Northrop Services, Inc.
P.O. Box 12313
RTP, NC 27709
Tel: (919) 541-2766
Robert B. Jurgens
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4545
FTS: 629-4545
Robert Jungers
EPA/ESRL
MD-78
RTP, NC 27711
Tel: (919) 541-2456
FTS: 629-2456
William E. Klint
NOAA
Federal Building
Asheville, NC 28801
Tel: (704) 258-2850
FTS: 672-0755
William B. Kuykendal
EPA/IERL
MD-62
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
Ralph I. Larsen
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4565
FTS: 629-4565
William D. Lee
EPA/QAB
MD-75
RTP, NC 27711
Tel: (919) 541-2293
FTS: 629-2293
Robert E. Lee
EPA/HERL
MD-51
RTP, NC 27711
Tel: (919) 541-2283
FTS: 629-2283
Barry Levene
EPA - Region VIII
1860 Lincoln Street
Denver, CO 80203
Tel: (303) 837-2226
FTS: 327-2226
Dan Litton
EPA/HERL
MD-73
RTP, NC 27711
Tel: (919) 541-2873
FTS: 629-2873
Raymond Michie, Sr.
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6492
312
-------
Randell Morgan
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Gerald K. Moss
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5335
FTS: 629-5335
George C. Murray, Jr.
NCAQ
P.O. Box 27687
Raleigh, NC 27611
Tel: (919) 733-5125
J. E. McCarley, Jr.
EPA/ESED
MD-13
RTP, NC 27711
Tel: (919) 541-5243
FTS: 629-5243
Linda J. McDay
TVA
345-EB
Knoxville, TN 37902
Tel: (615) 632-2071
John S. Nader
EPA/ESRL
MD-46
RTP, NC 27711
Tel: (919) 541-0385
A. Carl Nelson
PEDco
5055 Duke Street
Durham, NC 27701
Tel: (919) 688-6338
William C. Nelson
EPA/HERL
MD-53
RTP, NC 27711
Tel: (919) 541-2330
FTS: 629-2330
W. Norris
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Joan Novak
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4545
FTS: 629-4545
Barbara Nye
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3125
FTS: 629-3125
Blaine F. Parr
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3123
FTS: 629-3123
C. Don Paulsell
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Tel: (313) 668-4342
FTS: 374-8342
Debora R. Pizer
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3124
FTS: 629-3124
Francis Pooler
EPA/ESRL
MD-59
RTP, NC 27711
Tel: (919) 541-2857
FTS: 629-2857
313
-------
James Reagan
EPA/ESRL
MD-59
RTP, NC 27711
Tel: (919) 541-4486
FTS: 629-4486
Joan Reece
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2466
FTS: 629-2466
Raymond C. Rhodes
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2293
FTS: 629-2293
Wilson Riggan
EPA/HERL
MD-54
RTP, NC 27711
Tel: (919) 541-2674
FTS: 629-2674
Charles D. Robson
EPA/HERL
MD-67
RTP, NC 27711
Tel: (919) 541-2625
FTS: 629-2625
Charles E. Rodes
EPA/EMSL
MD-76
RTP, NC 27711
Tel: (919) 541-3076
FTS: 629-3076
Tom Rose
EPA - Region IV
College Station Road
Athens, GA 30601
Tel: (404) 546-3111
Glenn Ross
NCAQ
P.O. Box 27687
Raleigh, NC 27611
Tel: (919) 549-8941
Bill Sensing
EPA/IERL
MD-62
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
Frank D. Slaveter
EPA
401 M Street, S.W.
EN 340
Washington, DC 20460
Tel: (202) 755-1572
Ben Smith
EPA/IERL
MD-62
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
Paul E. Smith
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 549-8941
Ralph Sullivan
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 549-8941
Jake Summers
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5395
FTS: 629-5395
Jose Sune
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3127
FTS: 629-3127
314
-------
Richard Symonds
Catalytic, Inc.
P.O. Box 240232
Charlotte, NC 28224
Tel: (704) 542-4107
Charles Tate
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
C. E. Tatsch
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-5945
Lawrence E. Truppi
EPA/HERL
MD-54
RTP, NC 27711
Tel: (919) 541-2861
FTS: 629-2861
John Van Bruggen
EPA/HERL
MD 55
RTP, NC 27711
Tel: (919) 541-2465
FTS: 629-2465
Darryl von Lehmden
EPA/QAB
MD-77
RTP, NC 27711
Tel: (919) 541-2415
FTS: 629-2415
Betty Wagman
EPA/EMSL
MD-56
RTP, NC 27711
Tel: (919) 541-3125
FTS: 629-3125
Kim Wattenbarger
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
J. E. Whitney
EPA/WA
RD-680
401 M Street, S.W.
Washington, DC 20460
Tel: (202) 426-4477
Cindy Wingarden
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Mack Milkins
EPA/EMSL
MD-45
RTP, NC 27711
Tel: (919) 541-3119
FTS: 629-3119
Marcia Williams
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Tel: (313) 688-4342
FTS: 374-8323
Max Woodbury
Rockwell International
5529 Chapel Hill Blvd.
Durham, NC 27707
Tel: (919) 493-2471
Chris Woodbury
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
315
-------
TECHNICAL REPORT DATA
(Please read instructions on the reverse before completing)
1. REPORT NO.: EPA-600/9-79-042
2.
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE: DATA VALIDATION CONFERENCE, Proceedings
5. REPORT DATE: September 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S): Raymond C. Rhodes and Seymour Hochheiser, Editors
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Office of Research and Development
   Environmental Monitoring and Support Laboratory
   Research Triangle Park, N.C. 27711
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
13. TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE: EPA 600/08
15. SUPPLEMENTARY NOTES
16. ABSTRACT
    The proceedings document technical presentations made at a 1-day
    conference on data validation for environmental data. The conference
    was hosted and sponsored by the U.S. Environmental Protection Agency,
    Research Triangle Park Interlaboratory Quality Assurance Coordinating
    Committee, on November 4, 1977, at Research Triangle Park. Various
    approaches and techniques used for data validation are presented.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS: Data Validation; Data Screening; Data Editing;
       Quality Assurance; Outliers; Statistics; Environmental Data
    b. IDENTIFIERS/OPEN ENDED TERMS: Environmental monitoring; Data
       management
    c. COSATI Field/Group: 43F; 68A
18. DISTRIBUTION STATEMENT: Release to public
19. SECURITY CLASS (This Report): Unclassified
20. SECURITY CLASS (This page): Unclassified
21. NO. OF PAGES: 315
22. PRICE
EPA Form 2220-1 (9-73)
316
------- |