United States
Environmental Protection
Agency
Environmental Monitoring
Systems Laboratory
Research Triangle Park, NC 27711
EPA-600/9-79-042
September 1979
Research and Development
Data Validation
Conference Proceedings
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology. Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields.
The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the MISCELLANEOUS REPORTS series. This
series is reserved for reports whose content does not fit into one of the other specific
series. Conference proceedings, annual reports, and bibliographies are examples
of miscellaneous reports.
EPA REVIEW NOTICE
This report has been reviewed by the U.S. Environmental Protection Agency, and
approved for publication. Approval does not signify that the contents necessarily
reflect the views and policy of the Agency, nor does mention of trade names or
commercial products constitute endorsement or recommendation for use.
This document is available to the public through the National Technical Information
Service, Springfield, Virginia 22161.
-------
DATA VALIDATION CONFERENCE
Proceedings
Hosted and Sponsored by
The U.S. Environmental Protection Agency
RTP Interlaboratory Quality Assurance Coordinating Committee
November 4, 1977
Edited by
Raymond C. Rhodes
and
Seymour Hochheiser
Office of Research and Development
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
DISCLAIMER
This report is a compilation of the papers presented at the Data
Validation Conference. Each individual paper may not have received peer
technical review. Technical review and clearance of these proceedings
was based primarily on the review of the executive summary and on the
general merits of the proceedings as a total entity.
The contents of these proceedings do not necessarily reflect the
views and policies of the U.S. Environmental Protection Agency, nor does
mention of trade names or commercial products constitute endorsement or
recommendation for use.
ii
-------
FOREWORD
Measurement and monitoring research efforts are designed to anticipate
potential environmental problems, to support regulatory actions by developing
an in-depth understanding of the nature and processes that impact health and
the ecology, to provide innovative means of monitoring compliance with regu-
lations and to evaluate the effectiveness of health and environmental pro-
tection efforts through the monitoring of long-term trends. The Environmental
Monitoring Systems Laboratory, Research Triangle Park, North Carolina, has
the responsibility for: assessment of environmental monitoring technology
and systems; implementation of agency-wide quality assurance programs for
air pollution measurement systems; and supplying technical support to other
groups in the Agency including the Office of Air, Noise and Radiation, the
Office of Toxic Substances and the Office of Enforcement.
Data validation, an element of quality assurance, is necessary to provide
accurate and reliable environmental data. Data of known and acceptable
quality are needed for measuring compliance with regulations, assessing
health effects, and developing optimum strategies to cope with environmental
pollution situations. A unified treatment of validation of particular types
of data bases is needed to support broad-scale uses of these data. Current
in-use data validation procedures were presented at the conference to promote
a better understanding of available techniques. Hopefully, the conference
and these proceedings will provide an impetus toward the development of more
unified and systematic approaches to data validation.
Thomas R. Hauser, Ph.D.
Director
Environmental Monitoring Systems Laboratory
Research Triangle Park, North Carolina
-------
ABSTRACT
These proceedings are a record for future reference of the technical
presentations made at a conference on Data Validation for environmental
data. The conference was hosted and sponsored by the U. S. Environmental
Protection Agency, Research Triangle Park Interlaboratory Quality Assurance
Coordinating Committee on November 4, 1977, at the Research Triangle Park.
Various data validation approaches and techniques were presented and are
documented in this publication.
iv
-------
CONTENTS
FOREWORD iii
ACKNOWLEDGEMENTS vii
INTRODUCTION 1
EXECUTIVE SUMMARY AND RECOMMENDATIONS 3
Seymour Hochheiser
Raymond C. Rhodes
WHAT IS DATA VALIDATION? 7
Raymond C. Rhodes
THE SHEWHART CONTROL CHART TEST FOR SCREENING
24-HOUR AIR POLLUTION MEASUREMENTS 17
William F. Hunt
DISTRIBUTION GAP TEST FOR HOURLY AIR POLLUTION
DATA 25
Thomas C. Curran
USE OF STATISTICAL SAMPLING IN VALIDATING
HEALTH EFFECTS DATA 31
Carolyn P. Chamblee
USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
RATIO TEST FOR DATA VALIDATION 39
Tyler Hartwell
CLUSTER ANALYSIS AS A DATA VALIDATION
TECHNIQUE 71
Harold L. Crutcher
ENGINEERING COMPUTATIONS AND DATA COLLECTION
FORMATS USEFUL IN DATA VALIDATION 81
A. Carl Nelson, Jr.
VALIDATION PROCEDURES APPLIED TO IN-USE MOTOR
VEHICLE EMISSION DATA 99
Marcia E. Williams
-------
DATA VALIDATION TECHNIQUES USED IN MOBILE
SOURCE TESTING 125
C. Don Paulsell
VALIDATION OF CONTINUOUS STACK MONITORING DATA 131
Joseph E. McCarley
SCREENING CHECKS USED BY THE NATIONAL CLIMATIC
CENTER 135
William E. Klint
DATA VALIDATION FOR UPPER AIR SOUNDING DATA
AND EMISSION INVENTORY DATA 199
J. H. Novak
VALIDATION OF BIOMEDICAL DATA THROUGH AN ON-LINE
COMPUTER SYSTEM 209
Larry D. Claxton
REGIONAL VALIDATION OF STATE AND LOCAL AIR
POLLUTION DATA 219
Thomas H. Rose
DATA VALIDATION FOR THE LOS ANGELES CATALYST
STUDY (LACS) 223
Charles E. Rodes
VALIDATION TECHNIQUES USED IN CONTINUOUS AIR
MONITORING 237
Marvin B. Hertz
USE OF PRECISION AND ACCURACY ESTIMATES FOR
VALIDATION OF DATA 247
David T. Mage
VALIDATION SYSTEM USED IN THE ST. LOUIS REGIONAL
AIR MONITORING STUDY (RAMS) 265
Robert B. Jurgens
NAMES AND ADDRESSES:
PROGRAM 295
EPA/RTP INTERLABORATORY QUALITY ASSURANCE
COORDINATING COMMITTEE 305
DATA VALIDATION CONFERENCE, SPEAKERS 306
DATA VALIDATION CONFERENCE, ATTENDEES 308
-------
ACKNOWLEDGMENTS
The cooperation of all participants in the conference is gratefully
acknowledged. Particular appreciation is due to the participants who
prepared written copy of their presentations for post-documentation of
the conference.
vii
-------
SECTION I
INTRODUCTION
A conference on data validation was held on November 4, 1977, at
Research Triangle Park, North Carolina. The conference was sponsored,
organized, and hosted by the Environmental Research Center - Research
Triangle Park (ERC-RTP) Interlaboratory Coordinating Committee. Con-
ference participants represented (a) EPA's RTP Research Laboratories and
Program Offices, (b) an EPA Regional Office, (c) EPA Contractors, and (d)
the National Climatic Center, Asheville, North Carolina.
Welcoming remarks were made by Dr. John K. Burchard, Director of the
Industrial Environmental Research Laboratory, and Senior ORD official at
RTP. Each of the speakers presented their current practices for data vali-
dation. The conference provided an opportunity for a free exchange of
viewpoints and techniques and was intended to enhance the state-of-the-art
of Data Validation.
-------
EXECUTIVE SUMMARY AND RECOMMENDATIONS
by
Seymour Hochheiser
and
Raymond C. Rhodes
Environmental Monitoring and Support Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
SECTION II
EXECUTIVE SUMMARY AND RECOMMENDATIONS
The nature and scope of data validation activities vary considerably
among those involved with the collection, analysis, review, and use of
environmental data for research and monitoring purposes. The objective of
the conference was to review and discuss the various current practices of
data validation, and to provide a forum for the exchange of information.
It was intended that as a result of this conference, the function of data
validation might become more specifically defined and more uniformly and
widely implemented.
Following is a general review of the contents of the papers presented
at the conference.
DEFINITIONS AND SCOPE OF DATA VALIDATION
The conference authors implied differing definitions of data validation
and used various words relating to data validation activities. The scope of
activities extended from that of simple checks for data transfer errors to
that of a total quality assurance program. Words used in relation to data
validation activities included: editing, screening, verification checking,
auditing, and qualification.
TYPES OF DATA INVOLVED
The type of data involved in most of the papers was ambient air
pollution concentrations. However, other types of data -- including epidemi-
ological, meteorological, stationary source emission, mobile source emission,
and in vitro and in vivo bioassays -- were discussed.
Data were validated in a variety of forms, including strip charts, hand-
written forms, computer printouts, magnetic tape, and optical sensing records.
In some cases the activities of data validation, as indicated by the
authors, were performed by those producing the data; in other cases, by
independent reviewers; and in still other cases, by the users.
SIZE OF DATA BLOCKS CONSIDERED
In most cases, the data being validated were reviewed in definable
blocks, varying from one day to one year. In one case, a real-time
-------
computerized system was used, and in another, the results of single tests
were reviewed individually. The number of data values considered as a
block, or group of data, varied from only three results for stack sampling
tests to over 30,000 for pibals (meteorological balloons).
TYPES OF TECHNIQUES EMPLOYED
Both manual and computer techniques were used, usually depending upon
the amount of data involved. Some systems employed both manual and computer
methods. Only a few of the papers included graphical techniques for reviewing
data. In less than half of the systems described, specific checks were
made of the identification (or coding) of the data. In about half of the
data validation systems, statistical techniques were used. Some of the
techniques included were Dixon outlier tests, Shewhart control chart limits,
exponential distributions, and asymptotic singular decomposition. In several
instances, statistical sampling plans were utilized to select specific data
sets for checking.
WRITTEN DATA VALIDATION PROCEDURES
In only one case were detailed procedures written describing the data
validation activities and criteria.
FLAGGING AND REJECTION OF DATA
In most cases the questionable results of the data validation process
were flagged, i.e., identified for more detailed evaluation and/or identified
as questionable values in the data records. In most cases, as a result of
data validation, questionable data were invalidated (rejected) or were
corrected as a result of further investigation.
RECOMMENDATIONS
Although no official conference recommendations were made, the following
recommendations were generally expressed:
1. The functions and scope of data validation should be more specifically
defined.
2. Data validation techniques should be presented and summarized in some
logical manner.
3. Data validation systems should be recommended or specified for use in
certain situations.
The above recommendations could be pursued in several possible ways.
One would be for a task group to be formed to develop standardized nomencla-
ture, to summarize in a systematic way various activities and techniques of
data validation, and to recommend data validation systems for specific
-------
situations. The above tasks could also be performed by a knowledgeable
contractor.
SUMMARY
It is evident that the current practices of data validation vary
widely in nature and scope. The conference provided an excellent opportunity
for an open exchange of information concerning data validation practices
and should result in a broader utilization of data validation techniques.
In addition, the conference discussions and these proceedings should promote
a greater awareness of the need to develop a more organized and unified
approach to this important element of quality assurance for environmental
data.
-------
WHAT IS DATA VALIDATION?
by
Raymond C. Rhodes
Environmental Monitoring and Support Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
-------
WHAT IS DATA VALIDATION?
R.C. Rhodes
Just what is data validation?
Many of us are involved in activities which, we feel, constitute data
validation, or at least, a part of a data validation process. My first
encounter with the term "data validation" in connection with air pollution
monitoring data occurred about five years ago. Since then my concepts of
the function and scope of data validation have expanded considerably, and
in fact they are still changing.
I'm sure that each person attending this conference has his or her own
concept—probably different from anyone else's—of data validation. Whatever
these concepts are, we're here to exchange our ideas, thoughts, and techni-
ques on the subject. I feel sure that each of us will learn something new
and useful for our own particular area of application.
Before we hear the other speakers, let's think a little bit about this
subject of "data validation." Webster defines "validation" as follows:
VALIDATION
— THE ACT OR PROCESS OF VALIDATING*
That doesn't help us very much, does it? So we might look at the definition
of the word "valid" itself.
VALID
-- HAVING LEGAL EFFICACY OR FORCE
— SUPPORTED BY OBJECTIVE TRUTH
*The capitalized items in this paper were used as visual aids for the
presentation.
8
-------
This definition is getting a little closer to our desired meaning in the
data validation sense. The word "valid" does have some connotation of a
"stamp of approval," indicating that things are "right."
In the "Quality Assurance Handbook for Air Pollution Measurement
Systems," EPA 600/9-76-005, the following definition is given:
DATA VALIDATION
-- THE PROCESS WHEREBY DATA ARE FILTERED AND ACCEPTED OR
REJECTED BASED ON A SET OF CRITERIA
There is a short section on "data validation" in the Handbook, which you
may be interested in reading. My own definition, which I use in the "Data
Validation" lecture of Air Pollution Training Institute (APTI) Course 470,
"Quality Assurance for Air Pollution Measurement Systems," is somewhat more
detailed:
DATA VALIDATION
— A SYSTEMATIC PROCEDURE OF REVIEWING A BODY OF DATA
AGAINST A SET OF CRITERIA TO PROVIDE ASSURANCE OF
ITS VALIDITY PRIOR TO ITS INTENDED USE
The above definition says, in other words, that "a body of data" is reviewed
according to some previously defined plan in a rather comprehensive and
extensive way using all available expertise and knowledge at hand to assure
that the data are technically consistent, correctly identified, and contain
no obvious errors before the data are used.
Following are a number of terms which seem to involve functions or
activities related to data validation.
RELATED TERMS
— DATA EDITING
— DATA SCREENING
-- DATA AUDITING
-- DATA VERIFICATION
-- DATA EVALUATION
— DATA QUALIFICATION
-- DATA QUALITY ASSESSMENT
9
-------
During the remaining presentations today, you will hear further references
or usages of some of these terms. Since some of these terms are used inter-
changeably, I believe we need more specific definitions for each of the
above to better understand how each one is, or is not, involved in data
validation. As I define data validation and the above terms, I would
include data editing, data screening, data auditing and data verification
as part of data validation. However, according to my definitions, data
evaluation, data qualification and data assessment are not parts of the data
validation process.
Before considering some of the aspects of data validation, let us
consider the obvious need for data validation and its relation to quality
assurance. EPA and other organizations need good data from which to make
good decisions. This truism applies equally well to research studies as well
as to monitoring programs although data validation is not usually considered
as a separate activity in research efforts.
GOOD DATA > GOOD DECISIONS
-- RESEARCH STUDIES
— MONITORING PROGRAMS
Since data validation is concerned with an assurance of having obtained good
data, one might think that data validation includes everything that is done
to get valid, or good, data. But that is the concern of quality assurance.
Whereas quality assurance is concerned with all activities which may affect
data quality, the activities of data validation involve an after-the-fact
review of the data, along with related information, to assure that valid
data have, in fact, been obtained. As such, data validation is considered
as only one element of quality assurance.
DATA VALIDATION
IS
AN ELEMENT OF
QUALITY ASSURANCE
In the APTI Course 470, data validation is one of 23 elements of quality
assurance as shown by the following "Q.A. Wheel" in the Quality Assurance
Handbook.
10
-------
QUALITY ASSURANCE ELEMENTS AND RESPONSIBILITIES
(THE QUALITY ASSURANCE WHEEL)
[Wheel graphic: the individual quality assurance elements are not legible in this
reproduction.]
-------
What are some of the attributes of a data validation system? With no
intent to restrict the other speakers concerning their views of data valida-
tion, following are some key features, in my opinion, of a data validation
system.
After-the-Fact Review. Data validation is an after-the-fact review of
data to assure that good data have been obtained. Many activities of
quality assurance are concerned with the planning and acquiring of data, but
these activities are accomplished before or during the acquisition of the
data. Data validation activities (a part of quality assurance) are accom-
plished after the data are obtained.
Applied to Blocks of Data. Data validation is applied to incremental
blocks of data. For air monitoring data sent to the National Air Data Bank
(NADB), the block could be the quarterly submittal to the NADB. For source
emissions testing, the block would most likely be the run of three individual
tests of a test set. In automotive emissions monitoring, the block may be
the data from a single test. Thus, a block depends upon what is logical for
a particular type of data-gathering. In any case, the data would be given a
validation review as a defined block of data.
Systematic and Uniform Application. Data validation should not be
conducted on an occasional or spot-check basis. Once the procedure is
defined it should be applied systematically and uniformly to all sequential
blocks of data acquired. This is not to say that the procedure should not
be continually improved. It is helpful for details of the procedure to be
written to assure uniform application of the procedure in case of change of
personnel and to avoid "reinventing the wheel."
A Set of Criteria. A set of criteria ought to be developed and docu-
mented as a part of the written procedure, to be used during data validation
to determine whether the data are valid, questionable, or invalid. If the
causes of questionable or invalid data are not evident from the data validation
activity itself, their detection could trigger an investigation into possible
causes, with appropriate corrective action implemented to preclude recurrence
of questionable or invalid data.
12
-------
Checks for Internal Consistency. Data validation might include checks
for internal consistency, such as relationships among pollutants, or rela-
tionships between pollutants and meteorology.
Checks for Temporal and Spatial Continuity. Data validation might
include checks for continuity with respect to time, as might be evaluated by
having a chronological plot of the data, to look for discontinuities, spikes,
gaps, etc. The data may also have some spatial continuity if the data are
from a network within some relatively small region, such as a local air
monitoring network.
Checks for Proper Identification. To be useful, data must be properly
identified. Improperly identified data may well be considered "no data."
Although identification may seem to be a trivial thing, the Regions, for
example, have difficulties with such improper identifications as (a) one
state reporting data identified for another state, (b) data for October 35,
and (c) duplicate data from one site and none from another. For medical
history questionnaires of health effects studies, checks may be made to
make sure that children are not older than their mothers!
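As a simple illustration of such identification checks, a small Python sketch
follows; the record fields are hypothetical, invented for the example:

    from datetime import date

    def identification_problems(rec):
        """Return a list of identification problems found in one record.
        Field names here are hypothetical, for illustration only."""
        problems = []
        try:
            date(rec["year"], rec["month"], rec["day"])   # rejects "October 35"
        except ValueError:
            problems.append("impossible calendar date")
        if "child_age" in rec and "mother_age" in rec:
            if rec["child_age"] >= rec["mother_age"]:
                problems.append("child is not younger than mother")
        return problems

    # Example: identification_problems({"year": 1977, "month": 10, "day": 35,
    #                                   "child_age": 12, "mother_age": 34})
    # returns ["impossible calendar date"].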
Checks for Transmittal Errors. For paperwork systems, simple checks
may be made to assure that the data have not been incorrectly transferred
from one paper to another. With more sophisticated electronic and computer
data handling and with telemetry of data, checks could be made to assure
that the data have not been changed in the process.
Flagged or Rejected Data. A data validation system might include a
scheme for flagging questionable data and may make provision for outright
rejection of data for use. It may be desirable, however, to retain such
data in the data system with proper indication of its status.
In summary, some of the aspects I consider as parts of a Data Validation
System are as indicated below:
DATA VALIDATION
— AFTER-THE-FACT REVIEW
— APPLIED TO BLOCKS OF DATA
-- SYSTEMATICALLY AND UNIFORMLY APPLIED
13
-------
— A SET OF CRITERIA
— CHECKS FOR INTERNAL CONSISTENCY
-- CHECKS FOR TEMPORAL AND SPATIAL CONTINUITY
— CHECKS FOR PROPER IDENTIFICATION
— CHECKS FOR TRANSMITTAL ERRORS
— DATA FLAGGED OR REJECTED
Techniques of Data Validation. Obviously, because the methods of data
gathering are so varied, the particular techniques that are to be used for
data validation for a particular program will depend upon many things.
Following are mentioned a few of the factors which need to be considered.
The nature of the response output, that is, whether you get a response on a
strip chart recorder, or whether it is generated on paper tape, magnetic
tape, or is fed directly into a computer will determine the technique of
data validation. The techniques will depend upon the method of data
reduction, i.e., whether it is a manual-type method or a computer system.
The form of the data transmittal, i.e., whether data are transmitted by some
handwritten form, typewritten form, computer printout, or magnetic tape will
determine the types of data checks to use. The techniques will also depend
upon the amount of data. As we get involved with larger studies and larger
blocks of data involved, such as NADB, different techniques must be used
from those utilized for small sets of data. The techniques will depend upon
the type and amount of ancillary (related) data that can be used for evalua-
tion, comparison, or for correlation purposes. Techniques will depend upon
what computing capability is available for use. The extent of available
plotting capability is an important consideration, particularly for large
blocks of data. Personally, I would like to see graphical presentations
used in data validation. Much more can be learned by graphical representa-
tion that would be very difficult—almost impossible—to learn from visual
review of large masses of data. Finally, the nature and extent of data
validation techniques would depend on the intended use of the data.
Different criteria may be used for validating data from which long term
trends are estimated as compared to data for three-hour peak values, for
example. To summarize:
14
-------
TECHNIQUES WILL DEPEND ON
-- NATURE OF RESPONSE
— METHOD OF DATA REDUCTION
-- FORM OF DATA TRANSMITTAL
-- AMOUNT OF DATA
— AMOUNT OF ANCILLARY DATA
— COMPUTING CAPABILITY
— PLOTTING CAPABILITY
— USE OF THE DATA
Lastly, there are two key principles of data validation that I want to
mention. First, data validation ought to occur as close in time and
location as possible to the originating location of the data. If question-
able values are discovered, and corrective actions need to be made to the
system, they must be made in a very timely and effective manner. For
example, NADB may be validating data for as much as two years after the
initial generation of the data. That is much too late to get effective
corrective action at the local level. Therefore, data validation techniques
should be located as closely as possible to the source of the data. Second,
where possible, the persons having data validation responsibilities should
not be the persons directly responsible for acquiring the data. Ideally,
the person or persons responsible for data validation should be independent
of the data acquisition activities and should be the most knowledgeable and
experienced technical individual available to perform the function.
Thus,
DATA VALIDATION SHOULD BE
— CLOSE TO THE ORIGINATION OF THE DATA
— INDEPENDENT
Perhaps I have raised a number of questions in your mind concerning the
subject of data validation. Hopefully, the other speakers will answer some
of these questions, or will raise further questions, and will promote bene-
ficial discussions and interchange of ideas and techniques of data validation.
15
-------
THE SHEWHART CONTROL CHART TEST FOR
SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS
by
William F. Hunt
Office of Air Quality Planning and Standards
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
17
-------
THE SHEWHART CONTROL CHART TEST FOR
SCREENING 24-HOUR AIR POLLUTION MEASUREMENTS
W.F. Hunt
INTRODUCTION
A quality control program is being developed for the U.S. Environmental Protection
Agency's (EPA) National Aerometric Data Bank (NADB). The initial phases of the work
were reported previously.(1,2) The purpose of the program is to develop and apply
quality control tests to check ambient air quality data for anomalies, such as trans-
cription and keypunch errors, as well as to detect erroneous data resulting from the
periodic malfunctioning of air monitoring instruments. For the sake of completeness, it
is worth reviewing some aspects involved in the collection and uses of air quality data.
To begin with, air quality data are primarily collected to measure the success of
emission control plans in achieving the National Ambient Air Quality Standards.
National Ambient Air Quality Standards (NAAQS) have been established by EPA for
five pollutants: total suspended particulate (TSP), sulfur dioxide (S02), carbon monox-
ide (CO), photochemical oxidants (Ox), and nitrogen dioxide (N02). These standards are
intended to protect both human health and welfare. They may be stated as annual means
or as upper limit values that may not be exceeded more than once per year. Although
different averaging times are used for various standards, this paper is primarily con-
cerned with the examination of 24-hour average values for TSP, S02, and N02 concentra-
tions. While only TSP and S02 standards are in terms of 24-hour averages, all three
pollutants have standards expressed in terms of annual averages. Because of the impor-
tance that is attached to violations of the NAAQS, a quality control program to ensure
the validity of the measurement of both short- and long-term concentrations is extremely
important.
The application of the Dixon Ratio Test and Shewhart Control Chart Test to
measured levels of three major pollutants—TSP, S02, and N02—is examined. The tests
apply to data from monitoring instruments which generate one measurement per 24-hour
period and are operated on a systematic sampling schedule of approximately once every 6
days. In the cases of S02 and N02, there are also continuous monitoring instruments,
which monitor the pollutants constantly; but our discussion here is concerned only
with 24-hour data. The application of the tests results in flagged data, which need to
be verified as either valid or invalid.
These statistical tests are presently being applied to data collected in EPA's
Region V. Region V encompasses the states of Illinois, Indiana, Michigan, Minnesota,
Ohio, and Wisconsin. In terms of population, it is the largest of EPA's regions, and
there is extensive monitoring of the above pollutants. The purpose of the Region V
evaluation is to determine whether the data flagged by the tests are valid or invalid
and to identify, if possible, the source of the error.
This paper will discuss the flow of data from the state and local government; the
data-editing process; the basic characteristics of the data; and the application and
evaluation of the two tests; it will conclude with our recommendations.
DATA FLOW
Most ambient air quality data are collected by state and local air pollution con-
trol agencies and are forwarded via EPA's Regional Offices to the NADB. A considerable
amount of data is forwarded. For example, the minimum legal requirements for air
pollution monitoring across the nation will result in the annual submittal of over 20
million air quality measurements to the NADB. The data are sent quarterly in a standard
Copyright ©1977 American Society For Quality Control, Inc. Reprinted by permission.
18
-------
format that specifies the site location; the year, month, and day of sampling; and
the measurement itself (24-hour or 1-hour value) in micrograms or milligrams per cubic
meter (µg/m3 or mg/m3) or parts per million (ppm). A corresponding site file contains
descriptive information on the sampling-site environment. EPA edits the submitted
data, checking for consistency with acceptable monitoring methods, and other identify-
ing parameters. In the data-editing program, air quality data with extremely high
values are flagged. Data that do not pass these checks or that have values exceeding
certain predetermined limits are returned to the originating agency via the Regional
Office for correction and resubmittal.
As might be expected with data sets this large, there are still anomalous measure-
ments that slip through the existing editing and validation procedures. Therefore,
there is a need for a simple cost-effective statistical test that can be applied to
the air quality data by which to detect, primarily, obvious transcription, keypunch, and
measurement errors. Statistical tests do not eliminate, however, the need for more
intensive quality assurance at the local level. For example, inadequate calibration
procedures or similar problems that result in measurement bias will not be detected by
our statistical procedures, which are intended primarily for macroanalysis.
BASIC CHARACTERISTICS OF TSP, S02, AND N02 DATA
Basic characteristics of the TSP, S02, and N02 data were considered in selecting
the quality control tests being used. To begin with, the tests were applied to data
which were obtained from monitoring instruments that generate one measurement per 24-
hour period. For such monitoring methods, EPA recommends that a systematic sampling
procedure of once every 6 days, or 61 samples per year, be used at a minimum to collect
the data.(8) Such a sampling procedure generates data which, for our purposes, may be
considered as approximately independent.
In examining the distributional properties of the data, past research has shown
that ambient TSP concentrations are approximately lognormally distributed.(9,10) This
is sometimes true for S02 and N02, also, but is not always the case. Current work
suggests that these pollutants may follow an exponential or Weibull distribution.
In selecting the quality control tests, the averaging times which correspond to
the NAAQS are important. The values of interest are the peak concentrations (24-hour
average measurements) for TSP and S02, and the annual means for TSP, S02, and N02.
The final data characteristic of importance is the seasonality of the pollutants.
As an example, in some areas of the country, TSP and S02 measurements are highest in
the winter months and lowest in the summer months. Therefore, the factor of seasonality
had to be considered in the selection of the quality control test to minimize this as a
possible source of error.
THE QUALITY CONTROL TESTS
Two quality control tests are presently being applied and the results of the appli-
cation evaluated: the Dixon Ratio Test and the Shewhart Control Chart Test. The
output of the quality control tests is a listing of the suspicious data, including the
site and the time of occurrence. The tests are discussed below.
Dixon Ratio Test
The use of the Dixon Ratio Test was discussed in an earlier paper. The test
was applied to TSP quarterly data and was found to work reasonably well in detecting a
single anomalous value. Problems occurred when there were multiple transcription errors
within a quarter, such as the miscoding of an entire month of data. This problem was
corrected when the test was applied to monthly averages.
As part of the evaluation of quality control of Region V data, the Dixon Test
was applied to all 1974 monthly averages of TSP, S02, and N02 on a site-by-site basis
to examine the data for possible multiple transcription, keypunch, or measurement
errors occurring within a month. By applying the test to the monthly averages, the
assumption of normality can be satisfied, although the monthly averages are not entire-
ly independent because of the seasonality in the data. This must be considered in
examining the flagged data.
The Dixon Ratio Test requires that the monthly averages be ordered in increasing
levels of magnitude. The test basically constructs an "r" ratio that compares the
distance of the maximum (minimum) observation from its neighbors with the range of all
19
-------
but one or two of the observations. Let us assume that Y_i equals the ith-order pollu-
tant monthly average, where Y_N is the highest monthly average and N equals the number
of months within the year for which there are data. The test procedure is as follows:
1. Choose α, the probability (risk) of rejecting an observation that really belongs in
   the group.
2. Order the monthly averages from Y_1 through Y_N, where Y_N is the highest value.
3. If 3 ≤ N ≤ 7, compute r_10 = (Y_N - Y_(N-1)) / (Y_N - Y_1);
   if 8 ≤ N ≤ 10, compute r_11 = (Y_N - Y_(N-1)) / (Y_N - Y_2);
   if 11 ≤ N ≤ 12, compute r_21 = (Y_N - Y_(N-2)) / (Y_N - Y_2);
   where Y_N is the highest value.
4. Look up r_(1-α), the critical value for r_ij, from a table of critical values.
5. If r_ij is greater than r_(1-α), print out a list showing the suspect monthly averages,
   the remaining monthly averages, and the site location.
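For illustration only, the steps above translate into a short program. The following
Python sketch is not the program used in this work; the critical values shown are the
commonly tabulated 5-percent points and should be verified against a published table
(e.g., reference 4) before any real use.

    # Sketch of the Dixon ratio test for monthly averages (illustrative only).
    # Critical values are commonly tabulated 5-percent points for r10, r11, r21;
    # verify against a published table before use.
    R_CRIT_05 = {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507,   # r10
                 8: 0.554, 9: 0.512, 10: 0.477,                      # r11
                 11: 0.576, 12: 0.546}                               # r21

    def dixon_ratio(monthly_averages):
        """Steps 2-3: order the averages and form the appropriate r ratio."""
        y = sorted(monthly_averages)
        n = len(y)
        if not 3 <= n <= 12:
            raise ValueError("test defined here for 3 <= N <= 12")
        if n <= 7:                                    # r10
            return (y[-1] - y[-2]) / (y[-1] - y[0])
        if n <= 10:                                   # r11
            return (y[-1] - y[-2]) / (y[-1] - y[1])
        return (y[-1] - y[-3]) / (y[-1] - y[1])       # r21

    def suspect_high_value(monthly_averages):
        """Steps 4-5: flag the highest average if r exceeds the critical value."""
        r = dixon_ratio(monthly_averages)
        return r > R_CRIT_05[len(monthly_averages)]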
Shewhart Control Chart Test
The Shewhart Control Chart Test can be used to examine shifts in the monthly
averages as well as shifts in the monthly range. From the former it can detect
possible multiple errors and from the latter, single anomalous values. In this test
the data can be divided up into what Shewhart called rational subgroups.(12) In a
manufacturing process the subgroups would most likely relate to the order of production.
Ambient air quality measurements can be viewed in the same way because they are col-
lected by a monitoring instrument over time. A month of data was selected as the
rational subgroup because the air quality data are recorded by the state and local
agencies on a monthly basis in a standard format. The monthly subgroup generally
consists of five measurements based on EPA's recommended sampling schedule(8) of 61
observations per year; five also is the common subgroup size found in industrial use.
Using a subgroup size of five, it can be assumed that the distribution of the monthly
means is nearly normal, even though the samples are taken from a nonnormal universe.
The test was applied to the 1974 Region V data on a moving 4-month basis: that is,
the averages and range of values in the month in question were compared with the overall
averages of the three previous monthly averages and monthly ranges. The moving 4-month
comparison was used to minimize the effect of the seasonality of the pollutants. The
formulas for calculating the trial limits are as follows:
For the monthly range:   UCL_R = D4 * Rbar, and
                         LCL_R = D3 * Rbar.

For the monthly means:   UCL_xbar = xbarbar + A2 * Rbar, and
                         LCL_xbar = xbarbar - A2 * Rbar,

where R = the monthly range; Rbar = the average of the three previous monthly ranges;
xbar = the monthly average in question; xbarbar = the average of the three previous
monthly averages; and D3, D4, and A2 are factors for determining from Rbar the 3-sigma
control limits for xbar and R. (See Table C on page 562, reference number 5.)
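As a minimal sketch of this computation (assuming the standard 3-sigma control chart
factors for subgroups of five, A2 = 0.577, D3 = 0, and D4 = 2.114; in practice the
factors would be taken from the cited table for the actual subgroup size):

    # Sketch of the moving 4-month Shewhart check (illustrative only).
    # Factors below are the standard control chart factors for subgroups of five.
    A2, D3, D4 = 0.577, 0.0, 2.114

    def shewhart_limits(prev_means, prev_ranges):
        """Trial limits from the three previous monthly means and ranges."""
        xbarbar = sum(prev_means) / len(prev_means)   # average of previous means
        rbar = sum(prev_ranges) / len(prev_ranges)    # average of previous ranges
        return {"UCL_x": xbarbar + A2 * rbar, "LCL_x": xbarbar - A2 * rbar,
                "UCL_R": D4 * rbar, "LCL_R": D3 * rbar}

    def out_of_control(xbar, rng, prev_means, prev_ranges):
        """True if the month in question falls outside any trial limit."""
        lim = shewhart_limits(prev_means, prev_ranges)
        return (xbar > lim["UCL_x"] or xbar < lim["LCL_x"] or
                rng > lim["UCL_R"] or rng < lim["LCL_R"])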
RESULTS OF APPLICATION OF QUALITY CONTROL TESTS
During 1974, TSP, S02, and N02 were being monitored in Region V at 855, 366, and
302 sites, respectively. Both the Dixon and Shewhart Tests were applied to all 1974
TSP, S02, and N02 data from Region V. Still in progress, an extensive effort is being
made on the part of EPA personnel in Region V, in conjunction with state air pollution
control officials, to evaluate the air quality data flagged by both the Dixon and
Shewhart Tests. As an initial phase of this evaluation, examination was made of those
data in which the flagged monthly mean or range exceeded one of the pollutant-specific
NAAQS. For TSP and S02, appropriate cutoffs were thought to be 260 µg/m3 and 365 µg/m3,
which are their respective primary short-term 24-hour standards. In the case of N02,
the annual primary NAAQS of 100 µg/m3 was used because N02 has no short-term primary
standard. Although their choice was somewhat arbitrary, the NAAQS were used as cutoffs
because their violation results in reexamination of the overall adequacy of local air
20
-------
pollution control measures in effect. Thus, high values must be verified because they
can result in significant impact on the original control strategy designed to achieve
the NAAQS.
Table 1 indicates the number of Region V sites reporting TSP, S02, and N02 data
which were flagged by the Dixon Test, by the Shewhart Control Test, and by both tests.
As would be expected, there are more sites flagged by the Shewhart Control Test as
having anomalous data than the Dixon Test, because it looks at both shifts in the
TABLE 1. Comparison of Dixon Ratio and Shewhart Control Chart
         Tests as Applied to Sites in Region V Monitoring TSP,
         S02, and N02 in 1974

                                            Pollutant
                                     TSP       S02       N02
High value in question(a) (µg/m3)   ≥260      ≥365      ≥100
Total sites, no.                     855       366       302
Dixon test
  Flagged sites, no.                  35         1        25
  Flagged sites, no. with errors      31         1        11
Shewhart test
  Flagged sites, no.                  38         4        36
  Flagged sites, no. with errors      31         3        16
Both tests
  Flagged sites, no.                  32         1        19
  Flagged sites, no. with errors      31         1        10

(a) The high value in question is the monthly mean in the case of the Dixon Test and
the monthly mean or range in the Shewhart Control Chart Test. The National Ambient
Air Quality Standards (NAAQS) were used as high-value cutoffs: 260 µg/m3 and
365 µg/m3 are the 24-hour primary NAAQS for TSP and S02, respectively, while
100 µg/m3 is the annual primary NAAQS for N02.
monthly mean and range while the Dixon Test examines only the monthly means. The pre-
liminary evaluation of the flagged sites is also given as the number of flagged sites
which were found to have one or more erroneous 24-hour measurements.
Of the 855 sites in Region V measuring TSP in 1974, 35 were flagged by the Dixon
Test, 38 by the Shewhart Control Test, and 32 by both tests. The flagged sites report-
ed at least one monthly mean and/or range equal to or greater than 260 µg/m3. The
preliminary evaluation indicates that data from 31 sites, which were flagged by both
tests, were found to have multiple transcription or keypunch errors. In the case of
S02, 1 of the 366 sites was flagged by the Dixon Test, 4 by the Shewhart Test, and 1
by both tests. The monthly means and ranges in question were equal to or greater than
365 µg/m3. Data from the site flagged by both tests were found to have multiple tran-
scription errors, while data from the remaining two sites flagged by the Shewhart
Test had single transcription errors. Finally, of the 302 sites measuring N02, 25
were flagged by the Dixon Test, 36 by the Shewhart Test, and 19 by both tests. The
monthly means and ranges in question equalled or exceeded 100 µg/m3. Transcription
and keypunch errors were found at 11 of the sites flagged by the Dixon Test, 16 of the
sites flagged by the Shewhart Test, and 10 of the sites flagged by both.
An example of a site flagged by both tests was one that measured TSP for 11 months
in 1974. The monthly means (xbar), ranges (R), and subgroup sizes (n) are indicated
below by month:
21
-------
        Jan   Feb   Mar   Apr   May   June   Jul   Aug   Sept   Oct   Nov   Dec
xbar     --    67    60    56    70     67    66    73     59    591    82    41
R        --    74    25    71    44    102    37    64     68    595    68    30
n         0     4     5     5     5      3     5     5      5      5     3     4
The Dixon Ratio Test was applied to the entire year of data; the ratio of the largest
monthly mean, 591, minus the third largest mean, 73, was compared with the difference
of the largest mean and the second smallest monthly mean, 56. The test statistic is

    r_21 = (591 - 73) / (591 - 56) = 0.97,

which is significant at the 0.005 level.
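Using the dixon_ratio sketch given earlier (January, which had no observations, is
excluded), this computation can be reproduced directly:

    means = [67, 60, 56, 70, 67, 66, 73, 59, 591, 82, 41]   # Feb-Dec, N = 11
    r21 = dixon_ratio(means)            # (591 - 73) / (591 - 56)
    print(round(r21, 2))                # 0.97, well above the tabled critical value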
The Shewhart Control Chart Test was applied on a moving 4-month basis. When the
monthly average and range for October became the values in question, they were com-
pared with the overall averages of the July, August, and September averages and ranges.
The test results are shown in Figure 1 for both the monthly mean and range. In both
[Graphic: (a) R chart for monthly range and (b) xbar chart for monthly mean, July
through October, with the October point far above the upper control limit in each
chart.]
Figure 1. Example of Shewhart Control Chart Test applied to data with
multiple transcription errors in month of October.
cases the air quality data are "out of control" for the month of October, with both
the October average and range way above their respective upper control limits. The
problem was later identified as multiple transcription errors in which all numbers in
the month of October were off by a factor of 10.
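The October excursion can likewise be reproduced with the shewhart_limits sketch
given earlier; the upper control limits computed from the July-September subgroups
come out to roughly 98 for the mean and 119 for the range:

    prev_means, prev_ranges = [66, 73, 59], [37, 64, 68]      # Jul, Aug, Sep
    print(shewhart_limits(prev_means, prev_ranges))           # UCL_x ~ 98.5, UCL_R ~ 119.1
    print(out_of_control(591, 595, prev_means, prev_ranges))  # True: October is flagged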
22
-------
CONCLUSION
From the initial results of the Region V evaluation, it appears that both the
Dixon and Shewhart Tests work well on the TSP, S02, and N02 data and are in reasonably good
agreement. Ideally, both tests should be used in the screening process. However, if
an air pollution control agency wanted to employ only one test, the Shewhart Control
Chart Test would be preferable, because it has the advantage that it can simultaneously
examine shifts in both the monthly mean and range and can be presented graphically.
Further, in the case of S02 and N02, the Shewhart Test flagged sites with a single
transcription or keypunch error—identified by shifts in the range—which were not
identified by the Dixon Test.
The second phase of the Region V evaluation will cover those sites whose highest
measured value did not exceed one of the pollutant-specific NAAQS. This phase will be
examined in a later paper, along with the development of quality control tests for data
generated by the continuous monitoring methods.
ACKNOWLEDGMENTS
The authors wish to express their appreciation to the state air pollution control
agencies in Region V for their help in the evaluation of the tests, to Mrs. Ann Rogers
and Mrs. Aline Rolaff for providing the computer programming support, to Mrs. Joan
Bivins, Miss Hazel Browning, and Mr. Willie Tigs for their clerical support, and to
Dr. Thomas Curran for his many helpful comments on earlier drafts of the paper.
REFERENCES
1. Hunt, W. F., Jr., and T. C. Curran. An Application of Statistical Quality Control
Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality
Standards. Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts,
May 1974.
2. Hunt, W. F., Jr., T. C. Curran, N. H. Frank, and R. B. Faoro. Use of Statistical
Quality Control Procedures in Achieving and Maintaining Clean Air. Transactions
of the Joint European Organization for Quality Control/International Academy for
Quality Conference, Venice Lido, Italy, September 1975.
3. Title 40 - Protection of Environment. National Primary and Secondary Ambient Air
Quality Standards. Federal Register. 36(84):8186-8201, April 30, 1971.
4. Dixon, W. J. Processing Data for Outliers. Biometrics. 9:74-89, 1953.
5. Grant, E. L. Statistical Quality Control. New York, McGraw Hill Book Co.
p. 122-128. 1964.
6. SAROAD Users Manual. U. S. Environmental Protection Agency, Research Triangle
Park, N.C. Publication No. APTD-0663. July 1971.
7. Hoffman, A. J., T. C. Curran, T. B. McMullen, W. M. Cox, and W. F. Hunt, Jr.
EPA's Role in Ambient Air Quality Monitoring. Science. 190(4211):243-248,
October 1975.
8. Title 40 - Protection of Environment. Requirements for Preparation, Adoption, and
Submittal of Implementation Plans. Federal Register. 36(158):15490, August 14,
1971.
9. Larsen, R. I. A Mathematical Model for Relating Air Quality Measurement to Air
Quality Standards. U. S. Environmental Protection Agency, Research Triangle Park,
N.C. Publication No. AP-89. 1971.
10. Hunt, W. F., Jr. The Precision Associated with the Sampling Frequency of Lognor-
mally Distributed Air Pollutant Measurements. J. Air Poll. Control Assoc. 22(9):
687, 1972.
11. Curran, T. C. and N. H. Frank. Assessing the Validity of the Lognormal Model When
Predicting Maximum Air Pollutant Concentrations. Presented at the 68th Annual
Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.
12. Shewhart, W. A. Economic Control of Quality of Manufactured Product. Princeton,
D. Van Nostrand Company, Inc. 1931. p. 299.
23
-------
DISTRIBUTION GAP TEST FOR HOURLY
AIR POLLUTION DATA
by
Thomas C. Curran
Office of Air Quality Planning and Standards
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
25
-------
DISTRIBUTION GAP TEST FOR HOURLY
AIR POLLUTION DATA
T.C. Curran
Previous papers have discussed techniques for screening air pollution data sets
with particular attention given to 24-hour measurements. The present paper focuses
upon the use of screening procedures for hourly ambient air quality measurements. As
with any quality control procedure, it is useful to consider the nature and intended
use of the data before discussing the screening technique.
Hourly air pollution data sets present some interesting practical problems when one
considers the use of a screening procedure. The most obvious feature is the volume of
data. For example, 24-hour air pollution measurements are usually obtained by every-
sixth-day sampling resulting in approximately 60 samples per year. In contrast, hourly
measurements are obtained from continuous monitors that operate every day and, therefore,
may produce as many as 8,760 values per year. Thus, hourly data sets are commonly 100
times larger than those for daily measurements. The reason that the volume of data is
important becomes apparent when the use of the data is examined. For the most part, air
pollution data is collected to determine status with respect to certain legal standards,
4
such as the National Ambient Air Quality Standards. These standards specify upper
limits for air pollution concentrations. Of particular interest for this paper are the
standards for oxidants or carbon monoxide which indicate hourly values "not to be
exceeded more than once per year."(4) In these situations it is the second highest
value from a data set of 8,760 observations that becomes the decision-making value.
Obviously, this places a premium on ensuring data quality.
From a practical viewpoint, maintaining a data bank for air pollution measurements
involves the basic conflict of having to routinely process large volumes of data and
yet at the same time ensure an almost zero defect level of data quality. Many sites
monitor for several pollutants so that on the national level, thousands of sites are
routinely submitting tens of thousands of data points each year. However, because of
the nature of the standards, many users may only be interested in the two highest values
at each site for each pollutant. It should be noted that two values from a data set of
8,760 observations constitutes 0.023 percent of the data. This means that the user's
perception of data quality may be entirely different from the true data quality. For
example, if only 0.05 percent of the data points were too high due to errors, this
would still be sufficient to have the user complain that "the data are useless." On
the other hand, if elaborate editing checks are introduced, the sheer volume of data
may result in high costs or processing delays, and the user may now complain that the
data are not sufficiently current for him to make timely decisions.
With this background in mind, it is apparent that an air quality data screening
program must be able to process large volumes of data in an inexpensive fashion while
flagging virtually every error. Also, because it is frequently difficult and time con-
suming to verify suspect data points, every flagged value should be a genuine error.
Unfortunately, while these characteristics are obviously desirable, they are also almost
impossible to attain. The approach presented here is primarily intended to eliminate
the more glaring errors from these hourly data sets. The major emphasis is on screening
the higher concentration values to check for general internal consistency within the
data set.
RATIONALE FOR SCREENING PROCEDURE
In our initial development of a screening procedure for hourly data, a computer
program was developed that checked for departures from typical patterns. These typical
patterns were selected on the basis of experience with various types of air pollution
data. Basically, the values were flagged on a yes-no decision, and there was no proba-
bility statement associated with the rejected values. One stage in this development was
Copyright ©1977 American Society For Quality Control, Inc. Reprinted by permission.
26
-------
to give sample data sets to experienced air pollution data analysts to see what values
they would reject. There were two reasons for this step. The most obvious was to en-
sure that the computerized screening procedure was consistent with so-called expert
judgment. However, another reason was the need for a test that would mimic the decisions
made by an experienced analyst. The reason for this was an attempt to avoid a black-
box approach where the screening procedure was viewed as a mysterious oracle delivering
arbitrary decisions. The point here is that it can be quite time consuming for the data
analyst to check flagged data points. Values that appear to be quite unlikely from a
statistical viewpoint may actually be quite likely in the real world. For example,
massive traffic jams do happen and may result in high carbon monoxide levels. Windstorms
can mean high total suspended particulate levels. Sudden shifts in wind direction can
mean that a monitor near a point source goes from a zero reading to almost full scale
and back in a few hours. The high variability associated with peak air pollution values
makes it almost impossible to develop a screening procedure that does not occasionally
flag real values. But it seemed desirable to avoid the situation where an air pollution
analyst would tire of repeatedly checking flagged values that turned out to be correct.
Therefore, emphasis was given to developing a test that would flag values that an air
pollution analyst would want to investigate. An effective way to accomplish this was to
develop a test that would mimic experienced human judgment so that the analyst would
understand why the value was flagged.
To a large degree the preliminary test on patterns was successful. Experienced
analysts used the same basic approach of looking for unusual jump discontinuities between
successive hourly values or departures from expected diurnal or seasonal patterns. How-
ever, there were two main deficiencies in this computerized procedure based upon depart-
ures from suspected patterns. One was the lack of a probabilistic framework. The
second, and probably the more serious from a practical standpoint, was the need to vary
the amount of allowable departure from site to site. The probabilistic framework could
be provided by a time series model, and the parameters varied from site to site. However,
it became apparent during the preliminary investigation that many of the outliers could
be detected by a much simpler approach. In most cases, unusually high values could be
detected by examining the frequency distribution of the hourly data for a given period
of time, such as a month, quarter, or year. Suspect values would be associated with
large gaps in the frequency distribution. The length of the gap and the number of
values above the gap afforded a convenient means of detecting possible errors. With this
simplification of the problem, it becomes possible to develop a probabilistic framework
for the problem as discussed below.
PROBABILITY OF A GAP
In order to compute the probability of a gap in the empirical frequency distribu-
tion, it is necessary to assume some type of underlying distribution. Although this
involves an oversimplification because it ignores dependency between successive hourly
values, such approaches have traditionally been used with success in air pollution data
analysis. The lognormal distribution has customarily been used for this purpose. How-
ever, the exponential distribution has also been found to provide a reasonable approxi-
mation for the upper tail, or higher concentrations, of hourly air pollution data.
Because the higher concentration values were of primary interest and the exponential
distribution is mathematically convenient, it was used as the underlying distribution.
As with any measurements, although the approximating distribution is continuous, the air
pollution values are discrete valued. For simplicity, they may be assumed to be integers
because this involves merely a change of scale. A gap in the frequency distribution may
then be described in terms of its length, the number of values above the gap, and at
what concentration the gap begins. Therefore, if a monthly empirical frequency distri-
bution of hourly values has n values greater than concentration c but no values between
c and c+k, this would be a gap of length k starting at c with n observations above the
gap. To compute the probability of this event, consider the following:
Let X be an exponential random variable. Then

    Pr(X ≤ c) = 1 - e^(-λ(c-θ)),   where λ > 0 and c ≥ θ.

Thus, Pr(X > c) = e^(-λ(c-θ)).
27
-------
The probability that X is greater than c+k given that X is greater than c is

    Pr(X > c+k) / Pr(X > c) = e^(-λ(c+k-θ)) / e^(-λ(c-θ)) = e^(-λk).

Because X is distributed exponentially, this expression is independent of the concen-
tration c. Assuming independence, the probability that n values are greater than c+k,
given that these n values are greater than c, is

    (e^(-λk))^n = e^(-nλk).

Thus, the probability of a gap of length k with n values above the gap is e^(-nλk).
This probability then becomes the criterion for rejecting suspect data.
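In code, the rejection criterion derived above reduces to a one-line computation (a
sketch; estimation of λ is discussed in the next section):

    import math

    def gap_probability(n, k, lam):
        """Probability, under the exponential assumption, of a gap of
        length k with n values above the gap: e^(-n*lam*k)."""
        return math.exp(-n * lam * k)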
APPLICATION
A relatively simple FORTRAN program was written to process hourly data, compute
the empirical frequency distribution, and examine any gaps. Because of the manner in
which the data is routinely submitted to the U.S. Environmental Protection Agency's
National Aerometric Data Bank, the program was written to check the data on a monthly
basis (744 hourly values). The parameter λ obviously varies from one data set to
another. For simplicity, λ was determined from the 50th and 95th percentiles of the
data. This was computationally convenient and also emphasized the fit for the upper
tail. Results to date in evaluating this test indicate that this approach is adequate.
Past experience has indicated that an occasional source of error is the miscoding
of units so that an entire month of data would be internally consistent yet too high
by some scale factor. To account for this, a second estimate of λ was computed using
an assumed value for the 99.9th percentile, i.e., a value that historically should not
be exceeded more than one time in a thousand.
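A minimal sketch of such a monthly screening pass follows. Solving the exponential
tail at the two percentiles gives λ = ln(10)/(p95 - p50); the flagging cutoff and the
percentile indexing are assumptions for illustration, and the second λ estimate from
the 99.9th percentile is omitted here:

    import math

    def estimate_lambda(values):
        """Fit lambda from the 50th and 95th percentiles of the data:
        exp(-lam*(p95 - p50)) = 0.05/0.50 gives lam = ln(10)/(p95 - p50)."""
        v = sorted(values)
        p50 = v[int(0.50 * (len(v) - 1))]
        p95 = v[int(0.95 * (len(v) - 1))]
        return math.log(10.0) / max(p95 - p50, 1e-9)

    def scan_gaps(hourly, cutoff=0.0001):
        """Flag gaps in a month of hourly values whose probability e^(-n*lam*k)
        falls below an assumed cutoff. Values are treated as integers."""
        lam = estimate_lambda(hourly)
        flags = []
        levels = sorted(set(int(x) for x in hourly))
        for lower, upper in zip(levels, levels[1:]):
            k = upper - lower                            # gap length, starting at lower
            if k <= 1:
                continue                                 # adjacent values: no gap
            n = sum(1 for x in hourly if x >= upper)     # values above the gap
            p = math.exp(-n * lam * k)
            if p < cutoff:
                flags.append({"start": lower, "length": k, "above": n, "prob": p})
        return flags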
RESULTS
In order to provide a realistic test of this screening procedure, actual data sets
were used. One of particular interest involved carbon monoxide data that had been
quickly key-punched and then manually edited for a specific study. This provided a pre-
liminary and corrected version of the file. The preliminary file had known errors and
the corrected file was presumably valid. The first test run on the preliminary file
processed 21,362 hourly values from 40 monthly data sets. Eight of these monthly data
sets were flagged. Hourly carbon monoxide values would be expected to mostly fall in
the range of 0 to 50 ppm. In this first test, values of 900, 800, 700, and 500 were
found resulting in gap lengths greater than 100 and associated probabilities of less
than 1 in 10,000. These results are shown in Table 1. Of the eight flagged data sets,
TABLE 1. Rejected Site Months From Sample Data Set

                 Number of            2nd    Gap    Starting   Values
Site  Month/year   values   Maximum   high   length    at     above gap  Probability
 33   Oct. 1974     530       500      13     >100     14         1         .0006
 33   Nov. 1974     604       800     500     >100     15         3        <.0001
 33   Dec. 1974     671       900     800     >100     41         4        <.0001
 33   Jan. 1975     653        33      30      14      20         2        <.0001
 33   Feb. 1975     510       300      18     >100     19         1        <.0001
 39   June 1975     707       500     500     >100     27         3        <.0001
901   July 1974     620        16      15       3      11         3         .0056
901   Aug. 1974     334       800     700      14      14         5         .0001
28
-------
seven had keypunch errors. The one remaining month was flagged on the basis of a gap of
length 3 and the data appeared to be reasonable. This presented no difficulty for the
analyst because the computer printout was sufficient to indicate that these data were in
an intuitively acceptable range and probably did not warrant further investigation.
It took less than 30 seconds on EPA's UNIVAC 1110 to process these 21,362 hourly
values, and the total cost was approximately $1. It should be noted that the program
does several other editing checks so that this cost includes more than the screening
procedure for gaps.
CONCLUSIONS
Using gaps in monthly frequency distributions appears to be a convenient means of
screening hourly air pollution data sets for outliers. Results to date indicate that it
satisfies the criteria of being easy and economical to implement while producing output
that is intuitively understandable to an air pollution data analyst. The test success-
fully spots the more obvious errors. As expected, the initial results also suggest that
these types of data sets have a much lower error rate than the user perceives, because
attention falls on only the few highest values.
There are certain refinements that can be made in screening these types of data sets.
Time series models and the use of associated data, such as meteorological variables,
would be expected to increase sensitivity and possibly result in even better data quality.
However, it remains to be seen if these more elaborate approaches are cost effective
when processing vast quantities of data from locations throughout the nation.
As a final comment, it should be noted that once a value is flagged as a possible
anomaly, it cannot be arbitrarily dropped from the data set. It must first be verified
that the data point actually is incorrect. The fact that the data point is statistically
unusual does not necessarily mean that it did not occur.
REFERENCES
1. Hunt, W. F., Jr., and T. C. Curran. An Application of Statistical Quality Control
Procedures to Determine Progress in Achieving the 1975 National Ambient Air Quality
Standards. Transactions of the 28th Annual ASQC Conference, Boston, Massachusetts,
May 1974.
2. Hunt, W. F., Jr., T. C. Curran, N. H. Frank, and R. B. Faoro. Use of Statistical
Quality Control Procedures in Achieving and Maintaining Clean Air. Transactions of
the Joint European Organization for Quality Control/International Academy for Quality
Conference, Venice Lido, Italy, September 1975.
3. Hunt, W. F., Jr., R. B. Faoro, and S. K. Goranson. A Comparison of the Dixon Ratio
Test and Shewhart Control Chart Test Applied to the National Aerometric Data Bank.
Presented at the 30th Annual Conference of the American Society for Quality Control,
Toronto, Ontario, Canada, June 1976.
4. Title 40 - Protection of Environment. National Primary and Secondary Ambient Air
Quality Standards. Federal Register. 36:(84):8186-8201, April 30, 1971.
5. Larsen, R. I. A Mathematical Model for Relating Air Quality Measurements to Air
Quality Standards. U.S. Environmental Protection Agency, Research Triangle Park,
N.C. Publication No. AP-89. 1971.
6. Curran, T. C. and N. H. Frank. Assessing the Validity of the Lognormal Model when
Predicting Maximum Air Pollutant Concentrations. Presented at the 68th Annual
Meeting of the Air Pollution Control Association, Boston, Massachusetts, 1975.
29
-------
USE OF STATISTICAL SAMPLING IN VALIDATING
HEALTH EFFECTS DATA
by
Carolyn P. Chamblee
Health Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
31
-------
USE OF STATISTICAL SAMPLING IN
VALIDATING HEALTH EFFECTS DATA
Carolyn P. Chamblee
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, North Carolina 27711
ABSTRACT
A quality control plan has been adopted for large computer data files of
health effects research studies. The Dodge-Romig acceptance sampling technique
was selected. This procedure has the capability of guaranteeing within specific
tolerance limits the agreement between the information on the computer files
and the information on the original data documents. The method is easy to use
and is adaptable to a wide range of files and to a varying quantity of documents.
The type of plan chosen utilizes a file as a lot, a single document as a
characteristic, a single sampling procedure, and a 2% Lot Tolerance Per Cent
Defective (LTPD). Our experience with this acceptance sampling plan has been
positive enough that we have extended its use to most of our current studies.
32
-------
USE OF STATISTICAL SAMPLING IN
VALIDATING HEALTH EFFECTS DATA
Carolyn P. Chamblee
Statistics and Data Management Office
Health Effects Research Laboratory
Research Triangle Park, North Carolina 27711
The Statistics and Data Management Office supplies statistical and
data processing support to the Health Effects Research Laboratory (HERL) as
required. One of the principal responsibilities of the Laboratory, as the
name suggests, is to research and assess the effects of air pollution on
human health. One method HERL uses to carry out its responsibility is to
conduct nationwide epidemiological research to establish the relationship
between human health and community air quality. This research includes field
studies that examine the health of population groups residing in communities
exposed to definable air pollutants. Exposure-response relationships and
injury thresholds are estimated and the studies document changes in health
that accompany changes in environmental quality.
Questionnaires are designed for the field studies to allow more uniform
collection of data. These questionnaires are usually designed as keypunch
entry or optical scanning documents or a combination of both. As one would
imagine for nationwide studies, the collection effort results in a large
volume of information which is then processed through various steps and
results in a computerized master file or files.
During the period from 1970 to 1975, HERL conducted a large number of
epidemiological studies commonly referred to as CHESS or the Community Health
and Environmental Surveillance System. There are approximately 83 of these
studies, covering five different study types across five areas. While
successfully completing this intensive data collection effort during a period
of in-house personnel limitations, computer conversion from an IBM 360/50 to a
Univac 1110 and high contractor personnel turnover, HERL developed a backlog
of raw data. Although these data were computerized and computer-edited, no one
could make a definitive statement regarding the accuracy of the files versus the
original source documents. The emphasis on and importance of quality control
procedures, and our inability to certify the data files, made it clear that
a quality control program for these files had to be developed. I was assigned
to develop the quality control plan.
The plan selected had to be able to guarantee that when properly followed
the contents of the computer file reflected data reported on the forms within
a small error tolerance. Also, each file must meet the error tolerance.
That is to say, a statement that over the 83 files the error rate is less than
the specified tolerance is not sufficient. Each individual file must satisfy
the limit. Lastly, the quality control plan had to minimize the verification
effort but must also be simple to use, easy to understand and adaptable over a
wide range of data files and for a varying number of data forms.
33
-------
A number of statistical and quality control references were reviewed
before it was decided that the Dodge-Romig acceptance sampling technique was
the most desirable. This approach is discussed in detail by Harold F. Dodge
and Harry G. Romig (1). Their book is very easy to read and understand and
offers a twelve step procedure for selecting a specific sampling plan. To
explain how we decided on the plan we currently use, I will briefly describe
the steps and discuss how we implemented Dodge-Romig.
1. Decide what characteristics to include. For example, a characteristic
could be considered a variable or a data field, which could lead to
distinctly different error rates. In our case, we considered all
information on a single questionnaire form as a group so that one form
equals one record.
2. Decide what is to constitute a lot. A lot is defined as a homogeneous
material unit from a common source. In choosing the lot unit we
balanced the fact that a small number of large lots can shorten inspec-
tion time against the additional difficulty of processing the rejected
lots. In our case a lot equals one file.
3. Choose the type of protection. There are two types of protection:
Lot Tolerance Per Cent Defective (LTPD) and Average Outgoing Quality
Limit (AOQL). The AOQL applies to the average level of quality over
all lots being inspected. It is appropriate for a continuing supply
of a product. The LTPD applies to the quality level of each
individual lot. We chose the LTPD type of protection.
4. Choose a suitable level of LTPD or AOQL. For LTPD choose the value of
per cent defective you are willing to accept not more than 10 per cent
of the time, that is, reject at least 90 per cent of the time. We
balance the inspection costs against the consequences of accepting a
file of bad quality. We considered rates in the range of 1% to 3%
LTPD. We decided that 1% error rate was too costly and selected a rate
of 2% LTPD.
5. Choose between single sampling and double sampling. For better economy
in an overall inspection effort, double sampling is usually preferable.
However, for minimum variation in the workload, single sampling should
be used. We selected single sampling as a more straightforward and
preferable method in our case.
6. Select the proper sampling table on the basis of the preceding choices.
We selected the Single Sampling Table for LTPD = 2 per cent (Figure 1
reproduced from reference 1).
7. Obtain an estimate of the Process Average Per Cent Defective (PA). Use
previous data to obtain the PA. Even a rough estimate should be used
if little prior data are available. A poor estimate will only decrease
the economy of the plan but maintains the same LTPD protection. After
some initial examination of HERL data, the column entitled "Process
Average 0.61% to 0.80%" was used.
34
-------
[Figure 1. Single Sampling Table for Lot Tolerance Per Cent Defective (LTPD) = 2.0%, reproduced from reference 1. For each of 18 lot-size ranges (1-75 up to 50,001-100,000) and six process-average columns (0-0.02% through 0.81-1.00%), the table gives the sample size n, the acceptance number c, and the resulting AOQL in per cent; most individual entries are not legible in this copy. The entry used in the text (lot size 7,001-10,000 at process average 0.61-0.80%) is n = 760, c = 10, AOQL = 0.79%.]
Figure 1 - Reference 1
Reproduced by permission of John Wiley & Sons, Inc. and Copyright (1959)
Bell Telephone Laboratories from Sampling Inspection Tables Single and
Double Sampling, 2nd Edition, by Harold F. Dodge and Harry G. Romig.
35
-------
8. Choose a sampling plan for the given lot size and estimated PA.
Since the sampling plan is designed as a function of the PA, use
the estimated PA as the table entry. Remember to obtain revised
PA estimates from new data and if possible to select a more
economical plan. For one HERL study, there were 7800 source
documents. Based on our estimated PA of 0.61% to 0.80%, we would
go to the 2% LTPD single sample table, locate the correct PA
column and find the sample size for 7800 forms. This would result
in the row corresponding to 7001 to 10000 forms being used. A
sample size of 760 forms with no more than 10 errors would be used
for the study (this lookup is sketched just after this list). For the
purposes of our plan, the original source form was considered correct
and any code difference on the computer file was considered an error.
9. Find the OC curve of the sampling plan. If the operating charac-
teristic (OC) curve is satisfactory, choose the plan. The OC curve
for our plan is shown in Figure 2.
10. Select sample units from the lot by a random procedure. A preferred
method for accomplishing randomization is the use of random numbers.
11. Follow the prescribed procedure for single sampling. Inspect each
unit for the characteristics adopted in step one and in accordance
with sampling procedures.
12. Keep a running check of the PA. Change the sampling plan as necessary
to match shifts in the PA. Adopt a definite time period for making
new estimates such as every month or every quarter. In our experience,
the PA did not change significantly over 6 to 7 months.
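The table lookup in step 8 can be sketched as follows (an illustrative Python sketch; the dictionary structure and names are hypothetical, and only the entry quoted above is filled in):

    # Excerpt of the LTPD = 2.0% single sampling table, keyed by lot-size
    # range and process-average column; only the row used in the text
    # (7,001-10,000 forms, PA 0.61%-0.80%) is filled in here.
    PLAN_LTPD_2PCT = {
        (7001, 10000): {"0.61-0.80%": (760, 10)},  # (sample size n, acceptance number c)
    }

    def select_plan(lot_size, pa_column):
        for (lo, hi), columns in PLAN_LTPD_2PCT.items():
            if lo <= lot_size <= hi:
                return columns[pa_column]
        raise KeyError("lot size not in this table excerpt")

    n, c = select_plan(7800, "0.61-0.80%")  # -> (760, 10), as in step 8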
The Dodge-Romig acceptance sampling plan described is not only
being used on the past CHESS studies but is also used on current studies.
For each study undertaken by the data processing staff, a data processing
protocol is prepared in addition to the normal study protocol. The protocol
describes what is to be done including manual and computer steps and the
expected timeframe. Edit checks to be performed usually include
1. Valid-code checks
2. Range checks
3. Field-type checks (numeric, alpha, and/or alphanumeric)
4. Consistency checks, such as date of birth versus age.
Edits may be accomplished by an individually designed program or one or more
SPSS runs. SPSS frequency distributions are principally used to identify out
of range and other unacceptable codes. Audit trails are maintained throughout
the processing.
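Such edit checks can be sketched as follows (an illustrative Python sketch; the field names, code lists, and ranges are hypothetical, and in practice SPSS frequency runs carried much of this work):

    def edit_check(record, study_year):
        """Apply the four kinds of edit checks named above to one record."""
        errors = []
        if record["sex"] not in ("M", "F"):          # valid-code check
            errors.append("invalid sex code")
        if not (0 <= record["age"] <= 110):          # range check
            errors.append("age out of range")
        if not str(record["zip"]).isdigit():         # field-type check
            errors.append("zip code not numeric")
        # consistency check: age must agree with date of birth
        if record["birth_year"] + record["age"] not in (study_year, study_year - 1):
            errors.append("age inconsistent with date of birth")
        return errors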
In conclusion, we believe we have a successful operational quality control
program for our current needs relative to processing of large computer data files.
For similar applications, I would recommend reviewing these procedures as described
by Dodge and Romig and investigating more usage of SPSS as a quick evaluation
of the contents of the data files.
36
-------
PROBABILITY OF ACCEPTING A FILE WITH TRUE ERROR
RATE θ USING DODGE-ROMIG LTPD (2.0%) PLAN FOR
n = 760, c = 10; ~10,000 DIARIES WITH ESTIMATED
ERROR IN RANGE .61 - .80%

[The plotted OC curve is not reproduced; its tabulated values follow.]

    True error rate θ (%)    Probability of accepting (%)
          0.00                        100.0
          0.25                        100.0
          0.50                         99.0
          0.75                         97.0
          1.00                         85.0
          1.25                         65.0
          1.50                         42.0
          1.75                         22.0
          2.00                         10.0

    AOQ = 0.79

Figure 2 - Operating Characteristics Curve
Reproduced by permission of John Wiley & Sons, Inc. and Copyright
(1959) Bell Telephone Laboratories from Sampling Inspection Tables
Single and Double Sampling, 2nd Edition, by Harold F. Dodge and
Harry G. Romig.
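The tabulated curve can be reproduced from the acceptance probability of the plan (a minimal sketch in illustrative Python, assuming the binomial approximation to sampling without replacement):

    from math import comb

    def prob_accept(n, c, p):
        """P(accept lot) = P(at most c errors among n sampled records)."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

    # Plan from the text: n = 760, c = 10. At the 2.0% LTPD a file that
    # bad is accepted only about 10% of the time, i.e., rejected about 90%.
    for pct in (0.5, 1.0, 1.5, 2.0):
        print(pct, round(100 * prob_accept(760, 10, pct / 100), 1))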
-------
REFERENCE
1. Dodge, Harold F. and Romig, Harry G., Sampling Inspection Tables Single and
Double Sampling, 2nd Edition, John Wiley and Sons, Inc., New York, 1959.
38
-------
USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
RATIO TEST FOR DATA VALIDATION
by
Tyler Hartwell
Research Triangle Institute
Research Triangle Park, North Carolina 27709
39
-------
USE OF SUCCESSIVE TIME DIFFERENCES AND DIXON
RATIO TEST FOR DATA VALIDATION
Tyler Hartwell*
ABSTRACT
This paper describes preliminary work on two statistical data
editing procedures designed to flag suspect minute and hourly data from
the Regional Air Pollution Study (RAPS) computer data bank which contains
data from the Regional Air Monitoring System (RAMS) network of monitor-
ing stations in St. Louis, Missouri. In particular, the data editing
procedures are: (i) an intraparameter check where the differences of
successive minute averages for a given variable and station are evaluated,
and (ii) an intraparameter check where hourly averages for a given hour
and variable are compared across the RAMS network or across a selected
subset of stations by use of the Dixon ratio. The paper describes how
the procedures were developed for their current application and gives
results of applying the procedures to actual data on the RAPS data bank.
In addition, suggestions for future research on the two procedures are
presented. It is concluded that at the present time the two data edit-
ing procedures should be useful to EPA in flagging suspect minute and
hourly data from the RAPS data bank.
* Dr. Hartwell is a senior statistician, Statistical Methodology and
Analysis Center, Research Triangle Institute, Research Triangle Park,
North Carolina 27709.
40
-------
I. INTRODUCTION
The RAMS network of 25 monitoring stations in and around St. Louis,
Missouri collects data on a large number of pollutant (e.g., O3, CO,
THC, CH4, NO, NOx, SO2, TS, H2S) and meteorological variables (e.g.,
wind speed, wind direction, temperature, dew point, delta temperature,
barometric pressure). Figure 1 presents a map of the location of the 25
RAMS stations. The figure indicates that the urban stations (nos. 101-
108) may be as much as 8 miles apart while the rural stations (e.g.,
nos. 122-125) may be as much as 35 miles apart. The RAPS Data Bank
contains data from the RAMS network of stations.
The purpose of the two statistical data editing rules (i.e., minute
successive differences and the Dixon ratio) examined in this paper is
only to flag suspect RAPS data, not to delete it from the data bank.
That is, because of the vast amount of data collected by the RAMS
network, data editing rules are needed to limit the amount of suspect
data that meteorologists and atmospheric chemists need to examine in
detail. Thus, the purpose of this paper is to examine two data editing
rules that indicate data that should be examined in more detail by EPA
personnel who have an intimate knowledge of the data that the RAMS
network collects.
In addition, it is important to note that the work presented
here is only preliminary. Because of the complexity of trying to obtain
data editing rules that apply to a large network of monitoring stations,
additional work needs to be done on refining the two rules. However, at
this point in time, it is felt that the two data editing rules presented
should prove to be useful in flagging suspect data from the RAMS network.
41
-------
[Figure 1. Map of the locations of the 25 RAMS monitoring stations in and around St. Louis, Missouri (not legible in this copy)]
42
-------
II. MINUTE SUCCESSIVE DIFFERENCES
The RAMS data received at Research Triangle Park, North Carolina,
contain minute data on several air pollution and meteorological vari-
ables. Several computerized range validation checks are performed on
this data by the prime contractor, prior to forwarding it to the RAPS
Data Bank. The RAPS Data Bank was interested in determining if a
statistical procedure could be used in further validation of the data,
to flag minute data values which appeared to be outliers. In particular,
there was a need to develop and evaluate a procedure (i) which could be
applied to each station's data for one variable at a time and (ii) was
easy to compute and only required one pass through the data. Accord-
ingly, this study was limited to a simple statistical procedure that
required little computation. After discussions between EPA and RTI
staff members, it was decided to examine a statistical data editing rule
based on minute successive differences.
In general, the editing rule examined is designed to flag minute
values which are relatively much higher or lower than the preceding
minute value; i.e.,
[sketch: variable level plotted against time in minutes, with a single spike marked as the flagged value]
Thus, the editing rule is designed to detect large spikes in the minute
values of a variable at a station.
43
-------
In particular, the data editing rule is the following: at a par-
ticular station compute successive differences between minute values of
a particular variable and if a successive difference is "too large" then
flag this value. This rule is extremely simple to apply and requires
only one pass through the data base.
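This one-pass rule can be sketched as follows (an illustrative Python sketch; the limit is the ±4 s.d. value developed below and tabulated in Table 3, and the data are hypothetical):

    def flag_successive(minute_values, limit):
        """Flag minute i when |x[i] - x[i-1]| exceeds the limit for
        this variable; one pass, one station, one variable."""
        return [i for i in range(1, len(minute_values))
                if abs(minute_values[i] - minute_values[i - 1]) > limit]

    # e.g., ozone with the Table 3 limit of 0.0096 ppm
    print(flag_successive([0.031, 0.030, 0.052, 0.031], 0.0096))  # -> [2, 3]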
In order to determine when a successive difference was "too large",
the (i) distributions, (ii) sample means, and (iii) sample standard
deviations (s.d.) of minute successive differences for several stations,
times of the day, and air pollution and meteorological variables from
the RAMS network were examined. For example, Figures 2, 3, and 4 pre-
sent three of these distributions for the variables windspeed, ozone,
and NO2. In all, over 200 of these minute successive difference plots
were examined.
After examining these distributions and the corresponding sample
means and standard deviations in detail, it appeared reasonable to
assume that in general the minute successive differences were approxi-
mately normally distributed with a mean of zero. However, it was also
clear that the standard deviation of minute successive differences was
not constant over stations, times of the day, seasons of the year, and
pollutant or meteorological variables. For example, Table 1 presents
s.d.s of minute successive differences for CO and methane by the factors
time of the day (0-4 a.m., 4-8 a.m. and 8-12 a.m.), season of the year,
and two rural and two urban stations. It is obvious from the table that
the s.d.s vary a great deal over the various factors.
Accordingly, it was decided to assume that the distribution of
minute successive differences for variables in the RAMS network was
normally distributed with a mean of zero and a standard deviation that
44
-------
FIGURE 2
DISTRIBUTION OF WINDSPEED MINUTE SUCCESSIVE DIFFERENCES
[histogram not legible in this copy]
45
-------
FIGURE 3
DISTRIBUTION OF OZONE MINUTE SUCCESSIVE DIFFERENCES
FOR STATION 122; DAY 180, 1976; TIME 0 TO 4 A.M.
[histogram not reproduced; mean illegible, STDEV = 0.00084, SAMPLE SIZE = 236]
46
-------
FIGURE 4
DISTRIBUTION OF NO2 MINUTE SUCCESSIVE DIFFERENCES
FOR STATION 122; DAY 180, 1976; TIME 4 TO 8 A.M.
[histogram not legible in this copy]
-------
TABLE 1
S.D.'S OF MINUTE SUCCESSIVE DIFFERENCES FOR CO AND METHANE BY TIME OF DAY
(0-4 A.M., 4-8 A.M., AND 8-12 A.M.), SEASON OF THE YEAR, AND STATION
(TWO RURAL AND TWO URBAN STATIONS)
[table values not legible in this copy]
48
-------
may vary over time of the day, season, and type of station (urban or
rural). This implied that a minute successive difference would be
flagged when it was greater than a function of the appropriate standard
deviation (e.g., a standard procedure for detecting outliers for the
normal distribution with mean zero is to flag observations which are
greater (or less) than 4 s.d.s). The probability of an observation
being greater (or less) than 4 s.d.s for the normal distribution is less
than .0001. Thus, the problem reduced to determining for each variable
of interest the appropriate standard deviation which might depend on
time of the day, season of the year, and type of station.
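That tail probability is easy to confirm (a one-line check in illustrative Python):

    from math import erf, sqrt

    # two-sided probability of exceeding 4 standard deviations for a
    # standard normal variate: about 6.3e-05, i.e., less than .0001
    p = 2 * (1 - 0.5 * (1 + erf(4 / sqrt(2))))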
To examine this problem, standard deviations of minute successive
differences for each variable of interest for approximately 4 days per
season in 1976, 4 stations (2 urban and 2 rural), and 3 times of the day
(0-4 a.m., 4-8, 8-12) were computed. Thus for each variable between 100
and 192 (4 days x 4 seasons x 4 stations x 3 times = 192) standard
deviations were computed. A standard deviation was only computed if at
least 60 minute successive differences were available during the 4-hour
time period being considered. The results of some of these computations
are summarized in Table 2 for 10 of the variables measured in the RAMS
network. These 10 variables were chosen not only because there was
interest in editing their minute values but also because sufficient data
was available on them for the days selected for computing standard
deviations. The s.d.s presented in Table 2 are average s.d.s (i.e.,
averaged over several stations and days).
Using the s.d.s in Table 2 a statistical technique referred to as
the analysis of variance (ANOVA) was used to test if these average
standard deviations were significantly different (in a statistical
49
-------
TABLE 2
SUMMARY OF AVERAGE S.D.'S OF MINUTE SUCCESSIVE DIFFERENCES BY STATION,
TIME OF DAY, AND SEASON OF THE YEAR FOR 10 RAMS VARIABLES:
WINDSPEED (METERS/SEC.), TEMPERATURE (°C), OZONE (PPM), CO (PPM),
METHANE (PPM), THC (PPM), NO (PPM), NOX (PPM), TOTAL SULFUR (PPM), AND SO2 (PPM)
[individual values not legible in this copy; the text cites, for example,
average s.d.'s for ozone of .0020 (urban stations), .0012 (rural stations),
.0024 (summer), .0007 (winter), and .0016 overall]
50
-------
sense). In particular, a separate ANOVA was carried out for each of the
10 variables. In each of the 10 ANOVAs, statistical tests were used to
determine if average standard deviations by time of day, type of station,
and season of the year were significantly different. The results of
running the 10 ANOVAs indicated that in the majority of cases the average
s.d.s in Table 2 were significantly different. For example, in column 3
of the table (i.e., ozone by station) the average s.d.s of minute suc-
cessive differences for ozone for two urban stations was .0020 and two
rural stations was .0012. The test of significance of these two averages
was significantly different at the .01 level. Note that Table 2 also
presents the average s.d. for each variable over time of day, season of
the year, and type of station (e.g., .0016 for ozone).
The average standard deviations in Table 2 clearly indicate that
from a statistical point of view the s.d.s of minute successive dif-
ferences are significantly different for several of the variables for
one or more of the factors examined (i.e., station type, time of day,
and season of the year). Thus, to be strictly correct (in a statistical
sense) in applying the data editing rule based on minute successive
differences, it would be necessary to base the rule on varying s.d.s by
season of the year, etc. (i.e., the rule would be ±4 s.d.s where the
s.d.s are given in Table 2).
Due to the fact that the above data editing rule might prove to be
somewhat confusing, a more conservative and easier to program rule has
been initially examined (of course, only actual application of a data
editing rule can determine its practical usefulness). This rule is
based upon using, for all minute successive differences for each vari-
able in Table 2, ±4 times the largest average s.d. across station type,
51
-------
time of day, and season of the year. Thus, for ozone the rule would be
based on ±4 times the s.d. = .0024 (i.e., the average s.d. for summer).
This rule is extremely easy to apply requiring only one value to be
exceeded by each minute successive difference of a variable regardless
of station, time of day, or season. Of course, it is conservative in
the sense of having a limit which is somewhat high in many cases (e.g.,
for ozone in winter a more exact rule would be based on a s.d. = .0007).
Accordingly, using the largest average s.d. for each variable in
Table 2 and basing the data editing rule on ±4 s.d. limits, Table 3
gives possible limits for flagging minute successive differences by
variable for the RAMS network of stations. In deriving the limits in
Table 3, it was noted for the RAMS network that CO, methane, and THC
were only measured every 5 minutes and total sulfur (with the exception
of Station 117) and SO2 were only measured every 3 minutes. Therefore,
the s.d.s given to RTI based on minute successive differences of these
variables were underestimates for detecting spikes at five (or three)
minute intervals (i.e., the average s.d.s in Table 2 are too small for
these five variables). In an attempt to compensate for this under-
estimate, Table 3 includes an adjustment factor for these five vari-
ables. The adjustment factor multiplies the ±4 s.d. limits for CO,
methane, and THC by √5 and the ±4 s.d. limits for total sulfur and SO2
by √3. These factors were derived by assuming that on the RAMS data
file the minute successive differences for CO, methane, and THC are zero
except for every 5th minute (and, similarly, every 3rd minute for total
sulfur and SO2).
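The factor follows from the variance: if only every 5th recorded difference is a real 5-minute change and the rest are zero, the s.d. computed over all minutes understates the 5-minute s.d. by a factor of √5. As an arithmetic check (working backward from Table 3 rather than quoting a Table 2 value):

    limit(CO) = ±4 × s.d. × √5 = ±1.97 ppm,  so  s.d. = 1.97 / (4√5) ≈ 0.22 ppm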
Using the limits given in Table 3, RTI then examined the percentage
of minute successive differences that would be flagged for Stations 101
and 122 for 8 days in 1976 for 10 RAMS variables. The results of these
52
-------
TABLE 3
POSSIBLE MINUTE SUCCESSIVE DIFFERENCE LIMITS
ON 10 RAMS VARIABLES 1/

    VARIABLE                     LIMIT
    WINDSPEED (METERS/SEC.)      ±3.0
    TEMPERATURE (°C)             ±.660
    OZONE (PPM)                  ±.0096
    CO (PPM)                     ±1.97
    METHANE (PPM)                ±.316
    THC (PPM)                    ±[illegible]
    NO (PPM)                     ±.028
    NOX (PPM)                    ±.035
    TOTAL SULFUR (PPM)           ±.022
    SO2 (PPM)                    ±.015

1/ BASED ON ±4 STANDARD DEVIATION LIMITS. IN ADDITION,
FOR CO, METHANE, AND THC, THE ±4 S.D. LIMITS HAVE BEEN
MULTIPLIED BY √5 TO ADJUST FOR THE FACT THAT THESE
VARIABLES ARE ONLY MEASURED EVERY 5 MINUTES. SIMILARLY,
FOR TOTAL SULFUR AND SO2 THE ±4 S.D. LIMITS HAVE BEEN
MULTIPLIED BY √3 SINCE THESE VARIABLES ARE ONLY MEASURED
EVERY 3 MINUTES.
53
-------
computations are given in Table 4. The table shows that except for
ozone the percent flagged per variable was less than .6 percent. For
ozone it appears that entirely too many minute differences were flagged.
For example, Figure 5 presents a plot over two days of minute values of
ozone for Station 105. Examination of the figure indicates that a
relatively large percentage of the minute values would be flagged using
the limits given in Table 3, although EPA personnel have indicated that
the data in Figure 5 are not atypical. In addition, discussions with EPA
personnel have indicated that in the RAMS network, five of the stations
(101, 104, 105, 107, and 115) are heavily affected by traffic. Thus, it
may be necessary for these stations to have higher limits for flagging
minute values for ozone than those given in Table 3. Also, it has been
suggested that for these traffic-affected stations it may be necessary
to examine both ozone and NOx minute values simultaneously before flagging
ozone values (e.g., if a minute ozone value jumps significantly from one
minute to the next but the NOx reading does not jump, then and only then
should the ozone value be flagged).
In addition to the results given in Table 4, the percentage of
minute successive differences that would be flagged for Station 105 for
8 days in 1976 using the limits given in Table 3 were examined. The
results were the following:
Percentage Flagged for Station 105

    Variable    Percent flagged
    WS                .04
    Temp.             .12
    O3               7.9
    CO                .56
    CH4               .41
    THC               .90
    NO               1.5
    NOx              1.4
    TS               1.5
    SO2              1.8
54
-------
The above table clearly indicates that additional work needs to be done
for ozone since entirely too many values are being flagged. In addition,
for the other pollutant variables many more values are being flagged
than in Table 4 (Stations 101 and 122 combined). Thus, it would seem
that the variables at Station 105 are probably being affected by auto-
mobile traffic.
Accordingly, at the present time it appears that the minute suc-
cessive differences limits in Table 3 except for ozone are probably
reasonable for a majority of the RAMS monitoring stations. However, for
Stations 101, 104, 105, 107, and 115 which are heavily affected by
traffic, additional work needs to be done on the limits for the pollutant
variables (the limits for windspeed and temperature appear reasonable
for all stations). Of course, since only Stations 101, 104, 105, 116,
117, and 122 were examined in this analysis to date, it may be that
additional work is needed for other specific stations. For Stations 101,
104, 105, 107, and 115, wider minute successive difference limits should
be examined for the pollutant variables, particularly for ozone. As
mentioned previously, it may be that for ozone simple minute successive
difference limits are impractical (i.e., for ozone at the traffic-
affected stations it may be necessary to have a minute data editing rule
which is tied to another pollutant such as NOx).
Another refinement of the limits given in Table 3 that might be
examined is to have them vary by time (see Table 2). For example, for
ozone the s.d. of minute successive differences is much higher for 8-
12 a.m. than from 0-8 a.m.
Finally, before proceeding, Figures 6 and 7 present two plots of
minute values for CO from the RAPS Data Bank for Stations 101 and 105,
55
-------
respectively. These plots indicate minute CO values which were flagged
by the CO limits given in Table 3. The plots indicate that the data
editing rule based on minute successive differences may be quite useful
in detecting minute outliers for CO.
56
-------
TABLE 4
PERCENTAGE OF MINUTE SUCCESSIVE DIFFERENCES FLAGGED FOR
STATIONS 101 AND 122 FOR 8 DAYS IN 1976, BY VARIABLE 1/

    VARIABLE       PERCENT FLAGGED
    WINDSPEED            .53
    TEMPERATURE          .17
    OZONE               3.0
    CO                   .06
    CH4                  .11
    THC                  .24
    NO                   .12
    NOX                  .12
    TS                   .51
    SO2                  .34

1/ TOTAL NUMBER OF MINUTE SUCCESSIVE DIFFERENCES COMPUTED
= 23,040 (8 DAYS × 2 STATIONS × 1,440 MINUTES/DAY).
THE DAYS WERE 1, 2, 96, 97, 231, 232, 286, AND 287.
57
-------
[Figure 5. Ozone (ppm) versus time by minute for Station 105 over two days (plot not legible in this copy)]
58
-------
[Figure 6. CO (ppm) versus time by minute for Station 101, with values flagged by the Table 3 CO limits marked (plot not legible in this copy)]
59
-------
[Figure 7. CO versus time by minute for days 2 and 3 in 1976, Station 105 (RAMS retrieval plot using default labels); flagged minute CO values are marked (plot not legible in this copy)]
60
-------
III. DIXON RATIO
This study also examined the possibility of flagging hourly data
across the RAMS network by use of the Dixon Ratio. That is, for a
particular variable and hour of the day this ratio will flag stations
whose hourly averages are "too high" or "too low" as compared with the
other stations in the network for that hour and variable. The Dixon
criterion was examined as a potential validation procedure for hourly
averages because it was a simple procedure which was easy to compute and
only required one pass through the data.
In brief the Dixon criterion is the following:
For a particular hour and variable (e.g., O3) rank the N
hourly values X_i over stations such that X_1 ≤ X_2 ≤ ... ≤ X_N. Then
compute the criterion (if N is between 11 and 13)

    R_H = (X_N − X_{N−2}) / (X_N − X_2)      (to check largest hourly value)
    R_L = (X_3 − X_1) / (X_{N−1} − X_1)      (to check smallest hourly value)   (1)

If R_H (or R_L) is too large, reject the largest (smallest) hourly value
(e.g., if the number of stations = 12 and R_H > .642, reject X_N at the
.01 level of significance under normal distribution assumptions).
Note: see [1] for the Dixon criterion when N is less than 11 or
greater than 13.
In general, the Dixon criterion is designed to reject the following
type of station hourly values:
[1] Dixon, W. J., "Processing Data for Outliers," Biometrics, Vol. 9
(1953), p. 74.
61
-------
[sketch: ranked station hourly values with a single outlying high value marked as flagged]
Thus, the criterion is designed to flag hourly values which contribute a
relatively large percentage of the range of the hourly values across the
network. Note that the criterion as given in (1) only flags one high
and one low hourly value, not multiple hourly values. For the
present study RTI did not examine the flagging of multiple hourly values
for a particular hour of the day.
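The high-side check can be sketched as follows (an illustrative Python sketch of the form in Equation (1); the example values are hypothetical):

    def dixon_r21_high(values):
        """Dixon ratio R_H = (X_N - X_{N-2}) / (X_N - X_2),
        for 11 <= N <= 13 station values."""
        x = sorted(values)
        return (x[-1] - x[-3]) / (x[-1] - x[1])

    # hourly ozone (ppm) across 12 stations; reject the maximum at the
    # .01 level if the ratio exceeds .642
    hourly = [0.031, 0.028, 0.035, 0.030, 0.029, 0.033,
              0.032, 0.027, 0.031, 0.030, 0.034, 0.163]
    if dixon_r21_high(hourly) > 0.642:
        print("flag the highest station value")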
The Dixon criterion given in Equation (1) was first applied across
the entire RAMS network of 25 stations. The results of these calcula-
tions indicated that entirely too many hourly values were being flagged
for several of the RAMS variables (e.g., ozone, CO, and NO2). Accord-
ingly, the reason so many hourly values were being flagged was deter-
mined by examining the sample means of several RAMS variables for urban,
residential, and rural stations in the RAMS network. Table 5 gives a
summary, for six RAMS variables, of some of these computations. The
table gives the sample means and standard deviations for the three types
of stations by season of the year. In general the table shows that the
sample means of hourly values for a particular variable are not the same
for urban, residential, and rural stations in the RAMS network. (This
is particularly true of the pollutant variables; whereas, for the
meteorological variables the means are much more similar across the
network.) Statistical tests of these station type means were found to
be significantly different in several cases. Thus, it became clear that
one of the underlying assumptions of applying the Dixon criterion was
being violated; namely, that the hourly station values come from a
62
-------
TABLE 5
HOURLY SAMPLE MEANS AND STANDARD DEVIATIONS FOR URBAN, RESIDENTIAL, AND RURAL
STATIONS 1/ IN THE RAMS NETWORK BY SEASON OF THE YEAR AND VARIABLE 2/

                             URBAN               RESIDENTIAL             RURAL
VARIABLE       SEASON    MEAN    STD.DEV.     MEAN    STD.DEV.     MEAN    STD.DEV.
O3             SPRING    .030     .013        .034     .013        .041     .015
(PPM)          SUMMER    .052     .023        .055     .018        .063     .021
               FALL      .017     .021        .018     .011        .025     .013
               WINTER    .008     .006        .009     .006        .016     .008
CO             SPRING    .699     .756        .634     .670        .333     .474
(PPM)          SUMMER    .999     .917       1.072     .837        .329     .638
               FALL      .837     .987        .793    1.092        .249     .307
               WINTER    .451     .526        .418     .327        .215     .163
NO2            SPRING    .030     .017        .025     .016        .011     .012
(PPM)          SUMMER    .046     .087        .047     .094        .011     .010
               FALL      .026     .015        .023     .012        .011     .010
               WINTER    .024     .012        .017     .010        .010     .008
CH4            SPRING   1.781     .257       1.604     .202       1.480     .294
(PPM)          SUMMER   2.068     .486       1.905     .279       1.778     .297
               FALL     1.834     .391       1.787     .244       1.621     .195
               WINTER   1.768     .275       1.436     .235       1.604     .210
WINDSPEED      SPRING   4.122    1.775       3.928    1.525       4.020    1.839
(METERS/SEC.)  SUMMER   2.799    1.082       2.710      --        2.249     .898
               FALL     4.518    1.385       4.122      --        3.967    1.483
               WINTER   5.255    1.656       4.871      --        5.277    2.097
TEMPERATURE    SPRING  13.993    4.337      13.864      --       13.367    4.408
(°C)           SUMMER  26.474    1.879      25.767      --       25.497    2.026
               FALL    11.924    6.271      11.277      --       10.786    6.335
               WINTER  -5.180    6.648      -4.612      --       -5.535    6.722

1/ URBAN = STATIONS 101 TO 108; RESIDENTIAL = STATIONS 111 TO 113, 119,
AND 120; RURAL = STATIONS 109, 110, 114 TO 118, 121 TO 125.
2/ MEANS AND STANDARD DEVIATIONS ARE BASED UPON 4 HOURS PER DAY FOR 5 DAYS
OVER THE VARIOUS STATIONS (URBAN, RESIDENTIAL, OR RURAL).
(-- = value not legible in this copy.)
63
-------
normal distribution with the same mean and variance. Instead, for
example, the hourly station values for ozone have different means for
urban and rural stations (the means are higher for rural stations). The
consequence of groups of stations having different means is illustrated
below:
[sketch: station hourly values grouped as urban and rural, with the rural group centered higher than the urban group]
The above figure shows, using the Dixon criterion, that some rural
stations may be flagged simply because their means are always higher
than the means for urban stations. Accordingly, after examining sample
means such as those presented in Table 5, RTI decided to apply the Dixon
criterion separately to rural stations (Stations 109, 110, 114 to 118,
121 to 125) and urban-residential stations (101-108, 111 to 113, 119,
and 120).
After applying the Dixon criterion to the two types of stations
separately, RTI then examined the results and again found that too many
hourly values were being flagged. Accordingly, after discussions with
meteorologists and air chemists it was decided that the Dixon criterion
for certain pollutant variables should only be applied across the rural
or urban-residential stations if the following criteria were met:
(i) the high station value > twice the low station value, and
(ii) the high station value > some constant (e.g., constant = .03 ppm
for ozone).
The first criterion simply reflects that a factor of 2 between hourly averages
across the network is not uncommon. Criterion (ii) limits the applica-
tion of the Dixon ratio to situations where most of the measurements are
64
-------
well above minimum detectable. In addition, it was decided that the
Dixon criterion could not be used for hourly NO, NOx, TS, and SO2
values. Furthermore, it was felt that the use of the Dixon criterion
for CO was questionable due to the heavy influence of traffic on this
variable.
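The guarded application for a single variable can be sketched as follows (an illustrative Python sketch reusing dixon_r21_high from the sketch above; the constants are the ozone example from the text):

    def flag_high_ozone_hour(hourly_values):
        """Apply the high-side Dixon check only when criteria (i) and (ii) hold."""
        hi, lo = max(hourly_values), min(hourly_values)
        if hi > 2 * lo and hi > 0.03:                   # criteria (i) and (ii)
            return dixon_r21_high(hourly_values) > 0.7  # flagging level for R_H
        return False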
With the above restrictions, the Dixon criterion for detecting high
hourly values only (i.e., R_H in Equation (1)) was then applied to 7
variables in the RAPS Data Bank for both urban-residential and rural
stations. In applying the rule an hourly value was flagged if R_H was
greater than .7 (except for dew point, where R_H > .6 was flagged). The
results of these computations are presented in Tables 6 and 7. In
addition, Tables 8 and 9 present examples of flagged hourly values for
several different RAMS variables. Tables 6 and 7 indicate that the
percent flagged is usually 5% or less. Thus, the Dixon rule as applied
does not seem to be impractical. In addition, the examples given in
Tables 8 and 9 clearly indicate hourly values for several variables on
the data bank that should be examined in more detail by knowledgeable
meteorologists and air chemists.
Accordingly, as with minute successive differences, RTI feels that
the Dixon criterion will be useful in flagging hourly data across the
RAMS network. However, further refinement of the rule may be required.
For example, two points that need further examination are:
(i) can the rule be applied to flagging low hourly values (R_L in
Equation (1)), and
65
-------
(ii) can the rule be applied in practice to the stations which are
heavily influenced by traffic (101, 104, 105, 107, and 115),
particularly for the variable, CO? An alternative here would
be to only use the Dixon rule for the 20 stations not heavily
influenced by traffic.
66
-------
TABLE 6
RESULTS OF APPLYING DIXON RATIO TO 12 URBAN STATIONS
IN RAMS NETWORK 1/2/3/

    VARIABLE       PERCENT OF TIME RATIO > .7    NUMBER FLAGGED
    OZONE                    1.0                        5
    CO                       2.9                       14
    CH4                      2.1                       10
    THC                      5.2                       25
    TEMPERATURE              2.7                       13
    DEW POINT 4/             4.4                       21
    WINDSPEED                4.0                       19

1/ RATIO APPLIED TO ALL 24 HOURS ON 20 DIFFERENT DAYS FOR THE
VARIOUS POLLUTANT AND METEOROLOGICAL VARIABLES (= 480 RATIOS
FOR EACH VARIABLE).
2/ RATIO ONLY USED TO FLAG HIGH VALUES.
3/ FOR OZONE, RATIO APPLIED ONLY IF HIGH STATION VALUE > .03 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CO, RATIO APPLIED ONLY IF HIGH STATION VALUE > 3.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CH4, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR THC, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR WINDSPEED, RATIO APPLIED ONLY IF HIGH STATION
VALUE > 3.0 METERS/SEC.
4/ FOR DEW POINT, PERCENT OF TIME RATIO > .6.
67
-------
TABLE 7
RESULTS OF APPLYING DIXON RATIO TO 13 RURAL
STATIONS IN RAMS NETWORK 1/2/3/

    VARIABLE       PERCENT OF TIME RATIO > .7    NUMBER FLAGGED
    OZONE                    3.8                       18
    CO                       6.0                       29
    CH4                      4.2                       20
    THC                      5.0                       24
    TEMPERATURE              8.5 5/                    41
    DEW POINT 4/             6.5                       31
    WINDSPEED                4.4                       21

1/ RATIO APPLIED TO ALL 24 HOURS ON 20 DIFFERENT DAYS FOR THE
VARIOUS POLLUTANT AND METEOROLOGICAL VARIABLES (= 480 RATIOS
FOR EACH VARIABLE).
2/ RATIO ONLY USED TO FLAG HIGH VALUES.
3/ FOR OZONE, RATIO APPLIED ONLY IF HIGH STATION VALUE > .03 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CO, RATIO APPLIED ONLY IF HIGH STATION VALUE > 3.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR CH4, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR THC, RATIO APPLIED ONLY IF HIGH STATION VALUE > 2.0 PPM
AND HIGH STATION VALUE > 2 × LOW STATION VALUE.
FOR WINDSPEED, RATIO APPLIED ONLY IF HIGH STATION
VALUE > 3.0 METERS/SEC.
4/ FOR DEW POINT, PERCENT OF TIME RATIO > .6.
5/ FOR TEMPERATURE THE DIXON RATIO FLAGGED EVERY HOURLY VALUE FOR
TWO CONSECUTIVE DAYS IN WINTER WHERE ONE STATION IN THE RURAL
NETWORK READ APPROXIMATELY 8°C AND ALL OTHER STATIONS READ LESS
THAN -2°C; SEE TABLE 9. (THIS COULD PERHAPS BE DUE TO A SIGN
MISTAKE AT THE HIGH STATION.)
68
-------
TABLE 8
EXAMPLES OF FLAGGED VALUES FOR SEVERAL VARIABLES
USING DIXON RATIO ON 12 URBAN STATIONS IN THE RAPS DATA BANK 1/

                        DIXON                     STATION VALUES
VARIABLE                RATIO   HIGHEST    2 HI    3 HI    3 LO    2 LO   LOWEST
WINDSPEED               .895      15.7      4.5     3.6     2.4     2.1     2.0
(METERS/SEC.)           .821      14.4      4.3     4.2     2.6     1.9     1.7
                        .776       6.4      2.7     2.6     1.5     1.5     1.3
TEMPERATURE             .766       6.5      3.2     1.0    -.30    -.65    -1.6
(°C)                    .768      32.1     29.2    28.4    27.3    27.2    27.0
                        .769      18.7     15.9    15.2    14.2    14.2    13.9
DEW POINT               .724      -2.8     -6.2    -9.6   -11.8   -12.1   -13.8
                        .723      15.7      9.6     7.0     4.0     3.7    -2.2
                        .889      18.4      7.1    -6.5    -8.7    -9.6   -10.1
OZONE                   .853      .163     .037    .033    .012    .011    .003
(PPM)                   .755      .133     .100    .089    .080    .075    .002
                        .797      .041     .040    .010    .002    .002    .002
CO                      .755      3.81     1.41    1.04     .16     .14     .13
(PPM)                   .996      43.5      .51     .24     .07     .06     .05
                        .932       4.2      .42     .37     .13     .09     .07
CH4                     .789       4.2      1.8     1.7     1.3     1.0     .06
(PPM)                   .941       5.6      1.8     1.8     1.6     1.5     1.4
                        .931       4.1      1.7     1.7     1.5     1.5     1.4
THC                     .743      3.10     1.97    1.97    1.63    1.58    1.40
(PPM)                   .734      4.19     1.91    1.78    1.28     .91     .05
                        .924      4.31     1.80    1.59    1.50    1.36    1.25

1/ RATIO ONLY USED TO FLAG HIGH VALUES.
69
-------
TABLE 9
EXAMPLES OF FLAGGED VALUES FOR SEVERAL VARIABLES
USING DIXON RATIO ON 13 RURAL STATIONS IN THE RAPS DATA BANK 1/

                        DIXON                     STATION VALUES
VARIABLE                RATIO   HIGHEST    2 HI    3 HI    3 LO    2 LO   LOWEST
WINDSPEED               .815       6.4      3.9     3.8     3.3     3.2     3.1
(METERS/SEC.)           .808       4.1      2.2     1.5     .95     .84     .49
                        .772       5.5      3.1     3.1     2.7     2.3     1.9
TEMPERATURE             .940       8.0     -3.1    -3.6    -4.3    -4.4    -4.9
(°C)                    .957       8.5    -11.4   -11.5   -12.3   -12.4   -12.6
                        .921       6.1     -7.0    -7.2    -8.3    -8.4    -8.5
DEW POINT               .845      31.4      6.7     .44    -5.1    -5.2    -7.8
                        .782      18.1      5.0     4.6     2.1     .82    -2.9
                        .701      33.2     17.6    17.3    13.4    10.6     6.9
OZONE                   .933      .206     .018    .016    .006    .002    .002
(PPM)                   .900      .252     .051    .048    .028    .026    .015
                        .771      .045     .015    .012    .003    .002    .002
CO                      .929       8.7      .96     .77     .21     .17     .12
(PPM)                   .973      28.9      1.7     .94     .43     .17     .17
                        .889       8.4      1.3     1.1     .28     .21     .14
CH4                     .944      10.1      2.4     2.3     1.9     1.8     1.7
(PPM)                   .873       7.6      3.9     2.7     2.1     2.0     1.9
                        .791       5.0      2.9     2.7     2.1     2.1     2.0
THC                     .842       4.2      2.3     2.3     2.0     1.9     1.7
(PPM)                   .904       3.9      2.4     1.9     1.7     1.7     1.3
                        .855       5.4      3.0     2.2     1.9     1.7     1.4

1/ RATIO ONLY USED TO FLAG HIGH VALUES.
70
-------
CLUSTER ANALYSIS AS A DATA VALIDATION
TECHNIQUE
by
Harold L. Crutcher
(Consultant)
35 Westall Avenue
Asheville, North Carolina 28804
71
-------
CLUSTER ANALYSIS AS A DATA VALIDATION TECHNIQUE
H.L. Crutcher
INTRODUCTION
In any study the collection, processing, and storage of data are fun-
damental. Contaminated, adulterated, or "noisy" data confuse the investi-
gator. Data do not necessarily fall into neat categories. Usually there
are mixtures. Some of these are determinant; some are not.
There are many techniques used to cluster and to classify data. This
paper discusses one. This technique separates mixed data sets into subsets.
Each subset will exhibit homogeneous characteristics. The investigator can
then assess the relative importance and the nature of the subsets.
Outlying subsets may indicate anomalous true conditions or may indicate mal-
functioning of some part of the observational program.* Thus some idea of
data quality may be obtained.
The techniques used here require the assumption of the normality of
distribution of the data. If the data are not normally distributed, then
some transformation to approximate normality should be made. The loga-
rithmic transformation is often used. Where this is known to be inapplica-
ble, then another transformation is needed. For example, cloud cover is not
well represented by the normal nor the log-normal distribution.
The clustering program discussed here was initially developed by Wolfe
(1) and modified by Crutcher and Joiner (2). It will accept any input data.
However, the criteria selected by the user to enable the computer to make
decisions are based on the assumption of normality of distribution. Any de-
parture from this assumption introduces some uncertainty in the results.
In particular, the outlying subsets may be examined for their validity.
The minimum number in a set is determined by the number of elements being
examined simultaneously. With five elements, the minimum subset will be
one more than five, or six.
72
-------
There is always enough uncertainty without introducing more. For example,
although the 0.05 probability level is selected for decision, departure from
normality may actually cause the decision to be made at some other level, but
this level will never be known.
ESSENTIAL PHILOSOPHY
Many elements and many observations may be treated. Computer capacity,
time, and money will be the controlling factors. Within these constraints
the user may wish to randomly select a representative sample of the data for
processing.
Most investigators choose to standardize their data. This produces
dimensionless numbers with means of zero and a variance of one for each el-
ement. A mean of zero and a variance equal to the square root of n, the
number of variables, are obtained in the multivariate case.
If the elements are uncorrelated and are homogeneous, a spherical clus-
ter of data points is obtained. If the data are correlated, the original
element axes are rotated so that along the new axes obtained, the new com-
ponents are not correlated. The new system will then be spherical in shape
if the data are homogeneous.
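These two steps can be sketched as follows (an illustrative Python sketch; the rotation here uses the eigenvectors of the correlation matrix, one common way to realize the axis rotation described above):

    import numpy as np

    def standardize(X):
        """Column-wise z-scores: each element gets mean 0 and variance 1
        (X is an observations-by-elements array)."""
        X = np.asarray(X, dtype=float)
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    def decorrelate(Z):
        """Rotate standardized data onto axes along which the new
        components are uncorrelated."""
        _, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
        return Z @ vecs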
CLUSTERING
If the data are clustered, even though the data are standardized, tests
of normality will be rejected. Therefore, the usual procedure is to cluster
data into probable groups. Then null hypotheses are established to compare
two groups against one, three against two, and so on until the null hypothesis
is not rejected.
Initial
The computer program may be set to establish any number of initial clus-
ters. Here, the first 40 entry data serve to establish 40 clusters, but
arbitrary clusters could have been inserted.
The number of elements is n so each datum is an n-vector with its point
in n-space. The 40 clusters represent 40 centroidal points in n-space. The
distances between the centroids are computed. The two closest are merged to
a new centroid which is a mean or average of the two. After merging, there
73
-------
are now 39 clusters. A new datum enters from storage to again fill out the
40 spaces reserved. This procedure is repeated over and over again until all
data have entered and have been assigned to one of the clusters. Variance
considerations or other distance measurements as well as distances between
the centroids can be used.
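The initial pass can be sketched as follows (an illustrative Python sketch; the Wolfe and Crutcher-Joiner program also supports variance-based merge criteria, which are omitted here):

    import numpy as np

    def initial_clusters(data, k=40):
        """Hold k centroids; repeatedly merge the two closest and admit
        the next datum into the freed slot until all data are assigned."""
        it = iter(data)
        cents = [np.asarray(next(it), dtype=float) for _ in range(k)]
        sizes = [1] * k
        for x in it:
            # find the closest pair of centroids
            i, j = min(((a, b) for a in range(k) for b in range(a + 1, k)),
                       key=lambda p: np.linalg.norm(cents[p[0]] - cents[p[1]]))
            # merge j into i as a size-weighted mean; slot j takes the new datum
            tot = sizes[i] + sizes[j]
            cents[i] = (sizes[i] * cents[i] + sizes[j] * cents[j]) / tot
            sizes[i] = tot
            cents[j] = np.asarray(x, dtype=float)
            sizes[j] = 1
        return cents, sizes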
Intermediate
Forty clusters were obtained initially. These forty clusters are com-
pared on an argument of the distance between centroids or on variance consid-
erations as before. The two most nearly alike are merged into one cluster.
The procedure continues until one final cluster remains.
Figure 1, taken from Figure 1 of Crutcher and Joiner (3), illustrates
in an abbreviated way the flow of 74 observations until final coalescence in-
to the final group. The final group is made up of the initial data but the
sequence is altered to show the entrance into the group. The data are 4-space
upper air observations at the Canton Island 30-mb surface (1960-1964). The
four elements of the subspace at the 30-mb surface are:
1. Height of the surface;
2. Temperature of the surface;
3. East-west wind component;
4. North-south wind component (orthogonal to east-west component).
Final
The user is required to provide the number of clusters wanted for review
and the probability level of rejection for the null hypothesis. The sequen-
tial tests are for k+1 groups versus k groups where k runs from 1 to 40. The
tests will continue until the null hypothesis is not rejected or until the
requested number of clusters have been examined.
Output
At the completion of the computer program, output is presented as:
1. The initial set of data in some sequence established by the user.
2. Matrix of observational setup and preliminary comparisons.
3. Forty groups (clusters) with identified input data and means.
4. Coalescence, step by step, into fewer and fewer groups until the
unlike final group is obtained.
5. Statistics with means and standard deviations for the main group
74
-------
[Figure 1. Schematic flow of the 74 observations through the initial 40 clusters and their stepwise coalescence into one final group, Canton Island 30-mb data (not legible in this copy)]
75
-------
and each subset. Each set comparison has the actual probability
level printed. An option in the program permits either the selec-
tion of the eigenvector-eigenvalue output or the correlation matrix
output.
6. Discriminant function scores for each datum.
7. In case the eigenvector-eigenvalue output is selected, computer
print-plots of discriminant scores are shown with appropriate
assignment of each datum to a cluster. The clusters are numbered.
8. The final printing shows the data in order of the input identified
by the cluster configuration assignment. This permits easy review
of the classification of each individual datum.
EXAMPLES
Table 1 is taken from Table 5, Crutcher and Joiner (3). The statistics
are for two clusters derived from a January data set, Canton Island 30 mb
surface data, 1953-67. There are 434 data. The set is separated into two
clusters which comprise 31 and 69 percent of the total set. The zonal com-
ponent of the wind speed, which is an average of -6.4 m/s, is separated into
two groups whose means are -28.1 m/s and 3.5 m/s.
Figure 2 is taken from Figure 3, Crutcher and Joiner (3). The figure
exemplifies the separation of the Canton Island January data in the 2-space
of the orthogonal components of the wind. The mean height and temperature
data are shown. The variances may be compared with data of Table 1.
APPLICATIONS
Data of any type may be examined to determine whether there are reason-
able subsets of homogeneous characteristics. As the standardization tech-
niques remove the dimensionality of the data, i.e., degrees, mph, meters,
grams, etc., any measurements may be used. Thus, application may be made to
environmental data ensembles which include measurements of elements such as
particulates, pollutants (gaseous), precipitation, wind, temperatures, pres-
sures, or changes of any of the above.
Pollutant source or likely deposition areas may be identified or sug-
gested. Extension from one observational point to several will permit
76
-------
TABLE 1. CANTON ISLAND, 30-MB DATA AND THEIR
SEPARATION INTO TWO CLUSTERS, 1953-67
                      January (N = 434)                   April (N = 509)
              Group 1    Group 2    Group 3       Group 1    Group 2    Group 3
              (total)    (easterly) (westerly)    (total)    (easterly) (westerly)
Data fraction   1.000      0.310      0.690         1.000      0.427      0.573
H (gpm)       23764.9    23720.6    23785.0       23801.3    23773.8    23821.9
sH (gpm)         94.9       65.6      100.0          81.8       71.0       84.5
T (°C)          -57.0      -57.4      -56.9         -54.9      -56.1      -54.0
sT (°C)           2.7        2.3        2.9           3.0        2.7        3.0
u (m/s)          -6.4      -28.1        3.5          -4.6      -20.9        7.6
su (m/s)         16.2        5.8        7.5          16.7       10.7        7.6
v (m/s)           0.4        1.1        0.0          -0.0       -0.2        0.1
sv (m/s)          4.3        4.7        4.1           4.0        3.7        4.2
ruv              -0.1        0.2        0.1           0.0        0.0        0.0

                      July (N = 558)                      October (N = 476)
              Group 1    Group 2    Group 3       Group 1    Group 2    Group 3
              (total)    (easterly) (westerly)    (total)    (easterly) (westerly)
Data fraction   1.000      0.470      0.530         1.000      0.330      0.670
H (gpm)       23938.0    23879.6    23989.7       23886.6    23829.3    23915.0
sH (gpm)         94.0       78.5       75.6         104.4       67.6      108.4
T (°C)          -53.6      -55.4      -52.1         -55.1      -55.9      -54.7
sT (°C)           3.1        2.2        3.0           2.8        2.2        2.9
u (m/s)          -4.3      -23.4       12.6          -3.8      -30.2        9.3
su (m/s)         19.6        5.8        9.3          19.7        4.7        7.3
v (m/s)           0.1        0.3       -0.2           0.2        0.0        0.3
sv (m/s)          3.9        3.3        4.3           4.2        3.7        4.4
ruv              -0.0        0.1        0.1           0.0       -0.1        0.0

(H, T, u, and v are means; sH, sT, su, and sv are standard deviations;
ruv is the correlation between the wind components.)
77
-------
Figure 2. Separation of the Canton Island 30-mb data in the 2-space of
the orthogonal wind components; mean heights and temperatures are shown
for groups 1-3.
78
-------
mapping in a topographical sense. These techniques can be applied to data
sets derived from topographical studies made with trigonometric, orthogonal,
or other types of polynomials. Recent advances in the computation of
orthogonal polynomials, known as asymmetric singular decomposition (ASD)
procedures, make these polynomials easier to obtain and use.
REFERENCES
1. Wolfe, J.H. NORMIX 360 Computer Program. Research Memorandum SRM 72-4,
Naval Personnel and Training Research Laboratory, San Diego, CA (1971)
125 pp.
2. Crutcher, H.L. and Joiner, R.L. Separation of Mixed Data Sets into Homo-
geneous Sets. NOAA Technical Report EDS 19, National Oceanic and Atmos-
pheric Administration, Asheville, North Carolina 28804 (1977) 165 pp.
3. Crutcher, H.L. and Joiner, R.L. Another Look at the Upper Winds of the
Tropics. J. Applied Meteorology, 16(5), (May 1977) pp. 462-476.
79
-------
ENGINEERING COMPUTATIONS AND DATA COLLECTION
FORMATS USEFUL IN DATA VALIDATION
by
A. Carl Nelson, Jr.
PEDCo Environmental, Incorporated
505 South Duke Street
Durham, North Carolina 27701
81
-------
ENGINEERING COMPUTATIONS AND DATA COLLECTION
FORMATS USEFUL IN DATA VALIDATION
A.C. Nelson, Jr.
A considerable number of "after-the-fact" data
validation techniques will be given during this one-day
conference. No attempt is made here to summarize all of these
techniques, but rather to indicate some of the important
validation procedures from the viewpoint of the
laboratory and field experts. The approach taken is as
follows. The question was asked of laboratory and field
experts: What are some of the important areas of data
validation in order to yield data of good quality? Some of
the areas listed are briefly described below for both ambient
air monitoring and source testing.
Data Validation - Ambient Air Monitoring Data
1. Audits
The purpose of audits should not be to point
a finger at the organization/team being audited.
In some cases the auditor can be in error. The
value of an audit is that it can identify a gross
bias or inaccuracy in reported data. One recent
example of a problem was in the use of an incor-
rect method. The audit pointed out the problem,
special instruction was given, and the condition
was hopefully corrected. It is possible that no
after-the-fact data validation techniques could
have identified the error in this case since there
was a bias throughout the region as the same
procedure was taught to all operators. The audit
has served its purpose well in this example.
82
-------
In EPA-sponsored audits the auditor is almost
always checked out at the EPA Quality Assurance
Branch (QAB) prior to conducting a field audit.
This provides a traceability to a common standard
and method.
2. Knowledge of the instrument
This is certainly one of the most important
considerations in obtaining good quality data.
The operator must know the sensitivities/inter-
ferences of the instrument. An example would be
the interference of CO2 concentration on an SO2
analyzer employing a Flame Photometric Detector.
A test was designed to check the possible sensi-
tivity, and the results indicated a definite and
reproducible relationship.
This particular type of error detected for a
particular analyzer could be very difficult to
detect by after-the-fact validation procedures.
Some independent and accurate check must be made
using an instrument which has been tested for
possible interferences/sensitivities. If the
sensitivity of an instrument to an interference has
been precisely determined and is reproducible,
then a correction can be made to obtain the result
which would have been observed if no sensitivity
existed; this was true in the case mentioned.
Another means of gaining information about
the instrument/method is to design a ruggedness
test to check out possible gross factors/steps
which may have a significant effect on the results
or measurements if appropriate control is not
exercised.
83
-------
3. Interlaboratory tests
Participation in these tests provides a means
of checking the laboratory analysis methods and of
validating the current multipoint calibration
curve. The feedback of information from the
laboratory performing the overall analysis of the
results from all participating laboratories is
most important. For example, if a laboratory is
consistently in error for a particular analysis or
range of concentrations, then this laboratory must
have some means of correcting this problem through
communication with the overall test laboratory or
some representative thereof.
The performance survey is a very good means
of validating data. Furthermore, it is also a
good source of information about what can be
expected from a particular analysis procedure.
In a conversation with a supervisor in one
laboratory, he indicated that they were performing
their CO analysis incorrectly and that the per-
formance survey helped them to identify a problem
which they did not know they had.
On a small scale, three or four laboratories
could set up their own interlab test for a par-
ticular analysis for which no performance survey
data are being obtained by EPA, NIOSH or some
other agency.
4. Standards traceable to an NBS standard
Often one hears that calibration gases are
not accurately analyzed. It is thus necessary
that the user check the calibration gases prior to
their use in developing new calibration curves.
All measurements must be traceable ultimately to a
primary standard.
84
-------
Significant errors in calibration gases can
usually be determined by a check against the pre-
vious calibration curve obtained using the most
recent gas. EPA's QAB has developed a protocol for
traceability of gases.
5. Data reduction
The raw data must be recorded legibly and
completely on appropriate data formats. The
calculations should be checked either completely
or on a sampling basis. In this manner the equations
used, the substitution of the correct values in
these equations, and the calculated results are
all checked. This should be an internal audit as
well as a part of an external audit.
6. Other considerations
Some other considerations are the more routine
types of quality control and assurance techniques
which are primarily internal functions. Some of
these techniques are the use of blind reference
samples, quality control limits for internal
checks of reference samples, comparison checks of
two or more calibrators, ruggedness tests, inter-
nal audits by an independent operator, and chain
of custody procedures.
Data Validation - Source Tests
The results of the first series of collaborative source
tests clearly showed that more quality control and data
validations were needed to ensure good quality data. The
results of the first Method 5 (particulate) collaborative
test, using average testing teams and no special quality
control, produced a relative standard deviation for each run
in excess of 50 percent, with the outliers thrown out. As a result
85
-------
of this poor reproducibility, several quality control and
data checks were incorporated into the collaborative test
series. These controls and the use of selected testing
firms produced results that were repeatable to within about
10 percent for each run. Most of these additional quality
control checks are now detailed or implied in the revised
methods contained in the Federal Register, August 18, 1977.
The collaborative test series showed two other areas of
concern with respect to quality control and data validation.
The first was that the methods and additional written proce-
dures were thought to be clearly written as to their exe-
cution. This was found not to be the case in the early
collaborative tests as many variations were noted in the
performance of the methods. It became obvious early in the
program that the performance of the average testing team
should be observed by a qualified observer. Also, the nature
of most of the errors was such that they could not or would
not have been detected by any data validation on the emis-
sion test report.
The second area of concern was that most of the quality
control techniques were executed prior to the performance of
the field test and the assumption was made that all com-
ponents remained unchanged during testing. Two examples are
dry gas meter calibration and the pretest leak check. If
the dry gas meter calibration changed during testing or if a
leak developed in the sampling train, this would not be
detected.
The collaborative testing program has clearly demon-
strated that to properly perform the needed data validations,
controls must be clearly defined and observed before, during
and after each test series. Data validation of uncontrolled
and unobserved sampling is not effective as a general rule
and usually will not clearly determine acceptability or
unacceptability of data.
86
-------
The revised methods published August 18, 1977 contain
many equipment performance calibrations and validations.
The examples used before, a dry gas meter calibration and
leak check only prior to testing, have now been changed to
include a post-test leak check and a post-test meter cali-
bration.
The best method for data validation has become equipment
performance validation. If the equipment is operating
properly the data should be accurate/precise within the
determined limits for that method.
Source Test Report Review
One aspect of source testing report review involves
checking the results, not only that the correct equations
were used and there were no mathematical errors, but also
that the correct values were used as inputs into the equations.
The latter requirement can be checked quickly if all of the
required data were measured and recorded legibly, and the
raw data sheets are submitted with the report. Any report
which includes a computer listing of the raw data, instead
of the original data sheets, should be rejected.
The degree to which the calculations should be checked
is generally a function of the consistency of the results
and the reviewer's confidence in the tester's ability. The
various levels of review possible for the calculations would
be (1) none at all, (2) random spot checks, (3) complete
review of results which seem inconsistent, with respect to
each other or to typical results, (4) complete review of one
randomly chosen run, and (5) complete review of all runs.
There are some empirical techniques that can be used to
check or validate process and sampling data provided by the
tester and the source. In some cases, the sampling data
from the tester can be used to check process data supplied
DSSE Workshop - draft report.
87
-------
by the source. Some of the available techniques are given
herein. The experienced reviewer will ultimately develop
his own list of short cuts, cross checks, and rules of
thumb.
1. Barometric Pressure
Incorrect barometric pressure measurement
will not generally cause errors of more than 10 to
15 percent, but it is a very common error. The
value reported by the tester can be checked in two
separate ways: (1) At sea level, the barometric
pressure is almost always between 29 and 31 inches
of mercury, and usually close to 30. For every
1000 feet above sea level, the value will decrease
by 1.1 in. Hg. Therefore, if a test is run in
Denver, with an elevation of 5000 feet above sea
level, the barometric pressure reported should be
from 23.5 to 25.5 inches of mercury. (2) The
reviewer can call the airport closest to the test
site and ask for the "station" pressure (not
corrected to sea level) for the date of the test.
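As a sketch, the first check reduces to a few lines of Python; the
function name and the plus-or-minus 1 in. Hg tolerance are illustrative.

    def expected_station_pressure(elev_ft, tol_in_hg=1.0):
        # Sea-level barometric pressure is normally 29-31 in. Hg (about 30),
        # and station pressure falls roughly 1.1 in. Hg per 1000 ft.
        center = 30.0 - 1.1 * elev_ft / 1000.0
        return center - tol_in_hg, center + tol_in_hg

    # Denver, about 5000 ft: expected range is roughly 23.5-25.5 in. Hg.
    lo, hi = expected_station_pressure(5000.0)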
2. Leak Tests
If the report claims that leak tests were
performed either before each test or after filter
changes, the dry gas meter readings on the data
sheet would indicate this. In other words, it is
unlikely that a leak test was done before run #2
if the final volume reading for run #1 is the same
as the initial volume reading on run #2. If a
leak test was made in the middle of the run (because
of a filter change, for example), the volume
readings before and after the leak test would be
-------
shown on the data sheet, so that the computed
meter volume could be adjusted accordingly.
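A minimal sketch of this meter-reading consistency check, assuming each
run is recorded as an (initial, final) pair of dry gas meter volumes in
test order (names illustrative):

    def runs_without_leak_check(runs):
        # A pre-test leak check consumes some metered volume, so a run
        # whose initial reading exactly equals the previous run's final
        # reading suggests the claimed leak check was not performed.
        suspect = []
        for k in range(1, len(runs)):
            if runs[k][0] == runs[k - 1][1]:
                suspect.append(k + 1)   # run numbers are 1-based
        return suspect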
3. Moisture Data
The results presented in the report for the
volume percent of water vapor in the gases sampled
can be checked in several different ways. For any
combustion source, the moisture content can be
approximated by use of nomographs if the reviewer
calculates the excess air and can estimate the
ambient temperature, ambient humidity, and the
free water in the fuel. Hopefully, the process
data will include an analysis of the fuel. If
not, use zero for gas and oil, 10 percent for
bituminous coal, and 25 percent for lignite, bark,
wood, and refuse unless the fuel has been rained
on recently. If the best estimates available are
ranges, use the high and low estimates to bracket
the moisture content.
Entrained droplets of liquid water in the
stack gases can yield an erroneously high moisture
content. All moisture data should be checked
(even if there are no entrained water droplets) to
ensure that the reported value is not higher than
the saturation moisture content. Nomographs
provide moisture content at saturation as a function
of stack absolute pressure and stack gas temperature.
If the reported value is higher than the maximum
read from the nomograph, the data are suspect.
Generally, if the high reading was caused by
entrained water droplets, the value is adjusted to
the saturation moisture content.
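A hedged sketch of this saturation check, using the Antoine equation for
water in place of a nomograph (the equation is valid only to about 100°C,
and the names are illustrative):

    def saturation_moisture_fraction(stack_temp_c, stack_press_in_hg):
        # Antoine equation for water, giving saturation vapor pressure
        # in mm Hg for temperatures of roughly 1-100 C.
        p_sat_mm = 10 ** (8.07131 - 1730.63 / (233.426 + stack_temp_c))
        p_stack_mm = stack_press_in_hg * 25.4   # in. Hg to mm Hg
        # Volume (mole) fraction of water vapor at saturation.
        return p_sat_mm / p_stack_mm

    # Data are suspect when the reported volume fraction exceeds
    # saturation_moisture_fraction(stack_temp, stack_pressure).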
89
-------
In sources where the process involves drying
(removing water) from a raw material or product, a
water balance across the process should validate
the moisture data in the report. Remember to
include the water introduced as humidity in the
ambient air.
4. Orsat Data
For any combustion source, the relative
amounts of oxygen and carbon dioxide in the flue
gases can be predicted by the use of a nomograph.
When a report is submitted containing orsat data
(or C02 and 02 data from any other instrument),
the data can be checked by aligning the type of
fuel with the %CO2 and checking the %O2 from the
nomograph with the reported value. If the results
do not check, it indicates that there is a problem
with the reported data.
This nomograph also gives the percent excess
air based on the type of fuel and orsat analysis.
The reviewer should be cautioned that if the orsat
data were taken after a water scrubber, the nomo-
graph will not work, since the scrubber will
remove an indeterminate amount of carbon dioxide.
5. Volumetric Flow Rate Data
The volumetric flow rate is difficult to
cross-check accurately, but there are several ways
of determining if the reported values are in the
"ball-park". In any duct or stack where the air
is moved by a blower, the design criteria generally
result in a gas velocity of 25-40 feet per second.
90
-------
The idea is that higher velocities cause prohi-
bitive pressure losses, and lower velocities are
uneconomical due to the cost of the duct work.
Since the size of many stacks is dependent on
structural strength or future needs, the check
works best for the duct work leading to the stack.
If the velocity measurements are made in the
stack, and the stack cross-sectional area is much
larger than that of the duct work, apply the
25-40 feet per second check by dividing the
volumetric flow rate by the duct area. If there
is no fan or blower in the process, such as with a
natural draft boiler or incinerator, the flow will
generally be 5-15 feet per second. Keep in mind
that the ranges given here are not theoretical
limits, but merely commonly encountered values.
If the test results presented do not fall within
these ranges, it is only a signal to look at the
velocity data more closely.
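As a sketch (names and units illustrative), this ball-park velocity
check reduces to:

    def velocity_in_ball_park(flow_acfm, duct_area_ft2, fan_present=True):
        # Convert actual cubic feet per minute to feet per second and
        # compare with the commonly encountered ranges quoted above
        # (not hard limits).
        v_fps = flow_acfm / duct_area_ft2 / 60.0
        lo, hi = (25.0, 40.0) if fan_present else (5.0, 15.0)
        return lo <= v_fps <= hi, v_fps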
In reviewing test results, it is always
desirable to have available the results from any
previous tests on the same source, previous tests
on any similar sources (such as an identical unit
at the same plant), or tests performed at the
inlet to the control device. If the inlet tests
were done simultaneously with the outlet tests,
the volumetric flow rates (corrected to standard
conditions) should match from inlet to outlet. If
the control device uses water, the checks should
be made on a dry basis. Air leakage in or out of
the control device can occur, which would lessen
the value of this check, but air leakage can
generally be identified by a change in the moisture,
91
-------
temperature, or CO2 content from inlet to outlet.
Since inlet tests are not usually done for com-
pliance, they are often performed in ducts with
little or no straight run, which can cause higher
than real velocity data, a factor to consider when
making inlet-outlet comparisons.
Many sources have fan performance curves for
the fans used in the process, and these can be
used as a check against the reported flow rate
data. The gas flow moved by the fan is a function
of the pressure head produced (or induced, or
both), the gas temperature, the gas composition,
and the fan speed (rpm). Unless all of these
factors are controlled or quantified (which is a
rare situation) the fan curves can only be used to
estimate or roughly check the flow rates.
When process equipment and/or control devices
are designed, there is generally a design speci-
fication on volumetric flow rate. If these speci-
fications are available (from the source or from
permit forms) they can be used to check the tester's
results.
6. Process Data
There are probably as many different ways to
check process data as there are types of processes.
Some are so much a part of a particular process
that they could not all be discussed here. There-
fore, if the checks mentioned herein are not
adequate for the process in question, then that
process should be studied (using the literature
and communications with the source) to determine
if some additional checks are available for use.
92
-------
For many processes, the production rate is
relatively constant from day to day. In this
case, the production rate reported should compare
favorably with the annual production rate (or
annual raw material usage rate) divided by the
number of operating days.
In a case where the reviewer wants to compute
the production rate from the raw material rate, or
compute the raw material rate to check the produc-
tion rate, the principle of material balances
should be employed. If one ignores nuclear
reactions, then it can be stated that in any
process, matter will be neither destroyed nor
created. This means that any materials entering
the process must either accumulate or leave the
process (in minus out equals accumulation). The
material balance can be done on all components of
the process stream, or it can be limited to a
single component such as water or carbon dioxide.
Drying operations, like grain dryers, are
good examples of sources which adapt readily to a
water balance. Water enters the process from the
grain itself, from the drying air (which is
generally ambient air), and from the combustion of
fuels containing hydrogen; it leaves as water
vapor in the exhaust and as residual water in the
grain. The stack test data provide the total
water vapor leaving the dryer, by multiplying the
total gas flow rate by the percent water vapor,
and converting the result to a mass rate. From
the amount of fuel burned, one can compute the
water vapor produced by the combustion. If the
93
-------
ambient temperature and relative humidity are
known, the water supplied by the drying air can be
computed. From the symbols shown in Figure 1, the
     A - water from ambient air --+
     B - water from grain        --+-->  DRYER  -->  D - water vapor in exhaust
     C - water from combustion   --+                 E - residual water in grain

                     Figure 1. MATERIAL BALANCE
water balance would be:

    A + B + C = D + E,

and A, C, and D have been computed. If F is the
weight of grain dried, W is the inlet moisture
fraction for the grain, and W' is the outlet
moisture fraction for the grain, then

    B = WF  and  E = W'F.

Substituting these expressions in the above equation
and solving for F yields:

    F = (D - A - C) / (W - W').
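In Python the balance solves directly (names illustrative; all water
terms must be in the same mass-rate units):

    def grain_dried_weight(D, A, C, W_in, W_out):
        # Water balance A + B + C = D + E with B = W_in*F and E = W_out*F,
        # solved for F, the weight of grain dried.
        return (D - A - C) / (W_in - W_out)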
7. Emission Results
Unfortunately, the most difficult data to
validate are the emission results, which also are
the most important data to validate. Emission
rates for gaseous pollutants, such as SO2 and NOx,
can often be checked against process parameters,
but this is because these pollutants are rarely
controlled. For example, since essentially all
the sulfur present in coal or oil will be liber-
ated as S02 during combustion, a sulfur balance
94
-------
should yield a good check of SO2 emission results.
For a specific design of boiler or incinerator,
the amount of NOx produced can be estimated from
the emission factor2 published by EPA.
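A sketch of such a sulfur balance (illustrative; it assumes all fuel
sulfur leaves as SO2):

    def so2_from_sulfur_balance(fuel_rate_lb_hr, sulfur_mass_fraction):
        # Each pound of sulfur (MW 32) liberated during combustion
        # yields two pounds of SO2 (MW 64).
        return fuel_rate_lb_hr * sulfur_mass_fraction * 64.0 / 32.0

    # Example: coal fired at 10,000 lb/hr with 2 percent sulfur gives
    # about 400 lb/hr of SO2 as a bench mark for the reported rate.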
The generation of particulate pollutants is a
function of a large number of process parameters,
many of which cannot be measured. Emission factors
are available for many sources of particulates,
but most of these sources use some type of control
device to remove the bulk of particulates prior to
the stack exhaust. As an example, consider the
particulate emissions from a utility boiler with
an electrostatic precipitator, or from an asphalt
batch plant with a baghouse collector. The emission
factors for these sources, prior to the control
device, are listed in the emission factor book and
the literature, and for now it will be assumed
that they are accurate. The control devices that
are used would have a design efficiency of 99 to
99.5 percent, but the actual efficiency could
range from 50 to 99.9 percent. What this is
saying is that if the uncontrolled emissions are
100 pounds per hour, the design efficiency would
yield emissions of 0.5 to 1.0 pounds per hour, but
the actual emissions could range from 0.1 to 50
pounds per hour. The emission factor book lists
factors for various types of control devices, but
these are design efficiencies, and the reviewer
should resist paying much attention to them. They
only reflect the emissions if the control equip-
ment is operating at its design efficiency, and if
that is assumed to be true, then there is no need
to perform a compliance test in the first place.
2 "Compilation of Air Pollution Emission Factors," U.S. EPA,
Publication No. AP-42.
95
-------
One approach which is often suggested (and
used) by control agencies is the idea of comparing
the three runs to one another. In other words,
the validity of the data can be measured by the
proximity of the three results to the average.
This would work if all of the variation in the
results was a function of random sampling errors.
Under this assumption, data could be handled as
in the following examples: (1) the three results
are 2, 3, and 4, and the reported emission rate is
3, and (2) the three results are 2, 4, and 15, the
15 value is thrown out as an outlier, and 3 is
reported as the emission rate. The second example
says that since 2 and 4 are close to one another,
and 15 is not close to 2 or 4, that the 15 must
represent a gross sampling error, and only the 2
and 4 should be averaged together to get the
emission rate.
Several additional considerations should
discourage the reviewer from applying this vali-
dation technique. There is no question that three
nearly identical results will instill confidence
in the reviewer's mind, and that three widely
different results will reduce that confidence.
Using the example above, however, with the results
of 2, 4, and 15, how can the observer tell what
the rest of the "population" looks like? Had four
samples been taken instead of three, with results
of 2, 4, 15, and 15, and had the last three been
reported instead of the first three, the 4 would
have been thrown out as an outlier and 15 would
have been reported as the emission rate. Process variations
can occur during testing that could produce ten-
to-one variations in the actual emission rates,
96
-------
and these variations can occur at any time,
without any warning, and often without being
noticed.
As a final note, any statistician can supply
dozens of methods for evaluating a set of results,
including ways to calculate confidence limits and
eliminate outliers. Any statistician will also
tell you, however, that a single set of three
results is really too small to study statistically.
And all the statistics in the world cannot replace
common sense.
Summary
In summary, data validation must be an integral part of
the data collection, analysis, reduction, and reporting
process. Several useful data validation techniques which
will aid in detecting large inaccuracies in the reported
results are described. However, there are obvious limitations
to the types of data inaccuracies which can be identified in
both ambient and source tests as pointed out in this paper.
It is hoped that the techniques suggested herein, from the
viewpoint of laboratory and field experts, will stimulate
further discussion.
Acknowledgement
Lawrence Elfers and William DeWees of PEDCo Environmental
provided much of the information which is briefly summarized
herein. In addition, appreciation is due to Entropy Environ-
mentalists, Incorporated, because the draft copy of the Division
of Stationary Source Enforcement Workshop report contains information
provided by this organization. I hope that this paper does not
misinterpret their written and verbal suggestions. These inputs
are greatly appreciated.
97
-------
VALIDATION PROCEDURES APPLIED TO IN-USE
MOTOR VEHICLE EMISSION DATA
by
Marcia E. Williams
Office of Mobile Source Air Pollution Control
U.S. Environmental Protection Agency
Ann Arbor, Michigan 48105
99
-------
VALIDATION PROCEDURES APPLIED TO IN-USE
MOTOR VEHICLE EMISSION DATA
M.E. Williams
ABSTRACT
One of the functions of the Office of Mobile Source Air Pollu-
tion Control is the collection and subsequent assessment of data on
the emission performance of in-use vehicles. On an annual basis,
over 4.5 million fields of data are collected. These data must be
carefully validated before they are used by EPA, by other government
agencies, and by private citizens. The current data editing proce-
dures are designed to be fairly routine but quite complete. Systematic
problems are eliminated with thorough laboratory facility check-
outs, frequent calibration checks, and correlation programs with the
EPA laboratory.
The data editing procedure is divided into two parts: manual
editing of the large number of supporting data forms and strip
charts, and computer editing of all data cards. This edit procedure
has detected error rates of 14 to 32 percent in the manual phase and
5 to 50 percent in the computerized phase. Most of these errors are cor-
rectable and less than five percent of tests are invalidated.
Although some of the errors can be found in either the computer or
manual phase, many errors can only be detected in one of the two
phases. To avoid needless effort, the phases are performed in
series rather than in parallel.
Editing costs are less than two percent of the total program
cost and the current edit program is estimated to achieve a final
error rate of about one percent. Future changes to the editing
procedure focus on reducing the EPA manpower requirements without
sacrificing the current quality level. An effort will be undertaken
to determine how much effect various types of errors have on the
ultimate uses of the data so that the question of "How good do the
data have to be?" can be factored into the design of data valida-
tion methodology.
100
-------
BACKGROUND
The Office of Mobile Source Air Pollution Control is responsible
for generating a data base on the in-use emission and fuel economy
performance of all mobile sources. These data are used by many
groups within EPA as well as by other Federal agencies, state and
local governments, private industry, and private citizens. A fairly
complete list of uses for emission factor data is given in Table 1.
Different degrees of data accuracy are needed for different data
applications. Most applications are concerned with having an accurate
estimate of average emissions or fuel economy. However, those items
in Table 1 which are notated with an asterisk require that the data
on every vehicle be completely accurate. At this point in time, the
data edit procedure is geared to ensure that all fields of data are
correct.
Table 2 provides a list of typical ongoing test programs. On
an annual basis, OMSAPC spends between 2.5 and 4.5 million contract
dollars on characterizing the performance of in-use vehicles. Each test
program involves the procurement and subsequent testing of consumer
owned vehicles. Vehicle owners are given incentives such as a U.S.
savings bond, a loaner car, and a free tank of gas to participate in
the EPA test program. Vehicles are then tested over a variety of
different test sequences. In some cases, entire test sequences are
repeated with vehicles in different states of tune or with ambient
test conditions varied. Table 3 lists the types of variables which
are collected for each vehicle test. For each vehicle test sequence,
there are 150 to 600 pieces of information gathered. For each
vehicle tested, there are from one to six test sequences performed.
Thus, on an annual basis, approximately 4.5 million fields of data
are collected and must be validated.
GENERAL APPROACH
The data validation procedure begins with the assumption that
there is no systematic bias or error in the data. Systematic errors
are prevented by the development of detailed test procedures, record-
keeping procedures, and mandatory recording formats. Each contractor
must undergo a rigorous facility check-out at the beginning and end
of each test program. The check-out includes performance tests for
all contractor personnel including equipment operators, drivers,
test technicians, and data handlers. In addition, EPA personnel
specify frequent calibration checks on all equipment, carry out
101
-------
reference gas and vehicle correlation testing against the EPA
laboratory in Ann Arbor, and implement both announced and unan-
nounced contractor inspections throughout the duration of each test
program. Table 4 details the procedures which are implemented to
prevent systematic bias.
All EPA contractors are required to perform data validation
before submitting any data to EPA. EPA contracts specify that the
contractor must use some form of computerized edit procedure in
addition to a manual procedure. However, the exact contractor
procedure is not specified. Since contractors are not paid for
tests until EPA accepts the tests as valid, an incentive exists to
submit correct data as soon as possible after the completion of a
vehicle test.
The EPA edit procedure is diagramed in Figure 1. The manual
and computer aspects of data validation are carried out in series to
avoid needless manpower effort and to ensure that at no time will
data files contain any data that are not validated. The manual edit
procedure is performed first and many of the steps in that procedure
are listed in Table 5. The manual procedure concentrates on strip
chart data including the driving trace and the emission concentrations.
Although there is a trend toward contractor computerization of these
items, a fully computerized data acquisition system is expensive and
is not required by EPA due to the short term (annual), fixed-price
nature of EPA contracts. Thus, errors in following the appropriate
driving trace and properly zeroing and calibrating analyzers can
only be detected in the manual phase.
Table 6 presents the types of checks which are performed in the
computer edit procedure. As shown in Figure 1, the computer editing
does not occur until the manual edit checks indicate a potentially
valid test. Table 7 summarizes the types and the severity of errors
which have been detected. Tests are invalidated only in cases where
test procedure errors are uncovered or in cases where key data are
missing. Table 8 lists typical reasons that complete test sequences
have been invalidated.
The one type of error which the current edit procedure is not
specifically designed to detect is discrepancies between the computer
cards and the supporting documentation. If each computer card entry
is within range and consistent with other computer data fields, it
will not be flagged. For example, if a highway emission result is
incorrectly keypunched as 4.52 instead of 4.92, it would not be
detected since both numbers could be equally valid. One would have
to examine the analyzer trace and the data packet notation to know
which value was correct. Since data are double keypunched, these
102
-------
types of errors are assumed to be minimal. However, consideration
is being given to the implementation of an acceptance sampling
procedure to ensure that information on the data cards matches
information in the supporting documentation.
RESOURCES AND RESULTS
Table 9 indicates the EPA detected error rates in three recent
test programs. In each case, the error rate is the percentage of
vehicles with at least one detected error; some vehicles may have
multiple errors. The range of detected error rates clearly indicates
that not all contractors employ the same levels of quality control.
However, despite the high detected error rates, less than five
percent of total tests are invalidated; most errors can be corrected.
Table 10 summarizes the required EPA editing resources for two
recent test programs. These resources are examined as a fraction of
total contract cost in Table 11. Assuming that EPA manpower used to
perform data editing costs $20,000 per manyear, the EPA cost for
data validation is about two percent of total contract cost. With
this resource effort, it is estimated that the undetected error rate
is less than one percent.
FUTURE APPROACHES
Table 12 lists a number of additional data validation approaches.
More automated data acquisition is being implemented in one ongoing
contract. The system has taken considerable time to debug and it is
too early to judge the cost-effectiveness of this approach. In
recent contracts, EPA has increased the contractor data validation
requirements. Again, it is too early to determine whether this
action will prove to be a cost-effective way to achieve a low final
error rate. Finally, in one large test program, EPA has stationed
personnel at the contractor's site on a full time basis. Again, the
effect on final error rate is not yet known.
Table 12 lists three additional approaches which have not yet
been implemented. If improved contractor error rates can be achieved,
EPA will attempt to reduce dedicated edit manpower by implementing a
general spot check procedure. Such procedures are based on statistical
principles and one such procedure is outlined in detail in Tables 13
and 14.
Before edit procedures can be implemented which attempt to
lower the cost of editing by applying statistical procedures to key
data fields, two important philosophical questions must be answered.
First, it must be determined how good the data have to be. Variables
103
-------
of maximal interest need to be specified and in each case, the
confidence and range within which the variable needs to be known
must be determined. The second major area of uncertainty requires a
determination of the impact that various errors make on the vari-
ables of maximal interest. Answers to these questions are
currently being pursued so that statistical edit procedures can be
considered in more detail.
104
-------
Table 1
USES FOR TEST DATA
EPA
 1. ASSESSMENT OF EMISSION AND DETERIORATION RATES FOR AP-42
    (HANDBOOK OF AIR POLLUTION EMISSION FACTORS)
 2. DEVELOPMENT OF EMISSION AND FUEL ECONOMY CORRECTION FACTORS
    FOR AP-42
 3. COMPARISON OF IN-USE LEVELS WITH CERTIFICATION/ASSEMBLY
    LINE LEVELS
 4. DETERMINATION OF REASONS FOR POOR IN-USE VEHICLE PERFORMANCE
 5. ASSESSMENT OF SHORT TEST/FTP CORRELATABILITY FOR SEC. 207(B)
    (APPLICABLE SECTION OF THE CLEAN AIR ACT)
 6. ASSESSMENT OF INSPECTION/MAINTENANCE BENEFITS
*7. EVALUATION OF IN-USE VEHICLE COMPLIANCE WITH STANDARDS -
    SUPPORT FOR AGENCY RECALL PROGRAM
*8. COMPARISON OF PRODUCTION/PROTOTYPE FUEL ECONOMY LEVELS
 9. SUPPORT FOR REGULATION DEVELOPMENT PACKAGES - ENVIRONMENTAL
    IMPACT ANALYSES
10. PRIORITIZATION OF AGENCY REGULATION/COMPLIANCE PROGRAMS
OTHER USERS
 1. HIGHWAY ENVIRONMENTAL IMPACT STATEMENT (EIS) WORK
 2. INDIRECT SOURCE REVIEW
 3. REGION/STATE EMISSION INVENTORY WORK
 4. EVALUATION OF IMPROVED PUBLIC TRANSPORTATION SYSTEMS
 5. EVALUATION OF VEHICLE MILES TRAVELED (VMT) REDUCTION STRATEGIES
 6. GENERAL TRANSPORTATION CONTROL PLAN (TCP) EVALUATION
 7. FUEL AVAILABILITY STUDIES
 8. AIR QUALITY MODEL INPUTS
 9. STATE IMPLEMENTATION PLAN (SIP) CONFORMANCE WITH AMBIENT
    STANDARDS
10. HEALTH ASSESSMENT STUDIES
105
-------
Table 2
TYPICAL ONGOING TEST PROGRAMS
 1. ANNUAL IN-USE AUTOMOBILE TESTING PROGRAM
    A. 7 CITIES
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 32000 VEHICLES PER YEAR
 2. ANNUAL AUTOMOBILE RESTORATIVE MAINTENANCE TESTING PROGRAM
    A. 3-4 CITIES
    B. PRIMARILY NEW MODEL-YEAR VEHICLES
    C. EXTENSIVE DIAGNOSTIC AND MAINTENANCE WORK PERFORMED
    D. 3400 VEHICLES PER YEAR
 3. IN-USE LIGHT DUTY TRUCK TESTING PROGRAM
    A. MULTIPLE CITIES
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 3200 VEHICLES PER YEAR
 4. IN-USE HEAVY DUTY TRUCK TESTING PROGRAM
    A. SINGLE CITY
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 3200 VEHICLES IN FY78
 5. IN-USE MOTORCYCLE TESTING PROGRAM
    A. 2 CITIES (HIGH AND LOW ALTITUDE)
    B. WIDE RANGE OF MODEL-YEARS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS
    D. 3250 VEHICLES IN CURRENT TEST PROGRAM
 6. INSPECTION/MAINTENANCE DEMO PROJECT
    A. PORTLAND, OREGON
    B. 1972-1977 MODELS
    C. LARGE NUMBER OF EMISSION TEST CONDITIONS AND
       AMBIENT CONDITIONS
    D. 33000 VEHICLES AND 6000 TESTS
 7. OTHER SMALL TESTING PROGRAMS
    A. MOPEDS
    B. DIAL-A-RIDE BUSES
    C. AMBIENT TEMPERATURE TESTING
    D. GOOD TECHNOLOGY VEHICLES
106
-------
Table 3
TYPES OF VARIABLES COLLECTED FOR EACH TEST
 1. IDENTIFICATION DATA
    MODEL YEAR
    MAKE
    MODEL
    ENGINE DISPLACEMENT
    CARBURETOR VENTURIS
    CATALYTIC CONVERTER
    AIR PUMP
    NUMBER OF CYLINDERS
    TRANSMISSION TYPE
    VEHICLE IDENTIFICATION NUMBER (VIN)
    ENGINE FAMILY CODE
 2. EMISSION DATA
    IDLE DATA (2 POLLUTANTS)
    SHORT TEST DATA (3 POLLUTANTS, UP TO 5 TESTS)
    FTP DATA (4 POLLUTANTS, 3 BAG VALUES, COMPOSITE VALUE)
    OTHER CYCLES (4 POLLUTANTS)
    FUEL ECONOMY DATA
    EVAPORATIVE EMISSION DATA
    SULFATE EMISSION TESTS
    PARTICULATE EMISSION TESTS
    METHANE - NON-METHANE MEASUREMENTS
    MODAL TESTING
 3. AMBIENT CONDITIONS
    TEMPERATURE
    HUMIDITY
    BAROMETRIC PRESSURE
 4. TEST CONDITIONS
    ROAD LOAD HORSEPOWER
    INERTIA WEIGHT
    SOAK TIME
    PRE-CONDITIONING SCHEDULE
 5. PARAMETRIC DATA
    ENGINE IDLE SPEED
    ENGINE TIMING
    MANUFACTURER SPEC VALUES
    COMPLETE DIAGNOSTIC CHECKS (9 MAJOR VEHICLE SYSTEMS,
    5-10 COMPONENTS PER SYSTEM)
    TAMPERING DATA
    DRIVEABILITY DATA
 6. OWNER QUESTIONNAIRE
    NUMBER OF TRIPS PER DAY
    NUMBER OF MILES PER YEAR
    TYPE OF DRIVING
    TYPICAL PASSENGER LOADING
    LAST MAINTENANCE PERFORMED - TYPE AND COST
    FUEL ECONOMY ESTIMATE
    TYPE OF GASOLINE USED
107
-------
Table 4
PROCEDURES TO PREVENT SYSTEMATIC BIAS
 1. SPECIFICATION OF GAS ANALYZER, DYNAMOMETER, AND CVS*
    EQUIPMENT (WITH PROVISIONS MADE FOR EQUIVALENT SUBSTITU-
    TIONS).
 2. EPA NAMES REFERENCE GASES.
 3. EPA SPECIFIES ANALYZER CALIBRATION PROCEDURES INCLUDING
    GAS TYPES, CYLINDER FITTINGS, IMPURITY LEVELS, CURVE-
    FITTING PROCEDURE, AND REQUIRED ACCURACY.
 4. SPAN GASES, SIMILAR TO CALIBRATION GASES, ARE SPECIFIED.
 5. SYSTEM PLUMBING MATERIALS ARE SPECIFIED.
 6. EQUIPMENT CHECKS ARE SPECIFIED
    A. DAILY LEAK CHECKS - CVS AND ANALYTICAL SYSTEM
    B. WEEKLY COMPLETE CURVE CHECK - ANALYTICAL SYSTEM
    C. DYNAMOMETER WARM-UP PROCEDURES
    D. COMPLETE CURVE CHECKS AFTER ANY SYSTEM MAINTENANCE
    E. DYNAMOMETER CALIBRATED BI-WEEKLY
    F. SAMPLE BAGS LEAK CHECKED BEFORE EACH TEST
    G. DAILY NOx ANALYZER CONVERTER EFFICIENCY TEST
    H. COMPLETE DAILY LOGS OF ALL GASES, CALIBRATIONS,
       MAINTENANCE, ETC.
 7. MAXIMUM BACKGROUND LEVELS SPECIFIED.
 8. COMPLETE EPA FACILITY CHECK-OUT INCLUDING TESTS OF
    CONTRACTOR PERSONNEL.
 9. UNANNOUNCED EPA VISITS.
10. CORRELATION TESTING WITH EPA LAB.

* CONSTANT VOLUME SAMPLER
108
-------
Data cards and supporting data arrive at EPA, and their arrival is
logged into the record book. The supporting data are screened; if
they are not acceptable, the contractor is called for additional
data or clarification, and if no solution is found the vehicle is
eliminated and the log book updated. Acceptable data cards go to
the computer group. Data card corrections are received and, if
acceptable, the vehicle is added to the file and the log book is
updated.

Figure 1. Flow Diagram of Edit Procedure
109
-------
Table 5
STEPS IN MANUAL EDIT PROCEDURE
[Table printed sideways across pages 110-114 of the source; the
individual steps are not legible.]
114
-------
Table 6
TYPES OF COMPUTER CHECKS
 1. RANGE CHECKS ON ALL VARIABLES
    EXAMPLE: 68°F ≤ DRY BULB TEMP ≤ 86°F (VALID TEST RANGE)
 2. ID CHECK FOR ALL TESTS WHICH SHOULD BE INCLUDED FOR
    EACH VEHICLE
    EXAMPLE: A SET OF 1/0 CODES IS BUILT INTO THE VEHICLE
    ID INDICATING WHETHER FTP, HFET, EVAP., SULFATE, MODAL,
    SHORT TESTS ... ARE RUN. THEN THE EDIT PROGRAM CHECKS
    FOR APPROPRIATE DATA CARDS.
 3. ID INFO CHECKED AGAINST VIN
    EXAMPLE: MODEL YEAR CHECKED AGAINST VIN CODE.
 4. FUNCTIONAL RELATIONSHIPS ARE DEVELOPED WHEREVER POSSIBLE
    EXAMPLES: MODEL YEAR RELATED TO MILEAGE; ROADLOAD
    HORSEPOWER RELATED TO INERTIA WEIGHT; ENGINE SETTINGS
    COMPARED WITH MANUFACTURER SPECIFICATIONS; ALLOWABLE
    EMISSION LEVELS DEPENDENT UPON MODEL YEAR; NUMBER OF
    CYLINDERS RELATED TO ENGINE DISPLACEMENT; FUEL ECONOMY
    RELATED TO ENGINE DISPLACEMENT.
 5. COMPOSITE VALUES COMPUTED FROM COMPONENTS AND COMPARED
    TO CARD VALUE
    EXAMPLES: COMPOSITE FTP COMPUTED FROM INDIVIDUAL BAGS;
    FUEL ECONOMY COMPUTED FROM HC, CO, CO2 DATA USING CARBON
    BALANCE.
 6. RANKING COMPARISON OF RELATED VARIABLES
    EXAMPLES: IDLE MODE EMISSIONS LESS THAN HIGH SPEED MODE
    EMISSIONS; HIGHWAY FUEL ECONOMY GREATER THAN FTP FUEL
    ECONOMY; COLD EMISSIONS GREATER THAN STABILIZED EMISSIONS.
 7. CHECK THAT EXPECTED BLANK COLUMNS ARE BLANK TO ENSURE
    PROPER COLUMN ALIGNMENT
 8. DATA ON VEHICLE COMPARED TO PREVIOUS DATA ON SAME VEHICLE
    EXAMPLE: EMISSIONS TAKEN AT TWO DIFFERENT LOCATIONS OR AT
    TWO DIFFERENT TIMES ARE COMPARED.
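A few of these check types can be sketched in Python; the record field
names and the one-percent composite tolerance below are illustrative,
not those of the actual edit program.

    def edit_checks(rec):
        # rec: dict holding one vehicle-test record; returns flag messages.
        flags = []
        # Range check (valid test range for dry bulb temperature).
        if not (68.0 <= rec["dry_bulb_f"] <= 86.0):
            flags.append("dry bulb temperature outside valid test range")
        # Ranking comparison of related variables.
        if rec["hwy_mpg"] <= rec["ftp_mpg"]:
            flags.append("highway fuel economy not greater than FTP")
        # Composite value recomputed from components and compared to card.
        composite = sum(w * bag for w, bag in zip(rec["bag_weights"], rec["bag_hc"]))
        if abs(composite - rec["ftp_hc"]) > 0.01 * rec["ftp_hc"]:
            flags.append("composite FTP HC disagrees with bag values")
        return flags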
115
-------
Table 7
TYPES/SEVERITY OF DETECTED ERRORS
 1. ERRORS IN TEST PROCEDURE
    A. DETECTED IN MANUAL AND/OR COMPUTER EDIT
    B. TEST IS INVALIDATED
    C. EXAMPLES: DRIVING TRACE OUT OF SPECS (MANUAL);
       WRONG INERTIA WEIGHT SETTING (COMPUTER)
 2. ERRORS IN CALCULATION METHODOLOGY
    A. DETECTED IN MANUAL AND/OR COMPUTER EDIT
    B. ALL DATA ARE CORRECTED BY LOOKING AT PACKET
    C. EXAMPLES: USED WRONG SCALE TO READ EMISSIONS (MANUAL);
       COMPOSITE FTP INCORRECTLY CALCULATED (COMPUTER)
 3. KEYPUNCH ERRORS
    A. DETECTED IN COMPUTER EDIT
    B. ALL DATA ARE CORRECTED BY LOOKING AT PACKET
    C. EXAMPLE: ENGINE DISPLACEMENT DISAGREES WITH VIN
       AND/OR IS OUT OF RANGE
 4. MISSING DATA
    A. DETECTED IN MANUAL AND/OR COMPUTER EDIT
    B. TEST INVALIDATED UNLESS MISSING DATA CAN BE FOUND
    C. EXAMPLES: DRIVING TRACE MISSING FROM PACKET (MANUAL);
       BLANK FIELD FOR ENGINE DISPLACEMENT (COMPUTER)
 5. DISCREPANCY BETWEEN DATA CARD AND DATA PACKET
    A. DETECTED IN COMPUTER EDIT CHECK-OUT PHASE
    B. PACKET VALUE ASSUMED CORRECT
    C. EXAMPLE: RLHP READING ON DATA CARD IS OUT OF RANGE
       AND DISAGREES WITH WHAT IS RECORDED IN THE PACKET
116
-------
Table 8
TYPICAL REASONS TESTS HAVE BEEN REJECTED
 1. WRONG CVS COUNTS, EITHER TOO HIGH OR TOO LOW
 2. EXCESSIVE CRANKING TIME, OVER 10 SECONDS WITHOUT REGARD
    FOR PRESCRIBED PROCEDURES FOR RESTART
 3. WRONG INERTIA WEIGHT SETTING ON DYNAMOMETER
 4. WRONG HORSEPOWER SETTING ON DYNAMOMETER
 5. EMISSIONS CONCENTRATIONS READ OFF-SCALE OF ANALYTICAL
    EQUIPMENT
 6. LABORATORY BACKGROUND EMISSION LEVELS TOO HIGH
 7. VEHICLE HAS WRONG AXLE RATIO
 8. SAMPLE BAGS NOT ANALYZED WITHIN 10 MINUTES OF TEST
    COMPLETION
 9. DRIVER'S TRACE NOT FOLLOWED AS PRESCRIBED
10. RECORDING MALFUNCTION, 110°F DURING TEST
11. INITIAL FUEL TEMP. TOO HIGH (63°F OR HIGHER)
12. SOAK AREA TEMPERATURE TOO HIGH FOR PRESCRIBED PORTION
    OF VEHICLE SOAK PERIOD
13. TEST AREA TEMPERATURE TOO HIGH FOR VEHICLE TEST PERIOD
14. CVS TEMPERATURE NOT WITHIN ±10° OF SET POINT
15. ANALYTICAL INSTRUMENT(S) SPANNED INCORRECTLY
16. TEST ITEM(S) NOT DOCUMENTED AS REQUIRED
17. ENGINE TIMING NOT CHECKED
18. ENGINE TIMING SET INCORRECTLY
19. ENGINE IDLE CO NOT CHECKED
20. ENGINE IDLE CO SET INCORRECTLY
21. ENGINE IDLE RPM SET INCORRECTLY
117
-------
Table 9
CURRENT DETECTED ERROR RATES*

MANUAL EDIT
PROGRAM       CONTRACTOR 1   CONTRACTOR 2   CONTRACTOR 3
   1              14%            21%            32%
   2              17%            26%             -
   3              17%             -              -

COMPUTER EDIT (PROGRAM 1)
              CONTRACTOR 2   CONTRACTOR 3
                  10%            50%

* PERCENTAGE OF VEHICLES WITH AT LEAST ONE ERROR DETECTED.
  LESS THAN 5% OF TESTS ARE INVALIDATED - MOST ERRORS
  CAN BE CORRECTED.
118
-------
Table 10
EDITING EFFORT PER CAR (MANHOURS)
                                          PROGRAM 1   PROGRAM 2
LOGGING, FILING, SCOREKEEPING                .1
INITIAL REVIEW                               .3
REVIEW AND SCOREKEEPING, RETURNED
  AND RESUBMITTED PACKETS                    .1           .2
SUPPLEMENTAL TESTS                           .2           .4
COMPUTER EDIT                                .05          .15
PRO-RATED COMPUTER PROGRAM
  DEVELOPMENT*                               .05          .3
PRO-RATED MANUAL EDIT
  PROCEDURES DEVELOPMENT**                   .10          .5
TOTAL                                        .90         2.35

*  BASED ON 100 HOURS OF EFFORT SPREAD OVER 2000 VEHICLES
   FOR PROGRAM 1 AND 120 HOURS OF EFFORT SPREAD OVER 400
   VEHICLES FOR PROGRAM 2.
** BASED ON 200 HOURS OF EFFORT SPREAD OVER 2000 VEHICLES
   FOR PROGRAM 1 AND 200 HOURS OF EFFORT SPREAD OVER 400
   VEHICLES FOR PROGRAM 2.
119
-------
Table 11
EDITING COSTS AS A FRACTION OF TOTAL CONTRACT COST
[Table printed sideways in the source; the entries are not legible.]
120
-------
Table 12
FUTURE APPROACHES - CURRENTLY BEING TESTED ON A TRIAL BASIS

1. MORE AUTOMATED DATA ACQUISITION
   A. DEDICATED COMPUTER ON SITE (GENERATE DRIVING
      TRACE, SET DYNAMOMETERS, AUTOMATIC DATA RECORDING).
   B. CENTRALIZED COMPUTER.
   C. COST IS HIGH ($75K - 125K FOR A DEDICATED SYSTEM)
      AND DIFFICULT TO JUSTIFY FOR YEAR-AT-A-TIME CONTRACTS.
2. REQUIRE STRICTER CONTRACTOR DATA EDIT PROCEDURES
   A. ALL KEYPUNCHED DATA MUST BE ENTERED AND THEN
      VERIFIED BY TWO DIFFERENT PEOPLE.
   B. REQUIRE CONTRACTOR TO APPLY MANUAL AND COMPUTER
      EDITING TECHNIQUES.
   C. CHECK-OUT OF ALL CONTRACTOR DATA HANDLING PERSONNEL.
   D. CONTRACTOR WILL SUBMIT ERROR PRINT-OUT WITH EACH
      GROUP OF TEST PACKETS.
3. STATION EPA PERSONNEL FULL-TIME AT EACH TEST SITE
4. ASSUMING CONTRACTOR ERROR RATE DECREASES, USE SPOT
   INSPECTION OF PACKETS
5. USE STRATIFIED SAMPLING SPOT INSPECTION OF PARAMETERS
   TO MINIMIZE COST OF ERROR TIMES VARIANCE IN VARIABLE J.
   STRATA CAN BE DIFFERENT DRIVERS, ETC. THE SAMPLE IN EACH
   STRATUM FOR EACH VARIABLE IS INVERSELY PROPORTIONAL TO THE
   SQUARE ROOT OF ERROR COST AND DIRECTLY PROPORTIONAL TO THE
   VARIANCE OF THE MEASUREMENT IN THE STRATUM. THIS IS
   ONLY GOOD FOR CORRECTLY ESTIMATING THE MEAN OF VARIABLE
   J. (A SKETCH OF THIS ALLOCATION FOLLOWS THE TABLE.)
6. SEQUENTIAL LIKELIHOOD RATIO TEST, BASED ON MINIMIZING
   THE COST TO ASSURE A GIVEN OVERALL ERROR RATE. START BY
   EDITING THE VARIABLE WITH THE HIGHEST IMPACT ON THE
   EMISSION RATE OR THE SMALLEST COST/BENEFIT RATIO IF
   COSTS OF EDITING ARE DIFFERENT.
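A minimal sketch of the item 5 allocation rule, reading the slide literally:
each stratum's share of inspections is proportional to its measurement
variance and inversely proportional to the square root of its error cost.
(Classical Neyman-type allocation would use the standard deviation rather
than the variance; the slide's wording is followed here.) All names and
numbers are illustrative, not from the paper.

    # Stratified spot-inspection allocation per item 5 above.
    def allocate_inspections(total_inspections, strata):
        # strata: list of (name, variance, error_cost) tuples.
        weights = [var / cost ** 0.5 for _, var, cost in strata]
        total_w = sum(weights)
        # Rounded shares may differ from the requested total by a unit or two.
        return {name: round(total_inspections * w / total_w)
                for (name, _, _), w in zip(strata, weights)}

    # Example: strata defined by driver, as the slide suggests.
    strata = [("driver A", 4.0, 1.0), ("driver B", 1.0, 1.0), ("driver C", 2.0, 4.0)]
    print(allocate_inspections(100, strata))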
121
-------
Table 13
SPOT TESTING

QUESTION: GIVEN A TOTAL TEST POPULATION OF N CARS, IF
WE EDIT A SAMPLE OF n CARS AND FIND NO ERRORS, WHAT
IS THE LIKELIHOOD THAT THE ERROR RATE IN THE ENTIRE
POPULATION IS LESS THAN X%?

ASSUME AN UNKNOWN ERROR RATE OF p%, POPULATION SIZE N,
TOTAL NUMBER OF BAD TESTS X = pN, TOTAL NUMBER OF GOOD
TESTS N - X = Y.

THE PROBABILITY THAT A SAMPLE OF n PACKETS CONTAINS ONLY GOOD PACKETS IS

    P = (Y/N)((Y-1)/(N-1))((Y-2)/(N-2)) ... ((Y-n+1)/(N-n+1)) = Y!(N-n)!/((Y-n)!N!)

SET P (CONFIDENCE LEVEL), N, p (ERROR RATE);
DETERMINE n
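The expression above is the hypergeometric probability that an n-packet
sample is error-free. A minimal sketch (illustrative code and numbers, not
from the paper) of the "determine n" step: find the smallest sample whose
chance of showing zero errors, if the true error rate were p, falls below
1 - P.

    # Spot-testing sample size from the hypergeometric expression above.
    # P0 = probability of no bad packets in a sample of n from N,
    # where X = p*N are bad and Y = N - X are good.
    def prob_no_errors(N, n, p):
        X = round(p * N)          # bad tests
        Y = N - X                 # good tests
        p0 = 1.0
        for i in range(n):        # product form: Y/N * (Y-1)/(N-1) * ...
            p0 *= (Y - i) / (N - i)
        return p0

    def required_sample(N, p, confidence):
        for n in range(1, N + 1):
            if prob_no_errors(N, n, p) <= 1.0 - confidence:
                return n
        return N

    # Example: 2000-car program, 2% assumed error rate, 95% confidence.
    print(required_sample(2000, 0.02, 0.95))   # about 145 cars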
122
-------
Table 14
[Sample sizes for spot testing: probability of finding no errors among n
sampled packets, tabulated by confidence level and error rate; the
tabulated values are illegible in the source scan.]
-------
DATA VALIDATION TECHNIQUES USED IN MOBILE
SOURCE TESTING
by
C. Don Paulsell
Office of Mobile Source Air Pollution Control
U.S. Environmental Protection Agency
Ann Arbor, Michigan 48105
125
-------
INTRODUCTION
The EPA laboratory at Ann Arbor is the primary government facility
responsible for certification testing of engine-driven vehicles to
determine compliance with the standards for emissions levels and fuel
economy. Approximately 2500 to 3000 vehicles of foreign and domestic
manufacture are tested annually. This testing is performed in 10
dynamometer test cells using the constant-volume sampling (CVS) method
to collect emission samples from vehicle exhausts. The samples are
analyzed on seven analyzer sites each equipped with all of the various
instruments necessary for sample analysis. As the vehicles are
operated through a prescribed simulated driving cycle, sufficient data
are also recorded to determine fuel economy. A complete data set for
a vehicle includes information such as vehicle identification data,
test specifications, instrument calibrations, calibration data corre-
lations, test data, calculated (reduced) test data, vehicle manufac-
turer's test results, EPA test results, and quality control data.
After these data have been collected and/or generated, they are
subjected to quality control procedures to assess overall accuracy,
precision, uniformity, and validity.
QUALITY CONTROL SYSTEM
The "products" of our test process are the data which represent the
intangible exhaust emissions of the vehicles tested. The quality
control system assesses the acceptability of this product in terms of
accuracy, uniformity, and validity.
Accuracy is important since these data are used to decide whether a
vehicle meets federal standards. Moreover, a financial penalty may be
applied to any manufacturer for not meeting the standard for fuel
economy. This assessment is five dollars per vehicle produced for
each tenth of a mile per gallon less than the standard. Thus, the
question of accuracy could potentially involve millions of dollars.
Since accuracy can be a relative attribute, the data are also checked
for precision and uniformity to determine whether measurements can be
repeated at an analyzer site and whether results from each of the
seven analyzer sites are essentially equivalent. Finally, since the
"data product" is dependent upon the test process used, the validity
of that process must be verified. Data validation techniques comprise
a very important part of the total quality control system.
TYPES OF DATA VALIDATION
Data validation begins long before the vehicle test is performed and
continues after the vehicle has been returned to its company.
This broad application of data validation is illustrated in the five
parts of the overall process.
126
-------
1. Calibration Acceptance
2. Operational Verifications
3. Procedural Checks
4. Test Data Review
5. Comparative Measures
The following paragraphs discuss each of these areas and provide
examples of the methods used.
CALIBRATION ACCEPTANCE
A wide variety of instruments and equipment are used in the measure-
ment process. It is obvious that each unit must be calibrated, but
what is not obvious is how to validate that a calibration has normal
characteristics. The calibration procedures are often more compli-
cated than the test procedures, and an erroneous calibration can only
produce erroneous test data.
The QC methods used for calibration validation emphasize the quanti-
tative aspects of the equipment characteristics. For example, a
dynamometer is calibrated to establish residual bearing frictions.
These frictional values tend to have predictable magnitudes across all
dynamometers. Use of this characteristic can provide confidence in
both accuracy and uniformity for dynamometer calibrations.
The analyzer and constant volume sampler also have unique character-
istics. An analyzer curve can be assessed in terms of nonlinearity,
curve fit deviations, and the absence of inflections. A CVS utilizes
a critical flow venturi which has a characteristic discharge coeffi-
cient of .985 to .995. This coefficient, the ratio of actual to
theoretical flow for a given throat diameter, can be used to assess
flow metering accuracy and long term stability.
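As a concrete illustration of such a quantitative acceptance criterion,
the sketch below (illustrative names and example flows; only the 0.985 to
0.995 band comes from the text) flags a venturi calibration whose
discharge coefficient falls outside the characteristic range.

    # Calibration-acceptance sketch for the CVS critical-flow venturi.
    # The discharge coefficient is the ratio of actual to theoretical
    # flow; per the text it characteristically lies between .985 and .995.
    def check_discharge_coefficient(actual_flow, theoretical_flow,
                                    low=0.985, high=0.995):
        cd = actual_flow / theoretical_flow
        return cd, low <= cd <= high

    cd, ok = check_discharge_coefficient(actual_flow=349.1,
                                         theoretical_flow=352.5)
    print(f"Cd = {cd:.4f}  {'ACCEPT' if ok else 'FLAG FOR REVIEW'}")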
The dynamometer, constant volume sampler (CVS), and analyzer represent
the three major components of the measurement process. A proper
calibration is a necessary condition for getting valid test results,
but the operational verification is equally important.
OPERATIONAL VERIFICATIONS
This phase of the process is used to assure that the equipment can
measure and produce a known result. Special tests are conducted at
daily, weekly, or bi-weekly intervals to produce a QC parameter which
can be normalized relative to all systems. These parameters are
manipulated statistically or plotted graphically to assess control of
the process accuracy and precision.
127
-------
For example, the CVS is checked by injecting a known mass of pure
propane as though it were auto exhaust. All measurements and calcu-
lations are performed as in a test and the result must be within
plus or minus 2 percent of the known value. Leaks, calibration
drift, erroneous analyzer span gas values, and many other parameters
can cause the verification to fail.
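A sketch of this verification (illustrative names and numbers; the plus
or minus 2 percent tolerance is from the text):

    # Operational-verification sketch: propane injection check on the CVS.
    # A known mass of pure propane is injected and processed as if it
    # were exhaust; the recovered mass must agree within +/- 2 percent.
    def propane_check(injected_grams, recovered_grams, tolerance=0.02):
        error = (recovered_grams - injected_grams) / injected_grams
        return error, abs(error) <= tolerance

    error, ok = propane_check(injected_grams=50.0, recovered_grams=50.7)
    print(f"recovery error = {error:+.1%}  {'PASS' if ok else 'FAIL'}")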
The analyzer is verified daily by analyzing a bag of blended gases
at each of the seven analyzer sites. The deviation of each site
from the overall average serves as the normalized parameter. A site
which is consistently high or low or inconsistent will be obvious from
the automated control chart analysis. Positive or negative consecutive
runs greater than five, or excessive data scatter are automatically
flagged and noted by a QC message on the analysis printout.
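The automated run and scatter rules might look like the sketch below; the
run limit of five is from the text, while the two-sigma scatter limit is
an assumption, since no numeric scatter rule is stated.

    # Control-chart sketch for the daily analyzer-site check: each site's
    # deviation from the seven-site average is the normalized parameter.
    def qc_messages(deviations, run_limit=5, sigma_limit=2.0):
        msgs = []
        mean = sum(deviations) / len(deviations)
        sigma = (sum((d - mean) ** 2 for d in deviations)
                 / len(deviations)) ** 0.5
        run_sign, run_len = 0, 0
        for day, d in enumerate(deviations, start=1):
            sign = (d > 0) - (d < 0)
            run_len = run_len + 1 if sign == run_sign and sign != 0 else 1
            run_sign = sign
            if run_len > run_limit:   # consecutive runs greater than five
                msgs.append(f"day {day}: run of {run_len} on one side of zero")
            if sigma > 0 and abs(d - mean) > sigma_limit * sigma:
                msgs.append(f"day {day}: excessive scatter ({d:+.3f})")
        return msgs

    print(qc_messages([0.02, 0.03, 0.01, 0.04, 0.02, 0.05, 0.03, -0.01]))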
The dynamometer gets a short version of the full calibration to verify
its stability. Control charts of flywheel frictional values will
graphically show a deteriorating bearing or load control problem.
Finally, a repeatable car is tested on each site utilizing all the
normal components of the system. The emission results are statis-
tically analyzed to show significant differences between sites. These
operational verifications each address a specific part of the process,
and when assessed in total, provide assurance that the system is
capable of producing valid emission results from a properly conducted
test procedure.
PROCEDURAL CHECKS
A complete emissions test can require a total of about eighteen hours,
including the twelve hour overnight "soak" period. The specifications
and criteria are so numerous that a set of checklists has been
developed to document that each one has been done properly. Test
times, temperatures, shift patterns, horsepowers, special procedures,
and many other conditions are noted or checked off as each phase
progresses. In some cases, such as fueling, the operation must be
witnessed by two people, since the type of fuel can greatly affect
emissions.
Although the test equipment has been previously verified, several
checks are performed as part of the test. An open valve or improper
horsepower setting would cause the test to be voided.
At the end of the test process all the stripcharts, checksheets,
datasheets, and driving traces are consolidated, reviewed, and sent
for computer processing.
128
-------
TEST DATA REVIEW
The test processing office validates that all necessary data have been
obtained. The data sheet is then batch processed by computer to
generate a printout of input data, calculated results, QC checks, and
pass/fail criteria. The computer program has been designed to audit
the various test data for omissions or unrepresentative values.
For example, a 40,000 pound automobile would likely be a 4,000 pound
value improperly keypunched. Other data, such as ambient background
concentrations can be compared to a normal distribution of values
obtained at EPA to flag high levels. Higher values may indicate
improper analyzer parameters or a leaking vehicle exhaust system.
Since some of the test sequence is repeated, a ratio of two flowrates
or distances travelled can be very useful in highlighting abnormal-
ities. A normalized ratio has become a valuable tool because it is
not affected by the magnitudes of parameters, which may normally be
different. It is the ratio of these different magnitudes that pro-
duces a value which lies within a narrow bandwidth. The ratio of
highway to city fuel economy is an example of this application.
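A minimal sketch of these two audits (all limits are illustrative
assumptions; the paper quotes none): a plausibility range that would catch
the 40,000 pound keypunch error, and a narrow band on the highway/city
fuel economy ratio.

    # Test-data-review sketch: range audit plus normalized-ratio audit.
    def audit_record(rec):
        flags = []
        # Range check: a 40,000 lb car is likely 4,000 lb mis-keypunched.
        if not 1000 <= rec["inertia_weight_lb"] <= 10000:
            flags.append(f"weight {rec['inertia_weight_lb']} lb out of range")
        # Normalized ratio: highway/city fuel economy falls in a narrow
        # band even though the individual magnitudes differ car to car.
        ratio = rec["highway_mpg"] / rec["city_mpg"]
        if not 1.0 <= ratio <= 1.8:
            flags.append(f"hwy/city mpg ratio {ratio:.2f} outside band")
        return flags

    print(audit_record({"inertia_weight_lb": 40000,
                        "city_mpg": 17.0, "highway_mpg": 24.0}))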
If all data have been validated and all acceptance criteria met, the
documentation is stored and the results are updated as valid in the
computer data base. While this completes the processing of one test,
it is not the end of the data validation process.
COMPARATIVE TESTS
Each test alone has certain characteristics, and all tests combined
have other useful measures. Comparative tests on large populations of
vehicle results can highlight differences and trends that an indivi-
dual test does not show.
The manufacturer has normally tested the vehicle prior to EPA's test,
so an independent set of data is available for comparison. The
MFR/EPA emission differences and percent differences are calculated
and stored in a "paired data" file. These normalized values can then
be statistically summarized for each manufacturer group. The results
of this analysis show the relative agreement between EPA and all
individual manufacturers. If EPA is consistently higher or lower, a
systematic bias may be indicated. Diagnostic tests or correlation
programs can be performed to identify and correct the cause.
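The paired-data summary might be sketched as follows (illustrative data;
the two-standard-error screening rule is an assumption):

    # Comparative-test sketch: MFR vs. EPA paired emission results.
    # Percent differences are summarized per manufacturer; a mean far
    # from zero suggests a systematic bias worth a diagnostic program.
    def paired_summary(pairs):
        diffs = [100.0 * (epa - mfr) / mfr for mfr, epa in pairs]
        n = len(diffs)
        mean = sum(diffs) / n
        sd = (sum((d - mean) ** 2 for d in diffs) / (n - 1)) ** 0.5
        se = sd / n ** 0.5
        return mean, se, abs(mean) > 2 * se   # assumed 2-se rule

    # Illustrative HC results (g/mi) as (manufacturer, EPA) pairs.
    pairs = [(0.39, 0.41), (0.50, 0.52), (0.44, 0.47),
             (0.61, 0.60), (0.35, 0.38)]
    mean, se, biased = paired_summary(pairs)
    print(f"mean diff = {mean:+.1f}% (se {se:.1f}%)  bias suspected: {biased}")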
Statistical analysis of all these data can provide the upper and lower
limits which are used to assess the significance of a bias. Test
conditions and equipment identifiers can be used to stratify the
analysis for assessing whether such things as altitude differences or
specific test sites correlate with the paired data differences.
Finally, the data validation loop can be refined by the statistical
determination of QC limits.
129
-------
QUALITY CONTROL REFINEMENTS
A strong data validation program can be developed by automating many
of the checks being made. Computerized validation and acceptance
tests require that the data be pertinent and accessible. Use of an
integrated data base structure can minimize manual operations, improve
security, and assure the integrity of the data.
A computerized data base can also enable the automation of screening
programs, plotting routines, and statistical summaries. It will
permit rapid development of more precise tools and tests which can be
used in the data validation process.
Finally, a computer data base can provide a trail for audits or
requests for documentation.
CLOSURE
This paper has shown that the data validation process is not simply an
inspection of results at the end of a test. Rather, it is a combi-
nation of specific individual tests and checks which when taken as a
whole, form the foundation for a quality control system which can
provide documented, quantitative assurance that the "data product" of
the EPA mobile source program is fit for use in our regulatory process.
130
-------
VALIDATION OF CONTINUOUS STACK MONITORING
DATA
by
Joseph E. McCarley
Emission Standards and Engineering Division
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
131
-------
VALIDATION OF CONTINUOUS STACK MONITORING
DATA
J.E. McCarley
SUMMARY
The Emission Standards and Engineering Division is currently developing
a revised standard of performance for new steam generators. As part of this
study, the feasibility of continuous regulation of sulfur dioxide emissions,
as well as a percentage of sulfur reduction from fossil fuels, is being
evaluated. In support of this study, the Emission Measurement Branch is con-
ducting sulfur dioxide continuous monitoring projects at five coal-fired
power plants equipped with flue-gas desulfurization units. When data are be-
ing collected for supporting regulations, validation of these data is an
important consideration.
Prior to collecting emission data, the continuous monitoring systems
are validated by following the procedures described in Performance Specifi-
cations—Appendix B 40 CFR 60. (Performance Specification 2--Performance
Specifications and Specification Test Procedures for Monitors of SO2 and NOx
from Stationary Sources.)
The monitoring data are then collected and recorded continuously from
each emission point at least once every 15 minutes. In this study, the data
are then placed in a computer bank, printed and then edited or validated man-
ually. When data are collected during periods of instrument malfunction,
calibration, or plant upset, the time periods for these conditions are
recorded by plant personnel. These data are purged from the computer bank
and the remaining data are averaged over 1-hour, 3-hour, 8-hour, 24-hour,
and 30-day periods. If more than one 15-minute data point has been
determined to be invalid in any one-hour period, the entire hour of data
is considered invalid and not included in the longer averaging
periods. In summary, the data are edited for actual known errors and no
132
-------
statistical validation procedures are performed.
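A sketch of this editing rule (field names are illustrative; the
more-than-one-invalid-point rule and the averaging periods are from the
text):

    # 15-minute SO2 readings carry a valid/invalid mark set from plant
    # logs of malfunction, calibration, or upset periods.  An hour with
    # more than one invalid 15-minute point is discarded entirely; only
    # valid hourly means feed the longer (3-hour, 24-hour, ...) averages.
    def hourly_averages(points):
        # points: list of (value, is_valid), four per hour, in time order.
        hours = []
        for i in range(0, len(points), 4):
            good = [v for v, ok in points[i:i + 4] if ok]
            hours.append(sum(good) / len(good) if len(good) >= 3 else None)
        return hours

    def long_average(hourly):
        valid = [h for h in hourly if h is not None]
        return sum(valid) / len(valid) if valid else None

    pts = [(410, True), (420, True), (415, False), (405, True),   # keep hour
           (500, True), (980, False), (990, False), (510, True)]  # drop hour
    hrs = hourly_averages(pts)
    print(hrs, long_average(hrs))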
Further details of these monitoring projects are contained in the fol-
lowing report and references therein: Kelly, W. and Sedman, C. First Interim
Report: Continuous Sulfur Dioxide Monitoring at Steam Generators. EMB Project
No. 77SPP23A, Emission Standards and Engineering Division, Office of Air
Quality Planning and Standards, U.S. Environmental Protection Agency, Re-
search Triangle Park, North Carolina 27711, June 1978. 54pp.
Future plans for evaluating validation procedures include (1) applica-
tion of more automatic recording and data validation instrumentation, and
(2) quality control steps to assure the accuracy of long-term emission
monitoring.
133
-------
SCREENING CHECKS USED BY THE
NATIONAL CLIMATIC CENTER
by
William E. Klint
National Oceanic and Atmospheric Administration
National Climatic Center
Federal Building
Asheville, North Carolina 28801
135
-------
SCREENING CHECKS USED BY THE
NATIONAL CLIMATIC CENTER
W.E. KLINT
ABSTRACT
Current processing is discussed with emphasis on validation checks and manual
interface. The need for an automated quality control program is recognized
and plans for such are presented. Plans for a new modular surface edit are
presented along with a new quality control procedure using an interactive
graphics system. Data management is addressed through a Data Dictionary/Data
Base Management system.
136
-------
The National Climatic Center is:
Responsible for receipt, processing, archiving and publication of
climatological data. Coordinates the analysis of past meteorological
data for NOAA, other Government agencies and the public to accommodate
user requirements for climatological data through special studies
and statistical analyses. Manages the national program of climatolog-
ical data recall and works closely with the military in meeting
this special requirement. Provides facilities, data processing
support, and expertise, as requested, for World Meteorological
Organization programs (e.g., GARP and GATE). Assists in training
programs to familiarize the representatives of developing countries
with modern meteorology and coordinates (through World Data Center-
A) international exchange of climatic data.
Of the various types of incoming data, paper forms predominate. These
then must be keyed to digital form for processing. This effort entails
keying approximately 37 million bytes of data per month. Because of a
cutback in funding years ago only three-hourly surface observations, or
eight observations per day, are digitized.
At the present time, processing exists in two modes: a machine edit and
a manual interface.
The machine edit consists of data verification, a range limit check, a
cross-field consistency check, a continuity check, and appropriate flags
to "verifiers."
Data verification is a simple machine check to see whether data have
indeed been keyed into the appropriate field and, if the field is
coded, whether the entry is a legitimate code.
The range limit checks to see if the value in a particular field falls
into an appropriate range. However, at the present time there is only
one range limit per field. This, in and of itself, causes many unnecessary
"kickouts."
The cross-field consistency check looks at the entries in related fields
for consistency, e.g., clouds and precipitation.
The continuity check does a range limit check on certain fields between
the previous observation and the one being checked.
Finally, if any of the above checks fails, the appropriate flags are
printed out for return to the "verifiers" and appropriate action.
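A sketch of the four machine-edit steps (all field names, limits, and
codes are illustrative stand-ins for NCC's actual tables):

    # NCC machine-edit sketch: verification, range limit, cross-field
    # consistency, and continuity checks, each returning flags for the
    # "verifiers".  Limits and rules here are illustrative only.
    RANGE_LIMITS = {"temp_f": (-60, 125), "wind_kt": (0, 150)}
    LEGAL_PRECIP_CODES = {"", "R", "S", "ZR"}

    def machine_edit(obs, prev_obs=None):
        flags = []
        # 1. Verification: field present and, if coded, a legitimate code.
        if obs.get("precip_code") not in LEGAL_PRECIP_CODES:
            flags.append("verification: illegal precip code")
        # 2. Range limit: one limit pair per field (the current scheme).
        for field, (lo, hi) in RANGE_LIMITS.items():
            if field in obs and not lo <= obs[field] <= hi:
                flags.append(f"range: {field}={obs[field]}")
        # 3. Cross-field consistency, e.g. precipitation needs clouds.
        if obs.get("precip_code") and obs.get("cloud_tenths", 0) == 0:
            flags.append("consistency: precipitation with clear sky")
        # 4. Continuity: range-limit the change since the last observation.
        if prev_obs and abs(obs["temp_f"] - prev_obs["temp_f"]) >= 20:
            flags.append("continuity: temperature jump >= 20 deg")
        return flags

    print(machine_edit({"temp_f": 72, "wind_kt": 8, "precip_code": "R",
                        "cloud_tenths": 0}, prev_obs={"temp_f": 49}))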
The manual interface, due to the magnitude of data, consists primarily
of a visual scan of all forms. A random sampling of stations receives a
closely scrutinized check of all observations. Problems with the data
requiring corrections are handled as follows: first, the erroneous
entry is crossed through with a blue pencil and the "correct" entry is
made directly above the erroneous one. Second, if the observation is
one which is normally digitized, a change form is routed to key entry.
137
-------
The "kickouts" from the machine edit, which were returned for action,
are scrutinized, a decision on validity is made, and, if necessary, a
correction is made both on the original paper form and on a change form
to key entry.
The change forms are routed to key entry for digitizing and the changes
are again run through the machine edit.
The above procedure is a recurring one until no more errors appear.
Once all the data "pass" the edit, they are formatted into the surface
observation file and entered into the data bank.2
It is fairly obvious that, due to the rather limited nature of these
checks, some erroneous data slip through and are placed into the data
bank. This fact, coupled with the realization that the magnitude of
incoming data in digital form is on the increase, and with the fact that
a more closely "real time" edit is both possible and needed, is forcing
changes upon NCC.
Although the basic processing stages of machine edit and manual interface
will remain the same, the nature of each will take on a new and challenging
meaning.
With the innovation of the new National Weather Service Automation of
Field Operations and Services (AFOS) system, the NCC will acquire near
real time collection capabilities of data in digital form. These, plus
manuscript forms, create a real need for dual processing of data.
The edit computer program is being completely rewritten, as in its
present form it is difficult to maintain. It is designed in a modular
form and many previously manual functions are designed into the program.
The creation of a Master Station Inventory (MSI) will completely change
the complexion of the edit program. The basic edit routines remain the
same, with the following changes:
1. The verification step will now be checked against the MSI for
validity. Previously some missing entries were flagged to a "verifier"
whether they were missing or simply not observed at that particular
station. The MSI will now be checked for proper disposition before an
error flag is returned, thus alleviating the "verifier" of this task.
2. The creation of the MSI will allow for a complete set of range
limits for every field of every individual station, thus preventing
unnecessary "kickouts" for "good" data, and providing for a narrower
range limit check of each field.
3. Cross-field consistency checks will remain basically the same,
with the provision that, with the above-mentioned changes, they should be
more reliable. They have been "beefed-up" to contain closer checks and
checks previously left to the "verifiers."
138
-------
4. If an error is isolated and a flag is called for, a check is
first made with the MSI to see if a mathematical relationship exists.
If one does, a new value is calculated and entered beside the original
with an appropriate flag.
If an error is isolated and no mathematical relationship exists, the
appropriate flag is issued and the observation queued for scrutiny by a
"verifier." All observations changed by a "verifier" are automatically
re-entered into the edit program.
The manual interface by the verifier will consist of interacting with
the data through use of an interactive graphics system. The "verifier"
previously had only manuscript forms as input to his decision. Now he
will be able to present the data in any of several displays including
contoured map analyses of a surrounding areal coverage. With this input
the verifier will be able to make a more intelligent decision as to
proper disposition of questionable data.
Up to this point we have discussed only a superficial edit of the incoming
data. We have not, as yet, looked at the inherent quality of the data
itself. NCC, at the present time, does not have the capability of doing
relational checks on the data. With the acquisition of the Asymptotic
Singular Decomposition (ASD) model, developed by Dr. John Jalickee,3
CEDDA, NCC now has this capability.
In its simplest terms the ASD model uses the method of least squares on
a data matrix.
The first step is to calculate a "characteristic" vector for the matrix.
Next, the differences between the data matrix and the appropriate "charac-
teristic" are calculated. The matrix is now overlayed with these dif-
ferences and the process is iterated.
The first component of vector magnitudes, when plotted, results in a
graph of the dominant features; the second component, the features of
the difference matrix, etc. We have found that with most data fields,
the second and third component plots prove to be the most useful for
validation. By the time the fifth component plot is made, we have
usually reached the noise level.
The data, thus plotted, can be expected to show "continuity." The
physical relationship of the field should be apparent in the graph. If
that relationship breaks down at any point in the graph, we can assume
bad data. This model will give NCC the capability to perform quali-
tative (relational) checks on all incoming data.
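The iteration just described, fit a "characteristic" vector by least
squares, subtract, and repeat, is in effect a one-rank-at-a-time singular
value decomposition. The sketch below uses numpy's SVD as a stand-in for
the actual ASD code (an assumption; the ASD model itself is not reproduced
here) and shows how a bad datum breaks the continuity of the residual
field:

    # Decompose a station-by-time data matrix into successive rank-one
    # least-squares components; numpy's SVD stands in for ASD here.
    import numpy as np

    def components(data, n_comp=5):
        u, s, vt = np.linalg.svd(data, full_matrices=False)
        return [s[k] * np.outer(u[:, k], vt[k]) for k in range(n_comp)]

    rng = np.random.default_rng(0)
    field = np.outer(np.sin(np.linspace(0, 3, 12)), np.ones(30))  # smooth
    field += 0.05 * rng.standard_normal(field.shape)              # noise
    field[5, 17] += 4.0                                           # bad datum

    comps = components(field, n_comp=3)
    residual = field - comps[0]   # what the 2nd/3rd components describe
    # The bad datum breaks the continuity of the residual field:
    i, j = np.unravel_index(np.abs(residual).argmax(), residual.shape)
    print(int(i), int(j))         # 5 17

The same truncated set of components, stored in place of the full field,
is also the compaction device mentioned later: a few rank-one terms
typically reconstruct nearly all of a smooth field.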
A side effect benefit of this model is the capability of building a
station "normal" situation. Based on this, such things as instrument
drift, miscalibration, and erroneous launch data become readily apparent
when exposed to a trend analysis. Once isolated, these "bad" data can
be adjusted "toward" the normal with at least some degree of accuracy.
139
-------
The concept of the "verifier's" job changes somewhat under this new
approach. The computer edit now will do much of the job the verifier
did previously, thus relieving him of that task; the bulk of which was
scanning "good" data. Upon his arrival for duty, he sits down at a KCRT
console or terminal and calls up the flag file for his particular area
of interest. He sits in the seat of judgment and makes those decisions
too delicate or volatile to have been programmed into the edit routine.
Once made, these observations are returned to the edit queue to be run
once again. Only after an observation "passes" the edit program is it
allowed to continue into the ASD model.
The results of the ASD run are displayed in one of several graphic modes
for verification. Realizing that the normal range of this display is
from +0.5 to -0.5 units, its power and usefulness become apparent.
Remember here that this is a display of the second or third component of
the data field, and, as we are working with differences, should nicely
fit within this range. The "outliers" will stand out here with striking
notoriety. The verifier now has the task of "replacing"* the "outliers"
with a more reasonable value. This can be done simply by sight align-
ment of his cursor or light pen with the trend of the curve or by having
the computer do a best fit. Although this sight alignment appears to
be a rather gross correction, when it is "blown up" into the initial
state it becomes very tolerable.
All original data are kept, with corrections and appropriate flags being
entered adjacent to them before being incorporated into the NCC data
bank. This will allow use of either datum by the user.
The NCC is currently planning a database environment. This quality
control process will allow us to place only QC'd data of a high relia-
bility into our database, thus assuring the user of quality data.
Another side effect of ASD is its compaction possibilities for storage.
The set of components for a data field can be "blown up" to explain 99%
of the original field; thus NCC can store components and blow them up to
the "original" field on output. This will result in many orders of
magnitude reduction of the necessary storage facilities.
*Note here that "replace" does not imply that we destroy the original
value. It will be maintained and output along with the corrected value.
140
-------
REFERENCES
For your convenience, copies of the following three references are included
herein, starting on the next page. The generosity of Mr. Walter James Koss,
Primary Data Branch, EDS, Asheville, NC 28801, for supplying these references
for publication in these Proceedings is appreciated.
1. Barton, G. and Saxton, D. The Role of Interactive Computer Systems in
Data Processing at CEDDA. Environmental Data Service (EDS) Magazine,
pp. 10-14.
2. Edit Procedures - Surface Observational Data. Surface Section, Primary
Data Branch, National Climatic Center, Asheville, NC 28801. August 1975,
31 pp.
3. Jalickee, J., et al. Validation, Compaction, and Analysis of Large
Environmental Data Sets. Environmental Data Service (EDS) Magazine,
pp. 3-9.
141
-------
The Role of
Interactive Computer
Systems in Data
Processing at CEDDA
By Gerald Barton and David Saxton
Introduction
The Environmental Data Service's Center for Experiment Design and
Data Analysis (CEDDA) processes enormous volumes of interdisciplinary
environmental data collected in major field research programs and
projects, such as the recent GARP (Global Atmospheric Research Program)
Atlantic Tropical Experiment (GATE). As an example, CEDDA received
1,700 miles of magnetic tape data from the four U.S. ships (Researcher,
Oceanographer, Dallas, and Gillis) in GATE's primary array.
CEDDA's goal is rapid processing to provide the data to the scientific
community as soon as possible after the completion of a field experiment.
One necessary step is editing the data to remove invalid readings.
CEDDA's current turnaround time for interactive editing of a data file
is 1 to 3 weeks. It is hoped that a new interactive computer system
CEDDA is currently assembling will cut this time to 1/2 hour or less.
Data Collection
During field experiments, environ-
mental data are recorded continuously
by instruments on ships, towers,
buoys, balloons, and other platforms
at sample rates from 10/second to
4/second. A wide variety of specially
calibrated sensors measure such vari-
ables as temperature, dewpoint, pres-
sure, wind, radiation, salinity, and
rainfall. The outputs are processed
and stored on multitrack magnetic
tapes. One track is used exclusively
for time so that the exact Julian date,
hour, minute, second, and 1/10 sec-
ond for each sample are known.
To augment this high-resolution
taped data, each major sensor sub-
system output is supplemented by
logs, stripcharts, and optical marked
cards that record calibration checks,
sensor changes (with all serial num-
bers), and special events, such as the
beginning or end of an instrument
cast.
The completeness of the data sets
and their security are matters of
prime concern. At the end of a phase
of a field experiment, or at other
convenient intervals, all tapes, logs,
cards, etc., are shipped to CEDDA
using the safest methods available.
During GATE, CEDDA had a data
manager on each of the 4 U.S. ships
in the primary (B-scale) array and
also at the GATE Operations Control
Center to ensure the completeness and
security of the transfer process.
Current Processing System
At CEDDA, the incoming analog data
tapes are first checked for recording
quality and completeness. Next, a
minicomputer converts the analog
data to digital form, producing a
digital tape. Playback time is 32
times faster than field recording
speed, so an 8-hour field tape is tran-
scribed in about 15 minutes. During
the minicomputer processing, an ad-
ditional computer time word is added
to each sample to control subsequent
data processing programs and to pre-
clude the loss of any sensor data due
to malfunction or noise in the field
time system.
Processing next proceeds to one of
NOAA's larger computer systems,
where data sets are organized by com-
ponent systems used on the data col-
lection platform, e.g., Oceanographic
Data Set or Rawinsonde Data Set.
Graphical display of the data as time-
series plots and graphs, and fre-
quency distribution plots, is required
for the analysis of these data sets.
The editing features of the current
computer processing system can be
thought of as an interactive graphics
system, with the time required for in-
teraction varying up to a week or
more. For optical mark cards, reac-
tion is rapid since all event cards may
be listed in chronological order and
cards may be inserted, deleted, or
corrected using a list-edit program in
the minicomputer. However, for high-
resolution meteorological or ocean-
ographic data which must be trans-
formed to engineering units and
properly scaled, display for editorial
review is currently limited to a micro-
film graphics subsystem located in
nearby Suitland, Md. For these data
sets the time required for interaction
includes the transport of data tapes,
generation of microfilm graphics in a
batch mode at the remote site, trans-
port of microfilm on the return loop,
review using microfilm readers, test-
ing of automated corrections when
required, and the recycling to display.
New Processing System
CEDDA is currently assembling the
hardware and software necessary to
implement an interactive computer
system that will allow the data editing
and updating functions to be per-
formed in a single processing step
I real time). The main components of
the system will remain a Digital
Equipment Corporation (DEC) PDP-
11/50 minicomputer and an IBM
360/65. It will be possible to access
data on the IBM 360/65 through the
PDP-11 or through terminals. The
142
-------
[Diagram: a PDP-11/50 (184K bytes, floating-point hardware, line-frequency
and programmable real-time clocks, RSX-11D operating system) linked to an
interactive graphics interface, a DECwriter terminal, a SUPERBEE keyboard
cathode ray tube terminal, an optical reader, decommutation interfaces,
auto-answer acoustic-coupler interfaces, a high-speed paper tape
reader/punch, a Versatec printer-plotter, a Centronics 100 char/sec
printer, 9-track tape drives (800 and 1600 BPI), two 40-million-byte
disks, a laboratory peripheral system with A/D hardware, and a future
synchronous-communications link to the IBM 360.]
CEDDA's proposed interactive
computer configuration.
143
-------
[Diagram: the PDP-11/50 (184K bytes) and IBM 360/65 driving a RAMTEK
graphics display (256 levels) with one color and two monochrome TV
monitors, pictorial hard copy (16 shades), two track balls, two keyboards,
two pencil-and-tablet units, and a video tape recorder.]
CEDDA's proposed interactive
graphics subsystem configuration.
144
-------
PDP-11 will have a graphics sub-
system that will take less than 30 min-
utes to perform the functions of the
current microfilm subprogram.
The major features and components
of the interaction system are:
(1) Access to the IBM 360/65 time-
sharing facilities via keyboard cath-
ode ray tube (KCRT) terminal, ASR-
33 teletype terminal, or PDP-11/50
minicomputer.
(2) Input terminals to the PDP-11/
50, including an LA-30 DECwriter
terminal, a KCRT, and two dial-in
terminal interfaces for use with re-
mote terminals.
(3) A graphics subsystem for the
PDP-11/50.
(4) DEC's RSX-11D real-time,
priority-driven, multitasking execu-
tive system for the PDP-11/50.
With these features, a user can ac-
cess the 360 to perform mathematical
computations or generate data sets.
He can look at the data and analyze
them in real time on the interactive
graphics subsystem. When he finds
errors, he can immediately correct
the data, and display them again on
the graphics system to validate the
corrections. He can then archive the
updated data set for future use.
Interactive Graphics
Capabilities
The interactive graphics subsystem,
designed and assembled by Operating
Systems Incorporated of Tarzana,
California, consists of a RAMTEK
graphics display system interfaced
with CEDDA's PDP-11/50 computer
by an appropriate switching network.
Features of the full system (only part
of which is required for the data
editing job) include two black and
white TV monitors, one color TV
monitor, two data entry keyboards,
two pencil and tablet systems, two
track ball cursor controls, a television
tape recorder with microphone input,
a TV camera with zoom lens, an ana-
log to digital converter, eight planes
of memory that allow up to 256
shades of gray or color, and a cross-
point switching network that allows
mixing control of inputs and outputs.
A simple use of an interactive
graphics system is the editing of raw
data displayed as a time-series analy-
sis or plot. For example, a single
parameter, such as temperature, is
plotted at its highest resolution in a
time sequence covering several hours
or days. Visual inspection of the data
may reveal large errors where the
sensor or telemetry system failed. To
correct these larger errors, a win-
dow edit program might be tested
with all "good" values of the param-
eter constrained to fit between the
upper and lower limits of the window.
Diurnal and other trends might be
superimposed on the data plot. The
limits and trends can be displayed
with the raw data to show which data
points should be edited out.
A slightly more sophisticated ver-
sion of this time-series plot would
compute running means over minutes
or hours and show which of the high-
resolution points will fall outside two
or three standard deviations. Complex
curves using higher order poly-
nomials can be fitted to time-series
data, both before and after various
editing passes, to eliminate, insofar
as possible, "noise" from the data.
Various filters and smoothing func-
tions also can be tested and evaluated
before going into an Automatic Data
Processing (ADP) production mode.
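A sketch of the two edit schemes just described, with an assumed window
and an assumed five-point half-width for the running mean:

    # Time-series edit sketch: a window edit with fixed limits, then a
    # running-mean edit flagging points beyond two standard deviations
    # of the local mean.  Limits and window length are assumptions.
    def window_edit(series, lo, hi):
        return [i for i, v in enumerate(series) if not lo <= v <= hi]

    def running_mean_edit(series, half_width=5, n_sigma=2.0):
        flagged = []
        for i, v in enumerate(series):
            window = series[max(0, i - half_width): i + half_width + 1]
            m = sum(window) / len(window)
            s = (sum((x - m) ** 2 for x in window) / len(window)) ** 0.5
            if s > 0 and abs(v - m) > n_sigma * s:
                flagged.append(i)
        return flagged

    temps = [24.9, 25.0, 25.1, 25.0, 99.9, 25.2, 25.1, 25.0, 24.8, 25.1]
    print(window_edit(temps, lo=-5.0, hi=45.0))   # [4]  (sensor failure)
    print(running_mean_edit(temps))               # also isolates index 4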
In general, CEDDA's new interac-
tive graphics system will make it pos-
sible to display two or more curves
simultaneously, using color, intensity,
or blinking characteristics to distin-
guish, for example, between a stand-
ard and trial edit scheme or between
different parameters. It will provide
the capability to produce hard-copy
documentation of both the trial pro-
grams as they progress during a test
and the data sets used.
A more demanding requirement of
an interactive graphics system is the
ability to display and operate on dig-
itized field data. An example of this
type of data is a digitized radar pic-
ture. Under the control of an inter-
active graphics system, the analyst
should be able to select and display
a radar picture, to rotate and rescale
it to a standard grid size, to enhance
the digitized increments by contours
or false color transfers, to overlay
and compare it with the previous pic-
ture, and to display only those points
from the two pictures whose change
exceeds some threshold value. Simi-
larly, the analyst should be able to
display the overlap portion of digi-
tized radar pictures from two loca-
tions and to scale and normalize these
independently so that compatibility is
established on common echo systems.
A further refinement is the addition
of a TV-type scanner so that analog
material can be rapidly digitized at
high resolution and then handled with
all the capabilities of the interactive
graphics system. For example, a satel-
lite visual range photograph could be
scanned and digitized and then dis-
played with a radar picture covering
the same area. Specific rainfall rates
from surface observations could be
overlayed on the same display so that
some integration of areal rainfall
amounts would be immediately avail-
able.
An interactive graphics system pro-
vides the ability to overlay data from
different platforms or different sys-
tems. For example, the temperature
and vertical velocity from sensors at
several levels on a tethered balloon
system could be compared by an
analyst for coherence and lags as con-
vective plumes are sampled. Drop-
sondes (atmospheric soundings) from
aircraft could be graphically super-
imposed on simultaneous radiosonde
soundings from ships. Spectra taken
by instrumented aircraft during ship
fly-bys can be compared with high-
resolution data recorded on board
each ship.
CEDDA plans to have the new
interactive computer system in oper-
ation by late 1975. By that time, im-
plementation of the graphics subsys-
tem should include the work done in
145
-------
the current COM cycle. Future
CEDDA applications of the graphics
system will include programs that al-
low display of radar or satellite pic-
tures in multicolors or up to 256
shades of gray using the tape-record-
ing features of the graphics system.
It should be possible to construct
time-motion pictures of changing
weather features. Also envisioned is
the capability to display slices
through 3-D models of weather sys-
tems. CEDDA currently has analysis
programs that allow an analyst to
change parameters in a weather
model. The real time operation of
the graphical display should allow
the scientist to experiment with
parameters that he may never have
had the opportunity to look at previ-
ously.
It can be seen from the above ex-
amples that an interactive graphics
system has broad applicability, ex-
tending from program design and test
through all stages of data reduction
and processing to scientific data anal-
ysis. In addition, interactive graphics
provides programmers and analysts
with the ability to see the data move
through programs from recorded
voltages on multiple channel tapes
until they become validated meteoro-
logical or oceanographic data suitable
for permanent archival and dissemi-
nation to the user community.
The Authors
GERALD BARTON, Chief of CEDDA's Computer Systems Branch, has a B.S.
degree in Geophysics from Pennsylvania State University and an M.A. in
Geological Science from the University of Texas. Before coming to CEDDA,
he worked for ten years with the U.S. Naval Oceanographic Office as a
geophysicist. His early association with the Oceanographic Office included
gravity survey cruises in the USS Archerfish, a research submarine, in the
Western Pacific and off the east and west coasts of the United States.
From 1967 through January of 1974, when he joined CEDDA, Gerry spent most
of his time working in computer programming, systems design, and the
processing of gravity and geodetic data, to determine, among other things,
the deflection of the vertical, or "which way is up."
DAVID SAXTON joined CEDDA as Chief of the Operations Division in April
1974, following a 30-year career in the Air Weather Service which took him
to England, France, Germany, and Japan. Dave has a B.S. degree from the
University of Michigan and an M.S. from the University of Chicago. During
World War II, he served as an Air Force weather forecaster in Europe.
After the war and a year of civilian/student life, he was recalled to
active duty and assigned to the joint Weather Bureau/Army/Navy Weather
Central in Washington, D.C. Subsequently, he was posted to the Tokyo
Weather Central, then to the USAF Weather Central in Suitland, Md., later
moving with that organization to Offutt AFB, Nebraska. In 1961 he was
assigned as Chief of the Strategic Air Command Weather Support Center in
High Wycombe, England. Four years later he was assigned to Air Weather
Service Hqs., Scott AFB, Illinois, as Chief of AWS' Computer Techniques
Division. In 1967, Dave returned to Offutt, now the Air Force's Global
Weather Central, as Chief, Development Division, and later Chief of
Operations. In 1971 he went to Hickam AFB, Hawaii, as Chief of Operations
Division, Headquarters, First Weather Wing. Retiring from the military in
March 1974 (with the Legion of Merit), he joined CEDDA the following month.
146
-------
EDIT PROCEDURES
SURFACE OBSERVATIONAL DATA
Contents Page
Card Images Keyed 1
Procedures 2
No. 1 Card Edit 4
Psychrometric Check 4
Limiting Range of Variability 6
Wind, Weather, Temperature, and Visibility 7
Cloud Coding 9
Clouds and Obscuring Phenomena 11
Explanation of Edit Flags 16
Visual Checking of Records 19
No. 3 Card Edit 19
Machine Computations 24
Precipitation Data Card Images 26
Checking Procedure - Hourly Precipitation 27
Checking Procedure - Extreme Precipitation 28
Maximum Short Period Precipitation 29
Surface Section
Primary Data Branch
National Climatic Center
Asheville, N. C. 28801
August 1975
147
-------
SURFACE OBSERVATION RECORDS PROCESSING
(NWS, FAA, AND NAVY LAND STATIONS)
[Flowchart, largely illegible in the source scan. Legible steps: records
(manuscript forms and charts) are received from NWS and FAA stations;
forms are pre-edited and keying instructions indicated; copies of
preliminary LCDs are made; ADPSD Data Entry keys the data on tape; the
Operations Section organizes and edits the taped data; edit listings are
reviewed, listings and forms corrected, and discrepancy reports prepared;
corrections are keyed and the tapes updated and re-edited, repeated as
necessary to obtain clean data; the LCD COM copy is run for printing;
LCDs are printed and distributed; records are microfilmed and archived,
and publication stock is maintained. A footnote on Navy Land records is
illegible.]
-------
EDIT PROCEDURES
SURFACE OBSERVATIONAL DATA
I. Introduction
A. Surface records are received at NCC for processing and quality
control to produce several routine summaries by machine methods
from taped data. Processing includes keying, verification, and
quality control procedures. After processing, records and sum-
mary products are archived at the NCC.
B. A joint machine edit program for a portion of the hourly obser-
vations has been made by EDS and AWS. However, where different,
only that which is applicable to EDS is listed in this outline.
C. Data are keyed on magnetic tape. If wet bulb temperature and
relative humidity values are not in the basic data as keyed,
machine computations of these values are entered on tape.
D. The taped data are machine edited, corrected, and used in a
number of machine programs producing various monthly and annual
summaries.
II. Card Images Keyed
A. WBAN No.l card - Hourly Surface Observations. This image is
keyed only for the hours corresponding to 3- and 6-hourly syn-
optic times in LST for NWS, FAA, and Navy stations.
B. WBAN No. 3 card - Summary of Day. This image is keyed from the
summary blocks of Form MF1-10B, the B-16 or, in a few cases,
from the F-6. For FAA stations, the form is MF1-10C. In gen-
eral, this image is not used when the station program is such
that a summary approximately midnight to midnight is not possible.
C. The precipitation card series is:
1. Hourly precipitation - 2 images ( 1 & 2 keyed in col. 12) for
each day of the month having precipitation and for the last
day of the month with or without precipitation.
2. Maximum short period precipitation, per month - 2 images
(1 & 2 keyed in col. 10) for each station per month showing
maximum amounts for time intervals of 5 to 180 minutes.
3. Maximum 24-hour amounts, per month - 1 image (4 in col. 12)
is keyed showing the greatest precipitation and date(s),
greatest snowfall and date(s), and the maximum snow depth and
date(s).
(a) When the value is zero (0), date is left blank.
149
-------
III. Procedures
A. A scan edit of the forms is made and keying instructions appli-
cable to the station program indicated on the station folder.
B. Data Entry Section keys data on tape.
C. Operations Section transfers keyed data to computer tapes by
record type.
WBAN No. 1 images, hourly observations, are placed on two tapes.
1. Tape No. 1 includes NWS (except Antarctica) and FAA stations.
2. Tape No. 2 includes NWS Antarctica and Navy stations.
3. The edit program provides for priority editing on tape No. 1
into two groups.
a. Group No. 1, stations in the LCD program, is edited in
two lots - first and second cutoffs. The first cutoff
is made at the discretion of the Chief, Surface Section,
when 75 to 90 percent of the records for the month are
available; remaining records constitute the second cut-
off. Records received unduly late can be held for
processing with data for the next month.
b. Group No. 2, stations not in the LCD program, is usually
processed after completion of group No. 1.
D. WBAN No. 1 records are edited according to the station's observa-
tional program using a reference tape containing the station WBAN
Number, Name, Elevation, Psychrometric Pressure Table, and Obser-
vational Pattern.
1. The observational pattern is designated by assignment of nu-
meric values to fields in the card image and use of the sum of
the field values applicable to the station for each hour as a
control of the machine tests to be made.
Field                         Value    Card Image Columns
(field name illegible)            1    14-16
Sky Condition                     2    17-20
Visibility                        4    21-23
Wea. & Obstruction                8    24-31
S. L. Pressure                   16    32-35
Dry Bulb Temp.                   32    47-49
Dew Point Temp.                  64    36-38
Wind Dir. & Speed               128    39-42
Station Pressure                256    43-46
Wet Bulb Temp.                  512    50-52
Relative Humidity              1024    53-55
150
-------
Field                         Value    Card Image Columns
Total Sky Cover                2048    56
Cloud Layers                   4096    57-58
Total Opaque                   8192    79
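Because the assigned values are powers of two, the hourly sum is in
effect a bitmask, and the machine edit can recover exactly which fields a
station reports, as the sketch below shows (illustrative code, not the
NCC program; the field whose value is 1 is illegible in the source and is
labeled field_1):

    # Observational-pattern control: each field has a power-of-two value,
    # so the per-hour sum encodes which fields the station reports.
    FIELD_VALUES = {
        "field_1": 1, "sky_condition": 2, "visibility": 4,
        "weather_obstruction": 8, "sea_level_pressure": 16,
        "dry_bulb_temp": 32, "dew_point_temp": 64, "wind_dir_speed": 128,
        "station_pressure": 256, "wet_bulb_temp": 512,
        "relative_humidity": 1024, "total_sky_cover": 2048,
        "cloud_layers": 4096, "total_opaque": 8192,
    }

    def pattern_sum(fields):
        return sum(FIELD_VALUES[f] for f in fields)

    def fields_from_sum(total):
        return [f for f, v in FIELD_VALUES.items() if total & v]

    hour_pattern = pattern_sum(["sky_condition", "visibility",
                                "dry_bulb_temp"])
    print(hour_pattern)                   # 38
    print(fields_from_sum(hour_pattern))  # the same three fields back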
The observational pattern is keyed in two cards as illus-
trated in Fig. 1. The station WBAN number in cols. 1-5,
the first 12 hours LST of the day in cols. 7-11, 13-17, etc.,
in the card keyed 2 in col. 80 as an identifier and the last
12 hours in the card keyed 3 in col. 80 as an identifier.
The observations are sorted from the original tape into
chronological day and hour order, edited, and one observa-
tion only for each hour (first on the original tape if
multiple entries) transferred to another tape (called the
sorted tape).
Only the records questioned in the edit are listed. Complete
data, keyed and computed, in a questioned record are listed
on format paper (Fig. 3a) with triple spacing. Appropriate
flags appear on the line above the data in the first column
of the field(s) questioned. Field corrections are entered
on the second line above the data for keying.
a. An asterisk "*" indicates inconsistency.
b. An ampersand "&" indicates data not in the station's
program, except that, if there is an inconsistency, the
"*" flag instead of the "&" will appear.
Observations not in the station's program are edited
as though all fields were required.
c. "DUP1," "DUP2." etc., are listed to indicate duplicates
up to three. All duplicates are edited, but only the
first observation on the original tape is transferred to
the sorted tape.
d. "MSG" above the day and hour indicates an observation in
the edit pattern is missing. "- -" for the hour indicates
the entire day is missing.
An inventory listing (Fig. 3g) at the end of the edit listing
for each station shows all hours for which observations are
on tape with the total number of observations on the tape
for the month at the end of the inventory listing.
a. "01" printed under an hour indicates an observation with
the cloud field keyed.
b. "02" indicates an observation without the cloud field
keyed.
151
-------
Fig. 1. Observational pattern cards.
[Facsimile of the two punched-card images (WBAN number in cols. 1-5,
hourly pattern sums, identifiers 2 and 3 in col. 80); the card
reproduction is illegible in the source scan.]
-------
5. Corrections for updating the tape are keyed on a "correction
card" image by fields or by keying a complete No. 1 card image
for missing observations or those having numerous field errors.
Following the updating of the tape, another edit is made in-
cluding the inventory listing.
IV. Details of No. 1 Card Image Edit
A. Major check groups
1. Psychrometric check: relationships between T, Tw, Tdp, and RH.
2. Limiting ranges of variability.
3. Wind, weather, temperature, and visibility and certain
interrelationships.
4. Cloud coding.
5. Cloud, ceiling, and sky cover relationships.
B. Psychrometric Check
Psychrometric relationships. The program is designed to accept
and check the interrelationships between the four psychrometric
parameters if all are keyed, or to compute Tw and RH and then
check the interrelationships if only T and Tdp are keyed. The
notations are in terms of whole degrees Fahrenheit for tempera-
ture and whole percentages for relative humidity. If there is a
suspected error in these relationships, the observation is printed
out complete, including an appropriate error flag.
The empirical formulas used to compute Tw and RH (with respect to
water) are:
1. Computation of wet bulb (Tw):
If the dry-bulb temperature (T) is zero and above:
Tw = T - (.034N - .00072N(N - 1))(T + Tdp - 2P + 108)
a. If T is less than 100°F., rounding of Tw follows this
scheme:
Tw rounded = Tw + .9 if the tens position of T is 0, 1, 2.
Tw rounded = Tw + .9 -.01(T + .9) if tens position of T is
3, 4.
Tw rounded = Tw + .4 if the tens position of T is 5 thru 9.
b. If T is 100°F. or higher:
Tw rounded = Tw + .9.
153
-------
If the dry-bulb temperature (T) is less than zero:
Tw = T - (.034N - .006N^2)(.6T + Tdp - 2P + 108)
Tw rounded = Tw - .01 Tdp
N = (T - Tdp)/10 in the above equations.
2. Computation of relative humidity:
RH = (173 - .1T + Tdp) / (173 + .9T)
The checking procedures print out the error flag if
Tdp is greater than Tw, or if Tw is greater than T, or
if the following are not satisfied:
a. In the range of temperature from -60° to +139°, the
dew point range may be -60° to +90°. For individual
observations, the dew point check requires a
maximum Tdp taking T - 0.5°F. and Tw + 0.4°F., and a
minimum Tdp taking T + 0.4°F. and Tw - 0.5°F. (If
T = Tw, maximum Tdp = Tw.) Saturation vapor pressures
from tables stored in memory: Table A (vapor pressure
over water for the range -60° to +140°F.) or B (vapor
pressure over ice for the range -60° to +31°F.) are taken
for the above values of Tw. The vapor pressures for each
end of the allowable dew point range are then computed,
using

e = ew - 0.000367 P (T - Tw) (1 + (Tw - 32)/1571)

(e, ew, and P are in inches of mercury. Pressure may be
taken from the individual observation, or from the pressure
applicable to the elevation range in which the station is located.)
From the vapor pressure tables in memory, the dew point
temperatures corresponding to the vapor pressures at
each end of the range, which are for the air at tempera-
tures T + 0.4°F., Tw - 0.5°F., and T - 0.5°F., Tw + 0.4°F.,
are taken in terms of whole degrees of dew point. If the
dew point being checked falls within 2° above or 2° below,
it is accepted as correct. If outside this range, an in-
dication of psychrometric error is printed. Note that
if station pressure values are not recorded in the obser-
vations, computation of Tw should still always be possible
since the program will take an appropriate pressure value
that corresponds to the station elevation.
154
-------
b. Relative humidity values are accepted if they are in the
range of 4% to 100% and are within 2% above and 2% below
the computed range of humidity below. All values less
than 4% are flagged for review. For hygrothermometer sta-
tions, the relative humidity will have been computed by
the formula in 2 above; for other stations it will have
been keyed from the original record.
The range for relative humidity is determined in the same
way as for the dew point check. Maximum and minimum vapor
pressures are obtained from the taped tables for each end
of the range, and the computation at each end of the range
is by this formula:
e
RH = — , e being the vapor pressure of the dew points
s and e the saturation vapor pressure of the
s
air at the observed temperature plus 0.4°F or
less 0.5°F.
If liquid fog is reported in present weather and the tem-
perature is 31°F. or less, T = T = T is acceptable.
If T is less than -35°F., no formula is applied. In the
latter case, when T = -36° or -37°, an error is listed if
the dew point does not fall within the range T - 6° (plus
or minus 1°). An error also lists if temperature is within
the range -38° through -53° and dew point is not in the
range T - 7° (plus or minus 1°).
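A minimal sketch of the psychrometric dew point check in Python follows. The saturation vapor pressure function is a Magnus-type stand-in for the taped Tables A and B, and dp_from_vp stands in for the inverse table lookup; all names are illustrative assumptions:

import math

def sat_vp_water(temp_f):
    """Approximate saturation vapor pressure over water, inches of mercury
    (a stand-in for Table A, not the taped table itself)."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    e_mb = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    return e_mb * 0.02953                 # millibars to inches of mercury

def vapor_pressure(t_dry, t_wet, press):
    """The psychrometric formula from the text (deg F, inches Hg)."""
    ew = sat_vp_water(t_wet)
    return ew - 0.000367 * press * (t_dry - t_wet) * (1 + (t_wet - 32.0) / 1571.0)

def dew_point_ok(t_dry, t_wet, t_dp, press, dp_from_vp):
    """Accept the keyed dew point if it falls within 2 deg of the range
    computed from the perturbed temperature pairs."""
    dp_hi = dp_from_vp(vapor_pressure(t_dry - 0.5, t_wet + 0.4, press))
    dp_lo = dp_from_vp(vapor_pressure(t_dry + 0.4, t_wet - 0.5, press))
    return (dp_lo - 2.0) <= t_dp <= (dp_hi + 2.0)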
C. Limiting Range of Variability
Limiting values, some absolute and some dependent on other
elements within an observation, are incorporated into the
machine edit program for checking purposes. Items with
values outside the limits, or that appear inconsistent
with other elements in the observation, or that approach extreme
conditions are flagged for technical review as follows (a sketch
of these checks in code appears after the list):
1. Sea-level pressure: above 1060.0 or below 940.0 mb.
2. Station pressure: if pressure in inches and hundredths
plus 10^-3 times the elevation (H) in feet is less than
27.75 or greater than 31.30 inches.
3. Change of sea-level pressure from one observation to the
next is greater than 6.0 mb., change of station pressure
from one observation to the next is greater than 0.20
inches. The interval between observations in both cases
is 3 hours. For 1-hour, 3.0 mbs. & 0.10 inch apply.
4. Temperature: T, above 125° or below -60°; Tw, above 125°
or below -60°; Tdp, above 90° or below -60°; and if Tw
and Tdp are present and T is -53° or colder.
155
-------
5. Temperature fluctuation from one 3-hourly observation to the
next: if T or Tdp changes 20° or more from one 3-hourly obser-
vation to the next, the observation which varies 20° or more
from the preceding is flagged for review. Changes of 10° are
flagged for hourly observations.
6. Relative humidity: below 4%.
7. Winds: When wind at one 3-hourly observation of 20 knots or
more doubles at the next 3-hourly observation, or reaches 50
knots, the wind speed is flagged for review. (In AWS version
of this edit, all winds 30 knots and higher are flagged for
review.)
8. Visibility: 15 miles or less at one observation and 70
or above at the next.
9. Obscuration and cloud heights, as follows:
a. Obscuration                        greater than 4,000 ft.
b. Fog                                greater than 1,500 ft.
c. Stratus, stratocumulus,            greater than 9,000 ft.
   stratus fractus, cumulus
   fractus, cumulus mammatus
d. Cumulus, cumulonimbus              greater than 12,000 ft.
e. Altostratus, altocumulus,          less than 4,500 ft. and
   nimbostratus, and altocumulus      greater than 20,000 ft.
   castellanus
f. Cirrus, cirrostratus, and          less than 15,000 ft.
   cirrocumulus
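The sketch below illustrates, under an assumed record layout, the absolute-limit checks (items 1, 2, and 4) and the observation-to-observation sequence checks (items 3, 5, and 7); the Obs tuple and its field names are illustrative, not the WBAN card layout itself:

from typing import NamedTuple, Optional, List

class Obs(NamedTuple):
    slp_mb: float          # sea-level pressure, mb
    stp_in: float          # station pressure, inches
    temp_f: float          # dry bulb, deg F
    wind_kt: int           # wind speed, knots

def limit_flags(obs: Obs, elevation_ft: float, prev: Optional[Obs],
                interval_hr: int) -> List[str]:
    flags = []
    if obs.slp_mb > 1060.0 or obs.slp_mb < 940.0:
        flags.append("sea-level pressure out of range")
    if not 27.75 <= obs.stp_in + 1e-3 * elevation_ft <= 31.30:
        flags.append("station pressure improbable for elevation")
    if obs.temp_f > 125 or obs.temp_f < -60:
        flags.append("dry bulb out of range")
    if prev is not None:
        dslp, dstp = (6.0, 0.20) if interval_hr == 3 else (3.0, 0.10)
        if abs(obs.slp_mb - prev.slp_mb) > dslp:
            flags.append("sea-level pressure sequence check")
        if abs(obs.stp_in - prev.stp_in) > dstp:
            flags.append("station pressure sequence check")
        if abs(obs.temp_f - prev.temp_f) >= (20 if interval_hr == 3 else 10):
            flags.append("temperature fluctuation")
        if prev.wind_kt >= 20 and (obs.wind_kt >= 2 * prev.wind_kt
                                   or obs.wind_kt >= 50):
            flags.append("wind speed review")
    return flags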
D. Wind, Weather, Temperature, Visibility:
1. Wind: direction is recorded and keyed in tens of degrees from
north (00 = Calm), and wind speed in knots (00 = Calm). If
speed is 00, direction must be 00. Legal directions other than
00 are 01 through 36. The wind error indication is printed
with illegal directions, for speed of 01 or more with direction
00, for direction of 01 - 36 with speed 00, and for exceeding
the wind limits mentioned above (see the sketch following the
visibility code table below).
Speed is related to the check of blowing dust, sand, blowing
spray and blowing snow. Observations in which these items
appear with wind speed less than 9 knots are flagged.
2. Weather: the following items and observations containing them
are flagged for review:
a. Tornado.
b. Ice crystals with intensity indication or in combination
with any other element.
156
-------
c. Fog or any form of precipitation with clear sky (0 cloud
amount) except ice crystals.
d. Fog with dew point depression greater than 8°F.
e. Fog with less than 1/10 cloud cover.
f. Weather types below with visibilities other than those listed:
Weather Visibility range
S+, SP+, SW+, L+, ZL+, SG+        000-004 (0 - 1/4 mile)
S, SP, SW, L, ZL, SG, IC*         005-007 (5/16 - 1/2 mile)
(* Note: IC may be reported with higher than 1/2 mile visibility)
S-, SP-, SW-, L-, ZL-, SG-        008 and higher (3/4 - unlimited)
F, IF, GF, BD, BN, K, H, KH,      000-060 (0 to 6 miles)
D, BS, BY
g. Weather types (all intensities) with temperatures other than
within ranges below:
Weather Range of temperature
R, RW, L                 28°F. or higher
ZR, ZL                   No lower limit, to 39°F.
IP                       10°F. through 44°F.
SP, SG, S, SW            -40°F. through 44°F.
IC                       -40°F. through 15°F.
IF                       -40°F. through 15°F.
h. 100% relative humidity reported without liquid fog or liquid
precipitation in the weather fields and wind speed > 6 knots.
i. Illegal visibility codes are flagged for correction. The legal
visibility codes are:
VSBY        Code        VSBY        Code        VSBY        Code
0           000         1 5/8       018         8           080
1/16        001         1 3/4       019         9           090
1/8         002         2           020         10          100
3/16        003         2 1/4       024         11          110
1/4         004         2 1/2       027         12          120
5/16        005         3           030         13          130
3/8         006         4           040         14          140
1/2         007         5           050         15          150
5/8         008         6           060         20          200
3/4         009         7           070         and, by 5 mile
1           010                                 increments, on to
1 1/8       012                                 95          950
1 1/4       014
1 3/8       016                                 > 100       990
1 1/2       017
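A sketch of the wind and weather consistency checks above; the temperature-range table is transcribed from item g, while the function and argument names are assumptions for illustration:

# Weather type -> inclusive temperature limits (deg F); None = unbounded.
WEATHER_TEMP_RANGE = {
    "R": (28, None), "RW": (28, None), "L": (28, None),
    "ZR": (None, 39), "ZL": (None, 39),
    "IP": (10, 44),
    "SP": (-40, 44), "SG": (-40, 44), "S": (-40, 44), "SW": (-40, 44),
    "IC": (-40, 15), "IF": (-40, 15),
}

def wind_flags(direction, speed):
    """Direction keyed in tens of degrees (00 = calm), speed in knots."""
    flags = []
    if direction not in range(0, 37):
        flags.append("illegal wind direction")
    if speed == 0 and direction != 0:
        flags.append("direction keyed with calm wind")
    if speed > 0 and direction == 0:
        flags.append("speed keyed with no direction")
    return flags

def weather_temp_flag(weather, temp_f):
    """Flag a weather type reported outside its temperature range."""
    lo, hi = WEATHER_TEMP_RANGE.get(weather, (None, None))
    if lo is not None and temp_f < lo:
        return f"{weather} with temperature below {lo}F"
    if hi is not None and temp_f > hi:
        return f"{weather} with temperature above {hi}F"
    return None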
157
-------
E. Cloud Coding
Ceiling, sky condition, and clouds are interrelated. Three
columns are keyed for ceiling height. The valid codes are as
indicated below and any others are flagged for correction.
Ceiling height Card code
Unlimited XXX
Zero 000
100 ft. - 5000 ft. 001 - 050
(every hundred feet)
5000 ft. - 10,000 ft. 050 - 100
(every five hundred feet)
10,000 ft. and higher 100 - 250, etc.
(every thousand feet)
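A short sketch of the ceiling-code validity rule above (the cap of 250 is left open since the table reads "100 - 250, etc."); the function name is illustrative:

def ceiling_code_valid(code: str) -> bool:
    """Valid card codes for ceiling height, per the table above."""
    if code in ("XXX", "000"):            # unlimited, zero
        return True
    if not code.isdigit():
        return False
    n = int(code)
    if 1 <= n <= 50:                      # 100-5,000 ft, every hundred feet
        return True
    if 50 < n <= 100:                     # 5,000-10,000 ft, every 500 feet
        return n % 5 == 0
    return n % 10 == 0                    # 10,000 ft and up, every 1,000 feet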
Sky condition is a four-position (4 card columns) field, with
provision for keying four sky condition symbols, as may be
recorded in the MF1-10A Sky column. Heights of clouds are not
keyed in this field (ceiling is keyed in the ceiling field and
cloud heights in the "layer" fields discussed below). If less
than 4 symbols are reported, keying begins at the left of the
field, with "0" keyed in each column at the right of the field
for which no sky symbol is reported. The lowest sky symbol is
keyed first, the next highest second, etc., until the 4-column
field is coded completely, either with sky condition symbols
(including blanks) or zeros.
If more than 4 sky condition symbols are reported, the highest
is keyed in column 20, and the first three in columns 17-19, un-
less this excludes the ceiling symbol. In the latter case the
ceiling symbol is keyed in column 19, the first two in columns
17 and 18, and the highest in column 20.
For a partial obscuration (-X) the first column of the sky con-
dition field is left blank. The succeeding three columns are
keyed for reported sky conditions.
No clouds or obscurations (clear) is keyed 0000.
An obscuration (not partial) requires an X key in the first or
second column of the sky condition field. If the obscuration
is the lowest sky condition, the X will be in the first column.
If a cloud layer is reported below the obscuration it will be
keyed in the first column in the normal manner, and the X in the
second column of the field. In this situation, the last two
columns of the field would be 00.
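A sketch of these keying rules in Python; it omits the ceiling-symbol exception for more than four symbols, and the function and argument names are assumptions:

def key_sky_condition(symbols, partial_obscuration=False):
    """Return the 4-character Sky Condition field, zero-filled on the
    right; symbols arrive lowest first."""
    field = []
    if partial_obscuration:
        field.append(" ")                 # first column left blank
    field.extend(symbols)
    if len(field) > 4:
        # More than four symbols: keep the first three plus the highest.
        field = field[:3] + [field[-1]]
    while len(field) < 4:
        field.append("0")                 # "0" keyed where nothing reported
    return "".join(field)

# Examples consistent with the table that follows:
assert key_sky_condition([]) == "0000"                        # clear sky
assert key_sky_condition(["1"]) == "1000"                     # one symbol
assert key_sky_condition([], partial_obscuration=True) == " 000"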
158
-------
The table below presents the valid codes of the Sky Condition
field. In the table, p = punch, b = blank, and - = X; the four
entries under "punching possibilities" give the permissible keys
for card columns 17-20 in order.

Card code   Card column punching possibilities    Description
0000        0 / 0 / 0 / 0                         Clear sky (less than 1/10).
p000        1,2,4,5,7,8 / 0 / 0 / 0               One symbol only, not an obscuration
                                                  or partial obscuration.
pp00        1,2,4,5,7 / 1,2,4,5,7,8 / 0 / 0       Two symbols reported, no obscuration
                                                  or partial obscuration.
ppp0        1,2,4,5,7 / 1,2,4,5,7 /               Three symbols reported, no obscuration
            1,2,4,5,7,8 / 0                       or partial obscuration.
pppp        1,2,4,5,7 / 1,2,4,5,7 /               Four symbols, no obscuration or
            1,2,4,5,7 / 1,2,4,5,7,8               partial obscuration.
-000        X / 0 / 0 / 0                         Obscuration, 10/10 sky hidden, no
                                                  layer below obscuration.
b000        Blank / 0 / 0 / 0                     Partial obscuration, no other symbols.
bp00        Blank / 1,2,4,5,7,8 / 0 / 0           Partial obscuration, one other symbol.
bpp0        Blank / 1,2,4,5,7 /                   Partial obscuration, two other
            1,2,4,5,7,8 / 0                       symbols.
bppp        Blank / 1,2,4,5,7 /                   Partial obscuration, three other
            1,2,4,5,7 / 1,2,4,5,7,8               symbols.
p-00        1,2,4,5,7 / X / 0 / 0                 Obscuration above one layer of cloud.
159
-------
F. Clouds and Obscuring Phenomena.
Provision for keying as many as four layers of clouds and/or
obscuring phenomena, total sky cover, and opaque sky cover
amount is made in this field. Cloud layers are keyed in ascend-
ing order. If more than four layers are reported, the four
lowest are keyed. The lowest layer is always keyed in the left
hand cloud field of the card. For each layer, amount, type, and
height are keyed. For the second and third layers (if reported),
the summation amount(s) is keyed at the level(s) involved.
If a complete cloud layer section is reported unknown, "U", on
MF1-10B, the corresponding card field for the entire layer is
left blank.
When fog or any other obscuring phenomenon is reported, it will
be handled in a manner similar to a cloud layer, and an amount,
type, and height will be keyed. Obscuring phenomena other than
fog (smoke, for example) are keyed X for type. Heights of clouds
and vertical visibility into obscurations are keyed in hundreds
of feet. Where vertical visibility is unlimited (dash in height
column of MF1-10B) height is keyed XXX. If cloud field is re-
ported clear or none, height will be keyed XXX. If cloud height
is reported unknown (U), height is left blank if type is unknown.
Summation totals may not exceed 10/10, but the first summation
(card col. 67) may be 1 greater (not exceeding 10/10) than the
sum of card columns 57 and 62; and card column 73 may be 1 greater
than the sum of card columns 67 and 68, not to exceed 10/10.
Total cloud amount (card col. 56) should be the same as col. 57
if only one layer is reported, the same as col. 67 if only two
layers are reported, the same as col. 73 if only three layers are
reported, and equal to not more than 1 greater than the sum of
cols. 73 and 74 (not exceeding 10/10) if four layers are reported.
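The summation arithmetic above can be collected into one table-driven check; the sketch below (names and the tenths representation are assumptions) covers the four-layer case:

def _sum_ok(keyed, below, added):
    """A summation must cover the amounts beneath it, may exceed their
    arithmetic sum by at most 1, and may never exceed 10/10."""
    return max(below, added) <= keyed <= min(below + added + 1, 10)

def summations_ok(amt1, amt2, sum12, amt3, sum123, amt4, total):
    """Amounts and summations in tenths (0-10)."""
    return (_sum_ok(sum12, amt1, amt2)          # col. 67 vs cols. 57 + 62
            and _sum_ok(sum123, sum12, amt3)    # col. 73 vs cols. 67 + 68
            and _sum_ok(total, sum123, amt4))   # col. 56 vs cols. 73 + 74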
1. Legal codes in the card field for "Clouds and Obscuring
Phenomena" are related to the Ceiling, Sky Condition, and
Weather and/or Obstruction to Vision fields. Accordingly, a
discussion of the several relationships is presented.
a. If sky condition is reported clear, ceiling must be un-
limited. Summation of all clouds must be zero. Type and
height in the cloud layer fields may be keyed for zero
amount (less than 1/10).
160
-------
b. The ceiling height must be consistent with the height of
the lowest cloud layer whose corresponding symbol in the
sky condition field is broken or overcast, or with the
height of an obscuration. The total (if one layer) or
the summation amount at the layer constituting the ceiling
must be equal or greater than 6/10. The cloud type must
be coded either 2 through 9, X/2, X/4 through X/7, X/9, or,
if an obscuration, 1 or X.
c. If the ceiling is not XXX (unlimited), some sky symbol must
be keyed: i. e., broken (5), ovc (8), or obscured (X in
1st or 2nd col. of sky condition field). Only one X may be
keyed. It will be the first column of the sky condition
field if the lowest layer is the obscuration; the second is
a layer below a portion of the surface-based obscuration.
d. If the first cloud layer contains 10/10 F or IF (not GF),
the ceiling height must equal the height of the first layer,
and the sky must be obscured.
e. If fog is keyed as an Obstruction to Vision with clear sky
or with partial obscuration or less than 5/10, with no
clouds above, the fog must be GF or IF (not F). If the
partial obscuration is 6/10 or greater, with no clouds above,
or with obscuration (10/10), the fog may not be classified
as GF.
f. If total opaque is zero, all sky symbols must be thin or
clear, and ceiling must be unlimited.
g. If any sky symbol is thin, the total opaque amount must not
be more than half the summation amount of that layer and
all higher layers (not always in error for higher layers,
but should be flagged for review). If the ratio of total opaque
to total amount is 1/2 or less, the highest sky symbol must be thin.
h. Sky condition symbols must, with increasing height, reflect
equal or increasing sky cover. Only the highest sky symbol
may be overcast, except that below an overcast there may
be a thin overcast.
i. The highest sky condition symbol must be compatible with the
amount of total sky cover.
j. If obscuration (X) is reported as the second sky condition,
the second cloud layer type must be obscuring phenomena
(fog, ice fog, smoke, rain, snow, for example) keyed X;
total amount, total opaque, and first summation total must
be 10/10. The third and fourth cloud layers and the second
summation total columns should be blank (may be keyed if
an aircraft report has been received, but should be flagged
for review). The third and fourth sky cover symbols must
be zero (0). Ceiling must not exceed the second cloud layer
height, and that height should be 4,000 ft. or less. Normally
fog will be questioned in such a situation.
161
-------
k. If obscuration is reported as the first sky symbol, the
type of the first cloud layer must be fog (code 1) or
other obscuring phenomena (code X), total amount and total
opaque must be 10/10, and height must correspond with ceil-
ing height. The second, third, and fourth cloud layer and
first and second summation total columns should be blank
(may be keyed if an aircraft report has been received, but
should be flagged for review). The second, third, and
fourth sky cover symbols must be zero (0). Height and
ceiling should be the same. Height should be 1,500 ft. or
less if fog (code 1) or 4,000 ft. or less if other obscur-
ations (X) are encoded.
l. When fog (code 1) is reported in the first cloud layer,
amount not coded 0, the sky condition must reflect an ob-
scuration or partial obscuration.
m. When fog is reported as the only cloud field (code 1), it
should be coded in Obstructions to Vision as GF if amount
is 1 to 5 tenths, or F if 5 to 10 tenths (prevailing
visibility being 6 miles or less).
n. The corresponding cloud and summation total columns for
sky cover symbol reported (code ) above an overcast
(code 8 in Sky Condition) should be:
-1. Blank if total opaque is 10/10.
-2. Zero (0) in amount and type columns, 10 in summation
total columns, and XXX in height columns whenever
total opaque is less than 10/10. Additional layers
may be keyed if an aircraft report has been received,
but should be flagged for review).
o. Partial obscuration (blank in first position of Sky Condi-
tion) must have a first-layer amount from 1 to 9, type
must be fog (code 1) or obscuration (code X), and height
must be unlimited.
p. Some stations (FAA) do not observe cloud layer values,
but do enter total cloudiness and total opaque. If the
ratio of total opaque to total amount is 1/2 or less,
there should be no codes 5, 8, or obscuration (X) in Sky
Condition. If the ratio is greater than 1/2, there must
be 5, 8, or X in Sky Condition, and if ratio is not 1:1,
X is invalid. (The valid blanks in the cloud layer field will
cause "2's" to print in the inventory listing for these
stations.)
162
-------
2. The testing procedure to flag errors or suspected conditions
in Clouds and Obscuring Phenomena, Sky Condition, Total Clouds,
and Total Opaque is systematic. Missing fields are indicated
(except the valid condition for FAA coded "2" in the inventory)
in the usual manner. The system, in general, follows these
steps:
In the cloud fields, the valid codes for cloud amounts are
0 = no clouds or less than 1/10; 1 - 9 = 1/10 to 9/10 clouds;
X = more than 9/10 or 10/10.
Valid codes for cloud types are 0 = NONE; b (Blank) when a
cloud type is reported UNKNOWN; 1 = Fog; 2 = Stratus;
3 = Stratocumulus; 4 = Cumulus; 5 = Cumulonimbus; X/2 (K) =
Stratus fractus; X/4 (M) = Cumulus fractus; X/5 (N) = Cumulus
mammatus; 6 = Altostratus; 7 = Altocumulus; X/6 (O) = Nimbo-
stratus; X/7 (P) = Altocumulus castellanus; 8 = Cirrus; 9 =
Cirrostratus; X/9 (R) = Cirrocumulus; X = Obscuration other
than fog.
The valid codes for cloud heights in the cloud layer fields
are the same as for ceiling heights (in number of hundreds of
feet); XXX indicates NONE or (in the first layer) a surface-
based partial obscuration; and bbb (Blanks) indicates cloud
height unknown with type unknown.
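Collected as code sets, the valid codes above lend themselves to a table-driven check; a minimal sketch (names are illustrative):

VALID_CLOUD_AMOUNTS = set("0123456789X")   # X = more than 9/10 or 10/10

VALID_CLOUD_TYPES = {
    "0",                          # none
    " ",                          # blank: type reported unknown
    "1",                          # fog
    "2", "3", "4", "5",           # St, Sc, Cu, Cb
    "K", "M", "N",                # X/2 St fra, X/4 Cu fra, X/5 Cu mammatus
    "6", "7", "O", "P",           # As, Ac, X/6 Ns, X/7 Ac castellanus
    "8", "9", "R",                # Ci, Cs, X/9 Cc
    "X",                          # obscuration other than fog
}

def layer_code_flags(amount, ctype, height):
    flags = []
    if amount not in VALID_CLOUD_AMOUNTS:
        flags.append("invalid cloud amount")
    if ctype not in VALID_CLOUD_TYPES:
        flags.append("invalid cloud type")
    if height not in ("XXX", "   ") and not height.isdigit():
        flags.append("invalid cloud height")
    return flags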
Errors are listed for invalid codes, and if
a. Any cloud field element is keyed and the Total Clouds left
blank.
b. Total opaque is not keyed, unless indicated in the station's
observation pattern (III. D. 1.) .
c. Total opaque is greater than total cloud amount.
d. Total opaque is less than 10/10, and any blanks occur in
Cloud and Obscuring Phenomena fields including summations,
FAA excepted.
e. Any element within a cloud layer is keyed (amount, type,
or height), and any other element is left blank.
f. Total Cloud amount is keyed from 0/10 to 9/10 inclusive,
and amounts and types of fields above highest reported layer
are not coded "0" and heights are not coded "XXX."
g. Each summation amount does not equal or exceed the next lower
summation amount, or if a succeeding summation amount is
greater than 1 more than the amount(s) of the additional
layer(s), or exceeds 10/10.
1. In the case of partial obscuration in the first layer,
the second summation is not greater than the, amount of
the first layer if a cloud layer is also reported.
163
-------
2. The summation amount is less than any lower individual
cloud layer amount.
3. Blanks in a summation amount are not preceded by 10/10
in the last summation amount (which is not blank) or by
blank cloud amounts in fields with blank summation
amounts, and 10/10 Total Opaque is not reported.
h. Height ranges by cloud types are in disagreement with those
listed under Obscuration and Cloud Heights (C.9 above).
i. Fog (code 1) is coded in layers above the first.
j. Height in cloud layers reporting height does not increase
from one layer to the next.
k. The ceiling height does not agree with the lowest layer
height constituting a ceiling, or the highest sky symbol
is not compatible with total sky cover.
164
-------
G. Explanation of Edit Flags
Beginning on the following page are numbered explanations of edit
flags appearing in the correspondingly numbered observations printed
out in the Surface Weather Observations format. An asterisk (*)
prints over fields to be checked.
The edit is designed for No. 1 card images keyed every 3 hours (at
local standard time corresponding to 00, 03, 06, 09, 12, 15, 18 and
21 hours GMT) and is readily adaptable to programs under which all
record observations are keyed.
Fig. 2. WBAN No. 1 card layout (punch card image; fields include station
number, date, day, ceiling, sky condition, weather and/or obstruction to
vision, sea-level pressure, dew point, wind, dry bulb, station pressure,
wet bulb, relative humidity, and clouds and obscuring phenomena).
Fig. 2a. WBAN No. 1 card layout, continued (punch card image).
165
-------
Explanations of Reasons for Flags on Correspondingly
Numbered Observations in the Edit Listings **
1. Dew point incorrectly keyed.
2. Hour other than that in the normal program.
3. Card missing for hour in station program.
4. Dew point temperature higher than dry-bulb temperature.
5. Ceiling height differs from cloud layer height.
6. Ceiling height differs from cloud layer height.
7. Obscuration under Weather is snow with type of obscuration fog.
8. Obscuration with less than 10/10 sky cover.
9. First sky symbol scattered with corresponding layer over 5/10,
and second cloud group and ceiling non-reportable value.
10. Ceiling, sky symbol, and summation total not in agreement with
total sky cover.
11. Two opaque overcast symbols.
12. Lower layer is not opaque.
13. Incorrect relationship of ceiling, sky condition, and total
opaque sky cover.
14. Dry-bulb temperature incorrectly keyed.
15. Review of wind speeds over 50 knots.
16. Wind direction value over 36 (360°).
17. Partial obscuration with lowest layer a cloud type.
18. Wind direction with calm wind speed.
19. Wind speed with no wind direction.
20. Ceiling height missing.
21. Amount of partial obscuration is greater than total opaque.
22. Total opaque missing.
23. Amount of obscuration and total sky cover differ.
24. Amount of obscuration less than 10/10.
25. Partial obscuration due to fog omitted from weather, and in-
complete keying of cloud and obscuring phenomena.
26. Amount of partial obscuration is greater than total opaque.
Also, incomplete keying of clouds and obscuring phenomena.
27. Third layer summation is missing.
28. Total sky cover omitted.
29. Visibility omitted.
30. First two columns of weather omitted.
31. Fog not shown as obstruction to vision. Clouds and obscuring
phenomena layers less than total sky cover.
32. Illegal visibility.
33. Ground fog with obscuration greater than 5/10.
34. Fog reported with less than 6/10 obscuration and lowest cloud
layer greater than 5000 feet.
35. Ceiling not a reportable height.
36. Ground fog with over 5/10 obscuration.
37. Sky symbol and first cloud layer not in agreement.
38. Total opaque cloudiness and cloud layer data in error; or
ceiling, sky and cloud layer relationships in error.
** See pages 168 through 174.
166
-------
39. Blowing dust with wind speed less than 7 knots.
40. Flagged for review - no increase in 2nd layer summation amount.
41. Ceiling height not a reportable value.
42. Fog reported as obstruction to vision with visibility greater
than six miles.
43. Visibility value not reportable.
44. Visibility value not reportable.
45. Ground fog reported as obstruction to vision with visibility
greater than six miles.
46. Visibility reduced to less than seven miles and no obstruction
to vision.
47. Illegal keying in weather and obstruction to vision columns.
48. Illegal keying in weather and obstruction to vision columns.
49. Squalls reported with wind speed less than 16 knots.
50. Fog with less than 6/10 obscuration and lowest cloud layer
greater than 5000 feet and psychrometric error.
51. Sea level pressure flagged for non-reportable value.
52. Dry bulb and dew point sequence check.
53. Station pressure flagged for improbable value.
54. Dew point incorrectly keyed.
55. Cloud height incorrectly keyed.
56. Illegal punch in weather & obstruction to vision columns.
57. Flagged for intensity of snow with 1/4 visibility.
58. Station pressure sequence check.
59. Duplicate cards, date and hour 1st card.
60. Duplicate cards, date and hour 2nd card.
61. Visibility sequence check. Change in values up. Sea level and
station pressure check.
62. Visibility sequence check. Change in values down.
63. Missing observation.
64. Dry bulb sequence check.
65. Flagged for intensity of snow with 1/4 mile visibility.
66. Flagged for liquid precipitation with 24-degree temperature.
67. Station and sea level pressure sequence check.
68. Station and sea level pressure sequence check.
69. Sea level pressure sequence check, station pressure flagged for
review.
70. Frozen precipitation with 45-degree temperature, station pressure
flagged for review.
71. Station pressure flagged for review.
72. Station pressure flagged for review.
73. Snow intensity not in agreement with visibility.
74. Snow intensity not in agreement with visibility.
75. Missing observation.
76. Sea level pressure sequence check.
77. Sea level pressure sequence check.
78. Duplicate cards, date and hour 1st card.
79. Duplicate cards, date and hour 2nd card. Dry bulb sequence check.
80. Sky condition symbols missing.
81. Weather and obstruction to vision symbols missing.
82. Observations for the 29th day missing.
83. Observations for the 30th day missing.
84. Observations for the 31st day missing.
85. Monthly inventory check.
167
-------
Fig. 3a. Sample edit listing for station "WBAN EDIT TEST #1" in the
Surface Weather Observations format (printout image; asterisks print over
flagged fields). Column 3 entries: C = no cloud, S = scattered layer,
B = broken layer, 0 = overcast layer, X = obscuration ("X" appearing in
cloud type columns denotes obscuring phenomena other than fog); "+"
denotes "heavy"; "-" denotes "partial," "light," "thin," or "minus" as
appropriate.
168
-------
Fig. 3b. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
169
-------
Fig. 3c. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
170
-------
Fig. 3d. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
171
-------
Fig. 3e. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image).
172
-------
Fig. 3f. Sample edit listing for station "WBAN EDIT TEST #1," continued
(printout image; clouds and obscuring phenomena fields).
173
-------
Fig. 3g. Inventory of No. 1 cards containing cloud layer information.
Note the 0100 card on the 24th and the 0400 card on the 27th are missing;
the card count, 222, is 2 less than the program count (at 8 observations
per day) for a non-leap-year February.
174
-------
VI. Visual Checks
Data that are not keyed onto tape are given a limited visual scan
as a random check for quality control and consistency of climatolog-
ical data entries.
WBAN No. 3 Card Edit and Listing
The No. 3 WBAN card images (Fig. 4) contain varying daily climatolog-
ical data for individual stations for the period midnight-midnight LST.
Fig. 4. WBAN No. 3 card layout (punch card image; fields include station
number, date, day, max temp, min temp, precipitation, and snowfall,
among others).
-------
VII. Program and Tape Control Procedures
In order to provide appropriate edit information and to meet publi-
cation deadlines, the records are placed on two separate tapes each
month, preceded by certain station program and priority information.
A. Tape No. 1 contains data for all stations in the LCD and CD
programs and Tape No. 2 contains data for all others.
B. A thirteen-digit Program and Priority Editing Code is provided
in the station's header on the name tape. Positions 1-11 indi-
cate the programs in which the station participates. The figure
"1" in the various positions indicates that the station has that
program; "0" indicates it does not.
1. Sunshine data keyed in cols. 54-58.
2. Fastest mile in compass points.
3. Station in 1009 program.
4. Station has monthly temperature normals.
5. Station has mid-monthly temperature normals.
6. Station has monthly precipitation normals.
7. Station has degree day normals.
8. Station in Extended Forecast program.
9. "Days With" are keyed in cols. 41-51.
10. "Water Equivalent" keyed in cols. 63-65
when snow depth 002, or greater.
11. Station has CD number.
12. Station has LCD, coded 1; no LCD, coded 2.
13. Station is operating, coded 1; closed, coded zero
(a convenience in using the name tape as a reference
in other programs).
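A sketch of decoding this header field (the position meanings are those listed above; function and variable names are assumptions):

PROGRAM_POSITIONS = {
    1: "sunshine data keyed in cols. 54-58",
    2: "fastest mile in compass points",
    3: "station in 1009 program",
    4: "monthly temperature normals",
    5: "mid-monthly temperature normals",
    6: "monthly precipitation normals",
    7: "degree day normals",
    8: "Extended Forecast program",
    9: "'Days With' keyed in cols. 41-51",
    10: "'Water Equivalent' keyed in cols. 63-65",
    11: "station has CD number",
}

def decode_program_code(code: str):
    """code is the 13-character field from the name-tape header."""
    assert len(code) == 13
    programs = [PROGRAM_POSITIONS[i] for i in range(1, 12) if code[i - 1] == "1"]
    has_lcd = code[11] == "1"             # position 12: 1 = LCD, 2 = no LCD
    operating = code[12] == "1"           # position 13: 1 = open, 0 = closed
    return programs, has_lcd, operating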
VIII. Edit and Listing
The machine edit is designed to detect various inconsistencies of
data. The corrected (updated) listings provide various computations
of sums, averages, departures from normal and counts of number of
occurrences, etc., used in the climatological programs.
Sample listings appear on pages 20a and 20b.
The fields for all inconsistencies noted in the edit are flagged
with appropriate symbols in the column(s) to the right of the
field(s) questioned.
Checks are made and field(s) flagged for review, according to the
outline below:
A. All columns 1-80
"12" overpunch.
176
-------
[Sample WBAN No. 3 daily climatological edit listing (printout image).
Footnotes to the listing:
(1) Water equivalent of snow and ice on ground.
(2) If the / (solidus) appears, speeds are gusts. Figures for directions
are tens of degrees from true North; i.e., 9 = East, 18 = South,
27 = West, and 36 = North. When directions are in tens of degrees,
speeds are fastest observed 1-minute values.
(3) S-S indicates sunrise to sunset and M-M midnight to midnight.
(4) Entry of 1 indicates occurrence, 0 indicates no occurrence. Weather
types are: F = fog, visibility more than 1/4 mile; T = thunderstorm;
A = hail; R = rain; S = snow; Z = glaze; D = dust, visibility 1/2 mile
or less; KH = smoke or haze or both; BS = blowing snow; and HF = heavy
fog (visibility 1/4 mile or less due to fog).]
177
-------
[Sample WBAN No. 3 monthly summary edit listing (printout image).]
178
-------
B. Day (col. 10-11)
1. > Possible for month
2. Missing
C. Max. Temp. (cols. 12-14)
Legal punches are: X, 0, or 1 in col. 12 and 0-9 in cols. 13-14.
1. Illegal punches
2. < Min. Temp (cols. 15-17)
3. < Min. Temp. (cols. 15-17) of previous day
Print negative values (X punch in col. 12) with a minus (-)
preceding numerical values in cols. 13-14.
D. Min. Temp. (cols. 15-17)
Legal punches are: 0 or X for col. 15 and 0-9 for cols. 16-17.
1. Illegal punches
2. > Max. Temp. (cols. 12-14) of previous day
Print negative values (X punch in col. 15) with a minus (-)
preceding numerical values in cols. 16-17.
E. Precipitation (cols. 18-21)
Legal punches are 0-9 or BBBX
1. Illegal punches
2. "0000" with cols. 22-24 other than "000" or "BBB"
3. Col. 21 "X" with cols. 18-20 other than B
4. Any of cols. 18-20B with col. 21 "0-9"
5. > 1000
Print BBBX as "T." Also print "000X" as "T," but flag as
error as indicated above.
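A sketch of the precipitation-field checks above, treating the card field as a 4-character string in which "B" marks a blank column and "X" the trace punch; the hundredths-of-an-inch reading of the "> 1000" limit is an assumption:

def precip_flags(field: str):
    """Cols. 18-21; legal keyings are four digits or 'BBBX' (trace)."""
    if field == "BBBX":
        return []                          # trace: prints as "T"
    if field == "000X":
        return ["trace mis-keyed"]         # still prints "T", but flagged
    if not field.isdigit():
        return ["illegal punches"]
    if int(field) > 1000:
        return ["precipitation over 10.00 inches"]
    return []

def precip_display(field: str) -> str:
    return "T" if field in ("BBBX", "000X") else field

Analogous logic applies to the snowfall and snow depth fields that follow, with "BBX" in place of "BBBX".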
F. Snowfall (cols. 22-24)
Legal punches are 0-9 or BBX
1. Illegal punches
2. Col. 24 "X" with cols. 22 and 23 other than B
3. Cols. 22 or 23 B with col. 24 "0-9"
4. > 200
Print BBX as "T." Also print "00X" as "T," but flag as
error as indicated above.
G. Snow Depth (cols. 25-27)
Legal punches are 0-9 or BBX. May also be B for entire field.
1. Illegal punches
2. Other than "000" with cols. 22-27 for preceding day and cols.
22-24 for same day punched all 0's.
179
-------
3. Col. 27 "X" with cols. 25 and 26 other than "B"
4. Cols. 25 or 26 B with col. 27 "0-9"
Print "BBX" as "T." Also print "OCX" as "T," but flag as error
as indicated above.
H. Peak Gusts, Direction and Time (cols. 28-35)
Legal punches are:
0-9 for cols. 28-30,
The "Alpha" Compass Point Code for cols. 31-32, and
000 - 239 for cols. 33-35 or entire field may be "B."
An "X" in col. 31 is programmed to convert peak gust speeds from
knots to mph and publish under fastest mile heading with "/" following
the direction as an indicator of peak gust speed. Omission of "X" in
col. 31 will be flagged by "$" following the direction on the edit
listing.
A "#" following the speed and direction spaces on the edit calls atten-
tion to entry of speed with direction omitted.
The Alpha Compass Point punches are:
00 C (calm) 22 NE 44 SE 66 SW
11 N 32 ENE 54 SSE 76 WSW
12 NNE 33 E 55 S 77 W
18 NNW 34 ESE 56 SSW 78 WNW
1. Illegal punches
2. Cols. 28-30 > 050
Print in Alpha Code Letters.
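A sketch of the peak-gust conversion described above; the output format and names are illustrative:

def format_peak_gust(speed, direction_alpha, knots_flag):
    """knots_flag is True when "X" is punched in col. 31."""
    if knots_flag:
        mph = round(speed * 1.15078)       # knots to statute miles per hour
        return f"{mph} {direction_alpha}/" # "/" marks a peak gust speed
    return f"{speed} {direction_alpha}$"   # "$" flags the omitted "X" punch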
I. "Days With" (cols. 41-51)
Legal punches are 0 or 1 if in station's program, otherwise all cols.
should be B. (If punched, all columns should be punched.)
1. Illegal punches
2. Col. 41 "0" with "1" in col. 51
3. Col. 43 "1" with either or both fields (cols. 18-21, 22-24)
all 0's.
4. Col. 43 "1" with min. temp. (cols. 15-17) > 044
5. Col. 44 "1" with cols. 18-21 "0000"
6. Col. 44 "1" with cols. 43 & 46 "0" & cols. 22-24 other than "0000"
7. Col. 45 "1" with cols. 18-21 "0000"
8. Col. 46 "1" with either or both fields (cols. 18-21, 22-24) all 0's.
9. Col. 46 "1" with min. temp. (cols. 15-17) > 044
10. Col. 47 "1" with col. 45 "0" (some exceptions, but flag)
11. Col. 47 "1" with min. temp. (cols. 15-17) > 039
12. Col. 50 "1" with either cols. 28-30 or 59-60 (if punched 010 or 10
respectively).
180
-------
J. Sky Cover (cols. 52, 53)
Legal punches are 0-9 and X for both cols, or "B" for col. 53
if cols. 41-51 are B.
1. Col. 52 B with other than B in col. 53
2. Col. 52 > 3 greater than col. 53
3. Col. 53 other than B with cols. 41-51 B
4. Col. 53 > 2 greater than col. 52
Print "X" punches as "10"
K. Sunshine and Percent of Possible (cols. 54-58)
Legal punches are: 000-199 for cols. 54-56, 0-9 or X for col. 57
and 0-9 or B for col. 58. Also entire field may be B.
1. Illegal punches
2. Col. 57 "X" with underpunch
3. Cols. 54-58 are blank
4. Col. 57 "X" with other than B in col. 58
Print as "100" when cols. 57-58 punched "XB." Also print
"100 when col. 57 has an "X" punch regardless of other illegal
punching in either or both cols. 57 or 58, but flag as error
as indicated above.
5. With cols. 54-56 punched 000, cols. 57-58 with other than zeros
6. With cols. 57-58 punched 000, cols. 54-56 with other than zeros
7. With cols. 54-56 punched greater than 000, cols. 57-58 will be
greater than 00.
L. Fastest Mile and Direction (cols. 59-62)
Legal punches are: 0-9 for cols. 59-60 with an X overpunch permitted
in col. 59 for speeds of 100 or greater, 00-36 for cols. 61-62 if neither
col. has an "X" overpunch, and the "Alpha" Compass Point Code (see
VIII,H above) if either or both (cols. 61-62) have an "X" overpunch.
Illegal punches:
1. Cols. 59-60 "00" (without "X" overpunch in col. 59) with
other than "00" in cols. 61-62.
2. Cols. 59-60 > 50.
3. Cols. 59, 61 or 62 punched "X" without an underpunch 0-9.
4. Col. 62 "X" overpunched with no "X" overpunch in col. 61.
181
-------
Print:
1. "1" preceding speed punched in cols. 59-60 when col. 59 has an
"X" overpunch.
2. Direction in the "Alpha" code letters when either or both cols.
61, 62 have an "X" overpunch. (See VIII, H above.)
3. A dash (-) in the col. following the direction when col. 61 has an "X"
overpunch.
4. A plus (+) in 2nd col. following the direction when col. 62 has
an "X" overpunch.
M. Water Equivalent (cols. 63-65)
Legal punches are: 0-9 or B. Water Equivalent is in inches & tenths.
Illegal punches: B in any of cols. 63-64 with col. 65 punched 0-9.
Other cols. (36-40, 66, 68-80) should be B.
IX. Machine Computations
Various sums, means, departures (from pre-programmed normals), frequency
counts, summary cards, etc., necessary in the verification program and
used in the preparation of formats for the LCD, CDNS, and Table J are
made by the computer.
Print the sums, averages, etc., from the data available when some days
and/or items are missing.
A. Daily Computations are made for (a sketch in code follows this list):
1. Average temperature
2. Departure from normal
3. Degree days
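A sketch of these daily computations; the conventional 65 deg F degree-day base is an assumption, since the text does not state the base:

def daily_temperature_products(max_f, min_f, normal_f, base_f=65.0):
    avg = (max_f + min_f) / 2.0            # average temperature
    departure = avg - normal_f             # departure from normal
    heating_dd = max(0.0, base_f - avg)    # heating degree days
    cooling_dd = max(0.0, avg - base_f)    # cooling degree days
    return avg, departure, heating_dd, cooling_dd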
B. Monthly Sums are computed and listed for:
1. Max. temperature
2. Min. temperature
3. Mean temperature
4. Degree days, heating and cooling
5. Precipitation
6. Snowfall
7. Sunshine
8. "Days With" (if in station's program)
9. Sky Cover (SR-SS & Mid-Mid)
182
-------
C. Monthly Averages are computed and listed for:
1. Max. temperature
2. Min. temperature
3. Mean temperature (this is 1/2 the sum of the average
max. and min., C, 1 & 2 above).
4. Average percent of possible sunshine (sum of daily
percentages divided by the number of days).
Monthly percent of possible sunshine is computed from
total sunshine recorded and the pre-programmed possible
amount, sunrise to sunset.
183
-------
D. Monthly Departures are computed and listed for:
1. Mean Temperature
2. Degree days, heating and cooling
3. Precipitation
E. Seasonal Departure for Degree Days (from seasonal totals carried
forward from preceding month and current month's total) are com-
puted and listed. Season begins with July for heating and January
for cooling.
F. Extremes and Dates are selected and listed for:
1. Highest temperature
2. Lowest temperature
3. Greatest precipitation
4. Greatest Snowfall
5. Greatest Snow Depth
6. Greatest Wind Speed and Direction
(When the same value occurs on two or more dates, the date of
the last occurrence followed by a plus (+) is listed. Also,
the direction of the last occurrence of multiple "Greatest Wind
speed" is printed.)
G. Frequency Counts are made and listed for:
1. Temperature
a. Max. <= 32
b. Max. >= 90, except >= 70 for Alaskan stations
c. Min. <= 32
d. Min. <= 00
2. Precipitation
a. Trace (BBBX)
b. >= 0001
c. >= 0010
d. >= 0050
e. >= 0100
3. Snowfall
a. >= 010
4. Character of Day (SR-SS)
a. Clear (Avg. 0-3)
b. Partly Cloudy (Avg. 4-7)
c. Cloudy (Avg. 8-10) (Punched 8, 9, or X)
184
-------
X.
Precipitation Data Card Images
A. Program Involved. Hourly precipitation, monthly extremes, and
maximum precipitation.
1. Hourly precipitation, greatest amounts of precipitation,
snowfall, and snow depth and maximum precipitation are con-
tained in a series of tape formats currently known as the
HPD Deck. These are identified as to station, year, and
month in the same manner as the WBAN #1 and #3 cards.
Fig. 5. HPD card layout (punch card image; station number, date, day,
card number, and hourly precipitation amount fields).
2. Hourly precipitation, HPD card format 1 or 2 in col. 12
as identifier.
For each station in the LCD program, #1 and 2 HPD data are
keyed each day with precipitation and for the last day of
the month whether precipitation has occurred or not. If
stations are not equipped with recording gages, amounts are
keyed only at 6-hourly synoptic times. In this case, the
daily total is not keyed in the second format; the monthly
total, however, is keyed in the last format of the month for
all stations.
185
-------
B. Checking Procedure - Hourly Precipitation
1. The checking is accomplished by a computer cross-foot listing
to insure internal compatibility. A second check is made be-
tween the daily totals and the monthly total. The cross-foot
for each station is begun by building in the memory of the
computer a grid of zeros for all days in the month, i. e.,
28 days, 30 days, etc., as the calendar requires. The keyed
data are read into the grid and then edited. Information
concerning missing record, blank fields, erroneous keying,
and arithmetic mistakes is listed to the right of the data
field. If a record is missing, the computer will list all
hourly fields as having zero precipitation with indication
to the right that the record is missing. In the case of
duplicates, only the last presented to the computer will be
used and duplication indicated to the right.
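The cross-foot itself reduces to a small amount of arithmetic. A minimal sketch, assuming each keyed record supplies a day, a card number (1 or 2), twelve 3-digit hourly values, and, on card #2, the daily total; the record layout and names are illustrative, not the actual program:

    import calendar

    def crossfoot_month(records, year, month):
        # Build a grid of zeros for every day the calendar requires.
        ndays = calendar.monthrange(year, month)[1]
        grid = {d: {"hours": [0] * 24, "daily": None}
                for d in range(1, ndays + 1)}
        for rec in records:               # a later duplicate simply overwrites
            day = rec["day"]
            if rec["card"] == 1:          # card #1: hours 0100-1200
                grid[day]["hours"][:12] = rec["values"]
            else:                         # card #2: hours 1300-2400 + daily total
                grid[day]["hours"][12:] = rec["values"]
                grid[day]["daily"] = rec["daily_total"]
        # Cross-foot: each keyed daily total must equal its 24 hourly values.
        for day, g in sorted(grid.items()):
            if g["daily"] is not None and sum(g["hours"]) != g["daily"]:
                print("day %02d CROSSFOOT ERROR %+05d"
                      % (day, sum(g["hours"]) - g["daily"]))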
Since the presence of the HPD 1 & 2 record is a controlling
factor, stations not having hourly precipitation must have
"dummy" records for the last day of the month, containing
only identification, date, and card number data.
The HPD #4 card image (4 in col. 12) has the greatest 24
hour precipitation and date, snowfall and date, and great-
est depth of snow on the ground and date. There is only
one #4 per station month.
2. The edit checks of the HPD 1, 2, and 4 card images are
as follows (sample shown on page 187):
Column Data Edit Check
1- 5 Station No. Sequence checked by number with a
4 punched in column 12 of 1st image.
6- 9 Year & Month Values are checked and must be the
same for the entire edit. Month
must be in range of 01-12 in cols.
8-9.
10-12 Day Card No. Only days with pcpn. are keyed ex-
cept for the last day of the month.
Each day will have only two images
identified as 1 and 2 in col. 12.
No. 2 has the daily total in cols.
49-52. The #4 in col. 12 will not
have day punched in cols. 10-12.
186
-------
[HPD edit listing (sample): an hourly edit for station 00001, June 1970, and a 6-hourly edit for station 23237, June 1970, showing the keyed card images with flags such as "ERROR HR n," "MISSING NO. 1 CARD," and "CROSSFOOT ERROR," and closing with the card monthly total vs. computed total comparison; printout not reproduced.]
HPD EDIT LISTING
187
-------
Column   Data            Edit Check

13-48    Hourly Values   Each hour has three cols. for data,
                         i.e., hour 0100 cols. 13-15, etc.
                         Zero pcpn. is keyed "000." BBO, BOO,
                         OBO, OOB & BBB are flagged. All cols.
                         are keyed with zeros placed to fill
                         the col. Blanks are flagged. Trace
                         amounts are indicated by an X in the
                         right col. of the hour, preceded by
                         two blank columns. Punched data of OOX,
                         OBX, BOX and over-punches are flagged
                         as errors. A trace is indicated by
                         an X and accumulation by a Y punch.

49-52    Daily Total     Flagged for error when these data are
                         omitted from the HPD #2 or keyed in the
                         #1. When entered, the field is fully
                         keyed and errors are indicated for
                         blank columns. Trace is BBBX. Data
                         are flagged for OOOX, BOOX, etc. The
                         daily total is checked with the values
                         in cols. 13-48 of the HPD #1 & #2
                         cards. If the values do not agree it is
                         indicated as a "cross-foot error" and
                         the amount of error is shown. The cross-
                         foot does not function if there are
                         illegal punches.

53-56    Monthly Total   Keyed in the #2 card of the last calen-
                         dar day of the month. This datum is
                         listed at the bottom of the edit as
                         card total. It is compared to the com-
                         puted total taken from all daily #2
                         cards. If the totals are the same, the
                         word "agree" is printed and, if not,
                         the word "error" appears. If the total
                         is omitted from the last #2 card, the
                         card total is blank and error indicated.
C. Checking Procedure - Extreme Precipitation
1. The remaining data are contained in the HPD #4 card image. This
card contains no date in cols. 10-11, and cols. 13-56 are blank.
The card is listed on the edit below the last day of the month.
Column   Data              Edit Check

57-60    Greatest pcpn.    The value is checked for illegal
         in 24 hours       punching.
188
-------
Column   Data              Edit Check

61-65    Date of 24        Col. 61 is keyed zero or X. Other
         hour amount       values are flagged. When the value
                           in 57-60 is 0000 these cols. will be
                           blank and are flagged if not. The
                           field is fully keyed if there is a
                           value for 57-60 and listed as an
                           "error pcpn. date" if miskeyed.

66-68    Greatest 24       Datum is keyed the same as the hourly
         hr. snowfall      pcpn. and has the same error check.

69-73    Date of 24        Same check as in cols. 61-65, with 69
         hr. snowfall      keyed 0 or X with data in cols. 66-68.

74-75    Greatest          Zero is keyed for no snow. 2" = 02;
         snow depth        110 = X/10. Note: The snow depth is
                           keyed in two cols., but prints to
                           three places. This is to accommodate
                           the overpunching for values greater
                           than "99."

76-78    Date of           These cols. are blank if the value
         snow depth        in cols. 74-75 is 00. 76 is keyed X
                           for + dates or zero. Other values
                           are flagged.

79-80    None              HPD cards 1, 2, and 4 are blank in
                           these fields.
2. The edit contains a "Computed High - 24 Hour Precipitation"
with dates. This is a guide to checking this value on the LCD.
There is no check by the computer between this value and the
one keyed in the HPD #4 card.
D. Correction of HPD Data
Data contained in the HPD 1, 2, and 4 card forms are corrected
by submitting to the computer a new card punched in its entirety
containing the information to be updated.
E. Maximum Short Period Precipitation.
For each month, maximum precipitation is keyed as two records:
1 in col. 10 with data for 5, 10, 15, 20, 30 and 45-minute
periods, and 2 in col. 10 with data for 60, 80, 100, 120, 150
and 180-minute periods. See page 31 for the keying format.
Day and time entries designate the end of the time period in
which the amount of precipitation occurred. Day and time are
omitted when the amount is zero, trace, or missing.
189
-------
A computer edit program checks completeness and consistency
of the data and produces an edit listing with flags indicating
the deficiencies. The flags and associated deficiencies are
as follows (a code sketch of the period-consistency checks follows the lists):
A = Record #1 missing
B = Record #2 missing
C = Month < 01 or > 12
D = Day < 01 or > 31
E = Hour < 00 or > 23
F = Minutes < 00 or > 59
R = Amount zero or trace
S = Missing "M"
T = Pcpn. ≥ 0.01 with day
    or time missing
W = 10.00 or greater
AA =  10 MIN > 2 X  5 MIN
AB =  15 MIN >  5 MIN + 10 MIN
AC =  20 MIN >  5 MIN + 15 MIN
AD =  20 MIN > 2 X 10 MIN
AE =  30 MIN > 10 MIN + 20 MIN
AF =  30 MIN > 2 X 15 MIN
AG =  45 MIN > 15 MIN + 30 MIN
AH =  80 MIN > 20 MIN + 60 MIN
AI = 100 MIN > 20 MIN + 80 MIN
AJ = 120 MIN > 20 MIN + 100 MIN
AK =  60 MIN > 2 X 30 MIN
AL = 150 MIN > 30 MIN + 120 MIN
AM = 120 MIN > 2 X 60 MIN
AN = 180 MIN > 60 MIN + 120 MIN
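Read this way, each flag appears to fire when a longer-period maximum exceeds what the shorter-period maxima allow (a 10-minute amount can never exceed twice the 5-minute amount; a 15-minute amount can never exceed the 5-minute plus the 10-minute amount; and so on). A minimal sketch of these checks, assuming a dict of maximum amounts keyed by period length in minutes:

    # A sketch of the period-consistency flags above; input layout is assumed.
    DOUBLE = [("AA", 10, 5), ("AD", 20, 10), ("AF", 30, 15),
              ("AK", 60, 30), ("AM", 120, 60)]            # long > 2 x short
    SUM = [("AB", 15, 5, 10), ("AC", 20, 5, 15), ("AE", 30, 10, 20),
           ("AG", 45, 15, 30), ("AH", 80, 20, 60), ("AI", 100, 20, 80),
           ("AJ", 120, 20, 100), ("AL", 150, 30, 120), ("AN", 180, 60, 120)]

    def consistency_flags(amounts):
        flags = [f for f, lng, sht in DOUBLE if amounts[lng] > 2 * amounts[sht]]
        flags += [f for f, lng, a, b in SUM if amounts[lng] > amounts[a] + amounts[b]]
        return flags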
Corrections are keyed in the format shown on page 31, enter-
ing data for the time period involved only, for updating the
tape. The updated tape produces printer's copy for use in
the CDNS Annual.
190
-------
[Full-page keying form for maximum short period precipitation (rotated form image); not reproduced.]
-------
Validation,
Compaction, and
Analysis of Large
Environmental
Data Sets
By John Jalickee
Jerry Sullivan
Richard Rozett
EDS scientists have developed a technique which, among other benefits,
allows them to compact a data set of
184,000 values into an equivalent
data set of fewer than 6,000 values,
while retaining 90 to 98 percent of
the variability of the original data
fields. Moreover, much of the remain-
ing variability appears to be sensor
noise.
Introduction
Large-scale environmental field ex-
periments such as the Barbados
Oceanographic and Meteorological
Experiment (BOMEX), the Interna-
tional Field Year for the Great Lakes
(IFYGL), and the GARP (Global
Atmospheric Research Program) At-
lantic Tropical Experiment (GATE)
produce huge data sets and attendant
large-scale problems in data valida-
tion, analysis, and synthesis. New and
more sophisticated techniques are
needed to extend and complement
traditional methods when working
with such large data sets.
The failure of conventional smooth-
ing techniques to adequately remove
noise from an IFYGL rawinsonde
(atmospheric sounding) wind data
set and still retain meaningful,
though highly variable, natural fluc-
tuations led the authors and other
scientists of EDS' Center for Experi-
ment Design and Data Analysis
(CEDDA) to try a new method,
called the asymptotic singular decom-
position method, or ASD for short.
The resulting computer program elim-
inated the noise and retained the es-
sential data. It also greatly reduced
the size of the original data base and,
through intermediate graphics, pro-
vided a quick and efficient method
of error detection, while isolating
physical relationships and character-
istic patterns.
The ASD Method
The idea behind this data decomposi-
tion technique is to extract meaning-
ful information in the form of char-
acteristic patterns. As an example,
consider a meteorologist studying
daily maximum temperature data for
the east coast. Station by station, he
observes that, in general, it is warmer
in summer than in winter: from this
he abstracts a typical seasonal variation. On the other hand, studying
station-to-station variations, he notes
that temperatures are generally colder
in the north than in the south at almost any time of year. With these
two characteristic variations (space
and time) he can qualitatively explain
the main features of the entire data
set. And by retention of a relatively
few significant temperature values he
could quantitatively describe perhaps
90 percent of the east coast maximum temperature field.
The ASD data decomposition meth-
od adapted by CEDDA formalizes this
process and provides a technique to
calculate characteristic patterns for
small and large data sets. Using the
ASD method, dominant patterns with-
in the data are easily extracted in an
objective, repeatable fashion. In many
respects, the science of ASD is much
akin to the art of the caricaturist: the
major features of the subject are
quickly shown with a few sure, deft
strokes.
CEDDA scientists have used ASD
to reduce the quantity of data needed
for a sufficient representation of a
physical situation: often the equiva-
lent data set is an order of magnitude
smaller than the original one. Data
generated by the method also are used
in calculations that require relatively
noise-free data: random noise is
smoothed out, while real discontinui-
ties or sharp changes are relatheh
unchanged. An unexpected bonus of
the method is its error-detection ca-
pabilities: keeping with the caricatur-
ist analo<:\. distorted (erroneous^
features stand out sharpK. Physical
relationships within the data, often
buried b\ the volume of numbers, are
also highlighted h\ the method.
The ASD method is related to
other statistical techniques such as
principal component analysis, Lorenz's1 empirical orthogonal functions
in meteorology, and the factor analysis method of psychologists, political
scientists, and sociologists; however,
ASD has the advantages of simplicity
and accuracy. A factor analysis computer program might fill over a thousand punched cards, while ASD
would use a hundred. And ASD is
almost immune to computer roundoff
error, an important consideration
when large data sets are involved.
192
-------
[Tabulated temperature values at each 10-mbar P* level for the twelve 3-hourly soundings, 1800 GMT Nov. 2 through 0300 GMT Nov. 4, 1972; data columns not reproduced.]
Figure 1. Upper-air temperature data for Stony Point, New York.
Data Compaction
Data from IFYGL for 1972-73 pro-
vide some vivid illustrations of the
benefits of ASD applications. To dem-
onstrate the data-compacting capabili-
ties of the ASD method (plus the
method itself), consider 12 successive
IFYGL rawinsonde launches from station Stony Pt., N.Y., for the period
1800 GMT Nov. 2, 1972, to 0300 GMT
Nov. 4, 1972 (fig. 1). Temperature values
are given for each 10-mbar pressure
level, so that up to the 590-mbar level
we have 12 × 60 = 720 values. (The
pressure variable used in all figures
is P*, the difference between surface
pressure and observed pressure, i.e.,
P* = Psurface - P.) The particular
time period was chosen because a
sharp upper-air trough was passing
over Lake Ontario, producing the
characteristic temperature variations
represented by the solid lines in fig-
ure 2.
The object of ASD application in
this instance is to replace the 12 columns of 60 numbers with 1 column
of 60 numbers and 1 row of 12 num-
bers, as in figure 3. In the latter illus-
tration, the column represents the
pressure dependence of the tempera-
ture soundings, while the row repre-
sents the time variation. To obtain
the 350-mbar temperature for 0000
GMT on November 3, one would mul-
tiply the 36th number from the bot-
tom of the column by the 3d number
of the row (as shown in fig. 3), or,
193
-------
Figure 2. Time-height temperature analyses for Stony Point. The solid lines are based on the original data set, the dashed lines on a reconstituted data set. [Contour plot of temperature vs. P* (mbar) and time (GMT); not reproduced.]
to get the 150-mbar temperature at
1800 GMT on November 3, multiply
the 16th column number from the
bottom by the 9th number in the row.
Where did the column and row
come from? Any column and row of
numbers can be multiplied together
to generate a temperature field. The
best choice is one that minimizes the
sum of squared differences between
the generated field and the original
field. In practice, the ASD computer
program begins with a trial column
and row, then generates successive
values until there is no further minimization of differences between the
two temperature fields.
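In modern terms this is a rank-one least-squares fit; a minimal sketch with NumPy, under the assumption that the ASD program's iteration is equivalent to alternating between the best-fit row and best-fit column (the function name is ours):

    import numpy as np

    def asd_pattern(field, tol=1e-8, max_iter=500):
        """Return (column, row) minimizing the sum of squared differences
        between field and the outer product column x row."""
        col = field[:, 0].astype(float).copy()   # trial column (assumed nonzero)
        prev = np.inf
        for _ in range(max_iter):
            row = field.T @ col / (col @ col)    # best row for this column
            col = field @ row / (row @ row)      # best column for this row
            sse = np.sum((field - np.outer(col, row)) ** 2)
            if prev - sse < tol:                 # no further minimization
                break
            prev = sse
        return col, row

    # Successive patterns describe the residuals of the earlier fit:
    # col1, row1 = asd_pattern(temps)                        # temps: 60 x 12
    # col2, row2 = asd_pattern(temps - np.outer(col1, row1))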
In the example at hand, the original 720 numbers have been replaced
by 60 + 12 = 72 numbers, a 10-fold
reduction. The new field generated
by the row and column explains approximately 90 percent of the variation about the mean of the original
field. The ASD method now may be
used again to describe the residuals
of the original field minus the first
generated field, producing another
row and column. Usually, about 98
percent of the original temperature
field variation is covered by three
rows and columns. The broken lines
in figure 2 show a reconstituted temperature field using three rows and
columns.
Overall, CEDDA scientists were
able to compact 60 levels of temperature, humidity, and wind values from
768 IFYGL upper-air soundings (6
stations, 128 launches each), a total
of 184,000 values, into an equivalent
data set containing fewer than 6,000
values. From 90 to 98 percent of the
characteristics of the original fields
are retained, and much of the unexpected variability appears to be sensor
noise.
194
-------
Error Checking
Figure 4 illustrates ASD's error-de-
tection capability. Obviously, the
sounding for station 2 differs greatly
from the soundings for the other five
stations. Figure 5 shows the time
components corresponding to the
pressure component of figure 4. Once
again, a strong anomaly (circled
values) shows up. The six soundings
indicated were checked and did prove
to be erroneous. Thus, a 10-second
scan of these two ASD graphs isolated
an error that previously had escaped
detection.
Physical Relationships
Three station pairings stand out
clearly in the lower levels of the
soundings shown in figure 6. These
station pairs—1-2, 3-6, and 4-5 —are
geographically related. Stations 4-5
are on the western end of Lake On-
tario, 3-6 on the middle shoreline, and
1-2 on the eastern end. Figure 7, a
plot of the corresponding time com-
ponent, shows that the effect is most
pronounced for launches number 20
through 27. A detailed check of the
soundings from all stations for this
period revealed a large east-west wind
velocity gradient which varied from
2 m/s in the west to 6 m/s in the
middle to 14 m/s in the east.
Other Uses
With ASD, new data can be compared quickly with older data obtained by the same measuring system. Drastic differences in the ASD
plots will suggest instrument drift
and/or mistaken assumptions about
experimental background conditions.
The same approach can be used where
different types of instruments are supposedly measuring the same physical
phenomenon. This type of application
allows CEDDA scientists to study the
very large data sets associated with
ecosystems and often, through simultaneous analysis of many different
kinds of variables, uncover hidden
interactions.
[Figure 3 graphic: a single column of 60 numbers beside a single row of 12 numbers, one per sounding time; not reproduced.]
Figure 3. An illustration of the ASD
data compaction technique. The
single column of 60 numbers and
single row of 12 numbers replace
the 12 columns of data appearing in
figure 1, yet retain approximately
90% of the details of the original
data set.
Modeling and Experiment
Design
CEDDA scientists are pursuing other
potential applications of the ASD
method, including its use in modeling
and experiment design. The charac-
teristic patterns obtained provide important clues as to the physical realities underlying the data. We hope that
the pattern-detection capabilities of
the ASD method may lead to an em-
pirical, data-oriented form of system
modeling.
Another promising path leads to-
wards the economical design of field
experiments and data collection sys-
tems, based on characteristic patterns
derived from preliminary survey data.
Much redundant data and informa-
tion are often collected in large-scale
field experiments. If the redundancies
could be eliminated, all subsequent
data collection, processing, analyses,
archival, and dissemination activities
would be greatly simplified and more
cost-effective. The ASD method, by
highlighting significant patterns of
preliminary survey data sets, could
suggest which data contribute most
to the definition of the patterns, and
which are dispensable.
Reference
1. Lorenz, E. N., Empirical Orthogonal Functions.
-------
Figure 4. Composite printout of
U-components of the wind for 48
upper-air soundings taken at each of
six IFYGL observation stations.
Figure 5. Time analysis of data from
figure 4 isolates six anomalous
soundings (circled).
196
-------
Figure 6. Composite of V-components
of the wind for 48 upper-air
soundings taken at each of six
IFYGL observation stations.
Figure 7. Time analysis of data from
figure 6 indicates that the pairing
pattern is most pronounced in
soundings 20-27.
197
-------
About the Article
and the Authors
JACK JALICKEE was thumbing
through a scientific journal in the
spring of 1973 when he came across
an article on the mathematical
theorem of singular decomposition.
It was evident that the theorem was
adaptable to the analysis of the large
data sets the EDS Center for Experi-
ment Design and Data Analysis
(CEDDA) was working with. This
was the origin of the ASD (Asymptotic Singular Decomposition) method.
CEDDA analysis of atmospheric
data from the International Field
Year for the Great Lakes (IFYGL)
began in the autumn of 1974. Problems arose almost immediately. Divergence calculations derived from
upper air winds did not make physical sense. (The calculation is a very
sensitive one, involving small differences of large numbers which contain
noise.) The data themselves appeared
reasonable and consistent with observed weather conditions, which
were highly variable. Traditional
analysis techniques could not resolve
the problem; ASD did.
A native of Washington, D.C., Jack
Jalickee worked his way through
Catholic University (in D.C.), receiving a B.A. in 1962 and a Doctor's
degree in 1966, both in Physics. Subsequently, he worked as a research
associate and teacher at Northwestern
University in Evanston, Illinois. A
Presidential Internship appointment
brought him to EDS/CEDDA in
1972.
JERRY SULLIVAN was the man
having problems with IFYGL data
divergence calculations. His inhouse
paper on the subsequent resolution of
those problems through ASD applica-
tions provided the nucleus of the cur-
rent article.
Jerry received a Bachelor's degree
in Physics from Holy Cross College,
Worcester, Mass., and his Doctor's
Degree from Catholic University. He
Jack Jalickee
Dick Rozett
joined EDS/CEDDA in the fall of
1970.
Fr. RICHARD ROZETT, S.J., is on
a year's sabbatical from Fordham
University in New York. His previous
work and interest in the application
of statistical techniques to large data
sets led Fr. Rozett to come to
CEDDA, where he heads up its
MESA (Marine Ecosystem Analysis)
Project. Since September 1974, he
has been working with Jack Jalickee
in collecting, devising, and developing ASD and similar techniques to
analyze ecosystems data sets.
Jerry Sullivan
Ecosystems data sets are very
large, complex, and highly redundant.
They include physical measurements
such as temperature, depth, pressure,
and the particle size of sand; chemical measures of oil, lead, phosphate,
acidity, salinity, nitrate, and carbonate concentrations—not to mention
garbage and sewer sludge; and biological measurements such as the
number of barnacles per square
meter, or the percent of flounder with
fin rot. ASD and similar techniques
make it possible to massage the original data into a simpler, concentrated, and more meaningful data set.
Dick Rozett earned a B.S. degree
in chemistry from Spring Hill College in Spring Hill, Alabama, and an
M.S. degree from St. Louis University,
then studied chemical physics at
Johns Hopkins in Baltimore, Md.,
where he received his Ph.D. in 1967.
Ordained a priest in 1962, Fr.
Rozett was an Assistant Professor of
Chemistry at Fordham from 1967 to
1972, when he was made an Associate Professor. He is the author of
more than 30 scientific papers on
chemistry and the statistical analysis
of large data sets, and has participated in international scientific conferences in Leningrad, Lisbon, Kifissia (Greece), and Kyoto.
198
-------
DATA VALIDATION FOR UPPER AIR SOUNDING DATA
AND EMISSION INVENTORY DATA
by
J.H. Novak
Environmental Sciences Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
199
-------
DATA VALIDATION FOR UPPER AIR SOUNDING DATA
AND EMISSION INVENTORY DATA
J.H. Novak
A systematic approach to data validation requires that
several steps be taken during the design of a validation scheme.
For any set of data it is essential to be familiar enough with the
data collection and data handling procedures to be able to locate
all possible sources of error and to define criteria for
distinguishing good and bad data at those critical points. The
next task is to determine which techniques can be used most
effectively in error checking, and what course of action should be
taken if an error is detected. Finally, after the validation
scheme has been implemented the quality of the validated data
should be assessed in some manner.
Therefore the first step in the validation of RAPS upper air
data was to determine all possible sources of error in the data
handling system. The upper air data consists of two types of
observations, Pibals and Radiosondes.
A pibal is a pilot balloon which is filled with helium to an
exact pressure in order to insure that it will rise with a known
ascension rate when released into the atmosphere. An observer uses
a mechanical device known as a theodolite to track the balloon by
recording azimuth and elevation angles at 30 second intervals.
These angles are then used to calculate wind speed and direction
at various heights above ground. There are two possible sources of
error during this phase of data collection. First, the observer
200
-------
may read the angles incorrectly during the sounding and second,
transcription errors may occur when coding the data onto forms for
keypunching.
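The wind computation behind the pibal readings is simple trigonometry. A minimal sketch, assuming a constant ascension rate (the 3 m/s value and the function name are illustrative) and elevation angles above the horizon:

    import math

    def pibal_winds(angles, ascent_rate=3.0, dt=30.0):
        """angles: (azimuth_deg, elevation_deg) pairs at dt-second intervals.
        Returns (speed m/s, direction deg the wind blows FROM) per interval."""
        pos = []
        for i, (az, el) in enumerate(angles):
            height = ascent_rate * (i + 1) * dt              # m above ground
            dist = height / math.tan(math.radians(el))       # horizontal range
            pos.append((dist * math.sin(math.radians(az)),   # east
                        dist * math.cos(math.radians(az))))  # north
        winds = []
        for (x0, y0), (x1, y1) in zip(pos, pos[1:]):
            dx, dy = x1 - x0, y1 - y0
            speed = math.hypot(dx, dy) / dt
            bearing = math.degrees(math.atan2(dx, dy))       # direction of motion
            winds.append((speed, (bearing + 180.0) % 360.0))
        return winds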
The radiosonde is similar to a pibal in that it is also a
balloon; however, a package of instrumentation containing various
meteorological sensors is attached to the balloon which is tracked
electronically instead of manually. In addition to the azimuth and
elevation angles, pressure, temperature, and relative humidity
readings are recorded. A variety of thermodynamic parameters can
be determined from these measurements. There are several
potential sources of error associated with the soundings:
electronic difficulties, sensor malfunction, calibration,
misinterpretation of the strip charts, interpolation of the
adiabatic charts and transcription errors.
Once all possible sources of error have been determined and a
range of good and bad data defined, various techniques can be
chosen to search the data for possible errors. The upper air
sounding network's (UASN) preliminary quality control program
contained the following tests on the raw data:
1. Routine data checks - data was checked for completeness and
compared with known data (e.g., station date and time vs. a
performance matrix, station # vs station height, balloon
weight vs release time)
2. Consistency checks with alternate data source (e.g., wind data
vs station log books, doubtful data vs weather maps and
recording barograph)
201
-------
3. Intra-station checks with previous and following soundings.
4. Inter-station checks with simultaneous soundings.
5. Checks with known meteorological relationships (e.g.,
comparison of temperature and relative humidity with adiabatic
charts, shape of the pressure-altitude curve).
The actual key punching of the data forms introduces another
source of error. But at this point the data checks can be
computerized, so that all data will routinely undergo the same
tests. The UASN data validation programs test the data for order,
range, missing values, station height, and special conditions such
as calms or wind speeds greater than 40 meters/second. Again,
additional checks can be performed on the radiosonde data when
special relationships exist (e.g., inverse relationship between
pressure and time). The advantage of computerized error checking
is that the entire data set can be objectively evaluated.
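A minimal sketch of such computerized checks, with the level structure and field names assumed for illustration:

    def check_sounding(levels, station_height, max_speed=40.0):
        """levels: one dict per level, in release order."""
        errors = []
        if levels and levels[0]["height"] != station_height:
            errors.append("surface level does not match station height")
        for i, lev in enumerate(levels):
            if lev["wind_speed"] is None:
                errors.append("level %d: missing wind speed" % i)
            elif lev["wind_speed"] > max_speed:
                errors.append("level %d: speed exceeds %g m/s" % (i, max_speed))
            if i > 0 and lev["height"] <= levels[i - 1]["height"]:
                errors.append("level %d: heights out of order" % i)
            # Radiosonde-only check: pressure must fall as the balloon rises.
            if i > 0 and lev.get("pressure") is not None \
                    and lev["pressure"] >= levels[i - 1]["pressure"]:
                errors.append("level %d: pressure not decreasing with time" % i)
        return errors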
Once the known data errors have been flagged and corrected,
the next step is to archive the data. During this phase both
printouts and printer plots are produced in order to provide
additional information to be used for error detection. The printer
plots of speed, direction, temperature, dew point and atmospheric
pressure can quickly be scanned for remaining inconsistencies.
In an effort to further validate the UASN radiosonde data,
dew point, relative humidity and vapor pressure calculated at
sites 141 and 142 were compared with corresponding data recorded
by the National Weather Service at Lambert Field. Correlation and
202
-------
incidence matrices were also calculated for the same three
parameters on a seasonal basis. Both types of preliminary analysis
proved very effective in isolating some remaining data problems.
The final quality assurance effort produced Calcomp plots of
wind speed, direction, temperature, potential wet bulb temperature
and mixing ratio for each of the 5,717 UASN radiosondes. Each of
these plots was scanned for data errors and used to determine
mixing depths for the St. Louis area.
In summary, the important concepts to be derived from the
previous discussion of validation techniques used with the RAPS
Upper Air Sounding Network data are:
1) Determination of all possible sources of error in the
collection and data handling.
2) Use of alternate sources of data for consistency checks.
3) Use of intra and inter station comparisons.
4) Use of known relationships (meteorological in this case) for
comparisons.
5) Completeness and objectivity of computerized comparisons.
6) Use of preliminary analysis routines in error detection.
7) Use of computer graphics.
The second topic for discussion is the validation of the RAPS
emission inventory. The main objective of the RAPS program is to
provide a body of data (emissions, meteorological, air quality,
etc.) which could be used to develop, improve and validate air
quality simulation models. The first priority is to determine what
203
-------
accuracy is required in any data base to be able to achieve the
objectives of RAPS and secondly, what accuracy, precision, and
bias currently exists in the RAPS emission inventory. The answers
to these questions are too complex to be addressed in this paper,
but they are essential to the design of a good validation scheme;
therefore I have included (as references) a list of papers which
discuss this important question of accuracy in detail. Thus, from
this point I will limit the discussion to the procedures that were
chosen to verify the accuracy of the acquired and estimated data.
The RAPS emission inventory is composed of three separate data
bases: (1) point, (2) area, and (3) line source. The choice of validation
technique depends on the amount and form of the data in each data
base. The point source data base contains hourly, daily, monthly
and annual raw process data; no emissions are stored in the data
base. The methodologies used to calculate emissions and determine
temporal resolution are applied at data retrieval time. In
contrast, the area and line source data bases contain annual
emissions. The methodologies used to calculate emissions have
already been applied before the data was entered into the data
base. Temporal apportionment is accomplished through the retrieval
software. As usual, checks must be performed on raw data at their
entrance into the data handling system. For the area source data
base, this implies checking the raw data inputs to the methodology
programs. There are seven source categories for the area source
inventory - river vessels, fugitive dust, highways, railroads,
stationary residential and commercial sources, off-highway mobile
204
-------
sources, and stationary industrial sources and airports. The software
for these source categories was developed by several different
contractors and therefore must be reviewed independently. Area
source data is mainly checked for internal consistency within each
grid. Parameters such as population, number of homes, amount of
water area per grid, agricultural acreage, etc., are compared with
each other in terms of overall land use per grid. Typical errors
that were found include a 1 KM square grid which contained over 2
million acres of tilled farm land and a grid with population of
180 and only 11 single family homes. Calcomp graphics was heavily
used in the validation of line source data. Line sources and
associated characteristics such as average daily traffic,
functional class, etc. were plotted on gridded maps to the same
scale as county roads and DOT maps. Overlaying these maps provided
an excellent means of checking the raw line source data.
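A minimal sketch of the per-grid internal-consistency idea, with field names and tolerances invented for illustration (the actual RAPS software and criteria are not reproduced here):

    ACRES_PER_KM2 = 247.1          # 1 square kilometer = 247.1 acres

    def check_grid(grid):
        errors = []
        max_acres = grid["area_km2"] * ACRES_PER_KM2
        used = grid["tilled_acres"] + grid["water_acres"] + grid["other_acres"]
        if used > max_acres:       # e.g., 2 million acres in a 1-km grid
            errors.append("land use %.0f acres exceeds grid area %.0f"
                          % (used, max_acres))
        if grid["homes"] and grid["population"] / grid["homes"] > 10:
            # e.g., a population of 180 with only 11 single family homes
            errors.append("implausible persons-per-home ratio")
        return errors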
In contrast with the area source data base, the point source
data base contains all the raw data for emission calculations.
Because the raw point data includes temporally-distributed process
data for the entire study period in contrast to annual county
statistics for area data, the amount and type of point data must
be taken into account when choosing a technique for raw data
validation. Parameters which apply at the stack level and
therefore do not have a temporal association can be manually
verified against original plant data. These parameters include
stack and fuel characteristics, operating patterns, stack test
data, and applicability of the SCC to a given stack. And because
205
-------
of the small amount, monthly process data was verified manually.
In order to perform a reasonable check on the remaining data,
a random selection of representative sources was chosen. The
prime determinants in the selection of test sources were the
method of emission calculation and the time interval of reporting
the data. One source from each combination of these two factors
was chosen to insure that all paths in the software would be
exercised. The following tests were performed on the selected
sources: 1) manual verification of process data, 2) verification
of diurnal, weekly and/or seasonal variations, 3) hourly and
annual retrievals. Computer software was developed to check all
process data in the point data base for consistency and continuity.
Finally, all test software runs were compared with hand
calculations and the retrieval programs themselves were compared
with the documented methodologies.
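The selection of test sources amounts to stratified random sampling over the two factors. A minimal sketch (field names are assumptions):

    import random

    def select_test_sources(sources):
        # Group sources by (emission calculation method, reporting interval).
        combos = {}
        for s in sources:
            combos.setdefault((s["calc_method"], s["interval"]), []).append(s)
        # One random source per combination exercises every software path.
        return [random.choice(group) for group in combos.values()]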
In summary, the important concepts to be derived from the
above discussion of the validation of RAPS emission inventory data
are:
1) Preliminary determination of required accuracy.
2) Analysis of current accuracy.
3) Selection of validation techniques by:
a) amount of data
b) form of data
c) availability of supporting data
d) significance of data to the overall accuracy
e) availability of time and personnel
206
-------
REFERENCES
Koch, R.C., et al., "Validation and Sensitivity Analysis
of the Gaussian Plume Multiple-Source Urban Diffusion
Model", NTIS Publication Number PB-206951, Geomet Inc.,
Rockville, Maryland (1971).
Ditto, F.H., et al., "Weighted Sensitivity Analysis of
Emission Data", Final Report, EPA Contract # 68-01-0398 (1973).
Littman, F.E., S. Rubin, K.T. Semrau, and W.F. Dabberdt,
"A Regional Air Pollution Study (RAPS) Preliminary Emission
Inventory", SRI Project 2579 Final Report, EPA Contract
# 68-02-1026 (1974).
Gibbs, L.L., C.E. Zimmer, and J.M. Zoller, "Source
Inventory and Emission Factor Analysis", Volumes I and II,
Final Report, EPA Contract # 68-02-1350 (September 1974).
Ruff, R.E., P.B. Simmon, "Evaluation of Emission Inventory
Methodologies for the RAPS Program", SRI Project 4331,
Final Report, EPA Contract # 68-02-2047 (1977).
207
-------
VALIDATION OF BIOMEDICAL DATA THROUGH AN
ON-LINE COMPUTER SYSTEM
by
Larry D. Claxton
Health Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
209
-------
VALIDATION OF BIOMEDICAL DATA THROUGH AN
ON-LINE COMPUTER SYSTEM
L.D. Claxton
INTRODUCTION
Within the biomedical disciplines there are a variety of testing pro-
cedures used routinely within many separate laboratories. Since health,
research and regulatory decisions are being based upon the results from
many laboratories, there is a basic need for assuring the quality of the
data. In the area of microbial mutagenesis, the use of Salmonella
typhimurium as an indicator organism for mutational events is employed
by many laboratories across the country. The various procedures available
are rapid, relatively simple, sensitive and are used in a variety of
laboratory situations including private industry, government and university
laboratories. Presently, a great deal of emphasis is placed upon these
types of tests as prescreens for substances that may be human mutagens and
potential carcinogens. Therefore, the use of a system involving Salmonella
typhimurium could provide an excellent pilot study for methods involved in
data validation. Data validation is used in this context to mean the
process by which generated data is filtered and accepted or rejected by
objective criteria. Likewise, computerization provides a potential
means for systematically applying a predetermined set of objective
criteria in a rapid non-biased manner. With the use of TSO (Time
Sharing Option), portions of the data validation can be conducted during
the performance of a biological test. This article will describe the
design of a pilot system for the on-line computer assistance of testing
Also published separately as EPA-600/1-78-038, "Biomedical Data Vali-
dation Through an On-Line Computer System," May 1978.
210
-------
protocols and data validation. The scientific protocols and initial
computerization have been completed and the system will be tested in a
laboratory situation in the near future by the National Institute of
Environmental Health Sciences.
DESCRIPTION OF TEST:
From a variety of microbial mutation test systems, the suspension
test using a mammalian activation system was chosen because it is well
defined and is a quantitative test system.(1) The more commonly used
Ames plate incorporation method is only semiquantitative. We also chose
to compare three strains of Salmonella typhimurium and a forward muta-
tion strain of K-12 E. coli.(2) In simple terms, the test involves the
combining of the bacterial strain with a compound and a mammalian activation
system into an Erlenmeyer flask which is incubated at 37°C for 30 minutes
to 2 hours. The bacteria are then separated and aliquots are plated on
minimal media for the detection of mutants and on supplemented media for
relative survival. Figure 1 provides a representation of the pilot test
presently used. Pilot tests are used to define more appropriate testing
conditions, and definitive tests provide data from which mutagenicity is
judged. For complete testing, the substance must be tested in several
strains of bacteria to monitor for a variety of different types of
genetic alteration.
SYSTEMS OVERVIEW
This program uses TSO and was written in COBOL with some additional
FORTRAN being integrated into the final program. All programming was
accomplished on an IBM System/370 at the Division of Computer Research
and Technology within the National Institutes of Health, Bethesda,
Maryland.
211
-------
Figure 1. [Representation of the pilot suspension test (flow diagram); not reproduced.]
-------
For ease of programming, the task was divided into three individual
programs (Figure 2). Information, needed prior to testing of a parti-
cular substance, is stored with the use of Program 1. This program also
supplies a number for the blind coding of the compound. The second
program provides for the technician the proper form of the basic proto-
col , performs certain "within-experiment" calculations, accepts the
input of data from the tests, and evaluates the test by predetermined
objective criteria. The ability for the central laboratory to monitor
the accomplished work and recall any pertinent data is provided by
Program 3. A more precise description of the program is available.(3)
Quality Control Through Interactive Computerization
One of the basic premises of quality control is that good data
yields good decisions. By monitoring the quality of data during an
experiment and providing feedback to the technical personnel, both
personal bias and technical variation can be reduced. With an inter-
active computer network this can be done. This pilot project demon-
strates these capabilities in several ways. First, the compound to be
tested is coded and only essential information for the test is provided.
Secondly, certain other variables, e.g., concentrations of various
components, are predetermined for both the pilot tests and definitive
test. Within this testing system, two pilot tests are conducted to
determine levels of toxicology and potential mutagenicity. From this
data a narrower range of concentrations for the definitive tests are
calculated by predetermined rules so that there are a limited number of
213
-------
Figure 2. [The three programs of the system and their interaction with the terminal and the experiment (flow diagram); not reproduced.]
214
-------
definitive concentrations used across all laboratories. Next, the
computer performs any needed calculations during the performing of a
test thus lessening the occurrence of potential computational errors.
Some of the calculations performed for this system are: (1) bacteria
per ml solution based on a standardized spectrophotometer curve, (2)
variance for the weights of animals used in microsomal S-9 preparation
(if outside normal limits, these will be rejected), (3) calculation of
liver weights and amounts of buffers to be used in microsome prepara-
tion, and (4) calculations for the dilution of samples. Final data
validation is also performed automatically upon the final data output.
The computer's ability for data storage and retrieval is very important
in this regard. For example, in this system, final results are recorded
as number of colonies per plate. This software program compares the
average number of colonies per plate for the controls to the past 100
accumulated controls to determine statistically if the controls are
within normal limits. After the statistical examination of the controls
the test is either accepted or rejected. If the test is a pilot then
the data is also used to determine the concentrations of test substance
to be used in further testing. All data are, however, recorded per-
manently. Rejected data are recorded so that problems can be analyzed
as they are encountered. A flow diagram for the areas within the de-
cision processes is shown in Figure 3. The TSO is the component that
allows for immediate technician/program interaction, thus allowing for
a rapid and constant quality control.
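The control comparison at the heart of the final validation can be sketched briefly; the 3-standard-deviation criterion below is an assumption for illustration, not the system's documented limit:

    import statistics

    def controls_in_limits(current_plates, past_controls, k=3.0):
        """Accept the test only if the mean colony count of the current
        control plates lies within k standard deviations of the mean of
        the last 100 accumulated controls."""
        history = past_controls[-100:]          # needs at least 2 values
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        return abs(statistics.mean(current_plates) - mu) <= k * sigma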
215
-------
Figure 3. [Flow diagram of the decision processes within the testing system; not reproduced.]
216
-------
This prototype system demonstrates that interactive computer pro-
grams can be used to effectively increase the quality control of rapid
in vitro tests. However, it is also apparent that the more simple
in vitro microbial mutagenesis tests such as spot tests and simple plate
incorporation tests do not require such extensive computerization if well
documented and detailed protocols are available. Since most in vivo
mammalian systems have extended experimental time periods, the time sharing
option would be of little benefit due to cost factors and experimental
design. However, even with the more simple in vitro tests and mammalian
cell culture tests, this system can serve as a model for data storage
and test evaluation for the purpose of quality control.
This paper was extracted from an EPA report(4) which is available
through the National Technical Information Service, Springfield,
Virginia 22161.
217
-------
REFERENCES
1. Frantz, C. N. and Malling, H. V. 1975. The Quantitative Microsomal
Mutagenesis Assay Method. Mutation Research 31:365-380.
2. Mohn, Georges, Ellenberger, J. and McGregor, D. 1974. Development
of Mutagenicity Tests Using Escherichia coli K-12 As Indicator Organism.
Mutation Research 25:187-196.
3. Claxton, Larry and Baxter, Richard. 1978. The Computer Assisted
Bacterial Test for Mutagenesis. Mutation Research (In Press).
4. Claxton, L. Biomedical Data Validation Through An On-Line Computer
System. EPA-600/1-78-038, U.S. Environmental Protection Agency,
Research Triangle Park, North Carolina 27711, May 1978. 10 pp.
218
-------
REGIONAL VALIDATION OF STATE AND LOCAL AIR
POLLUTION DATA
by
Thomas H. Rose
Region IV
U.S. Environmental Protection Agency
Athens, Georgia 30605
219
-------
REGIONAL VALIDATION OF STATE AND LOCAL AIR
POLLUTION DATA
Thomas H. Rose
SUMMARY
Two types of data auditing are performed on state and local data in the
region. One is directed. The goal of a directed audit is to verify a certain
value such as a violation of a standard. The other is undirected. The goal
of the undirected audit is to determine the quality of the data being gener-
ated. Both are systems audits but the undirected audit will have wider ram-
ifications. For the most part, I will address the undirected audit.
Each measurement system requires a different auditing path. I point
this out not to make the job sound complicated, but to emphasize the impor-
tance of having the auditor be knowledgeable in the area of the audit.
The path of auditing will be determined by:
• the quantity and quality of records,
• the existence of an agency SOP,
• the availability of records,
•• on a macro geographic scale,
•• on a micro geographic scale,
• the system itself,
• the time frame allowed.
Thus you can see that the auditing process is tailored to the specific
system being audited.
In Region IV, where every funded state and local agency is audited at
least once a year, this is the approach that we take.
1. Establish the flow of samples and data through the system (from
the agency SOP).
2. Trace each parameter of the measurement process (volume, time, flow-
rate, etc.) back to the base standard and verify the quality of that standard.
220
-------
3. Verify that all measurements and transfers of data are documented
and follow reference methods.
4. Verify that all measurements, calculations, and data transfers
are accurate.
5. Provide feedback to the agency being audited of improvements that
could be made in the measurement process as well as the data handling.
One of the most important aspects of this audit is that the agency
itself has to participate and will itself determine the best corrective
action for their own system.
221
-------
DATA VALIDATION FOR THE LOS ANGELES
CATALYST STUDY (LACS)
by
Charles E. Rodes
Environmental Monitoring Systems Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
223
-------
DATA VALIDATION FOR THE LOS ANGELES
CATALYST STUDY (LACS)
C.E. Rodes
INTRODUCTION
The Environmental Monitoring and Support Laboratory (EMSL) is very
concerned with the quality of data generated in its field studies. This
is reflected in the quality control measures employed by EMSL during
sampling and analysis, and the data validation performed before data are
released.
Data validation like other aspects of quality control requires
resource allocations, especially in terms of the manpower required to
complete the final validation. The amount or degree of validation
required is dependent upon the end use of the data. In a study such as
the Los Angeles Catalyst Study (LACS) which is primarily concerned with
long-term trends, the emphasis in data validation is to detect any
extreme outliers which would affect monthly averages. Since we do not
report maximums, our data validation philosophy for this study is
primarily concerned with those values that may affect long-term averages.
CONCLUSIONS
As the project officer responsible for the study, I initially chose
an acceptance error band of ±10% on individual measured values, hence,
this is also the error band of the averages generated from these numbers.
Given this requirement one should be able to assess statistically
the amount and types of validation required to prevent data reduction
and transfer errors from contributing more than 1 to 2% to this overall
±10% error. Unfortunately this area has really not been examined for
this study in any detail nor, I expect, for many other studies. The
present validation levels used for the LACS probably examine more data
than are necessary to maintain the desired error level, but in regard to
validation, I would much rather be conservative than embarrassed after
the data are released.
224
-------
PROCEDURES
The main objective of the Los Angeles Catalyst Study (LACS) is to
develop ambient air data bases for sulfate (SO4), carbon monoxide (CO),
lead (Pb), and other mobile source related pollutants before and after
introduction of the 1975-model automobiles that employ catalytic converters.
The data from this study are being analyzed to determine whether the
catalytic converter has significantly increased the ambient sulfate
levels and/or simultaneously decreased the ambient CO and Pb levels near
the San Diego Freeway in Los Angeles.
The Environmental Monitoring and Support Laboratory (EMSL) is
responsible for all study-related functions including instrumentation,
operation, sample analyses, quality control, and data validation and
analyses. Since January 1976, the operation of instruments and analyses
of samples were performed under contract to Rockwell International or by
interagency agreement with the Lawrence Berkeley Laboratory. To assure
the quality of the data supplied by these two organizations, EMSL maintains
a comprehensive quality assurance program covering all aspects of the
study. EMSL issues periodic reports which discuss the trends and the
interrelationships among the various pollutant patterns.
The site locations in Los Angeles and the site layouts in relation
to the San Diego Freeway are shown in Figure 1. By selecting sites with
the prevailing wind perpendicular to the freeway, the cross-freeway
contribution to the ambient pollutant levels can be determined using
concurrent upwind and downwind measurements.
The data collected are classed as either continuous or integrated
depending on the measurement method. Continuous data are reduced to
hourly averages and integrated data are collected either over a 4-hour
or 24-hour period. The total data volume generated by the LACS is shown
in Table 1. Since the sites are usually shut down in December of each
year for routine maintenance, the data volumes are based on an 11-month
year.
The flow of samples and data are shown in Figure 2. All block
items except "Data Processing at RTP" and "Final Data Validation" are
performed by the contractor. Data validation steps taken by the contractor
are referred to as "pre-validation", while validation performed at RTP
under more direct EPA control are referred to as "final validation".
225
-------
Figure 1. [Site locations in Los Angeles and site layouts in relation to the San Diego Freeway; not reproduced.]
-------
Table 1. LACS YEARLY* DATA VOLUME

                        SUMMER     WINTER      TOTAL
CONTINUOUS (HOURLY)     70,080     58,560    128,640
INTEGRATED (4-HR)       20,160     11,100     31,260
INTEGRATED (24-HR)       9,000      4,050     13,050
INTEGRATED (WEEKLY)        816        680      1,496
INTEGRATED (MONTHLY)       120        100        220
TOTAL                  100,176     74,490    174,666

*ASSUMES OPERATION FOR 11 MONTHS/YEAR.
227
-------
Figure 2. LACS Data Flow. [Flow diagram showing continuous sampler output (strip charts, digitizer printout, digitizer data QC checks) and integrated sampler output (samples/data cards, sample analyses printout, laboratory data QC checks) each yielding prevalidated data, with strip charts and data cards sent to RTP for data processing and final data validation.]
228
-------
Pre-validation by the contractor is performed in two areas - electronic
digitization of the strip charts and compilation of the analysis data in
the laboratory. A portion of the data generated by the electronic
digitizer are checked against manually read strip charts to verify
scaling and digitizer performance. At present 5% of the data are spot
checked in this procedure. The laboratory analysis results are compared
on the contractor's computer listing against the data cards manually
completed during the analyses. All data (100%) generated in the laboratory
are checked in this procedure because of the importance of single integrated
values. We do not at present require the contractor to keep records of
the amount of data corrected during prevalidation.
Final data validation is performed at RTP following the general
procedure in Figure 3. This step in the validation is concerned primarily
with data transfer errors, but also examines data that are not consistent
(outliers) with the rest of the data base. In general, all of the values
in approximately the highest and lowest 1.0 percentile are verified, with
a check made at random of approximately 5% of the remaining data. These
validation levels were initially selected somewhat arbitrarily by the
project officer as a compromise between data quality and the amount of
resources required for the validation.
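In outline, the selection rule amounts to verifying the ranked extremes and
spot-checking a random fraction of the remainder; a minimal sketch (in
Python, for illustration only, and not part of the LACS processing system)
is:

    import random

    def select_for_verification(values, extreme_pct=0.01, spot_rate=0.05):
        """Flag the highest and lowest 1.0 percentile of values for
        mandatory verification, plus a 5% random spot check of the rest."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        k = max(1, round(extreme_pct * len(values)))
        extremes = set(order[:k] + order[-k:])      # lowest and highest 1%
        rest = [i for i in range(len(values)) if i not in extremes]
        spot = set(random.sample(rest, round(spot_rate * len(rest))))
        return extremes, spot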
The final data validation procedures are based upon the output
formats used to list the individual data values. The three formats are:
(1) an hourly listing for continuous data such as CO and NO, (2) an
integrated data listing for samples averaged over 4-hour or 24-hour
periods, and (3) a summary listing comparing simultaneously collected
upwind and downwind data for freeway contribution. The general instructions
given to the data clerks are shown in Figure 4. A sample printout of
hourly data is shown in Figure 5, followed by the outlier limits in
Table 2 used in validating the hourly data. A sample of a 24-hour
integrated data printout is shown in Figure 6 with its associated validation
limits given in Table 3. A study is presently being made of the frequency
distributions of the LACS data to reassess the validation limits listed
in Tables 2 and 3. The starred (*) values on the printouts are values
determined to be outside ±3 standard deviations of the monthly means.
A sample of the summary format is shown in Figure 7.
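The starring rule itself is a simple test against the monthly statistics; a
minimal sketch, assuming the monthly mean and standard deviation are computed
from the same listing (Python, for illustration only):

    from statistics import mean, stdev

    def star_values(month_values):
        """Mark values outside +/-3 standard deviations of the monthly
        mean, mirroring the starred (*) entries on the printouts."""
        m, s = mean(month_values), stdev(month_values)
        return [abs(v - m) > 3 * s for v in month_values]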
For possible future validation requirements, portions of integrated
samples are stored at the contractor's laboratory, and the strip charts,
data cards, and final validated printouts are stored by EPA at RTP.
229
-------
Figure 3. Final data validation. [Flowchart: raw data (strip charts, data
cards) and prevalidated data from the contractor pass through data processing
to hourly, 24-hr, and summary printouts; final data validation and
verification of outliers yield the final validated printouts.]
230
-------
Figure 4. LACS PRINTOUT VALIDATION INSTRUCTIONS
GENERAL
(1) VERIFY THAT BLANK SPACES ON PRINTOUT MEAN THAT NO DATA EXISTS.
(2) VERIFY THAT ALL ZERO VALUES (0.0) ARE REAL.
CONTINUOUS
(1) CHECK ALL HOURLY PRINTOUT VALUES THAT EXCEED THE OUTLIER
LIMITS AGAINST THE STRIP CHART.
(2) SPOT CHECK 5 RANDOM HOURLY VALUES ON EACH STRIP CHART (ONE
WEEK/CHART) OTHER THAN THE MAXIMUM VALUES.
INTEGRATED
(1) CHECK ALL 4-HOUR AND 24-HOUR PRINTOUT VALUES THAT EXCEED THE
OUTLIER LIMITS AGAINST THE SAROAD CARDS.
(2) SPOT CHECK 2 RANDOM VALUES FOR EACH POLLUTANT AND TIME
INTERVAL PER MONTH.
(3) CHECK ALL STARRED VALUES ON THE SUMMARY PRINTOUT IN COLUMNS
A, B, C, D, AND (C-A). IF ONLY (C-A) IS STARRED, IN ADDITION CHECK A, B,
C, AND D.
231
-------
Figure 5. Sample printout of hourly data. [Computer printout not
reproducible from this copy.]
232
-------
Table 2. LACS CONTINUOUS SAMPLER OUTLIER LIMITS
(ppm)

                           SITE 008    ALL OTHERS
CO (CARBON MONOXIDE)         25.0         15.0
NO (NITRIC OXIDE)             -            0.5
NO2 (NITROGEN DIOXIDE)        -            0.3
O3 (OZONE)                    -            0.3
TS (TOTAL SULFUR)             -            0.05
WS (WIND SPEED)               -           15(a)

(a) MILES/HOUR
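In processing terms, Table 2 is a site-dependent threshold lookup; a sketch
(Python, with the limits transcribed from the table; the dictionary layout is
illustrative and not the actual LACS software) is:

    # Outlier limits in ppm (wind speed in miles/hour), from Table 2.
    LIMITS = {
        "CO": {"008": 25.0, "default": 15.0},
        "NO": {"default": 0.5},  "NO2": {"default": 0.3},
        "O3": {"default": 0.3},  "TS":  {"default": 0.05},
        "WS": {"default": 15.0},
    }

    def needs_strip_chart_check(pollutant, site, hourly_value):
        """True if the hourly value exceeds its outlier limit."""
        by_site = LIMITS[pollutant]
        return hourly_value > by_site.get(site, by_site["default"])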
233
-------
Figure 6. Sample printout of 24-hour integrated data. [Computer printout
not reproducible from this copy.]
234
-------
Table 3. LACS INTEGRATED SAMPLER OUTLIER LIMITS

                                24 HR     4 HR
TSP (SUSPENDED PARTICULATES)     200       300
NO3 (NITRATE)                     30        30
SO4 (SULFATE)                     30        50
NH4 (AMMONIUM)                     3.0       3.0
Pb (LEAD)                          8.0      12.0
SO2 (SULFUR DIOXIDE)              50        -
235
-------
Figure 7. Sample summary printout comparing simultaneously collected upwind
and downwind data. [Computer printout not reproducible from this copy.]
236
-------
VALIDATION TECHNIQUES USED IN CONTINUOUS
AIR MONITORING
by
Marvin B. Hertz
Health Effects Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
237
-------
VALIDATION TECHNIQUES USED IN CONTINUOUS
AIR MONITORING
M.B. Hertz
The Community Health Air Monitoring Program (CHAMP) is a network of air
monitoring stations used to acquire reliable air quality data for use in
epidemiologic health effects studies.
The CHAMP network has remote air monitoring stations located in each of
the selected health study communities across the country. The focal point
of the CHAMP network is the central computer facility located at the Environ-
mental Protection Agency, Research Triangle Park, North Carolina. A mini-
computer at each of the remote stations controls and acquires data and asso-
ciated system status information from aerometric and meteorologic instrumen-
tation and transmits the data by phone lines to the central computer facility.
The central controller for the CHAMP network is a dual processor system with
a full complement of input, storage, and display peripherals. One minicom-
puter was selected to perform the tasks associated with the management of the
large data base to be generated by the network. The telecommunications and
real-time processing tasks are handled by the other processor.
Two fundamental system objectives were: (1) to provide machine valid-
ation of the data, and (2) to develop a management information system for use
in the quality assurance, field logistics, and field maintenance tasks asso-
ciated with system operations.
Remote Data Acquisition System
Basically, the minicomputer in the remote station serves as an interface
between the pollutant analyzers and associated system, magnetic tape data
storage, the remote field service operator, and the telecommunications net-
work. The data generated and recorded at the remotes and transmitted to
central include not only the actual meteorologic and pollutant sensor re-
sponses, but also associated analog signals and digital status signals.
238
-------
These signals supply information about the performance and status of each in-
strument. For example, if an instrument is switched from an ambient sampling
mode to the calibration mode, a status bit is recorded which reflects this
change.
Telecommunication
Data are retrieved at the request of the central computer system from
each of the remote stations via a dial-up phone line at two-hour intervals.
The central and remote computers converse via a voice-grade telecommunications
system consisting of modems operating in a full-duplex mode at the rate of
1200 baud from remote to central and 150 baud in the reverse direction.
Polling is under the complete control of the central controller. A file is
maintained on a disk at central which contains the phone number of each sta-
tion in the format required by the calling software system. An alterable
polling queue is also disk resident. A rigid protocol has been established
to guarantee accurate transmission and retrieval of data. Central makes sev-
eral tries to establish contact with a remote station before abandoning the
attempt and placing the station at the bottom of the polling queue. A hard-
ware carrier detect protocol establishes the communications link. Each frame
is checked for parity and framing errors by the modem controller. Checksums
are computed for each 512 frame record and compared by the computer. An
acknowledge character is exchanged indicating correct receipt of the record.
Should any of the tests fail, several transmission retries are made. Commu-
nications are terminated by receipt of a character from the remote system
indicating the end of data or by failure of the remote to transmit in the
required period of time.
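The record-level integrity check can be sketched as follows (illustrative
Python, not the actual CHAMP code; the checksum algorithm and retry count are
not specified in the text, so a 16-bit byte sum and three retries are
assumed, and link stands for a hypothetical object wrapping the modem
interface):

    ACK, NAK = b"\x06", b"\x15"
    FRAMES_PER_RECORD = 512
    MAX_RETRIES = 3                      # "several" retries; count assumed

    def receive_record(link):
        """Read one 512-frame record, verify its checksum, acknowledge."""
        for _ in range(MAX_RETRIES):
            record = link.read(FRAMES_PER_RECORD)
            claimed = int.from_bytes(link.read(2), "big")
            if sum(record) % 65536 == claimed:   # assumed checksum form
                link.write(ACK)                  # correct receipt
                return record
            link.write(NAK)                      # ask for retransmission
        raise IOError("record failed checksum after retries")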
The Central Controller Hardware Configuration
The focal point of the CHAMP network is the central computer facility
located at the Environmental Protection Agency Complex, Research Triangle
Park, North Carolina. The central controller for the CHAMP network is a
dual processor system with a full complement of input, storage, and display
peripherals. The heavy burden on processor time placed by the telecommuni-
cations and real-time processing of the large quantities of data justified
the choice of a dual processor system. A PDP-11/40 with 40K of core was
239
-------
selected to perform the tasks associated with the management of the large
data base generated by the network. The telecommunications and real time
processing tasks are handled by a PDP-11/05 computer with 16K of core. The
two processors are interconnected by a Unibus window which takes advantage of
the unified asynchronous data path architecture of the 11 system. The window
allows each processor to address the core and peripherals on the other pro-
cessor as if it were its own. In addition, the DEC memory management option
was added to the PDP-11/40 to handle addressing above 32K in the 16-bit sys-
tem. An extensive complement of peripherals, including two 1.2M-word car-
tridge-type disks, three tape drives, an electrostatic printer-plotter, a line
printer, and a CRT display, was initially selected; the rapid retrieval require-
ment for large quantities of data necessitated the addition of a Telefile
dual spindle, quad density, removable 20 surface pack disk system capable of
storing 98M words.
The Central Controller Software Requirements
As mentioned previously, the PDP-11/05 processor is dedicated to the
system telecommunication tasks and the storing of the data simultaneously
on a magnetic tape and the Telefile disk as received. The data (Level 1)
so stored is an image of the tapes recorded at the remote station. These
data include the primary data (those data which actually represent parameters
of interest such as pollution levels), secondary data (that data required to
validate primary data or which are used only to insure proper station oper-
ation), and the status bits. For flexibility any channel at the remote sta-
tions can be selectively assigned a primary or secondary function as required;
furthermore, the number of primary and secondary channels is made arbitrary.
All of the data at the remote stations, whether primary or secondary data, is
assigned a remote station data slot (RSDS). The complement of instruments
and the RSDS number corresponding to a given instrument may be different in
each station. It is, therefore, necessary to append a "map" which gives the
correspondence between instrument and RSDS number at the front of each set of
station data. At the central each parameter is assigned a mnemonic (2 to 4
letters) which describes the parameter (NOS, O3, TOUT, etc.). The map,
therefore, must contain the mnemonic and the link between the mnemonic and the
corresponding RSDS. As the complement of instruments in a station changes,
240
-------
a new map will be created and appended to future data from that station. All
operations at Central will, therefore, refer only to the mnemonic names and
not to any "channel," "data slot," or other number.
The 11/40 processor is devoted to data validation tasks and tasks asso-
ciated with the management information system requirements for quality assur-
ance, field maintenance, and field logistics.
CHAMP Software Features
As mentioned previously, fundamental system objectives were:
1. To develop and implement a computer-based management information
system for use in system quality assurance, field logistics, and field
maintenance tasks.
2. To provide for machine (computer)-validation of the data.
The current CHAMP software, which will now be described, represents
the composite of original system programs plus those that were subsequently
developed in response to needs recognized after the system became operational
and to inadequacies in the original software.
Current CHAMP Central Software
1. FILMAP - This program creates the air quality data base station map
files. The station map files contain station location information,
instrument complement by station, status bit configuration indicating
hardware failure, calibration data, and correct operating ranges of
the hardware. In addition, the map files identify validation criteria
such as secondary parameter limits, dependencies between primary para-
meters, filtering and interpolation techniques to be applied, and the
format of the final data output.
2. FILSET - This program sorts polled and station data (mailed) by
type and writes the data in the correct format into the appropriate
files in the data base. Six types of data are sorted. These are
(1) primary parameter data, (2) calibration constants, (3) status
words, (4) journal entries, (5) secondary parameter data, and
(6) calibration data. Table 1 presents a sample of secondary data.
3. TIMSTN - This program processes the remote station data tapes and
checks for remote station data time anomalies. The program enables
241
-------
the operator to edit the reported times to resolve the time jumps.
The edited station data is recorded on magnetic tape.
4. PDAILY - This program summarizes and produces a printed report of
daily station performance. PDAILY invalidates primary parameter
data in cases where bits associated with hardware failures are set.
Calibration data and data which are collected on the wrong instrument
range are invalidated. The number of invalid five-minute averages for
each primary parameter for each hour is tallied, as well as the number
of times each status bit is set for each hour. The amount of data
found invalid, valid, missing, or in calibration mode for each primary
parameter is summarized over the day. The journal entries, which are
operator comments entered at the remote stations, are also listed.
5. PSUMRY - This program summarizes daily station performance and produces
printed summaries of station performance. The performance summary
contains the following:
1. Percentage of valid data by parameter by day.
2. Percentage of valid, missing, and calibration data by day.
3. Percentage of valid, missing, and calibration data over the
days shown on the summary.
4. Logs of data processing progress.
5. Primary and secondary parameter calibration occurrences, and
occurrence of control chart samples which exceed upper or
lower control chart limits.
6. PSCHRT - This program samples the air quality data base secondary
parameters and generates control chart files from the sampled
secondary parameter file data.
7. REVMAP - This program allows manual editing of the data by processing
validation actions entered on punched cards in the standard "Review
Change Request" format.
8. PCHEMS - This program generates the calibration, performs pre-
established validation tests, and produces a printed report summarizing
chemical analysis data (i.e., hi vol data, bubbler data).
242
-------
9. PPCHRT - This program samples the air quality data base primary
calibration parameters (A and B constants) and generates control
chart files for these data.
10. PCALIB - This program generates the calibration coefficients used
to convert the air quality data collected as a voltage by the remote
station into a concentration. The calibration constants are calculated
from the raw calibration mode data recorded at the remote station on
tape.
11. FAUT - This program performs automatic calibration for selected
equipment under control of the Central Computer and/or the remote
station operator.
12. FILCAN - This program sorts the chemical analysis data by station
by date and creates chemical analysis data files.
13. PLOTCC - This program takes input from the control chart files and
plots control charts for the primary parameter zeroes, spans and B
coefficients and for the secondary parameter ranges and values.
14. FLMRG - This program sorts station data (mailed) by type and writes
the data in the correct format into the appropriate file in the data
base; these files are then compared with those created by the FILSET
program (polled data) to fill the gaps in the data base.
243
-------
Table 1. SAMPLE OF SECONDARY PARAMETER DATA, STATION DAYS 77222 TO 77237.
[Tabulated date, time, and secondary parameter values not recoverable from
this copy.]
246
-------
USE OF PRECISION AND ACCURACY ESTIMATES
FOR VALIDATION OF DATA
by
David T. Mage
Environmental Monitoring Systems Laboratory
U. S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
247
-------
USE OF PRECISION AND ACCURACY ESTIMATES
FOR VALIDATION OF DATA
David T. Mage
INTRODUCTION
A basic need in the presentation of a data set is a description of the
precision and accuracy associated with the measurements. The definition of
valid data for a given study is then determined by the precision and accuracy
claimed for the data set. For example, a statement may be made that the data
have a precision of 10%. Assuming that the errors are independent and normal-
ly distributed, one expects that 68% of the measurements are within ±10% and
95% of the measurements are within ±20% of the true value, which is unknown.
By this definition the 5% of the data in error by more than ±20% are not in-
valid, since one expects from probability alone that runs of positive or runs
of negative errors can occur. It is not productive and probably impossible
to examine each datum point and determine an individual error associated with
it. The approach being taken for the Community Health Air Monitoring Program
(CHAMP) data base is to determine the precision and accuracy of the entire
data set and not invalidate data except for known cause such as instrument
failure. The alternative of eliminating data suspected of higher uncertainty
in order to improve the precision of the remaining data set is counter pro-
ductive in the context of a health study.
In a health study, the aerometric data are paired with health data.
When aerometric data are invalidated, the associated health statistics are
also removed from the analysis. Because the occurrence of the health indica-
tor, such as an asthmatic attack, is relatively infrequent, the loss of the
information significantly reduces the validity of the overall study. For
this reason an approach which provides a large aerometric data base of moder-
ate precision and accuracy is preferable to an approach which provides a re-
duced data base of higher precision and accuracy. The following sections
248
-------
describe the system of data validation currently being used by CHAMP.
ERROR ANALYSIS
When an aerometric analyzer is continually monitoring a pollutant, several
sources of potential error can influence the measurement. The ten sources of
error given in Table 1 are discussed below.
1. Span Gas Analysis—This error covers the uncertainty in the process
of preparing a known concentration of pollutant to provide an upscale reading.
This process may contain errors associated with preparation of a primary
standard and subsequent analysis of a secondary or transfer standard to be
used in the field. When dilution is necessary to attain the desired concen-
tration, the errors in the flow measurements of standard and diluent air also
contribute to the overall error. It is the belief of the author that this
uncertainty is on the order of 5% (σ1 = 0.05).
2. Zero Gas Impurity—The gas used to zero the instrument may contain
some impurity. The presence of 0.1 ppm as opposed to 0.0 ppm represents po-
tential error of 10% at the 1 ppm level and 1% error at the 10 ppm level.
For the purpose of this analysis a low error of 2% (σ2 = 0.02) is chosen
since greatest concern is at or above the National Ambient Air Quality Stan-
dard (NAAQS) where this error is minimized.
3. Instrument Drift, Electronic—When the sample and reagent flows to
the instrument are held constant, a constant input concentration produces a
signal which fluctuates about a mean value. This "noise" in the output sig-
nal may be caused by electronic noise in the photomultiplier tube and other
electrical components due to voltage, frequency and temperature fluctuations.
The estimate of error for this effect is 3% (σ3 = 0.03).
4. Instrument Drift, Flow Variations—After an instrument is calibrat-
ed, and with input concentration held constant, an increase or decrease of
the sample flow rate will tend to cause the instrument to drift away from the
equilibrium point. When reagent flows are also being mixed in a reaction
chamber, such as ethylene flow in a chemiluminescent ozone analyzer, the
fluctuations in reagent flows also influence the output signal. These flows
may be influenced by fluctuations in atmospheric pressure and vacuum in the
flow system. The overall effect of these variations of flow from the
249
-------
Table 1. SOURCES OF ERROR

 1. SPAN GAS ANALYSIS                    σ1  = 0.05
 2. ZERO GAS IMPURITY                    σ2  = 0.02
 3. INSTRUMENT DRIFT, ELECTRONIC         σ3  = 0.03
 4. INSTRUMENT DRIFT, FLOW VARIATIONS    σ4  = 0.06
 5. OPERATOR IMPRECISION                 σ5  = 0.04
 6. NON-LINEARITIES OF SCALE             σ6  = 0.02
 7. RESPONSE TIME                        σ7  = 0.02
 8. INTERFERENCES                        σ8  = 0.02
 9. PRESSURE-TEMPERATURE CORRECTION      σ9  = 0.01
10. DATA PROCESSING AND ROUND OFF        σ10 = 0.01
250
-------
calibration condition is taken to be 6% (σ4 = 0.06).
5. Operator Imprecision—The station operator in performing calibra-
tions must adjust potentiometers and rotometers and perhaps read the mean of
a fluctuating signal. A different operator repeating these procedures will
arrive at a slightly different result for each of them. The resulting uncer-
tainty, due to the human element, is estimated at 4% (σ5 = 0.04).
6. Non-Linearities of Scale—A linear relation is usually assumed be-
tween voltage output and pollutant input. Slight non-linearities of scale
are usually masked by the uncertainties in the measurements themselves. Where
the scale appears to be linear, an error of 2% in the linearity is almost in-
distinguishable; consequently, this error of 2% is treated as a possibility
which cannot be ignored (σ6 = 0.02).
7. Response Time—Due to the finite response time of the instruments,
a rising signal will lag and tend to be underestimated and a falling signal
will lag and tend to be overestimated. These errors are felt to provide an
error on the order of 2% in the measurements (σ7 = 0.02).
8. Interferences—Variations in the atmospheric composition from the
composition of the gas used to calibrate the instrument can cause errors.
Common gases, besides the common pollutants, which fluctuate in the atmosphere
are CO2 and H2O. These fluctuations can cause variations in output signal on
the order of 2% (σ8 = 0.02).
9. Pressure Temperature Correction—When data are corrected to stan-
dard conditions (25°C and 760 mm Hg), uncertainties in measured pressure and
temperature can cause a slight error on the order of 1% (σ9 = 0.01).
10. Data Processing and Round Off—In analog-digital conversions and
vice versa, an error is created. When data are output, a round-off error
also occurs. These errors are quite small and probably less than 1%
(σ10 = 0.01).
The net result of all of these effects, assuming that the variances are
additive, is an overall uncertainty of ±10%. This is interpreted as follows.
If the atmospheres at 10 stations were all 10 ppm of some arbitrary pollutant,
one would expect the mean of all 10 station measurements to be 10 ppm; 7 of
them would be between 9 and 11 ppm, 2 of them would be in the range 8-9 ppm
or 11-12 ppm, and 1 would be over 12 ppm or less than 8 ppm.
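Under the additivity-of-variances assumption, the overall figure is the root
sum of squares of the ten components, which can be verified directly (an
illustrative check in Python):

    # The ten fractional error components from Table 1 (items 1-10 above).
    sigmas = [0.05, 0.02, 0.03, 0.06, 0.04, 0.02, 0.02, 0.02, 0.01, 0.01]
    overall = sum(s * s for s in sigmas) ** 0.5
    print(round(overall, 3))    # 0.102, i.e., approximately 10%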
251
-------
CHAMP VALIDATION CRITERIA
The CHAMP aerometric system is unique in that it measures and records
the secondary flow parameters within the instrument simultaneously with the
measurement of the primary parameter (pollutant concentration).
The fluctuations in the secondary parameters produce uncertainties in
the measurements as described previously. When the fluctuations exceed their
expected range, or limit of normal operation, two possible causes must be in-
vestigated. The first is pure randomness which means that the data are valid.
The second cause may be a component failure, such as a clogged capillary tube,
and this produces a bias in the data. Besides signalling the need to repair
the instrument, the analyst has a cause for invalidating that portion of the
data set where the instrument was operating out of the normal range.
As an example of the usage of secondary parameter data for determining
precision, accuracy, and validity, the effect of ethylene flow within the
Bendix chemiluminescent ozone analyzer is chosen for discussion. Figure 1 is
a typical plot of the ethylene flow variations within a CHAMP station. On the
day shown the ethylene flow (FETH) varied from 25 cc/min to 26.3 cc/min.
Figure 2 is a histogram of the fluctuations of FETH about the previous cali-
bration setting of FETH during the 11-day period August 15 - August 25, 1977.
The histogram has a mean of -0.03 cc/min and a standard deviation of 0.29 cc/
min. The mean close to zero confirms that the fluctuations are not producing
an appreciable bias, as expected. The standard deviation can be related to
a standard error by examining how the output of an ozone analyzer varies with
FETH.
Figure 3 shows the instrumental bias (A ppm 03) as a function of AFETH
at various ozone levels. In this case the instruments were on the 0.5 ppm
scale, and were calibrated with a sample flow of 1000 cc/min and a FETH of
25 cc/min (bias = 0 by definition). When these data are normalized by di-
viding bias by original ppm value, the percentage changes at all four concen-
trations follow a common curve, Figure 4. In this case a linear fit to these
data is not justifiable. At 25 cc/min the slope of the curve is +1.6% per
cc/min. The standard deviation of 0.3 cc/min, therefore, corresponds to a
standard error in ozone of 0.5% as shown on Table 2. The biases of the meas-
urements to changes in vacuum (VAC), sample flow of ozone (SFO3), sample
252
-------
Figure 1. Typical plot of ethylene flow (FETH) variations within a CHAMP
station over one day. [Plot not reproducible from this copy.]
253
-------
Figure 2. Histogram of fluctuations of FETH about the previous calibration
setting, August 15 - August 25, 1977 (mean -0.03 cc/min, standard deviation
0.29 cc/min). [Plot not reproducible from this copy.]
254
-------
Figure 3. Instrumental bias (Δ ppm O3) as a function of ΔFETH at various
ozone levels. [Plot not reproducible from this copy.]
255
-------
"SE
«3-
LU
C£
• • • . H M
BUB . . M - -
N - - W 0 I 1 i
(aBueip %) SVI9
256
B W B W B
N PI PI PI J
I i i i I
-------
Table 2. STANDARD ERRORS IN OZONE RESULTING FROM SECONDARY PARAMETER
VARIATIONS (FETH, VAC, SFO3, SFNO, FO3). [Tabulated values not fully
recoverable from this copy; the FETH entry, derived in the text, is a 0.5%
standard error.]
257
-------
flow of NO (SFNO), and flow of air through the ozone generator (FO3) are
shown on Figures 5-8. These biases were used to compute the other standard
errors given in Table 2.
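The conversion from a secondary-parameter standard deviation to a standard
error in the pollutant measurement is a multiplication by the local slope of
the normalized bias curve; for FETH (a sketch in Python, using values from
the text):

    slope = 1.6      # slope of the Figure 4 curve at 25 cc/min, % per cc/min
    sd_feth = 0.3    # standard deviation of FETH, cc/min
    print(round(slope * sd_feth, 2))   # 0.48, the 0.5% entry in Table 2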
DISCUSSION
The assumption developed in the preceding section is that the measure-
ment errors from instrument-to-instrument are independent and normally dis-
tributed. When the same standard is used for repeated calibration of an in-
strument to provide a time series of measurements, the error in the analysis
provides a bias. One of the functions of an audit, using an independent set
of standards and observers, is to disclose whether a significant bias exists.
In September of 1977, an audit of 46 instruments located at seven CHAMP sta-
tions was performed by the contract operator of the stations. The audit pro-
cedure was not truly independent since the transfer standards in station use
were originally compared to the primary standard used in the audit. Of the
46 instruments, seven of one type showed a consistent bias indicating a prob-
lem with the audit procedure. The remaining 39 instruments showed an ex-
pected positive and negative scatter about the audit values.
In order to test the hypothesis that these 39 results are normally dis-
tributed, the results are plotted on normal probability paper at frequencies
corresponding to their rank, lowest to highest, divided by the total (39)
plus one. Each datum point represents the average of four span results at
approximately 20%, 40%, 50%, and 80% of full scale. These four values of
deviation are not independent since the same operator used the same standards
for each one. However, the set of 39 averages are mutually independent. The
mean of the deviation, μ, is -1.55% and the standard deviation, σ, is 5.5%.
The maximum difference between the frequencies predicted by the normal dis-
tribution, N(μ, σ), and the data points is 6%. This corresponds to a
Kolmogorov-Smirnov statistic of 0.06 which indicates that the hypothesis of
normality for the distribution cannot be rejected at the 5% level. If an
independent auditor performed the audit with independent standards, a stan-
dard deviation larger than 5.5% would be expected, probably on the order of
10 to 15%. The results of independent CHAMP audits are being analyzed in the
manner described above and the results of the analyses will be reported with
258
-------
Figure 5. Instrumental bias (percent change) as a function of vacuum (VAC).
[Plot not reproducible from this copy.]
259
-------
Figure 6. Instrumental bias (percent change) as a function of sample flow of
ozone (SFO3). [Plot not reproducible from this copy.]
-------
Figure 7. Instrumental bias (percent change) as a function of sample flow of
NO (SFNO). [Plot not reproducible from this copy.]
262
-------
Figure 8. Instrumental bias (percent change) as a function of flow of air
through the ozone generator (FO3). [Plot not reproducible from this copy.]
the data elsewhere.
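The same normality check can be reproduced with standard tools; a sketch
(Python with SciPy, assuming the 39 average span deviations, in percent, are
held in a list named devs):

    from scipy import stats

    def test_normality(devs, mu=-1.55, sigma=5.5):
        """Kolmogorov-Smirnov test of the deviations against N(mu, sigma)."""
        d, p = stats.kstest(devs, "norm", args=(mu, sigma))
        return d, p   # a D statistic near 0.06 does not reject normality
                      # at the 5% level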
In conclusion, the approach being taken for the CHAMP data validation
procedure is to accept data from the instruments when they are known to be
operating properly and make a probability statement for the individual data
set as a whole. For example, a pollutant data set for one station may have a
standard deviation of 10% but the standard deviation for another pollutant
at the same station may be 15%.
These different uncertainties allow the statistical analyst to weight
the data higher when the expected error is low and adjust for the relative
uncertainties in making correlations between air pollution and health.
264
-------
VALIDATION SYSTEM USED IN THE ST. LOUIS
REGIONAL AIR MONITORING STUDY (RAMS)
by
Robert B. Jurgens
Environmental Sciences Research Laboratory
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
265
-------
VALIDATION SYSTEM USED IN THE ST. LOUIS
REGIONAL AIR MONITORING STUDY (RAMS)*
R.B. Jurgens†
Abstract
This paper describes the RAMS measurement system, screening categories
of data validation, the RAMS automated validation system, the current
status of special validation studies - visual validation and successive
differences, and updates to the RAMS data base. The conclusion presents
a generalized measurement system including quality control, data validation
and feedback.
*Portions of this paper have been discussed in detail elsewhere (1) and
will only be mentioned here for completeness.
†On assignment from the National Oceanic & Atmospheric Administration,
U.S. Department of Commerce, Rockville, Maryland 20852.
266
-------
Introduction
The Regional Air Monitoring System (RAMS) is the ground-based
aerometric network of the St. Louis Regional Air Pollution Study (RAPS).
See references (2-3) for a discussion of the objectives, scope and
accomplishments of RAPS.
The location of the 25 RAMS stations within the St. Louis metropolitan
area is shown in Figure 1. The air quality, meteorological and solar
radiation measurements within the RAMS network are listed in Table 1.
Note that not all measurements are made at each station. Measurements
began being recorded in mid-summer 1974 and continued through June 1977.
From April to June 1977 only stations 104, 106, 107, 111, 115, 121 and
125 were in operation. The approximate volume of data recorded during
the network operation is 500 million values. Figures 2 and 3 show the
data flow through the RAMS stations and through the central facility at
Creve Coeur. Rockwell International Corporation was the prime contractor
for the installation, operation, and maintenance of the RAMS network. A
detailed description of RAMS can be found in references (4,5).
267
-------
Quality Control & Data Screening
Data validity results from: 1) a quality control program designed
to provide accurate data as it is measured and 2) a screening process to
detect spurious values which exist despite the quality control process.
The quality control program for the RAMS network is reviewed in (1) and
(5). Detailed definition and discussion of the elements of quality
control for air pollution measurement systems have been published in (6).
The specific quality control activities relating to calibration, zero/span
checks, status and analog checks associated with the gas analyzers are
quite similar to those of the CHAMP program which are discussed in the
preceding paper by Dr. Marvin Hertz.
Based on the experience of managing the data validation activities
of RAMS we have developed a summary of screening techniques which is
applicable for any continuous automated monitoring network (air pollution,
water quality, etc.). These tests (Table 2) have been divided into
three categories: 1) Operational, 2) Continuity and Relational and 3) A
Posteriori. Discussion of screening tests within each category and
their application in RAMS follows.
The first category, "Operational," contains checks which document
the network instrument configuration and operating mode of the recording
station. These checks, which in RAMS are part of the quality control
program, include checks for station instrumentation, missing data,
system analog and status sense bits, and instrument calibration mode.
In addition to documenting system performance, the checks are used to
flag data in the RAMS archive. As designed, the RAMS data bank contains
space for every potential measurement. For example, if an instrument is
in calibration mode, the corresponding data slots will contain a "calibration"
flag.
268
-------
The second category, "Continuity and Relational," contains temporal
and spatial continuity checks and relational checks between parameters
which are based on physical and instrumental considerations or on statistical
patterns of the data. A natural subdivision can be made between intra-
station checks, checks which apply only to data from one station, and
interstation checks, those which test the measured parameters for uniformity
across the network.
Intrastation checks include tests for calibration drift (gas
analyzers in RAMS), lower detectable limits, gross limits, aggregate
frequency distributions, relationships, and temporal continuity.
The drift calculations, which are part of the quality control
program, are discussed in the above references. Many measurement instruments
have a threshold, or lower detection limit (LDL), below which their
output is obscured by instrument noise. A standard practice adopted in
RAMS is to replace values in this range (0.0 ± LDL) with +1/2 LDL. The
LDL's for the gas analyzers and the wind speed sensor are the lower
instrument limits listed in Table 3.
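The LDL substitution rule is a one-line transformation (an illustrative
sketch, in Python):

    def apply_ldl(value, ldl):
        """Replace readings within the noise band (0.0 +/- LDL) by +1/2 LDL."""
        return 0.5 * ldl if -ldl <= value <= ldl else value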
Gross limits, which in RAMS are used to screen impossible values,
are based on the ranges of the recording instruments. These, together
with the parametric relationships which check for internal consistency
between values, are listed in Table 3. Setting limits for relationship
tests requires a working knowledge of noise levels of the individual
instruments. The relationships used are based on meteorology, atmospheric
chemistry, or on the principle of chemical mass balance. For example,
at a station for any given minute, TS cannot be less than S02 + H2S with
allowances for noise limits of the instruments.
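A relational check of this kind reduces to an inequality with a noise
allowance; for the sulfur balance example (a sketch in Python; the allowance
value is illustrative and not taken from the RAMS specification):

    NOISE_ALLOWANCE = 0.005   # ppm; illustrative value only

    def sulfur_balance_ok(ts, so2, h2s):
        """TS cannot be less than SO2 + H2S, within instrument noise."""
        return ts >= so2 + h2s - NOISE_ALLOWANCE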
269
-------
A refinement of the gross limit checks can be made using aggregate
frequency distributions. With a knowledge of the underlying distribution,
statistical limits can be found which have narrower bounds than the
gross limits and which represent measurement levels that are rarely
exceeded. A method for fitting a parametric probability model to the
underlying distribution has been developed by Dr. Wayne Ott of EPA's
Office of Research and Development (7). B.E. Suta and G.V. Lucha (8)
have extended Dr. Ott's program to estimate parameters, perform goodness-
of-fit tests, and calculate quality control limits for the normal distribution,
2- and 3-parameter lognormal distribution, the gamma distribution, and
the Weibull distribution. These programs have been implemented on the
OSI computer in Washington and tested on water quality data from STORET.
This technique has not been implemented within RAMS.
Also, under intrastation checks are specific tests which examine
the temporal continuity of the data as output from each sensor. It is
useful to consider, in general, the types of atypical or erratic responses
that can occur from sensors and data acquisition systems. Figure 4
illustrates graphically examples of such behavior, all of which have
occurred to some extent within RAMS. Physical causes for these reactions
include sudden discrete changes in component operating characteristics,
component failure, noise, telecommunication errors and outages, and
errors in software associated with the data acquisition system or data
processing. For example, it was recognized early in the RAMS program
that a constant voltage output from a sensor indicated mechanical or
electrical failures in the sensor instrumentation. One of the first
screens that was implemented was to check for 10 minutes of constant
270
-------
output from each sensor. Barometric pressure is not among the parameters
tested since it can remain constant (to the number of digits recorded)
for periods much longer than 10 minutes. The test was modified for
other parameters which reach a low constant background level during
night-time hours. SO^ was generally at zero and no persistency check
was applied against it.
A technique which can detect any sudden jump in the response of an
instrument, whether it is from an individual outlier, step function or
spike, is the comparison of successive differences of a measurement with
predetermined control limits. These limits are determined for each
parameter from the distribution of successive differences for that
parameter. These differences will be approximately normally distributed
with mean zero (and computed variance) when taken over a sufficiently
long time series of measurements.
The type of "jump" can easily be identified. A single outlier will
have a large successive difference followed by another about the same
magnitude but of opposite sign. A step function will not have a return,
and a spike will have a succession of large successive differences of
one sign followed by those of opposite sign.
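A sketch of the flagging step (Python; the control limit would come from the
per-parameter distribution of successive differences described above):

    def flag_jumps(series, limit):
        """Return indices whose successive difference exceeds the limit.
        One large difference followed by one of opposite sign marks an
        outlier; no return marks a step function; a run of one sign
        followed by a run of the other marks a spike."""
        diffs = [b - a for a, b in zip(series, series[1:])]
        return [i + 1 for i, d in enumerate(diffs) if abs(d) > limit]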
Though not implemented in Rockwell's data processing and validation
program, TAPGEN (partly because of an expected large increase in processing
time on the PDP 11/40), or in EPA's data archiving programs (version
6.4), strong consideration has been given to this technique for potential
applications in data screening and quality control checks. In 1976, the
Data Management and Systems Analysis Section (DMSAS) awarded a contract
to RTI to study validation procedures for the RAPS data bank (9) in
271
-------
which a major area of investigations was the use of minute successive
differences. Use of successive differences in an ongoing special validation
study will be discussed in a later section.
A number of interstation checks on meteorological parameters are
implemented in Rockwell's TAPGEN program. However, they have only been
used for quality control of the RAMS system and not for validation
(flagging) of RAMS data. These tests, which are shown in Table 4, are
performed on hourly average data.
Another interstation check, the Dixon ratio test has been examined
to determine its applicability for screening RAMS network outliers
(1,9). Dr. Ty Hartwell, RTI, in an earlier session presented some
results he has obtained using the Dixon ratio test on RAMS data. This
test was never implemented into the RAMS data validation system.
Referring again to Table 2, the third screening category, "A
Posteriori", was established to provide a mechanism for overriding the
automated flagging schemes which have been implemented in the instrumentation
at the remote sites and in the data screening module. From a review of
station logs and preventive maintenance records, a knowledge of unusual
events, or through visual inspection of data, it may be determined that
previously valid data should be flagged as questionable. Conversely, it
may be determined that previously invalid data should be validated by
removing existing flags. An example of when data would be invalidated
is when an instrument, such as a wind direction indicator, becomes
misaligned or uncalibrated because of some non-linear or unknown reason.
Removal of flags or revalidation can occur, for example, when the recording
instruments function properly, but either the sense bit or analog status
circuitry is known to have malfunctioned.
272
-------
RAMS Automated Validation System
The screening tests used in validating RAMS data were largely
developed and tested at RTP and then implemented in the St. Louis central
facility computer for on-site, near-real-time processing. Through
continued testing and modification the validation system evolved to its
final version - version 6.4. All data archived by previous versions
have been rearchived to this standard. Table 5 lists the causes and
flags of screening tests while Figure 5 shows a flow diagram of the
order in which the tests are applied.
Special Validation Studies
Special known problems have occurred on certain parameters from
time to time. The origin of these problems can be traced to sensor
failure, electrical transients, software bugs at RAMS stations and at
the central facility, data acquisition hardware, etc. Despite the
automated validation program, these problems have led to the archival of
erroneous data. It should be noted that these problems have only affected
a small percentage of all data - estimated to be less than 1 percent of the
total.
In an effort to locate, review and flag any remaining suspect data
(known a priori or not) several studies have been initiated within
DMSAS. Two major efforts involve a graphical review of hour average
data and a computer study of minute successive differences:
RAMS Hour Average Graphical Review. — Table 6 lists the volume of
data from the RAMS networks, the number of minute and hour plots and the
number of microfiche (24x) required for plotting all RAMS data. The
tremendously large number (70,000) of minute plots precludes a graphical
273
-------
review at this time interval. Therefore, a graphical review system
using hour average data was developed. This system combines the use of
computer graphics, interactive programs and computer files (lists)
wherever possible to reduce the manual labor associated with the various
tasks.
The steps in this study are shown in Figure 6. Computer-generated
notebooks of hour average plots are reviewed by trained personnel for
any suspect data. See Figure 7 for an example. The plots are also
reviewed by a second individual. A consolidated list of dates and times
is entered into the computer as input for automatic retrieval of
minute plots using the RAPS*GRAPHICS program. These plots from the RAMS
minute archive are reviewed by DMSAS personnel. From the original
review file, a second computer disc file (preliminary update file) containing
dates, time periods, and suggested changes and flags is prepared. This
list, with corresponding minute/hour plots, is forwarded to Rockwell for
review and investigation of cause. With the concurrence of DMSAS and
Rockwell, the final output from the graphical review process is reached:
an update file for the RAMS minute/hour archive.
Minute Successive Difference Study. — Visual inspection of hour
data will detect large discontinuities in time series plots of a measurement
or uncorrelated traces between stations. However, if a few minutes of
"spiky" data were recorded during an hour, the hour average may be
changed by only a few percent. (For example, five minutes of readings
elevated by 0.012 ppm above a 0.030 ppm ozone baseline shift the hour
average by only 0.001 ppm.) Since hour-to-hour variations in almost all
RAMS parameters can normally be much larger than a few percent, small
changes caused by errors in minute data will not be detected by visual
observation.
-------
To determine the quality of the minute archival data we have been
applying a flagging procedure based on distribution functions of minute
successive differences. This technique is based on the assumption that
minute successive differences will approximate a normal distribution
with mean zero. See Figure 8 for an example of a distribution function
of ozone data from a five-hour period.
The RTI study (9) has shown that, for a given parameter, sample
standard deviations of minute successive differences are not constant
over stations, time of day, or seasons. The functional form of the
standard deviation, which can be expressed as

    σ = σ(parameter, date, time, station),

is not known, however. Therefore, for this study, data flags have been
chosen as 4·σmax, where σmax is the largest sample standard deviation
found in the RTI study for a given parameter. These 4-sigma limits are
listed below.
RAMS Variable               4-Sigma Limit
Wind Speed (meters/sec)     ± 3.0
Temperature (°C)            ± 0.7
Ozone (ppm)                 ± 0.010
CO (ppm)                    ± 1.97
Methane (ppm)               ± 0.32
THC (ppm)                   ± 0.84
NO (ppm)                    ± 0.028
NOX (ppm)                   ± 0.035
Total Sulfur (ppm)          ± 0.022
SO2 (ppm)                   ± 0.015
275
-------
Flagged data (dates, times, station, etc.) are stored on the Univac
1110 and can be used as input in further analysis. Programs exist to
automatically print and plot suspect data. An example of minute temperature
data with a succession of outliers is shown in Figure 9. The corresponding
hour average is circled in Figure 7.
Application of the minute successive difference technique in a RAMS
data base update module will permit recalculation of hour averages which
contain significant amounts of erroneous spiky data.
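The sketch below illustrates, under stated assumptions, how such a
4-sigma successive-difference screen could be applied to a list of
archived minute values; the function name, the use of None for missing
minutes, and the sample values are illustrative, not the DMSAS code.

    # A minimal sketch of the 4-sigma successive-difference screen.
    def flag_successive_differences(minutes, limit):
        """Return indices of minute values whose successive difference
        exceeds the 4-sigma limit for the parameter."""
        flagged = []
        for i in range(1, len(minutes)):
            prev, curr = minutes[i - 1], minutes[i]
            if prev is None or curr is None:
                continue              # a missing minute breaks the chain
            if abs(curr - prev) > limit:
                flagged.append(i)
        return flagged

    # Illustrative use with the ozone limit of +/- 0.010 ppm from the table:
    ozone_minutes = [0.031, 0.032, 0.031, 0.055, 0.032, None, 0.031]
    suspect = flag_successive_differences(ozone_minutes, 0.010)
    # suspect == [3, 4]: the jump into the spike and the return to baseline

Note that both the jump into and out of a spike are flagged, which is
consistent with marking the suspect minutes for review rather than
deciding automatically which value is wrong.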
RAMS Data Base Update
The visual validation and successive difference studies are part of
a review of the RAMS data being conducted by DMSAS and Rockwell. The
results of this review process will be an update file (dates, times,
changes, flags, etc.) and separate modules for an update program. Figure
10 shows this update process, including the review studies and specifically
known problem areas. Underlying this review is the requirement that all
changes be documented and that concurrence be reached as to the probable
cause of suspect data.
Monitoring System with Quality Control and Data Screening
Figure 11 illustrates the data validation process within the framework
of a generalized monitoring system. Associated with sensor instruments
and the data acquisition system are quality control blocks which contain
those elements required for acquiring acceptable data: calibration,
system status and sense bits, preventive maintenance, training, and
operation and maintenance documentation and records. Data processing
276
-------
and screening should take place soon after data acquisition to permit
system feedback in the form of corrective maintenance, changes to control
processes, and even changes to system design.
A control data set (or sets) should be created for use in software
verification. When software changes are made, the control data set is
processed and the output compared with that of previous versions. This is
analogous to the recalibration of gas analyzers after maintenance. The
control data set used in RAMS is one day's data from all sites.
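A minimal sketch of such a software verification check follows; the file
names and the line-by-line comparison are illustrative assumptions about
how the outputs of two program versions might be compared.

    # Compare the screening output of a new software version against the
    # archived output of the previous version on the fixed control data set.
    def compare_outputs(reference_path, candidate_path):
        """Print each line that differs between two processing runs."""
        with open(reference_path) as ref, open(candidate_path) as cand:
            for n, (old, new) in enumerate(zip(ref, cand), start=1):
                if old != new:
                    print("line %d differs:" % n)
                    print("  old: %s" % old.rstrip())
                    print("  new: %s" % new.rstrip())
        # (A separate length check would be needed if the two runs can
        # produce different numbers of output lines.)

    # Illustrative use after a software change; silence means the new
    # version reproduces the previous results on the control day:
    # compare_outputs("control_day_v6.out", "control_day_v7.out")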
The effectiveness of the data review process can be greatly enhanced
by the use of graphics. Review of graphical displays of raw data permits
a rapid continuity check of individual time series and a visual correlation
of network data. Graphics naturally augments automated data validation
techniques which are necessarily based on a priori knowledge of system
performance characteristics, expected magnitude and variations in recording
levels, etc.
A monitoring system is dynamic in nature - responding to changing
hardware/software requirements and to variations in operating and maintenance
procedures. On-site, near-real-time data review and allowance for
feedback in system design can minimize the amount of lost or marginally
acceptable data.
277
-------
References
(1) Jurgens, R.B. and R.C. Rhodes, 1976: Quality Assurance and Data Vali-
dation for the Regional Air Monitoring System of the St. Louis
Regional Air Pollution Study. Proc. of the Conference on Environmental
Modeling and Simulation, EPA 600/9-76-016, 730-735.
(2) Burton, C.S. and G. M. Hidy, 1974: Regional Air Pollution Study Program
Objectives and Plans, EPA 630/3-75-009, 53 pp.
(3) Browning, R.H., 1977: (RAPS) Description and Status of the Data Measure-
ments, Quality Assurance and Data Base Management System, (unpublished),
72 pp.
(4) Meyers, R.L. and J.A. Reagan, 1975: Regional Air Monitoring System at
St. Louis, Missouri, International Conference on Environmental Sensing
and Assessment, Paper 8-2, LofC #75-37494, 4 pp.
(5) Hern, D.H. and M.H. Taterka, 1977: Regional Air Monitoring System Flow
and Procedures Manual, EPA Contract DU 68-02-2093, 177 pp.
(6) Quality Assurance Handbook for Air Pollution Measurement Systems, 1976:
Volume I, Principles, EPA 600/9-76-005, 365 pp.
(7) Ott, W.R., 1974: Selection of Probability Models for Determining Quality
Control Data Screening Range Limits, Presented at 88th Meeting of the
Association of Official Analytical Chemists, Washington, D.C., 6 pp.
(8) Suta, B.E. and G.V. Lucha, 1975: A Statistical Approach for Quality
Assurance of STORET-Stored Parameters, SRI, EPA Control No. 68-01-
2940, 8 pp.
(9) Hartwell, T. and F. Smith, 1977: Study of Two Data Validation Procedures
for the RAPS Data Bank, RTI project 43U-1291-2, EPA Contract 68-02-2407,
46 pp.
278
-------
"I
ro I
a '
a
•—I
a
c
o
en
c
o
'+-J
03
o
o
273
-------
00
LU O
CO >— i
Z h-
oo
CM ro no in
i— r— r— CM
in
CM
in
CM
in
CM
in
CM
in
CM
in
CM
in
CM
in
CM
in CM
CM i—
CM
CM
OH —I
•=> et
oo >
01
o •—
C O I
•r- O t
o
ex:
cc
o
CO
CM
O
00
LU
O
X
o
Q
o:
oo
CM
X
o
LU
O
CM
00
eg
U-
I
=3
oo
OO
co x
o o
CD O
O LU i—i
OH Z OH
O O I—
>• M HH
x O z
o
oo
LU
O
X HH
o z
CM
O
O LU
i-i Q
X i—i
O X
1-1 O
O Z
LU
a:
5
C_J
LU
OO
O
CD
u
o
Q. t— •
UJ
t— •
O
00
O
CJ
LU
OH
O
-
Q.
-------
o
QC <
LU
H- 00
II
LU O_
QC
<
I-
<
a
oo
00
00
cc
<
CO
2
O
LU
o
o
-t >
< I-
0
LU <
O LU
x o o
320
5 < <
o
_J
2
O
p
<
K
oo
00
CO LU
2 h-
— CO
>-
CO
o
in
cc
03
•o
CO
I-
2
LU
^
cc
H-
GO
,
^
j
*•
*u.
O
0
0
Mi
J
o
QC
O
LU
t-
LU
^
•
^
2
O
H-
Q
<
QC
QC
_i
O
00
>
j-
_
<
»•
•N
J
t
3
C
3
QC
<
I
id cc
< 22
a _,
281
-------
CJ
CO h-
cc a
co
LU Jrt
CC
*"
1
LU
O
oo
00
CO
_ -
« CO
0
a. J3
a co
in -J LU
CM UJ K
LU CO
CO
LU C3
H- <
= CC
m|*l>
H- co — co" cc
u5£5i =
CC CO of L! 3-
" S rf ^
I- >
< 00 2
a Q o
CO
= < H
2 a co
111
P < o <
co a x DC
oc — u. LU
•M
'o
c
01
o
CO
•H1
nj
T3
CO
CO
O)
O)
LU
CO
o:c
282
-------
TABLE 2. SCREENING CATEGORIES FOR AUTOMATED RECORDING NETWORKS
I. OPERATIONAL
NO INSTRUMENT
MISSING MEASUREMENT
STATUS
CALIBRATION
II. CONTINUITY AND RELATIONAL
A. INTRA-STATION
CALIBRATION DRIFT
LOWER DETECTABLE LIMITS
GROSS LIMITS
AGGREGATE FREQUENCY DISTRIBUTIONS
RELATIONSHIP AMONG PARAMETERS
TEMPORAL CONTINUITY
CONSTANT OUTPUT
SUCCESSIVE DIFFERENCE
B. INTER-STATION
METEOROLOGICAL NETWORK UNIFORMITY
STATISTICAL OUTLIERS
DIXON RATIO
III. A POSTERIORI
REVIEW OF STATION LOG
UNUSUAL EVENTS OR CONDITIONS
VISUAL INSPECTION OF DATA
283
-------
TABLE 3. GROSS LIMITS AND RELATIONAL CHECKS

PARAMETER             INSTRUMENTAL LIMITS          INTERPARAMETER CONDITION
                      LOWER         UPPER
Ozone                 .005 ppm      5 ppm          NO x O3 ≤ 0.01
Nitric Oxide          .005 ppm      5 ppm          NO - NOX ≤ .002   (NO)
Oxides of Nitrogen    .005 ppm      5 ppm          NO - NOX ≤ .002   (NOX)
Carbon Monoxide       .1 ppm        50 ppm
Methane               .1 ppm        50 ppm         CH4 - THC ≤ .1    (CH4)
Total Hydrocarbons    .1 ppm        50 ppm         CH4 - THC ≤ .1    (THC)
Sulfur Dioxide        .005 ppm      1 ppm          SO2 - TS ≤ .002   (SO2)
Total Sulfur          .005 ppm      1 ppm          SO2 - TS ≤ .002   (TS)
Hydrogen Sulfide      .005 ppm      1 ppm          H2S - TS ≤ .002   (H2S)
Aerosol Scatter       0.00001 m-1   0.00099 m-1
Wind Speed            .27 m/s       22.2 m/s
Wind Direction        0°            360°
Temperature           -20°C         45°C
Dew Point             -30°C         45°C           DP - 1.0 ≤ T
Temperature Gradient  -5°C          5°C
Barometric Pressure   950 mb        1050 mb
Pyranometers          -0.50         2.50 Langleys/min
Pyrgeometers          0.30          0.75 Langleys/min
Pyrheliometers        -0.50         2.50 Langleys/min
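A minimal sketch of how the gross-limit and relational checks of Table 3
might be coded follows; the dictionary layout and function names are
illustrative assumptions, not the RAMS screening module itself.

    # Gross (instrumental) limits for a few gas parameters, in ppm.
    GROSS_LIMITS = {
        "O3":  (0.005, 5.0),
        "NO":  (0.005, 5.0),
        "NOX": (0.005, 5.0),
    }

    def passes_gross_limits(param, value):
        """True if the reading lies within its instrumental limits."""
        lower, upper = GROSS_LIMITS[param]
        return lower <= value <= upper

    def passes_no_nox(no, nox, tolerance=0.002):
        """Relational check: NO may not exceed NOX by more than tolerance."""
        return no - nox <= tolerance

    # Illustrative use on one minute's readings:
    ok = passes_gross_limits("NO", 0.042) and passes_no_nox(0.042, 0.051)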
284
-------
IRREGULAR INSTRUMENT RESPONSE
A) SINGLE OUTLIER     B) STEP FUNCTION
C) SPIKE              D) STUCK
E) MISSING            F) CALIBRATION
G) DRIFT
[Example traces not recoverable from OCR.]
Figure 4. Irregular instrument response.
285
-------
TABLE 4. MAXIMUM ALLOWABLE DEVIATIONS FROM NETWORK MEAN
UNDER MODERATE WINDS (NETWORK MEAN > 4 m/sec)
WIND SPEED 2 m/sec OR mean/3
(WHICHEVER IS LARGER)
WIND DIRECTION 30°
TEMPERATURE 3°C
TEMPERATURE DIFFERENCE 0.5°C
DEW POINT 3°C
ADJUSTED PRESSURE 5.0 millibars
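A minimal sketch of how the Table 4 network-uniformity test might be
applied to wind speed follows; the function name and the sample readings
are illustrative assumptions.

    def windspeed_outliers(speeds):
        """Flag stations deviating too far from the network mean wind speed.

        Per Table 4, the test applies under moderate winds (network mean
        > 4 m/sec); the allowable deviation is 2 m/sec or mean/3,
        whichever is larger.
        """
        mean = sum(speeds) / len(speeds)
        if mean <= 4.0:
            return []                 # test defined only for moderate winds
        allowed = max(2.0, mean / 3.0)
        return [i for i, s in enumerate(speeds) if abs(s - mean) > allowed]

    # Illustrative use: the network mean is 6.8 m/sec, so deviations beyond
    # max(2.0, 2.27) = 2.27 m/sec are flagged; station 3 (9.9 m/sec) fails.
    flags = windspeed_outliers([6.1, 5.8, 6.3, 9.9, 5.9])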
TABLE 5. RAMS DATA VALIDATION VERSION 6.4

CAUSE                               FLAG
1.  MISSING DATA                    10^37
2.  CALIBRATION DATA                10^35
3.  EXCESS DRIFT                    -VALUE
4.  FAILED RANGE TESTS              10^34
5.  LDL CHECKS                      1/2 LDL
6.  STATUS ERROR                    VALUE x 10^-25
7.  FAILED RELATIONAL TESTS         VALUE x 10^32
8.  FAILED TIME CONSTANT TESTS      VALUE x 10^24
9.  FAILED NETWORK TESTS            Q.A. REVIEW
10. DATA MANAGEMENT OVERRIDE        VALUE x 10^-15
286
-------
[Figure 5. Flow diagram of the order in which the RAMS data screening
tests are applied; graphic not recoverable from OCR.]
-------
[Table 6. Volume of data from the RAMS network, with the number of
minute and hour plots and the number of microfiche (24x) required for
plotting all RAMS data; contents not recoverable from OCR.]
-------
HOUR AVERAGE PLOTS
  → IDENTIFICATION OF SUSPECT DATA
  → REVIEW PLOTS OF MINUTE DATA
  → INITIAL ANALYSIS OF QUESTIONABLE DATA
  → VERIFICATION
  → OUTPUT: FILE OF TIME PERIODS AND CHANGES
Figure 6. Visual review of RAMS hour average data.
289
-------
[Figure 7. RAMS hour average data plotted by station; plots not
recoverable from OCR (surviving axis label: STATIONS).]
-------
[Figure 8. Distribution function of minute successive differences of
ozone concentration; plot not recoverable from OCR.]
291
-------
[Figure 9. Minute temperature data with a succession of outliers; plot
not recoverable from OCR (surviving axis label: TEMPERATURE (°C)).]
292
-------
HOUR ARCHIVE (VERSION 6.4) and MINUTE ARCHIVE (VERSION 6.4)
    → UPDATE PROGRAM → HOUR ARCHIVE (VERSION 7.0)
UPDATE FILE inputs: VISUAL VALIDATION STUDY; SUCCESSIVE DIFFERENCES;
WIND SPEED SPIKES; CO SPIKES (AFTER CALIB); NEGATIVE POLLUTANTS; LDL
Figure 10. RAMS update.
293
-------
[Figure 11. Generalized monitoring system with quality control and data
screening; diagram not recoverable from OCR.]
294
-------
NAMES AND ADDRESSES:
PROGRAM
EPA/RTP INTERLABORATORY QUALITY ASSURANCE COORDINATING COMMITTEE
DATA VALIDATION CONFERENCE, SPEAKERS
DATA VALIDATION CONFERENCE, ATTENDEES
295
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
The program that was distributed before the meeting is presented first.
Following it is the alternate schedule that was used on the day of the meeting.
296
-------
CONFERENCE
ON
DATA VALIDATION
Environmental Research Center Auditorium
Highway 54 and Alexander Drive
Research Triangle Park, North Carolina
November 4, 1977
Sponsored by
ERC/RTP Interlaboratory Quality
Assurance Coordinating Committee
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Research and Development
Research Triangle Park, North Carolina
297
-------
PROGRAM
8:00
Registration
8:25
Welcome
Dr. J.K. Burchard
Senior ORD Official, RTP
8:30
Opening Remarks
S. Hochheiser
EMSL
GENERAL SESSION
D.J. von Lehmden
EMSL
Moderator
8:35
What is Data Validation?
R.C. Rhodes
EMSL
8:45 Validation Procedures Applied to
In-Use Motor Vehicle Emission Data
M.E. Williams
EPA, Ann Arbor
9:15 Computer Graphics in Data
Validation
Dr. R.H. Allen
COMP-AID
9:45
COFFEE BREAK
10:00 Engineering Computations and Data
Collection Formats Useful in
Data Validation
A.C. Nelson, Jr.
PEDCo
10:30 Regional Validation of State and
Local Air Pollution Data
T.H. Rose
EPA, Region IV
11:00 Use of Precision and Accuracy
Estimates for Validation of Data
Dr. D.T. Mage
HERL
11:30 - 12:30 LUNCH
298
-------
[Program listing (early afternoon sessions); not recoverable from OCR.]
299
-------
AUDITORIUM
Special Environmental
Monitoring Studies
Dr. M.M. Bufalini
ESRL
Moderator
2:50 Data Validation for the Los Angeles
Catalyst Study (LACS)
C.E. Rodes
EMSL
3:25 Validation Techniques Used in
Continuous Air Monitoring
Network (CHAMP)
Dr. M.B. Hertz
HERL
3:50 Validation System Used in the
St. Louis Regional Air Monitoring
Study (RAMS)
R.B. Jurgens
ESRL
4:15 Closing Comments
S. Hochheiser
EMSL
300
-------
CONFERENCE ON DATA VALIDATION
Research Triangle Park, North Carolina
November 4, 1977
(Alternate Schedule)
AUDITORIUM
8:00 Registration
8:25 Welcome
     Dr. J.K. Burchard, Senior ORD Official, RTP
8:30 Opening Remarks
     Seymour Hochheiser, EMSL
8:35 What is Data Validation?
     R.C. Rhodes, EMSL
301
-------
[Pages 302-303: remainder of the alternate schedule (session times,
titles, and speakers); not recoverable from OCR.]
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
Members of the EPA/RTP* Interlaboratory Quality Assurance
Coordinating Committee:
Mr. Seymour Hochheiser, Chairman
Assistant to the Director
EMSL
MD-75
RTP, NC 27711
Telephone: (919) 541-2106
FTS: 629-2106
Mr. Raymond C. Rhodes, Secretary
Quality Assurance Specialist
STAB/EMSL
MD-75
RTP, NC 27711
Telephone: (919) 541-2293
FTS: 629-2293
Mr. Ferris B. Benson
Quality Assurance Coordinator
HERL
MD-52
RTP, NC 27711
Telephone: (919) 541-2545
FTS: 629-2545
Dr. Marijon M. Bufalini
TPRO/ESRL
MD-59
RTP, NC 27711
Telephone: (919) 541-2949
FTS: 629-2949
Mr. William B. Kuykendal
Mechanical Engineer
IERL
MD-62
RTP, NC 27711
Telephone: (919) 541-2557
FTS: 629-2557
Mr. Darryl von Lehmden
Chemical Engineer
QAB/EMSL
MD-77
RTP, NC 27711
Telephone: (919) 541-2415
FTS: 629-2415
*Acronyms, arranged alphabetically, used in this and the subsequent two
sections:
EMSL - Environmental Monitoring and Support Laboratory
EPA - Environmental Protection Agency
ESRL - Environmental Sciences Research Laboratory
HERL - Health Effects Research Laboratory
IERL - Industrial Environmental Research Laboratory
MD - Mail Drop
NC - North Carolina
QAB - Quality Assurance Branch
RTP - Research Triangle Park
STAB - Statistical and Technical Analysis Branch
TPRO - Technical Planning and Review Office
305
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
List of Speakers
Dr. Rod Allen
COMP-AID, Inc.
Box 12327
RTP, NC* 27709
Telephone: (919) 967-6376
Ms. Carolyn P. Chamblee
EPA/HERL
MD-55
RTP, NC 27711
Telephone: (919) 541-2348
FTS: 629-2348
Mr. Larry Claxton
EPA/HERL
MD-68
RTP, NC
Telephone: (919) 541-2518
FTS: 629-2518
Dr. Harold Crutcher
Consultant
35 Westall Ave.
Asheville, NC 28804
Telephone: (919) 253-2539
FTS: 672-0961
Dr. Thomas Curran
EPA/OAQPS
MD-14
RTP, NC 27711
Telephone: (919) 541-5351
FTS: 629-5351
Dr. Tyler Hartwell
RTI
Box 12194
RTP, NC 27709
Telephone: (919) 541-6453
Dr. Marvin Hertz
EPA/HERL
MD-56
RTP, NC 27711
Telephone: (919) 541-3124
FTS: 629-3124
Mr. William F. Hunt
EPA/OAQPS
MD-14
RTP, NC 27711
Telephone: (919) 541-5351
FTS: 629-5351
Mr. Robert B. Jurgens
EPA/ESRL
MD-80
RTP, NC 27711
Telephone: (919) 541-4545
FTS: 629-4545
Mr. William E. Klint
NOAA
Federal Building
Asheville, NC 28801
Telephone: (704) 258-2850, ext. 755
FTS: 672-0755
Dr. David T. Mage
EPA/HERL
MD-56
RTP, NC 27711
Telephone: (919) 541-3121
FTS: 629-3121
306
-------
Mr. Joseph E. McCarley, Jr.
EPA/ESED
MD-13
RTP, NC 27711
Telephone: (919) 541-5245
FTS: 629-5245
Mr. A. Carl Nelson
PEDCo
Suite 201
5055 Duke Street
Durham, NC 27701
Telephone: (919) 688-6338
Ms. Joan Novak
EPA/ESRL
MD-80
RTP, NC 27711
Telephone: (919) 541-4545
FTS: 629-4545
Mr. C. Don Paulsell
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Telephone: (313) 668-4342
FTS: 374-8342
Mr. Charles E. Rodes
EPA/EMSL
MD-76
RTP, NC 27711
Telephone: (919) 541-3076
FTS: 629-3076
Mr. Thomas H. Rose
EPA/SB
College Station Road
Athens, GA 30605
Telephone: (404) 546-3489
FTS: 250-3489
Ms. Marcia Williams
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Telephone: (313) 688-4342
FTS: 374-8323
See previous section for definition of acronyms. Acronyms not used pre-
viously are defined as follows:
NOAA - National Oceanic and Atmospheric Administration
OAQPS - Office of Air Quality Planning and Standards
RTI - Research Triangle Institute
GA - Georgia
MI - Michigan
307
-------
Conference on Data Validation
Research Triangle Park, North Carolina
November 4, 1977
List of Attendees
Gerald G. Akland
EPA/EMSL/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2346
FTS: 629-2346
Rod Allen
COMP-AID
P.O. Box 12327
RTP, NC 27709
Tel: (919) 967-6376
Joseph S. All
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2240
FTS: 629-2240
J. Anderson
Rockwell International
5529 Chapel Hill Blvd.
Durham, NC 27707
Tel: (919) 942-2407
D. W. Armentrout
PEDco
1499 Chester Road
Cincinnati, OH 45246
Tel: (513) 782-4700
James D. Ashworth
U.S. Army Corps of Engineers
P.O. Box 2127
Huntington, WV 25721
Tel: (FTS) 924-5694
Andy Berlin
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
John Boston
EPA/SDMO
MD-55
RTP, NC 27711
Tel: (919) 541-2337
Frank Briden
EPA/IERL
MD-60
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
T. G. Brna
EPA/IERL
MD-61
RTP, NC 27711
Tel: (919) 541-2915
FTS: 629-2915
Steve Bromberg
EPA/QAB
MD-77
RTP, NC 27711
Tel: (919) 541-2273
FTS: 629-2273
308
-------
Robert Browning
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4545
FTS: 629-4545
Sam Bryan
EPA
Chapel Hill, NC 27514
Tel: (919) 541-2872
FTS: 629-2872
Marijon M. Bufalini
EPA/ESRL
MD-59
RTP, NC 27711
Tel: (919) 541-2949
FTS: 629-2949
Bob Burton
EPA/HERL
MD-52
RTP, NC 27711
Tel: (919) 541-1394
FTS: 629-1394
D. Calafiore
EPA/HERL
MD-54
RTP, NC 27711
Tel: (919) 541-2674
FTS: 629-2674
Don Carpenter
EPA
Ann Arbor, MI 48105
Tel: (FTS) 374-4293
Tom Caldwell
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Susan S. Casada
Northrop Services, Inc.
P.O. Box 12313
RTP, NC 27709
Tel: (919) 549-0611
Carolyn Chamblee
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2518
FTS: 629-2518
Ronald Chambler
NCHS - DPB
Box 12214
RTP, NC 27709
Tel: (919) 541-4422
FTS: 629-4422
John Chavy
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Larry Claxton
EPA/HERL
MD-68
RTP, NC 27711
Tel: (919) 541-2518
FTS: 629-2518
John Clements
EPA/EMSL
MD-77
RTP, NC 27711
Tel: (919) 541-2196
FTS: 629-2196
Wayne Clements
TVA
345 Evans Bldg.
Knoxville, TN 37902
Tel: (615) 632-4579
William M. Cox
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5312
FTS: 629-5312
309
-------
C. L. Cox, Jr.
EPA/ADM
MD-30
RTP, NC 27711
Tel: (919) 541-2296
FTS: 629-2296
Tom Curran
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5351
FTS: 629-5351
Bob Currin
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Harold Crutcher
35 Westall
Asheville, NC 28801
Tel: (919) 253-2539
Robin Davis
EPA/CH/HERL
MD-73
RTP, NC 27711
Tel: (919) 541-2872
FTS: 629-2872
Davis Davis
P.O. Box 12313
RTP, NC 27711
Tel: (919) 549-2333
Robert Denny
EPA/QAB
MD-77
RTP, NC 27711
Tel: (919) 541-2785
FTS: 629-2785
O. L. Dowler
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3126
FTS: 629-3126
Ronald Drago
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5486
FTS: 629-5486
Cary Eaton
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6920
Foy W. Edwards
TVA
345 EB
Knoxville, TN 37902
Tel: (615) 632-2071
Susan B. Edwards
NRCD - Air Quality
P.O. Box 27687
Raleigh, NC 27611
Tel: (919) 733-5125
Gardner Evans
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2292
FTS: 629-2292
Gary Evans
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2294
FTS: 629-2294
B. E. Edmonds
EPA/EMSL
MD-76
RTP, NC 27711
Donald H. Fair
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2732
FTS: 629-2732
310
-------
Bob Faoro
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5351
FTS: 629-5351
Paul Feder
NIEHS - EBB
P.O. Box 12237
RTP, NC 27709
Tel: (919) 541-5402
FTS: 629-5402
H. L. Fisher
EPA/HERL
MD-74
RTP, NC 27711
Tel: (919) 541-2631
FTS: 629-2631
R. Fisher
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4551
FTS: 629-4551
Nancy Gaskins
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6915
Gerald Gipson
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5486
FTS: 629-5486
Maurice E. Graves
Northrop Services, Inc.
P.O. Box 12313
RTP, NC 27709
Tel: (919) 549-0411
D. Glover
Rockwell International
5529 Chapel Hill Blvd.
Durham, NC 27707
Tel: (919) 942-2407
Bonnee Gryder
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Ed Hanks
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5474
FTS: 629-5474
F. Hageman
Xonics, Inc
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Martin Hamilton
NIEHS - EBB
P.O. Box 12237
RTP, NC 27709
Tel: (919) 541-5402
Tyler Hartwell
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6453
Tom Heiderscheit
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2468
FTS: 629-2468
Marvin Hertz
EPA/HERL, MD-56
RTP, NC 27711
Tel: (919) 541-3124
FTS: 629-3124
311
-------
David O. Hinton
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3126
FTS: 629-3126
Seymour Hochheiser
EPA/EMSL
MD-75
RTP, NC 27711
Tel: (919) 541-2106
FTS: 629-2106
William F. Hunt
EPA/OAQPS
MD-14
RTP, NC 27711
Tel: (919) 541-5351
FTS: 629-5351
R. C. Jordan
Northrop Services, Inc.
P.O. Box 12313
RTP, NC 27709
Tel: (919) 541-2766
Robert B. Jurgens
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4545
FTS: 629-4545
Robert Jungers
EPA/ESRL
MD-78
RTP, NC 27711
Tel: (919) 541-2456
FTS: 629-2456
William E. Klint
NOAA
Federal Building
Asheville, NC 28801
Tel: (704) 258-2850
FTS: 672-0755
William B. Kuykendal
EPA/IERL
MD-62
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
Ralph I. Larsen
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4565
FTS: 629-4565
William D. Lee
EPA/QAB
MD-75
RTP, NC 27711
Tel: (919) 541-2293
FTS: 629-2293
Robert E. Lee
EPA/HERL
MD-51
RTP, NC 27711
Tel: (919) 541-2283
FTS: 629-2283
Barry Levene
EPA - Region VIII
1860 Lincoln Street
Denver, CO 80203
Tel: (303) 837-2226
FTS: 327-2226
Dan Litton
EPA/HERL
MD-73
RTP, NC 27711
Tel: (919) 541-2873
FTS: 629-2873
Raymond Michie, Sr.
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-6492
312
-------
Randell Morgan
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Gerald K. Moss
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5335
FTS: 629-5335
George C. Murray, Jr.
NCAQ
P.O. Box 27687
Raleigh, NC 27611
Tel: (919) 733-5125
J. E. McCarley, Jr.
EPA/ESED
MD-13
RTP, NC 27711
Tel: (919) 541-5243
FTS: 629-5243
Linda J. McDay
TVA
345-EB
Knoxville, TN 37902
Tel: (615) 632-2071
John S. Nader
EPA/ESRL
MD-46
RTP, NC 27711
Tel: (919) 541-0385
A. Carl Nelson
PEDco
5055 Duke Street
Durham, NC 27701
Tel: (919) 688-6338
William C. Nelson
EPA/HERL
MD-53
RTP, NC 27711
Tel: (919) 541-2330
FTS: 629-2330
W. Norris
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Joan Novak
EPA/ESRL
MD-80
RTP, NC 27711
Tel: (919) 541-4545
FTS: 629-4545
Barbara Nye
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3125
FTS: 629-3125
Blaine F. Parr
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3123
FTS: 629-3123
C. Don Paulsell
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Tel: (313) 668-4342
FTS: 374-8342
Debora R. Pizer
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3124
FTS: 629-3124
Francis Pooler
EPA/ESRL
MD-59
RTP, NC 27711
Tel: (919) 541-2857
FTS: 629-2857
313
-------
James Reagan
EPA/ESRL
MD-59
RTP, NC 27711
Tel: (919) 541-4486
FTS: 629-4486
Joan Reece
EPA/HERL
MD-55
RTP, NC 27711
Tel: (919) 541-2466
FTS: 629-2466
Raymond C. Rhodes
EPA/STAB
MD-75
RTP, NC 27711
Tel: (919) 541-2293
FTS: 629-2293
Wilson Riggan
EPA/HERL
MD-54
RTP, NC 27711
Tel: (919) 541-2674
FTS: 629-2674
Charles D. Robson
EPA/HERL
MD-67
RTP, NC 27711
Tel: (919) 541-2625
FTS: 629-2625
Charles E. Rodes
EPA/EMSL
MD-76
RTP, NC 27711
Tel: (919) 541-3076
FTS: 629-3076
Tom Rose
EPA - Region IV
College Station Road
Athens, GA 30601
Tel: (404) 546-3111
Glenn Ross
NCAQ
P.O. Box 27687
Raleigh, NC 27611
Tel: (919) 549-8941
Bill Sensing
EPA/IERL
MD-62
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
Frank D. Slaveter
EPA
401 M Street, S.W.
EN 340
Washington, DC 20460
Tel: (202) 755-1572
Ben Smith
EPA/IERL
MD-62
RTP, NC 27711
Tel: (919) 541-2557
FTS: 629-2557
Paul E. Smith
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 549-8941
Ralph Sullivan
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 549-8941
Jake Summers
EPA/MDAD
MD-14
RTP, NC 27711
Tel: (919) 541-5395
FTS: 629-5395
Jose Sune
EPA/HERL
MD-56
RTP, NC 27711
Tel: (919) 541-3127
FTS: 629-3127
314
-------
Richard Symonds
Catalytic, Inc.
P.O. Box 240232
Charlotte, NC 28224
Tel: (704) 542-4107
Charles Tate
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
C. E. Tatsch
RTI
P.O. Box 12194
RTP, NC 27709
Tel: (919) 541-5945
Lawrence E. Truppi
EPA/HERL
MD-54
RTP, NC 27711
Tel: (919) 541-2861
FTS: 629-2861
John Van Bruggen
EPA/HERL
MD 55
RTP, NC 27711
Tel: (919) 541-2465
FTS: 629-2465
Darryl von Lehmden
EPA/QAB
MD-77
RTP, NC 27711
Tel: (919) 541-2415
FTS: 629-2415
Betty Wagman
EPA/EMSL
MD-56
RTP, NC 27711
Tel: (919) 541-3125
FTS: 629-3125
Kim Wattenbarger
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
J. E. Whitney
EPA/WA
RD-680
401 M Street, S.W.
Washington, DC 20460
Tel: (202) 426-4477
Cindy Wingarden
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
Mack Milkins
EPA/EMSL
MD-45
RTP, NC 27711
Tel: (919) 541-3119
FTS: 629-3119
Marcia Williams
EPA
2565 Plymouth Road
Ann Arbor, MI 48105
Tel: (313) 688-4342
FTS: 374-8323
Max Woodbury
Rockwell International
5529 Chapel Hill Blvd.
Durham, NC 27707
Tel: (919) 493-2471
Chris Woodbury
Xonics, Inc.
P.O. Box 12415
RTP, NC 27709
Tel: (919) 541-3080
315
-------
TECHNICAL REPORT DATA
(Please read instructions on the reverse before completing)
1. REPORT NO.: EPA-600/9-79-042
2.
3. RECIPIENT'S ACCESSION NO.
4. TITLE AND SUBTITLE: DATA VALIDATION CONFERENCE, Proceedings
5. REPORT DATE: September 1979
6. PERFORMING ORGANIZATION CODE
7. AUTHOR(S): Raymond C. Rhodes and Seymour Hochheiser, Editors
8. PERFORMING ORGANIZATION REPORT NO.
9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   Office of Research and Development
   Environmental Monitoring and Support Laboratory
   Research Triangle Park, N.C. 27711
10. PROGRAM ELEMENT NO.
11. CONTRACT/GRANT NO.
12. SPONSORING AGENCY NAME AND ADDRESS
13. TYPE OF REPORT AND PERIOD COVERED
14. SPONSORING AGENCY CODE: EPA 600/08
15. SUPPLEMENTARY NOTES
16. ABSTRACT
    The proceedings document technical presentations made at a 1-day
    conference on data validation for environmental data. The conference
    was hosted and sponsored by the U.S. Environmental Protection Agency,
    Research Triangle Park Interlaboratory Quality Assurance Coordinating
    Committee, on November 4, 1977, at Research Triangle Park. Various
    approaches and techniques used for data validation are presented.
17. KEY WORDS AND DOCUMENT ANALYSIS
    a. DESCRIPTORS: Data Validation; Data Screening; Data Editing;
       Quality Assurance; Outliers; Statistics; Environmental Data
    b. IDENTIFIERS/OPEN ENDED TERMS: Environmental monitoring; Data
       management
    c. COSATI Field/Group: 43F; 68A
18. DISTRIBUTION STATEMENT: Release to public
19. SECURITY CLASS (This Report): Unclassified
20. SECURITY CLASS (This page): Unclassified
21. NO. OF PAGES: 315
22. PRICE
EPA Form 2220-1 (9-73)
316
------- |