PROCEEDINGS No. 2
ORD
ADP
WORKSHOP
November 11-13, 1975
Office of Research and Development
U.S. Environmental Protection Agency
Washington, D.C. 20460
-------
This document is available from:
U.S. Department of Commerce
National Technical Information Service
5285 Port Royal Road
Springfield, Virginia 22161
Do not order from the U.S. Environmental Protection Agency.
DISCLAIMER
This report has been reviewed by the Office of Research and Development, U.S. Environmental
Protection Agency, and approved for publication. Mention of trade names or commercial products does not
constitute endorsement or recommendation for use.
-------
EPA-600/9-76-008
April 1976
ORD ADP WORKSHOP
PROCEEDINGS
NO. 2
Sponsored By
Denise Swink, ORD ADP Coordinator
Technical Information Division (RD-680)
Office of Monitoring and Technical Support
Office of Research and Development
U.S. ENVIRONMENTAL PROTECTION AGENCY
OFFICE OF RESEARCH AND DEVELOPMENT
WASHINGTON, D.C. 20460
-------
FOREWORD
The second ORD ADP Workshop was held November 11-13, 1975, at the EPA Gulf Breeze Environmental Research
Laboratory, Gulf Breeze, Florida. This workshop focused on the merits of past data acquisition and manipulation techniques
and procedures, and provided suggestions for new approaches from the views of both providers and users of ADP resources.
Participating in the workshop, sponsored by the Office of Research and Development, were representatives of EPA
Headquarters program offices, Regional offices, laboratories, and organizations outside of EPA.
Denise Swink
ORD ADP Coordinator
-------
TABLE OF CONTENTS
Page
Number
FOREWORD iii
AGENDA
Opening Remarks
Denise Swink 5
Welcome Address
Tudor T. Davies 6
Keynote Address
Wilson K. Talley 7
Microprocessors
D.M. Cline 8
A Flexible Laboratory Automation System for an EPA Monitoring Laboratory
Bruce P. Almich 10
Data Acquisition System for an Atomic Absorption Spectrophotometer, an Electronic Balance,
and an Optical Emission Spectrometer
Van A. Wheeler 19
An Automated Analysis and Data Acquisition System
Michael D. Mullin 25
A Turnkey System: the Relative Advantages and Disadvantages
D. Craig Shew 31
One Approach to Laboratory Automation
Jack W. Frazer 33
Software Compatibility in Minicomputer Systems
John O.B. Greaves 38
Summary of Discussion Period - Panel I 40
Laboratory Data Management
William L. Budde 42
Suspended Particulate Filter Bank and Sample Tracking System
Thomas C. Lawless 44
-------
Eight Years of Experience With an Off-Line Laboratory Data Management System Using the
CDC 3300 at Oregon State University
D. Krawczyk 50
The CLEANS/CLEVER Automated Clinical Laboratory Project and Data Management Issues
Sam D. Bryan 60
Requirements for the Region V Central Regional Laboratory (CRL) Data Management System
Billy Fairless 65
Data Collection Automation and Laboratory Data Management for the EPA Central Regional Laboratory
Robert A. Dell, Jr. 74
Sample Management Programs for the Laboratory Automation Minicomputer
Henry S. Ames and George W. Barton, Jr. 76
Summary of Discussion Period - Panel II 79
The State of Data Analysis Software in the Environmental Protection Agency
Gene R. Lowrimore 81
National Computer Center (NCC) Scientific Software Support - Past, Present, and Future
M. Johnson 84
Exploitation of EPA's ADP Resources: Optimal or Minimal?
John J. Hart 86
Scientist, Biometrician, ADP Interface
Neal Goldberg 89
Statistical Differences Between Retrospective and Prospective Studies
Dr. R.R. Kinnison 91
Raising the Statistical Analysis Level of Environmental Monitoring Data
Wayne R. Ott 93
Quality Assurance for ADP (and Scientific Interaction With ADP)
R.C. Rhodes 95
How to Write Better Computer Programs
Andrea T. Kelsey 98
Summary of Discussion Period - Panel III 103
The Utility of Bibliographic Information Retrieval Systems
Johnny E. Knight
Biological Data Handling System (BIO-STORET)
Cornelius I. Weber 109
-------
Utility of STORET
C.S. Conger 111
The Uses and Users of AEROS
James R. Hammerle 114
Description and Current Status of the Strategic Environmental Assessment System (SEAS)
C. Lawrence 118
Improving the Utility of Environmental Systems
Donald Worley 121
Summary of Discussion Period - Panel IV 123
Operational Characteristics of the CHAMP Data System
Marvin B. Hertz 125
Development of Thermal Contour Mapping
George C. Allison 135
Remote Sensing Projects in the Regional Air Pollution Study
R. Jurgens 138
Automatic Data Processing Requirements in Remote Monitoring
J. Koutsandreas 148
Developments in Remote Sensing Projects
Sidney L. Whitley 156
Summary of Discussion Period - Panel V 173
Agency Needs and Federal Policy
Melvin L. Myers 175
Minicomputers: Changing Technology and Impact on Organization and Planning
Edward J. Nime 178
Univac 1110 Upgrade
M. Steinacher 181
A Case for Midicomputer-Based Computer Facilities
D. Cline 183
Status of the Interim Data Center
K. Byram 185
Large Systems Versus Small Systems
R.W. Andrew 187
-------
Summary of Discussion Period - Panel VI 189
APPENDIX - List of Attendees A-1
-------
AGENDA
ORD ADP WORKSHOP NO. 2
NOVEMBER 11-13, 1975
GULF BREEZE, FLORIDA
Opening Remarks
Denise Swink

Welcome Address
T. Davies

Keynote Address
W. Talley

Panel I
Laboratory Automation - Instrumentation and Process Control
Chairman: D. Cline

Microprocessors
D. Cline

A Flexible Laboratory Automation System for an EPA Monitoring Laboratory
Bruce P. Almich

Data Acquisition System for an Atomic Spectrophotometer, an Electronic Balance, and an Optical Emission Spectrometer
Van A. Wheeler

An Automated Analysis and Data Acquisition System
Michael D. Mullin

A Turnkey System: The Relative Advantages and Disadvantages
D. Craig Shew

One Approach to Laboratory Automation
Jack W. Frazer

Software Compatibility in Minicomputer Systems
John Greaves

Question and Answer Period - Panel I

Panel II
Laboratory Automation - Data Management
Chairman: William L. Budde

Laboratory Data Management
William L. Budde

Suspended Particulate Filter Bank and Sample Tracking System
Thomas C. Lawless

Eight Years of Experience With an Off-Line Laboratory Data Management System
D. Krawczyk

The CLEANS/CLEVER Automated Clinical Laboratory Project and Data Management Issues
Sam Bryan
-------
Requirements for the Region V Central Regional Laboratory (CRL) Data Management System
Billy Fairless

Data Collection Automation and Laboratory Data Management for the EPA Central Regional Laboratory
Robert A. Dell, Jr.

Sample Management Programs for the Laboratory Automation Minicomputer
George Barton

Question and Answer Period - Panel II

Panel III
Strengths and Weaknesses of Analysis of Scientific Data in EPA
Chairman: G. Lowrimore

The State of Data Analysis Software in the Environmental Protection Agency
Gene R. Lowrimore

National Computer Center (NCC) Scientific Software Support - Past, Present, and Future
M. Johnson

Exploitation of EPA's ADP Resources: Optimal or Minimal?
John J. Hart

Scientist, Biometrician, ADP Interface
Neal Goldberg

Statistical Differences Between Retrospective and Prospective Studies
R.R. Kinnison

Raising the Statistical Analysis Level of Environmental Monitoring Data
Wayne R. Ott

Quality Assurance for ADP (and Scientific Interaction With ADP)
R.C. Rhodes

How to Write Better Computer Programs
Andrea Kelsey

Question and Answer Period - Panel III

Panel IV
Utility of Environmental Data Systems
Chairman: Johnny E. Knight

The Utility of Bibliographic Information Retrieval Systems
Johnny E. Knight

Biological Data Handling System (BIO-STORET)
Cornelius I. Weber

Utility of STORET
C.S. Conger

The Uses and Users of AEROS
James R. Hammerle

Description and Current Status of the Strategic Environmental Assessment System (SEAS)
C. Lawrence

Improving the Utility of Environmental Systems
Donald Worley
-------
Non-Use of EPA Data Systems
D. White

Question and Answer Period - Panel IV

Panel V
Developments in Remote Sensing Projects
Chairman: Marvin B. Hertz

Operational Characteristics of the CHAMP Data System
Marvin B. Hertz

Development of Thermal Contour Mapping
George C. Allison

Remote Sensing Projects in the Regional Air Pollution Study
R. Jurgens

Automatic Data Processing Requirements in Remote Monitoring
J. Koutsandreas

Developments in Remote Sensing Projects
Sidney L. Whitley

Question and Answer Period - Panel V

Panel VI
Future Developments in ADP Resources for EPA
Chairman: Melvin L. Myers

Agency Needs and Federal Policy
Melvin L. Myers

Minicomputers: Changing Technology and Impact on Organization and Planning
E. Nime

Univac 1110 Upgrade
M. Steinacher

A Case for Midicomputer-Based Computer Facilities
D. Cline

Status of the Interim Data Center
K. Byram

Large Systems Versus Small Systems
R.W. Andrew

Question and Answer Period - Panel VI
-------
OPENING REMARKS
By Denise Swink
I am pleased to see the familiar as well as the new
faces present here today. Such representation at this
meeting substantiates the need for, and usefulness of,
the ORD ADP workshop series. To refresh memories and
provide background for those of you who did not
participate in the first workshop held in October 1974, I
will summarize the purpose, subjects covered, and
impacts of the first workshop.
It became apparent in 1974 after my first year of
functioning as the ORD ADP Coordinator that the
scientific community of EPA involved in data processing
applications had few mechanisms to transfer or find
significant ADP technology. In response to the need for
better communication, the first workshop was designed
to promote state-of-the-art techniques as well as the
sharing of experience and knowledge in the area of
scientific applications and processing of data. The
subjects presented in formal papers included: mathe-
matical, scientific, and statistical applications software;
applications of minicomputers; applications of inter-
active graphics; laboratory data management; chemical
information systems; capabilities of the Univac 1110;
and ADP policies. Participation at this workshop
included not only personnel from the Office of Research
and Development but also Regional offices, Head-
quarters program offices, and organizations outside of
EPA. The Proceedings of the ORD ADP Workshop No. 1
are available from the National Technical Information
Service (NTIS) and can be purchased on request using
the accession number PB241 150/AS.
The first workshop was successful; however, the
participants commented that they thought it would be
beneficial as a follow-on activity to spend time
addressing issues and operational problems associated
with the subjects covered. Hence, the second workshop
has been designed to respond to this gap in information
transfer.
Since ADP activities rely on an interdependent
network of providers and users, many problems and
questions arise concerning methods and policies. To
manage available ADP resources effectively, one must
discern the level of technology and resources appropriate
for an application. Because of constantly accelerating
developments in technology coupled with the provision
of ADP resources from several organizations with
differing management philosophies, it becomes
extremely difficult to optimize one's use of ADP
resources. Consequently, this workshop will focus on the
merits of past approaches and provide suggestions for
new approaches from the view of both a provider and a
user.
To maximize participation, this workshop is
organized into six panel sessions for which there will be
short presentations by each panel member with a
question and answer period immediately following the
presentations. The panels have been established to cover
the topics of: instrumentation and process control
aspects of laboratory automation, data management
aspects of laboratory automation, strengths and
weaknesses of scientific analysis of data in EPA, utility
of environmental data bases, developments in remote
sensing projects, and future developments in ADP
resources for EPA. Proceedings from this workshop also
will be available from NTIS by April 1976.
In closing, I thank you for your interest and partici-
pation in the ORD ADP workshops and look forward to
our increased and improved communications in the
future.
-------
WELCOME ADDRESS
By Tudor T. Davies
I am delighted to welcome you to the ADP Confer-
ence and to the Pensacola area, particularly to the Gulf
Breeze Environmental Research Laboratory. Besides the
obvious benefits of being in this equable climate, we are
fortunate that many people recognize this and use it as a
site for Agency meetings. Many of you are old friends,
but there are some present who will be unfamiliar with
the capabilities and the research program of the Gulf
Breeze Laboratory. Therefore, we would like to invite
you to visit us tomorrow.
When I was a newcomer to the Agency and
becoming acquainted with the Great Lakes community
of research, Regional, and State people, one of the
common factors that tied the water people together was
the STORET system. Many voices have been raised
against it. I believe, however, that these people are
speaking from ignorance about its capabilities. I strongly
support it as the best available interactive data storage
system. With the evolution of the BIO-STORET system, I
feel that we have an excellent comprehensive data
system which is very much user-oriented. We should
strongly defend and support its continuance and future
growth. Although many of the topics to be addressed in
this workshop reflect our interest in automating labora-
tory analysis and controlling sample flow and data
arrangement, the eventual use of the data and its
communication are perhaps most significant to us in an
overall sense. At Grosse Ile, we found that data storage
was an expensive business but, once stored in the
STORET system, it was available for sample analysis and
complex modeling exercises by the whole user com-
munity. The community action in EPA toward ADP is
most encouraging.
-------
KEYNOTE ADDRESS
By Wilson K. Talley
The rhetoric used when EPA was formed is still
valid, that is, pollution problems are too often perceived
in isolation and addressed in isolation. The result is the
suboptimization of our society's dealing with the
environment. Yet, EPA was structured to be a regulatory
agency that was required to take a total view.
In the early days, the first half of the 1970's, the
Agency attacked the most obviously serious, and least
controversial, problems. Pollutants and classes of
pollutants were identified and abated. Today, we have
exhausted the single pollutant/single medium/single
species approach. To continue to enhance and protect
environmental quality, we must consider the residuals of
our activities.
This current situation presents a challenging
opportunity for us to accept. For example, within the
last year, we have imposed on ourselves the requirement
that we submit Environmental Impact Statements with
respect to our Agency decisions. Let us contrast our
Agency's view with the narrower missions discharged by
other Federal agencies. The Department of Agriculture
must maximize the production of food and fiber. One
way for discharging its mission would be to assume that
there is an abundance of inputs, such as the land itself,
capital, labor, chemicals, energy, and water, and that the
residuals are unimportant. Only when an input runs
short or a residual becomes a problem is it necessary to
consider the social, economic, or environmental systems
outside agriculture.
Another example might be the Department of
Health, Education, and Welfare's important mandate for
a health delivery system. Originally that system was
viewed primarily as a health services delivery system.
That view is tenable only as long as the social cost of a
solution is not out of proportion when based on taking
the present system and making it bigger.
We recognize that EPA cannot do its job and those
of its sister agencies. But note that EPA can do its job
more easily and completely if it works with those
agencies. And in doing so, there is every reason to
believe that the other agencies will be able to do a better
job with their primary missions. For instance, a large
portion of the problem with agricultural chemicals is
their misuse. One investigator has estimated that only
20 percent of the average pesticide application is
effective. Another points out that for some crops on
some lands, fertilizers have passed the point of
diminishing returns. Changes in irrigation practices could
not only save water initially but could also cut down on
the runoff pollution by salts. With respect to a health
delivery system, a more correct approach is to regard
environmental protection as a necessary adjunct to
health maintenance.
Similar examples abound that indicate a return to
the old conservation ethic: use less, use it well, and you
will waste (pollute) less. Our concern should include not
only present missions but also future problems. We
should anticipate and move against these future
problems to eliminate them at the least cost. However,
without the proper data base or the tools to manipulate
it into information, our task will be impossible.
ADP may provide us one of the tools we need to
realize the full potential of the Agency. This workshop is
a continuing effort to provide us with appropriate tools.
It is being held but three weeks shy of EPA's fifth birth-
day. The Agency is using that anniversary as an
opportunity to review the first five years and to plan for
the future. This workshop can make a valuable contribu-
tion to that future.
-------
MICROPROCESSORS
By D. M. Cline
INTRODUCTION
In recent years, system logic design engineers have
included minicomputers as integral parts of various
process control and analytical instrumentation applica-
tions. The relatively high cost of minicomputers has
inhibited their widespread use in these types of applica-
tions. As an alternative, system designers have used
"random logic" techniques in which the expense of a
single system is reduced by volume production. Then in
the early 1970's, a phenomenal new electronic device,
the microprocessor, was produced as a result of large
scale integration (LSI) technology. The microprocessor
unit is a central processing unit on a single silicon chip with
typical dimensions of 0.25 by 0.25 inch. It includes
the functional units of arithmetic logic, control
circuitry, and registers. The unit alone does not con-
stitute a microcomputer, since it requires two additional
components: input/output (I/O) circuitry and memory.
Although the microprocessor was initially used by the
system logic designer, new and improved microprocessor
designs are beginning to be utilized by users other than
engineers. Regardless of profession, this new technology
is demanding that the system developer have expertise in
both hardware and software techniques.
CHARACTERISTICS
Word size and speed are the first characteristics
about which a potential microprocessor user inquires.
However, if the interrogation ends at this point, the user
will have many surprises in store as system development
proceeds. For instance, the microprocessor may not
include interrupt circuitry, direct memory access (DMA)
ability, accessible stack, or BCD arithmetic capability.
The component that provides system timing, the clock,
initially was composed of circuitry external to the
microprocessor unit, but some of the more recently
introduced microprocessors include clocks as integral
portions of the chips themselves. Many microprocessors
do not have clocks and their manufacturers do not offer
them on a separate chip; therefore, the user has to design
and construct a single- or two-phase clock as required for
operation of the microprocessor.
Many of the microprocessors are supplied by a
single manufacturer. This is an important factor to
consider when system life may be long and one must be
assured of spare or replacement components. Another
factor to consider is the number of power supply
voltages required. Most models require two, but some
newer models require a single positive 5-volt supply and
are compatible with the TTL family of logic. One should
also consider both the number and the kind of registers
available.
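As an illustrative sketch only, the selection questions raised in this section can be collected into a checklist. Everything below (modern Python; the part name, figures, and warning texts are hypothetical, not drawn from any real product) simply restates those questions in executable form:

    from dataclasses import dataclass

    @dataclass
    class Microprocessor:
        # Selection attributes discussed in this section; values are illustrative.
        name: str
        word_bits: int        # word size
        cycle_us: float       # instruction cycle time, microseconds
        has_interrupts: bool  # interrupt circuitry included?
        has_dma: bool         # direct memory access ability?
        has_stack: bool       # accessible stack?
        has_bcd: bool         # BCD arithmetic capability?
        on_chip_clock: bool   # clock integral to the chip?
        supply_voltages: int  # number of power supply voltages required
        second_sourced: bool  # supplied by more than one manufacturer?

    def flag_surprises(mpu):
        # Warn about the omissions that surface only as development proceeds.
        warnings = []
        if not mpu.has_interrupts:
            warnings.append("no interrupt circuitry: devices must be polled")
        if not mpu.on_chip_clock:
            warnings.append("external single- or two-phase clock must be designed")
        if mpu.supply_voltages > 1:
            warnings.append("multiple supply voltages: not a single +5 V TTL part")
        if not mpu.second_sourced:
            warnings.append("single source: spare-part risk over a long system life")
        return warnings

    # Hypothetical part, not a real product:
    chip = Microprocessor("MPU-X", 8, 2.0, False, False, True, False, False, 2, False)
    for w in flag_surprises(chip):
        print("CAUTION:", w)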
In the past year, several manufacturers and
independent suppliers have begun offering microproc-
essor kits which include the microprocessor and the
required support logic. This could be useful for proto-
type development because many include software and
logic for a serial device. The three major sources of kits
are the microprocessor manufacturers, the electronics
distributors, and the system houses. The system houses
seem to offer the most complete kits, which include the
microprocessing unit, support logic, printed circuit
boards, power supplies, cabinet, programer's console,
switches, and lights.
One major disadvantage of programing a micro-
processor is that the peripheral devices required for
expedient software development are more expensive
than the microprocessor itself. Thus, it is very difficult
to produce a single system at a low cost which utilizes a
microprocessor even though the components which
comprise the system are inexpensive. One last charac-
teristic, which manufacturers of microprocessors are
finding to be of considerable marketing value, is a chip
that has an instruction set compatible with that of a
popular minicomputer.
SUPPORT LOGIC
Support logic consists of many devices, including
read-only memories (ROM), random access memories (RAM),
programable read-only memories (PROM), electrically
programable read-only memories (EPROM), clocks, shift
registers, and parallel and serial I/O interfaces. Most
microcomputer systems contain at least one MPU, one
ROM, and one RAM. The ROM is a device from which
information can be read but on which information can
not be written. Usually the control logic or "computer
program" is implemented in ROM because its contents
are not lost when power is removed. Since a RAM is a
-------
read/write memory device, it is used for data storage and
its contents are lost when power is removed. The PROM
is a device that can be irreversibly programed under
special conditions but acts like a ROM once it is pro-
gramed. The EPROM is similar to the PROM but can be
reprogramed under special conditions.
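The two behaviors that matter here can be summarized in a few lines of code. The following minimal sketch (Python, purely illustrative; no real device is modeled) captures the distinction: ROM is nonvolatile but unwritable, while RAM is writable but loses its contents when power is removed:

    class ROM:
        # Read-only: contents fixed beforehand, retained without power.
        def __init__(self, contents):
            self._cells = list(contents)
        def read(self, addr):
            return self._cells[addr]
        def write(self, addr, value):
            raise PermissionError("cannot write to ROM")
        def power_cycle(self):
            pass  # nonvolatile: contents survive

    class RAM:
        # Read/write working storage; contents lost when power is removed.
        def __init__(self, size):
            self._cells = [0] * size
        def read(self, addr):
            return self._cells[addr]
        def write(self, addr, value):
            self._cells[addr] = value
        def power_cycle(self):
            self._cells = [0] * len(self._cells)  # volatile: contents lost

    program = ROM([0x3E, 0x01, 0x76])  # control logic lives in ROM
    data = RAM(256)                    # data storage lives in RAM
    data.write(0, 42)
    data.power_cycle()
    assert data.read(0) == 0           # RAM lost its contents
    assert program.read(0) == 0x3E     # ROM kept its program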
APPLICATIONS
Microprocessor applications abound. Microproc-
essors are used as traffic light controllers, electric range
controllers, numerical controllers, and elevator con-
trollers. They are used as the control logic for slower
computer peripherals such as cassette drives, flexible
disk drives, line printers, card readers, and plotters. They
are used in point-of-sales terminals, cash registers, adding
machines, and fare collection devices.
A recent article in Computerworld reported that
IMS Associates, Inc. had combined multiple Intel 8080
microprocessors into an array configuration, the smallest
containing 32 Intel 8080 MPUs and the largest con-
taining 512 Intel 8080 MPUs.1 The systems are reported
to offer high computing power at a low cost.
In this paper, a brief overview of a new and exciting
technology has been presented, including some of the
characteristics of microprocessors as well as some of the
pitfalls to avoid when selecting a microprocessor.
Microprocessor-based systems provide the system
designer with the opportunity to reduce costs,
component count, and size. However, an essential
knowledge of the hardware and software characteristics
of the microprocessor is necessary in order to utilize the
microprocessor effectively in a system.
Reference
1 Frank, Ronald A., "IMSAI Arrays Micros for Low-Cost Power," Computerworld, October 1975.
-------
A FLEXIBLE LABORATORY AUTOMATION SYSTEM FOR AN EPA MONITORING LABORATORY
By Bruce P. Almich
INTRODUCTION
In the environmental field the measurement of
specific air and water pollutant chemicals is a very im-
portant activity. Until reliable measurements are made
and correlated with undesirable health or wildlife popu-
lation effects, environmental concerns remain limited
to purely aesthetic values. Current-
ly, there is considerable emphasis on setting standards
for acceptable air and water quality, issuing permits for
discharge of wastes into rivers and oceans, monitoring
these effluents to ensure compliance with permit limita-
tions, and conducting enforcement actions when vio-
lations occur. All of these activities are increasing the
demand for more and improved chemical environmental
analyses.
Improved analyses embody accuracy and precision,
and require extensive use of analytical quality control
techniques. Quality control is often omitted in ana-
lytical laboratories because of its cost and time require-
ment. With this omission, the meaningfulness of the
measurements decreases substantially. There is nothing
more costly than the wrong answer. Another aspect of
better analysis is the desire for new kinds of measure-
ments that are more revealing about the state of environ-
mental pollution than that provided by traditional
measurements. These more revealing measurements are
often more complex and simply cannot be accomplished
economically, or at all, without some form of
automation.
With the above remarks in mind, one can sum-
marize the objectives of laboratory automation for EPA
as follows:
Increase instrument and laboratory through-
put for a given level of instrumentation and
operations personnel
Increase the productivity of laboratory per-
sonnel
Improve accuracy and precision of analytical
results with instream data quality control and
instrument reliability assurance procedures
Reduce clerical time and errors by eliminating
manual calculations and transcriptions of data
Reduce tedium by automating as many repeti-
tious laboratory tasks as practicable
Incorporate instruments and techniques into
laboratory operations which would not other-
wise be practical or possible due to technical
or economic constraints
With all costs considered, provide a substantial
positive net benefit to the laboratory for the
lifetime of the added equipment.
Many of these objectives are fully discussed elsewhere.2,3
In response to the needs and objectives stated here,
the Environmental Monitoring and Support Laboratory
(ORD-EMSL) and the Computer Services and Systems
Division (OPM-CSSD) of the Cincinnati EPA laboratories
have been conducting a project in concert with the
Lawrence Livermore Laboratory (ERDA-LLL). The re-
sult has been the development of a highly flexible
laboratory automation system capable of meeting the
above objectives for a wide variety of environmental
monitoring laboratories. EMSL was chosen as the site for
pilot system installation and integration because its pri-
mary mission is: "To develop, improve, and validate
methodology for the collection of physical, chemical,
radiological, microbiological, and biological water qual-
ity data by EPA Regional offices, Office of Enforcement
and General Counsel, Office of Air and Water Programs,
and other EPA organizations." In the direct support
and initial direction of this project, CSSD has been
carrying out its major responsibilities: "Provide EPA
Cincinnati focal point for coordination and integration
of computer systems across technical lines.... plan,
coordinate, and carry out a program for exploitation of
scientific and technical application of computers to EPA
needs." LLL was chosen to assist in the project because
of its previous record of solid accomplishments in
relevant areas as well as its existing highly qualified
technical and professional staff.
-------
PROJECT GOALS
In order to meet the objectives and needs of EPA
monitoring laboratories, a thorough systems analysis ap-
proach was taken to establish the exact goals of the
project, to write detailed specifications for hardware and
software, and to develop an implementation plan.
Several of the goals defined as mandatory2,3 include:
To develop a laboratory automation system
that would incorporate presently owned
chemical analysis instrumentation widely used
throughout the Agency for measuring water
quality parameters.
To develop this methodology to permit the
adaptation of the technology to other EPA
laboratories at very significant cost and time
savings. In particular, designs for hardware in-
terfaces between instruments and computers,
as well as all custom computer software,
would become public property to be used in
any EPA laboratory without further develop-
ment or licensing costs.
To develop an open-end design for both hard-
ware and software that would permit the
attachment of many additional instrument
types for measurements, including nonwater
parameters. This goal includes the minimum-
cost ability to have varying numbers of each
automated instrument type for a given labora-
tory, depending only on laboratory needs.
To take advantage of presently available
computation power in writing as much of the
software as possible in a very flexible, high-
level, modern programing language. This
would assist scientific personnel in modifying
and improving software, and facilitate the
transfer of technology to other laboratories at
minimum cost.
To design the system with sufficient flexibility
so that it is applicable to methods develop-
ment research as well as to the production at-
mosphere. In an automated environment, care-
ful testing of new procedures becomes
economically and technically possible with a
statistically significant number of samples.
To maximize flexibility but minimize redun-
dancy and inefficiency, from the viewpoint of
an Agency-wide computer software effort.
It was recognized early in this project that a satis-
factory level of technical and administrative coordi-
nation and communication would be necessary for EPA
to realize the goals and objectives of laboratory auto-
mation. To this end, the following relationship attributes
were defined for LLL, CSSD/EMSL, and other "client"
laboratories:
Design efforts for major system components
and facilities would proceed as an "interactive,
iterative process" among the interested
parties.
The first stage of this process would involve a
thorough system specification and design re-
sulting from the collection of requirements
from interested parties throughout EPA's
monitoring laboratory community.
LLL would assume technical leadership in the
initial systems level hardware and software
implementations, producing sufficient docu-
mentation to allow EPA to independently
maintain and modify the delivered turnkey
systems at all technical levels.
Following the successful installation and de-
bugging of three turnkey systems, EPA would
take a more active technical role in the areas
of maintenance, new equipment additions,
and subsequent client laboratory implementa-
tions. With the continued assistance of outside
sources such as LLL, EPA would use its in-
creased level of in-house expertise and coordi-
nation to maintain existing systems and to add
to them, whenever applicable.
At maturity, the effort would require a level
of continued EPA technical and administrative
interaction to the point that the formation of
an Agency-wide users' group would be indi-
cated.
PROJECT STATUS
Instrumentation
Since the last report of the project status to this
workshop, two complete hardware installations of these
systems have become operational, with a third due on
March 1, 1976. The number and types of instruments
-------
completed, together with their performance character-
istics and controlling software, are substantially identical
to the original intentions of the design.1,2 The major
exceptions to this are the following:
A furnace has been added by LLL to the
Automated Atomic Absorption spectrometer
system at the CRL-V Chicago site.
The hardware and controlling software for a
Mettler Balance system at CRL-V has been
completed by LLL.
EMSL-Cincinnati is completing the addition of
another "new" instrument to the system; i.e.,
a Coleman 124 Spectrophotometer.
With a CRL-V implementation installed, sev-
eral variations of automatic sample changers
are now available for the systems. Capacities
start at 40 samples.
CSSD-Cincinnati has responded to CRL-V in
their priority requirement for a remote job
entry capability, allowing the laboratory
computer to send and receive batch mode jobs
in conjunction with EPA's IBM and Univac
computing centers. The technical aspects of
this addition are complete.
For convenience, the existing and planned hardware for
this project has been broken down into a number of
functional areas and is presented in Table 1.
Software
Table 2 illustrates the various types of software
included and planned for each laboratory automation
system installed as part of this project. The present
status of the software includes performance meeting the
original specifications, with the notable exception
that the Sample File Control systems and applications
programs are still in the design phase. Further informa-
tion on software status will be available after Febru-
ary 1, 1976, from the author. In brief, however, the
presently running software includes all data acquisition,
reduction, instrument control, and quality assurance
procedures that were originally intended for the existing
automated instruments. In addition, a variety of other
BASIC applications programs are either presently avail-
able or are under test/debug phases at the various labora-
tory sites. Present capabilities include the production of
printed laboratory analysis reports suitable for filing
and reporting in the usual manner.
In order to assist the reader in visualizing the
"total" software picture for this system, a core map for
the EMSL-Cincinnati site is given in Figure 1. A hard-
ware memory protection unit functionally divides the
available memory into three partitions: foreground,
background, and operating system area. Although one
can subdivide the available 64k of address space in many
ways, it was found that the allocations shown were
optimal for the purposes of most laboratory
systems, with the accompanying constraint that no
partition can be larger than 32k. Thus, the foreground
partition contains Multiuser Extended Basic, the LLL
assembly language instrument drivers, and the user data
area shared by each user via the swapping disk. The
background partition is intended for the operating
portions of the Sample File Control programs as well as
an area for utility functions and a limited amount of
program development. The operating system, MRDOS
revision level 3.02, resides in the lowest 16k or so of
core space. For the given revision levels of the DGC-
supplied software shown, the core map represents not
only the optimal configuration but also the maximum
total amount of core that should be used for the system.
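As a small worked check (a minimal sketch in Python, using the partition sizes shown in Figure 1), the allocation can be verified against the two constraints just stated, namely the 64k address space and the 32k ceiling per partition:

    # Partition sizes in k-words, as shown in Figure 1.
    partitions = {
        "operating system (MRDOS rev. 3.02)": 16,
        "background (utilities, development)": 18,
        "foreground (BASIC, drivers, user data)": 30,
    }
    TOTAL_KWORDS = 64          # available address space
    MAX_PARTITION_KWORDS = 32  # no partition may exceed 32k

    assert sum(partitions.values()) <= TOTAL_KWORDS
    assert all(size <= MAX_PARTITION_KWORDS for size in partitions.values())
    print("total allocated:", sum(partitions.values()), "kwords")  # 64 kwords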
As one shifts from the design/development of this
project to maintenance, it is necessary to reevaluate the
validity of having a sophisticated operating system such
as MRDOS resident in each laboratory automation
computer system. With the advantages and disadvantages
cited in Table 3, it is clear that advantages are apparent
when the system is under development and testing. How-
ever, the disadvantages tend to appear in the longer term
as maintenance costs for system level software upgrades.
This type of upgrade is required in order to maintain
pace with software and hardware technology as well as
to keep the computer vendor's hardware and software
support for the "current" revision level. The degree to
which the technical and economic aspects of the disad-
vantages can be minimized varies with the number of
"custom" systems level software interfaces that must be
adjusted with each revision level of the vendor-supplied
code. This is one reason why the LLL designers have
maintained a "hands-off" policy with respect to custom
modifications of the MRDOS software. There are, how-
ever, a fair number of other aspects of the total software
picture which are showing sensitivity to revision levels as
time passes. The costs of maintaining the entire software
system, therefore, can be minimized only if these soft-
ware modules are maintained as close to identical as
possible for all installations of these systems. Thus,
changes can be made in sensitive areas once, and only
once, with timely distribution throughout the Agency.
-------
Table 1
Summary of Existing and Planned Hardware Types*
A. Instruments
1. Atomic absorption spectrometers, furnace options
o P.E. 503,303,306
o I.L. 453, Varian AA-5
2. Beckman total organic carbon analyzer
3. Technicon autoanalyzers
o Single/multiple channel
o Types I and II
4. Jarrell-Ash 3.4 meter electronic readout emission spectrometer
5. Various automatic sample changers, capacity of 40 and up
6. Mettler balance
7. P.E. (Coleman) 124 double beam spectrophotometer
8. Various instruments similar to the above*
9. High data volume instruments already fitted with digital control systems (e.g., GC-mass spectrometer,
plasma emission spectrometer*)
10. Various instruments characterized by low cost, infrequent data production, etc. (e.g., pH meter,
turbidimeter, etc.*)
B. Computer-Related
1. CPU: Data General Nova 840 (or equivalent instruction set emulation capability at increased performance*)
2. Memory: 64k words (up to 128k words*) of high-speed core (or interleaved semiconductor*)
3. Arithmetic: Hardware multiply/divide and floating point processor
4. Peripherals
o Fixed head swapping disk
o Removable moving head data and program storage disk
o Tape drive for backup and data archives
o Digital interface and up to 32 channel analog to digital converter
o Medium-speed line printer
o Low-speed hard copy and CRT user and computer control terminals
o Custom-fabricated instrument interfaces
o Asynchronous telecommunications interface and modems
o Synchronous telecommunications interfaces*
o Dual processor - shared disk adapters*
*Planned hardware types are indicated with asterisks.
-------
Table 2
Summary of Existing and Planned Software Types*
A. Data General Supplied, DGC/EPA Maintained
1. Operating system: MRDOS revision 3.02 (4.02, 5.xx*)
2. Data acquisition and control: Multiuser Extended BASIC (up to 16 users), revision 3.6 (4.xx, up to 32 users*)
3. Utilities: assemblers, loaders, debuggers, editors, command language, Fortran IV, V, and Remote Job Entry
package (RJE-IBM)
B. LLL Supplied, LLL/EPA Maintained
1. Real-time assembly language routines and interface to BASIC for instrument control and data acquisition
2. Patches to DGC BASIC
3. Initial BASIC applications program package, including instrument controllers, data acquisition, and quality
control programs
4. System performance analysis and foreground/background communications packages
C. "Source-X" Supplied, EPA Maintained
1. Univac 1110 Remote Job Entry package (licensed from Gamma Tech.)
2. Sample File Control systems software package*
3. Sample File Control initial applications package*
4. Documented EPA internal software with broad application
5. All other software selected for EPA-wide support
D. Client Laboratory Supplied, Client Laboratory Maintained
1. Additions and changes to either BASIC or Sample File Control applications packages for incorporation of
"local" needs
2. All other uncoordinated "local" changes to types A-C above* (case-by-case tradeoff with item C-5 above)
3. All other "new" local programs
Since the software involved in this maintenance function
is deeply buried at the systems level, those who are
knowledgeable about the subject believe that this will
not impact on the flexibility of the systems. One can,
therefore, conclude that the advantages of MRDOS in
the laboratory clearly outweigh the disadvantages, even
during the maintenance phases, if every effort is made to
maintain Agency-wide compatibility for software items
A-1 through A-3, B-1, B-2, and C-2 shown in Table 2.

Performance

To date, the hardware aspects of system performance
have been quite good. Although insufficient operational
experience has been accumulated at this point to
-------
Table 3
Vendor-Supplied Real-Time Operating System Software in Laboratory Automation Computers
Advantages
o Efficient allocation of system resources:
— Core: overlays, tasking, swapping, partitioning, reentrancy
— CPU: dynamic scheduling, tasking, time slicing
— Peripherals: file structuring, interrupt service, space management
— Man-machine interface: console system control language
o Time-proven, reliable, vendor-supplied code
o Standardized utilities: compilers, editors, job stream managers, etc.
o Data and programs generally transportable among installations
o Compatibility with new and existing vendor-supplied hardware
o Program and system development accomplished at minimum cost
Disadvantages
o Operating system overhead can be significant: CPU time, core, cost
o Peripherals not directly accessible: 50 μsec overhead per interrupt
o User device driver implementation can become quite involved
o User responsibility to keep up with vendor-supplied upgrades of software; i.e., vendor discontinues support
of outdated software releases
o Vendor software upgrades may require significant user software modifications and/or rewrites
recommend design changes in subsequent implementa-
tions, the following are included among present
observations:
The "problem child" for hardware main-
tenance has been the fixed head swapping
disk. In two of the operating installations, this
device has been quite unreliable. It is also the
current performance bottleneck in the system
because its low data transfer rate causes poor
user response time during peak loads. Efforts
are being made to improve this aspect of the
system.
One can take advantage of recent advances in
computer technology in subsequent designs to
include a cpu which is instruction-set com-
patible with the Nova 840 but also much
faster and somewhat cheaper. The benefits in-
clude faster user response time in the face of
the large number of automated instruments
and program functions at the larger labora-
tories.
The addition of new instrument types should
proceed in an orderly manner so as not to
impact the performance, reliability, or cost ef-
fectiveness of the existing system in an undue
manner.
The software aspects of system performance also
have been very good. Generally, it is believed that the
conservative policy towards modification of systems
level code has been responsible. A second significant
factor has been the level of creative thought brought to
bear by LLL in the design and implementation of the
-------
custom software and hardware for the system. Neverthe-
less, obsolescence and the promise of improved response
time for large numbers of operating instruments are pres-
ently encouraging an ongoing effort to upgrade the
revision level of the systems software on an Agency-wide
basis. Once the upgrade from revision 3 to 4 of MRDOS
is performed during the first half of 1976, the following
new features will be present:
A new maximum core capacity of at least
128k words will be possible. The present 64k
maximum may be a bit tight for the larger
laboratories. Figure 2 shows an improved core
map.
A system tuning feature will be available for
determining the size of the operating system
partition on the basis of measured system per-
formance for each laboratory site. This will
allow for the true optimization of computer
resources for each site, especially the larger
installations, where the present state of op-
timization is pretty much a guessing game.
Better features for spooling and BASIC system
generation should improve system perfor-
mance substantially.
PROBLEMS
As the non-Data Base Management aspects of this
project approach maturity, the typical problems
associated with technological innovation in the presence
of budgetary and organizational constraints are evident.
The basic issues include the following:
1. What is "standardization"? Management
discusses it in broad, nontechnical terms, while people
with technical responsibilities are grappling with the
basic tradeoffs between "reinventing the wheel" and
losing flexibility. A firm policy needs to be developed
and accepted.
2. A formal process for adding new instruments
and capabilities must be developed. Those who are close
to a problem tend to misestimate the broader implica-
tions and impacts of its solution. Those removed from a
problem have difficulty fitting it into a priority scheme
and must be motivated by those closer to the problem
to obtain a realistic, timely solution.
3. A mechanism for solving systems-level prob-
lems for vendor- and LLL-supplied software is needed.
Problems referred to DGC and LLL from a single point
within EPA will be solved once and for all in a timely
manner.
4. Progress towards a users' group should be
initiated as soon as possible.
SUMMARY
A flexible laboratory automation system has been
designed and currently is being proven as operationally
viable within EPA. With the completion of three hard-
ware installations and the first cut of sample file control
programs during 1976, the previously stated goals and
objectives will be satisfied to a large extent. The con-
tinued development of in-house Agency-wide expertise
and experience in this field will facilitate not only the
effective communication among the interested parties
but also the increased capability to apply automation
technologies to the solutions of laboratory problems.
REFERENCES
1 Budde, W.L., Nime, E.J., and Teuschler, J., "An
Online Real-Time Multi-User Laboratory Automa-
tion System," Proceedings No. 1, ORD ADP
Workshop, 1974.
2 Frazer, J. W. and Barton, G. W., "A Feasibility
Study and Functional Design for the Computerized
Automation of the Central Regional Laboratory
EPA Region V, Chicago," ASTM Special Technical
Publication 578, ASTM, 1975.
3 Frazer, J.W., "Concept of a Different Approach to
Laboratory Automation," Proceedings No. 2, ORD
ADP Workshop, 1975.
4 Nime, E. J., CSSD Functional Responsibilities, EPA
Cincinnati, 1975.
5 Bunker, E., "If something works, don't fix it." "All
in the Family" (TV series) 1974.
-------
Figure 1
Present Pilot Laboratory Automation System Configuration
(Core map, 64k words: a 30k foreground partition holding Extended Multiuser BASIC rev. 3.6 with the LLL real-time instrument drivers and the BASIC user data area; an 18k background partition for the command line interpreter, utilities, and program development; and a 16k operating system area for the mapped, real-time disk operating system, rev. 3.02.)
-------
Figure 2
Maximum Single Processor Laboratory Automation System Configuration
(Core map, 128k words: a 76k foreground partition holding Extended Multiuser BASIC rev. 4.xx with the LLL real-time instrument drivers and a 16k BASIC user data area; a 32k background partition for Sample File Control and interground/interprocessor communication; and a 20k operating system area for the mapped, real-time disk operating system, rev. 4.02.)
-------
DATA ACQUISITION SYSTEM FOR AN ATOMIC ABSORPTION SPECTROPHOTOMETER,
AN ELECTRONIC BALANCE, AND AN OPTICAL EMISSION SPECTROMETER
By Van A. Wheeler
Computer automation of several instruments within
the Analytical Chemistry Branch, Environmental Moni-
toring and Support Laboratory at Research Triangle
Park (RTP), was due primarily to our responsibilities
with the National Air Surveillance Network (NASN).
The Trace Element Chemistry Section alone was re-
sponsible for recording approximately 150,000 data
values on file cards each year. One can imagine the pos-
sibilities for entry of human error when support of this
project required punching the Wang calculator 2 million
times a year.
Sample flow of the NASN filter samples is depicted
in Figure 1. Eight-by-ten-inch glass fiber filters are
screened, numbered, weighed, and then mailed to the
field. A 24-hour sample is collected and the operator
records collection time, air flow readings, and weather
conditions. Upon return of the filter, the final weight is
obtained for calculation of total suspended particulate
(TSP). Individual filters are cut and combined in a calen-
dar quarterly composite for each site. The acid extract
of the composite is analyzed for 24 elements by an op-
tical emission spectrometer with support or additional
elemental analysis by atomic absorption spectro-
photometry.
The computer system built for the project stores
the filter number and initial weight, assigns filters to
specific sites, stores the final weight upon receipt from
the field, and calculates the TSP based on the site in-
formation entered at a CRT terminal at the balance. The
system determines the filters to comprise the composite
and assigns a sample number to the extract. The analyses
are obtained under computer control and the raw instru-
ment data processed with the stored calibration and
blank parameters. The extract concentration is merged
with previously determined data for each site to produce
the final aerometric reports.
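The TSP computation itself is simply the net particulate mass collected on the filter divided by the volume of air sampled. A minimal sketch follows (Python; the filter weights and flow rate are hypothetical values chosen only to be plausible for a 24-hour high-volume sample, not figures from the NASN system):

    def total_suspended_particulate(initial_wt_g, final_wt_g, air_volume_m3):
        # TSP in micrograms per cubic meter from filter weights and air volume.
        net_mass_ug = (final_wt_g - initial_wt_g) * 1.0e6  # grams -> micrograms
        return net_mass_ug / air_volume_m3

    # Hypothetical 24-hour sample: average flow 1.5 m3/min for 1440 minutes.
    air_volume = 1.5 * 1440  # = 2160 m3
    tsp = total_suspended_particulate(3.8000, 3.9500, air_volume)
    print(f"TSP = {tsp:.1f} ug/m3")  # ~69.4 ug/m3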
The contract to provide such a computer system
was finalized with Bendix Field Engineering Corporation
in March of 1972. The Automated Laboratory Data
Acquisition System (ALDAS), Figure 2, was designed to
produce and store valid aerometric data and offer real-
time control of three instruments: a Perkin Elmer 403
atomic absorption spectrophotometer (AA) with an
automatic sample changer, an Ainsworth 1000D digital
balance, and an Applied Research Laboratories 9500
direct reading emission spectrometer (OES). At the time
two processors were required, as Digital Equipment Cor-
poration's (DEC) real-time operating software could not
support both foreground instrument control and
background report generation and file editing.
Figure 3 demonstrates the hardware organization of
the system. The PDP 11/20 foreground processor oper-
ates under real-time software and is connected through a
bus cable to the instrument interfaces and the CRT's for
each instrument. The PDP 11/15 background processor
operates under a traditional disk operating system from
the 64k word fixed head disk and is connected through a
bus cable to a DECtape peripheral and line printer. The
units on each processor's bus lines are not accessible by
the other processor. However, they do share three 1.2
million word disks and the 9-track magnetic tape
through a bus switch.
Numerous status checks are performed at each in-
strument during operation: the timing of the sample
analysis integration periods is checked against expected
values, correct instrument operating modes are verified,
and time-out routines are utilized to prevent hangup
because of lost data or instrument glitches. After passing
preliminary testing, the raw data are recorded on 9-track
tape. The tape serves a "log book" function for the
analysis, recording the date, sample number, and raw
data. The raw data are processed according to the instru-
ment's mathematical routines, and the final data are
stored on a disk file for later reporting.
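A sketch of such a checking loop is given below (Python; the instrument object, expected integration period, tolerance, and time-out are all hypothetical, since the original routines ran under DEC's real-time software rather than in any high-level language):

    import time

    EXPECTED_INTEGRATION_S = 10.0  # hypothetical expected integration period
    TOLERANCE_S = 0.5              # allowed deviation from the expected value
    TIMEOUT_S = 30.0               # guard against hangup from lost data/glitches

    def acquire_reading(instrument):
        # One guarded reading: mode check, time-out, and integration-time check.
        if instrument.mode() != "RUN":
            raise RuntimeError("instrument not in expected operating mode")
        start = time.monotonic()
        while not instrument.data_ready():
            if time.monotonic() - start > TIMEOUT_S:
                raise TimeoutError("lost data or instrument glitch")
            time.sleep(0.1)
        elapsed = time.monotonic() - start
        if abs(elapsed - EXPECTED_INTEGRATION_S) > TOLERANCE_S:
            raise RuntimeError(f"integration period {elapsed:.1f} s out of range")
        return instrument.read_raw()  # raw value, logged before any processing

    class FakeInstrument:
        # Trivial stand-in so the sketch runs; a real driver talks to hardware.
        def __init__(self):
            self._t0 = time.monotonic()
        def mode(self):
            return "RUN"
        def data_ready(self):
            return time.monotonic() - self._t0 >= EXPECTED_INTEGRATION_S
        def read_raw(self):
            return 0.873  # arbitrary raw value

    print(acquire_reading(FakeInstrument()))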
The advantages of this system include:
Turnkey operation of the project. We began
with only the analytical equipment and pro-
cured a system of all the necessary hardware
and software to service the bulk of our NASN
analytical responsibilities.
Elimination of human error in clerical filing
and computation.
Consistency in sample and data treatment.
Operation of instruments by personnel with
little or no experience.
-------
The disadvantages of the system are:
Failure of the DEC bus switch hardware,
through which both processors accessed com-
mon peripherals, at the most inopportune
times, requiring a restart of the system and
corrupting any open files. Analyses had to be
restarted, as no allowances for restarting in
midstream were provided. The switch was also
suspected as the source of ghost messages ob-
served on the AA CRT during maximum use
of the three instruments.
The rigid structure of the system for the
NASN project limits its flexibility to process
other samples. This "monkey" mode opera-
tion is frustrating to professional chemists as
there are no allowances for operator decision-
making during the analysis scheme other than
beginning and ending the program.
The inflexibility of the system has proven to be the
death of the system before it could prove its merit. One
of the reorganizations within EPA transferred most
NASN responsibilities to regional offices leaving the
Analytical Chemistry Branch with receipt of a portion of
the filter for analysis. The ALDAS software could not
support this change in procedure without major re-
visions.
The system is currently being modified (Figure 4)
with the emphasis on servicing the analytical instrumen-
tation instead of a specific project. DEC now has avail-
able a real-time operating system, RSX-11M, which can
accomplish with one CPU the foreground-background
duties desired. Thus the PDP 11/15 and bus switch have
been eliminated from the system.
Memory capacity of the PDP 11/20 can be in-
creased from 32k to 128k through a field modification
and upgrade to a PDP 11/40. Increased memory will be
necessary for real-time operation of anticipated instru-
ment additions to the system since the instruments are
so designed that they must be monitored continuously.
Thus, it is cheaper to purchase additional memory than
to modify existing instrumentation to take advantage of
the time-sharing capabilities of the computer operating
system.
Software for the new system will be written in two
operating modes. A routine or "monkey" mode much
like the original ALDAS concept will be maintained
where standards and quality control samples must meet
rigid historical limits. The second mode will be for
special samples with the operator having considerable
input as to selection of samples, standards, and quality
control samples. In this mode, the software will evaluate
the standards and quality control data generated during
the analysis and report the statistical variation of the
analysis. The log tape concept will be maintained, but
the data will be written on a DECTAPE unit for faster
access.
In conclusion, automation of our analytical labora-
tory is now being approached from the standpoint of
servicing the analytical equipment instead of a specific
project. From this position, the outermost layer of pro-
graming can reflect project needs. If program plans
change, minimal, if any, programing changes will be
necessary.
-------
Figure 1
NASN Filter and Data Processing
(Flow diagram: preweighed filters from the filter supply travel by U.S. Postal Service to the sampling site; exposed filters return with flow rate, time, and site information from the sampler rotameter; instrument output is converted and merged with air volume, weight, site, and time information; calibration of instruments and blank values are maintained with standards, replicates, and spikes; the final tabulation gives total particulate and metals content by site and quarter, together with quality control data.)
-------
PURPOSE: TO PRODUCE AND STORE VALID AEROMETRIC LEVELS OF TRACE ELEMENTS
AND TOTAL SUSPENDED PARTICULATE MATTER IN REAL TIME

INSTRUMENTS:
1. PERKIN-ELMER 403 ATOMIC ABSORPTION SPECTROPHOTOMETER
2. AINSWORTH 1000D DIGITAL BALANCE
3. ARL 9500 DIRECT READING SPECTROMETER

PROCESSORS:
1. DEC PDP 11/15 - BACKGROUND (24K MEMORY) REPORT GENERATION
2. DEC PDP 11/20 - FOREGROUND (12K MEMORY) INSTRUMENT DATA PROCESSING

INPUT:
1. CRT TERMINALS (FOUR) - DESCRIPTIVE AND NONINSTRUMENTAL DATA
2. ANALYTICAL INSTRUMENTS (THREE) - INSTRUMENT RESPONSES

OUTPUT:
1. 9-TRACK MAGNETIC TAPE - FINAL PRODUCT
2. LINE PRINTER - PRINTED RECORDS, REPORTS
3. MAGNETIC DISK PACK (1.2 MILLION WORDS) - SOURCE OF PERMANENT FILES
4. CRT TERMINALS - TEMPORARY DISPLAY

Figure 2
ALDAS Overview
-------
Figure 3
Computer Hardware
(Block diagram: the PDP 11/20 with 16k core memory and the PDP 11/15 with 12k core memory, each with an auto-load ROM and clock; instrument interfaces and CRT terminals, a line printer, DECtape, and a small fixed-head disk hang on the processor buses, while a bus switch shares the 9-track magnetic tape and large disk pack between the two processors.)
-------
Figure 4
ALDAS System
(Block diagram: the Ainsworth 1000D digital balance, the Applied Research Laboratories 9500 emission spectrometer, and the Perkin Elmer 403 atomic absorption spectrophotometer with auto-sampler, each served by its own CRT terminal, connect to a single Digital Equipment Corp. PDP-11/20, which produces printed records.)
-------
AN AUTOMATED ANALYSIS AND DATA ACQUISITION SYSTEM
By Michael D. Mullin
The Large Lakes Research Station is the EPA facil-
ity charged with research into the fate and transport of
pollutants in large lakes, concentrating on the Great
Lakes. A project was initiated 2½ years ago to study
Saginaw Bay in conjunction with EPA and other
Agency-sponsored studies on Lake Huron.
Initially the Saginaw Bay study entailed 59 stations
in the Bay to be sampled at various depths for a total of
approximately 110 samples per cruise. As a result of
knowledge gained during the 1974 field season, the num-
ber of stations was decreased to 37 for a total of 80
samples per cruise for the 1975 season. Each sample was
analyzed in the laboratory for 15 to 25 parameters, in-
cluding nutrients, organic carbon, conservative and trace
metals, chlorophyll, and chloride. At the same time,
other research groups were conducting extensive bio-
logical studies to interact with our physical and chemical
data. There was a total of 31 cruises amounting to over
50,000 separate analyses.
Within budgetary constraints, it was necessary to
automate as many of the analyses as practicable. For this
purpose, a Technicon Auto-Analyzer II System was
purchased initially to provide the analyses of five nu-
trients: dissolved ammonia, dissolved reactive silicate,
dissolved reactive phosphates, dissolved nitrate and
nitrite, and dissolved sulfate. Four additional parameters
have since been added: total phosphorus, total kjeldahl
nitrogen, chloride, and dissolved hexavalent chromium.
Dissolved iron may be added to the system in the near
future.
Although there are many other parameters being
analyzed in the laboratory, the only additional ones to
be automated analogously to the Technicon discrete sam-
pling technique are sodium, potassium, calcium, and
magnesium by atomic absorption spectrometry. With
our present dual channel instrument, calcium and mag-
nesium are analyzed simultaneously. Due to different
burner conditions, sodium and potassium must be an-
alyzed separately.
With 10 parameters, this system as shown in
Figure 1 can generate a large volume of data. For an
effective workday of 6 hours, up to 1,700 results can be
generated for the 10-parameter system per day. An
additional 500 results per day can be generated by the
dual channel automated atomic absorption spectrom-
eter. The Auto-Analyzer System can be set up and
maintained by two or three technicians. However, if
they have to read peak heights from a strip chart
recorder, perform regression analyses on standards, and
calculate concentrations of samples, more time will be
spent on calculations than on analyzing samples. The
alternative is to mate an online data acquisition system
to the instruments to handle this time-consuming work.
Prior to purchasing any automated data processing
equipment, our laboratory required a system that could
be brought into operation with as little in-
house effort as possible. The decision was made to pur-
chase a Digital Equipment Corporation PDP-8e mini-
computer. The basic unit is a 12-bit digital computer
with 8k core memory, 12-channel analog multiplexer,
and a teletype for communication and printing of re-
sults. This has since been expanded to 32k of core, along
with a high-speed reader-punch, a medium-speed line
printer, and a punched card reader.
There were a few problems that required attention
prior to online operation. The first priority item was
getting the analog signal from the analytical instrument
to the computer. The voltage output from the Auto-
Analyzer colorimeter is 0 to +5 volts DC, and the
required analog input to the analog-to-digital converter
is -1 to +1 volts DC. To eliminate this problem, an inter-
face was designed and constructed utilizing an opera-
tional amplifier and several resistors to produce this
conversion and is shown in Figure 2. The system
required that the output be linear over the entire
operating range and that the final voltage be adjustable
so as to fine tune the signal.
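The arithmetic implied by these two voltage ranges amounts to a fixed gain and offset. The short sketch below is illustrative only; it derives the mapping from the ranges quoted in the text rather than from the circuit itself.

    # Illustrative sketch: the linear mapping implied by the ranges above.
    # The colorimeter supplies 0 to +5 V DC; the analog-to-digital converter
    # expects -1 to +1 V DC, so the interface must apply a gain of 2/5 and
    # an offset of -1 V (values derived here, not read from the drawings).

    def interface_output(v_in: float) -> float:
        """Map a 0..+5 V colorimeter signal onto the -1..+1 V ADC range."""
        gain = (1.0 - (-1.0)) / (5.0 - 0.0)   # 0.4
        offset = -1.0                          # output when v_in = 0
        return gain * v_in + offset

    for v in (0.0, 2.5, 5.0):
        print(f"{v:4.1f} V in -> {interface_output(v):+5.2f} V out")
    # 0.0 -> -1.00, 2.5 -> +0.00, 5.0 -> +1.00: linear over the entire
    # operating range, as the system requirement demands.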
The analog output of the colorimeter is the same as
that going to the strip chart recorder. At the low concen-
tration levels encountered with some of the parameters,
there is a slight drift in the baseline absorbance that
must be taken into account when calculating standard
graphs or sample concentrations. Digital Equipment Cor-
poration developed a computer language analogous to
FORTRAN called FOCAL. For the PDP-8, they market
a software package called PAMILA, which is an overlay
to FOCAL and modifies it. We subsequently adapted
PAMILA to our specific uses and needs for real-time
operational control.
25
-------
Figure 3 illustrates the baseline drift correction pro-
cedure which is part of the in-house modified PAMILA
package. The routine checks the between-peak valleys. If
one falls below the initial baseline, the baseline is reset
and the peak heights are then corrected by subtracting
the interpolated baseline. If the baseline increases by the
end of a preset time interval, the baseline is reset and the
peak heights proportionally corrected over the time
interval. A number of other procedures also yield rela-
tive peak area, time of peak ending, peak height, and
type of peak.
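The modified PAMILA routines themselves are not listed in this paper, so the following is a minimal sketch of a valley-based correction of the general kind described above; the data layout and the simplified interpolation logic are assumptions.

    # Hypothetical sketch of a valley-based baseline drift correction, in
    # the spirit of the in-house PAMILA modifications described above.

    def interpolate(t, p0, p1):
        """Linear interpolation of the baseline between anchors p0, p1."""
        (t0, b0), (t1, b1) = p0, p1
        if t1 == t0:
            return b0
        return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

    def corrected_heights(peaks, valleys):
        """peaks, valleys: time-ordered lists of (time, signal).

        Each between-peak valley is treated as a fresh baseline estimate;
        every peak height is corrected by subtracting the baseline
        interpolated between the valleys that bracket it (a simplification
        of the reset logic in the text).
        """
        heights = []
        for t, s in peaks:
            before = max((v for v in valleys if v[0] <= t), default=valleys[0])
            after = min((v for v in valleys if v[0] >= t), default=valleys[-1])
            heights.append(s - interpolate(t, before, after))
        return heights

    # A drifting run: the baseline rises from 0.0 to 0.2 over 20 minutes.
    valleys = [(0.0, 0.00), (10.0, 0.10), (20.0, 0.20)]
    peaks = [(5.0, 0.55), (15.0, 0.75)]
    print(corrected_heights(peaks, valleys))   # [0.5, 0.6]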
The 8k version can hold information only for 64
peaks in the peak file storage at any one time, so a
mechanism is provided for printing of the contents of
the buffer, by channel, at any preset time interval. The
time interval is set for every ten sample cups. The first
and last sample in each decade is a water wash to set the
initiating and terminating baselines for the computer.
With 12 channels online simultaneously, the buffer
possibly could have up to 192 peaks stored at one time.
With an additional 8k or more of core, the peak file
storage can hold information on 200 peaks, thus increas-
ing the number of channels that can be monitored at any
one time.
Figure 4 shows the raw data output format. A
punch paper tape copy of the raw data is generated
simultaneously with the typed copy. When a given series
of samples, usually a day's run, is completed, the
punched tape is fed into the computer and stored on the
disk prior to editing and analyzing. Then the raw data
can be recalled, and extraneous peaks, noise, or other
unwanted information deleted. The corrected raw data
file is then analyzed with an interactive program
developed at the Grosse Ile Laboratory to give concen-
trations to the standard peak heights, set upper and
lower limits, perform a regression on the standards and,
using this equation, calculate the concentrations for the
unknowns. The output has all the needed regression fac-
tors together with the calculated concentrations.
Figure 5 is a copy of the output.
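The interactive program is not reproduced here, but the calculation it performs is a standard least-squares calibration. The sketch below shows the general form under assumed standard and unknown values; the actual program also reports the correlation coefficient and chi-square statistics shown in Figure 5.

    # Illustrative first-degree least-squares calibration: fit the standard
    # peak heights against known concentrations, then invert the fit to
    # convert unknown peak heights into concentrations. All numbers here
    # are assumed for the example.

    def fit_line(heights, concs):
        """Ordinary least squares: conc = slope * height + intercept."""
        n = len(heights)
        mx = sum(heights) / n
        my = sum(concs) / n
        sxx = sum((x - mx) ** 2 for x in heights)
        sxy = sum((x - mx) * (y - my) for x, y in zip(heights, concs))
        slope = sxy / sxx
        return slope, my - slope * mx

    std_heights = [24.0, 41.0, 119.0, 198.0, 418.0]      # assumed standards
    std_concs = [0.00125, 0.00250, 0.00750, 0.01250, 0.02500]
    slope, intercept = fit_line(std_heights, std_concs)

    for h in (77.0, 98.0, 61.0):                          # assumed unknowns
        print(f"height {h:5.1f} -> concentration {slope * h + intercept:.5f}")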
There are a number of additional ways to utilize
the computer. Data management procedures can be im-
plemented. Additional instruments, such as pH meters,
carbon analyzers, and weighing balances, can be inter-
faced with the existing equipment. Essentially, any
laboratory instrument that generates a measurable signal
can be interfaced in this way. The Large Lakes Research
Station hopes to implement some of these as time and
resources permit.
26
-------
[Block diagram: the Technicon Auto-Analyzer (sampler, pump, manifold, colorimeter, and recorder) and the atomic absorption spectrometer (sampler and pump) each feed an interface to the PDP-8 analog multiplexer and analog-digital converter; the PDP-8 also drives a medium-speed line printer, a high-speed reader-punch, and a punched card reader.]
Figure 1
Twelve Channel Automated System
[Circuit schematic: the operational-amplifier interface between the analytical instrument and the PDP-8, with signal input, chassis ground, and common on the instrument side and output signal, signal ground, and common on the computer side. Component values: R0 = 10K, R1 = 50K, R2 = 20K*, R3 = 10K; AMP = Burr-Brown 3267/12C; E1 = 5 VDC. *Variable potentiometer.]
Figure 2
Analytical Instrument PDP-8 Interface
27
-------
[Graph: analyzer signal versus TIME, minutes, illustrating the baseline drift correction.]
Figure 3
Baseline Correction
-------
[Sample printout: for each run, a header gives the run number, instrument number, day, and time, followed by the total area (TA) and, for peaks 1 through 8, the instrument area, retention time (RET.T), peak height (HGT), and peak type (BV, VV, VB, or BB).]
Figure 4
On-Line Raw Data Output
29
-------
[Sample printout, analysis type 3: actual peak heights, actual standards, calculated standards, and concentration differences for each standard; the minimum and maximum detectable concentrations; the degree of the polynomial; the calculated concentration at zero peak height for the least squares and POLFIT routines; the correlation coefficient; the reduced chi square for the fit; and, for each unknown (run 7, instrument 2, day 11, time 18:36), the retention time, peak height, and calculated concentration, with results below the detection limit reported as "LESS THAN."]
Figure 5
Off-Line Calculated Concentration Output
-------
A TURNKEY SYSTEM: THE RELATIVE ADVANTAGES AND DISADVANTAGES
By D. Craig Shew
The primary objective of this paper is to discuss
some of the relative advantages and disadvantages of
automated laboratory instrumentation in terms of com-
mercially available turnkey systems based on our past
4 years' experience with an automated gas chromatogra-
phy/mass spectrometry (GC/MS) system. It is important
to point out that, in general, the people who either
operate or are responsible for EPA's mass spectrometry
laboratories are trained in organic or in analytical
chemistry rather than in data processing, per se. Thus,
this discussion will be from the viewpoint of the end
user of specific instrumentation as opposed to that of
one who is involved in general hardware and software
design.
About 4 years ago, EPA made a major commitment
to the automated GC/MS system by purchasing 23 more
or less similar systems at a cost of about $2.5 million. In
general, the GC/MS system provides an unequivocal basis
for the identification of organic compounds and, thus,
finds many uses in organic analytical chemistry. Specifi-
cally, EPA uses mass spectrometry for the identification
of organic pollutants originating from a wide variety of
sources. At this particular point in the evolution of auto-
mated instrumentation, the GC/MS system has proved to
be one of the most widely used, most successful
examples to date.
Figure 1 shows an overview of the system with its
three main components: a gas chromatograph, mass
spectrometer, and data system with the various input-
output devices. The system is controlled by a DEC
PDP-8 minicomputer which provides for control of the
mass spectrometer, data acquisition, data manipulation
and reduction, and data output. The entire system was
purchased from the Finnigan Corporation at a cost of
about $90 thousand. The data system was developed, in
part, and constructed by Systems Industries, a small cap-
tive subcontractor to Finnigan Corporation, at a cost of
about $55 thousand. Since that time, the Finnigan Cor-
poration has developed its own data system and is now
in direct competition with Systems Industries. As a
result, we are in the rather tenuous position of having to
depend on the Finnigan Corporation for service of the
mass spectrometer and having to depend on Systems In-
dustries for service and support of the data system. Al-
though this is an undesirable position from the stand-
point of service responsibility, no major problems have
arisen so far.
One of the first advantages of a commercially avail-
able turnkey system is the lower cost when compared to
the various alternatives. Development and update costs
are amortized over a large number of users, thus making
the turnkey system more economical. A major advantage
from the standpoint of the end user is that the system is
available for immediate use at a fixed cost. Thus, the
system's capabilities and limitations can be compared
with other techniques or with other commercially avail-
able systems.
Another major advantage for EPA's MS laboratories
is a well-organized, active users' group, a direct result of
having 23 similar systems within the Agency. Conse-
quently, a number of people have contributed to the
development of standardized operational and analytical
techniques. This has saved a substantial duplication of
effort in that development of similar techniques was not
required for various configurations of similar instrumen-
tation. In terms of day-to-day operations, the 23 systems
together have substantially lessened the degree of exper-
tise needed in any one laboratory because of support
from the other users. Additional advantages have result-
ed from having similar commercial systems and an active
users' group within the Agency. For example, data and
software changes can be easily analyzed or evaluated by
other laboratories with similar hardware. Similarly,
quality control is easier to set up and maintain.
One of the major disadvantages of the commercial-
ly available turnkey system is that one becomes highly
dependent on others for hardware service and software
updating and support. However, it might be well to
point out that primarily because of the commercially
competitive situation, about ten software updates and
revisions have been obtained at essentially no cost to us.
Hiring of outside contractors to make specific software
changes and additions in some cases has been a fairly
expensive proposition. A second disadvantage is that
software listings are generally unavailable to the com-
mercial users; consequently, even minor software
changes are difficult to accomplish. In our particular
case, we have been able to obtain software listings, but the
documentation has been so poor that the listings are
practically useless.
In considering the various alternatives to commer-
cially available turnkey systems, there are several options
31
-------
available. Some form of interagency agreement is proba-
bly the most attractive since generally the contract is
relatively simple, and there is no question of ownership
rights. Similarly, a system can be developed on a com-
mercial basis according to detailed specifications
outlined in a contract. Also, one can envision some sort
of in-house effort in which commercially available hard-
ware and software are assembled, or alternatively, a
more extensive effort, including the necessary research
and development, construction of hardware, and writing
of software. The latter case involves a multidisciplinary
effort and is generally beyond the capabilities of most
EPA laboratories.
In conclusion, our experience with a commercially
available GC/MS system has been very satisfactory. How-
ever, in other cases of specific types of automated instru-
mentation, the various alternatives to a turnkey system
would have to be considered on a point by point basis.
[Block diagram: the gas chromatograph (isothermal or temperature programmed), with a sample enrichment device, feeds the quadrupole mass spectrometer (integer resolution to 750 AMU); the minicomputer (data acquisition, reduction, and control) connects to a slow plotter (high quality graphics), a slow printer with keyboard (alphanumeric data), a dial-up telephone to a large computer (data base search), magnetic disk or tape (program and data storage), a cathode ray tube with keyboard (CRT) (fast graphics and alphanumerics), and fast hard copy of the CRT display.]
Figure 1
GC/MS System
32
-------
ONE APPROACH TO LABORATORY AUTOMATION
By Jack W. Frazer
It has been said often that an antiphilosophical
attitude exists in American society. While such attitudes
are considered normal, ours, it is said, is intensified as a
result of our bias towards action. The effects of our
bias towards action combined with our antiphilosophical
attitude are nowhere more apparent than in our efforts
to automate laboratories and chemical processes. Most
of this kind of automation has proceeded via the action
route with too little thought given to an understanding
of the many dimensions involved and how the associated
efforts might be accomplished most effectively.
A typical scenario is as follows: development of
vague ideas concerning instruments and the functions to
be automated, survey of the computer market, selection
and purchase of a computer and auxiliary hardware,
acceptance of the computer and accessories, and finally,
an attempt to build a suitable system without the aid of
specifications or well-defined plans. This approach has
resulted in many systems that have failed to produce
significant cost savings or improved scientific results.
Note that the usual procedure is based on action; that is,
purchase of a computer first without the aid of a com-
plete set of system specifications and plans (an unaccept-
able philosophy).
There are undoubtedly many philosophies that,
when put in practice, could result in the successful and
cost-effective implementation of online computer auto-
mation. Following is an outline of one philosophy as set
forth at the Lawrence Livermore Laboratory, which
is now being fully developed and documented by Com-
mittee E-31 of ASTM.7-10
One of the first tenets of this philosophy is: Com-
puters used for online automation of instruments and
processes are system problems. Often the computer pre-
sents the designers with the least difficult problems. If,
then, the computer is not always the central issue in the
development of computer automation, what are the im-
portant issues and dimensions? First, the designer must
recognize that an automated laboratory impacts many
aspects of management practices, chemical and physical
processes being automated, and the instrument opera-
tional procedures and characteristics. Therefore, infor-
mation from a number of people with different responsi-
bilities and expertise is required in order to properly
define and specify the desired characteristics of the pro-
posed automation.
Secondly, the designer should recognize that the
implementation of an automated laboratory is a problem
of many dimensions. Therefore, an interdisciplinary
team effort is required if the implementation is to pro-
ceed smoothly and in a cost-effective manner. Due to
the complexity of automation and the requirement for
multidisciplinary team action, larger automation projects
are difficult to manage. This is particularly true when
the implementation is undertaken with incomplete spec-
ifications and designs.
OPERATIONAL PROCEDURES
Since automation is recognized as a difficult and
complex undertaking requiring effort from many scien-
tific and engineering disciplines, why not attack these
projects as we do other difficult tasks; that is, separate
the variables and solve them one at a time? An opera-
tional procedure that has been field tested and found to
be effective is shown in Table 1.
Table 1
Operational Procedures
System definition (including a cost benefit
analysis)
System specifications
Functional design
Implementation design (hardware and software
selection)
System implementation
System evaluation
Documentation
System Definition
At the outset of an automation project, the respon-
sible scientist should write a brief tutorial description of
the proposed project aimed at those levels of manage-
ment responsible for funding. Therefore, it should
33
-------
contain only a brief description of the principal features
of the project and the anticipated benefits. For larger
systems, a schematic representation should be included.
Finally, when the system specifications and functional
design are complete, a cost benefit analysis should be
inserted into the system definition.
System Specifications
System specifications may be defined as a listing of
all the details necessary to direct the uninformed (but
knowledgeable) scientist in the construction, installa-
tion, and testing of a complex project. They are similar
in detail and extent to the specifications required to
build a complex instrument, dam, or factory. For auto-
mation projects, we have preferred to consider the spec-
ifications to exist in one of three domains: inputs,
outputs, and transfer functions. The following is a set of
definitions for these terms:
Inputs - any source of stimuli that causes a re-
sponse within the system
Outputs - actions taken by the system as a
result of stimuli
Transfer functions - those algorithms that in-
terconnect the system inputs and outputs.
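As a concrete illustration of the three domains (not part of the ASTM procedure itself), a specification item in each domain might be recorded as follows; the field names and entries are assumptions for the example.

    # Illustrative only: one way to record a specification item in each of
    # the three domains named above.

    from dataclasses import dataclass

    @dataclass
    class SpecItem:
        domain: str        # "input", "output", or "transfer function"
        name: str
        detail: str

    spec = [
        SpecItem("input", "operator halt command",
                 "halt a run on out-of-control conditions or equipment failure"),
        SpecItem("output", "weight report",
                 "net filter weights listed by date and station"),
        SpecItem("transfer function", "sampler control algorithm",
                 "interconnects sampler position inputs with advance commands"),
    ]

    for item in spec:
        print(f"[{item.domain}] {item.name}: {item.detail}")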
One of the better ways to understand system spec-
ifications is to study briefly a case history as published
in ASTM STP-578. Included below are examples taken
from one of these papers. The system has now been
delivered. Tables 2 and 3 contain a few of the input
specifications for one of four atomic absorption instru-
ments in the system.
Table 4 is an example of the output specifications for an
output report.
Figure 1 shows part of one transfer function spec-
ification for the atomic absorption instruments. It des-
cribes the control algorithm for the automatic samplers.
The above examples are only a small part of the
system specifications, which become formidable docu-
ments running to several hundred pages for large
systems.
Functional Design
A functional design is a schematic representation of
an automated system, including inputs, outputs, and the
interconnecting transfer functions. An example of a
functional design is shown in Figure 2 for Auto
Analyzers automated with the other instruments de-
scribed previously. Where there are severe system time-
response or bandwidth requirements, the required time-
response characteristics and data rates are listed on all
data paths.
Implementation Design
After the above three phases of work are complet-
ed, the designers begin the selection of specific hardware
and software as required to meet the design require-
ments. The cost effectiveness of various hardware-
software tradeoffs, as well as computer operating system
(software) characteristics, are assessed. With the aid of
the system specifications, it is a relatively straightfor-
ward procedure to select system components that will
meet performance requirements.
CONCLUSION
Given a good set of system specifications, an imple-
mentation design, and appropriate hardware and soft-
ware, it is usually a fairly straightforward task to
construct the computer automated system. However, no
system is really complete until it is fully documented
and evaluated. Evaluation should include not only the
proper set of tests to assure that the system meets design
specifications but also the tests that determine the
"boundary conditions" of the system. These include
such parameters as the system time-response and band-
width under various operating conditions.
REFERENCES
1 Tinder, Glenn, "Political Thinking," Little, Brown
and Co., Boston, 1974.
2 Frazer, Jack W., "Management of Computer Auto-
mation in the Scientific Laboratory," UCRL-
72162. Presented at Stored Program Controller
Symposium, Sandia Laboratory, Albuquerque, New
Mexico, September 23, 1969.
3 Frazer, Jack W., "Laboratory Computerization:
Problems and Possible Solutions." Presented at
ASTM Meeting, Philadelphia, Pennsylvania,
May 12, 1970.
4 Frazer, Jack W., "A Systems Approach for the Lab-
oratory Computer Is Necessary," Materials Re-
search Standards, vol. 12, no. 2 (1972), pp. 8-12.
34
-------
5 Frazer, Jack W., "Design Procedures for Chemical Automation," Amer. Lab., vol. 5, no. 2 (1973), pp. 39-49.
6 Frazer, J. W., Perone, S. P., and Ernst, K., "A Systematic Approach to Instrument Automation," Amer. Lab., vol. 5, no. 2 (1973), pp. 39-49.
7 Frazer, J. W. and Kunz, F. W., editors, "Computerized Laboratory Systems," ASTM Special Technical Publication 578, ASTM, 1975.
8 Frazer, J. W., Kray, A. M., Boyle, W. G., Morris, W. F. and Fisher, E., "The Need for Automated System Specifications and Designs," ASTM Special Technical Publication 578, ASTM, 1975, pp. 65-76.
9 Frazer, J. W., Perone, S. P., Ernst, K., and Brand, H. R., "Recommended Procedure for System Specification and Design: Automation of a Gas Chromatograph-Mass Spectrometer System," ASTM Special Technical Publication 578, ASTM, 1975, pp. 25-64.
10 Frazer, J. W. and Barton, G. W. Jr., "A Feasibility Study and Functional Design for the Computerized Automation of the Central Regional Laboratory, EPA Region V, Chicago," ASTM Special Technical Publication 578, ASTM, 1975, pp. 152-256.
Table 2
Some of the "Operator and File Inputs" Specifications for the Atomic Absorption Instruments
Inputs for Operator Interactions During the Course of a Run
1. Command to halt a run in the event of out-of-control conditions or equipment failure.
2. Command to restart the run in the event of an interruption. This would include:
(a) Sampler position.
(b) Identification of the starting solution.
3. Commands to set pause time and integration time when in the flame aspiration mode.
4. Analysis commands for the semiautomatic sampler mode of operation:
(a) S = standard, S1AN = sample standard 1, Nth reading.
(b) U = unknown, U1AN = sample unknown 1, Nth reading.
(c) B = baseline.
(d) A = average the replicates.
(e) E = erase; S2A2E = erase 2nd replicate run of standard 2.
(f) C = calculate; CI = interpolation; C1, C2 = first or second degree least squares fit;
CA = method of addition.
5. Special commands for the preprocessed sample mode of operation (for example, graphite furnace):
(a) INT! or PK! for integration or peak height value.
(b) PHAN!; read Nth peak (for example, PHA3! reads the third peak).
(c) U1AS1AN; unknown 1 + standard 1, Nth reading (for standard additions).
U1AS2AN
U5AS3AN; unknown 5 + standard 3, Nth reading.
6. Quality control commands:
(a) SC5 = check standard run as standard 5; SC5AN = check standard 5, Nth reading.
(b) SP5 = spiked unknown 5; SP5AN = Nth reading of spiked unknown 5.
7. Reagent blank commands:
(a) Y = use reagent blank values to correct results.
(b) N = do not use reagent blank values to correct results.
(c) ? = skip this value for now.
(d) AV = use average value of reagent blanks.
35
-------
Table 3
Some of the "Instrument Inputs" Specifications for the Atomic Absorption Instruments
Instrument Inputs
General
1. Dynamic range of signal: 10,000 for -5 V fullscale; 500 μV must be detectable.
2. Example of signal: Appendix I-K has an example of a typical AA signal.
PE 303
The PE 303 will be fitted with a PE DCR1 readout which will have the following electrical characteristics:
1. Signal characteristics:
(a) Internal sources: -5 V fullscale from a solid-state operational amplifier, output impedance of <10 Ω.
(b) Sample source: 0 to -5 V fullscale. Pin 9 of the interface board in the Model DCR1 readout.
(c) Reference source: -3 ± 1.5 V. The exact value is dependent upon the energy level of the source. Pin 11 of the interface board in the Model DCR1 readout.
(d) Noise: 50 mV of 1 μs pulses, dependent upon the noise-suppression switch setting. Time constants: 2 to 80 s in 5 settings.
2. Required filtering: analog low-pass filter, 4-pole rolloff; suggested at 0.1, 1, and 10 Hz.
Table 4
Example of an Output Report for an Electronic Balance in the Automated System
Example of Listings
/ = Explanations and comments. ! = Carriage return, line feed.
/ Underlined responses are operator's input.
? RUN SFC! / Call sample file controller.
SUBSCHEMA? FILTER!
? LIST GPDATE; GPST; EAST CHICAGO; FILTER; NETWT!
PASSWORD? LOGSDON!
/ Display the net weight of all the filters,
/ by date, from East Chicago.

GPDATE   GPST           FILTER    NETWT
740921   EAST CHICAGO   4571293   45.230 / grams
                        4571294   43.211
                        4571295   49.832
741021   EAST CHICAGO   5678912   48.213
                        5678913   48.214
36
-------
[Figure 1 (automatic sampler control algorithm), legend: X = present position (stored); Y = next position.]
-------
SOFTWARE COMPATIBILITY IN MINICOMPUTER SYSTEMS
By John O.B. Greaves
INTRODUCTION
Suppose that we wanted to build a Behavior Study-
ing Machine to observe and quantize the motions and
responses of various aquatic organisms. We would also
like to direct the machine to record the images of some
data, to display the images, and to manipulate these pic-
tures into some usable form. Then we would like to be
able to attach some statistical significance to the results.
Once the machine is built, the method of its construc-
tion may be deemed irrelevant until some other person
requests one like it. Or, if the machine breaks, as ma-
chines do, then the method of construction plays a sig-
nificant role in how much time it takes to repair it and
what training is required to do so.
To communicate our wishes and commands to the
machine, we develop a language called the Behavioral
Response Language or BRL. Since we wish to use the
machine in an interactive fashion with graphics capabili-
ties, the language should be interpretive rather than
translative. This language will be executed on the Behav-
ior Studying Machine. In a purely top-down approach to
the design of the machine, this language would be the
starting point. We could define layers of metalanguages
until, at the bottom, there would be some mechanical
and/or electronic hardware to execute the commands
from higher levels. With a purely bottom-up approach,
we could choose a logic line of integrated circuits or
relays and build upon that a nanoprograming language, a
microprograming language, a conventional machine lan-
guage, and on up until the BRL could be executed.
BUILDING A VIRTUAL MACHINE - THE BEHAVIOR
STUDYING MACHINE
In building a realizable system, the systems archi-
tect often employs a middle-out design, with good
reason. Being knowledgeable about the requirements of
the top and the possibilities at the bottom, the system
can grow in both directions through the levels of struc-
ture toward a realizable system. We set out to build the
Behavior Studying Machine without knowing what com-
puter would eventually perform the lower level opera-
tions. Because of the high data rates for storing images, a
memory and a disk had to be local. Because of the cost,
the computer had to be mini/midi. The FORTRAN IV
language was chosen as the primary metalanguage be-
cause reasonably standardized translators exist on many
machines; for example, there are several "virtual FOR-
TRAN" machines on the market. For another reason,
FORTRAN IV is a translative rather than an interpretive
language and gives us a run time speed advantage. While
our language, the Behavioral Response Language, is in-
terpretive, the modules it executes are translated from
FORTRAN to a lower language prior to run time and,
therefore, they need not be scanned for lexical and
syntactical errors. To achieve ultimate speed advantage,
we could provide a translator to translate our metalan-
guage directly into microcode for the host computer,
but perhaps the time payoff would be long in coming.
To assist in describing the Behavior Studying Ma-
chine, we will provide the following top-down sketch.
First, the keyboard function names include
INPUT < >, PLOT < >, FIND
PATHS < >, SQUARE < >, EDIT < >,
SUM < >, and LIV (for Linear Velocity). These
modules are coded as FORTRAN subroutines and pro-
vide the basic elements of the BSM. Thus, when new
functions are added to the system, they may be called
by name and may receive their keyboard arguments in a
labeled common block that has been scanned, decoded,
and placed into fixed alphabetic and numeric fields.
These modules, in turn, interface with the next lower
level via subroutine calls to the file handlers. The subrou-
tines are described as follows: BCREAT to create a new
file, BOPEN to open (or try) an existing file, BCLOSE to
close a file, GETV to get a vector (a variable length
record) from a file, and WRITEV to write a vector into a
file. At this level, if the programer only abides by the
rules of the interface, the entire business of managing the
buffer space becomes transparent. Thus, all existing and
proposed keyboard functions are machine-independent.
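As a rough rendering of the interface contract just described: the five entry points are named in the text, but the in-memory Python stand-in below is an assumption, meant only to make the layering visible.

    # Sketch of the machine-independent file-handler interface described
    # above. BCREAT, BOPEN, BCLOSE, GETV, and WRITEV come from the text;
    # this toy in-memory rendering stands in for the FORTRAN originals.

    class FileHandler:
        def __init__(self):
            self._files = {}          # name -> list of vectors

        def bcreat(self, name):
            """Create a new file."""
            self._files[name] = []

        def bopen(self, name):
            """Open (or try to open) an existing file; True on success."""
            return name in self._files

        def bclose(self, name):
            """Close a file (a no-op in this in-memory stand-in)."""

        def writev(self, name, vector):
            """Write a vector (variable-length record) into a file."""
            self._files[name].append(list(vector))

        def getv(self, name, index):
            """Get a vector from a file."""
            return self._files[name][index]

    fh = FileHandler()
    fh.bcreat("paths")
    fh.writev("paths", [1.0, 2.5, 3.75])    # records may vary in length
    fh.writev("paths", [4.0])
    print(fh.getv("paths", 0), fh.getv("paths", 1))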
The file handlers are therefore conceived as the next
level down, but they, too, are coded in FORTRAN. The
file handlers make calls upon the next lower level in two
forms, the buffer managers and the operating system file
subroutines. The first of these is made up of buffer man-
ager modules. They perform the virtual memory-like
function of swapping in and out sections of the disk-
based file data as requested. Typical calls to the buffer
manager modules are SEARCH to search for a vector,
38
-------
HILOV to determine the highest and lowest vector resi-
dent in the buffer, and GETBUF & RELBUF to get and
to release buffer space respectively. Typical operating
system calls are OPEN, CLOSE, RDBLK and WRBLK to
read and write blocks on the disk. The buffer manager
routines are also coded in FORTRAN. They allocate and
deallocate another labeled common block dimensioned
at about eight thousand words to allow that many inte-
gers or half as many real data elements resident at any
given time. They will execute these routines on any
"virtual FORTRAN" machine. Calls to the operating
system file handlers are both machine-dependent and
operating-system-dependent. These sections must be
suitably interfaced to the operating system of the host
computer, which we found to be a nonmonumental task.
The primitive operations are the same; for example,
open, close, read, and write.
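The virtual memory-like behavior of the buffer managers can be suggested with a small fixed pool of slots; the pool size and oldest-first eviction below are assumptions for the sketch, not the original FORTRAN logic.

    # Illustrative stand-in for the buffer-manager layer: a fixed pool of
    # buffer slots swapped in from "disk" on demand, oldest-first.

    class BufferManager:
        def __init__(self, disk, slots=4):
            self.disk = disk          # block number -> data
            self.slots = slots
            self.resident = {}        # block number -> data now in core
            self.order = []           # arrival order, for eviction

        def fetch(self, block):
            """Return a block, swapping it in (and another out) if needed."""
            if block not in self.resident:
                if len(self.resident) >= self.slots:
                    oldest = self.order.pop(0)
                    del self.resident[oldest]     # release buffer space
                self.resident[block] = self.disk[block]
                self.order.append(block)
            return self.resident[block]

    disk = {n: f"data-{n}" for n in range(10)}
    bm = BufferManager(disk, slots=2)
    for n in (0, 1, 2, 0):
        print(n, bm.fetch(n), sorted(bm.resident))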
At the lowest level, assembly language programing,
we have two classes of subprograms. The first class in-
volves augmenting the FORTRAN language with three
primitive subprograms: (1) the logical LSTEQ operator
to compare two N character strings to determine if they
are the same (.TRUE.) or different (.FALSE.),
(2) UNPACK to unpack two bytes into two integer
words, and (3) PACK to pack two bytes into one integer
word; all three are sketched at the end of this section. All
three subprograms were also coded in FORTRAN, but the assembler versions were used for aes-
thetic reasons and for the checkout of the FORTRAN-
assembler interface. The second class of assembler sub-
programs exist as software drivers for the special-purpose
video to digital converter, the "Bugwatcher." The Bug-
watcher is connected to the computer via a direct
memory access (DMA) channel, for speed. These sub-
programs have well-defined machine-independent inter-
faces. They are: BWSEND to send a command word to
the Bugwatcher to set the input frame rate or the video
threshold, to reset the hardware, or initiate or terminate
the transfer of data from the Bugwatcher; BWSETS to
set up for a Single buffering operation; BWSETD to set
up for a Double buffering operation; BWONS to turn on
the DMA channel for a Single buffer; BWOND to turn
on the channel for Double buffering; BWWAIT to wait
for the buffer to fill and interrupt the processor; and
BWOFF to disable the interruption and terminate the
DMA transfer. The simpler single buffering programs are
used for initial checkout and for the LIVE keyboard
function to check that all hardware-software systems are
working properly. The double buffering subprograms are
used to transfer data from the DMA input to disk for
later retrieval and analysis.
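A hypothetical Python rendering of the three FORTRAN-augmenting primitives follows; the 16-bit word size and byte order are assumptions in keeping with the PDP-11 class machines discussed here, not details given in the paper.

    # Hypothetical rendering of LSTEQ, PACK, and UNPACK. Word size and
    # byte order are assumptions for the sketch.

    def lsteq(a: bytes, b: bytes) -> bool:
        """LSTEQ: compare two N-character strings for equality."""
        return a == b

    def pack(lo: int, hi: int) -> int:
        """PACK: pack two bytes into one 16-bit integer word."""
        return (hi & 0xFF) << 8 | (lo & 0xFF)

    def unpack(word: int) -> tuple[int, int]:
        """UNPACK: unpack one 16-bit word back into two bytes."""
        return word & 0xFF, (word >> 8) & 0xFF

    w = pack(0x41, 0x42)
    print(hex(w), unpack(w), lsteq(b"BUG", b"BUG"))   # 0x4241 (65, 66) True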
THE LOWER LEVELS
We are currently implementing the Behavioral Re-
sponse Language on three separate virtual machines: the
DOS-9 operating system on a PDP 11/45, the RDOS
operating system on a Data General ECLIPSE S/200
and, most recently, the RT-11 operating system on the
same PDP 11/45. To bring the software up on the
ECLIPSE RDOS operating system is roughly a one to
two mythical man-months (MMM) effort with another
month for hardware connections, cabling, interfacing,
and checkout. The time estimated to change operating
systems on the DEC PDP 11/45 is one MMM, which
seems reasonable with the progress seen to date.
CONCLUSION
One design criterion has been to bridge the gap
between the interpretive Behavioral Response Language
and the specially designed hardware, the Bugwatcher,
using software as portable as possible. To accomplish
this, we developed a structure of layered software, each
with a well-defined interface that could be implemented
with relative ease on most minicomputers. Software,
being pliable only in its formative stages, can be
shaped for this purpose. A more complete description of
the prototype "Bugsystem" can be found in the follow-
ing references list.1,2
REFERENCES
1 Davenport, D., Culler, G.J., Greaves, J.O.B., Forward,
R.B. and Hand, W.G., "The Investigation of
the Behavior of Microorganisms by Computerized
Television," IEEE Trans. Biomed. Eng., Vol.
BME-17, July 1970, pp. 230-237.
2 Greaves, John O.B., "The Bugsystem: The Software
Structure for the Reduction of Quantized Video
Data of Moving Organisms," Proc. of the IEEE, Vol.
63, No. 10, October 1975.
39
-------
SUMMARY OF DISCUSSION PERIOD - PANEL I
The question and answer session was brief as a result of the extended paper presentation session. A summary of the
discussion is presented below.
Mini- and Microcomputers Versus Large Computer Systems
When asked whether minicomputers and microcomputers would eliminate the need for large-scale data base systems, the
panel felt strongly that they would not. With the introduction of the low cost microprocessor, even the lowest cost
instruments in the laboratory, including the pH meter, will inevitably be automated. The increased automation will produce
machine-readable information. There will be an increased requirement to transfer the information from facility to facility and
to store the data in a large data base for subsequent retrieval and analysis.
Laboratory Automation Serves The Analyst
Spontaneous laughter followed when the panel was asked if laboratory automation systems were for the benefit of the
"Boss" to keep track of what his people were doing at the bench and were just a waste of the analyst's time. The responses
were varied, but all contained the common theme that all of the laboratory automation systems observed in operation by the
panel members served the analyst. The analyst was assisted in data reduction and in data handling, and quality assurance
increased with laboratory automation systems.
Training
The panel members were asked about the difficulty in training personnel in utilizing laboratory automation systems that
they had implemented. The members agreed that the training process involved few difficulties. Generally, the personnel
utilizing laboratory automation systems were briefed on the flow of information from the instrument to the printed output.
This indoctrination was followed with on-the-job training with satisfying results. In designing one system, the user helped
define what interaction would be necessary for optimum system/user interaction. Another system contains online assistance
in the form of "Help" commands which the user may invoke at any time.
Computers for Research
The question that evoked the most discussion was whether ORD was becoming a computer-oriented organization rather
than an environmental research organization. The panel response was an emphatic "No." It is very difficult for EPA to
complete its mission without getting more heavily involved in utilizing laboratory automation systems because of current
manpower limitations. The computer will be used as a tool just as the atomic absorption spectrometer is commonly used;
ORD is not being accused of being an "instrument house" because it uses atomic absorption instruments. According to
Frazer of the Lawrence Livermore Laboratory, EPA is the leader for the country in laboratory automation systems, both in the
equipment it has acquired and in the systems it is implementing. The techniques ORD is implementing are forerunners of
those that even municipalities will be using as the proliferation of microprocessors continues. It was suggested that the
systems will be available for 10 to 20 percent of today's developmental cost.
Laboratory Feasibility Studies
Given that feasibility studies consume considerable resources, the panel was asked when the laboratory manager
performs a feasibility study. The panel response to this very important question was less than adequate, mainly because there
is no logical set of rules to follow to determine when a feasibility study is to be performed. A study could be included as part
of the ongoing effort in the development of project plans which are formed to meet EPA's mission, or the manager may find
that there is no way to complete his project without the aid of a laboratory automation system.
40
-------
Laboratory Instrumentation
The question was raised as to which laboratory instruments lend themselves to automation with microprocessors. It was
the opinion of the panel that with the advent of the microprocessor, no instrument in the laboratory would be excluded from
automation.
Standardized Laboratory Automation System
The final question inquired of the panel whether EPA should have a standardized laboratory automation system. It
seemed inconceivable to the panel members that there would be a standard automation system since each of the laboratories
has such diverse research objectives. The computer industry is changing and the standardization of hardware, software, and
especially hardware interfaces may be realized in the future, but EPA, as an enforcement agency, cannot afford to
wait for that standardization.
41
-------
LABORATORY DATA MANAGEMENT
By William L. Budde
Data management is a term that has many different
meanings. To the accountant, it means payroll and in-
ventory control. In the environmental field, data man-
agement could mean archival storage of environmental
data, trend analyses and statistics, environmental quality
indexes, mathematical modeling, or a host of other
activities.
For purposes of this discussion, data management
has a limited meaning. We will be concerned with the
management of information related to the operations of
a laboratory that makes chemical, physical, and biologi-
cal measurements on environmental samples. In other
words, our concern is limited to the computerized han-
dling of information about samples that are currently in
process in a laboratory.
Laboratory data management (LDM) could start
several months before sampling begins when a project
manager defines sampling sites, sampling dates or times,
and analytes to be measured. Subsequent entries may
include project/sample numbers, maximum allowable
holding times, values from field measurements, and a
variety of other information. At any time, management
may require a laboratory workload projection based on
all current and expected samples. After samples are re-
ceived at the laboratory, daily work lists may be generat-
ed for specific analytes with references to potential in-
terferences; for example, a particularly high trace metal
concentration could be noted on the nutrients work list.
As measurements are completed, data must be sorted by
project, interim status reports generated, and final
reports printed. At periodic intervals, management may
require other reports to summarize quality control or
allocate operations costs.
A traditional approach to LDM is manual entry of
all information into a computerized data base located at
a large data processing center. Manual entry includes
automatic but offline transfers, such as punched card
input, punched paper tape input, magnetic mark/print
sensing, and keyboard entry from a time sharing com-
puter terminal. One of the principal advantages of this
mode of operation is that it permits the utilization of
large computer systems with massive memories and a
variety of peripheral equipment, including high-speed
printers, plotters, and microfilm/microfiche output. In
addition, in recent years, general purpose data base man-
agement software has become available for use on
medium- to large-scale computer systems. The disadvan-
tages of this approach include the potentially significant
error rate associated with manual data entry systems and
the relatively slow turnaround times that are standard on
many batch computing systems. Of course, careful verifi-
cation of manually entered data and fast batch turn-
arounds are possible, but usually at a significantly higher
cost. Several other panelists have described their opera-
tions and experiences with the traditional approach to
LDM.
An alternative to the traditional approach to LDM
is local data management with a minicomputer. This is a
viable choice because of the development of relatively
inexpensive minicomputers for data acquisition, data re-
duction, and control of analytical instruments.
Instrument automation permits a substantial im-
provement in quality assurance by transferring instru-
ment output signals to the computer via direct electrical
connections. The signals are processed in digital form to
generate measured values. These operations eliminate
errors from hand measurements of peak heights or areas,
hand or desk calculator computations, manual transcrip-
tions, and coding in computer readable form. The effi-
ciency of the whole process permits the incorporation of
many quality assurance checks as an integral part of the
instrument operating system. Typical quality checks
include frequent analyses of check standards, replicates,
spikes, blanks, and reagent blanks. These data may be
assessed rapidly by statistical methods and the accuracy
and precision compared with established accuracy and
precision data for that method, instrument, and
operator.
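A minimal sketch of such a check standard assessment appears below; the control limits, the three-sigma rule, and the readings are all assumed values for illustration.

    # Minimal sketch of the statistical quality check described above:
    # assess check-standard results against established accuracy and
    # precision for the method. Limits and readings are assumed values.

    from statistics import mean, stdev

    def assess(readings, expected, established_sd, k=3.0):
        """Flag a run whose check standards drift or scatter beyond k-sigma."""
        m, s = mean(readings), stdev(readings)
        bias_ok = abs(m - expected) <= k * established_sd
        precision_ok = s <= k * established_sd
        return m, s, bias_ok and precision_ok

    readings = [55.0, 55.3, 54.9, 55.6, 55.1]   # check standard, true value 55.0
    m, s, ok = assess(readings, expected=55.0, established_sd=0.3)
    print(f"mean={m:.2f} sd={s:.2f} in control: {ok}")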
It is very attractive to develop an LDM system
using a computer in the laboratory that has direct access
to the online acquired data. This approach has the ad-
vantage of preserving the quality of the data since no
manual transfers would be required. Another advantage
is that fast turnaround times would be possible at rela-
tively low cost. Few, if any, LDM systems have been
developed that coexist with laboratory instrument auto-
mation systems. However, a system of this type is under
development by EPA and several panelists at this session
have discussed progress to date.
42
-------
There are three principal options for local LDM:
(1) background processing during the day in the labora-
tory automation computer, (2) overnight processing in
the laboratory automation computer, and (3) the em-
ployment of a second minicomputer in the laboratory
for data management. The second processor would have
access to the laboratory data via a shared mass storage
device or a high-speed data channel. Figure 1 shows a
proposed laboratory computer network for instrument
automation, LDM, and subsequent transfer of data to a
large computer for non-LDM operations. The large com-
puter system is a traditional data processing center.
[Network diagram: a computer serving a group of slow instruments (AA, ES, UV-VIS, TOC, IR, etc.), a minicomputer serving one fast instrument (GC-MS), a minicomputer serving one fast instrument (FTNMR), and a minicomputer serving a group of slow instruments (multiple GC) all feed a computer for management of data about samples currently in process in the laboratory (load projections, work lists, status reports, project reports), which in turn feeds a large computer for management of data on finished samples (archival storage, trend analyses, quality indexes, mathematical models).]
Figure 1
Laboratory Computer Network
43
-------
SUSPENDED PARTICULATE FILTER BANK AND SAMPLE TRACKING SYSTEM
By Thomas C. Lawless
INTRODUCTION
The National Air Surveillance Network (NASN)
was established in 1953 by the Public Health Service in
cooperation with State and local health departments.
Today there are approximately 300 urban and nonurban
sampling stations operating as part of the NASN. Part of
the network monitoring activity is devoted to sampling
total suspended particulate matter. These samples are
obtained with a high volume sampler which draws air
through an 8-by 10-inch glass-fiber filter. The sampling
schedule used by the network calls for the collection of
one 24-hour sample each 12 days, or approximately 30
samples a year. The Filter Bank System was developed
to act as an inventory system for monitoring the status
of samples during and after analysis, i.e., storage.
FILTER BANK SYSTEM
Exposed filters from each sampling site are sent to
its respective regional office, which, in turn, forwards
them to the Environmental Monitoring and Support
Laboratory at the Environmental Research Center in
North Carolina on a monthly or quarterly basis.
Attached to each filter folder is an Air Quality Data
Bank form for particulate data (Figure 1). This form
contains the filter number, the sampling date, the air
volume, and the 12-digit SAROAD (Storage and Re-
trieval of Aerometric Data) station code depicting the
exact location of the sampling site. SAROAD is a data
storage system which is part of the Aerometric and
Emissions Reporting System (AEROS). The Filter Bank
System requires that all filters be validated, e.g., checked
for tears and so forth, and that all filter cards be checked
for completeness of sampling information. Sub-
sequently, these filters are entered into the Filter Bank.
The Filter Bank is a data processing system
developed using SYSTEM 2000* on the Univac 1110. It
uses the Immediate Access, the Report Writer, and the
Program Language Interface aspects of SYSTEM 2000.
Use of SYSTEM 2000 allows specification without
restriction of elements in the data base which are key
fields and identification of hierarchical relationships
among elements in the data base. Information retrieval is
dependent upon components which are declared key
fields. Data security is maintained by password control
* SYSTEM 2000 - MRI Systems Corporation.
to the data base and additional password control to each
component. The Filter Bank System is centered around
a data base designed to use the standard SAROAD
coding procedures. There is a logical entry (tree struc-
ture) for each site in the NASN. As shown in Figure 2,
data sets, which are subordinate to each site, contain the
filter information. Associated with each of these data
sets is another group of data sets containing the
pollutant, method, and units information and its specific
analytical result.
DATA ENTRY
Filter information is entered into the Filter Bank
through an interactive terminal by using a prompting pro-
cedure. This procedure contains a SYSTEM 2000
FORTRAN interface program which allows data entry
by non-ADP personnel. It also provides the editing capa-
bilities for the Filter Bank System. The site code,
sampling date, filter number, filter type, and air volume
are entered. Before the next entry is possible, the Filter
Bank System checks for a valid site code, a valid date,
and for completeness of entry. The entry is rejected if
these requirements are not met, and a short description
of the error is displayed on the terminal. The entry is
also rejected if the site/date combination is not unique
to the Filter Bank. If the error is obvious, the correct
information can be reentered immediately and when
accepted, the system replies by printing the 4-letter code
it is assigning to this particular filter. Use of a 4-letter
coding scheme can accommodate over 400,000 filters.
(The system is easily modified to a 5-letter code which
will accommodate over 10 million filters.)
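The capacity arithmetic is straightforward (26 to the fourth power is 456,976; 26 to the fifth is 11,881,376). A sequential generator along the following lines is one plausible rendering; the actual SYSTEM 2000 procedure is not listed in the paper.

    # Illustrative sequential letter-code assignment. 26**4 = 456,976
    # codes, the "over 400,000 filters" capacity quoted above; widening
    # to 5 letters gives 26**5 = 11,881,376. The base-26 encoding is an
    # assumption about how such codes might be generated.

    from string import ascii_uppercase

    def filter_code(n: int, width: int = 4) -> str:
        """Map a running filter count to a fixed-width letter code."""
        letters = []
        for _ in range(width):
            n, r = divmod(n, 26)
            letters.append(ascii_uppercase[r])
        return "".join(reversed(letters))

    print(26 ** 4, 26 ** 5)                 # 456976 11881376
    print(filter_code(0), filter_code(27))  # AAAA AABB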
The update session is terminated by executing
another SYSTEM 2000 procedure. The first part of this
procedure is a program which produces an update log
listing the site and date of each accepted filter. The
program also produces a label to be attached to each
filter folder, and this label contains the information on
the data card together with the unique 4-letter code.
Subsequently, labeled filters are stored until analysis.
The second part of this procedure is a SYSTEM 2000
Immediate Access program which backs up the Filter
Bank onto a magnetic tape to protect against loss due to
system or machine failure. This precaution was devel-
oped out of necessity during the early days of the
Univac.
44
-------
SAMPLE ANALYSIS
For the analysis of nitrates, sulfates, and ammonia,
a portion of each filter is cut and sent to the laboratory
accompanied by its assigned unique 4-letter code. There,
sets of samples are arranged for the Technicon Auto
Analyzer so that there are two complete sets of
standards, a standard after every tenth sample, and a
series of blanks. Periodically, quality audit samples
(previously analyzed samples) are included. Upon
completion of the analyses, the results of each set of
samples, standards, and blanks, in micrograms per
milliliter, with their associated filter codes, are tran-
scribed onto a specially designed form. In the event of
dilution of a particular sample, the dilution factor is
coded. Likewise, if a color analysis was performed, this
result is also coded. Then the Filter Bank System accepts
these entries, separates them into data samples and
quality assurance audit samples, and processes them
accordingly. The system produces a listing of the data
samples with final concentrations expressed in micro-
grams per cubic meter, calculated by using the
previously stored air volumes. A statistical summary of
the standards, blanks, and audit samples accompanies
each tabulation. These listings are returned to the labo-
ratory for validation and determination of acceptability
of the data (Figure 3). Using this summary and informa-
tion from previous standard analyses, an analytical data
quality indicator is presently being developed which will
quantitatively determine the acceptability of the
analysis.
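The exact conversion formula is not given in the paper; the sketch below assumes the analyzer result in micrograms per milliliter applies to a known extract volume (an assumption), with the coded dilution factor applied before dividing by the stored air volume.

    # Sketch of the concentration calculation described above: analyzer
    # results in ug/mL of extract are converted to ug/m3 of air using the
    # stored air volume. The extract volume and dilution handling are
    # assumptions, since the paper does not give the exact formula.

    def final_concentration(ug_per_ml, extract_ml, air_volume_m3, dilution=1.0):
        """Total mass in the analyzed extract divided by the air sampled."""
        total_ug = ug_per_ml * dilution * extract_ml
        return total_ug / air_volume_m3

    # One filter: 18.20 ug/mL in an assumed 50 mL extract, 2,254 m3 of air.
    print(f"{final_concentration(18.20, 50.0, 2254.0):.2f} ug/m3")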
For the analysis of metals, using the Optical
Emissions Spectrograph, quarterly composite samples
are prepared using information supplied by the Filter
Bank System. A SYSTEM 2000 FORTRAN program
prepares this information on site quarters which have
met established criteria regarding number and spacing of
filters. The filters which make up the valid sample and
the average air volume for the composite are listed. The
program also sets the composite flag of each sample in
the composite for future reference. Upon completion of
the laboratory analysis, a magnetic tape containing the
results is passed on to the Filter Bank System. The tape
is processed and a data tabulation is produced showing
duplicate analytical results, the average result, and the
percent relative standard deviation for all metals of each
site quarter.
The Filter Bank System then stores the date of
analyses and the sample results (nitrates, sulfates,
ammonia, and metals) using the standard SAROAD
formats for pollutant, method, and units with the other
filter information (Figure 4). In addition to storing all
the information in the Filter Bank, the system produces
a file of SAROAD formatted records to be passed on to
the National Aerometric Data Bank.
REPORTING
Periodically, computer printouts are produced to
inform the laboratory of filters or specific analyses that
were inadvertently overlooked or lost in processing. This
report also includes quarterly composite samples which
became valid and eligible for analysis due to late arriving
filters. This safeguard assures that all required analyses
are performed on all filters received. The Filter Bank
System inventories the data base and lists, by site, all
dates for which filters have been received, those filters
which have been analyzed, and those pollutants which
have been analyzed for, as well as analytical results with
yearly averages.
Various information retrieval techniques are pos-
sible. The Immediate Access feature of SYSTEM 2000
provides a user-oriented language with which a nonpro-
gramer may express his request for Filter Bank
information. This feature is highly suited for interactive
use from a remote keyboard. SYSTEM 2000 allows
access to the data base by multiple users simultaneously.
Using this feature, one can determine the status of any
filter, the analyses which have already been done, and
the results of these analyses.
The following examples demonstrate the usefulness
of the Immediate Access feature. The response time for
each of these examples varies with machine load but
averages less than 10 seconds. Component numbers
correspond to those in Figure 2.
(1) To print the filter information for which
either the site code and date are known or the
4-letter filter code is known:
PRINT SAMPLE WHERE SITE EQ
056980004A01 AND INDATE EQ 12/25/74:
210* HIV
220* 7
230* 12/25/1974
240* 0
250* 1044964
260* AIVR
270* 3063
280* 1
45
-------
PRINT SITE, STATE, CITY, SAMPLE
WHERE FILCOD EQ AIVR:
1*
2*
3*
056980004A01
CALIFORNIA
SAN JOSE
210* HIV
220* 7
230* 12/25/1974
240* 0
250* 1044964
260* AIVR
270* 3063
280* 1
(2) To print the pollutant information for a site/
date combination.
PRINT POLUT WHERE SITE EQ
056980004A01 AND INDATE EQ 12/25/74:
310* 12306
320* 92
330* 1
340* 9.1500
350* 2
360* 04/29/1975
310* 12403
320* 91
330* 1
340* 3.0000
350* 1
360* 04/29/1975
310* 12301
320* 92
330* 1
340* .6900
350* 2
360* 04/29/1975
(3) To print the date of filters which have not
been analyzed for a site:
PRINT INDATE WHERE SITE EQ
056980004A01 AND VALUE FAILS:
230* 01/06/1975
230* 01/18/1975
230* 01/30/1975
230* 02/11/1975
230* 02/23/1975
230* 03/07/1975
230* 03/19/1975
230* 03/31/1975
230* 04/12/1975
230* 04/24/1975
230* 05/06/1975
230* 05/18/1975
230* 05/30/1975
230* 06/11/1975
230* 06/23/1975
230* 07/05/1975
230* 07/17/1975
230* 07/29/1975
230* 08/10/1975
230* 08/22/1975
230* 09/03/1975
230* 09/15/1975
230* 09/27/1975
(4) To print the dates and filter codes for a site:
PRINT INDATE-FILCOD WHERE SITE EQ
056980004A01 AND INDATE LT 01/01/75:
230* 12/06/1973
260* ACGC
230* 12/18/1973
260* ACGD
230* 12/30/1973
260* ACGE
230* 01/11/1974
260* ACGF
230* 01/23/1974
260* ACGG
230* 02/04/1974
260* ADXB
230* 02/28/1974
260* ADXC
230* 03/12/1974
260* ADXD
230* 03/24/1974
260* ADXE
230* 04/17/1974
260* ADXF
230* 04/29/1974
260* ADXG
230* 05/11/1974
260* ADXH
230* 05/23/1974
260* ADXI
46
-------
[Reproduced form: National Air Surveillance Network Air Quality Data Bank record for particulate data (24-hour or greater sampling), with fields for station name and site location; station, agency, project, and time codes; sampling rate (m3/min) and sampling time (min); filter number; and card columns (23-36, 37-50, 51-64, 65-78) for particulate, NO3, SO4, and NH4 values in µg/m3, each with pollutant code, method, unit, and decimal-place (DP) entries; and blanks for filter, sample, and air-volume figures. EPA (DUR) 295, Rev. 5-74.]
Figure 1
Air Quality Data Bank Form
DATA BASE
1* SITE (Key)
2* STATE
3* CITY
20* SAMPLE (Repeating Group)
210* FILTER TYPE (Key)
220* SAMPLE INTERVAL
230* SAMPLE DATE (Key)
240* START HOUR
250* FILTER NUMBER
260* FILTER CODE (Key)
270* AIR VOLUME
280* COMPST (Key)
30* POLLUTANT (Repeating Group in Sample)
310* POLLUTANT CODE (Key)
320* METHOD OF ANALYSIS
330* UNITS
340* VALUE (Key)
350* NO. DECIMAL PLACES
360* DATE OF ANALYSIS
Figure 2
Data Base Structure
47
-------
[Computer printout: instrument calibration standards; a set of repeated standards with their mean (55.11), standard deviation, and range; blank samples; quality audit samples with results and differences; and a sample table listing filter code, concentration (µg/ml), color and background corrections, air volume, and concentration (µg/cm), with duplicate and triplicate analyses where run.]
Figure 3
Data Listing
48
-------
1* 010380003A01
2* ALABAMA
3* BIRMINGHAM
210* HIV
220* 7
230* 01/11/74
240* 00
250* 1039580
260* AAAB
270* 2045
280* 1
210* HIV
220* 7
230* 01/23/74
240* 00
250* 1039579
260* AAAC
270* 2351
280* 1
310* 12306
320* 92
330* 01
340* 2.73
350* 2
360* 10/07/74
310* 12403
320* 91
330* 01
340* 11.0
350* 1
360* 10/07/74
Figure 4
Logical Entry
49
-------
EIGHT YEARS OF EXPERIENCE WITH AN OFF-LINE
LABORATORY DATA MANAGEMENT SYSTEM USING THE CDC 3300 AT
OREGON STATE UNIVERSITY
By D. Krawczyk
ADP is the acronym for automatic data processing.
Experience with ADP in chemistry laboratories indicates
that considerable labor can be saved through its proper use. At the
last ORD ADP workshop, Byram et al. reported on one
facet of ADP: the combination of an automated colori-
metric system with a computer. In addition, this writer
discussed SHAVES, a broad aspect of data manage-
ment. Whether printouts, paper tapes, or published
reports are used to report data, the important products
from an analytical laboratory are valid results. Let me
quote from Rudyard Kipling:
"The careful text-book measure
(Let all who build beware!)
The load, the shock, the pressure
Material can bear.
So when the buckled girder
Lets down the grinding span,
The blame of loss or murder,
Is laid upon the man
Not on the stuff-the Man"4
As Kipling points out in the verse, blame for a
fallen bridge cannot be placed on education, loading,
shock, or other known variables but on the builder-
designer. The number of labor saving devices used in the
laboratory is unimportant. If the answer is invalid, then
man bears the responsibility. Eight years of experience
at the Corvallis laboratory have resulted in some novel
approaches to handling data, especially in quality as-
surance aspects.
The availability of the Oregon State University
computer, literally across the street, provided the labora-
tory group with an experimental tool to aid in handling
data. Initially, ADP use followed the sample control and
verification principle reported by Krawczyk et al. As
demands for flexibility and responsiveness grew and personnel
limitations tightened, greater emphasis was placed on the
use of ADP as a tool. Getting the job done for the least
cost resulted in using automated chemical analytical
systems connected with ADP. During the 8 years of our
experience with ADP, the need arose each month for
changing or modifying one or another subprogram to
improve quality assurance of data. Not only has ADP
permitted the laboratory to respond more effectively
but also has made the job easier. The first and foremost
purpose of our use of ADP is the production of quality
data. From collection of samples to final reporting of
data, quality assurance is a way of operation within the
Corvallis laboratory.
The 8 years of experience began modestly with
approximately 6,000 samples and 40,000 analyses from
six to seven projects per year. Graphical representations
of projects, samples, and results since 1972 are shown in
Figures 1, 2, and 3. What was accomplished 8 years ago
in 6 months was exceeded in 1 month in calendar 1973,
1974, and 1975. In 1972, the workload was performed
with a staff of 17 permanent employees. In calendar
1975, the staff was reduced to 12 permanent people
with increased use of temporary employees to ac-
complish specific short-term tasks. The shift from
permanent to temporary staff required further refinements
in use of ADP, especially from the quality assurance
aspect.
One illustration of a change made to provide more
rapid response without sacrificing quality is the modification
of the scheduling subroutine. During the last 3 years, the
computer was used to schedule the analysis of samples
for forms of nitrogen and phosphorus through our auto-
mated subroutine. Initially, the computer produced a
run with samples scheduled for analysis ordered as the
samples were brought into the laboratory. Thus, fresh
waters were mixed with marine waters; waste-water
treatment plant effluent samples preceded and followed
base water used in bioassay experiments. From a com-
puterized standpoint, first in, first out, was good man-
agement; however, the technical problems generated by
such an approach were difficult to handle. Very quickly
the scheduling was modified to permit choice of samples
by project designation. Project assignment was usually
made based on type of sample. The analyst chose his
mixture of samples and processed similar types at one
time. The procedure of first in, first out, was then
changed to permit combining similar samples into a
production run.
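A minimal sketch of the revised scheduling logic, written in Python for illustration only (the sample records and project names are hypothetical): within each project group the original first-in, first-out order is preserved, but similar samples are now gathered into one production run.

    from collections import OrderedDict

    def schedule_runs(samples):
        # Group samples by project designation, keeping arrival
        # order within each group, so similar types run together.
        runs = OrderedDict()
        for s in samples:
            runs.setdefault(s["project"], []).append(s["lab_no"])
        return runs

    arrivals = [
        {"lab_no": "2930066/75", "project": "fresh water"},
        {"lab_no": "7023351/75", "project": "marine water"},
        {"lab_no": "2933035/75", "project": "fresh water"},
    ]
    for project, lab_nos in schedule_runs(arrivals).items():
        print(project, lab_nos)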
In the examples of outputs of quality assurance in
Figure 4, a listing of intercomparison errors is presented
for the week ending October 8, 1975. During this
50
-------
period, approximately 3,000 results were processed
through program "TECKNICON." Approximately 1,300
results were reported by analysts for metals, ATP, car-
bon, CHN, and so forth. As noted in the data or remarks
written in Figure 4, steps were taken through reruns, re-
filtration, or delineations (in cases of inadequate
samples) to determine the cause of sample intercom-
parison errors.
The computer program will disallow the replace-
ment or entry of a piece of data if the analysis was not
scheduled, if a replicate was not noted, or if there is an
analytical quality control problem. Input of any entry
outside of the regular scheme is flagged in the "un-
matched data card file." This flag again puts the burden
on the analyst. An example of an "unmatched data
card" page output is shown in Figure 5.
Each week a list of replicates is printed as shown in
Figure 6. Those replicates shown with four stars require
inspection by the section chief.
The printing of data that were rejected with reason
by program "TECKNICON" is a recent innovation. An
example of this output is shown in Figure 7. The desig-
nation, "past off," is a code indicating that the previous
sample was off scale and that the present sample may be
adversely affected because of washout characteristics.
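The rule behind the "past off" code can be stated in a few lines. A minimal sketch in Python, assuming results arrive in run order with an off-scale indicator (lab numbers and field names are hypothetical):

    def flag_past_off(run):
        # Reject any sample immediately following an off-scale
        # sample, since washout may bias its reading.
        rejected = []
        previous_off_scale = False
        for sample in run:
            if previous_off_scale:
                rejected.append((sample["lab_no"], "PAST OFF"))
            previous_off_scale = sample["off_scale"]
        return rejected

    run = [
        {"lab_no": "7031305/75", "off_scale": True},
        {"lab_no": "7031307/75", "off_scale": False},
        {"lab_no": "7031393/75", "off_scale": False},
    ]
    print(flag_past_off(run))   # [('7031307/75', 'PAST OFF')]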
Another system used in the past is a milliequivalent
comparison. This comparative approach requires the
complete analysis of soluble major ionic components in
the water system. An output of this type is shown in
Figure 8. When the cations and anions do not match, the
milliequivalent balance scheme points out problems in
analysis of soluble components or the possibility of the
presence of an unanalyzed ionic component. The section
chief must determine the action necessary to resolve
problems noted in milliequivalent balance output.
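The chemistry behind the check is simple: each soluble concentration in mg/l is divided by the ion's equivalent weight (formula weight divided by charge) to give milliequivalents per liter, and the cation sum is compared with the anion sum. A minimal sketch in Python, with a hypothetical matching tolerance:

    # Equivalent weights in grams per equivalent (formula weight / charge)
    EQ_WT = {"Ca": 20.04, "Mg": 12.16, "Na": 22.99, "K": 39.10,
             "CO3": 30.00, "SO4": 48.03, "Cl": 35.45}
    CATIONS = {"Ca", "Mg", "Na", "K"}

    def meq_balance(conc_mg_per_l, tolerance=0.10):
        # Convert mg/l to meq/l and compare cation and anion sums.
        meq = {ion: mg / EQ_WT[ion] for ion, mg in conc_mg_per_l.items()}
        cations = sum(v for ion, v in meq.items() if ion in CATIONS)
        anions = sum(v for ion, v in meq.items() if ion not in CATIONS)
        matches = abs(cations - anions) <= tolerance * max(cations, anions)
        return round(cations, 3), round(anions, 3), matches

    sample = {"Ca": 20.0, "Mg": 15.0, "Na": 13.0, "K": 1.4,
              "CO3": 16.0, "SO4": 55.0, "Cl": 24.0}
    print(meq_balance(sample))   # (2.833, 2.355, False): flagged for review

A mismatch of this size would be referred to the section chief, exactly as the text describes.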
Figures 4 through 8 are a few examples of quality
assurance aspects incorporated through ADP into labora-
tory operations phases.
After 8 years experience with ADP, we have come
to the following conclusions:
ADP can be a tremendous asset in laboratory
operation, especially when handling a large
workload.
Not only has ADP proved effective, its use in
laboratory situations will increase.
As in all laboratory functions, quality as-
surance within all phases of ADP is a first
consideration.
In summary, let me turn to Rudyard Kipling's
poem, "Arithmetic on the Frontier," and quote an
excerpt:
"A great and glorious thing it is
To learn, for seven years or so,
The Lord knows what of that and this,
Ere reckoned fit to face the foe - "4
Eight years of experience with ADP has convinced
us that ADP is a great tool. With ADP in hand, the
laboratory manager can wage the fight against the foe
(bad data).
REFERENCES
1 Chandor, A., Graham, J., and Williamson, R., A
Dictionary of Computers. Baltimore: Penguin Books
Inc., 1970, p. 27.
2 Byram, K. V., Roberts, F. A., and Wilson, L. A., "A
Data Reduction System for an Automatic Colori-
meter," Proceedings No. 1, ORD ADP Workshop,
1974.
3 Krawczyk, D. F. and Byram, K. V., "Management
System for an Analytical Chemical Laboratory,"
Amer. Lab., vol. 5, no. 1 (1973), pp. 55-62.
4 Kipling, Rudyard. Rudyard Kipling Verse, Defini-
tive Edition, Garden City: Doubleday and Com-
pany, Inc., 1973.
5 Krawczyk, D. F., Taylor, P. L., and Kee, Jr., W. D.,
Laboratory Sample Control and Computer Verifi-
cation of Results at the Lake Huron Program
Office. Paper presented at the Ninth Conference of
Great Lakes Research, Illinois Institute of Tech-
nology, Research Institute, Chicago. March 29,
1966.
51
-------
[Graph: number of projects submitting samples, plotted by month, with curves by year.]
Figure 1
Projects Submitting Samples
52
-------
[Graph: samples submitted for chemical analyses per month, with curves for 1972 through 1975; vertical axis 1,000 to 4,000 samples.]
Figure 2
Samples Submitted for Chemical Analyses
53
-------
[Graph: results reported to project leaders per month, with curves for 1972 through 1975; vertical axis 5,000 to 25,000 results.]
Figure 3
Results Reported to Project Leaders
54
-------
[Computer printout: intrasample comparison errors, listing lab numbers whose orthophosphate-as-P result exceeded the total phosphate result, with both values shown (an asterisk marks new or corrected results) and handwritten remarks on reruns and other corrective action.]
Figure 4
Intrasample Comparison Errors (Reasonable Chemical Comparisons)
-------
[Computer printout: the unmatched data card file for 10/16/75, listing for each flagged entry its type, code, channel, lab number, and answer, with the original value noted where the entry would have replaced existing data (Figure 5).]
-------
[Computer printout: replicate listing for 10/22/75, pairing each replicate's reported value with its computed check value by channel and lab number; four-starred entries require the section chief's judgment on acceptance.]
Figure 6
Replicates With Coded Designation
(Coded-Starred Require Judgment on Acceptance)
57
-------
[Computer printout: data rejected by program "TECKNICON," listing lab number, parameter channel, rejection reason (OFFSCALE, PAST OFF, SHLDR PK, BAD SPK, BAD BLANK), comparison answer, factor, computed value, batch date, and batch identifier.]
Figure 7
Computer Output of Rejected Data
58
-------
[Computer printout: milliequivalent comparisons flagging samples whose cation and anion sums do not match; for each sample the concentrations of Ca, Mg, Na, and K and of CO3, SO4, and Cl are listed with their computed milliequivalent values and the cation and anion sums.]
Figure 8
Special Computer Program to Compare Chemical Balance in Water Sample
59
-------
THE CLEANS/CLEVER AUTOMATED CLINICAL LABORATORY PROJECT
AND DATA MANAGEMENT ISSUES
By Sam D. Bryan
INTRODUCTION
As participants of this second ORD ADP workshop,
we were asked to orient our discussions around ADP
issues. An issue involves a decision of importance; there-
fore, this paper will present a brief discussion of some
data management decisions important to the CLEANS/
CLEVER project. Some of these decisions have already
been made, but others will not be made for the next
several months. Hopefully, this discussion will be useful
to others who are now, or will be, faced with making
similar decisions. The opinions stated are the author's
and are not necessarily shared by anyone else.
CLEANS/CLEVER SYSTEMS OVERVIEW
The Clinical Laboratory Evaluation and Assessment
of Noxious Substances (CLEANS) program will be
located in the present EPA Clinical Studies facility at the
University of North Carolina in Chapel Hill. The Clinical
Laboratory Evaluation and Validation of Epidemiologic
Research (CLEVER) program will be located in
sophisticated mobile laboratories that will travel from a
home base in Chapel Hill to various locations across the
United States.
The CLEANS program will be conducted using two
large exposure chambers and an adjoining area for a
computerized physiological data acquisition system and
a pollutant control system. Using these self-contained
exposure laboratories, CLEANS scientists can expose
individuals for extended periods of time to air pollution
conditions that are the same as those found in urban
areas.
Until the CLEANS program, clinical studies had
been performed only during brief 2- to 6-hour exposures
to pollutants. Such short-term exposures provided only
limited information on the interaction between pollutants
and human physiological systems. Longer term
exposures, particularly when atmospheric conditions can
be precisely simulated, will greatly increase knowledge
about health effects of airborne pollution.
The data system will enable the staff of the Clinical
Studies Division to make more frequent and more
precise measurements of human physiological reactions
and to determine physiological function decrement and
recovery over extended periods of time.
The controlled clinical laboratory study of air
pollutants and other environmental stress on humans
will provide health effects data for:
Continued evaluation of existing air quality
standards and potentially noxious substances
Establishment of short-term standards for new
air pollutants
Establishment of standards for odors, noise,
and microwave radiation.
Significantly, these programs will provide information
necessary to evaluate the adequacy of current ambient
air quality standards.
The two clinical environmental laboratories will
have beds and bathrooms as well as televisions and tele-
phones. Food, clean clothing, and linens will be supplied
through a catering service to make the laboratories fully
inhabitable both for short and extended stays. The
environment of each exposure laboratory will be con-
trolled (either manually, or automatically by the com-
puter) for temperature, humidity, lighting level, and
pollutant gas and aerosol concentration. The process
control computer also will be used to record environ-
mental parameters and operational status of the
measurement instruments. The laboratories also will be
equipped with an array of cardiopulmonary exercise
instrumentation and an associated physiological data
acquisition system. Each exposure laboratory will
contain complete examining and testing equipment for
the human research subjects.
For the CLEVER program, two mobile van systems
will be used to study cardiopulmonary functions of
humans exposed alternately to high and low ambient air
pollution levels in their home environment in the United
States. A standard 34-foot self-contained motor home
will be the shell of the CLEVER mobile laboratory. By
eliminating the standard cooking, sleeping, and living
60
-------
space areas in the mobile unit, room will be provided for
the large array of equipment needed. Behind the driver's
area, the mobile unit will contain areas for reception and
interview, test preparation, subject examination, housing
of the computer, medical instrumentation, and exercise
equipment.
The CLEVER program's mobile laboratory will
have physiological measurement equipment identical to
that housed in the stationary CLEANS laboratory in
Chapel Hill. Each system is designed, constructed, and
instrumented to ensure that data obtained will be
directly comparable between the two facilities.
The mobile facility will be used to obtain and
evaluate clinical data from epidemiological studies in
various population study areas.
The CLEVER mobile laboratory will travel to the
home areas of populations being studied to gather data
on the effects of environmental pollutants on human health.
The primary mission of the mobile laboratory will be
pulmonary function and cardiovascular performance
measurements. In addition to this capability, general
medical histories will be taken, physical examinations
made, and biological specimens stored. Future plans call
for expansion of the CLEVER mobile laboratory capa-
bilities to include other noninvasive physiologic
measurements.
Computer Sciences Corporation (CSC) has been
contracted to design and build the fixed facility as well
as the two mobile systems. EPA expects to begin using
these systems in 1976. Some of the work discussed
below is original work by CSC.
COMPARISON TO OTHER EPA LABORATORY
AUTOMATION PROJECTS
Many of the issues of this project will be found in
other laboratory automation projects. The paradigm is a
sensing instrument connected to an analog-to-digital
converter, connected to a computer, which displays,
analyzes, and stores the sensed data. It is the same in this
project as it is, for example, in the chemical laboratory
case. This project, perhaps, differs quantitatively from
the chemical laboratory case by interfacing a relatively
large number of different instruments (about 12
presently) and by sampling the analog outputs of some
instruments at a relatively high rate (e.g., 500 Hz for the
ECG amplifiers). This clinical system also is highly
integrated so that the system operator can interact with
a complex array of instruments and perform elaborate
subject testing protocols through a single display screen
and a single customized keyboard.
In general, this clinical system is very large,
complex, and nonstandard. The online acquisition and
control software alone required about 30 man-years of
programing. This effort, understandably, had to be
achieved through a contractor. The in-house versus
contract issue is extremely important, but will not be
discussed here.
The complexity of the system stems from the need
to do a number of tasks concurrently. For example, it
must sample, display, and store ECG signals, do simple
analyses on them, and control the speed and elevation of
an exercise treadmill.
There are both nonstandard hardware and non-
standard software features. Special hardware interfaces
are required between the medical instruments and the
computer since there is generally a disparity in the
output voltage ranges of the medical instruments and the
A/D input range of the computer. Special hardware was
also designed to interface the human operator with the
rest of the system. This special equipment takes the
form of a customized keyboard and display. It was
considered very important to make the operator's inter-
action with the system as streamlined as possible in
order to process a high throughput of subjects. Inter-
action was made as simple and foolproof as possible in
order to simplify operator training and to reduce
operator mistakes. The issue of using off-the-shelf versus
custom-built hardware devices and interfaces is another
important issue; however, it is outside the scope of this
paper.
The system uses nonstandard operating systems as
well. In particular, the PDP-11 RSX-11A and DOS
operating systems were modified to coexist on the RK05
disk, to share disk files, and to pass control of execution
back and forth. The gas monitoring and control com-
puter uses the RT-11 operating system.
Another distinctive feature of this laboratory
automation project is the overwhelming emphasis placed
on subject safety. This is apparent mainly in the
pollutant control system, where elaborate concentration-
range checks and performance-status checks are made.
High reliability of the total system is also of
paramount importance, since downtime would be expensive
both in its possible damaging effect on experimental
protocols and in the time of test subjects and
technicians it would waste.
61
-------
We are faced with pushing the state-of-the-art in
some cases, where an attempt is made to automate a
previously manual task. Such ventures incur time and
money risks. Automating versus not automating is
another important issue that is outside the scope of this
paper.
Although some of the issues we face in the
CLEANS/CLEVER project differ at least quantitatively
from issues faced in other laboratory projects, most of
the data management issues discussed below are relevant.
THE ISSUE OF ONLINE VERSUS OFFLINE
REPORTS
In the physiological system, which is considered
here, many of the tests require subject cooperation. For
example, in the forced vital capacity spirometry test, the
operator exhorts the subject to exhale as fast and as
completely as possible. This maximal effort is used as a
norm to make comparisons to other subjects, or com-
parisons to the same subject at different times. Since a
subject "maneuver" is often inadequate, usually multiple
maneuvers are required. This is one important reason for
seeing the results of the tests online, so that the operator
can direct the subject to try again if poor performance is
indicated. To store spirometer output voltages blindly
on an analog tape, for example, would either require the
subject to perform a large number of maneuvers, which
is not desirable, or to perform just a few maneuvers in
the uncertain hope that he has done his best, which is an
even less desirable alternative.
The subject is just one of several components of the
system with which something can go wrong. In the case
of spirometry, for example, any of the following can be
faulty: the spirometer bellows, transducer, amplifier,
signal conditioning circuitry, A/D converter, computer
hardware, or software. The ability to display the
spirometer signal and the numeric values derived from it
online gives the operator the necessary opportunity to
note a problem on the spot and thereby prevent
erroneous data from being recorded. Although this is an
important capability in the chamber facility, it is even
more important in the vans, where it is not always
feasible to have a subject come back for retesting if
something went wrong.
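As a concrete illustration of the numeric values derived online, the sketch below computes forced vital capacity (FVC) and the one-second forced expiratory volume (FEV1) from a digitized maneuver. It is written in Python purely for exposition; the routine names, signal, and sampling rate are hypothetical and do not represent the actual CLEANS software.

    def fvc_and_fev1(volumes_litres, sample_rate_hz):
        # volumes_litres: calibrated expired-volume samples, starting
        # at the onset of forced expiration.
        fvc = max(volumes_litres)                    # total volume expired
        i_1s = min(int(sample_rate_hz), len(volumes_litres) - 1)
        fev1 = volumes_litres[i_1s]                  # volume at t = 1 second
        return fvc, fev1

    # Ten samples per second over a hypothetical five-second maneuver
    signal = [0.0, 0.5, 1.0, 1.5, 1.9, 2.3, 2.6, 2.9, 3.1, 3.3, 3.4,
              3.6, 3.8, 3.9, 4.0, 4.1, 4.2, 4.3] + [4.3] * 32
    print(fvc_and_fev1(signal, sample_rate_hz=10))   # (4.3, 3.4)

An abnormally low FEV1/FVC ratio, or a curve that never plateaus, tells the operator on the spot that the maneuver (or the instrument chain) should be checked and the test repeated.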
The other physiological measurements also require
on-the-spot abbreviated reports on overall data
acquisition status for the operator. Offline reports are
appropriate once successful acquisition of the data has
been assured. The longer, more fine-grained reports on
physiological response required for subsequent correla-
tion with the pollutant data should be done off-line.
This is interrelated with the following issue.
THE ISSUE OF COMPUTING ON THE LABORATORY
MINICOMPUTER VERSUS THE CENTRAL
MAXICOMPUTER
This question involves which programing tasks
should be performed on the laboratory minicomputer
and which ones on the central maxicomputer. The
answer, as usual, depends on the task.
Some tasks must be done on the laboratory mini-
computers. These are the real-time tasks involving fast
sampling rates or special laboratory devices that cannot
be supported at a terminal, or tasks requiring such a
large percentage of uptime, as in the pollutant control
system, that a dedicated minicomputer is required both
technically (because simpler machines are more reliable)
and managerially (because dedicated administrative
control is required for a critically important function).
Some tasks must be done on the central maxicom-
puter. These are tasks that involve very large programs,
such as the sophisticated statistical packages, or that
involve very large online files, such as those required for
efficient sorting or online queries of a large data base.
There are also the tasks that theoretically can be
programed on either the minicomputer or the
maxicomputer. The turnaround time requirement can
swing the decision to the use of one or the other. If
turnaround time on the order of 48 hours is acceptable,
then the maxicomputer is appropriate. The minicom-
puter should be left as free as possible for tasks that it is
uniquely suited to run. To arbitrarily add tasks to the
small system eventually would create, unnecessarily, a
small computer center with the attendant problems of
scheduling, tape and disk library management, additional
operators, supplies, and maintenance. It is
possible that evolution of this type of operation is
unavoidable, but a special effort should be made to
avoid arbitrary loading of the minicomputer system. If a
task can be performed on the maxicomputer (and if
there is no fast turnaround requirement), a number of
advantages result: 1) the large machine is generally more
accessible both for program development and for
production runs since it timeshares a large number of
terminals; 2) there is generally more manpower available
for programing assistance on the maxicomputer than on
the minicomputer; 3) program development is generally
62
-------
easier on the maxicomputer due to more core avail-
ability and a wider variety of high-level languages; and
4) there exists greater availability of large general
purpose statistical and data management packages.
There is the final case to consider; that of a task
which could be programed on either machine, but which
has a fast turnaround requirement (e.g., 4 hours). Such
tasks involve status reports based on the offline pro-
cessing of the "history" tapes output by the physiolog-
ical and pollutant systems. Examples are: 1) trend-
plotting of recent physiological measurements that
would be used both for subject safety supervision and
for quality assurance of the entire system, and 2) limit-
checking of both physiological and pollutant variables
that would be used for the same purposes. These are
examples of status checks used to protect the subject's
safety and to ensure the acquisition of valid data. Such
status information may indicate a need for fast remedial
action; thus, fast turnaround is mandatory.
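A minimal sketch of the limit-checking task, in Python for illustration (the record layout, variable names, and limits here are all hypothetical):

    LIMITS = {"heart_rate_bpm": (40, 180),   # hypothetical supervision limits
              "o3_ppm": (0.0, 0.5)}

    def limit_check(records):
        # Scan offline history-tape records and report out-of-limit values.
        violations = []
        for rec in records:
            for var, (lo, hi) in LIMITS.items():
                value = rec.get(var)
                if value is not None and not lo <= value <= hi:
                    violations.append((rec["time"], var, value))
        return violations

    history = [
        {"time": "09:00", "heart_rate_bpm": 72, "o3_ppm": 0.30},
        {"time": "09:01", "heart_rate_bpm": 190, "o3_ppm": 0.30},
    ]
    print(limit_check(history))   # [('09:01', 'heart_rate_bpm', 190)]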
The conservative approach would be to perform
these tasks on the labortory machine where fast turn-
around is more likely to be achieved on a 24-hour,
7-day-a-week basis due to the factors, mentioned above,
of small machine reliability and local administrative
control.
Even if the maxicomputer were 100 percent
reliable and accessible, it still would not necessarily be
the proper choice for these tasks. The only way of
achieving fast job turnaround using the maxicomputer
would be to enter the jobs through an RJE terminal at
the remote laboratory site and to get the listings back
over the RJE terminal. This is because our closest labo-
ratory computer, which is in the fixed exposure facility,
is over 10 miles away from the central maxicomputer
site, and shipping tapes and listings back and forth
would be too time-consuming. This introduces two
difficulties. First, too often the communications link
either is down or drops connections in midtransmission.
Second, at least for the foreseeable future, one of the
data acquisition minicomputers in the fixed exposure
facility will be used as the RJE terminal. If we have RJE
capability in the mobile vans, it will certainly be through
the onboard minicomputer. One important goal is to
free the minicomputer by using the maxicomputer.
However, the simple report-generating programs that we
are considering here, trend-plotting, and limit-checking,
are I/O-bound, and the minicomputer would be tied up
sending data over a relatively slow communications line
(e.g., 2000 baud), waiting some time for the job to be
run, and then waiting for the report to be sent back for
printing at a rate much slower than the minicomputer's
line printer rate. So the minicomputer is tied up longer
for this kind of task in trying to use the maxicomputer
than it would be in simply running the task on the mini-
computer (provided enough online file storage is avail-
able). The possibility of running an RJE task con-
currently with the physiological applications tasks is not
practicable since the operating system under which the
applications tasks run cannot support continuous
multitasking.
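The arithmetic behind that conclusion is easy to make explicit. A rough sketch, assuming ten bits per character on the communications line and hypothetical report and printer figures:

    line_chars = 132                     # characters per print line
    report_lines = 5000                  # hypothetical report size
    chars = line_chars * report_lines

    rje_cps = 2000 / 10                  # about 200 char/s at 2000 baud
    printer_cps = 300 * line_chars / 60  # a 300 line-per-minute printer

    print(chars / rje_cps / 60)          # about 55 minutes over the line
    print(chars / printer_cps / 60)      # under 17 minutes printed locally

The transmission estimate also ignores queueing at the maxicomputer, so the true cost of the RJE route is higher still.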
The most conservative, and currently the most
favored, approach is to develop these trend-plotting and
limit-checking programs on the maxicomputer as well.
There are several advantages to this: 1) a trend program
is already required on the maxicomputer to do plots of
long trends which cannot be done on the laboratory
minicomputer due to limited online storage; 2) the
feasibility of using the maxicomputer and the RJE in a
fast-turnaround report-generating mode could be tested
this way (testing is, of course, the only sure way to
know whether that approach can work, and it is superior to
armchair speculation); and 3) perhaps most importantly,
redundant maxicomputer analysis could be used by the
quality assurance supervisor (see next section) as a
cross-check against the values reported on the mini-
computer. To elaborate on point 3 above, this project has so
many components that might fail, and its resultant data
is so important, that redundancy can be vital. Just
recently, in a large study in the Clinical Studies Division,
a problem was brought to light by redundant analysis,
saving the investigators from reporting possibly incorrect
results.
These status-checking programs were not included
in the current CLEANS/CLEVER contract scope of
work. The present plan, subject to various approvals, is
to hire programers under an operations and maintenance
contract to perform this programing. It is possible that
in the not-so-distant future, additional online file storage
may be required; and in the longer range, another CPU
may be required depending on the workload imposed on
the current configuration by such requirements as
expanded applications programs, RJE, and onsite status
checking.
THE ISSUE OF QUALITY ASSURANCE
The most critically important data management
function is quality assurance of valid data. The end
product of the combined online and offline systems is
reports by the clinical investigator on the relationship of
physiological response to pollutant dose. The im-
portance of the data underlying these reports is obvious.
63
-------
However, the size and complexity of the system
demands a special effort to ensure that everything is
working properly to produce valid data. In fact, both the
online acquisition system and the offline data manage-
ment system are complex. The difficulties that could
arise online were mentioned previously with the need for
online reports, using the spirometry measurement as an
example.
In the offline case there are fewer potential hard-
ware problems, but there are some data flow logistics
problems. First, there are many possible sources of data.
They include:
The physiological system history tape
The controlled pollutant system history tape
Handwritten physiological values (when the
automatic system is down)
Handwritten pollutant values (when the auto-
matic system is down)
Medical questionnaire form
Microbiological and metabolic laboratory
reports
Handwritten pollutant values at the mobile
van site
Information from various operators' log
books.
For some of these sources of data, coding, keypunching,
and verification are required. Also, for each of these
sources of data, there must be a listing and edit program.
When data are detected to be in error, the edit program
is used to correct the data where possible or to purge it
where correction is impossible. Once remedial action has
been taken, its effect must be verified. Subsequently,
merging with earlier data and other forms of data must
be performed, and so on. These steps finally lead to a
cleaned up data base. Thus, the data flow through many
stations and in a variety of forms, and the coordination
of these activities presents a real challenge.
Presently, it is our feeling that a single individual
should be responsible, full-time, for following the daily
flow of data to ensure that a clean data base actually results.
This "quality assurance supervisor" also should have
knowledge of the status of instrumentation calibration,
standardized testing procedures, and audit checks of
the system. This supervisor will serve as a liaison among
the clinical, operations, programing, and clerical
personnel. He will have the important responsibility of
responding to systems-oriented questions from any of
these groups.
CONCLUSION
This paper has summarized some of the technical
and administrative data management decisions that have
been, or are in the process of being, made by persons
involved in the CLEANS/CLEVER project. We
appreciate having the opportunity in this forum to share
our deliberations with fellow ORD computer users.
64
-------
REQUIREMENTS FOR THE REGION V
CENTRAL REGIONAL LABORATORY (CRL)
DATA MANAGEMENT SYSTEM
By Billy Fairless
Laboratory scientists and supervisors must perform
data management with the following goals:
Analyzing samples in a timely manner
Observing recommended holding times for
perishable parameters
Maintaining a balanced workload for laborato-
ry personnel
Allowing time to identify and correct inaccu-
rate data
Performing in an overall efficient manner.
Before the minimum data management requirements
for a service laboratory such as the CRL can be understood,
it is necessary to understand laboratory operation. This
paper follows a survey from inception to completion.
First, a survey is programed to satisfy a stated pur-
pose, and a project officer (PO) is assigned to it. The PO
specifies the numbers and kinds of samples to be collect-
ed and the parameters to be analyzed for each sample.
He follows existing quality assurance guidelines for items
such as reagent blanks, sample preservatives, bottle
types, and sample volumes. In Region V, a computer
technician establishes the basic data form for the survey
(see Figure 1) using existing software and the OSI com-
puter. The technician adds station identifying informa-
tion (latitude-longitude, river mile, etc.) as necessary,
and the finished data is input into STORET.
A copy of the basic data form is given to the data
samplers prior to sample collection. The samplers then
establish their travel plans, arrange for sampling bottles,
sample preservatives, proper labels, shipping of collected
samples, and purchase of ice or dry ice while in the field.
They wash and label all sampling bottles as necessary,
collect and preserve the requested samples, and complete
all field measurements for parameters such as tempera-
ture, wind direction, precipitation, pH, and turbidity.
Note that a sample usually will consist of at least seven
different bottles and frequently will contain over ten
different bottles. Many bottles are required because the
parameters are preserved differently; obviously a bottle
preserved with nitric acid (for metals) could not be ana-
lyzed for nitrates. A sample containing seven bottles is
shown below.
Sample: 75-19876

Preservative       Parameters
No preservative    pH, cond., solids, BOD, alk, etc.
Nitric acid        Metals except Ag and W
Sulfuric acid      NO3, phos, COD
NaOH               Cyanide
CuSO4/H3PO4        Phenols
Ice                Organics
Formalin           Biological
The CRL uses 24 different methods to preserve samples
and routinely analyzes for over 200 different parame-
ters. In FY 1974 the average number of analyses per
sample was 40.
When samples from the field arrive at CRL, they
are received in the laboratory shipping and receiving area
where label data are checked against information on the
basic data form. The form is obtained from the computer by
shipping and receiving personnel and includes field data
entered by samplers upon return to the field office. It is
often necessary to correct the information on the basic data
form because some samples are broken, others are
not collected, or more samples are taken than originally
planned. Samples are then divided according to the labo-
ratory section that will perform the measurements
(inorganic, metals, organic, biology) and copies of appro-
priate pages of the corrected basic data form are given to
the section chiefs.
Up to this point in the operation, all samples are
kept together as a survey group. Once in the laboratory,
however, efficiency dictates that they be mixed with
samples from other surveys. This mixing is one of the
65
-------
critical differences between the operation of a high-
production service laboratory and that of a research lab-
oratory. Mixing is required because preparation time to
perform an analysis is approximately 2 hours and shut-
down time is 1 hour. Therefore, 3 hours are required to
obtain the first parameter concentration. After this, all
remaining samples are analyzed at rates between 20 and
1,200 analyses per hour. Thus, once an analysis system is
set up and running, one should analyze all samples in the
laboratory requesting that parameter. If samples are
missed, or if some samples are not analyzed properly, a
minimum of 3 hours is required the following day to
complete the requested parameter work. Since the CRL
employs only 20 bench chemists and runs over
200 different parameters, each chemist is responsible for
an average of 10 different parameters. The Surveillance
and Analysis Division requires a 14-calendar-day turn-
around time which gives each chemist only one day for
each assigned parameter. Therefore, when samples are
not analyzed on the proper day, makeup work must be
done using personnel from another group. This results in
lower quality data and wasted resources. In summary,
although mixing creates a serious data management
problem, it permits us to operate from 300 to
500 percent more efficiently than we could by handling
all samples in their original groups.
As parameters are completed, the bench chemists
report the results to the section chiefs, who perform
"two parameter" quality assurance audits when possible.
When all measurements assigned to the section have been
completed and they appear to be correct, pages of the
basic data form with handwritten results are given to
the computer operator. The operator enters the results
into the computer using a low-speed terminal, a key
punch, or an optical card reader. When we have gained
more experience with the system, we believe the chemist
may be able to enter his own data using the optical card
reader.
When all sections have reported their results for a
given survey, the PO is notified by phone. He retrieves a
copy of the basic data form from the computer, submits
all data to an automatic quality assurance audit, reviews
the results for reasonableness, and refers questions back
to the appropriate section chief. After the PO has ac-
cepted the results, he writes his report and directs that
the data on the basic data form be entered directly into
STORET by the computer technician. Later, a retrieval
is made during the weekly update of that system to
ensure that all the data were placed in STORET.
Given the above information, the minimum require-
ments from a laboratory data management system can
be easily summarized. The system should provide:
A summary of work to be done by date,
survey, section, and parameter for manage-
ment short-term planning. Figure 2 is an
example of a workload listing by parameter
for the inorganic section.
A real-time listing of in-house samples to be
analyzed for a given parameter. For example,
if a chemist is running mercury, the computer
should identify all in-house samples requiring
mercury (a sketch of this retrieval follows this list).
A report giving the status of each survey in-
cluding: due date, analyses completed, ana-
lyses being run, and analyses not started.
Figure 3 is an example of how the CRL
collects and reports this data manually.
A summary report of work completed per unit
of resource expended so that long-term plans
can be made. See Figure 4 for the computer
output desired.
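The retrieval behind the second requirement above amounts to a simple filter over the in-house sample file. A minimal sketch in Python for illustration (the file layout and field names are hypothetical):

    def samples_needing(parameter, in_house):
        # List in-house samples still awaiting analysis for one parameter.
        return [s["lab_no"] for s in in_house
                if parameter in s["requested"] and parameter not in s["completed"]]

    in_house = [
        {"lab_no": "767152", "requested": {"Hg", "TOC"}, "completed": {"TOC"}},
        {"lab_no": "767153", "requested": {"Hg"}, "completed": set()},
        {"lab_no": "767154", "requested": {"TOC"}, "completed": set()},
    ]
    print(samples_needing("Hg", in_house))   # ['767152', '767153']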
Figure 5 shows a graph of the CRL requested work-
load as a function of time for this fiscal year. Please note
that the first spike represents a rate of work that would
require a staff of 100 chemists. Since the holding times
used by the CRL permit us to hold some samples for
1 week, others for 2 weeks, and still others for longer
time periods, we are able to analyze all of them with our
23-man staff without losing samples (if we are not asked
to do additional work the following week). When we
obtain large numbers of samples on consecutive weeks,
however, as shown in the last spike in Figure 5, we are
forced to discard unanalyzed samples when the holding
times expire. Workload spikes of this nature usually
occur because work request information cannot be pro-
cessed in the time necessary to effect a change in
sampling plans.
In addition to knowing the total number of mea-
surements to be made, it is essential that we know our
workload on a parameter-by-parameter basis, to ensure
that all perishable samples are analyzed first. Figure 2 is
a backlog listing from our inorganic section for the week
of October 17, 1975. Dr. Carter had just over 1,000
phosphorus, 361 TOC, 459 mercury, and many other
66
-------
analyses to complete. However, he also had 22 bromide
analyses and, since we do not run bromide on a regular
basis, we knew these 22 analyses would require 2 man-
days, almost as much time as the 459 mercury analyses
would require. As you can see, if we are to avoid discard-
ing samples, it is essential that workload listings on a
parameter basis be available in real time to the section
chiefs so that section personnel can be used effectively.
Presently, we are using at least three positions to provide
this information in a timely manner.
Figure 3 is one page of a weekly publication we
print summarizing the status of each survey. The left
column identifies the survey by name, the submitting
office, the computer data set number, and the beginning
and ending sample log numbers. The next three columns
record key dates and estimates of the work required by
each section. The fifth column is titled "No. Analyses
Requested." The last section permits us to identify
quickly which parameters are completed (•) for a partic-
ular survey and which remain to be done (0). Often,
when program deadlines are approached, a PO will re-
quest an incomplete data set and begin his report; he will
finish it as the last parameters are completed.
Figure 4 is a summary used at the CRL for long-
term management purposes, such as estimating resources
required for different work-plan options, the proper
balance of personnel among the sections, and ability to
participate in national or other large projects. For this
and for all other outputs, we define an analysis as a
concentration value reported to someone else.
Therefore, these numbers probably describe between 10
and 30 percent of all data we would like to have auto-
mated into a complete Laboratory Data Management
system. Figure 6 is the output from Lab-Label which
tracks the progress of each survey. The example is from
the Indiana District Office (INDO) file. At the end of
each year this file will contain a summary of the work
completed by INDO. At any time during the year it can
be used to spot bottlenecks. The report is generated
automatically by Lab-Label and requires very little
computer operator time.
As you can see, the CRL has considerable ADP
needs. These needs are generated by large and variable
workload requests combined with limited personnel re-
sources. To satisfy our needs, an ADP system must pro-
vide the desired reports from one-time data entry. It
must be easy to use and reliable. We are optimistic that
such a system can and will be developed in the near
future.
67
-------
[Reproduced form: computer-generated basic data form for a Dairyland Power (E. J. Stoneman Plant, Cassville, WI) survey, data set MWD0922, showing sample identification and description lines for each location and parameter columns, with STORET parameter codes, for field, bacteriological, oil and grease, metals, and inorganic measurements.]
Figure 1
Basic Data Form
68
-------
[Tabulation: weekly backlog report for the inorganic section (submitted by Carter, 10/17/75), listing analyses backlogged by parameter (alkalinity, BOD, chloride, Cr VI, cyanide, fluoride, MBAS, pH, phenol, silica, dissolved/suspended/total/volatile solids, specific conductance, sulfate, turbidity, bromide, ammonia, COD, mercury, NO3-NO2, TOC, TOP, TP, TKN, PO4-P, and air NO2, SO2, and solids) and by submitting office (ILDO, INDO, MODO, MWDO, GLSB, ASB, other), with a total backlog of 4,340 analyses.]
Figure 2
Weekly Report for Inorganic Section
69
-------
[Tabulation: one page of the weekly survey status report, identifying each survey by name, submitting office, data set number, and sample log numbers, with sample dates, requested or projected data reporting dates, the number of analyses requested, and a parameter checklist (0 = requested, • = completed) covering oil identification, organics, metals, inorganics, nutrients, and biology, plus section contacts for questions.]
-------
[Tabulation: monthly CRL analyses report (completed by hand), summarizing analyses backlogged and analyses completed by section (biology, organic, metals, nutrients) and by submitting office (ILDO, INDO, MWDO, GLSB, USCG, ASB, other), with totals.]
-------
[Figure 5 is a chart of analyses per week (in thousands) over a 24-week period, annotated "Each 1000 Analyses Represents a 16 Position Work Rate."]
Figure 5
Laboratory Workload As a Function of Time
-------
[Figure 6 reproduces a TSO terminal listing from the Indiana District Office (111 East Diamond Ave., Evansville, Indiana): a listing of data set names, volumes, comments, and dates, last updated 76/01/28, with retrieval instructions and an FTS contact number. Entries IND0116 through IND0149 record surveys from July 1975 through February 1976 (citizen complaints, municipal S.T.P. and industrial surveys, PCB and pesticide studies, and power plant studies), each with survey name, contact person, and field dates. A companion table tracks each survey's sample receipt, lab data, D.O. data, CRL data completion, report, and data storage dates.]
Figure 6
Survey Tracking From Planning to Final Report
-------
DATA COLLECTION AUTOMATION AND
LABORATORY DATA MANAGEMENT FOR THE
EPA CENTRAL REGIONAL LABORATORY
By Robert A. Dell, Jr.
INTRODUCTION
The Central Regional Laboratory (CRL) is a unit of
the Surveillance and Analysis Division within the
Chicago-based Region V of the Environmental Protec-
tion Agency. The CRL currently performs in excess of
150,000 environmental measurements per year. During fiscal year 1975, this workload varied from 100 to 10,000 analyses per week. The operation has thus grown to the point where manual handling and reporting of the data have become demonstrably inefficient, and the need for more cost-effective operation with better quality assurance is evident. As one example of this need, both the quantity and the variability of the workload make advance prediction for laboratory scheduling difficult. The CRL operations and the functional requirements for the CRL Laboratory Data Management (LDM) system are described comprehensively in a companion paper at this workshop.1 The methodology currently used to
automate CRL operations uses both available equipment
and personnel, and takes advantage of significant past
and ongoing projects within the EPA.
Table 1
Major Components of CRL NOVA Minicomputer System

Model        Description                                            Cost ($)
8294         840 CPU, 64K of 16-bit words, memory management          29,300
8206         Power monitor and autorestart                               300
8207         Hardware multiply-divide unit                               800
8208         Automatic program loader                                    300
8020         Floating point processor                                  3,300
4008         Real time clock                                             300
6003/4019    262K word Novadisk/control (fixed head)                  10,000
4057A/4046   12.472M word (IBM 2314 type) disc drive/control          21,000
6013/4011    Paper tape reader, 400 cps/control                        1,600
4030         Mag tape, 9 track, 45 ips                                 9,200
4063         12 line async digital multiplexer                         3,700
102A         Centronics printer                                        6,800
3405         Vadic 1200 baud modem                                     1,000
4032         Analog/digital converter, 14 bit                          4,000
4055J,K,N    16 channel analog multiplexer                             1,200
various      Interface cards, cabinets, and DC software               11,400
733          Texas Instruments Corp. console (hard copy)
             terminals (2)                                             5,000
ADM-1        Lear-Siegler Corp. (video) terminals (10)                17,000
                                                       Total Cost   $126,000
CURRENT STATUS
The CRL has an automated data acquisition system
which was installed in November 1975. Currently two
types of measuring equipment are supplying data
directly to a Data General Corporation NOVA 840 host
computer. The hardware configuration of this system is
shown in Table 1, and the documentation is available elsewhere.2 The two basic prototype data collection
programs were delivered with the CRL system for
Technicon autoanalyzer and atomic absorption
measurements.
At present, the LDM system is in the detailed design stage; the design essentially allows for the storage, updating, and retrieval of the analyzed data generated both by these programs and by manual entry of other measurements. Sample and laboratory supervision is facilitated by incorporating report-writing programs in the LDM system. These programs prepare status reports and short-term periodic reports.
A future aspect envisioned for the LDM system
allows for the implementation of an interlaboratory
Regional Communications Network (RCN) to further
enhance the cost effectiveness of the data base. The
RCN would standardize the connection of other instru-
ments and outside agencies to the LDM system.
LAB DATA MANAGEMENT REQUIREMENTS
The LDM system is a response to the needs discussed below.
1. Effective Intra- and Interlaboratory Communications
From study conception through field and labora-
tory measurement and final reporting, the LDM system
must aid communications between the study requester,
sample collectors, analysts, and lab supervisors. The
format of the report form allows it to be used as an
analysis request by the study originator. The same report
can be utilized not only as an input mechanism to the
STORET system but also for reporting the analysis back
to the requester.
74
-------
2. Timely Report Generation
While the analysis request and the study summary are accomplished by the same data set, as described above, generating the workload projections and the periodic summaries that span several studies requires considerable manual effort at present. In particular, the workload listing is essential for scheduling the startup of laboratory processes in an efficient manner.
3. Assurance of In-lab Quality
Included in this area are assurances concerning the
parameter measurement precision and accuracy, as well
as consistency checks on similar samples, to monitor the complete analytical procedures.
4. Optimum Utilization of Present Resources
Three areas are identifiable:
A. The EPA Office of Research and Development
has cooperated recently with Region V in
compiling an Interim Laboratory Data
Management System (ILDMS)3 which
supplied several computer programs directly
usable in the LDM system.
B. Data collection automation utilizing the on-
line programs mentioned previously enters
data to the NOVA in computer-readable
format. Other instruments at the CRL, such as a plasma emission spectrometer and a gas chromatograph, also generate digital data through their associated minicomputers.
C. Standardized Methodology: the NOVA 840
has been used by four laboratories, while a
compatible ECLIPSE machine also exists with
data management capabilities within the EPA.
Remote job entry programs for the OSI and RTP national computer utilities are operational on this machine, and much online program support is available. The foregoing suggests that the LDM system as developed could also be transportable, thus allowing
other facilities to take advantage of each stage
of development. A synchronous data com-
munications protocol and standard LDM
implementation languages have also been
beneficial.
CONCLUSIONS
At present, the regional laboratories seem to need
LDM systems slightly more than automated data collec-
tion systems. The effective management of the data,
however, requires its reliable entry into computer-read-
able form. A quality analysis is much easier to obtain
with computer-aided detection of peaks and curve fitting
routines.
While the EPA-wide effort has been with instru-
mentation and process control in the past, the LDM
system aspects of laboratory automation are ripe for
development and can be accomplished with current
technology.
REFERENCES
1 Fairless, Billy, "Requirements for the Region V
Central Regional Laboratory (CRL) Data Manage-
ment System" Proceedings No. 2, ORD ADP Work-
shop, 1975.
2 Frazer, J. W. and Barton, G. W., "A Feasibility
Study and Functional Design for the Computerized
Automation of the Central Regional Laboratory
EPA Region V, Chicago," ASTM STP578,
American Society for Testing and Materials, 1975,
pp. 152-255.
3 EPA Quality Assurance Division, Office of
Monitoring Systems, "Development of an Auto-
mated Laboratory Management System for the U.S.
Environmental Protection Agency," June 1974 and
January 1975.
75
-------
SAMPLE MANAGEMENT PROGRAMS FOR THE
LABORATORY AUTOMATION MINICOMPUTER
By Henry S. Ames and George W. Barton, Jr.*
In 1973 the Computer Systems and Services
Division (CSSD) of EPA-Cincinnati retained the services
of a multidisciplinary team of chemists and engineers
from Lawrence Livermore Laboratory (LLL) to develop
functional specifications for the automation of a number
of analytical instruments at the Environmental Moni-
toring and Support Laboratory (EMSL), Cincinnati. As
an outgrowth of that study, LLL was asked to
implement the systems specified and also to develop ad-
ditional specifications and a cost/benefit analysis for the
Municipal Environmental Research Laboratory (MERL),
Cincinnati, the National Field Investigation Center
(NFIC), Cincinnati, and the Central Regional
Laboratory, EPA-Region V, Chicago. As a result of these
projects, LLL has developed designs for Technicon
AutoAnalyzers, several manufacturers' atomic absorp-
tion spectrophotometers, the Beckman Total Organic
Carbon Analyzer, a Jarrell-Ash Emission Spectrometer,
and a Mettler Electronic Balance. These automation de-
signs are now installed in EMSL and Region V on Data
General NOVA 840 computer systems.
At this time, LLL is preparing functional specifica-
tions for Region III, Annapolis, Maryland. We have also
had less formal contact with Regions I and IV, and parti-
cipated in the Interim Laboratory Data Management
Project (ILDMS) sponsored by the EPA Office of Re-
search and Development (ORD). Certain problems of
sample and laboratory management have become
evident.
For the purpose of this paper, arbitrary distinctions
are going to be made among the following:
Sample Management. The tracking of an ana-
lytical sample from the time its collection is
planned through actual sampling, analytical
procedures, and quality assurance to the point
of production of a final report on the sample
in a form suitable for introduction into an
archival data base.
Laboratory Management. The additional infor-
mation necessary for a laboratory manager to
plan the work of his laboratory in order to
make optimal use of his resources of man-
power and instrumentation, or to convincingly
document the need for reallocation of re-
sources (e.g., hiring, firing, new instruments,
outside contracting).
Data Management. The remaining data reduc-
tion functions, including comparison of data
with models, investigation of environmental
trends, and similar large-scale studies, which
require large data bases and powerful proces-
sors. We will dismiss these last functions as
beyond the scope of the laboratory automa-
tion computer installed in the laboratory.
Immediately after LLL produced the preliminary
feasibility study and cost/benefit analysis for Region V,
we initiated an investigation of what would be needed to
provide this sample management capability. A bench-
mark program was written and demonstrated in
DECSYSTEM-10 BASIC. It performed admirably on the
PDP-10. Arrangements were made with Data General
Corporation to test the program on the computer config-
uration specified for Region V. Results were exceedingly
disappointing. A search of the same simulated data base
which ran in less than 1 minute on the PDP-10 took over
20 minutes on the NOVA-840. A number of fixes were
considered, but other information became available at
about this same time. Although some speed improve-
*Speaker
The Lawrence Livermore Laboratory (LLL) is operated by the University of California as a prime contractor to the U.S. Energy
Research and Development Administration (ERDA) under contract W-7405-ENO48. Funds for this project were provided by the U.S.
Environmental Protection Agency (EPA) under interagency agreement EPA-IAG-D4-0321 between EPA and ERDA. During the life of
this contract both ERDA and EPA have undergone reorganizations. In order to reduce confusion, all organizations and elements will be
referred to by their current names.
76
-------
ments could have been implemented, the original
approach was too inflexible to meet potential require-
ments. The feasibility study for NFIC, Cincinnati,
indicated that their sample management system had to
satisfy certain legal requirements, and that these require-
ments might well be placed on regional laboratories too.
These requirements included: audit trail, chain of
custody, assurance of data integrity, automatic data re-
jection criteria, and legal defensibility.
It became clear that in the long run a totally dif-
ferent approach was needed. Sample management re-
quires at least the following functions in real time or
near real time:
Sample Log-in. The online facility that
permits survey planners to request analyses,
field engineers to enter field data, and labora-
tory personnel to verify receipt of samples for
analysis, including all information necessary to
schedule the sample for the needed analysis.
Analyst's Workload. The online facility to per-
mit the operator(s) of the instrument(s) to
determine which samples need to be analyzed
and to select those which will be analyzed on
a given day.
Analysis Reports. The online reports to the
analyst of results of analyses, including alerts
to the analyst of anomalous conditions (e.g.,
off-scale, out of range, disagreement of dupli-
cates, unacceptable recovery of spikes, etc.),
and summary reports of all the results of the
work session. This is installed at Cincinnati
and Chicago.
Quality Assurance Reports. Reports to the
analyst and his managers of all the relevant
quality assurance data including trends, thus
permitting the laboratory staff to detect
potential problems of precision and accuracy
and to take necessary action before, or as soon
as, unacceptable results are produced. This is
also installed at Cincinnati and Chicago.
Consolidated Reports. Reports to the labora-
tory manager, the requester of the analysis,
and to the national archive system of all perti-
nent data on a sample or group of samples in a
form that requires no further hand transcrip-
tion with its probability of error.
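
Taken together, these five functions define a lifecycle for each sample, from log-in through consolidated reporting. By way of illustration only, a minimal present-day sketch of that lifecycle (Python, with hypothetical names and statuses; not the LLL design itself) might look like this:

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class Status(Enum):
        REQUESTED = auto()   # survey planner has requested analyses
        RECEIVED = auto()    # lab has verified receipt (sample log-in)
        ANALYZED = auto()    # results entered, anomaly checks applied

    @dataclass
    class Sample:
        sample_id: str
        results: dict = field(default_factory=dict)   # parameter -> value
        status: Status = Status.REQUESTED

        def log_in(self):
            # Sample Log-in: laboratory personnel verify receipt.
            self.status = Status.RECEIVED

        def record_result(self, parameter, value, low, high):
            # Analysis Reports: store the result and alert the analyst
            # to anomalous conditions such as out-of-range values.
            self.results[parameter] = value
            self.status = Status.ANALYZED
            if not (low <= value <= high):
                print(f"ALERT {self.sample_id}/{parameter}: {value} out of range")

    def workload(samples):
        # Analyst's Workload: samples still awaiting analysis.
        return [s for s in samples if s.status is Status.RECEIVED]

    s = Sample("3140")
    s.log_in()
    print([x.sample_id for x in workload([s])])       # ['3140']
    s.record_result("PH", 13.2, low=0.0, high=12.0)   # prints an alert

Quality assurance and consolidated reports would then be queries over the same records rather than separate hand transcriptions.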
The LLL approach relies upon separating the real-time functions (instruments, which must be serviced on demand) from queries, which can wait awhile. A prototype
system has been written to investigate the response to be
expected in the laboratory environment. It incorporates
several of the important features, but in no way can it be
considered a product for release. It is, however, a realis-
tic way to investigate the feasibility of this approach to
sample management. At the present time, the prototype
system, called Sample File Control (SFC), executes on a
NOVA 840 at a lower (background) priority than the
instrument programs. Communication between instru-
ment programs and SFC is through a buffer area of core
accessible to both background and foreground programs.
Instrument programs need know only the formats of the
SET, GET, LOCATE, etc., calls to access the data. Prob-
lems of data base access are handled by the SFC pro-
grams. If at some future date it should be desirable to
change the format of the data bases, only the SFC need
be changed. Instrument programs should require no
changes.
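
The value of the SET, GET, LOCATE convention is precisely this insulation: instrument programs depend only on the calls, never on the file layout. A minimal sketch of that separation (illustrative Python with hypothetical names; the actual SFC is NOVA background code):

    class SampleFileControl:
        # Owns the data base layout; callers never touch it directly.
        def __init__(self):
            self._records = {}    # internal format may change freely

        def set(self, key, fields):
            # SET: store or update a record.
            self._records.setdefault(key, {}).update(fields)

        def get(self, key):
            # GET: return a copy so callers cannot corrupt the store.
            return dict(self._records.get(key, {}))

        def locate(self, **criteria):
            # LOCATE: keys whose records match all given fields.
            return [k for k, r in self._records.items()
                    if all(r.get(f) == v for f, v in criteria.items())]

    # An "instrument program" uses only the three calls above; if the
    # data base format changes, only SampleFileControl changes.
    sfc = SampleFileControl()
    sfc.set("6507", {"survey": "Detroit WWTP", "param": "FE,TOT", "value": 140})
    print(sfc.get("6507")["value"])            # 140
    print(sfc.locate(survey="Detroit WWTP"))   # ['6507']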
Operating on a data base of 25,000 analyses
(3 million words), in an environment with ten Auto-
Analyzer channels running at the same time, data base
queries are answered within 45 seconds. This response
time includes the time for a foreground BASIC program
to make a request of the SFC, search the data base, and
to return data to the BASIC program.
If many instruments (more than 20) are operated
simultaneously, response time may become excessive.
Response time is largely a function of the time necessary for disk access. Three system changes can be made. The
least expensive, which is being investigated now, is to
regenerate the MRDOS system so that BASIC and SFC
overlays utilize the fixed head disk, with only the data
base and BASIC user files on the moving head disk.
Thus, the moving head disk controller would be able to
search the data files with a minimum of head motion
introduced by non-SFC activities.
A moderately expensive enhancement to the sys-
tem would be to increase the size of data buffers by the
purchase of additional core. The number of disk accesses
would be reduced in inverse proportion to the buffer
size, and the search speeded up comparably. The most
expensive enhancement would require the purchase of
an additional separate processor which would have SFC
or one of the commercial data base management systems
as its highest priority job for communication with the
archival systems, report generation, management queries,
and so forth.
77
-------
We have concluded that an acceptable SFC must
not require various undesirable compromises such as re-
quests for the analyst's workload files the previous night,
production of consolidated reports overnight, and
manager's status reports only as of the end of the pre-
vious day. It must be flexible and must not require the
modification of user programs in order to include new
analytes or additional information such as audit trails.
The present approach described here is flexible and
open-ended; it admits a number of enhancements as
analytical requirements change and as analytical load in-
creases. The investigation of sample file management alternatives is nearly complete, and a valuable and flexible sample management system could be installed by mid-1977.
78
-------
SUMMARY OF DISCUSSION PERIOD - PANEL II
The presentations concerned with laboratory data management (LDM) generated a number of
questions.
Need for Automated LDM
It was suggested that a sample load of 10,000 to 100,000 analyses per year was sufficient to make the
use of an automated LDM system feasible; however, it was pointed out that this really depends on the
resources available to a laboratory. Clearly, 1,000 analyses per year would be sufficient if the LDM program
Sources of Assistance
Two sources of assistance in LDM were identified within EPA. The Management Information and Data
Systems Division in Washington, D.C., has basic ordering agreements with several contractors who will assist
in the development of such requirements specifications and documentation. The Environmental Monitoring
and Support Laboratory in Cincinnati will assist EPA monitoring laboratories in the implementation of
laboratory automation/LDM systems. There are several commercial minicomputer-based data management
systems available. These are relatively new but are being considered in feasibility studies for LDM systems.
Finally, it was pointed out that the economies of scale imply that EPA monitoring activities should be
consolidated into a small number of highly automated, very efficient laboratories. However, several of those
present expressed serious reservations about the wisdom of this approach.
Specification Development
There was considerable discussion about the development of specifications for LDM systems. It was
pointed out that simple flow chart-level specifications are usually too vague and general and that
implementations based on them alone lead to problems. Detailed functional descriptions are required to
avoid acquiring an unwanted system. There was discussion of the justification for the response time
requirements in the new Region V LDM specification. Finally, the delays involved in implementing LDM
systems were discussed. These were attributed to vague, inadequate specifications, higher priority instru-
ment automation, underestimated required resources, and limited resources.
SHAVES and NASN
There was a great deal of interest in two existing LDM systems that use the traditional data processing
center approach. These systems are the EPA, Corvallis, "SHAVES" system and the system developed for
the National Air Surveillance Network. Points covered included development costs, use costs, turnaround
time, implementation time, personnel requirements, programing language interfaces, and batch versus
interactive operations.
Standardization
The desirability of standardization was mentioned several times. For certain classes of laboratories
(e.g., environmental monitoring), standardization is highly desirable and could result in significant cost
savings. It was pointed out that differences in hardware and operational methods work against
standardization.
79
-------
Languages
There was some discussion of the merits of programing in assembler and high level languages. Several
participants agreed that the number of lines of fully debugged code produced per unit time by an
experienced programer was independent of language. It also was pointed out that one line of high-level code
usually accomplishes as much as many lines of machine language code.
OSI System and STORET
It was asserted that the interim LDM system (OSI-based) will complement minicomputer-based
systems, especially in regard to interfacing with STORET. The consensus was that quality assurance
concepts should be applied to environmental data before its transfer to archival storage (e.g., STORET).
80
-------
THE STATE OF DATA ANALYSIS SOFTWARE IN THE
ENVIRONMENTAL PROTECTION AGENCY
By Gene R. Lowrimore
Data analysts have yet to realize the full benefits
promised by the availability of large-scale computer
power. We keep looking forward to the day when we can
concentrate on the analysis of the data, that is, the day
in which the hardware-software machine will enable us
to do the analyses we want to do with only a reasonable
amount of nondata analysis effort. This paper briefly
discusses the current situation within EPA concerning
data analysis software and outlines what kind of soft-
ware we might develop to support the data analysis
effort more fully.
First of all, who is a data analyst? For purposes of
this discussion, the following definition will be used:
Data Analyst: One who analyzes, for some purpose or other, data. For the most part, he or she is assumed to know what operations should be performed on the data in the process of analyzing it.
The software tools which a data analyst has at his or her disposal are of three kinds: (1) subroutines, which, of course, require the writing of a main program before anything useful can be done with them; (2) stand-alone programs, which do not require additional programing but require that the data be input at least once for each analysis desired; and (3) integrated packages, which allow many analyses to be performed on the data once it has been entered. The following list is representative of the kinds of software tools available in EPA.

Subroutine Collections
Univac STATPACK
Scientific Subroutine Package (SSP)
ARL Linear Algebra Library
Box-Jenkins Time Series Analysis
International Mathematical & Statistical Library (IMSL)

Stand-Alone Programs
BMD-P Series
MANOVA
Multivariate General Linear Hypothesis (MGLM)
Linear Categorical Analysis (LINCAT)

Integrated Packages
Statistical Analysis System (SAS)
Statistical Package for the Social Sciences (SPSS)
OMNITAB
STATJOB

Of the collections of subroutines, STATPACK and SSP are vendor products (IBM no longer supports SSP). The ARL Linear Algebra Library and Box-Jenkins Time Series Analysis were developed by some data analysts for their own particular purposes and furnished to us to use at our own risk. IMSL is generally considered to be the best general purpose subroutine collection available.

Among the stand-alone programs, the BMD-P series is probably best known to most data analysts. It is a series of 80 or more programs furnished by the UCLA Computer Center. These programs were developed for the IBM 360/370 series computers. They have been converted to the Univac 1100 series computers and are distributed in this form by the University of Maryland. The other three programs in the list are supplied by the University of North Carolina at Chapel Hill. MANOVA and MGLM are quite useful for performing extensive multivariate analyses. LINCAT analyzes contingency tables by exploiting the analogy with analysis of variance techniques.
From a data analyst's point of view, SAS (devel-
oped by North Carolina State University) is by far the
best of the integrated packages available. The strength of
SAS is its data handling capability and the ease with
which the data analyst can invoke any procedure within
its repertoire. SPSS is supplied by the National Opinion
Research Center (NORC) and is the strongest of the
integrated packages for making contingency tables and
performing descriptive statistics. OMNITAB was
developed by the National Bureau of Standards and is
useful primarily as a data analysis tool for limited
analysis on small data sets. STATJOB was developed by
the University of Wisconsin and appears to be an
enhancement of SPSS. The 7 inches of documentation
accompanying the package is a major obstacle to
STATJOB's use.
These tools are typically those with which an EPA
data analyst must work. Because the software has not
been adequately designed for general use, a programer is
usually assigned to assist the data analyst in carrying out
the analysis. When a programer is unavailable, the data
analyst must perform that function. Since this kind of programing is unexciting, it is likely to be greeted with something less than enthusiasm. This lack of enthusiasm frequently leads the data analyst to perform the programing function even when not absolutely necessary.
81
-------
Just as frequently, the data analyst gets so involved in
programing that he or she ceases to be an effective data
analyst.
Some of the packages and programs available to the
data analyst are excellent. For instance, the availability
of SAS has dramatically cut the overall time required to
complete an analysis of data. SAS is a prime example of
what good software can do to improve data analysis, but
we should not stop with SAS, SPSS, or any other
particular package. Much data analysis needs to be done,
for which software is not available. The functions in data
analysis done manually are very expensive in terms of
money, time, and accuracy.
EPA needs to develop, or cause to be developed,
data analysis packages which will greatly reduce the need
for assigning programers to assist data analysts, reduce
the overall time and expense of doing data analysis, and
increase the scope of the analysis that the data analyst
can easily do. In order to accomplish these ends, the
software should:
1. Handle Large Data Sets Effectively. None of
the available packages has this capability. For instance,
SAS stores all data in double precision. SAS also reads
the data set several times unnecessarily.
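
Neither limitation is inherent. A single pass suffices for the common summary statistics if a stable updating formula is used; the sketch below (present-day Python, offered only as an illustration of the technique, not of any package named here) computes the mean and variance while reading each observation exactly once:

    def one_pass_stats(stream):
        # Welford's updating method: mean and variance in one pass,
        # stable enough that double-precision storage of the whole
        # data set is not required.
        n, mean, m2 = 0, 0.0, 0.0
        for x in stream:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
        variance = m2 / (n - 1) if n > 1 else 0.0
        return n, mean, variance

    print(one_pass_stats([4.1, 3.9, 4.3, 4.0]))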
2. Manage Analysis Results. One of the require-
ments facing EPA researchers is that the analyses included in published reports should be exactly reproducible by different programs. Under the Freedom of Information Act, industry is requesting EPA data and subjecting it to their own analyses. The system should keep track of the results and of the observations used in each analysis.
3. Make Good Use of Plotting Capabilities. Publi-
cation quality plots or printer plots should be as easy to
generate as any other statistical procedure. Presently, a
program must be written in order to generate a publica-
tion quality plot. This situation is unreasonable and
unacceptable.
4. Operate Effectively in the Interactive Mode.
Abbreviated output for a procedure should be sent to the interactive terminal; the complete output should be saved and, at the end of the session, sent wherever the data analyst states. The software should lead the data analyst through the session to whatever extent necessary.
5. Use Analysis Techniques Which Exploit
Computer Capability. Procedures are needed for explora-
tory analysis of large data sets. The capability to use
empirical sampling techniques to test hypotheses needs
to be provided. Least squares procedures that do not
require the formation of the normal equations should be
used. SAS would not have required double precision
data representation if this had been done.
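
For instance, an orthogonal factorization solves the least squares problem directly from the observation matrix, avoiding the squared condition number that makes the normal equations demand double precision. A sketch using present-day tools (Python with NumPy; illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))                  # observation matrix
    y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    # Normal equations: forming X'X squares the condition number.
    beta_normal = np.linalg.solve(x.T @ x, x.T @ y)

    # QR factorization: solve R b = Q'y without ever forming X'X.
    q, r = np.linalg.qr(x)
    beta_qr = np.linalg.solve(r, q.T @ y)

    print(np.allclose(beta_normal, beta_qr))       # True (well-conditioned case)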
6. Allow the Analyst to Estimate Power of a Test
of Hypotheses. The power of the hypotheses test being
used by the data analyst is rarely calculated, primarily
because the computation is difficult.
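
The computation need not be difficult if it is done empirically: simulate the experiment under the alternative hypothesis and count rejections. A sketch (Python with SciPy; the parameter values are arbitrary):

    import numpy as np
    from scipy import stats

    def empirical_power(effect, n, sigma=1.0, alpha=0.05, trials=2000):
        # Fraction of simulated two-sample experiments that reject
        # H0 when the true difference in means is `effect`.
        rng = np.random.default_rng(1)
        rejected = 0
        for _ in range(trials):
            a = rng.normal(0.0, sigma, n)
            b = rng.normal(effect, sigma, n)
            if stats.ttest_ind(a, b).pvalue < alpha:
                rejected += 1
        return rejected / trials

    print(empirical_power(effect=0.5, n=30))   # roughly 0.47

This also exercises the empirical sampling capability called for in item 5.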
7. Incorporate User Procedures. It should be
recognized at the outset that everything a data analyst
might want to do cannot be anticipated. Therefore,
linking an analyst's pet procedure with the data handling
capabilities of the system should be made as simple as
possible.
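
A registration hook is one simple way to provide this linkage; in the sketch below (illustrative Python, hypothetical names), the system supplies the data handling and the analyst supplies the procedure:

    PROCEDURES = {}

    def procedure(name):
        # Register a user-written analysis under a name the
        # system can invoke like a built-in procedure.
        def register(fn):
            PROCEDURES[name] = fn
            return fn
        return register

    @procedure("logmean")
    def log_mean(values):
        import math
        return sum(math.log(v) for v in values) / len(values)

    def run(name, data):
        # The system supplies the data; the analyst the method.
        return PROCEDURES[name](data)

    print(run("logmean", [1.0, 10.0, 100.0]))   # 2.3025... (mean of the logs)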
John Tukey was discussing these same problems in
1963.1 The question obviously arises, "Why haven't we
gotten more of this capability in the ensuing 12 years?"
The answer has three parts:
A false definition of scientific programing has
been promulgated which says that very little
input and output are involved and much
calculation is required
Data analysts often have not let their thinking
about the process of analyzing data be in-
fluenced by the presence and power of the
computers. Consequently, they have not pro-
vided the necessary analysis support nor have
they been demonstrative enough in demanding
the right kind of ADP support
Computer programing has been a very unreli-
able enterprise. Only in the last few years has
some structure been brought to the pro-
graming process. This structure will give us
confidence that we can successfully develop
more comprehensive systems.
As ADP professionals, the challenge to us is to turn
the situation around and really accomplish something in
the field of data analysis software.
82
-------
REFERENCE
1 Tukey, John W., "The Inevitable Collision between
Computation and Data Analysis," Proceedings IBM
Scientific Computing Symposium Statistics. IBM
Data Processing Division, White Plains, New York,
1963, pp. 141-152.
83
-------
NATIONAL COMPUTER CENTER (NCC) SCIENTIFIC SOFTWARE SUPPORT - PAST, PRESENT, AND FUTURE
By M. Johnson
Traditionally, the computer center at Research Tri-
angle Park has obtained and maintained scientific soft-
ware packages of general utility for its user community.
Until 5 years ago, there was an IBM 1130 computer
serving about 25 local users in what was then the
National Air Pollution Control Administration. Nearly
all processing was of a scientific nature using
FORTRAN, the 1130 statistical and mathematical
package, APL, and SPSS. A great deal of CALCOMP
plotting was also done.
During the next 4 years after the installation of the
IBM 360/50, the user community expanded and applica-
tions greatly diversified. Statistical packages such as
BMD, SPSS, and SAS were made available, as well as the
IBM Scientific Subroutine Package. The interactive TSL
library was made accessible under TSO. Thus, scientific
software was implemented and maintained, but no
central consulting or training support was available.
Users tended to help each other with problems; trial and
error was the methodology. Also, as is frequently the
case when no designated user support staff exists, the
last system programer "touching" a particular package
tended to become responsible for it by default. Any
time spent diagnosing user problems was at the expense
of other regularly assigned systems tasks.
As the 360/50 became saturated about a year after
its installation, time was bought from the neighboring
university computer center. This provided access to a
wider range of statistical packages as well as to APL. The
universities also offered several short courses in such
packages as SAS and SPSS, at both beginning and
advanced levels, which were available to the EPA user
community.
Who made up the user community at that time?
Primarily it was still local, consisting of the Office of Air
Quality Programs and the RTP National Environmental
Research Center, but had grown to well over 100 users.
Regional offices began retrieving data from the
SAROAD data files. However, scientific applications still
represented a large portion of the job mix.
Shortly before the installation of the Univac 1110
in the fall of 1973, a user services function was estab-
lished. Unfortunately, this coincided with the departure
of the two DSD staff members who had statistical back-
grounds and experience with scientific software. These
vacancies could not be refilled, so scientific users still
had to rely primarily on each other for debugging
assistance. Naturally, a situation developed in which, once a routine was made to work and the user became familiar with a certain software package, there was great reluctance to explore other, possibly more expedient, alternatives.
Procurement of scientific software for the Univac
system has been rather haphazard, although the com-
puter center has attempted to obtain and implement
available software in response to users' requests. SAS
and TSL were converted as part of the Univac con-
version contract. STAT PACK and MATH PACK are
Univac-supplied routines directly callable from
FORTRAN. OMNITAB was obtained from the National
Bureau of Standards, STATJOB received from the
University of Wisconsin, and APL acquired from the
University of Maryland. The University of Maryland now
has the BMD-P series available for Univac, and this
package has been ordered for NCC. All the standard
CALCOMP plotting capabilities are available and
TEKTRONIX interactive graphics have been imple-
mented. A Univac version of SPSS was one of the first
packages to be installed, but installation was essentially
where central support stopped. Consultation and
debugging assistance has been severely limited and
training virtually nonexistent in the efficient and
expedient use of the software.
Recently, a scientific software manual has been
developed by SAI under contract for Elijah Poole. SAI surveys Agency-wide scientific software and provides descriptions and sample runstreams of all available
packages. We now have a central source of current infor-
mation which can be expanded and updated as our
resources improve. Elijah Poole has also coordinated
three regional training sessions to introduce the manual
and software resources to the EPA scientific community.
Now as a part of MIDSD under the direction of
Willis Greenstreet, we are on the threshold of change and
the future looks bright. We are now charged with the
responsibility of becoming an Agency-wide computing
resource serving several hundred users and have a new
84
-------
name, EPA National Computer Center. Not only is it
necessary to upgrade the computer hardware but also to
improve supporting services. A modification to the
existing systems programing contract with ISSI will
provide a highly experienced staff dedicated to the
support of user services functions. The responsibility of
enhanced scientific software support has been clearly
defined and appropriate staffing is being procured. This
support will include evaluation, implementation,
maintenance, documentation, training, and consultation, all specific to the needs of the NCC user com-
munity. The Scientific Software Committee will be
resurrected and made a viable channel of information
exchange within the scientific community. Attention has
been brought to bear on the inadequacies of the con-
verted version of SAS and on the question of what
software, currently unavailable, should be provided. New
software requirements will be evaluated in a reasonable
and orderly manner and, once procured, adequate
support will be provided to assure efficient utilization.
85
-------
EXPLOITATION OF EPA'S ADP RESOURCES:
OPTIMAL OR MINIMAL?
By John J. Hart
The traditional approach to the provision of ADP
support to various functional requirements in Govern-
ment and industry has been based on centralization of
hardware, software, and systems analysis/programing
resources. Because of economies of scale and require-
ments for highly specialized technical skills, this concept
has been both necessary and desirable. Generally,
individual divisions and branches within EPA research
laboratories cannot afford to employ a central staff of
analysts and programers with the varied technical skills
necessary to effectively support the diversified data
processing and analysis functions associated with today's
research and development problems. Likewise, the costs
of complex, sophisticated, and comprehensive hardware/
software systems prohibit the use of local computer
installations. Although for all practical purposes EPA's
centralized hardware, software, and personnel resources
are providing competent and useful support services, it is
suggested that the Agency has not yet fully exploited
the total capabilities available and inherent in the
sophisticated computer systems at the National Com-
puter Center (Univac 1110) and Optimum Systems, Inc.
(IBM 370). This paper will review several factors which
affect the utilization of these capabilities and will
suggest opportunities for improvement.
The Agency's missions include the identification of
pollutants, overall assessment of environmental quality,
development of strategies and techniques for control and
abatement, and implementation of continuous moni-
toring functions and mechanisms. The scope of these
missions and the quantity of physical, biological, and
chemical parameters which must be measured, analyzed,
and interpreted, obviously confront the Agency with an
enormous information and data explosion. Consider the
possible outcome if EPA were limited to the technology
available in the 1950's. An enormous number of people
would be performing statistical computations with
electromechanical calculators and slide rules, the overall
productivity would be low, and the error rates would be
extremely high. The ability to implement and effectively
use sophisticated modeling and simulation would also be
severely restricted.
Fortunately, in 1976, the Agency has the sophisti-
cated and comprehensive ADP resources to solve the
complex analytical and processing requirements
associated with its research and development missions.
In addition to numerous dedicated laboratory mini-
computers used to support analytical instrumentation,
the Univac 1110 at NCC and the IBM 370 at OSI
provide extremely fast computational and processing
capabilities, substantial mass data storage facilities, and
extensive libraries of canned scientific programs to
support statistical analysis, modeling, and simulation.
High-level programing language processors (FORTRAN,
COBOL) and software systems to efficiently support
data entry, editing, and data base file structuring
(Wylbur, IRS, System 2000) are also available. Through
the existing time-shared low-speed terminals and multi-
plexed communications facilities, scientists, programers,
and data clerks have immediate access to the large-scale
computers for program implementation and execution,
and for performance of varied data handling functions
(e.g., entry, editing, and retrievals).
With all of these resources and capabilities, it would
be natural to assume that their application to the
Agency's missions are cost effective, efficient, and
sufficiently comprehensive in scope. It is suggested that
these conditions do not accurately describe current
conditions. For example, let us examine several char-
acteristics pertaining to the attitudes and involvement of
management and the scientific community. Frequently
we hear the following concerns expressed by
management:
Extremely large ADP expenditures
Lack of ADP planning
Fragmented and nonstandardized ADP
resources and approaches to supporting the
Agency's missions
Complexity of issues regarding what resources
are required for specific applications
Complex technology requiring specialized
knowledge and training
86
-------
Communications problems in interfacing with
ADP professionals
Lack of adequate and competent assessment
of the cost benefits obtainable through use of
ADP technology and resources.
In contrast, the scientific community is usually too
busy promulgating the Agency's technical missions to
become intimately involved with the proper planning
and application of ADP resources. In fact, the scientific
community can be classified into three distinct groups:
1. A group which is significantly indifferent to
ADP technology and issues. They appear to execute
their technical data analysis responsibilities without
computers.
2. A group which acknowledges the need for use
of automation techniques and resources for selective
problems. Typically, they depend on central ADP pro-
fessional staffs for selective applications development.
3. The last group is very active in the application
of ADP resources in that ADP is integrated into their
technical line responsibilities relating to statistical
analysis, modeling and simulation, and engineering
design. In many cases, these people have taken the
initiative to learn computer programing and have devel-
oped excellent skills equal to, and greater than, many
ADP professionals. To them, ADP resources become
effective tools for analysis, design, and simulation
functions.
The previously mentioned areas of management
concerns and interests and involvement of scientific
personnel have a direct impact on the extent to which
ADP technology and resources satisfy the Agency
missions. The impacts are manifested in the following
ways:
Low numbers of ADP users among the
scientific community
Persistent use of manual methods for data
analysis
Low analytical productivity
High error rates
Redundancy in ADP systems development
Arms-length relationships with the ADP pro-
fessional and difficulties in communications
Disproportionate expenditures of ADP funds
(e.g., administrative vs. research and
development).
Can these conditions be changed? Can ADP
resources be more effectively employed? Is it possible to
increase the use of ADP in support of the technical
missions of the Agency? The answer to these questions is
affirmative. However, the burden of such accomplish-
ments rests with the ADP professionals and their respec-
tive management. First of all, the ADP professional will
have to take the initiative to break down the communi-
cation barrier by the following means:
Simplify the language used in communicating
with prospective users concerning the design
and implementation of new applications;
deemphasize the ADP technical jargon
Develop an increased awareness of the func-
tional aspects and utility of the user's pro-
posed application in the user's environment
Determine, describe, and emphasize the cost
benefits to be derived from the proposed
application
Determine and formulate the disciplines and
procedures required to make the application
successful and effective.
A second requirement for increasing the effective
application of ADP technology to the Agency's missions
will be to develop strategies and programs which enlarge
the ADP user population representing the scientific
community. The following are some possibilities:
Develop training seminars which demonstrate
and describe typical types of computer
applications using information from existing
systems
Develop an introductory training course on
standard general purpose statistical packages
(e.g., OMNITAB, SAS, BMD); such a course
would present an overview of the unique
capabilities of each package and typical
problem applications
87
-------
Develop an introductory training course on
the standard graphics packages and their appli-
cation to typical Agency problems; this course
would be followed by additional in-depth
instructional courses on specific packages (i.e.,
Calcomp, IPP, Tektronix)
Develop an introductory ADP concepts course
which enumerates and describes the Agency's
ADP resources and facilities (e.g., OSI, RTP,
Univac, IBM 370, Wylbur, other buzzwords).
The purpose of these suggested courses is to
indoctrinate the scientific community on the available
resources and typical applications. Existing courses, such
as FORTRAN programing, System 2000, and Wylbur
text editing, provide the detailed training required for
use of unique individual systems.
Additional effort is also recommended for ex-
panding the use of existing statistical packages and for
reducing the complexity of using several packages. There
are four specific recommendations to be made for
accomplishing this objective.
1. All statistical packages at OSI and NCC should
be reviewed and tested. Each package should be tested
to determine overall functional capabilities, limitations,
and restrictions for use.
2. A cross-reference directory should be devel-
oped to help a scientific user select the package best
suited to his/her requirements. This cross-reference
directory should briefly describe the functional capa-
bilities of each package and identify all packages which
solve common problems (e.g., analysis of variance); a sketch of such a directory follows these recommendations.
3. Simplified written procedures should be
developed for use of selected statistical packages to solve
common and frequent types of problems (e.g., simple
regression). It has been suggested by several laboratory
scientists that a cookbook procedure for SAS and
OMNITAB would be useful for selected problems.
4. Considering the redundancy of statistical soft-
ware packages installed on Agency computer systems, a
comprehensive review of all existing packages followed
by development of a standard package for the Agency
may prove useful. At present, the scientist is confronted
with the task of reviewing literature on multiple sta-
tistical packages, each of which was designed for a
unique scientific discipline (e.g., behavioral science,
medical) and has unique characteristics and limitations.
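
The cross-reference directory of recommendation 2 is, at bottom, a mapping from problem types to the packages that solve them. A toy sketch (Python; the entries shown are illustrative only, not the product of the review recommended above):

    # Hypothetical entries; a real directory would be compiled from
    # the package reviews called for in recommendation 1.
    DIRECTORY = {
        "analysis of variance": ["SAS", "BMD-P", "OMNITAB"],
        "contingency tables":   ["SPSS", "LINCAT"],
        "simple regression":    ["SAS", "OMNITAB", "STATPACK"],
    }

    def packages_for(problem):
        # Every package listed as solving the given problem.
        return DIRECTORY.get(problem.lower(), [])

    print(packages_for("Analysis of Variance"))   # ['SAS', 'BMD-P', 'OMNITAB']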
88
-------
SCIENTIST, BIOMETRICIAN, ADP INTERFACE
By Neal Goldberg
To look at strengths and weaknesses in the scientif-
ic analysis of data in EPA, we must look well beyond the
realm of automatic data processing (ADP). A general
picture must include consideration of three disciplines:
Scientist
Biometrician-statistician
Computer specialist.
There must be a viable working relationship among
these three persons, or groups of persons, to ensure
proper completion of a project. Consequently, there
must be a shared understanding of major principles
applied by all involved with the study.
Generalizing, the scientist may be a chemist, biolo-
gist, environmentalist, and so forth. In the scientific
scheme, the scientist is the person who defines a prob-
lem, accumulates data, and presents his or her solution.
The scientist must rely upon the biometrician who is
concerned with the proper design (e.g., replication,
sampling techniques, etc.) and interpretation of data.
Essentially, the biometrician will instruct the scientist in
the proper statistical procedures to find what he is look-
ing for and explain what he has actually found. In any
but the most basic experiment, an unwieldy amount of
data is usually accumulated. In most cases, the biometri-
cian needs to call upon the computer specialist for
proper organization and application of his/her require-
ments in the interpretation of data.
Within the laboratory, we see too many experi-
ments which are not "designed" until "after the fact."
Most often, this error is caused by failure to comprehend
proper mathematical/statistical techniques for sampling
and hypothesis validation. It is possible that the effects
of this misinterpretation could cause irreversible damage
if left uncorrected.
The error referred to is evidenced in many forms, the most notable being lost time and money. The most serious, however, involves the loss of credibility. One example is cited below.
A senior member of the scientific staff at a research
laboratory undertook an experiment to study the effect
of a potentially toxic substance using a suitable biologi-
cal indicator organism. At the conclusion of the 6-month
sampling period, he came to the slow realization that he
did not understand how to validate his findings mathe-
matically. In an attempt to find a reasonable solution,
the ADP operation was requested to provide a variety of
statistical analyses. All software was available from exist-
ing proprietary packages. Upon completion of these
analyses, no useful information was found. All data were
then rerun utilizing logarithmic transformation. After 5
months of attempting to find a solution, a stop was put
to the processing of these data. At this time, over 100
jobs had been run, requiring the punching of about
3,000 cards, more than 60 graphs were produced, and
more than 80 online data sets were maintained. Total
direct ADP costs incurred were around $2,500.
The principal investigator and others within the lab
began seriously to seek biometric services. After consul-
tation with members of the Department of Experimental
Statistics at a major university, all were satisfied that a
reasonable solution had been found: In one day, five
jobs were run and proper interpretation of the data
provided. Total cost for direct ADP services was $25
with only 560 cards required to be punched.
A loss of Agency credibility would reach far
beyond the loss of time and money cited previously. It
would strike directly at the justification for our
organized existence (i.e., protection of the environ-
ment).
A large portion of EPA effort is enforcement
oriented. In order to remain an effective entity, we must
be able to provide constructive support. The data we
produce must serve this end. Therefore, it is essential
that our methods (i.e., experimental design and
statistical/mathematical reduction) must be defensible as
well. Opponents of some of EPA's policies will expend
vast resources in an attempt to invalidate the Agency's
findings and weaken its regulatory ability. We cannot
afford to allow a defeat based upon a technicality in
experimental procedure.
Granted, these problems do not exist at all facili-
ties, but they have reached a critical point in some areas.
Currently at most Environmental Research Laboratories,
it is recognized that better utilization of existing data is
required. There are cases where new experimentation is
89
-------
in a holding stage pending adequate biometric analysis.
Also, interpretation and publication of some existing
data are being withheld until procedural methods are
suitably defined.
Where all three personnel resources are available
with effective lines of communication, an efficient scien-
tific process is more than likely to be found. In cases
where this effective process is not found, it can be
assumed that the parties involved will seek to nullify the
predicament. It is, therefore, necessary to address those
with the authority to act. Scientific problems of the
previously discussed nature are such that they are diffi-
cult, if not impossible, to communicate via the tele-
phone. Currently there are means for clarifying both
scientific and ADP problems.
EPA should attempt to equip those in the field
with the proper support for biometric/statistical func-
tions. Carrying this one step further, an investigation
should be undertaken to study the feasibility of
providing a staff to travel between laboratories, provid-
ing services as required. This probably would give rise to
a significant improvement in data handling and enhance
the atmosphere for more interlaboratory communica-
tion. Such a solution would have the least impact upon
the personnel shortage.
There is general agreement between the scientific
investigator and the computer specialist that there is a
need for an intermediary in the scientific process; yet, it
is extremely difficult to acquire one. Neither the con-
tracts office nor the personnel office has been able to
provide any direction toward this goal.
The EPA scientific software study shows that
sufficient mathematical/statistical analyses are available
to meet most requirements. This wide spectrum of pro-
prietary software is of little value, however, when there
are so few available who fully understand its utility. In
this context, it must be recognized that ADP services
cannot be used as a tool for arbitrarily attempting to
solve problems. Like all others, this resource is not limit-
less.
To minimize problems, project managers and ADP
coordinators must be sure that they are getting the most
out of a project. Automatic data processing cannot be
scientifically effective as an entity. It is only one part of
a whole system and, as such, cannot function until all
the pieces are in place.
90
-------
STATISTICAL DIFFERENCES BETWEEN RETROSPECTIVE AND PROSPECTIVE STUDIES
By Dr. R. R. Kinnison
The U.S. Environmental Protection Agency is
currently collecting and archiving massive quantities of
environmental data. This collection is intended to pro-
vide rapid access to answers to questions that may arise
in the future. With this rapid access, it will not be neces-
sary, in most instances, to initiate a project to collect
raw data for an adequate analysis of a situation.
In statistical terms, analysis of existing data is
performed in a retrospective study. Development of a
data base to answer an existing question is performed in
a prospective study. Statisticians know that the generally
used statistical data analysis techniques yield biased
answers when applied in retrospective studies. The
commonly used statistical data analysis techniques were
developed for prospective studies, and they assume
many characteristics of the data base, some of which
cannot be met by a retrospective data base. The usual
resulting bias is in the direction of finding significant
effects when, in fact, none exist.
Of course, the concept of a retrospective study is
not new; it was developed to a high degree of sophistica-
tion by medical epidemiologists studying chronic
diseases in humans. In such instances, a prospective
study would require following a sample of humans for
their entire lifespan of about 70 years. Such a study
obviously would be expensive, and the time delay
between question and answer would be unacceptable. In
addition, elaborate precautions are necessary just to
keep track of the fate and the location of the elements
of the sample. Naturally, techniques have been devel-
oped to avoid a prospective study in such situations.
These techniques come under the general statistical
category of "Observational Studies," and, in general,
they rely on retrospective data.
The statistical analysis tools applicable to observa-
tional studies are distinct from those applicable to
prospective studies. They are not as highly developed as
the more commonly employed statistical tools. There is
also a substantial amount of current research effort
being devoted to this type of statistical analysis. An
excellent review article, "The Design and Analysis of the
Observational Study-A Review," by Sonja M. McKinlay,
was published in the Journal of the American Statistical
Association, September 1975 (Volume 70, Num-
ber 351). Those characteristics of observational studies
that specifically define a retrospective analysis follow.
The "treatment," which determines the groups for
comparison, is the statistical effect, and the observed
response is the hypothesized cause. Thus, we look at the
past smog exposure in people that now have lung
disease, rather than determine what will happen in
currently healthy people exposed to known degrees of
smog.
Those assumptions of "regular" statistics that are
violated in retrospective analysis are that: (1) the replica-
tions of the experiment are made under similar and
known conditions, (2) the replications are mutually
independent, and (3) the uncontrolled variation is due
only to random fluctuations. In retrospective studies,
the conditions of the experiment are random and the
effects are known. Thus, the treatments cannot be pre-
assigned in a random manner. Note that a retrospective
study invariably will exclude all those subjects exposed
to the treatment who did not develop the effect.
Without these nonresponders, common statistical tech-
niques are invalid. Two other properties are desirable in retrospective studies: techniques that permit (1) the capability to measure and to manipulate systematic error in the absence of randomization, and (2) methods to
evaluate evidence from a variety of sources, each of
which may have unknown and different characteristics.
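The direction of this bias is easy to demonstrate. The following small simulation (a sketch in Python; the exposure, disease, and ascertainment probabilities are invented for illustration) builds a world in which exposure has no effect on disease, then samples cases retrospectively so that exposed cases are ascertained more often than unexposed ones. The apparent odds ratio is inflated to roughly the ascertainment bias, even though the true value is 1.0.

    import random

    random.seed(1)

    def apparent_odds_ratio(n=20000, p_exposed=0.3, p_disease=0.1, bias=2.0):
        # Null world: disease is independent of exposure, so the true
        # odds ratio is exactly 1.0.
        people = [(random.random() < p_exposed, random.random() < p_disease)
                  for _ in range(n)]
        # Retrospective selection: only diseased subjects enter the case
        # file, and exposed cases are ascertained 'bias' times as often as
        # unexposed ones (e.g., exposed groups were already being watched).
        cases = [e for (e, d) in people
                 if d and random.random() < (0.8 if e else 0.8 / bias)]
        controls = [e for (e, d) in people if not d]

        def odds(group):
            exposed = sum(group)
            return exposed / (len(group) - exposed)

        return odds(cases) / odds(controls)

    print(round(apparent_odds_ratio(), 2))   # near 2.0, though no effect exists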
The EPA computerized information systems traditionally have been used to collect data, to evaluate that data retrospectively to suggest questions, and then to answer those questions from the same data using techniques applicable only to prospective studies. The data are
good, but the answers are not. There is a statistical
science available that will provide good answers. Our
current data screening efforts to find good questions are
valid, but the finding of a possible effect is not
synonymous with the firm and statistically valid
establishment of the existence of that effect. We have
taken the first step, but not the second, which is the
application of observational analysis statistics. So far our
data studies are best characterized as data screening; in
fact, a very limited type of data screening. Such screening
efforts must be expanded to find all the good questions,
and to find them before they become political issues. We
also must recognize that data screening only raises the
question. A new effort in observational analysis, unique
to EPA in general, is necessary to find valid answers
from existing data.
91
-------
Because of the massive archival data at our disposal,
there is another important principle that should not be
overlooked. This principle, once the archival data raises
the question, is to collect new data specifically to answer
that question or hypothesis. It is, of course, easier to
analyze existing data than to design and execute a valid
experiment, but the effort may be necessary to obtain
the truth. In general, it is a good research practice to
collect specific new data. EPA needs to emphasize
efficiency in finding potential problems, and to initiate a
new awareness of the need for reaching valid, complete
answers.
92
-------
RAISING THE STATISTICAL ANALYSIS LEVEL OF ENVIRONMENTAL
MONITORING DATA
By Wayne R. Ott
INTRODUCTION
The term monitoring data, as used in this paper,
denotes routine measurements used to represent or
describe the state of the environment. The definition
includes measurements of environmental contaminants
and other variables in air and drinking water as well as in
lakes, rivers, and marine waters. Routine monitoring
data are collected at great expense and in large quantities
by environmental control agencies.
An example of monitoring data is data generated
by a metropolitan air-monitoring network. A general
urban monitoring network may consist, for example, of
10 stations, each measuring the six "criteria" air pollut-
ants: sulfur dioxide, nitrogen dioxide, ozone, hydro-
carbons, carbon monoxide, and total suspended particu-
late. Five of these six pollutants usually are measured
continuously on an hourly basis, and one is measured on
a 24-hour basis. Potentially this gives 438,000 values per
year for the hourly data (8,760 hours/year x 5 pollutants x 10 stations) and 3,650 values per year (365 days x 1 pollutant x 10 stations) for the 24-hour data. The cost
of installing and maintaining such a network can be sub-
stantial, over $60,000 initial investment per station and
probably more than $100,000 per year for overall main-
tenance for all stations. Therefore, such a network has
an original cost of more than $600,000 and can generate
441,650 values per year, about four values per dollar of
annual operating cost. The metropolitan air monitoring
networks installed across the Nation represent a national
investment of possibly over $30 million.
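The arithmetic behind these figures can be reproduced in a few lines (a sketch in Python; the dollar figure is the rough annual maintenance estimate quoted above):

    hourly = 8760 * 5 * 10   # 438,000 hourly values per year
    daily = 365 * 1 * 10     # 3,650 24-hour values per year
    total = hourly + daily   # 441,650 values per year
    annual_cost = 100000     # rough overall maintenance figure
    print(total, round(total / annual_cost, 1))   # about 4.4 values per dollar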
These data ultimately find their way into large
monitoring data archives, such as STORET, SAROAD,
and the EPA water supply data bank. It is reasonable to
ask whether the resources expended on the analysis,
interpretation, and display of these data are sufficient
(and whether the quality of these analyses is adequate)
relative to the resources originally spent to collect the
data.
THE PROBLEM
For the most part, the problem is that State and
local air and water pollution control agencies do relative-
ly little in-depth analysis of these data. Some agencies
lack access to computers, and some even lack the techni-
cal expertise to perform in-depth statistical analyses. The
usual practice is to calculate means and standard devia-
tions, to note how many values are above or below envi-
ronmental standards, and to report the data to the
public in some form of environmental index. Very few
agencies examine correlations between environmental
variables and causative factors, for example, or study
trends using formal techniques such as time series
analysis, or probe underlying statistical characteristics of
the data. In addition, EPA's effort to carry out in-depth
monitoring data analysis is very limited in scope.
In one sense, we have the spectre of a vast resource,
environmental monitoring data, that is largely unexploit-
ed and underanalyzed. Looking at the total picture,
there appears to be undue emphasis on making measure-
ments and minimal emphasis on interpreting the mea-
surements once they are made. The goal appears to be to
collect, then to store, but not really to analyze.
AREAS REQUIRING EMPHASIS
There are at least three areas where we need to
strengthen our efforts in upgrading statistical and mathe-
matical levels: examination of underlying statistical
properties of the data, correlation analyses, and trend
analyses.
Underlying Statistical Properties
Except for very simple curve fitting, no extensive
work has been undertaken to examine the underlying
distributions from which environmental quality data
arise, either for air or for water quality data. This lack of
detailed analysis is of particular concern because some
regulatory decisions and environmental standards are
based on the assumption that measured concentrations
have a particular distribution, such as the lognormal dis-
tribution. In a partial effort to fill this need, the Office
of Monitoring and Technical Support has completed
contractual work to develop univariate curve-fitting soft-
ware. For example, the computer program MODEL.FIT,
still under testing, is a general-purpose tool to evaluate
the suitability of different probability models, such as
Gamma, lognormal, Weibull, and normal, for a given
data set. More work must be done in applying such tools
to existing monitoring data.
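The kind of comparison such a tool performs can be sketched briefly. The following is a minimal illustration of univariate model screening, not the MODEL.FIT program itself (Python with the SciPy library; the data are synthetic): it fits each candidate distribution by maximum likelihood and ranks the fits by the Kolmogorov-Smirnov statistic.

    import numpy as np
    from scipy import stats

    def rank_models(data):
        # Fit several candidate distributions and rank them by how closely
        # the fitted model follows the empirical distribution (smaller K-S
        # statistic means a better fit).
        candidates = {
            "normal": stats.norm,
            "lognormal": stats.lognorm,
            "gamma": stats.gamma,
            "weibull": stats.weibull_min,
        }
        results = {}
        for name, dist in candidates.items():
            params = dist.fit(data)                         # maximum-likelihood fit
            results[name] = stats.kstest(data, dist.cdf, args=params).statistic
        return sorted(results.items(), key=lambda kv: kv[1])

    # Synthetic stand-in for hourly pollutant concentrations:
    data = stats.lognorm.rvs(0.8, size=500, random_state=0)
    for name, ks in rank_models(data):
        print(f"{name:10s} K-S = {ks:.3f}")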
93
-------
Correlation Analysis
At present, there also appears to be too little emphasis on examining the correlations of important causative factors with measured pollutant concentrations. In analysis of air, for example, we need to examine the correlations of variables such as meteorology, fuel use patterns, and industrial and vehicular emissions with the measured concentrations.
Some examples of recent questions of interest are:
How did measured carbon monoxide and sul-
fur dioxide concentrations change in relation
to last year's fuel shortage?
Is the decline of sulfur dioxide concentrations
in New York City over the last 10 years con-
sistent with the predictions of diffusion
models or proportional rollback models,
which often have been criticized and on which
major regulatory decisions are based?
Can these complex phenomena be treated by
multiple regression analysis, or are other
mathematical tools more appropriate?
Such studies, if carried out in depth, may give EPA
policymakers and managers important insights into the
way regulatory and energy policies are affecting environ-
mental quality.
Trend Studies
Some trend reports are now routinely produced by
EPA, and they are excellent contributions to the envi-
ronmental literature. However, these reports do not
probe very deeply into trend phenomena of the data.
For example, the Box-Jenkins time series analysis is a
powerful, but little used, tool for detecting underlying
changes over time that may be masked by meteorologi-
cal and random fluctuations. Stratification of observed
values, by different meteorological or hydrological con-
ditions, is another neglected method of analysis. Auto-
correlation functions for existing data sets, which can
reveal subtle details about concentration variation over
time and are important for short-term forecasting, are
not routinely calculated by EPA or State and local
agencies.
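As an illustration of the least demanding of these tools, the sample autocorrelation function can be computed directly from its definition (a sketch in Python; the series is synthetic):

    import numpy as np

    def autocorrelation(x, max_lag):
        # Sample autocorrelation of series x for lags 0..max_lag.
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        denom = np.dot(x, x)
        return [np.dot(x[:len(x) - k], x[k:]) / denom
                for k in range(max_lag + 1)]

    # Synthetic daily pollutant series with a persistent component that
    # resembles a slowly varying trend hidden in random fluctuations:
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(size=365)) * 0.1 + rng.normal(size=365)
    print([round(r, 2) for r in autocorrelation(series, 5)])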
CONCLUSION
Water quality data actually may have suffered from
a somewhat greater lack of in-depth analysis than air
quality data, not only because there are many more en-
vironmental variables in water than in air but also
because there are less continuous data available. In both
air and water analyses, however, the need exists for
improved data interpretive techniques and for greater
use of these techniques to detect subtle trends, to
improve the display of data, and to translate the data
into forms that are understandable to managers and
policymakers.
In summary, a critical problem appears to exist in
the field of environmental monitoring, a gap between
the effort expended to collect the data and the effort
expended to analyze and display it. As the EPA quality
assurance effort moves increasingly toward the day when
monitoring data will be of uniformly high quality, we
may find ourselves in the embarrassing position of being
unable to determine confidently whether the environment has grown better or worse, or to identify the responsible factors, simply because we have not sufficiently analyzed the monitoring data. Certainly our efforts to
improve the quality of the data must not outstrip our
efforts to intelligently analyze it.
Finally, there are, I feel, two important needs in
the area of monitoring data analysis: (1) a need for the
Agency to produce and demonstrate more tools to aid in
the analysis, interpretation, and display of environmen-
tal data (guideline documents, customized computer
programs, statistical manuals), and (2) a need for the
Agency to give greater emphasis to applying these tools
and to carrying out high quality statistical and
mathematical analyses of monitoring data.
REFERENCES
1 Box, George E.P. and Jenkins, Gwilym M., Time Series Analysis, Forecasting, and Control, Holden-Day, Inc., San Francisco, 1970.
94
-------
QUALITY ASSURANCE FOR ADP
(AND SCIENTIFIC INTERACTION WITH ADP)
By R.C. Rhodes
Quality assurance personnel and statisticians are also concerned about ADP. Quality assurance practitioners (and quality control statisticians) are interested in the producing end of the total process, obtaining good data for storage in the computer, while applied statisticians and scientists are concerned with the using end, performing good analyses on the data in the data bank.
QUALITY ASSURANCE
A total quality assurance system for pollution
monitoring organizations involves the following elements
presented in Table 1.
Table 1
Elements of a Quality Assurance System

Quality policy
Quality objectives
Quality organization and responsibility
QA manual
QA plans
Training
Procurement control
    Ordering
    Receiving
    Feedback and corrective action
Calibration
    Standards
    Procedures
Internal QC checks
Operations
    Sampling
    Sample handling
    Analysis
Data
    Transmission
    Computation
    Recording
    Validation
Preventive maintenance
Reliability records and analysis
Document control
Configuration control
Audits
    Onsite system
    Performance
Corrective action
Statistical analysis
Quality reporting
Quality investigation
Interlab testing
Quality costs
Some of these elements may not be as important
for research programs as for monitoring efforts. In either
endeavor, however, the product of pollution measure-
ment programs is data. The quality of the data may be
measured by:
Accuracy
Precision
Completeness.
In the simplest definitions, accuracy is the closeness of
the measured values to the truth, precision is the
measure of repeatability, and completeness is a measure
of the amount of data obtained.
A unique feature of data as a product is that
examination of the product itself does not reveal its
quality with respect to accuracy and precision. Com-
pleteness can be indicated only by the amount of data
obtained compared with the amount of data which
should have been obtained under perfect conditions.
Accuracy and precision must be determined from
ancillary information obtained during the measurement
process. Accuracy may be checked by using primary
standards (standard reference materials from the
National Bureau of Standards) or secondary standards
traceable to these primary standards, for calibration.
Other information relative to accuracy may result from
interlaboratory comparisons.
Checks for precision are obtained during the
measurement process for internal quality control by
using duplicate samples, spiked samples, interspersed
standards, and so forth. The use of back-to-back dupli-
cates of split samples in the analytical portion of a
measurement system provides some internal quality
control, although it is an inadequate (ultraconservative)
measure of precision for the entire measurement process.
One of the best methods of measuring the precision of
the overall measurement system is the use of dual or
colocated sampling with the analyses of the two samples
being made as independently as possible. Those responsi-
ble for measurement systems are strongly encouraged to
use, at least to a limited extent, colocated sampling to
obtain overall system precision estimates.
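One common way to turn colocated (paired) measurements into an overall precision estimate rests on the fact that, if each sampler has standard deviation s, the paired differences have standard deviation s multiplied by the square root of 2. A brief sketch (Python; the readings are invented):

    import math

    def precision_from_colocated(pairs):
        # Estimate single-measurement precision from colocated sampler
        # pairs: Var(a - b) = 2 * s^2, so s = stdev(differences) / sqrt(2).
        diffs = [a - b for a, b in pairs]
        n = len(diffs)
        mean_d = sum(diffs) / n
        var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
        return math.sqrt(var_d / 2.0)

    # Invented colocated TSP readings (ug/m3):
    pairs = [(101, 97), (88, 92), (110, 104), (95, 99), (120, 113)]
    print(round(precision_from_colocated(pairs), 2))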
Apparently, none of the pollution data banks currently incorporates, either directly or indirectly, measurements of precision and accuracy which can be matched with given blocks of data in the data banks.
(Some efforts are being made in this direction.)
Certainly, a key question in any enforcement situation
is: "What objective evidence (data) exists which assures
or measures the accuracy and precision of the data on
which the enforcement action is based?"
95
-------
Completeness of data sets is important in research
efforts where the objective is to relate pollution data to
other data, such as health effects or meteorological
information. For example, complete health effects data
are of little value if no matching air pollution data exist.
Obviously, completeness of data is important in compli-
ance monitoring.
Ideally, precision and accuracy data should be avail-
able (i.e., reported) together with the monitoring data of
the data banks so that confidence limits may be applied
to each of the monitoring data values. Another type of
auxiliary data which should be included in the data in
the data banks is a "special events file." Unusual pollu-
tion measurements may result from special events or
circumstances observed or recorded at the time of
sampling. Such information would be helpful to users of
the data banks.
The use of ADP systems and quality assurance
systems should be mutually beneficial. Obviously, ADP
can and should be used for statistical computations,
monitoring data summaries, and pertinent record-
keeping aspects of quality assurance. On the other hand,
quality assurance concepts and techniques can and
should be used in procuring and operating ADP systems.
Particularly applicable to integrated analysis/computer
facilities and operations are the quality assurance
elements presented in Table 2 which have been selected
from the list in Table 1.
All of the quality assurance elements presented in
Table 2 except perhaps for "Calibration and internal QC
checks" apply equally well to ADP data bank/central
computer procurement and operations. (Even for these
systems, the periodic use of test programs may be con-
sidered a form of calibration and internal quality
control.)
Table 2
Quality Assurance Elements Particularly Applicable to ADP

QA Element: Specific Activities and Considerations

Procurement
    Ordering: Specifications
    Receiving: Performance demonstration; operating and maintenance manuals; operational computer programs; warranty; complete operational checkout
Calibration and internal QC checks: Acceptability criteria and corrective action procedures; filing of results and traceability to monitoring data
Reliability records and analysis: Recording of frequency and cause of failure, and MTBF for system and components
Preventive maintenance: Establishing optimum schedules and recording evidence of maintenance actions
Document control (operational procedures, computer programs) and configuration control: Are records adequate for the procedures, software, and hardware configuration to be reconstructed for some specific past date?
Data validation
    Manual editing: Techniques of detecting human errors
    Scientific validation: Techniques involving scientific considerations (spatial and temporal continuity, interrelationships between different measurements) to detect questionable data

STATISTICAL ANALYSIS OF DATA IN DATA BANKS

Potential users whose work should be greatly enhanced by data banks and ADP capability ask the following questions:
What data are in the data banks?
    Parameters
    Time
    Geographic location
What data analysis programs are operational?
Statistical
Mathematical
How to talk to the big computers without
learning a new language such as FORTRAN?
What interactive graphics programs are opera-
tional for use with a CRT, so that the data can
be seen in various ways before, during, and
after analysis?
Histograms without transformations
Histograms with transformations
Correlation (X-Y) plots
96
-------
X-Y-Z plots with different symbols to
indicate levels of Z
Time (chronological) plots, one or more
variables on same plot
Plots of residuals
Regression
Analysis of variance
Distribution of differences and percent
differences for paired data
Map arrays (given coordinates and data
values)
Contour computations, plots, and map
overlays
Provision for deletion of specified data
points prior to computations
How and when will the capability exist to do
all these things?
Potential users of the data banks, such as statisticians and other scientists who could benefit from using the data and who desire to make the best analysis of the data stored in the bank, are reluctant to do so because of the complicated learning process. What non-ADP scientific users need to get optimum use of ADP are:

Simple Understandable Instructions for Technical Scientific Users (SUITSU)
or
INstructions for Scientific and Technical Users (IN SITU)

in English (not computerese or FORTRAN). All conversation between the user and the computer should be in English or mnemonic English abbreviations.
SUMMARY
Two major concerns related to automated data processing (ADP) systems are quality assurance of the data which enter the data system and scientific interactive use of data after they have been stored in the system.
In the past, much of the data entering the pollution monitoring data banks was generated through non-ADP systems; i.e., manual analytical methods and data gathering and reporting methods. Furthermore, quality assurance generally has dealt with the complete measurement system (sampling, analysis, and data validation), rather than with ADP for the data bank (Figure 1).
Figure 1
Manual Measurement System with ADP Data Bank
(Quality assurance covers sampling, analysis (measurement), and data validation; validated data enter the data bank for technical data analysis.)
Recently, an increasing number of analytical pollution measurement methods have been automated with micro- and minicomputers. These also may be used as temporary data banks and may be directly tied in with the larger computers handling the master data bank (Figure 2). This makes it necessary for the quality assurance process to focus more on the ADP for the data producing system, rather than on the data storing system alone.
Figure 2
Integrated Analysis-ADP System with ADP Data Bank
(Quality assurance extends over sampling, the integrated analysis-ADP step, and data validation, with the laboratory computer tied to the data bank and technical data analysis.)
97
-------
HOW TO WRITE
BETTER COMPUTER PROGRAMS
By Andrea T. Kelsey
Poor computer programing is a weakness found in
statistical data analysis, as well as in all ADP environ-
ments. In general, computer programs are poorly
designed, poorly coded, and hardly ever bugfree. In
addition, they are usually impossible to understand and
nearly impossible to maintain and to modify.
Good programers are hard to find, and even when
good, they are often at a loss to teach their successful
methods to others. Data analysts are still awaiting the
packages or the language that will enable them to
analyze data properly, and with a reasonable amount of
effort. For them to have this software, and for it to be
correct and maintainable, the quality of computer
programing must be improved.
Structured programing appeared to be the answer
to the problem of poor computer programing. The two
basic rules of structured programing are: (1) limit the
choice of constructs which the programer can use, and
(2) break the program down into modules which are
functionally independent.
Although they are sensible, these rules tell nothing
about how to write a structured program, only what one
should look like. Specifically, structured programing
does not answer the question, "Which collection of
modules is right for this particular problem?"; that is,
"What should the program structure be?"
In the book Principles of Program Design, Michael
Jackson describes a programing technique which shall
be called MJT for the purposes of this paper. This
technique does answer the question, "What should the
program structure be?" MJT is most applicable for
programs concerned with data processing and with the
development of systems. A scientific data analysis
system can readily be described as a data processing pro-
gram with sophisticated elementary operations. There-
fore, MJT would be especially valuable in the
development of any comprehensive data analysis system.
Basically, MJT consists of a formal analysis of the
structure of the data files on which the program must
operate. A program structure is then designed which has
the same "shape" as the data. The machine-executable
instructions are allocated to the program structure where
necessary to accomplish the objectives of the program.
All the decisions are made before any coding is done.
There are five steps to writing a program using
MJT. They are described below:
1. Draw a data structure for each input and
output file. Use any combination of the three types of
diagrams (sequence, selection, and iteration) shown in
Figure 1. Figure 2 shows examples of two data struc-
tures. The input file contains personnel records sorted
by department, with each record containing department
code, name, salary, and status. The status contains either
a 'C' for current employee or an 'F' for former
employee. The second data structure is the output
report. It contains a report heading followed by a report
body. The report body contains an iteration of lines.
Each line contains the name, followed by the salary,
followed by the status. The status is either 'CURRENT' or 'FORMER' depending on the value of the input status.
2. Draw the program structure. In this step, look
at the data structures and draw all one-to-one correspon-
dences between the input and output files. As a result,
the program structure can be drawn. The one-to-one
correspondences in the example are:
For each file there is one report, one heading,
and one report body.
For each input record there is one line on the
report.
For each name, salary, and status on input
there is a name, a salary, and a status on
output.
For each 'C' for status on input there is
'CURRENT' on output.
For each 'F' for status on input there is
'FORMER' on output.
The resultant program structure is found in Figure 3.
98
-------
3. List and allocate elementary operations. In
this step list all machine-executable instructions neces-
sary to accomplish the program objective and allocate
these operations to the program structure. In the
example, the list contains:
1 Read record
2 Write heading
3 Write line of report
4 Open file
5 Close file
6 Stop run
7 Move NAME to output line
8 Move SALARY to output line
9 Move 'CURRENT' to output line
10 Move 'FORMER' to output line.
The allocation of the elementary operations is shown in
Figure 4.
4. Write the schematic logic. Take the program
structure and write it in a special language that allows
only the three constructs shown in Figure 1:
Sequence
Selection
Iteration.
The schematic logic for the example is:
PFILE seq
    Open FILE; read FILE;
    PHEADING seq
        Write heading;
    PHEADING end
    PBODY seq
        PRECLINE iter until end of file
            PNAME seq
                move NAME to line;
            PNAME end
            PSALARY seq
                move SALARY to line;
            PSALARY end
            PSTATUS select (CURRENT)
                move 'CURRENT' to line;
            PSTATUS or (FORMER)
                move 'FORMER' to line;
            PSTATUS end
            Write line;
            Read FILE;
        PRECLINE end
    PBODY end
    Close FILE;
    Stop run;
PFILE end
5. Write the code. Take the schematic logic and
write the code in any programing language.
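As an illustration of this final step, the schematic logic above translates almost line for line into an executable program. The sketch below uses Python rather than a language of the era, and the record layout and report heading are invented; the comments tie each statement back to its component in the schematic logic, including the read ahead.

    import io

    def next_record(lines):
        # The read-ahead principle: return the next parsed record,
        # or None at end of file.
        line = next(lines, None)
        if line is None:
            return None
        dept, name, salary, status = line.strip().split(",")
        return dept, name, salary, status

    def produce_report(infile, outfile):
        lines = iter(infile)
        record = next_record(lines)            # PFILE seq: open FILE; read FILE
        outfile.write("PERSONNEL REPORT\n")    # PHEADING: write heading
        while record is not None:              # PRECLINE iter until end of file
            dept, name, salary, status = record
            label = "CURRENT" if status == "C" else "FORMER"  # PSTATUS select
            outfile.write(f"{name} {salary} {label}\n")       # write line
            record = next_record(lines)                       # read FILE
        # Close FILE and stop run are handled by the caller.

    records = io.StringIO("D1,JONES,12000,C\nD1,SMITH,11000,F\n")
    report = io.StringIO()
    produce_report(records, report)
    print(report.getvalue())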
The result of MJT is a correct program, one that
has the right structure so that modifications can be made
without causing the unwanted interactions that usually
accompany a modification. The example is a simple one,
but the advantage of MJT is that when it is used on a
large, complicated program, the program becomes a
series of simple processes.
Other aspects of MJT are briefly mentioned below:
Program Inversion. There are times when the
data structures will not fit together and when
all necessary one-to-one correspondences
cannot be found. This is called a structure
clash. Sorting the files can sometimes resolve a
structure clash. If not, then program inversion
can be a solution. The technique involves the
following two steps:
Think of the program as two programs.
The first program has an output file
which resolves the structure clash. This
output file is an input file to the second
program.
Convert one program so that it can run as
a subroutine of the other. This is easy,
once the schematic logic has been written
for both programs. A cookbook method
is used to convert one program to a
subroutine.
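A rough modern analogy to program inversion (a sketch in Python, not Jackson's cookbook method itself) is to turn the first program into a generator, so that instead of writing its intermediate output file it yields one record at a time to the second program:

    def unpacked(raw_lines):
        # First program, inverted: instead of writing its output file, it
        # yields one intermediate record at a time, resolving the clash.
        for line in raw_lines:
            for field in line.split(";"):
                yield field.strip()

    def report(records):
        # Second program reads the intermediate "file" record by record.
        for n, rec in enumerate(records, 1):
            print(f"{n}: {rec}")

    report(unpacked(["a; b", "c"]))   # prints 1: a, then 2: b, then 3: c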
Backtracking. In some cases where a selection
is necessary, the condition of the selection
cannot be determined at the selection time.
Backtracking means assuming one of these
selections is correct, and using it. If it is found
to be incorrect, one branches to the correct
selection.
Optimization Rules. There are two rules on
optimization:
Don't do it.
For experts only: Don't do it until you
have a perfectly clear and unoptimized
solution.
The main reason is that optimization makes a
system less reliable, harder to maintain, and
therefore more expensive to build and
operate.
99
-------
GO TO Statement Use. The conventional objection to using GO TO statements is that they permit unrestrained branching from one part of a program to another. MJT does not allow this use of the GO TO, either. But in three cases, MJT allows GO TO statements:
In backtracking to branch to the correct selection.
When doing program inversion (restricted use of a computed GO TO).
When the shortcomings of the programing language force its use to implement the schematic logic.
Read Ahead Principle. MJT always uses this principle. It is defined below:
Always have one read operation immediately following the open operation for each input file. Then always read another record when the previous record has been completely processed. By following this principle, the logic of when to read does not become a problem.
Collating. Before MJT, matching two or more sorted input files was a problem that had to be solved anew each time a collating program was needed. MJT teaches that there is only one collating problem, which always has the same solution. This solution, made possible by the read ahead principle, requires too detailed a description for this paper. It is mentioned only to point out that MJT has a mechanical solution for collating problems once the collating keys have been identified.
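As a hint of what that mechanical solution looks like, the following sketch (Python; the keys and records are invented) collates two inputs sorted on the same key, reading ahead on both files:

    def collate(file_a, file_b):
        # Match two inputs sorted on the same key, with read-ahead on both.
        a, b = iter(file_a), iter(file_b)
        rec_a, rec_b = next(a, None), next(b, None)   # read ahead on each file
        while rec_a is not None and rec_b is not None:
            if rec_a[0] == rec_b[0]:
                print("matched:", rec_a, rec_b)
                rec_a, rec_b = next(a, None), next(b, None)
            elif rec_a[0] < rec_b[0]:
                print("unmatched A:", rec_a)
                rec_a = next(a, None)
            else:
                print("unmatched B:", rec_b)
                rec_b = next(b, None)
        while rec_a is not None:                      # drain whichever remains
            print("unmatched A:", rec_a)
            rec_a = next(a, None)
        while rec_b is not None:
            print("unmatched B:", rec_b)
            rec_b = next(b, None)

    collate([(1, "x"), (2, "y"), (4, "z")], [(2, "p"), (3, "q"), (4, "r")])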
My experience with this technique has involved a
large data processing program which was written in
COBOL and which interacts with a System 2000 data
base. By using MJT, the programer understands the
program well and is convinced that the program is
correct. When modifications are needed, the programer
knows exactly where to make the changes. The pro-
gramer knows, also, that the changes will not adversely
affect the other parts of the program.
In conclusion, the use of this technique would
greatly enhance the quality of the software the data
analyst has to work with, and would make development
of a comprehensive data analysis package reasonable.
REFERENCE
1 M. A. Jackson, Principles of Program Design, Academic Press, 1975.
100
-------
Figure 1
The Three Constructs: Sequence (A consists of B followed by C), Selection (A consists of B or A consists of C), and Iteration (A consists of zero or more occurrences of B)
Figure 2
Data Structures for the Input Personnel File and the Output Report
101
-------
Figure 3
The Resultant Program Structure
Figure 4
The Program Structure with Elementary Operations Allocated
-------
SUMMARY OF DISCUSSION PERIOD - PANEL III
In the discussion following the presentation concerned with strengths and weaknesses of analysis of scientific data, the
following conclusions were drawn.
Data System Uses
The panel agreed that EPA data systems accomplish more than just storage of data. Research data, such as health effects
data, are being subjected to extensive analysis, and resources are being provided specifically for that purpose. In the case of
most routine compliance monitoring data, so little of the right kind of analysis is being done that, in effect, only data storage
is accomplished. There was a rejoinder from the floor that perhaps it should not be the function of the monitoring data
systems to provide the analysis but merely to make the data available and that other components of EPA should be
concerned with the analysis. There was sympathy for this view among some of the panel. The panel generally agreed that
these systems should provide for easy retrieval of the data in a form which can be analyzed readily.
Experimental Design and Analysis
Some felt R&D management is not insisting that scientists apply good experimental design techniques. Various members
of the panel felt that the kind of people required to do the statistical design and subsequent analysis of experiments generally are not available within ADP. (Notable exceptions are the HERL and EMSL at RTP, which have their own in-house cadres of statisticians.) Such resources, therefore, must be provided either from within the R&D laboratories or by contract. In the case of contracts, the panel felt that an employer-employee relationship was needed in this situation. It was asked how one stimulates such a relationship, given the contracting process. Several panel members said "not very well." Others said some sort of
regional contract is needed which would allow the contractors to become so familiar with the research program that they
could contribute to the design and analysis of experiments.
Quality Assurance Guidelines
In answer to what ORD is doing to improve monitoring QA, it was pointed out that over the past year or so, a series of "Guidelines for QA Programs" has been issued for various pollutant measurement methods (ambient and source air). A dozen or so are currently available and more are being developed. What is needed most is a general quality assurance handbook, "QA Handbook for Air Pollution Measurement Systems," which will be ready, hopefully, by February 1976. While written specifically for air programs, much of it will be applicable to any media. It was noted from the floor that a quality control handbook for water has been available for 4 years. Either people ignore it or, even after using it, they put data in the data banks without distinguishing whether good quality control was used in the production of those data. The panel
concluded that the water handbook was important and good, but that it was directed primarily to the analytical laboratory,
while the real need is to address the total measurement process.
Appreciable resources are being devoted to some aspects of the quality assurance effort, but the application of quality control techniques to data once they are in the data bank is recent and quite limited. There are substantial policy questions
involved with ORD's role in this area. ORD is a technical organization and can provide such things as software, studies, and
techniques, but STORET people for example, would be the ones applying such techniques to STORET data.
Quality Assurance Responsibilities
The panel agreed that EPA's quality assurance efforts should not be transferred to the program offices for various
technical reasons, such as coordination with State agencies which cannot afford to be split this way. Any program in which
data are being generated should implement quality control methods and techniques and should have persons assigned to
quality assurance responsibilities. However, because of the many common quality assurance activities across media and
programs, the top level quality assurance organization for EPA should be in one place for maximum efficiency and
effectiveness.
103
-------
QA Information in Data Bases
The panel agreed that the quality assurance information that should be carried in the data base should at least include a
precise description of the conditions under which the data were derived. Also, some measure of accuracy and precision
associated with the data would be highly desirable. There was no agreement on whether explicit measures of these quantities
should be carried in the data base or whether such measures should be related to the conditions under which the data were
produced.
ADP Centralization
The general feeling among the panel was that it would be good to centralize in one office or laboratory all ADP people
at a particular facility. As with many other resources, some combination of centralization and decentralization is appropriate.
A laboratory or office which has substantial ADP needs, such as RTP, should have its own ADP staff.
Policy for Non-Agency Requests
The panel stated that the Freedom of Information Act is the EPA policy toward providing data to people outside the
agency.
Contract for Statistical Data
The consensus of the panel was that while the worth of a statistical services contract could be debated at length, any
such contract should be negotiated by the research organization itself, not MIDSD.
Structured Programing
Should GO TO statements be used, and if so, when?
The problem with the GO TO statement was expressed as permitting unrestricted movement among components of the
program. However, limited use of the GO TO is called for in a few situations; namely, those identified in the paper on the
Michael Jackson techniques.
It was decided that existing systems probably should be rewritten using structured techniques whenever modifications are at all extensive.
The Michael Jackson technique was said not to be a competitor of Structured Programing. The MJT answers the crucial
question of how to construct a particular program. The result of its answer is a structured program.
Statistical Training
It was asked whether any efforts are being made to train chemists and other non-ADP people in how to use the
statistical packages. Several seminars were described which have been open to everyone with a need to know, but no concerted effort specifically directed toward non-ADP people was known.
Scientific Software
One way for the RTP-NCC staff to get information on software and training needs was established at Research Triangle
Park before the National Computer Center began. It is the Scientific Software Committee. Whether it should be expanded to
include the regions or other groups is being discussed. Some sort of formal means of polling the user community on scientific
software needs is definitely needed.
To some degree, the panel advocates providing users with custom software packages to meet their needs. In developing a
software package, a class of users should be identified and the package should be tailor-made for that class. But the classes
should be kept very large.
104
-------
Statistical Package Development
Whether the development of a comprehensive statistical package is something that must be done in-house or can be contracted was discussed. The definition of need must be done in-house, up through the preparation of functional and performance specifications. Only then can the development of the package be contracted.
It was then asked whether the development of comprehensive statistical packages should be the responsibility of a
central group or of the data base managers.
The panel stated the data base managers should determine needs; the central group should be the means of seeing that
the needs are met.
105
-------
THE UTILITY OF BIBLIOGRAPHIC INFORMATION
RETRIEVAL SYSTEMS
By Johnny E. Knight
Over the last several decades we have seen what is
now popularly referred to as an "information explo-
sion." Presently, the United States spends at least
$11.8 billion a year on scientific and technical informa-
tion activities. As much as $6.1 billion of this amount
was spent on all distribution-associated aspects including
distribution, storage, and retrieval. In the past decade,
the number of scientific periodicals has increased by 9
percent while technical report literature has increased by
an estimated 16 percent.
At least four facts have made it almost inevitable
that some form of computer bibliographic data handling
system would be developed. First, the amount of
scientific and technical literature has become massive.
Second, the cost of assimilating information by scientists and engineers, primarily salary costs attributable to browsing, searching for information, and reading, is reported to be as much as $3.3 billion. A cost reduc-
tion in this respect would be greatly desired. Third, new
computer hardware and software technology have
reduced literature search costs significantly. For
example, in 1965 the search cost of a 500,000 record
file was approximately $1,000. Today, the search cost
for the same file is approximately $10. Fourth, the
present day requester of information has an almost
fanatical desire to know answers instantly.
In their book on online information retrieval
systems, Lancaster and Fayen have summarized hard-
ware developments over the years as follows:
Before 1940: mostly card catalogs and printed
book indexes.
During 1940-1949: the first application of
semimechanized approaches, including edge-
notched cards and the optical coincidence
(peek-a-boo) principle; the microfilm
searching system (the Rapid Selector).
During 1950-1959: the first fairly widespread
use of punched card data processing equip-
ment; some early computer systems; further
microimage searching systems.
During 1960-1969: more general application
of digital computers to information retrieval
in an off-line, batch-process mode; some
experiments with online, interactive systems;
more advanced microimage searching systems.
From 1970 to the present: definite trend
toward design of online systems and conver-
sion of batch systems to the online mode.
As the summary shows, until recently all bibliographies
had to be compiled and edited by hand primarily from
printed indices. This laborious task was extremely time-
consuming and precluded the "immediate" response
time generally expected today by our "modern research-
ers" fighting those so-called "hot" issues, which have a
tendency to appear overnight.
One of the first attempts to use computers for
bibliographic literature search preparation was in 1966
by the National Library of Medicine (NLM) with the
Medical Literature Analysis and Retrieval System
(MEDLARS). This was a batch-oriented system requiring
the user to send his request to NLM. Delay in receiving a
reply was sometimes considerable.
It was soon recognized that the batch mode of
searching also had other deficiencies. Generally, the
requester had to rely on an information specialist to
conduct his search. The search could not be developed
heuristically, and the user lost another very important
option: browsability. Even manual methods allowed
these two aspects of searching.
Of course, the answer is immediately obvious to
those in the ADP field. Online real-time search systems
would eliminate these inadequacies. In 1965, there were
about 20 machine-readable data bases. Today there are
approximately 200 data bases available to the public.
About 50 of these are online systems. Thus, a trend
begun in 1970 has become the established mode of
operation in 1975.
Although computer storage and retrieval methods
are beyond the scope of this paper, they should be
mentioned briefly. Although other methods are avail-
able, either sequential or inverted file organization is
106
-------
most often used for bibliographic data systems. An
organization method must be chosen to suit the
computer hardware, the data characteristics, and use
requirements of the data. Use of inverted files with
indexed sequential access has the advantage of easy file
maintenance and allows search strategy evaluation using
Boolean logical connectors without actually retrieving
the citation or document data. Recently, however,
Gerald Salton has produced an alternative to the
inverted file which he calls clustered file organization.
Salton is very critical of inverted files and states that his
method requires less storage and allows more flexible
searching than other methods.
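The essential mechanics of an inverted file with Boolean combination can be shown in a few lines (a sketch in Python; the documents are invented and no real system's internals are implied):

    def build_inverted_file(documents):
        # Map each key (word) to the set of document numbers containing it.
        index = {}
        for doc_id, text in documents.items():
            for word in text.upper().split():
                index.setdefault(word, set()).add(doc_id)
        return index

    documents = {
        1: "sulfur oxides in the detroit river",
        2: "nitrogen oxides over detroit",
        3: "sulfur compounds in lake erie",
    }
    index = build_inverted_file(documents)
    # Boolean combination of posting sets, as with a combine command:
    hits = index["SULFUR"] & index["DETROIT"] & index["RIVER"]
    print(sorted(hits))   # [1]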
Historically, large data bases have utilized as much
of their particular computer facility as could or would
be allowed. They were so expensive to develop, main-
tain, and make available that in some cases only Govern-
ment funding allowed their existence. Lancaster and
Fayen state that the computer specialist has been
forced to extend computer technology to handle more
and more data. For whatever reason, it seems that
industry journals announce each month new technology
and innovations that allow more data in smaller spaces
and faster retrievals.
At first, use of the data bases was free of charge;
but as their use became more than a novelty, charging
systems were implemented and Government subsidies
were sought. Originally, the data bases were developed
privately to be used in-house. When it became recog-
nized that these systems had a wide user community,
they were made available through commercial data
centers such as the Lockheed Missiles and Space
Company (LMS) and the System Development Corpora-
tion (SDC). Recently, third-party vendors or brokers have begun to make these systems available as self-supporting, profitmaking enterprises.
Although many search systems are commercially
available, probably the two most generally available
within the EPA, and possibly to the public in general,
are the ORBIT system from SDC and the DIALOG
system from LMS. Medical Literature Analysis and
Retrieval System Online (MEDLINE) is also widely used
within the Agency. Even though it is no longer main-
tained by SDC for NLM, its search system is essentially
an ORBIT implementation.
Both DIALOG and ORBIT search languages
accomplish the same end; that is, selection of a number
of documents pertinent to a given topic based on a user-
formulated search strategy. For demonstration purposes,
suppose we wish to find citations/abstracts on the topic
of: "How do sulfur oxides get into the Detroit River?"
The DIALOG system was designed for indirect searching
as illustrated on our hypothetical inquiry:
? S SULFUR; S OXIDES; S DETROIT; S RIVER?
1 5324 SULFUR
2 9760 OXIDES
3 290 DETROIT
4 851 RIVER?
?C 1 and 2 and 3 and 4
5 53 1 and 2 and 3 and 4
? S SULFUR(1W)OXIDES; S DETROIT(W)RIVER
6 3561 SULFUR(1W)OXIDES
7 65 DETROIT(W)RIVER
? C 6 and 7
8 28 6 and 7
The select (S) command causes numbered subsets of the
inverted file to be set aside and displays the number of
postings or "hits" for the selected key. The subsets are
then available for use with other DIALOG commands.
The combine (C) command allows Boolean logic to be
used with the set numbers. The "?" is the computer's prompt indicating that it is expecting a command. Also, it may be
used as shown in set 4 to indicate to the system that
right-hand truncation of that key is desired.
Another feature of the system is shown in sets 6
and 7. This feature allows the user to request that the
documents to be retrieved must contain the stated keys.
Additionally, these keys must occur within a specified
number of words of each other (string searching). This
allows the user to request that the documents be more
specific; that is, the key "sulfur" must occur within one
word (disregarding insignificant words such as "the") of
the key "oxides." Obviously, this would eliminate
documents occurring in set 5 which might have discussed
sulfur compounds and nitrogen oxides together with
some other river besides the Detroit River.
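A simplified rendering of such proximity (string) searching, ignoring a small list of insignificant words, might look like this (a sketch in Python; the noise-word list and examples are invented):

    def string_search(text, key1, key2, max_gap, noise=("THE", "A", "AN", "OF")):
        # True if key2 occurs within max_gap significant words after key1,
        # in the spirit of a (1W) adjacency operator.
        words = [w for w in text.upper().split() if w not in noise]
        hits1 = [i for i, w in enumerate(words) if w == key1]
        hits2 = [i for i, w in enumerate(words) if w == key2]
        return any(0 < j - i <= max_gap for i in hits1 for j in hits2)

    print(string_search("sulfur oxides in the Detroit river",
                        "SULFUR", "OXIDES", 1))   # True
    print(string_search("sulfur compounds and nitrogen oxides",
                        "SULFUR", "OXIDES", 1))   # False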
107
-------
The ORBIT system was designed to allow a
searcher to enter his search strategy directly as
illustrated:
SS1
USER: SULFUR
PROG: PSTG (5324)
SS2
USER: 1 AND OXIDES AND DETROIT AND RIVER
PROG: PSTG (53)
SS3
USER: STRS :SULFUR OXIDES:
PROG: PSTG (30)
The ORBIT system informs the user of the subset
number it will next form (SS1). The system then
prompts the user for his input by displaying "USER:".
Its replies are always preceded by "PROG:" to indicate
that it has control. Four separate sets may be selected
and then combined into a fifth as shown with DIALOG.
Alternatively, only one set with the final 53 postings
need be created, whichever the user desires. This system
also allows string searching as shown in SS3; however, a
previous set of postings must be searched.
The indirect approach is more helpful to the
inexperienced user, but the direct approach would
probably be preferred by the more experienced user.
Both of these commercial systems, DIALOG and ORBIT, allow some form of display of the inverted file keys
together with their statistics to help the user decide on
his strategy as the search proceeds. A variety of terminal
and off-line print formats are available. Both systems are
used on "indexed" and "free-text" or "natural lan-
guage" data bases, depending on certain fields declared
when the data base is loaded.
Although it is impossible to state a monetary value for bibliographic data bases, they appear to be providing a useful service to researchers. Williams estimates that in 1965 there were only 10,000 users, whereas in 1975 there were over a million in the United States and Canada.
Presently the EPA Research Triangle Park (RTP)
library has approximately 30 online data bases available
to be searched for its user community. On a monthly
basis, the RTP library averages about 50 to 60 subject
requests from about 175 individual researchers and
spends approximately $2,000 for these inquiries. The
usage of this library is an indicator of the value of
bibliographic data bases to the general user community.
REFERENCES
1 Burchinal, L.G., "Recent Trends in Communication of STI," Bull. Amer. Soc. Information Sci., 2(3):9, 1975.
2 Williams, M.E., "Use of Machine-Readable Data Bases." Presented at the 38th American Society for Information Science Annual Meeting, October 26-30, 1975, Boston, Massachusetts.
3 Lancaster, F.W. and Fayen, E.G., Information Retrieval On-Line, Melville Publishing Co., Los Angeles, California, 1973.
4 Salton, G., "Dynamic Document Processing," Association for Computing Machinery Communications, 15(7):658-668, July 1972.
108
-------
BIOLOGICAL DATA HANDLING SYSTEM
(BIO-STORET)
By Cornelius I. Weber
The need for a functional, computerized system to
handle data on the communities of indigenous aquatic
organisms in the Federal water pollution control pro-
gram has remained unfulfilled for nearly 20 years. The
water quality data storage and retrieval system
(STORET) developed in 1957 has been adequate for the
storage and manipulation of physical and chemical data,
but despite many improvements, it still lacks the ability
to accommodate the hierarchical structure of biological
data. The deficiencies in STORET have prevented the
computerization and analysis of the bulk of the data
from nearly 30,000 plankton samples collected during
the operation of the National Water Quality Network,
and from large numbers of other biological samples col-
lected by current programs of the EPA and other Feder-
al, State, and private agencies engaged in studies of the
aquatic environment. Furthermore, many State pro-
grams have delayed the collection of biological samples
until an adequate EPA computerized biological data
handling system is available.
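The hierarchy at issue can be pictured as nested records: a station holds many samples, and each sample holds many taxa with counts. The sketch below (Python; the station, species, and counts are hypothetical, not the actual BIO-STORET schema) shows the nesting and how a flat parameter-value file must repeat the parent fields on every row:

    # One station holds many samples; each sample holds many taxa.
    station = {
        "station": "OH-001",
        "samples": [
            {"date": "1975-07-14",
             "community": "phytoplankton",
             "taxa": [
                 {"species": "Melosira granulata", "count": 420},
                 {"species": "Anabaena flos-aquae", "count": 35},
             ]},
        ],
    }
    # Flattening repeats the station and sample fields for every taxon row:
    for sample in station["samples"]:
        for taxon in sample["taxa"]:
            print(station["station"], sample["date"],
                  taxon["species"], taxon["count"])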
The mandate for the collection of data on com-
munities of indigenous aquatic organisms contained first
in Section 4(c) of the Water Pollution Control Act of
1956 (Public Law 660) was greatly expanded in the
1972 Amendments to the Federal Water Pollution Con-
trol Act (Public Law 92-500). This legislation contained
direct or indirect reference to the need for the collection
of biological data in at least 15 sections, and emphasized
the need to restore and maintain the biological integrity
of the Nation's waters, thus ensuring the protection and
propagation of fish, shellfish, and wildlife. It also made
numerous references to the need for the collection of
data on the effects of pollutants on the diversity, pro-
ductivity, and stability of communities of indigenous
aquatic organisms, necessary to achieve the overall objec-
tives of the Act.
In the spring of 1973, the staffs of the Monitoring
and Data Support Division, Office of Water Programs,
and the Aquatic Biology Methods Research Program,
Office of Research and Development, initiated a project
to develop a new computerized data handling system
(BIO-STORET) for the storage, retrieval, and analysis of
field and laboratory biological data. These data are con-
cerned with the structure and function of communities
of indigenous aquatic organisms and are currently being
generated by the EPA as well as by other Federal and
State agencies whose programs include inland, estuarine,
and marine water quality monitoring, compliance moni-
toring, and studies of the effects of ocean-disposed
wastes and heated water discharges. Such programs are
mandated under Sections 104, 106, 308, 314, 316 and
other applicable sections of the 1972 Amendments of
the Federal Water Pollution Control Act. Communities
of indigenous aquatic organisms provided for in the
system include phytoplankton, zooplankton,
meroplankton, periphyton, macroalgae, macrophyton,
macroinvertebrates, and fish (Figure 1). It was agreed
that the responsibility for the development of BIO-
STORET rested with the Office of Research and Devel-
opment and that, once the system became operational, it
would be supported by the Office of Water Programs as
a companion to the Water Quality File. Management of
the BIO-STORET system was to reside in the Informa-
tion Access and User Assistance Branch.
Contracts awarded in 1973 resulted in the prepara-
tion of a system requirements specification, a system
design specification, and master taxonomic (6,000
species), parameter, and station files to be included in
the initial system. An ad hoc steering committee, con-
sisting of senior biologists representing a cross section of
EPA regional and research programs and personnel from
USGS and NOAA, was organized to assist the contractor
in defining the system requirements and design. Further
work on the system was delayed until 1975 because of
the lack of funds.
The current contract with MRI Systems Corpora-
tion, Austin, Texas, was awarded in March 1975 for
development of the detailed software design, software
coding and debugging, and system implementation.
BIO-STORET will utilize SYSTEM 2000, a generalized
data base management software package marketed by
MRI Systems Corporation, and available on OSI and the
EPA Univac 1110 at RTP. The project schedule calls for
the completion and implementation of the BIO-
STORET system on OSI in the spring of 1976.
The preliminary design specifications for BIO-
STORET in the new SYSTEM 2000 software environ-
ment were completed October 17, 1975 (Figure 2).
Copies of the design specifications are available from
Cornelius I. Weber, Aquatic Biology Section, Environ-
mental Monitoring & Support Laboratory, U.S. Environ-
mental Protection Agency, Cincinnati, Ohio 45268.
109
-------
Figure 1
Types of Biological Data Accommodated by BIO-STORET
(For each community (plankton, periphyton, macrophyton, macroinvertebrates, and fish), the figure charts the sampling methods, the field and laboratory analyses, and the resulting data, such as species lists and counts, diversity indices, chlorophyll and biomass measurements, bioassay and physiological results, and chemical analyses for metals, pesticides, and radioactivity.)
Figure 2
BIO-STORET System Flow Chart
(Flow chart: new data and corrections are sorted and merged, then passed through an edit program that checks them against the species and parameter master files; errors are reported and held in a suspense file, while validated data pass through the update program into the BIO-DATA file, with transaction reports produced. Retrieval is through the SYSTEM 2000 search language, with post-processing of reports and input calculations such as tally conversion, chlorophyll calculation, and the diversity index.)
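The flow in Figure 2 is a conventional batch edit-and-update cycle: incoming records are checked against the master files, rejects are reported and held in a suspense file, and clean records update the BIO-DATA file. A minimal sketch of that cycle follows, in Python; the record fields and function names are illustrative assumptions, since BIO-STORET itself is implemented with SYSTEM 2000, not with code of this kind.

    # Sketch of the Figure 2 edit/update cycle (all names assumed).
    def edit_cycle(new_records, suspense_file, species_master,
                   parameter_master, bio_data_file):
        validated, errors = [], []
        for rec in new_records + suspense_file:
            # Edit program: codes must appear in the master files.
            if (rec["species"] in species_master and
                    rec["parameter"] in parameter_master):
                validated.append(rec)
            else:
                errors.append(rec)       # listed on the error report
        bio_data_file.extend(validated)  # update program
        return errors                    # held as the new suspense file

Corrected records would simply re-enter the cycle with the next batch of new data.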
-------
UTILITY OF STORET
By C.S. Conger
Webster's Dictionary lists several definitions of the
work "Utility," including: "being of a usable but
inferior grade," "capable of serving as a substitute in
various roles or positions," "kept for the production of a
useful product rather than for show or as pets,"
"designed for general use," "the quality or state of being
useful," "something useful or designed for use."
STORET fits all the above descriptions and even a few
others, at different times. However, it would be best if
all but the first applied to STORET.
"Who, what for, and how" are questions which can
be asked about STORET. There is no way to give a good
representation of "how" in a short time, so this discus-
sion will describe the "what for" and "who" of
STORET.
The "what for" might be broken into basic func-
tional areas of interest, which include: data manage-
ment, water quality assessment, water quality
management, and water pollution control program
management.
Management and operation of STORET is primarily
a data management function that should support the
water program. Responsibilities for data management
include providing efficient and reliable software that
meets user requirements with regard to types of data
accommodated, adequate controls for data entry and
data quality, and appropriate analytical and retrieval
routines. User training in the system is included in this
responsibility. Data management adapts to the programs
and activities it functions to support. It is not a "stand
alone" activity with a mission of its own; it is a support
function.
The water quality assessment function is concerned
with all aspects of collection and analysis of water
quality samples and with reporting the findings and
trends of these analyses. Areas of responsibility include
monitoring strategies, method of sample collection, and
method and quality control of analytical procedures.
The monitoring strategies should represent the questions
to be answered by the reporting function so that they
can provide appropriate data. The water quality assess-
ment function relies heavily on STORET as a data
handling tool. However, the groups charged with water
quality assessment are responsible for the data collected
and the reports produced from the STORET data. As
procedures and controls for data collection improve, the
data in STORET will improve as well. In actual practice,
the easy access to previously collected and analyzed data
provides data for analysis of problem areas and for
suggested improvement priorities. As the definition of
questions improves, STORET retrieval capabilities will
continue to be extended to answer these questions.
Water quality assessment plays a rather passive role
since it does not directly have an impact on the state of
the Nation's waters; however, water quality management
plays an active role. Its functions include: the 303 Basin
plans, which recommend approaches and priorities for
pollution control (based on water quality assessment and
abatement technology), the permit program, which
establishes and monitors effluent limitations, and the
standards activities, which provide guidelines for attain-
able water quality goals. Water pollution control actions
are more direct than water quality management, which is
an applied technology. As such, it is supported more by
experience than by scientific procedures. Data handling
techniques and data requirements for water quality
assessment activities are much more precise and rigorous
than those for water quality management activities. In
many of the latter activities, the use of data replaces or
enhances the intuition and experience that is basic to
applied technology and engineering. The pooling of data
and the utility mode of STORET operation strongly
support the water quality management activities. The
bulk of the responsibility for water quality management
activities rests in the States, and the tasks are jointly
performed by the Regions and the States. Both water
quality assessment and water quality management would
benefit from improved communication of needs and
priorities among the various groups involved. Use of a
common data base, such as STORET, could serve as a
starting point for communication improvements.
Management of the water pollution control
program is essentially the responsibility of those making
decisions within the Agency. Water quality assessment
activities support this function by measuring progress in
improvement of the Nation's water quality, but these
activities do not necessarily identify the actions or
decisions which were most effective in producing this
improvement. Identification may require a management
information system that can correlate the type of
actions and the expenditures in certain areas (specific
basins, States, or regions) with the assessment of water
quality for that specific area. Aggregation of data on the
Agency level may not provide the detail needed for
correlation of cause and effect. Nevertheless, the data
handling requirements to support the program manage-
ment activities would not be directly related to
STORET. Knowledgeable analyses of the data within
STORET would possibly provide insight into the water
quality progress but STORET cannot answer questions
on which programs have the greatest impact on pollution
abatement and on where funding priorities should be.
Now let us combine more "what for" with "who."
National-level users of STORET are primarily concerned
with research and with water quality assessment. These
users, and their respective functions, are included in the
following groups:
Office of Research and Development: ORD
use involves all appropriate phases of research
to understand the causes and effects of pollu-
tion, the majority of which is conducted
under Title I of PL 92-500.
Office of Water Planning and Standards: The
principal use has been for preparing 305(b)
reports submitted by the States, and for
responding to queries from other agencies
with a national perspective.
Council on Environmental Quality: CEQ
prepares annual reports on the environment.
National Commission on Water Quality:
NCWQ recently submitted a draft report on
the technological aspects, as well as the social,
economic, and environmental impacts, of
achieving or not achieving the goals for 1983.
Regional and State roles overlap in many cases,
particularly at the present stage of implementing
PL 92-500. While activities related to water quality
assessment are conducted on the regional and State level
(305(b) reports and Surveillance and Analysis Divisions
activities), primary emphasis is on water quality manage-
ment activities such as:
Establishing standards - Primarily a regional
responsibility, but the States are required to
make recommendations for changes.
Permitting - Equally divided between the
States and regions; approximately half of the
States have permit authority. Review is a
regional responsibility.
Planning - Performed at several levels, starting
with 303 Basin plans prepared by the States
and reviewed by the regions. Various inter-
state commissions will be involved in areawide
planning.
Significantly, STORET has been a major data handling
tool supporting all these various functions on the
national, regional, State, and local levels.
The division of responsibilities for the various
components of STORET across organizational lines has
had an impact on STORET system management
effectiveness. The components, or functions, and the
respective organizations responsible are summarized as
follows:
Operation and maintenance: Data Processing
and User Assistance Branch, Monitoring and
Data Support Division (MDSD), Office of
Water and Hazardous Materials (OWHM)
Policy decisions: Water Quality Analysis
Branch, MDSD, OWHM
Vendor for computer resources: Management
Information and Data Systems Division
(MIDSD), Office of Planning and Management
(OPM)
User (storage): monitoring and data collection
activities are found in various divisions within
the Regions, States, other Federal agencies,
and Office of Research and Development
(ORD). Monitoring Branch, MDSD, OWHM, is
also responsible for other aspects of
monitoring and data collection.
User (retrieval): found in all divisions of the
Regions, States, and ORD, but may not be the
same group responsible for collection and
storage of the data.
Designated STORET contact: within the
Regions, this individual may be located in the
Planning and Management Division, the Water
Programs Division, or the Surveillance and
Analysis Division.
From this division of responsibilities, it is seen that
decisions made in one group with one line of manage-
ment have a major impact on several other groups.
Problems of communication and management occur
because of the fragmentation of functions, and this
fragmentation detracts from effective utilization of
STORET and from a consensus regarding its future
priorities.
STORET plays a dual support role. "How"
STORET does this is by providing utility support to the
Regions' and States' water quality assessment and water
quality management activities, and by providing a data
base for national reporting of trends and progress.
STORET serves a utility role both by sharing software
(providing for efficient storage and retrieval of data) and
by pooling data in a common format (providing access
to data collected by others). The utility role is the more
successful of the two functions. The other role, national
reporting of trends and progress in water quality, is a
complex and highly technical responsibility, particularly
when performed at the national level without benefit of
localized knowledge of data, such as seasonal variations,
type and location of major sources of pollution, natural
background, and other variables that can have a sig-
nificant impact on analysis and interpretation of data.
Improvement in the role of a national data base will
depend to a large extent on a better definition of the
questions to be answered.
Other major STORET uses are explained below.
Over 75 percent of the retrievals from
STORET are in direct support of PL 92-500.
The three major functions supported are
reporting (such as 305 (a and b)), planning
(303 Basin plans), and surveillance
(Section 104).
Users (in the utility mode) retrieve data pri-
marily from their own waterways as this
coincides with their area of responsibility or
jurisdiction.
Usage of the data by the data generators
(through direct access) contributes sig-
nificantly to improved data quality. The
generators then have a vested interest in main-
taining quality in a specific data base since
they will benefit from its use.
Detailed reporting by the States for the
305(b) reports provides a detailed national
inventory, when taken collectively.
Pooling of data contributes significantly to the
value of the data base for both utility use and
national reporting use.
The groups responsible for data collection and
storage are not the only users of the data.
Many groups are concerned with only retrieval
and use of the data and may not have a voice
in the priorities for data collection or
monitoring.
STORET is not a management information
system. STORET retrieval requires technical
analysis to produce a visible product or a base
for management decisions.
Increased usage appears to be related to
increases in use of quantitative data for
decisions previously made by intuition and
experience.
From a review of the STORET mode of use, it is
obvious that STORET has had a significant impact on
improving water pollution control and abatement
procedures. With good data handling, those national or
local organizations concerned with water pollution
control and abatement can choose to use scientific and
technical information to make their decisions. Without
good data handling capabilities, they have no choice.
Taking a little liberty with Webster, another definition
of utility would be: STORET Utility - Having the
quality or state of being useful, designed for general use
to serve various roles in water quality management, and
kept for producing a useful product.
-------
THE USES AND USERS OF AEROS
By James R. Hammerle
BACKGROUND
The United States Environmental Protection
Agency's comprehensive air pollution information
system, the Aerometric and Emissions Reporting System
(AEROS), is a valuable tool for managing the national
air pollution control program effectively. Each of the
two major subsystems of AEROS, the National
Emissions Data System (NEDS) and Storage and
Retrieval of Aerometric Data (SAROAD), is described in
detail in this paper. SAROAD is the established Federal
data system for storing ambient air quality data from the
air monitoring activities of State, local, and Federal
agencies. NEDS, on the other hand, contains annual
emissions and operating characteristics of individual
emitters throughout the Nation. NEDS and SAROAD
data are submitted regularly to EPA by all of the States
in accordance with mandatory Federal reporting
requirements.
It became apparent in 1973 that additional data
were needed independently by the data bank users. For
this reason, the concept of AEROS was expanded from
an integrated NEDS/SAROAD data system to one
encompassing other information systems such as test
data, hazardous air pollutant sources, and computerized
air pollution regulations. To date, the EPA air data
systems have been used primarily by governmental
agencies although the private sector is becoming more
aware of the advantages in utilizing a common data base
in conjunction with Government representatives. Thus,
it is expected that the EPA air data systems will be used
more frequently by private groups interested in
influencing governmental decisions.
The Aerometric and Emissions Reporting System
(AEROS) is comprised of input forms, programs, files,
and reports established by EPA. These elements enable
the EPA to collect, maintain, and report information
describing air quality, emissions sources, and so forth.
Although much of the effort and activity
supporting AEROS is concerned primarily with the
collection and maintenance of data, the primary purpose
of AEROS is providing reports and computerized data
on ambient air quality and emission sources. The input
forms, procedures, programs, files, and reports are the
basic structural elements of AEROS and, under the
management of EPA, form a comprehensive system for
collecting and reporting air quality and emissions data.
The purpose of AEROS is to provide hard data and
basic information under the following requirements
specified in the Clean Air Act:
Evaluation of plans and strategies to meet
national ambient air quality standards
(including air pollution modeling)
Evaluation of emissions and control equip-
ment for the development of new source
performance standards
Support of hazardous pollutants investigations
Determination of the status, projections, and
trends of air pollution for reports and progress
evaluation
Studies on fuels, their usage, and availability.
AEROS is a general purpose data system which attempts
to meet the general needs of a large number of users. It
does not attempt to meet all of the needs of any of the
users, and in fact, does not meet any of the requirements
of certain potential users because of resource availability
and certain other factors.
WHAT DATA ARE AVAILABLE?
Data Files
Data should be ordered in nonduplicative groups
into many separate files tied together by identifiers. The
groupings should be determined by the size, frequency
of access, and available storage media (disk, tape). The
same criteria should be applied to the decision on access
speed (online-interactive, remote batch, etc.). Some
duplication of files may be necessary if interactive access
is considered.
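As a minimal illustration of this principle, descriptive attributes can be stored once in a small file and joined to a high-volume observations file through a shared identifier, rather than repeated on every record. All file and field names below are hypothetical, not part of AEROS.

    # Hypothetical nonduplicative file groups tied by an identifier.
    sites = {                    # small, frequently read file
        "290010001": {"city": "St. Louis", "state": "MO"},
    }
    observations = [             # large file, e.g., kept on tape
        {"site": "290010001", "pollutant": "SO2", "value": 0.014},
    ]
    for obs in observations:     # the identifier ties the files together
        site = sites[obs["site"]]
        print(site["city"], obs["pollutant"], obs["value"])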
AEROS
The AEROS system was developed over a long
period by different groups before the concept of inte-
grated data files and common data base management was
accepted. Therefore, there is some duplication among
the files which are not associated with the interactive
system. Furthermore, as other program elements develop
systems, there is a tendency to create files, which
usually contain data duplicative of existing files, or to
access existing files to create additional files. If
permitted to continue unchecked, this practice will
increase required storage space needlessly and will result
in a string of unmanageable files in various stages of
currency.
Research Data
Not all the data collected are intended for inclusion
in AEROS. The primary purpose of AEROS is to serve
general national needs; therefore, monitoring research
data, short-term emission rates for models, or other data
of short-term interest are not included. However, if these
data are submitted in proper format by the collectors,
they will not be rejected.
WHERE DID THE DATA COME FROM?
Basically, Federal regulations require the submittal
of data in accordance with State Implementation Plans
(SIP). Extensive reports are available indicating the types
and sources of data received. The following statements
are universally applicable to the major data banks.
Air Quality Data
Originally air quality data were obtained vol-
untarily from State and local agencies in exchange for
provision of reports for State and local use. EPA
research programs operated a 275-site network, per-
forming all associated operations. The network was
decreased in size and decentralized to Regional Offices
(RO). Federal regulations require submittal of ambient
data from networks as specified in SIP's.
Emissions Data
Initial efforts were centered on collecting emission
inventories, in hard copy form, in the "first" 32 Air
Quality Control Regions (AQCR) defined. States were
provided funding to develop emission inventories in con-
junction with SIP preparation. Summary data were to be
submitted and detailed data kept on file. EPA attempted
to collect all available data from States, local agencies,
other Federal agencies, research projects, and other
locations to create a base-line nationwide emission
inventory. Currently, States are to submit emissions data
in accordance with Federal regulations designed to main-
tain up-to-date information. Other EPA activities
yielding emissions data are required to submit them to
the banks in order to reduce duplication and
redundancy.
HOW "GOOD" AND "COMPLETE" ARE THE DATA?
Quality assurance techniques for ambient mon-
itoring have been developed by ORD; however, there has
been no effort to address the matter of quality assurance
in the emissions inventory area. The question concerning
the data quality cannot be answered because there are
no existing methods for quantitatively defining the
assurance which a user can apply to the data. In general,
there is more confidence in the quality of the ambient
data than the emissions data.
The completeness of the ambient data may be
considered from several perspectives:
More sites are in operation than required by
SIP's (for certain pollutants in certain areas);
therefore, more data than necessary are being
collected (about 300 percent over the required
amount of data).
State and local agencies manipulate the data
and move monitoring sites, thereby causing
gaps in the full picture of data and making
statistical analysis difficult in some cases.
Certain States and RO's are usually late in
submitting the data, which is, therefore, less
complete.
The completeness of emissions data may be viewed
according to the following:
Some States and local agencies still do not
have a usable emissions inventory.
Certain States and local agencies have done an
inadequate job of collecting the most basic
information for calculating emissions; there-
fore, even though the sources have been
identified, considerable information is missing.
Some States have deliberately withheld data
about selected sources; in some cases, RO's
have known of the existence of sources but
made no effort to include them in the system.
WHAT DO THE DATA BASE MANAGERS DO?
WHO ARE THE USERS OF THE DATA?
Considering the basic areas of responsibilities, i.e.,
data collection, communications, computer facility, data
system, files, and software utilities, the data base
managers perform the following functions in the two
applicable areas of data system and files:
Development of data system, including feasi-
bility study, design, programing, testing,
documentation, maintenance, and user
surveys.
Definition and creation of files, maintenance,
add/delete/change actions, security, and
auditing/anomaly investigations.
Engineering (air pollution) necessary for calcu-
lations and statistics generation internal to the
system.
Guidance is also provided by the data base
managers with respect to data collection. No efforts are
possible for interfacing data files with software utilities
or for custom retrieval/analysis programing given the
current level of resources. Data collection is by State and
local agencies with overview of RO's. Communications,
computer, and software utilities are the responsibility of
the facility managers.
It is truly surprising to find a large number of EPA
personnel in headquarters and regional offices who do
not understand these divisions of responsibilities. On the
other hand, perhaps they do not accept these divisions.
Most importantly, a conscious decision must be
made by management concerning what the data systems
are to accomplish and then to commit the necessary
resources to support the desired level of accomplish-
ment. The capabilities of ADP personnel must be
thoroughly understood. Furthermore, it must be
acknowledged that the crisis mode of operation can
destroy a data base. Every operation, with few excep-
tions, must be viewed as a routine procedure; otherwise,
the integrity of the system and the data base will be
damaged. Management must also enforce the concept of
nonredundant files so that valuable computer storage
capability and operating time will not be wasted.
Finally, the philosophy of data base system opera-
tion must be uniform throughout the Agency. Other-
wise, users who do not understand the differences
among the many systems will begin to use selected
systems for purposes for which they were not designed.
Users of AEROS may obtain data by the following
methods:
Use of existing batch and remote batch
reports
Use of existing interactive system reports
Special requests to system/data base managers
Writing of programs directly extracting data
from defined files.
To date, the users have been identified as follows:
OAQPS, OAWM 50%
RO's 25%
Other EPA and public 25%
From another viewpoint, the users may be cate-
gorized according to the following:
Persons or organizations for whom the system
was designed and who, in general, have the
majority of their needs met
Persons or organizations who originally played
a large part in the design of the system but
currently do not use the system for some
reason
Persons or organizations not intended to be
users who now insist on using the system to
meet their needs.
There is another class of "users" which makes
decisions without using available data, pretending that
the data are nonexistent or "no good." Of course, it is
easier to work without data, since data analysis is usually
complex and time-consuming. However, this attitude
cannot continue because sooner or later someone else
will use the available data, come to a conclusion in
conflict with the one previously made, and crisis/cover-
up becomes the mode of operation.
Thus, it is very difficult to determine exactly who
is a bona fide user and what the data needs are. This is
further complicated by an inability to convince the users
that the cost/benefit relationship indicates that all their
-------
The SEAS ABATE module computes the cost of
pollution abatement for approximately 500 abating
sectors. These sectors may be either components or
combinations of INFORUM sectors and INSIDE sub-
sectors. Capital investment costs and operating/mainte-
nance costs are computed for each abating sector based
on: predicted treatment requirements; average plant size;
and capital requirements for catchup, expansion, and
replacement of pollution control equipment. The
ABATE module also acts as the mechanism through
which the economic impacts of pollution abatement are
incorporated into the economy as a whole. The funda-
mental, or default, assumption of both the RESGEN and
ABATE modules is that all relevant Federal pollution
control requirements are complied with on schedule.
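In outline, the ABATE cost computation is a summation of capital and operating/maintenance costs over the abating sectors. The sketch below is a loose paraphrase in Python; the field names and cost breakdown are assumptions for illustration, not the actual SEAS implementation.

    # Illustrative paraphrase of the ABATE cost summation.
    def abate_costs(abating_sectors):
        # Capital requirements cover catchup, expansion, and
        # replacement of pollution control equipment.
        capital = sum(s["catchup"] + s["expansion"] + s["replacement"]
                      for s in abating_sectors)
        o_and_m = sum(s["operating_maintenance"]
                      for s in abating_sectors)
        return capital, o_and_m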
Each of the above-mentioned SEAS modules
generates output files which are used as primary data
sources by the remaining system modules. Other mod-
ules in SEAS include:
1. Solid Waste: this module estimates the annual
tonnage at the national level of solid waste generated
from all sources except pollution control (RESGEN
module). Twelve materials categories (e.g., paper, glass,
ferrous metals) and 20 product categories (e.g., news-
paper, furniture, batteries) are considered. Estimates are
computed for:
Type of disposal facility-municipal or private
Method of disposal-incineration, open
dumping or landfill
Disposal cost
Levels of recycling and resultant secondary
residuals
2. Regionalization: this module disaggregates
national residuals estimates from the RESGEN module
and national economic outputs from INFORUM and
INSIDE into eight regional allocations including the
following: states, Standard Metropolitan Statistical
Areas (SMSA), major river basins, and minor river basins.
Economic and residual shares are computed at the
county level, and then reaggregated to the desired
regional allocation. The base year (1971) data from
which the shares are computed is obtained from the best
available source, with default values determined from an
employment distribution survey. The projected change
in these shares over time is computed from Department
of Commerce projections (OBERS 2-digit SIC) for all
industries except electric utilities, which use industry-
published planning data. (A sketch of this share computation follows the list below.)
3. Transportation: the transportation module
forecasts the demand for automobile, bus, truck, rail-
road, and airline travel as vehicle miles traveled for both
passenger and freight purposes. Based on these data, it
estimates the total emissions produced by these sources.
Emissions are computed in annual tonnage at the
national level and may be disaggregated to the State and
SMSA levels. (A sketch of this calculation also follows the list below.)
4. Energy: this module forecasts the energy
demand resulting from the economic projections, by fuel
type, for six user categories. The energy module is
currently undergoing extensive revision.
5. Raw Materials: the STOCKS module estimates
annual demand levels, relative price changes, capital
investment, and import/export levels for 27 categories of
raw resources.
6. Nonpoint Source Residuals: this module
estimates the annual contribution of waterborne pollut-
ants from agricultural and urban land use. It is currently
in the test phase.
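Two of these modules lend themselves to brief illustration. The Regionalization module (item 2 above) is, mechanically, a share computation: base-year shares of a national total are held at the county level and summed over whatever counties make up the desired regional allocation. A minimal sketch in Python, with all names assumed:

    # Sketch of share-based disaggregation and reaggregation.
    def regionalize(national_total, county_share, region_counties):
        # Disaggregate to counties, then sum over the counties
        # that make up the region (state, SMSA, or river basin).
        return national_total * sum(county_share[c]
                                    for c in region_counties)

The Transportation module's emission estimate (item 3 above) is essentially travel demand multiplied by an emission rate and summed over modes; the figures below are placeholders, not actual factors.

    # Illustrative emissions estimate: tons = miles x tons per mile.
    vmt = {"auto": 1.0e12, "truck": 2.0e11}    # vehicle miles traveled
    rate = {"auto": 3.0e-6, "truck": 1.0e-5}   # tons per mile (assumed)
    national_tons = sum(vmt[m] * rate[m] for m in vmt)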
At the direction of Dr. Wilson K. Talley, EPA's
Assistant Administrator for Research and Development,
an ad hoc review panel was established by the Executive
Committee of the EPA Science Advisory Board. The
review panel was to assess the current status of SEAS
and to make recommendations for its future develop-
ment and application. This panel was headed by Nobel
Laureate Wassily Leontief, who is currently at New York
University. It recently completed its evaluation and
made the following recommendations in its report:
The SEAS system should be maintained in
EPA and its utility enhanced by a carefully
structured and independently reviewed pro-
gram with the objectives of increasing the use
of SEAS, developing a better data base,
verifying results, and improving the structure.
Encourage the use of SEAS and broaden the
base for constructive criticism by a program to
increase the visibility of SEAS to EPA Head-
quarters and Regions, regional and local
officials and environmentalists, and other
Federal agencies. Use for the Cost of Clean Air
and Water Report is one example of a recom-
mended use.
Improve the residual coefficients in order of
priority based on sensitivity tests to determine
sectors of activity having the greatest impacts
on residual generation and those sectors of
high regulatory importance.
Cooperate and share development costs with
other agencies to develop modules and the
corresponding data of mutual value to the
agencies involved. Where these efforts are
undertaken, the responsibility for developing
models and collecting the data should rest in
the other agencies so as not to detract from
EPA efforts to improve SEAS for its own use.
A senior-management-level review team has been
established to evaluate alternative ways of implementing
the recommendations of the SAB review panel. The
review team includes personnel from the Office of
Planning and Evaluation as well as ORD. They will
evaluate the alternatives within the practical limitations
of existing budget and manpower. It is anticipated that
final recommendations on resource levels and direction
of developmental and operational activities will be
forthcoming.
Figure 1
SEAS Flow Diagram
-------
IMPROVING THE UTILITY OF
ENVIRONMENTAL SYSTEMS
By Donald Worley
Four random points are presented in this paper to
help frame questions in the session which follows. These
four points are in the form of questions:
What is an environmental system?
What are the environmental systems of EPA?
Is our criticism valid?
How have we determined our need?
Your view of the utility of environmental systems
is dependent upon your particular backgrounds. The
"old line" water quality people praise STORET as do
those who have learned SAROAD. Those people coming
to either system as a new user with preconceived ideas
tend to criticize. Far too often their criticism is aimed at
the wrong target.
To many people, SAROAD is a collection of
computer programs written for IBM equipment and
converted to Univac equipment. To others, SAROAD is
17 magnetic tapes of air quality data which may be used
on the Univac 1110. In fact, neither of these opinions is
correct. If SAROAD is viewed as an environmental
system, then it includes much more than computer
programs and data. Figure 1 is a simple graphic represen-
tation of SAROAD.
For a proper understanding of SAROAD, the
following aspects of the system are important:
The data is initially collected by State and
local agencies
The data is collected by a monitoring network
that has been planned with Federal assistance
and implemented by Federal money in many
cases.
The data is submitted through our Regional
Offices to the National Air Data Branch of
OAQPS
Other aspects of this system.
Many Federal, State, and local policies are involved in
this environmental system. Each component must carry
its responsibility for the system to work effectively.
Figure 2 represents a view of some of the EPA
environmental systems. Each system performs its
mission uniquely, but each is separate. Little effort has
been made to relate the methods or contents of these
individual systems.
Few would champion a single EPA environmental
system; however, there are existing techniques for
relating common items within these systems. This rela-
tionship would not necessarily be a complex computer
relationship, but rather could be a simple catalog of all
environmental systems. At least, the catalog could detail
places of reference for problems.
Criticism is a favorite American habit, and
criticizing information systems is an easy thing to do.
Much of our criticism is not unique to EPA systems. For
20 years, we in the computer field have been attempting
to implement a system that will answer "any" question.
This goal has not been achieved. Our current level is to
answer the questions that we planned to answer.
Data management systems are the results of our
search for the general solution. These systems are a large
step forward from custom-designed programs, but their
development is still ongoing. EPA information systems
must rely upon these infant systems and some problems
and limitations must, therefore, be expected.
Finally, we must consider our need for information
systems. In the past, our group as well as other organiza-
tions have been easy prey for the salesman. The salesman
brings a tool, and we try to find a way to use it.
Unfortunately, the job we are doing does not need the
tool in many cases. In fact, the tool that is unnecessary
prevents us from obtaining the tool we need.
This discussion has been presented at an ORD
workshop to focus on the fact that the users must help
in defining the requirements for environmental systems.
-------
Figure 1
SAROAD
(Graphic: the SAROAD environmental system, encompassing environmental policy, State and local agencies, the monitoring system, EPA staff, the SAROAD computer programs, operating needs for air quality data, and environmental attitudes.)
Figure 2
The Information System Maze
-------
SUMMARY OF DISCUSSION PERIOD - PANEL IV
The discussion from the session on utility of environmental data systems is presented below.
STORET Costs
First, it was stated that the annual costs to operate and maintain STORET are probably around $1 million. In the
INFOMATICS study, which addressed the idea of moving STORET to the Univac at RTP, it was estimated that the move
would cost between $7 and $11 million. The study used the word conversion, but the move might be better described as a
reimplementation of a system design.
Survey Data Directory
It was pointed out that MIDSD's Information Systems News recently announced the publication of a loose-leaf
directory of survey data studies. One panel member interpreted "survey data" to mean a particular data collection effort and
results of that data collection as opposed to a whole system. It was suggested that a directory of surveys, related data files,
and originators might be made available through a bibliographic data base system, but that a data base should not be created
only because someone might possibly need the information.
STORET Routines
It was asked how to find out whether a program exists for a particular analysis of data in STORET. This information, it
was pointed out, might be available through one's resident or Washington STORET contact. Normally, if a user produces a
well-documented routine that could be of general interest, it will be made a standard part of the system. In addition, a list of
routines people have used against the data base is maintained by the Washington STORET office. Information is also
disseminated through meetings of active STORET users or potential users. This form of information interchange between the
users is encouraged. Unfortunately, water quality does not have an ADP coordinator group set up like ORD, so the only focal
point is at the Assistant Administrator level.
Functional Decisions
The management or functional decisions which demand immediate (online) access to these related data bases were
discussed. A majority of the panel agreed that most decisions do not require the immediate response of an online system;
generally the decision can wait for an overnight job to be run against a batch-supported system. However, one panel member
felt that there are many instances when online access is needed. This allows a user to browse through the data and, possibly,
to use interactive graphic packages against the data base. This is a tool to help make on-the-spot analysis of a selected subset
of data.
Interactive Processing
The panel agreed that interactive processing is most appropriate when used for summaries, statistics, and general
information, but not for detailed information. The discussion revealed a difference between a data base such as SAROAD or
STORET and a bibliographic data base. Bibliographic data bases are cost effective online because of their very high usage.
These systems also allow a user to develop a search heuristically and browse through the data. These capabilities are much
more difficult or impossible to exercise, and far more time-consuming, in a batch system. When appropriate, online systems allow a user
to submit jobs to the batch operation and give him confidence that the job will be run overnight without errors in his job
control language.
In a situation where a user wants to produce a subset of data from a large interactive system he is using, the panel felt
data subsets should be produced in batch operations. The subset can then be searched online. The industry has shifted from
encouraging all programing to be done over a terminal to suggesting that in many cases a batch operation might be more
economical.
123
-------
Future ADP Management
While the sessions have discussed the Agency mistakes that have been made in ADP, those present were interested in
what input they are providing that gives hope for the future. The panel said these meetings have increased communication
and have helped to break down the organizational barriers which apparently get in the way of data flow. They expressed the
hope that these meetings would create enough excitement among attendees that they would return to their jobs and generate
helpful input to management. Unfortunately, most input will have to go from the bottom of the organization to the
decisionmakers at the top. Also, as those at the lower levels rise to the top, they will be able to generate input of their own.
The panel was concerned that, although many branches of the Agency are represented at these meetings, nothing will really
have been accomplished if the attendees fail to agree on one or two good methods of managing ADP and instead go home
and implement completely divergent policies.
BIO-STORET
It was asked who will support BIO-STORET as it expands and acquires more users. BIO-STORET will be brought up as a
pilot project in March 1976. It is assumed that OWHM will decide to support it at that time.
It was explained that BIO-STORET is disk-dependent because its hierarchical coding structure is a class-order file, which
requires a direct-access storage structure. The BIO-STORET file is not going to be a high-volume file in comparison with the
water quality file. Additionally, BIO-STORET has excellent retrieval capabilities.
ADP System Users
How a data system manager determines who the users are, or should be, was discussed. Most members felt the manager
must simply try to establish a class of user to fit the purpose of the data base and why it was accepted. At least the manager
should establish the minimum and maximum levels of users. Another panel member felt the question reflects an underlying
belief that data base systems should be judged by the number of users, which is erroneous. It should be judged by the user
times his influence. If the President is the only user of a system and he finds the system useful, it is worthwhile. A positive
value must be obtained when equating the cost and the gain. Once these positive results are obtained and the system is
justified, other users are extra benefits. The manager does not need a large user community. If someone wishes to use a data
base, it was generally agreed that the prospective user should go to the manager. The manager should have a set of guidelines
on who may use an online system and, although the user may not be allowed online access, he will probably be provided with
the data he requires from the system.
Members said that there has been much criticism among Agency groups about which groups are using particular data
bases. The panel commented on what the Agency should be doing to stop this constant infighting and how it should
rationally regard these systems.
The easiest method, the panel felt, is to make the user accountable for his utilization through the budget process. The
individual manager should have the opportunity to use the most applicable system. Who should or should not use a system
should not be dictated across the board.
ADP System Managers
Managers within EPA and those who evaluate EPA output should determine whether or not the reports and analyses
they receive are adequate. It was not thought that any central committee or contractor could make the necessary assessment.
Management should take steps to ensure that required data is collected and placed in the proper data base. If this is not done,
the products of the data bases will be damaged.
Generally, the panel agreed that data system management should be placed in the line organization in the program
offices. One member felt that data system management should be at the Assistant Administrator level but that users could be
anywhere within the organization.
-------
OPERATIONAL CHARACTERISTICS OF THE CHAMP DATA SYSTEM
By Marvin B. Hertz
Last year, a detailed description of the Community
Health Air Monitoring Program (CHAMP system) was
given at the workshop. The design of the system was
basically complete at that time, and the Health Effects
Research Lab was in the process of implementing the
design.
The CHAMP system consists of a network of
remote monitoring stations located across the country in
coordination with concomitant epidemiologic studies.
The two major requirements for the system were:
To deliver high-quality data
To deliver and handle large quantities of data.
To achieve these objectives, software has been
developed (machine validation of the data) to connect
aerometric and meteorologic data with system status
information (i.e., instrument performance information).
The validity of each individual data point can therefore
be determined.
The remote station data acquisition system hard-
ware is shown in Figure 1. Basically, the minicomputer
in the remote station serves as an interface between the
pollutant analyzers and associated system, magnetic tape
data storage, the remote field service operator, and the
telecommunications network. The data generated and
recorded at the remotes and transmitted to central
includes not only the actual meteorologic and pollutant
sensor responses, but also associated analog signals and
digital status signals (Figure 2). These signals supply
information about the performance and status of each
instrument. For example, if an instrument is switched
from an ambient sampling mode to the calibration
mode, a status bit is recorded which reflects this change.
The focal point of the CHAMP network is the
central computer facility located at the Environmental
Protection Agency, Research Triangle Park, North
Carolina. The central controller for the CHAMP network
is a dual processor system with a full complement of
input, storage, and display peripherals. The heavy
burden on processor time placed by the telecommunica-
tions and real-time processing of the large quantities of
data justified the choice of a dual processor system. A
PDP-11/40 with 40K of core was selected to perform the
tasks associated with the management of the large data
base generated by the network. The telecommunications
and real-time processing tasks are handled by a
PDP-11/05 computer with 16K of core. The two pro-
cessors are interconnected by a Unibus window which
takes advantage of the unified asynchronous data path
architecture of the 11 system. The window allows each
processor to address the core and peripherals on the
other processor as if it were its own. In addition, the
DEC memory management option was added to the
PDP-11/40 to handle addressing above 32K in the 16-bit
system. An extensive complement of peripherals
including two 1.2M word cartridge type disks, three tape
drives, an electrostatic printer-plotter, line printer, and
CRT display were initially selected. The rapid retrieval
requirements for large quantities of data necessitated the
addition of a Telefile dual spindle, quad density,
removable 20 surface pack disk system capable of
storing 98M words. A block diagram of the Central
Computer System is shown in Figure 3.
Although we could not possibly have contemplated
all the problems that developed during the last year, the
flexibility that was designed into the system enabled us
to overcome most of the difficulties that arose in the
handling of the data. More important, however, the
computer served as a valuable tool in solving many of
the problems associated with system implementation.
The two major problems encountered were:
Improper field testing and installation of the
aerometric and meteorologic sensors prior to
system startup
Incomplete training of the field operators,
especially in the area of total system design.
The following printouts demonstrate the various
programs that were developed to allow the operation of
each station to be followed at all times and to determine
the validity of the data.
Figure 4 is a station map. It allows a remote data
slot number (parameter) to be associated with a param-
eter mnemonic and an engineering unit mnemonic in all
central operations. Since the system was limited by the
number of possible data slots and the battery of instru-
ments differed at the various stations, this mapping
added needed flexibility to the system.
Figure 5 is part of the validation criteria file. The
primary parameter to be validated is indicated, and the
concentration units are noted. The current calibration
constants for the instrument are listed. These constants
are updated as new calibration data are obtained from
the remote data. The permissible delta about zero
signifies the noise range of the instrument. The ten
computer words listed under status bits comprise a
validation map for status bits. A 1 in the top row of each
word indicates that this bit must be checked. The 0 or 1
below this number indicates the valid condition for the
instrument.
The minimum number of 5-minute averages
required to make a valid hourly value is indicated, as
well as the notation that no special software routine
(handler) is required for this parameter.
Figure 6 is a continuation of the validation criteria
files. The secondary parameters (analog signals to be
tested) are listed as well as the upper and lower limits for
correct instrument operation. The Validity Map Associa-
tion gives the correspondence between a bit in the status
word (carried along with every validated value) and the
corresponding reason for invalidation.
The machine validation of the data is performed by
Program VALDAT. Figure 7 is the first page of the
output of VALDAT. The machine validated hourly
values are presented by station and day for each param-
eter. The number after each parameter value denotes the
number of valid 5-minute averages in that hourly value
(12 maximum). The M, I, or B after each value signifies
that the values not averaged are missing, invalid, or both
missing and invalid respectively. The second page of
VALDAT (Figure 8) lists the invalidity causes by
parameter by hour. Figure 9 is the next part of
VALDAT and lists the values of the invalid 5-minute
averages which were invalidated by secondary param-
eters. The value of the secondary parameter is also listed.
Figure 10, also part of VALDAT, lists the status of each
5-minute value by hour, by parameter. The characters -,
0, I indicate that the noted 5-minute average is missing,
valid, or invalid respectively. Figure 11 is a daily plot for
a primary parameter for the station and day indicated.
The # and I indicate valid and invalid data respectively.
Since the line printer limited the number of points that
could be plotted, basically every other 5-minute value is
indicated.
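Taken together, the station map, the validation criteria files, and the VALDAT printouts imply a mechanical test applied to every 5-minute value. The condensed sketch below shows that logic in Python; the data layout, field names, and the minimum-point threshold are illustrative assumptions, not a reproduction of Program VALDAT.

    # Condensed sketch of the machine-validation logic (names assumed).
    def valid_5min(point, criteria):
        # Every status bit flagged for checking must be in the valid
        # state (e.g., the instrument not in calibration mode).
        for bit, valid_state in criteria["status_bits"]:
            if (point["status"] >> bit) & 1 != valid_state:
                return False
        # Secondary parameters (e.g., sample flow) must fall between
        # the limits for correct instrument operation.
        for name, (low, high) in criteria["secondary_limits"].items():
            if not low <= point["secondary"][name] <= high:
                return False
        return True

    def hourly_value(points, criteria, minimum=9):
        # An hourly value is valid only if enough of the twelve
        # 5-minute averages survive validation ('minimum' is assumed).
        good = [p["value"] for p in points if valid_5min(p, criteria)]
        return sum(good) / len(good) if len(good) >= minimum else None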
VALDAT is reviewed by cognizant EPA personnel
and the data edited to reflect station status that cannot
be determined during the machine validation. For
example, journal entries (Figure 12) and field logs
supplied by the remote station operators are often help-
ful in validating the data. After the review of the data,
changes are incorporated into the data base and a final
"REVIEW" printout (Figure 13) is obtained. REVIEW is
checked to see that the appropriate changes have been
made. The data is then ready to be archived and used in
the epidemiological analyses.
Although the system required considerable human
intervention initially, the use of the software described
in this paper has proven to be a valuable asset in the
installation and maintenance of the stations. This will
eventually lead to the need for only nominal human
intervention (quality control spot checks) in the
system.
-------
Figure 1
Remote Data Acquisition System
(Block diagram: a PDP-8/M minicomputer with a modem controller and data set, magnetic tape control and drive, 96-bit digital input and output through optical isolators, a teletype, a 6-digit display, and a 48-channel multiplexer and analog-to-digital converter.)
-------
Figure 2
Ozone Operational Parameters
(Operation checks: sample flow, ethylene flow, power status, valve positions, range. Calibration checks: O3 generator flow, O3 generator setting, dilution air flow, valve positions.)
Figure 3
Central Computer System Block Diagram
(Block diagram: two RK05 disks, TU10 magnetic tape, bootstrap, 32K core, hardware floating point, memory management, real-time clock, and DL11 asynchronous interface.)
Figure 4
Station Identification Map
(Printout: table associating each remote data slot number with its parameter mnemonic and engineering-unit mnemonic.)
Figure 5
Validation Criteria File I
(Printout: for the primary parameter O3 in PPM, the current calibration constants, the permissible delta about zero, the status-bit validation map, and the minimum number of 5-minute averages required for a valid hourly value.)
Figure 6
Validation Criteria File II
(Printout: the associated secondary parameters with their upper and lower limits for correct instrument operation, and the validity map association between status-word bits and the corresponding reasons for invalidation.)
-------
Figure 7 and Figure 8
VALDAT Output
(Printout: CHAMP data validated with Version 5.10 of VALDAT; machine-validated hourly values by station and day for each parameter, followed by the invalidity causes by parameter by hour.)
-------
Figure 9
VALDAT-III
(Printout: the invalid 5-minute averages that were invalidated by secondary parameters, with the value of the secondary parameter listed.)
-------
Figure 10
VALDAT-IV
(Printout: the status of each 5-minute value by hour, by parameter; each value is marked missing, valid, or invalid.)
-------
[Plot of NO versus hour of day (1-23) for the same station; illegible in reproduction.]
Figure 11
VALDAT-V
-------
STATION - 8842   DAY 005-1975
JOURNAL ENTRIES
[Operator log, partially legible: West Covina CHAMP 2682; O3 monitor span drift out of tolerance, time 14:56, areas 1, 2, 3; all other instruments look good; still having problems with reading out channel 19N; SO2 sample flame goes out in SO2 monitor; routine maintenance pulled on station.]
Figure 12
Journal Entry
STATION - 8842   DAY 293-1974
[Daily review report (PAGE 1 OF 1), largely illegible in reproduction: hourly values and daily means of NO, NO2, O3, and SO2 in PPM (some entries flagged below the minimum detectable level), wind speed in KPH, wind direction in DEG, TOUT and TIN temperatures in DEG, and barometric pressure in MMHG.]
Figure 13
REVIEW
-------
DEVELOPMENT OF THERMAL CONTOUR MAPPING
By George C. Allison
INTRODUCTION
The project to produce thermal contour maps was
initiated in the spring of 1973 at the Environmental
Monitoring and Support Laboratory-Las Vegas (formerly
the National Environmental Research Center-Las Vegas).
At that time, two separate capabilities existed for
generation of computer contour plots and for collection
of thermal data with an airborne infrared scanner. The
intent of this project was to join these two capabilities
into an automated system for generating thermal iso-
pleths, or contour maps, as shown in Figure 1. The
system would be applied to map thermal discharges into
water bodies primarily for enforcement purposes.
The system diagramed in Figure 2 was conceived to
meet the requirements for generating thermal contour
maps. In order to complete the system, additional
resources were required to supply the following
capabilities:
Analog recording capability aboard the
aircraft.
A ground station for analog to digital
conversion.
Software for data reduction and to interface
with the contour plotting software.
While the system does not appear complicated, its
development produced a number of interesting problems
and alternatives.
AIRBORNE RECORDING
The requirement for analog recording aboard the
aircraft was readily satisfied by purchasing a standard
14-channel analog recording unit. However, satisfactory
recording of the scanner data was not immediately
obtained. The first imagery produced from a recorded
signal showed that a waviness had been introduced
which was most noticeable at the trailing edge of the
scan lines. The cause was found to be a very small
oscillation of the tape recorder speed. Attempts to
correct the problem through adjustments to the re-
corder were unsuccessful. Although the contour maps
were never affected, the problem was minimized by
recording at the maximum speed of 120 inches per
second.
It had been hoped that a facility operated by a U.S. Energy Research and Development Administration contractor in Las Vegas could be used for digitizing the scanner data.
However, that facility was used for digitizing ground motion data requiring a minimum interval of about 0.10 second, while the scanner signal was to be digitized at an interval of 10 to 20 microseconds. Further
investigation revealed the speed limitation of that system
to be far short of that required, so our own digitizing
facility was developed.
THE GROUND STATION
The alternatives considered for the ground station
included both computer-based and nonprogramable
digitizing facilities. The use of a minicomputer was
chosen over a hard-wired facility primarily for flexibility
and error detection capabilities. This decision led to
another consideration: contract versus in-house develop-
ment of the software. Due to the lack of available in-
house personnel, an attempt was made to obtain both
hardware and software from the computer
manufacturer. The final result was a separate contract
for software development with an individual recom-
mended by the computer manufacturer.
When the minicomputer and software were ready
for delivery, we were unable to accept the software
because the "front-end" of the system consisting of a
playback recorder and analog-to-digital converter was
not yet available. In order to test the software for
acceptance and to progress with the system, a simulator
was developed in the place of the A-to-D converter. The
simulator consisted of: (1) a square-wave generator, (2) a
pulse generator to provide timing, and (3) a minimal
amount of logic to permit data to be read.
The simulator proved to be more useful than
anticipated. In addition to allowing for final checkout
and acceptance of the digitizing software, it served as a
continuously variable exerciser for finding the speed
limitations of the ground station. It also made it possible
to generate test tapes for development and checkout of
the remainder of the system.
135
-------
TEMPERATURE CONVERSION
The conversion from voltage to temperature is the
most critical and potentially controversial phase of the
system. In this area we have relied upon the technique
developed by the NASA Earth Resources Laboratory in
Mississippi. This technique involves the use of tempera-
ture standards built into the scanner for determining the
relationship between voltage and temperature, and the
use of ground-truth data for determining atmospheric
loss. The accuracy of the technique depends largely
upon having constant atmospheric conditions over the
spatial and temporal range of data collection.
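In practice, the voltage-to-temperature step reduces to simple arithmetic. The sketch below assumes the common arrangement of two onboard blackbody references (one cooler, one warmer than the scene) and a linear voltage-temperature relationship, with a constant offset standing in for the ground-truth atmospheric correction; the function name and all numbers are illustrative, not the laboratory's actual program.

```python
# Minimal sketch, assuming two onboard temperature standards and a linear
# voltage-to-temperature relationship; not the actual EPA/NASA-ERL code.

def volts_to_temp(v, v_cold, t_cold, v_hot, t_hot, atmos_offset=0.0):
    """Interpolate linearly between the two built-in standards, then
    add back the atmospheric loss estimated from ground truth."""
    slope = (t_hot - t_cold) / (v_hot - v_cold)
    return t_cold + slope * (v - v_cold) + atmos_offset

# Illustrative values: standards at 10 C (1.2 V) and 40 C (3.6 V); ground
# truth showed readings 0.8 C low over the water body.
print(volts_to_temp(2.4, 1.2, 10.0, 3.6, 40.0, atmos_offset=0.8))  # 25.8
```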
Although the mathematics involved is quite simple,
the temperature conversion software was meticulously
prepared for both accuracy and efficiency. The concern
for accuracy was not a concern for the computer
accuracy, but for the programing accuracy. The system
was developed with the ultimate purpose of providing
information for enforcement action. If a case should be
taken to court on the basis of information produced by
this system, the data and software should be able to
withstand detailed scrutiny by reviewing experts outside
the U.S. Environmental Protection Agency.
Our concern for efficiency is necessary to keep
computer time costs down. A typical area to be plotted
might originally consist of 2 million or more digitized
data points. The temperature conversion program uses
several code loops in which instructions must be
executed once per data point. Careless coding in these
loops might double or triple the cost of the entire
system.
CONTOUR PLOTTING INTERFACE
The contour plotting routine requires the input
data to be in the form of an equally spaced orthogonal
grid. Unfortunately, the scanner data contain an
inherent geometric distortion that prevents direct input
for contour plotting. Included with the contouring soft-
ware is a routine to generate a suitable grid from irregu-
larly spaced data; however, the routine is oriented
toward the problems associated with creating a relatively
dense grid from sparse input data. Since the scanner data
are already denser than the grid required, most of the
processing done by this routine would be unnecessary.
The computer time used by the routine would be
prohibitive unless most of the data available were
discarded. This, however, would destroy the resolution
and spatial accuracy of the map. All of these problems
indicated that our own simplified grid generation pro-
gram should be developed.
The grid generation program was developed with
the same consideration for efficiency and accuracy as
the temperature conversion program. The method used
to generate the grid is to compute the coordinates of
each grid point in the distorted coordinate system of the
raw data. The four nearest data points are then located, and the grid value is calculated by linear interpolation among them. The
grid then can be used to produce a contour map
centered about the flight line.
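A minimal sketch of that interpolation step follows. It assumes the raw scan data are indexed by integer (line, sample) positions and that the grid point's coordinates in the distorted system have already been computed as fractional indices; the names and the tiny data patch are illustrative, not the EPA program.

```python
# Bilinear interpolation among the four raw data points surrounding a
# grid point whose distorted coordinates are (line_f, sample_f).

def grid_value(raw, line_f, sample_f):
    i, j = int(line_f), int(sample_f)      # the four neighbors are
    di, dj = line_f - i, sample_f - j      # (i, j) .. (i+1, j+1)
    return ((1 - di) * (1 - dj) * raw[i][j] +
            (1 - di) * dj       * raw[i][j + 1] +
            di       * (1 - dj) * raw[i + 1][j] +
            di       * dj       * raw[i + 1][j + 1])

raw = [[10.0, 12.0], [14.0, 18.0]]   # toy 2x2 patch of converted temperatures
print(grid_value(raw, 0.5, 0.5))     # 13.5, the cell-center estimate
```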
CONTOUR PLOTTING ROUTINE
The Surface Approximation and Contour Mapping
(SACM) software is a general purpose set of programs
originally developed for oil exploration applications. It
has no features specifically developed for this applica-
tion, nor does it seem to lack any features required.
However, it has been the one part of the system where
software failures have occurred. On occasions we have
exceeded some of its limitations resulting in over-stored
arrays and other failures. This situation occurs when
large portions of the area contoured are land areas with
many temperature variations. Since this software was
obtained commercially and is massive in content, it is
exceedingly difficult to identify and correct such
problems. However, we have been successful at circum-
venting most problems through various options of the
software. The most useful technique is the use of a
routine to mask out polygons from the contour plot.
This can be used to eliminate troublesome land areas
from the plot.
CONCLUSION
This system has been in operation for a year and
has required very few enhancements during that period.
Four of the ten EPA regional offices are receiving
thermal contour maps generated by this system. A
comprehensive analysis currently being performed to
assure the accuracy of the system has not yet indicated
the need for any substantial software modifications.
Most of the difficulty in producing a good contour map
relates to the nature of the water body itself, and the
conditions surrounding data collection. Generally there
are sufficient options built into the system to overcome
the difficulties.
REFERENCE
1 Boudreau, R.D., Correcting Airborne Scanning
Infrared Radiometer Measurements for Atmos-
pheric Effects, NASA Earth Resources Laboratory
Report 029, Bay St. Louis, Mississippi, 1972.
136
-------
[Thermal isopleth plot; contour labels illegible in reproduction.]
Figure 1
Sample Thermal Isopleth
[Flowchart blocks: Douglas Monarch aircraft; NOVA ground station; CDC 6400 (ERDA-NVOO); Surface Approximation and Contour Mapping (SACM); contour map output.]
Figure 2
System Flowchart
137
-------
REMOTE SENSING PROJECTS IN THE REGIONAL AIR POLLUTION STUDY
By R. Jurgens
The St. Louis Regional Air Pollution Study (RAPS)
was established in July 1972 with the objective of
developing and evaluating mathematical air quality
simulation models. Given the source emission data and
meteorological conditions, these models would describe
and predict the concentration, diffusion, and transport
of pollutants over a regional area. One application of
these models would be to assist State and local air pol-
lution agencies in assessing the effectiveness of, and
choosing between, alternate air pollution control strate-
gies. Verified models could potentially reduce the
requirements for actual pollution monitoring within a
region.
A requirement of model evaluation is the availabili-
ty of an extensive base of air quality and meteorological
measurements and information on all processes that
determine pollution concentration within the modeled
area. A large research and development effort was estab-
lished in St. Louis to meet this objective.
A number of experiments utilizing remote sensing
instruments are part of the RAPS research effort. These
include NOAA's acoustic sounder, EPA's Lidar systems, Lincoln Laboratories' CO monitor, and the Regional Air
Monitoring network. Descriptions of these remote
sensing systems are included in this paper.
ACOUSTIC ECHO SOUNDER
An acoustic echo sounder was installed in the
downtown St. Louis area early in 1975. The installed
system is maintained by the Wave Propagation Labora-
tory of NOAA. The primary motivation for this project
is a study of diurnal and seasonal changes in the urban
boundary layer. Using acoustic radar, thermal plumes, inversion layers, and their dynamic behavior have been
studied. Recently, the Doppler frequency shift of scat-
tered signals has been analyzed to determine wind veloci-
ties.
This remote sensing technique is based on the
principle that acoustic waves propagating through the
atmosphere are scattered by temperature fluctuations
(variations in refractive index) and by fluctuations in the
motion of the air.
The acoustic radar consists of three basic systems: (1) a transmitting system that generates short, high-powered pulses of sound at a single frequency, (2) a receiving system that detects and amplifies the small fraction of the incident pulse that is backscattered, and (3) an analyzing and recording system, usually a facsimile recorder which produces time-height profiles of the echoes, and perhaps a multichannel analog magnetic tape recorder. Typical sounder parameters in use in St. Louis are: transmitted carrier frequency 2950 Hz, transmitted acoustic power 10 W, transmitted pulse duration 200 ms, pulse interval 5 sec, and receiver bandwidth 30 Hz.
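Two back-of-the-envelope figures follow from these parameters, taking the near-surface speed of sound as roughly 340 m/s (an assumed value, not one quoted above): the pulse duration fixes the height resolution, and the pulse interval fixes the maximum unambiguous sounding height.

```python
# Rough consequences of the quoted sounder parameters; 340 m/s is assumed.
C = 340.0         # m/s, approximate speed of sound near the surface
TAU = 0.200       # s, transmitted pulse duration
INTERVAL = 5.0    # s, pulse repetition interval

print(C * TAU / 2)        # 34.0 m:  echoes smear over half a pulse length
print(C * INTERVAL / 2)   # 850.0 m: an echo must return before the next pulse
```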
Currently, only the facsimile-record echo data are
being analyzed. Analysis is based on pattern recognition
techniques developed by Clark and Bendun.2 Thirteen
general classification patterns have been defined, and
these are used with slight modification for each new site.
An example of a continuous pattern categorization of
acoustic sounder facsimile records is shown in Figure 1.
From data like these, it is possible to study the diurnal
trends and frequencies of occurrences of the various pat-
terns. With the aid of supporting ground-level and vertical profiles of meteorological variables, it is hoped that the prevailing atmospheric conditions causing the various pattern types can be fully described.
LIDAR
Both ground and airborne Lidar (light detection
and ranging) studies have been conducted in St. Louis
during the 1974 and 1975 summer field intensives.
These Lidar systems augment experimental studies of
the boundary layer structure by determining mixing
layer heights over the urban area, especially during the
morning and evening transition periods which are charac-
terized by discontinuous and/or fluctuating changes in
the mixing height. The airborne system also has been
used in determining the dimensions of plumes. The air-
borne Lidar has the unique capability of being able to
make many measurements over large geographic areas in
a relatively short period of time.
Both Lidar and long-path CO monitoring systems use lasers (light amplification by stimulated emission of radiation) as their source of pulses. Whereas the principle of operation of the CO laser is based on molecular resonant absorption, the Lidar system, often referred to
138
-------
as laser radar, measures the backscattered energy of a
pulse transmitted through the lower atmosphere. The
pulse is scattered off aerosols or off dispersed solid or
liquid particles. The principle of pulse generation is the
same for both laser systems. A contained system of
atoms is "pumped" to an active or excited stage. Laser
action is initiated when excited atoms spontaneously
decay to lower energy levels, emitting photons in the
process. Photons trapped within the container trigger
other emissions which are in phase with the triggering
photons. The cascaded emissions are contained within
the material long enough to produce the laser beam.
The optical system of the ground-based Lidar is
shown in Figure 2. The transmitter consists of a Q-switched, air-cooled, pulsed ruby laser with a wavelength of 6943 Å (deep red). Since the angular resolution of the Lidar is determined by the transmitted beam divergence, 6-inch diameter collimating optics (a 38-cm Fresnel lens on the airborne Lidar) are used to reduce the laser beam divergence and to produce an output beamwidth of 0.5 mrad. The corresponding spatial resolution of this beam is 0.5 m at a range of 1 km in the crossbeam direction, and about 2.3 m in range. The maximum firing rate, limited by the cooling rate, is 12 pulses per
minute. In the receiver, a multilayered narrow-band
filter is inserted to reduce the output noise level
produced by solar radiation scattered into the receiver
field of view. During operation, a compressed air-driven
turbine rotates the laser Q-switch prism at 500 r/s. Upon
receipt of a fire signal, a synchronizing generator triggers
the flash lamp in step with a signal from the rotating
prism. A capacitor bank charged to 3 kV supplies energy
for the laser flash lamps.
Detected signals from both Lidar systems are
output onto strip charts and also passed through A/D
converters for storage on magnetic tape. The strip chart
data are subsequently digitized for storage on magnetic
tape. Analysis and plotting are done on a large batch
computer.
An example of airborne Lidar data from an
industrial plant is shown in Figure 3. The Lidar returns
are from the north to south transverse over the Union
Electric Sioux powerplant. With a northeast wind, there
were little or no aerosols upwind of the plant.
LONG-PATH LASER MONITORING OF CO
During the 1974 and 1975 summer intensive field
experiments in St. Louis, a tuneable semiconductor
diode laser system mounted in a mobile van was used to
make long-path (0.3-1 km) integrated measurements of
CO. The system was developed by MIT Lincoln Labora-
tories with funding from EPA and NSF.
The basis for the operation of this system is the
absorption of the laser radiation by gas molecules. The
measured intensity of the laser beam at the detector can
be related to the integrated concentration of the target
gas over the path length. The essential components of
the laser system are shown in Figure 4. The diode laser is
mounted in a closed-cycle cryogenic cooler which is maintained between 10 and 20 K. The laser emission is collimated by an aluminum-coated parabolic mirror,
12 cm in diameter. The beam is transmitted down range
to a remote retroreflector which reflects it back to the
parabolic mirror and then onto an infrared detector
situated behind a calibration cell. A sophisticated
spectroscopic technique was devised to minimize the
effects of atmospheric turbulence on system sensitivity.
Detector output is recorded on a strip chart for sub-
sequent digitizing on a Hewlett Packard 9864A and
storage on cassette tapes. Analysis and plotting are done on Hewlett-Packard equipment at Lincoln Laboratories.
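The intensity-to-concentration relation mentioned above is, in its simplest form, the Beer-Lambert law. The sketch below inverts it for the column density; the cross section and intensities are illustrative, and this stands in for, rather than reproduces, the Lincoln Laboratories data reduction.

```python
# Beer-Lambert sketch: I = I0 * exp(-sigma * N * L), solved for the column
# N*L (molecules per unit area) along the round-trip path. Values invented.
import math

def column_density(i_received, i_transmitted, sigma_cm2):
    return -math.log(i_received / i_transmitted) / sigma_cm2

# 80 percent transmission with sigma = 1e-19 cm^2 implies a CO column of
# about 2.2e18 molecules/cm^2 over the path.
print(column_density(0.80, 1.0, 1e-19))
```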
The laser source used is one of the Pb-salt types. It was tailored chemically to operate in the 4.7-µm wavelength region (infrared), in close coincidence with the fundamental vibrational band of CO centered at 2,145 cm⁻¹. Exact frequency matching and tuning through absorption lines is achieved by varying the injection current, which changes the junction temperature and thus the laser wavelength.
The St. Louis experiments were the first field
measurements of this newly developed technology.
Besides demonstrating this technology, the RAPS long-
path CO laser experiment is being used to study pol-
lutant variability around selected Regional Air Monitor-
ing (RAMS) sites. The laser data are also being compared
directly with RAMS data. An example of correlation
between RAMS site 108 data and the laser monitor is
shown in Figure 5. The two large increases in CO at
about 7:30 and 8:30 a.m. represent plume crossings
from a slag-processing plant in Granite City. Wind direc-
tion strongly affects the correlation between the two
experiments and must be considered when comparing
data.
The monitoring capability of the diode laser is being expanded to include NO, O3, NH3, and perhaps weakly absorbing SO2.
139
-------
REGIONAL AIR MONITORING SYSTEM (RAMS)
ACKNOWLEDGMENTS
RAMS is the ground-based air pollution, meteor-
ological, and solar radiation measurement network of
RAPS. It consists of 25 stations situated in and about St.
Louis. Figure 6 shows the placement of the 25 stations
and the telephone trunk lines which connect them to the
central facility at Creve Coeur (CCF in the figure). RAMS is a sister system to CHAMP, described in a paper by Marvin Hertz at this workshop.
Although RAMS is not a remote sensing project in
a strict sense, the design philosophy allowed for
untended operation of the remote sites except for
routine maintenance. Features incorporated into the
remote site to implement this philosophy include:
Automatic power fail and automatic restart
Backup storage for up to 3 days on magnetic
tape
Software digital commands used to remotely
control the calibration of the gas analyzers
77 status sense bits which monitor system
performance and associated support character-
istics.
Operation of these features has proven successful in
reducing the frequency of required maintenance and the
manpower requirements needed to operate the stations.
Maintenance of telecommunications between the
central computer facility and the remote sites is required
for automatic operation of the system. The actual
communication is through Novation modems running
with Bell 202 type compatibility. Communication rates
are 1200 baud in both directions using ASCII character
formats. In addition to the parity function provided by
ASCII, each transmission by the remote or central sites
includes a check sum for greater redundancy in error
detection. Bit error rates are on the order of 2 in 10^7 except during periods of electrical storms. Experience with the RAMS system indicated that between 5 and 10 percent of the potential data is lost because of telecommunication problems.
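As a rough illustration of the check sum layered over the ASCII parity, the sketch below appends a modulo-256 byte sum to each frame and verifies it on receipt; the frame contents and layout are invented, since the actual RAMS message format is not given here.

```python
# Minimal check-sum sketch; the real RAMS frame format is not specified here.

def checksum(message: bytes) -> int:
    return sum(message) % 256          # modulo-256 sum of the message bytes

frame = b"SITE 108 CO 0.42"
sent = frame + bytes([checksum(frame)])

received = sent                        # imagine this crossed the 1200-baud link
body, check = received[:-1], received[-1]
print(checksum(body) == check)         # True: accept frame; False: retransmit
```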
EPA investigators and contacts for the projects are:
Acoustic sounder:
Frank A. Schiermeier
Regional Air Pollution Study
11640 Administration Drive
Creve Coeur, Missouri 63141
Lidar:
James L. McElroy or J.A. Eckert
Environmental Monitoring
& Support Laboratory
Environmental Protection Agency
Las Vegas, Nevada 89114
CO monitor:
William A. McClenny
Environmental Research Science Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
RAMS
James A. Reagan
Regional Air Pollution Study
11640 Administration Drive
Creve Coeur, Missouri 63141
REFERENCES
Acoustic Sounder
1 Mandics, P.A., Hall, F.F., and Owens, E.J., Observations of the Tropical Marine Atmosphere Using an Acoustic Echo Sounder During GATE, AMS, 16th Radar Meteorology Conf., 1975.
2 Clark, G.H. and Bendun, E.O.K., Meteorological Research Studies at Jervis Bay, Australia, Australian Atomic Energy Commission Report AAEC/E309; ISBN 064299B423; July 1974.
Lidar
3 Eckert, J.A., McElroy, J.L., Bundy, D.H., Guagliardo, J.L., and Melfi, S.H., Downlooking Airborne Lidar Studies, August 1974 (to be published as an EPA report).
4 Johnson, W.B., Jr. and Uthe, E.E., Lidar Study of Stack Plumes, SRI, June 1969.
140
-------
CO Monitor
5 McClenny, W.A. Ambient Air Monitoring Using
Long-Path Techniques, Paper 28-6, International
Conference on Environmental Sensing and Assess-
ment, September 1975 (to be published).
6 Hinkley, E.P. Long-Path Ambient Air Monitoring
with Tuneable Lasers in St. Louis, Lincoln
Laboratories, MIT, January 1975.
RAMS
7 Myers, R.L. and Reagan, J.A. Regional Air Moni-
toring System at St. Louis, Missouri, International
Conference on Environmental Sensing and Assess-
ment, September 1975 (to be published).
141
-------
[Facsimile-record pattern chart spanning several days; hour labels (2300 through about 1600) largely illegible in reproduction. Note: as marked, not all the categorizations are correct.]
Figure 1
An Example of a Continuous Half-Hourly Pattern
Categorization of Monostatic Acoustic Sounder
Facsimile Records Taken over Several Days
-------
[Diagram labels: collimating lens; narrow-band filter; receiver signal; prism; field stop (adjustable); primary mirrors; optical attenuator; diverging lens; fiber optic light; rotating prism Q-switch; flash lamps (2); laser; reference path to collimating lens; calibrated optical attenuator, 0-45 dB; trigger pulse to oscilloscope; silicon photodiode; partially reflecting mirror (Fabry-Perot etalon).]
Figure 2
Optical System for Ground-Based LIDAR
-------
[Lidar return signal versus ground position (km); markers: Mississippi River, stack, Missouri River; inset: map of St. Louis showing Lidar traverses on August 19-20, 1974.]
Figure 3
LIDAR Return Signals from North to South Traverse
Over Power Plant North of St. Louis, Mo.
144
-------
[Diagram blocks: chopper; detector; calibration cell; closed-cycle cooler; laser; M-2; retroreflector.]
Figure 4
Optical System for Laser Monitoring
-------
[Time-series plots from 6:00 to 10:00 a.m. of CO concentration (scale 0.0 to 2.0) from the laser monitor and the RAMS station.]
Figure 5
Correlation of LASER and RAMS
CO Measurements
146
-------
Figure 6
25 RAMS Stations With Trunk
Lines to the Central Facility at
Creve Coeur (CCF)
147
-------
AUTOMATIC DATA PROCESSING
REQUIREMENTS IN REMOTE MONITORING
By J. Koutsandreas
INTRODUCTION
The application of remote monitoring technology
to environmental monitoring has taken a quantum jump
since the advent of the space age; we have seen the
simultaneous development of advanced sensor systems
and platforms to carry them. This remote monitoring
technology encompasses imagery and other forms of
data acquired by a wide assortment of sensors aboard
aircraft or orbiting spacecraft, or automated data
collection systems acquired through telemetry links.
Information processing techniques range from
conventional visual interpretation to sophisticated
computer interactive (man-machine) systems. These
techniques have provided a means of reducing
information to useful formats, including base-map
overlays, electronically displayed color-coded thematic
maps or, for some applications, computer-generated
maps and tabular data.
In this paper, some of the useful techniques of
remote environmental monitoring are discussed, with an
emphasis on automatic data processing. Included is a
brief description of the sensor systems which gather the
data, and a discussion of some of the data requirements
and a few applications.
SENSORS
When carried aboard aircraft or spacecraft, remote
sensors offer a synoptic overview which is not achieved
by ground survey methods. Observations of the total
scene are recorded as an image, and present a visual set
of data patterns, not merely the group of data points
which would have been collected by ground methods.
The remote sensor sampling technique is an unobtrusive
way of gathering data. The mere presence of a ground
survey team, for example, investigating potential sites
for development may result in the spread of unfounded
rumors and may cause unwarranted adverse reactions
which would hinder further site evaluation. The final
images of remote sensors have a very high information
density compared with graphic, textual, or electronic
storage media. Thus, the remote sensor presents a more
comprehensive picture of the area than conven-
tional field methods. Although the level of detail re-
corded by a sensor may not be as great for a small area
as ground observations would be, the sensor record
affords a valuable overview in a manageable form. The
cost/benefit ratio between overhead coverage and
ground traverses for a given area greatly favors the re-
mote sensing approach, except in cases where investiga-
tion of only a very small area is required. However, the
remote sensors can also indicate where to concentrate
more detailed in situ sensing and sampling.
The remote monitoring systems that provide and
will continue to provide this Agency with environmental
data are listed below, with their respective data storage
medium:
Sensor Type Data Storage
CAMERAS Film
MULTISPECTRAL SCANNERS Tape/Film
SPECTROMETER/RADIOMETER Tape/Film
LASER/LIDAR Tape
AUTOMATED IN SITU SENSORS Tape
Photographic systems are still the most important
of all the remote sensors. These systems include metric
cameras and panoramic cameras, which are being used
for conventional monitoring by DOD, NASA, EPA, and
other Federal agencies. Routine requirements for the
processing of camera films are not included in this paper.
Multispectral Scanner
A multispectral scanner (MSS), Figure 1, is a device
which provides data in a multiband mode similar to that
obtained from multiband camera systems. The major
operational difference is that a multispectral scanner is
an electro-optical instrument. The scanners are con-
figured with single or arrayed detectors which sense the
incoming image from a collector optic (scanning mirror).
Scanners look at a single "spot" of the area at any given
instant in time. The spot is scanned laterally to produce
a line of imagery, also shown in Figure 1. The forward
motion of the aircraft or spacecraft collects successive
148
-------
lines to produce a swath or the scene. These lines are
translated into imagery. The incoming optical signal
image is converted to a modulated electrical signal which
can be either recorded on tape for later reproduction or
used to vary a point source of illumination to photo-
graphically record the image on film.
In the case of multiple-detector arrays, each
detector or group of detectors is designed to provide an
optimum response over a discrete portion of the electro-
magnetic spectrum. In this way, a single overflight with a
multispectral scanner can provide a number of
simultaneous "filtered" recordings, from which a
number of different spectral terrain characteristics may
be detected.
The scanner technique is always applied to thermal
infrared (IR) sensing since no film emulsion is capable of
direct thermal IR recording. Thermal surveys, used to
record either relative heat contrasts of surface objects or
absolute thermal values (with ground control calibration
inputs), have been extensively employed for application
in studies such as those of thermal pollution and near
surface geologic structures as seen in Figure 2. Attempts
are being made to devise workable techniques for
monitoring oil spills with thermal IR scanners. Multi-
spectral scanners can detect 256 levels of gray, as
compared with camera systems which can record only
30 shades of gray. This is a great advantage when looking
for very subtle changes in water or land; imagery is
presented with a greater spectral response than what is
obtainable from camera films.
Spectrometer/Radiometer
The spectrometer and radiometer are passive devices which can measure spectral radiances over wavelengths of 0.3-14 µm and record the radiance levels. The outputs are line plots on a coordinate system. The spectral resolution of the spectrometer is much narrower than that of the radiometer, and the spectrometer can detect individual pollutants such as SO2 and O3 by looking at the absorption spectra of the pollutants at specific wavelengths.
LASER/LIDAR
The LASER converts input pump power into coherent optical output power. The output is coherent radiation which can reveal a variety of environmental factors because of the following LASER characteristics:
Narrow frequency
Highly directional
High intensity
Constant phase.
A LIDAR profilometer has been developed and is
shown in Figure 3. The terrain profile of strip mine areas
reveals quantitative information such as elevation, slope,
tailings, and revegetation characteristics. Another
LIDAR application was demonstrated in the St. Louis
RAPS program. An isoscattering contour plot over St.
Louis, Missouri, made by flying a LIDAR on a heli-
copter, is shown in Figure 4. Notice the high density of
particulates east and west of the river. Contour plots of
this type, made within an hour, are not cost effective
using in situ methods.
Automated In Situ Sensor Systems
These systems depend on electronic relays to
transmit information to a central location for storage
and analysis. This system will provide the capability of
collecting vast streams of data automatically from in situ
sensors (e.g., pH, D.O., heavy metals, etc.) located at
remote or inaccessible locations on the surface. The
value of such a system is that it facilitates the recovery
of continuous data in regions which have, until now,
required extensive field surveys to acquire even a few
data points. Each sensor/transmitter is designed to
continuously sample and record data and, by receipt of a
coded signal or on a prescribed time schedule, transmit
these values to a satellite, an aircraft, or a ground station
within the line-of-sight of the in situ sensor mounted on
a platform in water or on land.
The data collection system promises to record the
continuous data required for environmental baseline
studies and will make possible early warning of such
incidents as floods, earthquakes, forest fires, oil spills,
and offshore dumping violations.
DATA PROCESSING AND PREPARATION
In order to extract all vital information, remotely
monitored data require a complete processing capa-
bility. This usually includes computer operations,
systems analysis, and applications programing for
problem definition and mathematical analysis. The data
reduction system consists of computers with a complete
set of standard peripheral devices, special devices for
image digitizing, and display. Multispectral processing of
the data requires an optimum hardware/software con-
figuration, accurate algorithms, and data processing tech-
niques. Data preparation includes photographic and
processed electronic data collection, preparation, and
documentation.
149
-------
REMIDS
In the Environmental Monitoring and Support
Laboratory in Las Vegas, Nevada, the Remote Micro
Imagery Data System (REMIDS) will provide an
efficient method for storage and retrieval of interpreted
remote sensing imagery in high resolution microform.
This system, shown in Figure 5, is oriented toward
supporting various existing and anticipated EPA regula-
tory permit programs which require periodic field
inspections. A central index of the stored data is main-
tained which can be accessed via remote terminal
devices.
The proposed system is currently composed of
three programs which have been designed to allow
maximum flexibility of output. A fourth program will
be added once the system has been finalized. This fourth
program will be an edit update package and will be used
for file maintenance purposes.
The following is a brief description of the three
existing programs:
1. Aperture Card: This program prints the
aperture card and creates a master file.
2. Selection Program: This program allows the
user to selectively query the master file and to determine
what information is available, primarily in a specific
geographic area. The user can select in any of the
following fields:
State name
County name
City name
Facility name
Receiving waters
Standard Industrial Codes (SIC): this selection can be on the first, second, third, or all four digits.
Currently, the selection program is set up to allow up to 20 different names in each field. The number is arbitrary and can be expanded easily. The selection criteria combine conjunctively: for a record to be selected, it must meet all of them (sketches of both selection rules follow this list). For example,
to select all the facilities in Pennsylvania, the user would
use one selection specifying Pennsylvania. To get all
records in Washington County, Pennsylvania, the user
would specify Pennsylvania/Washington. If Pennsylvania
were left off, the user would get the records for
Washington County in any State that has a county of
that name. The selected records can be printed in the
standard sequence, which is Facility Name within State
Name, or in any sequence desired by the user (County,
SIC, Major Industry Code, Receiving Water, or Discharge
Number).
3. Polygon Selection: This program selects all
records which fall within a polygon specified by up to
20 latitude/longitude points. The report options for
sequence and format are the same as the selection
program.
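Two minimal sketches of the selection rules follow, as noted above. The first illustrates the conjunctive rule of the Selection Program with the Pennsylvania/Washington example; the record layout and field names are illustrative, not the REMIDS master-file format.

```python
# Conjunctive (logical AND) selection: a record survives only if it
# matches every supplied criterion. Records here are invented.

def select(records, **criteria):
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

records = [
    {"state": "PENNSYLVANIA", "county": "WASHINGTON", "facility": "PLANT A"},
    {"state": "OHIO",         "county": "WASHINGTON", "facility": "PLANT B"},
]
print(select(records, state="PENNSYLVANIA", county="WASHINGTON"))
# Only PLANT A; omitting state= would also return the Ohio record.
```

The second sketches the geometric test the Polygon Selection program needs: a standard even-odd (ray-casting) point-in-polygon check over the latitude/longitude vertices. It is offered as an assumption about how such a selection could work, not a description of the actual program.

```python
# Even-odd rule: count how many polygon edges a ray from the point crosses.

def inside(point, polygon):
    lat, lon = point
    hit = False
    n = len(polygon)
    for k in range(n):
        lat1, lon1 = polygon[k]
        lat2, lon2 = polygon[(k + 1) % n]
        if (lon1 > lon) != (lon2 > lon):           # edge spans the point's lon
            if lat < lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1):
                hit = not hit
    return hit

square = [(30.0, -89.0), (30.0, -88.0), (31.0, -88.0), (31.0, -89.0)]
print(inside((30.5, -88.5), square))   # True
print(inside((32.0, -88.5), square))   # False
```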
When the proposed system is finalized, cookbook
instructions will be provided to the EPA Regions.
COMPUTER IMAGE PROCESSING
Modern technology utilizes all types of pictures, or
images, as sources of information for interpretation and
analysis. These may be portions of the earth's surface
viewed from an aircraft or an orbiting satellite. The
proliferation of these pictorial data has created the need
for a vision-based automation that can rapidly,
accurately, and cost effectively extract the useful
information contained in images. These requirements are
being met through the new technology of image pro-
cessing. A typical system is shown in Figure 6.
Image processing combines computer applications
with modern image scanning techniques to perform
various forms of image enhancement, distortion correc-
tion, pattern recognition, and object measurement. This
technology overcomes many of the inherent difficulties
associated with the human analysis of images or objects;
however, it is based upon the same fundamental
principles as visual recognition in human beings.
Although the actual visual process is physiologically
complex, the basic mechanism of vision uses the eyes
and brain as an automatic information interpreting
system. The eyes receive stimuli in the form of visual
light, and the brain processes and interprets this input
for the observer of the image. The human visual system
can be simulated using an electronic scanner for its eyes,
similar to a television camera, and a high-speed digital
computer for its brain. This type of system can "see"
images through the scanner and, by means of the pro-
gramed capabilities of the computer, can manipulate the
images in various ways that contribute to extracting
150
-------
desired information which is usually not apparent to the
untrained observer.
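As a concrete instance of such a manipulation, the sketch below performs one of the simplest enhancements, a linear contrast stretch that remaps a band's observed digital counts onto the full 0-255 display range; the routine is generic and the values are illustrative, not drawn from any particular EPA system.

```python
# Linear contrast stretch: map the darkest observed count to 0 and the
# brightest to 255, expanding subtle differences for display.

def stretch(band):
    lo = min(min(row) for row in band)
    hi = max(max(row) for row in band)
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in band]

print(stretch([[30, 40], [50, 90]]))   # [[0, 42], [85, 255]]
```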
Through various aircraft and satellite programs, a
profusion of remotely sensed images are constantly
being acquired for use in the monitoring of pollution.
Image processing technology is providing the ability to
rapidly and cost effectively extract the abundance of
useful information embodied in these remotely sensed
data.
In conclusion, I have explained the necessity of
ADP in remote sensor data processing. It is only through
the use of computer technology that EPA scientists will
be able to fully exploit the outputs of remote sensors
and realize their potential in the area of environmental
monitoring.
[Diagram labels: tape recorder; scanning mirror; motor; ground resolution patch.]
Figure 1
151
-------
[Figure panels: isothermal contours (°F); thermal infrared imagery; color aerial photographs. New England Power Service Co., Brayton Point Power Station, Somerset, Massachusetts; scale approximately 1000-2000 feet; NERC-LV Project 7502, flown September 1, 1974, NERC Las Vegas.]
Figure 2
152
-------
Figure 3
[Isoscattering contour plot; axis: aircraft ground position (km), marked west of river, Mississippi River, east of river.]
Figure 4
-------
REMIDS SYSTEM FLOW
REMOTE MICRO IMAGERY DATA SYSTEM
Figure 5
-------
[Diagram blocks: disc storage; I2S Model 70 user console with display processor; HP 21MX computer; ERTS magnetic tape; refresh memory (two 512 x 512 x 8-bit images, one 512 x 512 x 2-bit graphics plane); processing array; paper tape reader; CRT terminal; color display. Model 500 Digital Image Processing System.]
Figure 6
-------
DEVELOPMENTS IN REMOTE SENSING PROJECTS
By Sidney L. Whitley
NASA's Johnson Space Center established the
Earth Resources Laboratory (ERL) at the National
Space Technology Laboratories (formerly Mississippi
Test Facility) in late 1970 for conducting research inves-
tigations to develop applications of remote sensing. It
was ERL's intention to use the large quantity of existing
data acquired by aircraft and spacecraft as well as data
to be collected in the future. ERL's mission statement is
shown in Figure 1.*
ERL chose not to develop or refine sensors as other
organizations in NASA were chartered for that purpose.
The sensor technology was further advanced than user
application technology and remains so today. It was not
ERL's intention to develop data handling systems either;
however, a data analysis capability was needed to
develop applications of remotely sensed data. In 1971,
ERL awarded a contract for the design and manufacture
of a data analysis system. This system, known as the
ERL-DAS, has been used to develop the applications
shown in this paper. The ERL-DAS has also served as a
test bed for the development of new low-cost data
analysis systems which may be afforded by a larger num-
ber of potential remote sensor data users.
As an outgrowth of past contacts between individu-
als in EPA and NASA, a working agreement was recently
established. This agreement was finalized in Memoran-
dum of Understanding (MOU) D5-E771. The project
resulting from this MOU is entitled, "Western Energy-
Related Overhead Monitoring Project." Its application is
directed toward monitoring the reclamation of strip
mines in the western United States. NASA has agreed to the
following:
To collect certain data with an airborne 11-band multispectral scanner (MSS) and a laser profiler
To process selected NASA collected data
To procure an 11-band MSS, a laser profiler, and a low-cost data analysis system for EPA
To train EPA personnel to use the equipment,
associated software, and procedures.
Work began under this MOU in June 1975, and is pro-
gressing well. Figure 2 is a list of equipment and supplies
NASA agreed to use in establishing a data acquisition
and processing capability for EPA. It should be noted
that the list includes the two sensors specified above, an
image display system, a small computer, and several out-
put recording devices. All work performed under this
MOU is funded by energy pass-through funds.
In the course of ERL's research activities, several
low-cost data analysis system (LCDAS) configurations
have been defined. One of these configurations, low-cost
data system configuration 4, closely matches the system
specified in the EPA/NASA memorandum of under-
standing and is shown in Figure 3. EPA's LCDAS will be
capable of reading data from EPA's airborne 11-channel
MSS and laser profiler. The aircraft data will be recorded
in Bi-Phase Level, Pulse Code Modulated (PCM) format.
A PCM front-end will be added to the LCDAS shown in
Figure 3 to allow the computer to read this highly
specialized data format.
The EPA/LCDAS will be very similar to ERL's in-
house data analysis system. A large number of data
processing and applications programs will be provided to
EPA under this MOU. EPA can adopt future ERL
produced applications programs with little or no man-
power expenditure because of the compatibility of our
data analysis systems.
The ERL has developed and documented a large
number of applications of remotely sensed data in our
own research and in cooperation with other user
agencies. Descriptions of some of these applications
follow.
The ERL has developed a technique called the
Water Search Program for detecting water vs. not-water.
The results may be color or grey-shade coded on either a
color film output or on an electrostatic printer/plotter.
Figure 4 is an example of the water search output
including a breakout of water and land area in acres. One
application of this technique is shown in Figure 5. The
technique is useful in studying the loss or buildup of
shoreline, provided the data are carefully selected on
different dates and at appropriate tide levels.
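The Water Search logic itself is not documented in this paper; a minimal stand-in, assuming water can be separated by a simple darkness threshold in a near-infrared band and that each pixel covers a fixed ground area (both assumptions here), might look like this:

```python
# Hypothetical water-vs-not-water search; threshold and pixel area invented.

NIR_THRESHOLD = 20    # digital count below which a pixel is called water
PIXEL_ACRES = 1.1     # assumed ground area represented by one pixel

def water_search(nir_band):
    water = sum(v < NIR_THRESHOLD for row in nir_band for v in row)
    total = sum(len(row) for row in nir_band)
    return water * PIXEL_ACRES, (total - water) * PIXEL_ACRES

scene = [[5, 7, 40], [6, 55, 62]]     # toy 2x3 patch of NIR digital counts
print(water_search(scene))            # about 3.3 acres water, 3.3 acres land
```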
*Figures presented at the ADP Workshop were color photographs. This publication is limited to black and white reproduction of these figures.
156
-------
Much of our early spectral pattern recognition clas-
sification work was done in agricultural regions because
ground truth is easily obtained and because fields are
usually homogeneous. The technique is equally effective
in studying marsh areas, such as the region shown in
Figure 6. This particular marsh is known to be a salt
marsh mosquito breeding area. Through the use of
knowledge about the types of terrain on which mosqui-
toes breed, the types of vegetation that can exist on such
terrain, and a vegetation classification map produced by
spectral pattern recognition, one can infer a map which
indicates potential for mosquito breeding. Figure 7
shows a salt marsh mosquito breeding map where red
represents positive conditions, green represents negative
conditions, blue represents water, and white represents
other types of material, including roads, houses, and so
forth.
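A minimal sketch of per-pixel spectral pattern recognition follows, using a nearest-class-mean rule; the class names and band statistics are invented stand-ins for the training-field statistics derived from ground truth, and the classifier ERL actually used may well differ.

```python
# Assign each pixel (one digital count per spectral band) to the class
# whose mean signature is nearest in squared distance. Means are invented.

CLASS_MEANS = {
    "water": (12.0, 8.0, 5.0, 3.0),
    "marsh": (30.0, 35.0, 40.0, 55.0),
    "trees": (25.0, 28.0, 30.0, 70.0),
}

def classify(pixel):
    def dist2(mean):
        return sum((p - m) ** 2 for p, m in zip(pixel, mean))
    return min(CLASS_MEANS, key=lambda name: dist2(CLASS_MEANS[name]))

print(classify((11.0, 9.0, 6.0, 2.0)))     # water
print(classify((27.0, 30.0, 33.0, 68.0)))  # trees
```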
Jointly, ERL and the National Marine Fisheries
Service, both of NSTL, conducted a study to determine
if menhaden fish catches could be related to water color.
Through these studies, it was determined that menhaden
fish were caught principally in waters of a certain color.
A model was developed, LANDSAT data were input to
the model, and a map of high, medium, and low poten-
tial for menhaden was produced. Figure 8 is an example
of such a map.
A few months ago, ERL and the U.S. Army Corps
of Engineers entered a study to determine if a certain
Corps of Engineers-produced atlas could be updated
with remote sensor data. It was determined that certain
maps could be produced from LANDSAT MSS data.
Figure 9 is a simulated color infrared photo map of the
test area produced from MSS data. The map is composed
of 27 LANDSAT frames collected in three seasons. The
data have been translated from LANDSAT scene coordi-
nates to the Universal Transverse Mercator (UTM) pro-
jection. The map was produced to a scale of 1:250,000,
and the original product is accurate to about 300 meters,
root mean square. The area shown in Figure 9 was also
processed through ERL's spectral pattern recognition
programs, and a surface classification of 24 material
classes was produced. These 24 classes were aggregated
to seven classes (i.e., individual crops were changed to
agriculture, tree species were changed to forest, etc.),
and a color-coded map was produced at 1:250,000 scale
referenced to the UTM projection. Figure 10 is a photo-
graph of the classified map produced by this technique.
The map shown in Figure 11 was produced by the
same procedure as described above, but a greater number
of categories were delineated in the map. It should be
observed that the classification map has been super-
imposed over a quad map, and that the fit, which is
particularly evident in the lower right corner of the
figure, is quite good.
Certain regulatory agencies are quite interested in
the extent of salt water intrusion in marshes. ERL
botanists have used both aircraft and space acquired
MSS imagery to survey salt marshes, brackish marshes,
and fresh marshes. Figure 12 is an example of this capa-
bility, and another example of how well the remote
sensor map can be made to fit the more conventional
quad map.
During the past 3 to 4 years, ERL has greatly
simplified and quickened its computer programs for
processing remotely sensed multispectral scanner data,
and has adapted these programs to run on small, widely
available computers. During the past 2 years, inexpensive
and highly capable image display systems have been
designed and are now available commercially. Many
users have a need for color-coded outputs of very high
precision. The production of high quality color products
has remained an expensive item.
ERL has conducted research to develop inexpensive techniques for color recording. Although the work is still in progress, there are some preliminary results. One of the
output devices ERL originally considered for production
of grey shade maps is an electrostatic printer/plotter.
This printer/plotter produces all of the standard line
printer characters plus 16 shades of grey. It has been
determined that the grey shade plots can be converted to
color maps as described below.
The computer can be instructed to divide the digital imagery data into Red, Green, and Blue (RGB) components (or into separate land use material classes), and separate grey shade maps can be printed out for each
component. These grey shade component maps are con-
verted to film negatives and registered (the plotter is
geometrically repeatable). The film positives can be con-
verted to a color map using a $19.95 graphics kit plus an
inexpensive black and white contact printer. The
graphics kit, called Kwik Proof, was developed for the
lithographic industry, and is used in proofing materials
before an expensive lithographic run is made. The break-
through needed was to format the scaled, digital image
onto paper, and subsequently onto film so that the
graphics kit could be used. Figure 13 is a color-coded
soils map of Washington County, Mississippi, which was
produced by this technique. There are nine soil types
157
-------
shown as different color levels in this scene. Only a very
small portion of graphics materials was required to
produce this product. The time required was about 2
hours, most of which was drying time. Figure 14 is a
color-coded land-use map containing approximately 15
colors. This map shows that a large number of colors can
be produced from only red, blue, and yellow (in this
case) components. It should be observed that the com-
puter superimposed coordinate lines registered quite
well.
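The component-separation step can be pictured as follows: each class in the map is assigned red, green, and blue contributions, and one grey-shade plot is produced per component for later registration and overlay. The class-to-color weights below are illustrative, and the sketch is not ERL's production code.

```python
# Split a classified map into three grey-shade component maps (one per
# color plate) for the electrostatic printer/plotter. Weights invented.

COMPONENTS = {                 # class -> (red, green, blue) grey level, 0-15
    "agriculture": (15, 4, 0),
    "forest":      (2, 15, 2),
    "water":       (0, 4, 15),
}

def separations(class_map):
    return [[[COMPONENTS[name][c] for name in row] for row in class_map]
            for c in range(3)]  # c = 0 red, 1 green, 2 blue

red, green, blue = separations([["water", "forest"], ["forest", "agriculture"]])
print(red)    # [[0, 2], [2, 15]]
```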
Another similar color output technique is under investigation by ERL. The required equipment is only slightly more expensive, but the product quality will be excellent and the processing time is short.
All of the techniques and applications will be avail-
able to EPA through the EPA/NASA agreement.
158
-------
EARTH RESOURCES LABORATORY AT NSTL
MISSION
• CONDUCT RESEARCH INVESTIGATIONS IN THE MISSISSIPPI-LOUISIANA-GULF AREA IN THE APPLICATION OF REMOTE SENSING.
• STRESS INTERESTS AND NEEDS OF AGENCIES IN THE AREA.
• UTILIZE EXISTING AIRCRAFT AND SATELLITE PROGRAMS AS A
SOURCE OF DATA.
• COLLECT AND ANALYZE SURFACE DATA FOR CORRELATION
WITH FLIGHT DATA.
• CONDUCT STUDIES OF USER REQUIREMENTS OF POTENTIAL
APPLICATIONS IN ORDER TO GUIDE RESEARCH EFFORTS.
Figure 1
-------
COST BREAKDOWN OF DATA ACQUISITION & PROCESSING HARDWARE
Image Display System                                        40K
FR2000 Tape Deck (Direct Read, All Speeds)                  30K
FM Playback for Analog Recorded Data                        30K
PCM Front End for Reading RS-18 Data                        50K
Computer System                                            180K
    V-74 Comp. w/32K MOS Memory              $38,400
    32K MOS Memory (Additional)               20,000
    Disc, 46.7 M Words                        30,300
    Line Printer, 14" Wide                    10,200
    Statos 33 Printer/Plotter, 22" Wide       12,500
    2 - 120 IPS Tape Drives, 9-Track          16,000
    Tape Controller                            6,000
    Card Reader                                4,000
    Paper Tape Reader                          2,300
    Floating Pt. Processor                     5,000
    Expansion Chassis                          1,000
    I/O Expander                                 600
    3 - Buffer Interface Controllers           1,500
    2 - Priority Interrupt Modules             1,000
    1 - Block Transfer Controller              1,500
    Cal Comp Plotter                          30,000
Film Recorder, B & W Strip, 5"                              20K
Film Recorder, Color, Strip, Stand-Alone, 9.5"             120K
Auxiliary Air Conditioner, 15 Tons for Computer Room        30K
Supplies (Tapes, Film, Paper, etc.)                         25K
Support (Photo Lab, Printing, etc.)                         20K
                                                          $550K
11-Channel Airborne Multispectral Scanner
Airborne Terrain Profiler
PCM Encoding System
Set of Ground Support Equipment, Cal. & Checkout
Figure 2
160
-------
LOW-COST DATA SYSTEM
CONFIGURATION 4
[Diagram blocks, grouped with approximate costs (some dollar figures illegible in reproduction): image display with operator's console, $40K; minicomputer with electronics rack, computer operator's console, card reader, disc, line printer, card punch, and tape drives; output recording, including a color film recorder and an electrostatic printer/plotter.]
Figure 3
-------
Shoreline 2345 Mi.
Water 799 Sq. Mi.
Land 1172 Sq. Mi.
Figure 4
162
-------
LAND/WATER INTERFACE ANALYSIS
LANDSAT MSS DATA, PROCESSED BY WATER SEARCH AND SHORELINE ANALYSIS PROGRAMS, INDICATES A 17 KM² LAND AREA DECREASE AND 58 KM LOSS OF SHORELINE AT MOUTH OF MISSISSIPPI RIVER BETWEEN JANUARY & DECEMBER OF 1973
Figure 5
-------
MARSH ECOLOGICAL STUDIES
COMPUTER DERIVED ECOTYPES
ERL MTF
CLASSIFICATION:
SPARTINA PATENS & JUNCUS ROEMERIANUS
TREES & SHRUBS
OPEN WATER
SAWGRASS (CLADIUM JAMAICENSE)
WHITE WATER-LILY (NYMPHAEA ODORATA)
CATTAILS (TYPHA SP.)
ELEOCHARIS QUADRANGULATA
UNIDENTIFIED GRAMINEAE
Figure 6
-------
MARSH ECOLOGICAL STUDIES
FRITCHIE MARSH, WHITE KITCHEN, LA.
SALT MARSH MOSQUITO (AEDES SOLLICITANS WLK.)
BREEDING MAP (INFERRED FROM VEGETATIONAL
CLASSIFICATION MAP)
[Classification code legend illegible in reproduction.]
Figure 7
-------
[Map extent 89°00'W to 88°30'W, 30°00'N to 30°30'N. Legend: high potential area; moderate potential area; low potential area.]
Figure 8
-------
SIMULATED COLOR INFRARED PHOTOMAP
USING LANDSAT-1 DIGITAL DATA ACQUIRED 1973-1974
SCALE 1:250,000
Figure 9
-------
COMPUTER DERIVED LAND USE CLASSIFICATION
USING LANDSAT-1 DIGITAL DATA ACQUIRED 1973-1974
SCALE 1:250,000
Figure 10
-------
COMPUTER DERIVED LAND USE CLASSIFICATION
USING LANDSAT-1 DIGITAL DATA, ACQUIRED 1973-1974
[Credit lines illegible in reproduction.]
Figure 11
-------
[Title partially illegible.]
COMPUTER DERIVED LAND USE CLASSIFICATION
USING LANDSAT-1 DIGITAL DATA, ACQUIRED 1973-1974
Figure 12
-------
9-LEVEL DIGITIZED SOILS MAP PRODUCED FROM 3 GRAYSCALE
PLOTS FROM ELECTROSTATIC PRINTER/PLOTTER
PROCESS REQUIRES $19.95 GRAPHICS KIT
PLUS SMALL B & W CONTACT PRINTER ($600 TO $2,000)
Figure 13
171
-------
Figure 14
-------
SUMMARY OF DISCUSSION PERIOD - PANEL V
This discussion period, on the Developments in Remote Sensing Projects, included the following remarks.
Definition of Remote Monitoring
Remote monitoring is defined by EPA as including not only remote sensing which uses instruments such as lasers and
multispectral scanners, but also automated in situ contact monitoring platforms in which the information is telemetered back
to some central location.
CHAMP System
The difficulties experienced in the CHAMP system were discussed.
The panel felt that the original specifications were not too loose and the data system had no major weaknesses.
Documentation of system programs was completed slowly, however. It was agreed that the main weakness can be traced to
the instrumentation. Although most instruments met initial specifications, they were not thoroughly field tested before being
employed. RAPS encountered similar problems. If the system were to be implemented again from the start, a stronger
emphasis should be placed on field maintenance. Problems with instrumentation were handled as brushfires. More flexibility
should have been introduced into the RAPS system initially to handle such instrument problems. The question of how to
handle data that can be accurately adjusted for known instrument error must be addressed. It should be decided whether to
flag the data as invalid, or to correct the data and document the changes with appropriate comments. Emphasis should be
placed on system requirements instead of instrument manufacturers' specifications.
A considerable portion of the CHAMP error checking is done manually. It was asked whether more could have been
done by machine. Eventually all will be done by machine. About 85 percent of the data is now completely
machine-validatable. Only quality spot checking should be required.
NASA-ERL System
The applications which are planned as part of the EPA low cost data analysis system were discussed. The software
developed by NASA-ERL was designed predominantly for monitoring the reclamation of strip mining activities in the western United States, but the software applications are available to EPA. As the system is transferred, training sessions will be coordinated. The system being developed under a work agreement with EPA can now be used in a hands-on environment at
the NASA-ERL facility.
Monitoring Techniques
The technique used for discrimination between smog, smoke, and fog was explained. Interpretation can be made by
coupling the presence of smog, smoke, or fog with concomitant happenings in the surroundings; i.e., stacks and meteorologic
conditions. Color analysis can also be used.
It was asked whether ultraviolet (UV) fluorescence is specific for petroleum. Natural organics can be picked up if they
fluoresce. Lasers detect not only the presence of oil but also its thickness. They also give some information as to the type of oil.
EPA will cope with offshore monitoring platforms and with monitoring the Continental Shelf as follows. The Office of
Monitoring and Technical Support must first determine what tools are needed for the offshore oil rigs. Within 10 years, there
will be about 1,000 large platforms off the coast of the United States. Industry will, of course, try to cut costs. It is hoped
that through proper prioritization of resources within ORD, sensors will be developed that are not only overhead monitoring
types, but data buoys which can be put in select locations to give profiles of exactly what is happening. Interagency
agreements will be needed to accomplish this task.
173
-------
Storing Remote Sensing Data
The best system for storing volumes of remote sensing data depends on what the data uses will be. Conventional data
bases are inadequate to handle the large data bases that result from remote sensing. One system, the REMID system, uses the
computer as an index into the data base.
Commercial Software Use
The contour package used in Las Vegas is a commercial software package that is in use. It was developed by a small
company and is very similar to the conventional plotting routines.
Pollution Monitoring
Remote sensing data does not include information for pollutants covered by environmental standards, but airborne
remote sensing is not a wasted luxury. Information is required by the Agency besides the pollutant concentration at specific
points. There are already examples of court cases in which remote sensing data were used as evidence. Eventually, remote
sensors will be developed for specific pollutants. Remote sensing will give information on where pollutants are concentrated
and where further monitoring should be performed by in situ monitoring. Remote sensing is also being used by Region V to
determine pollution sources and for nonpoint source pollution monitoring.
EPA Precedence in Remote Sensing
To get management acceptance and Agency utilization of remote sensing, the Agency must reprioritize. Instead of
waiting for other agencies to lead the way, EPA must take the lead in certain areas.
174
-------
AGENCY NEEDS AND FEDERAL POLICY
By Melvin L. Myers
This presentation will consider:
• Where EPA currently stands and where it is heading
• Trends within EPA
• Problems which EPA encounters
• Functions which EPA needs to perform and resulting data base requirements
• Federal constraints on the usage of ADP systems.
As the Administrator has remarked, EPA should be
examining accomplishments over the last 5 years and
structuring programs for the next 5 years.
Future trends in where EPA is going include:
• Defining EPA's goals and objectives for the next 5 years in addition to compliance with statutory deadlines
• Concentrating on preventing environmental deterioration as well as abating pollution
• Strengthening our Federal, State, and local partnership in environmental programs.
A possible framework for structuring these trends
within the Agency is illustrated in Figure 1: a strategic
approach to defining goals and a waste management
approach to deterioration prevention. The figure high-
lights implementation as the output of our environ-
mental programs.
The EPA has been encountering several problems, including:
• Difficulty in court cases
• Efficacy in meeting our environmental standards
• Shifting efforts to maintenance of clean air
• Institutional demonstration of waste management practices
• The question of our role in radiation and that of the Nuclear Regulatory Commission
• The implementation of areawide planning
• The degree of regulating pesticides
• The legal problems in proving adequate quality assurance.
The Agency is addressing conditions of its own management environment, including:
• Economic, energy, and environmental impacts of regulations
• Emerging toxic substances legislation
• Increased Regional/headquarters interaction in enforcement
• Firm technical backup developed in support of regulatory action
• Novel approaches such as our fuel economy program
• Requirements of the Freedom-of-Information Act.
The Agency recognizes the need for ADP policy
and has established a steering committee to develop an
Agency 5-year ADP plan. The committee is comprised of
the five Assistant Administrators.
Agency functions will be used as a basis for estab-
lishing the need for current or new data bases. Figure 2
shows how this may be done through an input-output
matrix and within the pattern of the functions as illus-
trated in Figure 1.
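A minimal sketch of such a matrix as a data structure follows (hypothetical Python; the function and file names are taken from Figure 2, but the input/output markings are invented for illustration):

    # Partial input-output matrix: which system files each Agency
    # function reads (I) or writes (O). Markings are illustrative.
    matrix = {
        "Implementation":           {"Permit Compliance System": "I/O"},
        "Consumption modification": {"Fuels Data Base": "I/O"},
        "Quality assurance":        {"Permit Compliance System": "I"},
    }
    for function, uses in matrix.items():
        for file_name, mark in uses.items():
            print(f"{function:26}  {file_name:26}  {mark}")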
175
-------
A list of Agency functions for which data bases are and will be required includes the following (see Figure 2):
• Administration (personnel, finance)
• Implementation (enforcement, citizen participation, monitoring)
• Strategic analysis (safe environment, clean environment)
• Resource recovery (reuse, reclamation, and recycling of energy and materials)
• Deterioration prevention (areawide planning)
• Consumption modification (fuel economy, waste paper source reduction)
• Production modification (alternative input, processes, and practices)
• Pollution abatement (ambient standards, source standards)
• Usage restrictions (pesticide classifications)
• Research (toxicology, pollutant characterization, research monitoring)
• Development (prototype technology, model validation)
• Demonstration (control technology, alternative technology)
• Quality assurance (pollution and effects measurement, standards achievement, monitoring planning).
These functions require many special considera-
tions in the development of data bases. We need good
documentation from which to respond to Freedom-of-
Information Act requests, and we need to question
whether large systems or small systems should be used.
We need a national communications network, a way to
secure trade secrets when registering products, a method
to allow for professional peer review of data, and a total
Agency approach to maintaining valid information on
the status of environmental quality and its relationship
to meeting our standards.
The Office of Management and Budget (OMB) must
oversee Federal use of ADP resources. The relationship
between OMB, the General Services Administration
(GSA), and the other Federal Agencies is usually
misunderstood. Actually, the significance, both domes-
tically and internationally, of the Federal influence on
the ADP market warrants oversight by OMB. Although
OMB would prefer to limit its role to program budget
decisionmaking, it must, in the case of ADP, also act as a
guardian against proliferation and misuse of a means to
an end.
The OMB function is not a regulatory one, but
because of the implications of Federal ADP policy, it
maintains a veto right over GSA and other agencies
concerning the usage of computers. OMB relies upon
GSA for policy direction and project impact analysis, as
well as for computer-related alternatives, costs, and
benefits. OMB may get involved when implementation is
not consistent with its policy.
Over the last 3 to 4 years, OMB has consciously
implemented its policy; its hypothesis is that the first
avenue should be to use the private sector to fill an ADP
need. In-house systems resulting in utilities should be
avoided.
There are two primary justifications for remaining
in-house for computer systems: one is that in-house
operation is clearly to the Nation's advantage, and the other is cost.
If a computer is to be provided in-house, then three
stages must be considered. First, is there surplus time on
another Government computer? Second, can excess
equipment be reused to fill the need? Third, should the
equipment or service be rented or purchased?
OMB has urged GSA many times to change its
policy regarding minicomputers. Current policy prefers
larger systems, but we need smaller systems.
Implementation of GSA's Federal Schedule 66,
Laboratory Instruments and Automation, is, hopefully,
lessening administrative resistance.
176
-------
Figure 1
A Functional Pattern of the Agency System
(Diagram linking problems, research, monitoring, plans, criteria, guidelines, standards, and regulations through waste management and pollution abatement to implementation and technology transfer)
Figure 2
A Partial Input-Output Matrix Approach Relating Agency Functions to System Files
(Matrix of file input/output markings for system files, such as the Permit Compliance System and the Fuels Data Base, against the Agency functions listed above, from administration through quality assurance)
177
-------
MINICOMPUTERS: CHANGING TECHNOLOGY AND IMPACT ON ORGANIZATION AND PLANNING
By Edward J. Nime
Because of recent developments in computer elec-
tronic technology, computing power will soon be essen-
tially free. Since 1968, hardware costs of the
minicomputer classes have been decreasing at a rate of
approximately 30 percent each year, with an equivalent
increase in processing capability.
The Environmental Protection Agency (EPA) will
benefit from this trend, which will allow many of the
functions currently performed on large central com-
puters to be performed locally in Regional centers.
There are potential organizational and management
problems associated with the use of this technology;
however, with effective management coordination and
control, EPA can maximize present benefits and allow
for easy expansion or integration in the future.
The laboratory automation workshop presentations
represent developments made possible by recent break-
throughs in electronics technology. It is generally known
that the first generation of computers contained vacuum
tubes, the second generation, transistors, and the third
generation, integrated circuits. Large scale integration
(LSI) has resulted in a computer-on-a-chip; i.e., a general
purpose, programable integrated circuit equivalent to a
central processor unit of a conventional computer, on a
chip of silicon a few millimeters on a side and con-
taining the equivalent of several thousand transistors.
These technology developments are resulting in
micro- and minicomputers which approach the process-
ing capabilities of medium-sized systems like the IBM
370/135, but at an extremely small fraction of the cost.
Since 1968, minicomputer costs have been decreasing
approximately 30 percent per year with an equivalent
increase in processing capability. A processor which cost
$25,000 in 1966 can be purchased for less than $2,000
today. Nearly 80 percent of all minicomputers have a
processor in the price range of $2,000 to $10,000.
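As a rough check on these figures, the decline can be compounded year by year (a sketch; the 1968 start year and a uniform 30 percent rate are assumptions for illustration):

    # Compound 30 percent annual decline applied to a $25,000 processor.
    cost = 25_000.0
    for year in range(1968, 1976):   # eight years of decline
        cost *= 0.70
    print(f"projected 1975 cost: ${cost:,.0f}")
    # prints roughly $1,400, consistent with "less than $2,000 today"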
A 1974 Auerbach study states that the number of
computer installations in existence today represents less
than 5 percent of the total computing power projected
for 1984. Last year, minicomputers represented $1.2
billion or 13 percent of the total computer market; and
by 1984, the total hardware market will have increased
to $20 billion with minicomputers accounting for $6
billion or 30 percent of the total.
A question often asked is "Will minicomputers
replace large computers?" The answer is "No." There
will always be a need in EPA for large computers like an
IBM 370/158 or a Univac 1110 which can process great
masses of water and air quality data for trend analyses
and modeling. Likewise, there will always be a need for
dedicated and general purpose minicomputers which
bring easy-to-use, dependable, responsive, and cheap
computing power to many. All EPA computer applica-
tion areas can benefit from minicomputers, especially
when linked to large computers.
A relatively new term which describes the shared
processing of minicomputers and large computers is
"distributed processing." In the context of this discus-
sion, distributed processing is the interconnection of
minicomputers of approximately the same level of
capability into a network of hierarchical computers for
the purpose of data processing.
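A minimal sketch of the concept (hypothetical Python; the node roles and the summarizing step are invented for illustration):

    # Distributed processing sketch: local minicomputers reduce raw data
    # and forward only summaries up the hierarchy to a central machine.
    from statistics import mean

    def local_node(raw_readings):        # runs on a laboratory minicomputer
        return {"n": len(raw_readings), "mean": mean(raw_readings)}

    def central_node(summaries):         # runs on the large central computer
        return sum(s["n"] for s in summaries)

    labs = [[1.2, 1.4, 1.1], [2.0, 2.2]]
    summaries = [local_node(r) for r in labs]
    print("observations accounted for centrally:", central_node(summaries))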
This concept of placing low-cost computer power
at various action points in an organization, and linking
these points where necessary, is happening in EPA and
deserves serious management consideration. Nearly 3
years ago, during the early discussions of the Cincinnati
Laboratory Automation project, this concept of linking
various levels of computers was presented. The issues
related to distributed processing are significant and
sometimes emotional; the concept is contrary to central-
ization and for this reason is bound to cause some
confusion at the management level.
EPA is a decentralized organization with more than
two thirds of its employees, or 6,000 people, in
autonomous field locations; therefore, planning for
distributed processing is imperative. However, to ensure
that distributed processing does not get out of hand, the
decentralization of ADP operations should be selective
with strong centralization and control.
Figure 1 highlights the respective functional
responsibilities to be assumed by the Headquarters
Management Information and Data Systems Division and
by the users. These responsibilities have been assumed in
the Cincinnati Laboratory Automation project and are
applicable to the generalized use of distributed pro-
cessing in EPA.
178
-------
An interesting trend can be seen in Figure 2. As
computer technology is evolving, the respective
functions of ADP and user departments are being
exchanged. What were previously functions of ADP
departments are becoming user department functions, and
previous functions of user departments will soon be the
responsibility of ADP departments.
As technology is developing, the most important
and expensive resource in minicomputer applications is
personnel. Equipment costs will continue to decrease
but personnel costs will continue to increase. We have
reached the point where one can purchase a micro-
computer from petty cash and carry it in a pocket; but
in order to effectively use this equipment, a person
knowledgeable in hardware/software, logic design, inter-
facing techniques, and real time assembly language
programing is essential.
In conclusion, as stated before, computing power
will soon be essentially free. Many of the functions
currently performed on a large central computer will be
performed locally or in Regional centers. EPA will have
networks of computers with distributed intelligence
consisting of modular central processors with modular
programs being executed in many different sub-
processors.
Terminals will be superintelligent and computer
peripherals will contain microprocessors. The technology
which this represents shows every promise of bringing
about the benefits that computers have been expected to
deliver since their acceptance more than 20 years ago.
The proliferation of distributed processing in EPA
need not be a threat to the concept of centralization; it
only places more responsibility on all involved to com-
municate and coordinate more effectively. An unrespon-
sive organization cannot keep minicomputer applications
from developing but it may lose the benefits to be
achieved now and may prevent easy expansion or
integration in the future.
Management Information and Data Systems Division
(MIDSD)
• Authority and responsibility for establishing
policy and long-range goals
• Planning and project selection
• Coordination of:
- Systems development
- Hardware selection
- Programing language selection
• Audit of decentralized operations
• Operation of large-scale computing facilities
Users
• Full-time participation on project teams
• Control of the systems design effort
• Hardware/software maintenance and operation
• Employment of programer/analyst
Figure 1
Functional Responsibilities
179
-------
1950-60
  Technology/Control Factors: Computers are new tools; lack of understanding of computers
  Functions of User Departments: Part-time participation on design of ADP systems; maintain data base
  Functions of ADP Departments: Select and maintain hardware; control the design effort; full-time participation on project teams; employ programer/analysts

1960-70
  Technology/Control Factors: Proliferation of technology; project teams emerge
  Functions of User Departments: Full-time participation on project teams; control of project team
  Functions of ADP Departments: Select and maintain hardware; control the design effort; full-time participation on project teams; employ analysts; employ programers; maintain data base

1970-75
  Technology/Control Factors: MIS; teleprocessing; better understanding of computer usage
  Functions of User Departments: Full-time participation on project teams; control of project team
  Functions of ADP Departments: Select and maintain hardware; full-time participation on project teams; employ analysts; employ programers; maintain data base

1975-85
  Technology/Control Factors: Distributed processing; corporate data base; high-level language
  Functions of User Departments: Full-time participation on project teams; employ analysts; limited programing
  Functions of ADP Departments: Select and maintain hardware; part-time participation on design; employ programers; maintain data base

1985-90
  Technology/Control Factors: High-level application software for minicomputers
  Functions of User Departments: Select and maintain hardware; control the design effort; full-time participation on project teams; employ programer/analysts
  Functions of ADP Departments: Part-time participation on design; maintain data base

Figure 2
Exchange of Functions
-------
UNIVAC 1110 UPGRADE
By M. Steinacher
A major hardware upgrade is planned for the
Univac 1110 computer system at the National Computer
Center. The scheduled new equipment will be delivered
and installed during mid-December and ready for
production use in early January.
The reasoning behind the upgrade and the
anticipated benefits are examined in this paper. A brief
review of the Univac's procurement history is also
included as background.
The original hardware/software specifications,
ultimately used to procure the Univac computer, were
prepared as early as September 1970. EPA had not yet
been established and the procurement was intended to
support the growing ADP requirements of the National
Air Pollution Control Administration. As EPA and the
RTP NERC were formed, the developing specifications
were hastily modified to address the relatively unknown
requirements of the new Agency and its local RTP
components.
EPA was relatively inexperienced in the procure-
ment of large-scale computer systems when, in early
1971, it engaged the General Services Administration
(GSA) as an administrative and contracting agent in the
procurement. EPA personnel maintained primacy as
technical advisors and handled the technical aspect of
the benchmarking and proposal evaluations.
GSA was particularly concerned about maintaining
competitive procurement, and, in some instances, it was
necessary for EPA to modify technical specifications to
facilitate open bidding. Although it may have been very
difficult for the Agency at that early date to absolutely
justify certain capacity and speed requirements, GSA
modifications may be directly responsible for the
throughput bottlenecks experienced after installation.
However, open bidding competition was achieved,
and three vendors were benchmarked in the fall of 1973.
Univac passed all of the mandatory requirements and
was by far the lowest bidder. It was awarded the
contract in June 1973. The original configuration was
subsequently installed in October 1973, and was finally
accepted by the Agency in February 1974.
During EPA's early developmental years and while
the procurement process was being executed, the ADP
workload to be supported by the RTP computer grew at
an accelerated pace. The expanding workload was
further compounded by the developing interest in time-
sharing applications. As a result, the actual computer
requirements to be supported by the new Univac system
were materially different from those reflected in the
1970/71 equipment specifications.
The initial performance of the Univac 1110 was
certainly less than desirable. It was characterized by
frequent stops, periods of degraded operation, and
limited responsiveness. Although still less than stable,
the system did settle down to reasonable periods of at
least predictable, if not normal, performance. Subse-
quently, it was possible, through the use of hard-
ware/software monitors, to evaluate and quantify
performance.
Performance analysis clearly demonstrated that
even under the most stable operating conditions, a
serious hardware imbalance existed. Less than
30 percent of the available machine cycles were actually
being used. The computer was literally waiting for work,
work that was apparently bottlenecked somewhere.
Individual jobs were in constant competition for the
same peripheral device and for an adequate memory
resource for program execution. As the operating system
attempted to handle the situation by swapping jobs in
and out, a significant overhead workload of at least
23 percent was being generated.
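One way to read these figures together (a sketch; it assumes the 23 percent overhead applies to the busy cycles, which the text does not state precisely):

    # Useful work under the stated assumption: busy cycles less the
    # swapping overhead generated by the operating system.
    busy = 0.30        # fraction of machine cycles in use (from the text)
    overhead = 0.23    # overhead workload (interpretation assumed)
    print(f"useful cycles: {busy * (1.0 - overhead):.0%} of capacity")  # ~23%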
Recent monthly averages of 32 stops and 13.8
hours between failures indicated that the system was
approaching operating stability. Assuming stability could be
achieved, a more productive hardware mix had to be
considered. Therefore, negotiations with Univac
addressed the inadequacies of mass storage and primary/
extended memory as well as the need for dual access
channels (I/O paths) and additional tape units.
The December upgrade reflects the response to
these requirements and includes the following com-
ponent increases:
Mass storage 68%
Primary storage 33%
181
-------
Extended storage 200%
8 additional magnetic tape drives
In addition, a second operator console and a second
I/O control unit were added to provide for backup and
for operating redundancy.
The future outlook for the Univac's computing
potential appears bright. Univac estimates that the
upgrade will improve throughput by 50 percent and
turnaround by 80 percent. In addition, at least
30 percent more jobs can be active, and a similar
percentage increase in the number of demand terminals
supported can also be realized. The unrealistic and
unproductive overhead workload will be greatly reduced,
thereby freeing even more computing potential.
We all look forward to the emergence of the upgraded
Univac 1110 as a dependable and powerful Agency
resource.
182
-------
A CASE FOR MIDICOMPUTER-BASED COMPUTER FACILITIES
By D. Cline
INTRODUCTION
When the general purpose digital computer became
a reality in the mid-1950's, only the largest corporate
organizations could afford the capital outlay required
for acquisition of a large-scale computer facility. The
huge cost of acquiring and operating large-scale com-
puter facilities was the factor that most delayed the
development of small-scale computer facilities. Drastic
cost reductions in hardware were realized in the early
1970's as a result of large scale integration (LSI) tech-
nology. Small-scale computer facilities became a reality
with the introduction of the midicomputer which has an
operating system that supports a multiprograming
environment and has device-independent input/output.
LARGE-SCALE CENTRAL FACILITIES
The concept of economy-of-scale is the most
powerful argument favoring large-scale central computer
facilities. According to this concept, many users have
collective access to a degree of computational power
to which none would have access if the total cost of the
facility had to be borne by each user. Other high
cost, low usage, special peripheral devices may also be
shared by numerous users in a large-scale computer
facility.
In EPA, a common point of debate is accessing
national data bases. Users of a large-scale central com-
puter facility can access national data bases. Data bases
often mentioned include water quality, air quality,
library, financial, and personnel data. Another attractive
feature of large-scale facilities is their large memory
capacity, which is useful when executing large tasks. The
remoteness of a user from the computer facility is
usually of little concern as most large facilities may be
accessed via direct dialing.
SMALL-SCALE LOCAL FACILITIES
On the other hand, many minicomputers and midi-
computers of the mid-1970's provide a greater com-
puting capability than was available from the largest
computers of the mid-1950's. In fact, a complete small-
scale system could be purchased for the annual budget
required to support a large-scale time-sharing facility. In EPA,
at least three feasibility studies performed to date
substantiate this statement.
Although access to national data bases is an attrac-
tive feature, the requirement for utilizing national data
bases in EPA's research laboratories varies from no use at
all to daily use, depending on lab needs. Usually only
portions of these data bases are required by any
individual laboratory, in which case a subset of large
data bases could be implemented on a local facility
where intensive utilization is indicated.
Although many large-scale computer facilities have
large memory capacities, at EPA's General Time Sharing
Utility (GTSU) large memory segments are only readily
available to the user overnight. Many computer programs
that have a large memory requirement and an extremely
small execution time are penalized by being forced to
accept overnight response. This situation is normally
acceptable for production jobs, but quite unacceptable
for program development. Large computer programs can
be executed on small machines by using external files for
data storage and by using overlays to decrease the
amount of memory required at any one time.
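The external-file technique can be sketched as follows (a modern-language analogue, not EPA code; the file format, one number per line, is an invented example):

    # Analogue of the overlay/external-file approach: keep the data set
    # on disk and hold only a bounded chunk in memory at any one time.
    def running_sum(path, chunk_hint=1024):
        total = 0.0
        with open(path) as f:
            while True:
                lines = f.readlines(chunk_hint)  # bounded memory use
                if not lines:
                    break
                total += sum(float(x) for x in lines)
        return total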
An attractive feature of EPA's GTSU is its
extensive telecommunications network. To date, how-
ever, only 30 character per second (cps) lines are
commonly available. On local facilities, the transmission
rates are normally 30 to 1,250 cps.
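The practical difference is easy to work out (a sketch; the 100,000-character file size is an invented example, while the line speeds are those cited above):

    # Transfer time for a 100,000-character file at each line speed.
    chars = 100_000
    for cps in (30, 1_250):
        print(f"{cps:5d} cps: {chars / cps / 60:6.1f} minutes")
    # 30 cps takes about 56 minutes; 1,250 cps takes under 2 minutes.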
In a research atmosphere where mathematical
modeling and program development have high priorities,
denial of access to the system after midnight on week-
days and all day Sunday severely impedes attainment of
goals in a timely manner. With a local, in-house, small-
scale facility, the system is available 24 hours a day, 7
days a week.
SUMMARY
Large-scale computer facilities have been the main-
stay of the computer industry since their inception and
will continue to provide for the bulk of computational
requirements, mainly because of their favorable
economics. There always will be requirements for main-
taining large repositories of information, for executing
huge memory-bound tasks, and for providing service for
small users. The small-scale computer facilities, however,
183
-------
will complement the large-scale facilities in a cost-
effective manner by performing much of the preediting
of data before it is transferred to the large data bases, by
executing many small routine tasks on a day-to-day
basis, and by providing a means to conduct program
development interactively. The net effect of small
facilities will be to relieve the stress on large facilities.
They will eliminate many of the small, routine tasks the
large facilities normally encounter, and will enhance the
ADP capability of the small research laboratory.
184
-------
STATUS OF THE INTERIM DATA CENTER
By K. Byram
The EPA is currently in the midst of a data center
procurement. The contract awarded will furnish one of
two data centers to be used nationwide by EPA.
When EPA was formed in December 1970, it
incorporated several parts of predecessor agencies, and
those agencies were receiving computer service from a
variety of sources. For example, a computer center at
Research Triangle Park (RTP), North Carolina,
employed an IBM 360/50 to support EPA elements
there. National Institutes of Health Computer Center in
Bethesda, Maryland, and Boeing Computer Services in
McLean, Virginia, were supporting Agency headquarters
elements in the Washington, D.C. area and, to some
extent, nationwide elements with IBM 370 series
machines. Department of the Interior, Health Sciences
and Mental Health Administration, Food and Drug
Administration, Department of Agriculture, Atomic
Energy Commission, and several universities were
supplying computer services as well.
Two procurements were initiated in 1971 to 1972
to replace or upgrade the two principal computing
resources listed above. The machine at RTP was to be
replaced by a much larger system, and the contract at
Boeing Computer Services was to be reopened to
competition. Resulting awards were for a Univac 1110 at
RTP, in February 1974, and a contract with Optimum
Systems Inc. to replace the Boeing contract, in
April 1973.
At about this time, General Electric (GE), under
contract to EPA, completed a study which recom-
mended that the headquarters workloads, then con-
centrated at OSI, NIH, and a few other places, be
projected for transfer to an Agency-operated facility
called the Washington Computer Center (WCC). GE also
recognized a continuing need for a data center at
Research Triangle Park to service EPA elements located
there. EPA then began to plan a Washington Computer
Center of its own to follow the OSI contract which
would expire in 1975. An in-house task force was
formed to coordinate the planning for the center. As the
first step, the task force rejected the GE study's work-
load projections and basic conclusions, and began in
1974 to revalidate the study. Concurrently, the
workload at NIH was transferred to OSI.
In the summer of 1974, recognizing the long lead-
time for procurement of the large WCC data center, and
recognizing that the OSI contract would expire soon, the
Agency decided to reopen the OSI contract to competi-
tion. All of the studies to date had been oriented toward
data center hardware; now, EPA's appropriations
committee, investigating the Agency's growing ADP
expenditure levels, demanded a 5-year plan including not
only data centers, but information systems and staffing
blueprints as well. Index Systems, Inc., was awarded the
contract to perform this study. It recommended that the
Agency continue, over the 5-year period, to operate the
two data centers at Washington and RTP. They also
suggested that the Agency exercise its purchase option
on the Univac 1110 at RTP.
This study confirmed that reopening the OSI data
center to competition was in order and should proceed.
During the term of this new interim contract, EPA could
complete and update its requirements studies, and
procure a "permanent" data center, called the Washing-
ton Computer Center. Ever since EPA began, the work-
load which is now on the OSI data center has been
satisfied with IBM 360/370 series equipment. To avoid
massive conversion problems, EPA wished to specify
IBM equipment for the interim center.
However, GSA was charged with implementing a
full competition policy. This policy philosophy states
that it is in the interest of the Government to have
several computer manufacturers providing hardware in a
competitive environment. Continued reliance for
decades on one vendor by agencies citing difficulties of
conversion to other vendors' equipment could place the
Government at great disadvantage in receiving price and
service from that vendor. Yet GSA's authority, con-
tained in the Brooks Bill, prevents them from interfering
with an agency's mission. Since requiring full competi-
tion and a probable conversion would interfere, GSA has
found it difficult to prevent brand name specification in
data center procurements. As a compromise, GSA has
allowed "interim procurements" specifying brand
names, to be followed by fully competitive procure-
ments within 2 years, or longer if GSA and the agency
mutually agree. In essence, GSA recognized the con-
version problem. Instead of requiring that EPA en-
counter it with every procurement, GSA allowed EPA to
have an interim period of brand name specification to
develop and complete a conversion plan.
185
-------
EPA specified IBM brand name equipment, and its
historical situation mandated that the procurement serve
EPA nationwide, with a majority of workload at head-
quarters, and almost none of it at RTP. Two aspects of
this procurement set it apart from others. First, it is in
three separately awardable parts. Second, it is a facilities
management arrangement, for which the vendor (or
vendors) is reimbursed for the cost plus an award fee.
The RFP's three parts are the data center hardware/
software, the communications network, and the user
support service. Each part is separately awardable,
although one vendor could propose and win in all parts.
While coordination could be a problem with separate
vendors, best support in each area is not necessarily
available from a single vendor. Also, one vendor could
tend to emphasize his management strength in the area
of most profit, with adverse effect on the other areas.
One interesting aspect of the communications network
portion of the RFP is that it specifies a network to
provide nationwide access to the WCC, as well as to the
Univac data center at RTP.
The services are to be procured under a facilities
management concept. In essence, EPA and the vendor
(or vendors) will jointly determine the equipment,
network components, and level of staffing necessary for
the three parts of the center. The vendor will then
obtain the necessary resources and operate them to EPA
specification. This differs from the current pricing
arrangement for the OSI contract, whereby EPA is billed
essentially job-by-job for the work accomplished.
The procurement was forwarded for GSA approval
in December 1974 and approved in July 1975. The
procurement was released to the public in August, with
proposals due on October 31. Evaluation is proceeding,
and an award is expected by September 1976.
186
-------
LARGE SYSTEMS VERSUS SMALL SYSTEMS
By R. W. Andrew
The primary consideration in planning development
of ADP resources for the U.S. Environmental Protection
Agency (EPA) should be the needs of the ultimate user
community, that is, the EPA scientists and administra-
tors. The following discussion, representing the view-
point of a Research and Development (R&D) scientist-
user, is intended to describe some of those needs and
how they can best be met by future developments in
ADP resources for EPA. Until recently, such resources in
ADP "just growed," like the proverbial Topsy. This
opportunity to present and discuss our needs heralds a
welcome change in management philosophy toward
fitting the resources to the job rather than the reverse.
Perhaps the best way to begin to outline future
needs is to describe those present needs which are
unmet. At present, ADP resources for EPA are supplied
largely by the maxicomputer systems at Optimum
Systems Inc. (OSI) and Research Triangle Park (RTP).
Both systems are designed and operated primarily for
large-scale batch jobs and large data bases such as
STORET and AEROS. Remote time-share users,
although now garnering an increasing proportion of total
usage, have received relatively little attention in the plan-
ning and operation of the systems. Remote user access
to scientific and statistical software packages has been
added as an afterthought, with little or no previous plan-
ning or overall design constraints. The remote time-share
user has been forced to operate in a "batch-mode"
environment, or to pay an excessive premium for time-
sharing operation (TSO) or demand operation which
accesses only part of the software packages. These are
the precise services and software, however, most needed
by the remote scientist/part-time programer. Perhaps the
best example illustrating this point is that none of the
present EPA-supported computer services offer online
time-share compilers or interpreters for BASIC language
programs, yet BASIC is rapidly becoming the universal
language of the scientist-programer.
A second need, perhaps the major stumbling block
to the R&D scientist-user of the present systems, is
learning the terminal operation and Job Control
Languages (JCL). The typical scientist has neither the
inclination nor the time required to learn an additional
language or the complex JCL required to run his pro-
grams. While the present EPA systems provide raw
computer power equal to any conceivable task, these
systems are virtually unapproachable by the scientist-
user, laboratory, or office lacking trained computer
personnel.
What ought to be done to satisfy these and future
needs of the R&D scientist-user community? First, it is
important to expose economy-of-scale for the myth that
it is. Because of the rapidly declining cost of large-scale
memories and microprocessors, the economies of central
processors are rapidly disappearing. Such economies are
readily outweighed by the high cost of communications,
project delays, and specialized training. As a result, most
EPA laboratories are following the lead of their in-
dustrial counterparts and are turning to the use of
dedicated minicomputers. In some cases the minicom-
puters have more flexibility and dedicated memory than
is available through the large systems. This transition is
happening so rapidly that it precludes the best prior
planning and management efforts. An estimated 75 to
100 minicomputer systems are presently installed in
EPA laboratories and offices; most without any centrally
coordinated planning, design, or sanction. These
systems, however, do satisfy, though somewhat inef-
ficiently, the need for rapid, cost-effective computation.
Therefore, present planning should recognize and
encourage the use of such decentralized computer
facilities, and should provide for some standardization of
equipment, software, and training as a means of
improving the operational efficiency within R&D. An
additional means of improving both flexibility and
efficiency would be planning and instituting distributive
networks of such minisystems, combined with the larger
Agency-wide systems.
Another possible means of providing improved
computer service to all of the EPA user community
would be to institute greater segregation and dedication
of operating system software (and possibly hardware) to
specific tasks and levels of computation. For example, it
is inefficient to require a scientific user running a small
FORTRAN job to utilize the same terminal and JCL as
the STORET user wishing to manipulate and sort
massive data bases. The FORTRAN user should be able
to edit, compile, execute, and save his job with simple
commands such as: FORT, LIST, RUN, and SAVE, and
all machine transactions and record keeping should be
invisible to the user. This type of operation is an
established fact with most university or science-oriented
computer systems.
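Such a front end might be sketched as a small command dispatcher (hypothetical Python; the command names are those suggested above, and the actions are stand-ins for the invisible machine transactions):

    # Four simple commands of the kind described above; the record
    # keeping behind each stays invisible to the scientist-user.
    def dispatch(command, workspace):
        actions = {
            "FORT": lambda: workspace.update(compiled=True),
            "LIST": lambda: print(workspace.get("source", "")),
            "RUN":  lambda: print("executing compiled program"),
            "SAVE": lambda: print("job saved"),
        }
        actions.get(command, lambda: print("unknown command"))()

    ws = {"source": "      PRINT *, 'HELLO'"}
    for cmd in ("FORT", "LIST", "RUN", "SAVE"):
        dispatch(cmd, ws)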
187
-------
Some additional suggestions to help satisfy existing and contemplated needs are:
• Increased use of "smart" terminals, e.g., TEKTRONIX 4051, HP 9830, or IBM 5100, as interpreters or preprocessors for on-line data reduction and formatting from active experiments
• Formation of scientific and/or statistical software user groups for formal sharing of programs and problem solutions
• Institution of computer aided instruction (CAI) for the training of novice users
• A switch from contractor-supported to EPA-supported user service at the central agency computer (OSI).
While it is obvious that EPA's need for and usage of
ADP will continue to increase at a geometric rate, we
must recognize that those needs and usages are becoming
more and more specialized. Consequently, it becomes
increasingly difficult for a centralized computer utility
to be all things to all users. In the future, planning
should avoid equating size with flexibility, and speciali-
zation should be anticipated in such a way that it can be
a benefit to EPA.
188
-------
SUMMARY OF DISCUSSION PERIOD - PANEL VI
The questions after the session, Future Developments in ADP Resources for EPA, focused on six primary topics. They
are discussed below.
Policy
Questions addressed general Agency policy and, more specifically, ADP policy as established by the Management
Information and Data Systems Division (MIDSD). One query concerned the level at which ADP issues should be addressed. The
Assistant Administrators (AA's) are involved in long-range planning of ADP requirements and resources. An ADP steering
committee comprised of the AA's is currently developing a 5-year ADP plan for the Agency. The Deputy Assistant Admin-
istrators (DAA's) are involved in budget decisions regarding ADP, as are Laboratory Directors. ADP Coordinators within each
of the laboratories are responsible for communicating information, both administrative and technical, within their respective
laboratories and for interacting with, and receiving guidance from, the Office of Research and Development (ORD) ADP
Coordinator.
Each ORD Laboratory has been encouraged to develop a plan for the future, of which ADP certainly can be a part. This
plan may show a strong in-house effort. In this case, there would be a lower travel budget, lower grade-point average, more
technicians, and fewer PhD's. This is the opposite of a strong out-of-house program.
Concern was expressed regarding the possibility of MIDSD's deciding who could establish stand-alone computer capa-
bilities. A policy has not been generated. In any event, the decision would depend upon the users' needs.
Concern was also expressed regarding strong MIDSD alignment with the General Services Administration (GSA). MIDSD
has made a major effort to follow GSA policy, but EPA is allowed by GSA to do what it feels necessary.
Approval by MIDSD is required for procurement of items under Federal Schedule 66, Laboratory Instruments and
Automation.
National Computer Center
The first issue concerning the National Computer Center (NCC) located at Research Triangle Park, North Carolina, was
the architecture of the Univac 1110 as it relates to the present mix of demand use and batch use. This mix is user specific as
opposed to data center specific. The data center can, however, limit what users do over demand as opposed to what they do
over batch.
The specifications for the Univac 1110 include support for both demand use and batch use. A batch processor was not
purchased. A larger amount of slow memory is being acquired to handle demand use more effectively. Pricing is being altered
to make batch use cheaper than demand use.
The types of use for which demand access can better be used include the retrieval of information from data banks and
computational functions which are time-specific and dependent. Program development may be done more effectively on
minicomputers.
The role of NCC managers in the selection of the interim and a permanent computer facility was questioned. One of the
NCC staff is on the interim center evaluation panel. Future determinations concerning a permanent center, which is not
geographically or machine restricted, are open for participation.
The NCC will become more involved than before with the remote users. Each regional office is being provided with
2-day seminars on the use of systems on the Univac 1110. However, users at RTP will receive service at the same, or an
increased, level as compared to past service.
189
-------
Contracting
There has been a shift to using contractor personnel to operate Agency facilities. Concern was expressed regarding the
potential loss of Federal skills to contractor employees. The Agency will continue the functions of management, including
the technical planning of resources. Contractor personnel will be used to implement specific projects or to provide their
expertise for a product containing factual data. The current large budgets will encourage a tendency to use onsite contracting.
There probably will be no new positions, and if jobs are facility operations in nature, contractors may be used.
Communications
A national communications system will extend to all Regional Offices and to the laboratories located at RTP,
Cincinnati, Las Vegas, and Corvallis. The network is designed to handle communications to both Optimum Systems
Incorporated (OSI) and the National Computer Center.
A user's catalog may be generated which is accessible to anyone. This is contingent upon standardization of programs to
run on any EPA machines. This would allow, in simple terms, the use of any particular machine to solve a problem rather
than the use of a wide mix of machines.
System Size
There was great interest in the use of large computer systems versus small systems. At this time EPA does not have a
firm policy regarding minicomputers. There is no deliberate intent to diminish the number of minicomputers. However, it will
be ascertained that available purchased capacity of a system is used before additional capacity is purchased.
There was evident support for a PDP-10 system, which is much more approachable by a scientist. The suggestion was
well received that ORD should buy a PDP-10 for use by scientists.
A mix of different computers operated by EPA was addressed as being desirable. EPA could then borrow programs
without conversion problems. This mix could include the following: IBM 370, CDC 6000, Univac 1110, and PDP-10.
A smaller system application used for reducing and feeding data into a higher level system for large summaries may be
efficient. These summaries may include trends, predictions, and historical displays. This would involve a stand-alone
computer with concurrent terminal capabilities for interaction with a larger system; in other words, distributive processing.
Distributive Processing
Support was enlisted for the distributive processing concept, inherent in the pending standard terminal procurement.
This procurement specifies a standard terminal consisting of a minicomputer with stand-alone capabilities and concurrent
terminal capabilities.
This would be a small machine that could hook up with OSI, or NCC, or other centers. Data then could be easily
provided to central data banks, and programs on a distant central system could be used locally.
Pooling the power of individual minicomputers with the large systems would result in a positive synergism. One problem
is increased local processing "creep." Through distributive processing, a local operator will learn more about the software
capability available from the larger system, increase the size of the local system to maintain local control over the acquired
software, and slowly generate a maxisystem locally which then becomes independent of the distributive system.
Implementing distributive processing within the Agency, because of its size, would be a formidable task. However,
individual components such as research laboratories could begin. This could be done, first, by defining functions being
performed by the research program at the particular locations; second, by identifying the data processing needs; and third, by
analyzing the data acquisition process, the analytical or data reduction processes that must be executed, and the level at
which they should be performed. The use of second and third shift time available on larger systems should not be overlooked.
With less demand use during these shifts, the system is more stable. With this information available, local systems could be defined in relation to the use
of larger systems.
190
-------
APPENDIX
-------
APPENDIX
LIST OF ATTENDEES
Allison, G.
Environmental Monitoring and Support Laboratory
Environmental Protection Agency
P.O. Box 15027
Las Vegas, Nevada 89114
Almich, B.
Computer Services and Systems Division
Office of Administration
Environmental Protection Agency
Cincinnati, Ohio 45268
Anderson, G.
Health Effects Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Andrew, R.
Environmental Research Laboratory
Environmental Protection Agency
6201 Congdon Boulevard
Duluth, Minnesota 55804
Barton, G.
General Chemistry Division, L-404
Lawrence Livermore Laboratory
P.O. Box 808
Livermore, California 94550
Berger, J.
Department of Commerce
National Oceanic and Atmospheric Administration
Environmental Data Service
Washington, D.C. 20235
Borthwick, P.
Environmental Research Laboratory
Environmental Protection Agency
Gulf Breeze, Florida 32561
A-l
-------
Bryan, S.
Health Effects Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Budde, W.
Environmental Monitoring and Support Laboratory
Environmental Protection Agency
Cincinnati, Ohio 45268
Byram, K.
Management Information and Data Systems Division (PM-218)
Office of Planning and Management
Environmental Protection Agency
Washington, D.C. 20460
Chamblee, J.
ADP Coordinator
Office of Water Program Operations (WH-547)
Office of Water and Hazardous Materials
Environmental Protection Agency
Washington, D.C. 20460
Cirelli, D.
Ecological Monitoring Branch
Technical Services Division (WH-569)
Office of Pesticide Programs
Environmental Protection Agency
Washington, D.C. 20460
Cline, D.
Environmental Research Laboratory
Environmental Protection Agency
College Station Road
Athens, Georgia 30601
Conger, C.
Monitoring and Data Support Division (WH-553)
Office of Water and Hazardous Materials
Environmental Protection Agency
Washington, D.C. 20460
Couch, J.
Environmental Research Laboratory
Environmental Protection Agency
Sabine Island
Gulf Breeze, Florida 32561
A-2
-------
Davies, T.
Environmental Research Laboratory
Environmental Protection Agency
Sabine Island
Gulf Breeze, Florida 32561
Dell, R.
Central Regional Laboratory
Environmental Protection Agency
Region V
1819 West Pershing Road
Chicago, Illinois 60609
Enrione, R.
Health Effects Research Laboratory
Environmental Protection Agency
Cincinnati, Ohio 45268
Fairless, W.
Central Regional Laboratory
Environmental Protection Agency
Region V
1819 West Pershing Road
Chicago, Illinois 60609
Frazer, J.
Department of Electrical Engineering
Colorado State University
Fort Collins, Colorado 80521
Goldberg, N.
Environmental Research Laboratory
Environmental Protection Agency
South Ferry Road
Narragansett, Rhode Island 02882
Greaves, J.
Department of Electrical Engineering
Southeastern Massachusetts University
North Dartmouth, Massachusetts 02747
Hammerle, J.
Monitoring and Data Analysis Division
Office of Air and Waste Management
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
A-3
-------
Hart, J.
Computer Services and Systems Division
Office of Administration
Environmental Protection Agency
Cincinnati, Ohio 45268
Hertz, M.
Health Effects Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Johnson, M.
National Computer Center (MD-34)
Environmental Research Center
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Jurgens, R.
Environmental Sciences Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Kelsey, A.
Health Effects Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Kinnison, R.
Environmental Monitoring and Support Laboratory
Environmental Protection Agency
P.O. Box 15027
Las Vegas, Nevada 89114
Kleopfer, R.
Environmental Protection Agency
Region VII
1735 Baltimore Street
Kansas City, Missouri 64108
Knight, J.
Health Effects Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Kolojeski, P.
Office of Air, Land and Water Use (RD-682)
Office of Research and Development
Environmental Protection Agency
Washington, D.C. 20460
A-4
-------
Kopfler, F.
Health Effects Research Laboratory
Environmental Protection Agency
Cincinnati, Ohio 45268
Koutsandreas, J.
Office of Monitoring and Technical Support (RD-680)
Office of Research and Development
Environmental Protection Agency
Washington, D.C. 20460
Krawczyk, D.
Environmental Research Laboratory
Environmental Protection Agency
200 S.W. 35th Street
Corvallis, Oregon 97330
Lawless, T.
Environmental Monitoring and Support Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Lawrence, C.
Office of Monitoring and Technical Support (RD-680)
Office of Research and Development
Environmental Protection Agency
Washington, D.C. 20460
Lowrimore, G.
Health Effects Research Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Meyer, R.
International Research and Technology
1501 Wilson Blvd.
Arlington, Virginia 22209
Mullin, M.
Grosse Ile Laboratory
9311 Groh Road
Grosse Ile, Michigan 48138
Myers, M.
Office of Research and Development (RD-672)
Environmental Protection Agency
Washington, D.C. 20460
A-5
-------
Nime, E.
Computer Services and Systems Division
Office of Administration
Environmental Protection Agency
Cincinnati, Ohio 45268
Ott, W.
Office of Monitoring and Technical Support (RD-680)
Office of Research and Development
Environmental Protection Agency
Washington, D.C. 20460
Rhodes, R.
Environmental Monitoring and Support Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Richards, N.
Environmental Research Laboratory
Environmental Protection Agency
Sabine Island
Gulf Breeze, Florida 32561
Risley, C.
R&D Representative
Environmental Protection Agency
Region V
230 South Dearborn Street
Chicago, Illinois 60604
Schoor, P.
Environmental Research Laboratory
Environmental Protection Agency
Sabine Island
Gulf Breeze, Florida 32561
Scott, F.
Management Division
Environmental Protection Agency
Region VII
1735 Baltimore Street
Kansas City, Missouri 64108
Shackelford, W.
Environmental Research Laboratory
Environmental Protection Agency
College Station Road
Athens, Georgia 30601
A-6
-------
Shew, C.
Robert S. Kerr Environmental Research Laboratory
Environmental Protection Agency
P.O. Box 1198
Ada, Oklahoma 74820
Sommer, D.
National Enforcement Investigation Center
Environmental Protection Agency
Denver Federal Center
Building 53, Box 25227
Denver, Colorado 80225
Spittler, T.
Surveillance and Analysis Division
Environmental Protection Agency
Region I
John F. Kennedy Federal Building
Room 2203
Boston, Massachusetts 02203
Steinacher, M.
Management Information and Data Systems Division
Office of Planning and Management
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Swink, D.
Office of Monitoring and Technical Support (RD-680)
Office of Research and Development
Environmental Protection Agency
Washington, D.C. 20460
Talley, W.
Assistant Administrator for Research and Development (RD-672)
Environmental Protection Agency
Washington, D.C. 20460
Tittle, C.
Bowne Time Sharing Inc.
1025 Connecticut Avenue, N.W.
Washington, D.C. 20036
Ustaszewski, Z.
Office of Health and Ecological Effects (RD-683)
Office of Research and Development
Environmental Protection Agency
Washington, D.C. 20460
A-7
-------
Wheeler, V.
Environmental Monitoring and Support Laboratory
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Whitley, S.
Earth Resources Laboratory
National Aeronautics and Space Administration
Bay St. Louis, Mississippi 39520
Williams, R.
Municipal Environmental Research Laboratory
Environmental Protection Agency
Cincinnati, Ohio 45268
Worley, D.
National Computer Center (MD-34)
Environmental Research Center
Environmental Protection Agency
Research Triangle Park, North Carolina 27711
A-8
------- |