PB-238 088
SUMMARY OF THE OFFICE OF TOXIC SUBSTANCES REQUIREMENTS
RESULTING FROM THE TOXIC SUBSTANCES CONTROL ACT AND A
PRELIMINARY SPECIFICATION FOR A DATA MANAGEMENT SYSTEM
NATIONAL BUREAU OF STANDARDS
PREPARED FOR
ENVIRONMENTAL PROTECTION AGENCY
AUGUST 1974
DISTRIBUTED BY:
National Technical Information Service
U. S. DEPARTMENT OF COMMERCE
-------
NBS-114A (REV. 7-73)
U.S. DEPT. OF COMM.
BIBLIOGRAPHIC DATA SHEET

1. PUBLICATION OR REPORT NO.
2. Gov't Accession No.: PB 238 088
4. TITLE AND SUBTITLE
   Summary of the Office of Toxic Substances Requirements Resulting from the Toxic Substances Control Act and a Preliminary Specification for a Data Management System
5. Publication Date
6. Performing Organization Code
7. AUTHOR(S): John L. Berg, Josephine Walkowicz, Dennis Branstad, Michael Keplinger
8. Performing Organ. Report No.
9. PERFORMING ORGANIZATION NAME AND ADDRESS
   NATIONAL BUREAU OF STANDARDS
   DEPARTMENT OF COMMERCE
   WASHINGTON, D.C. 20234
10. Project/Task/Work Unit No.: 640 1411
11. Contract/Grant No.: EPA-IAG-D4-0404
12. Sponsoring Organization Name and Complete Address (Street, City, State, ZIP)
   Environmental Protection Agency
   Office of Toxic Substances
   401 M Street, S.W.
   Washington, D.C. 20460
13. Type of Report & Period Covered: Final
14. Sponsoring Agency Code
15. SUPPLEMENTARY NOTES
   Reproduced by National Technical Information Service, U.S. Department of Commerce, Springfield, VA 22151

16. ABSTRACT (A 200-word or less factual summary of most significant information. If document includes a significant bibliography or literature survey, mention it here.)

This report presents a requirements analysis and feasibility study for the data management system needed to use effectively industrial reporting data resulting from the proposed Toxic Substances Control Act.

The study finds that the Office of Toxic Substances requires a system with flexibility, extensibility of data content, ability to handle a wide and confidential nature of the reports, and suitability for immediate installation on a production basis.

In the study both a manual system that minimally satisfies the basic requirements and a computerized system with much extended capabilities are found technically feasible. In addition, the study presents feasible enhancements to the manual system which extend the manual system capabilities and show that a continuum of system decisions exists between the manual and the computerized system.

The study recommends immediate preparation for the computerized system in parallel with the adoption of a cost-saving manual system that has a four-year life expectancy. The manual system will provide the basis of the archival storage under the computerized system. Preparation for the eventual computerized system includes review of real experience under the manual system. It is not unlikely that real experience will lead to some revision in the projected six-year program.

17. KEY WORDS (six to twelve entries; alphabetical order; capitalize only the first letter of the first key word unless a proper name; separated by semicolons)
   Data Management System; Environmental Protection Agency; feasibility study; industrial reporting; Office of Toxic Substances; requirements analysis; toxic substances.

18. AVAILABILITY
   Unlimited
   For Official Distribution. Do Not Release to NTIS
   Order From Sup. of Doc., U.S. Government Printing Office, Washington, D.C. 20402, SD Cat. No. C13
   Order From National Technical Information Service (NTIS), Springfield, Virginia 22151
19. SECURITY CLASS (THIS REPORT): UNCLASSIFIED
20. SECURITY CLASS (THIS PAGE): UNCLASSIFIED
21. NO. OF PAGES
22. Price

USCOMM-DC 29042-P74
-------
EPA-560/3-74-001
REPORT TO THE DIRECTOR,
OFFICE OF TOXIC SUBSTANCES
Summary of the Office of Toxic Substances Requirements
resulting from the Toxic Substances Control Act
and a preliminary specification for a data management
system.
Prepared by
Institute for Computer Sciences and Technology
August, 1974
-------
This report has been reviewed by the Office of Toxic Substances, EPA, and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Environmental Protection Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
-------
ABSTRACT
This report presents a requirements analysis and feasibility study for the data management system needed to use effectively industrial reporting data resulting from the proposed Toxic Substances Control Act.

The study finds that the Office of Toxic Substances requires a system with flexibility, extensibility of data content, ability to handle a wide and confidential nature of the reports, and suitability for immediate installation on a production basis.

In the study both a manual system that minimally satisfies the basic requirements and a computerized system with much extended capabilities are found technically feasible. In addition, the study presents feasible enhancements to the manual system which extend the manual system capabilities and show that a continuum of system decisions exists between the manual and the computerized system.

The evaluation stresses that a cost-effective analysis of the two systems must include a comparison of the aids and services offered by the more expensive computerized system against the limited tools provided by the less expensive manual system. Further, the estimates provided for the six-year period argue strongly that the annual report handling load and the accumulated archival data base will continue to grow beyond 1980. This growth will tax the limited capabilities of the manual system by the fourth year, but certain of the enhancements suggested can extend the utility of the manual system.

The study recommends immediate preparation for the computerized system in parallel with the adoption of a cost-saving manual system that has a four-year life expectancy. The manual system will provide the basis of the archival storage under the computerized system. Preparation for the eventual computerized system includes review of real experience under the manual system. It is not unlikely that real experience will lead to some revision in the projected six-year program.
-------
EXECUTIVE SUMMARY OF MAJOR ISSUES
Manual or Automated Data Base
Discussion: Over the six-year period studied, the annual report input estimates ranged from 10,500 per year (42 per day) in 1975 to 90,500 per year (362 per day) in 1980. A simple comparison of costs between a manual system and an automated system ignores the additional benefits achievable through automation. The manual system could only become increasingly cumbersome with the increasing reporting volume and file size. Further, a manual system would have limited response time and flexibility compared to the automated system. However, current planning necessarily reflects estimates. In addition to cost savings, the manual system would permit the gathering of real data and the development of better estimates using a minimal-cost system. This data would assist not only the decision to automate but also the many operational decisions OTS must make after deciding to automate.
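
The per-day figures quoted in the discussion are consistent with a year of 250 working days. That figure is an assumption inferred from the arithmetic (10,500 / 42 = 250) and is not stated in the report. A minimal sketch of the calculation, with illustrative names:

```python
# Assumption: 250 working days per year (inferred, not stated in the report).
WORKING_DAYS_PER_YEAR = 250

# Annual report input estimates quoted in the discussion.
annual_reports = {1975: 10_500, 1980: 90_500}


def reports_per_day(per_year, working_days=WORKING_DAYS_PER_YEAR):
    """Average number of reports arriving per working day."""
    return per_year / working_days


daily = {year: reports_per_day(n) for year, n in annual_reports.items()}

# Ratio of the 1980 annual load to the 1975 annual load.
growth_factor = annual_reports[1980] / annual_reports[1975]

for year, rate in daily.items():
    print(year, rate)
print(round(growth_factor, 1))
```

Under that assumption the estimated load grows by a factor of between eight and nine over the six years, which is the growth the discussion expects to strain a manual system.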
Recommendation: OTS should initiate a manual system with an expected life span of four years and immediately begin preparation for a computerized report management system for operation in the fourth year. This would permit a parallel operation of both systems in the fourth year. At the same time, OTS should develop a formal program for monitoring actual experience with the reports and the OTS line units' utilization of them. All planning should anticipate and prepare for eventual computerization of the OTS data base.
-------
Confidentiality
Discussion: Considerations of confidentiality are pervasive in both the manual and automated system. In all cases and under any mode of ultimate implementation, confidentiality remains the responsibility of the Office of Toxic Substances. Confidentiality considerations include procedures related to data collection, entry, storage, processing, output, and dissemination. Factors affecting confidentiality considerations are personnel, media, facilities, equipment, administration, and quality control.

Recommendation: The Office of Toxic Substances should recognize the complexity of implementing confidentiality considerations within its data base and initiate an overall management-oriented approach. This approach should develop a security program focused on specific problems and high-risk areas. The Office of Toxic Substances should include confidentiality considerations in all phases of planning for the data base.
-------
TABLE OF CONTENTS

I. INTRODUCTION

II. REQUIREMENTS
   A. Office of Toxic Substances Requirements
      1. Reports
      2. Summary of Requirements
      3. Interfaces with Existing Data Bases
      4. Chemical Substance Searching
   B. Confidentiality
      1. Applicable Law and Legislative Requirements
      2. Threats to Data
      3. Data Security and Confidentiality Considerations
      4. Summary
   C. System Considerations
      1. Reporting Form Structure
      2. Data and Report Volumes
      3. OTS Use of the Data Base

III. FEASIBILITY OF ALTERNATIVE APPROACHES
   A. Manual System
      1. Information Flow
      2. Retrieval of Information
      3. Personnel Considerations
   B. Enhancements to the Manual System
      1. Description
      2. Indexing Aids
      3. Catalog Posting and Publication
      4. Data Descriptor Vocabulary
      5. Data Element Dictionary
      6. Bibliography Preparation
      7. Automated Mailing List and Tickler File
      8. Microfilm Working Files
      9. Microfilm Retrieval Systems
      10. Personnel Considerations
   C. The Computerized System
      1. Information Flow
      2. Data Entry Considerations
      3. Data Management System Selection
      4. Data Management System Review
      5. System Configuration
      6. Personnel Considerations

IV. EVALUATION AND COSTING CONSIDERATIONS
   A. Evaluation Considerations
      1. Evaluation Criteria
      2. Facilities Evaluation
   B. Comparative Cost/Benefit Analysis
      1. Cost Estimates
      2. Comparative Analysis
      3. Comparative Costs
   C. Guidance for RFP

V. RECOMMENDATIONS
   A. Overview
   B. Recommended Program
      1. First Year (1975)
      2. Second Year (1976)
      3. Third Year (1977)
      4. Fourth Year (1978)
      5. Fifth and Sixth Years (1979-1980)
   C. Specific Topics
      1. Chemical Substance Searching
      2. Confidentiality
      3. Reporting Forms
      4. Interface with Existing Data Bases

Appendix A. Existing Bibliographic Data Bases
-------
I. INTRODUCTION

As required under the Inter-agency Agreement between the National Bureau of Standards and the Environmental Protection Agency (EPA-IAG-D4-0404), this report presents a summary of the Office of Toxic Substances requirements and a specification for a data management system suitable for the technical basis of a Request for Proposals for systems design and implementation. The report addresses those OTS information needs that satisfy the requirements of the proposed Toxic Substances Control Act. The report reflects interviews with several operating units within the Office of Toxic Substances and other government agencies. The report contains preliminary system considerations which identify the major components and feasible alternatives with associated cost estimates, a statement of the Office of Toxic Substances confidentiality requirements, and preliminary feasibility considerations to provide the technical basis of a Request for Proposals.
-------
II. REQUIREMENTS

A. Office of Toxic Substances Objectives and Information Requirements

The proposed data bank of manufacturer reports will need to be organized in a manner such that the reporting requirements imposed by legislation now pending in Congress can be satisfied in accordance with the provisions of the proposed Toxic Substances Act.

The new legislation will provide the basis for imposing various reporting and other obligations on manufacturers, importers, and processors of toxic substances as well as on the Administrator of EPA. Since the final form and date of enactment of the Toxic Substances Act are still unknown, requirements of the Act are summarized from the House version of the Act, Committee Print dated July 25, 1973.
1. Reports

Reporting requirements of manufacturers, processors, and importers are outlined below. "Manufacturer" is used subsequently in this report as a general term that includes processors and importers as well, except in those cases where reporting requirements may differ among the three categories. The House Bill Section column cites the section containing the requirements.

a. Manufacturer Reports

(1) Annual Reports                                     House Bill Section

    All chemicals regulated by OTS                     8(a)

    Frequency:  annually and at such more frequent times as the Administrator may require

    Data:       name, identity, categories of use, amounts produced, and by-products of all chemical substances manufactured, as indicated in Exhibit 2.
-------
(2) Premarket Reports                                  House Bill Section

    (a) New Chemical Substances

        Reporting time:  90 days in advance of commercial production of new chemical substance          5(d)

        Data:            same as a(1) above, and test data developed for intended use or distribution; or

    (b) New Uses of Existing Substances

        Reporting time:  90 days in advance of manufacture or distribution in commerce of significant new use of substance(s) listed in Federal Register or of existing chemical substance(s)          5(c)

        Data:            same as a(1) above, and test data developed for intended new use

(3) Other Manufacturer Reports                         House Bill Section

    (a) Performance of Tests

        Frequency:  as required or requested, for chemical substances to which a test protocol applies;          4(d)
                    or petition Administrator for test protocol;          5(e)
                    or share costs for performance of tests          5(f)

        Data:       test data developed under given protocol, or dollar share of costs
-------
2. Summary of Requirements

a. Overall

o  The information handling capabilities of the Office of Toxic Substances data base system should offer sufficient flexibility to allow the inevitable evolution of requirements as more experience is gained.

o  The Office of Toxic Substances storage and retrieval response time may be 48 to 72 hours.

o  Reports submitted to the Office of Toxic Substances should be in the form of "finished" reports rather than simple computer printouts.

o  Legal requirements necessitate the archival storage within EPA of signed reports as legal documents.

o  In designing the information system, care should be taken to use only techniques well within the current state of the art.

o  The original documents and all subsequent copies of the information contained therein (in whatever form that may be necessary) remain the exclusive property of the EPA.

o  The system design should insure independence from any particular vendor.

o  Correspondence to and from respondents should be handled by OTS personnel.

o  Data declared by the respondent as confidential must be so marked on all reports.
-------
o  The information handling capabilities should be extensible in order to meet future as well as current needs.

b. Usage Requirements

o  The manufacturer data must be readily accessible (with a minimum amount of clerical help) to users within the Office of Toxic Substances.

o  When necessary, the user should be able to carry hard copy back to his desk for more detailed review without destroying the integrity of the data base for other users.

o  The original documents must be preserved to satisfy legal requirements.

o  The data base should support the identification of "trend" information and of anomalies in reported production rates.
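
The last usage requirement can be pictured with a small calculation. The following is a minimal sketch only: the report does not define what counts as an anomaly, so the 50 percent year-over-year threshold, the function names, and the production figures are all hypothetical.

```python
def year_over_year_changes(production):
    """Return {year: fractional change from the previous reported year}."""
    years = sorted(production)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        if production[prev] > 0:
            changes[cur] = (production[cur] - production[prev]) / production[prev]
    return changes


def flag_anomalies(production, threshold=0.5):
    """Years whose change from the prior year exceeds the (assumed) threshold."""
    changes = year_over_year_changes(production)
    return [year for year, change in changes.items() if abs(change) > threshold]


# Hypothetical annual production figures (arbitrary units) for one substance.
example = {1975: 100.0, 1976: 110.0, 1977: 240.0, 1978: 250.0}

print(year_over_year_changes(example))  # the trend
print(flag_anomalies(example))          # years worth a closer look
```

The same comparison could be carried out by hand from the annual reports; the sketch only shows what information the data base would have to retain (substance, year, amount produced) for either approach.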
3. Interfaces with Existing Data Bases

As discussed elsewhere, the Office of Toxic Substances Data Base represents only a small part of the information needs of the Office. The expertise represented by the staff of OTS includes the knowledge of where to find the answers to questions raised by operational requirements of the Toxic Substances Control Act. The OTS data base will supplement the current sources of information and may even become a prime source that points to other data sources.

These other data sources are existing files of information in either manual or automated form. Over 175 such data files are reported in the Environmental Information Systems Directory published by the EPA.
-------
The first step in accessing any file of information is through descriptions of the material available. The Systems Directory mentioned in the previous paragraph is an example of such a description. At a lower level, the description must provide an explanation of the information provided to describe each of the elements of the data files. At a still lower level, the description may require an explanation of the terms used in the various data elements, such as the units used for numeric quantities.
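
The three levels of description can be sketched as a small data structure. This is an illustration under stated assumptions, not a layout taken from the report: the class names, the example file title, the element name, and the units are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """Second level: what one element of a data file contains."""
    name: str
    description: str
    units: str = ""  # third level: units or term definitions, if any


@dataclass
class DataFileDescription:
    """First level: a directory-style description of a whole data file."""
    title: str
    summary: str
    elements: list = field(default_factory=list)

    def describe(self, element_name):
        """Return a one-line description of the named element."""
        for element in self.elements:
            if element.name == element_name:
                suffix = f" [{element.units}]" if element.units else ""
                return f"{element.name}: {element.description}{suffix}"
        raise KeyError(element_name)


# Hypothetical entry for illustration only.
annual = DataFileDescription(
    title="Manufacturer annual reports",
    summary="One record per manufacturer, substance, and year.",
    elements=[
        DataElement("amount_produced", "quantity manufactured in the year", "pounds"),
        DataElement("category_of_use", "reported category of use"),
    ],
)

print(annual.describe("amount_produced"))
```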
In dealing with automated data files or data bases, the technical question of the interchangeability of data between the data bases is raised frequently. At a trivial level, data from existing automated data bases can be reduced to hard copy for inclusion within a manual data base. Alternatively, the existence of an automated data base can be indicated at the point in a manual system where any searcher for the pertinent information could find it.

The interchangeability of data between data files within the same retrieval system is often confused with the interchange of data between two computer systems. EPA, with its Univac and IBM computers and S2000 Data Management System, can support many automated data files, each with a content independent of all the others. Any user permitted by the system to access the data files can do so within the limits of the system itself. One such limit may be the lack of a capability to allow two files to be opened concurrently, so that data from one file can be compared to the other. Many systems lack this capability.
-------
On the other hand, the interchange of data between dissimilar computer systems raises the question of incompatibility of the two systems' storage media, the codes used for storing information, the capacity of storage media, and similar incompatibilities resulting from physical differences between the two computer systems.

OTS's solution to the latter problem is to initiate development of data standards prior to acquiring physical or hardware system components. In the event that OTS seeks satisfaction of its needs through an outside service organization, data standards will still be necessary.

4. Chemical Substance Searching

OTS needs for chemical structure searching, under the proposed Toxic Substances Control Act, fall in the area of analysis of information reported by manufacturers rather than in the specification of the reporting and confidentiality requirements resulting from the proposed legislation. True, they are related. Chemical structure searching will not be possible unless chemical structure information is available. Summarized below are some points relevant to existing chemical structure techniques and OTS requirements in this area.

a. The Wiswesser notation and other linear notations are essentially "pre-computer" languages, not suitable for mechanized systems. [4]

b. The CAS Registry System, and systems based on CAS Registry Numbers, offer the best means for accessing existing stores of chemical information.
-------
c. The technology for graphic displays and interactive searching capability on chemical structures is promising, but operational systems on large files are as yet not generally available.

d. The final legislation may not permit requiring structure information.

e. There is precedent for CAS cooperation both for registering previously unreported compounds and for maintaining confidentiality of information.

f. It is felt that chemical structure searching, which is an analysis function within OTS, cannot hope to be satisfied by the management information system established for accepting manufacturer reports. At best the "hooks" for linking with structure-oriented systems should be established within the management information system.
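
One possible form of such a "hook" is to carry the CAS Registry Number as a linking key in each manufacturer record. The report does not prescribe this; the sketch below is an assumption for illustration. It relies only on the standard CAS check-digit rule (each digit before the check digit is weighted by its position counted from the right, and the sum modulo 10 must equal the final digit), so that a mistyped key can be rejected at data entry. The function name is hypothetical.

```python
def cas_check_digit_ok(cas_rn):
    """Return True if a CAS Registry Number such as '7732-18-5' has a valid check digit."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    # Format is <2 to 7 digits>-<2 digits>-<1 check digit>.
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


print(cas_check_digit_ok("7732-18-5"))  # water
print(cas_check_digit_ok("71-43-2"))    # benzene
print(cas_check_digit_ok("7732-18-4"))  # same number with a wrong check digit
```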
-------
References

1. "Chemical Structure Handling, A Review of the Literature 1962-1968," National Academy of Sciences, Washington, D. C., 1969.

2. "Survey of Chemical Notation Systems," National Academy of Sciences, Washington, D. C., 1964.

3. "Progress Toward a Computer-Based Chemical Information System," Fred Tate, Chemical and Engineering News, Vol. 45, Jan. 1967.

4. "Interactive Graphic Chemical Structure Searching," Division of Computer Research and Technology, National Institutes of Health, May 1973.
-------
B. Confidentiality

The Toxic Substances Control Act of 1973, S.426, among its provisions specifies the information that is to be collected, the reports which are to be generated, and confidentiality requirements for the data. Policy and other public law dictate further confidentiality considerations. The means of meeting confidentiality requirements in the design, implementation and operation of the OTS information system are presented below. The requirements are not unlike those found in other Federal agencies which deal with commercial trade secrets and similar data.

Providing the security necessary to protect confidential information is a complex task requiring an overall management-oriented approach. The security program must focus on specific problem areas of high risk but must yet remain a balanced overall program insuring certain levels of protection at all points in the information management process. Attention must be given to security at all times in the planning and development of the information system. Security plays a vital role regardless of whether the system is implemented as a manual, partially computerized, or fully automated system.
1. Applicable Law and Legislative Requirements

Because the OTS system must deal with confidential and commercially sensitive information, strict adherence to statutory requirements and policy is absolutely essential. Certain legislation, particularly the Freedom of Information Act (5 U.S.C. 552), imposes requirements relating to both confidential and public information.
-------
The Freedom of Information Act makes provisions for public access to data maintained by Federal agencies. That Act, however, exempts certain classes of information from mandatory disclosure requirements, including information which contains or relates to a trade secret.* Further, sanctions against Federal employees may be imposed under provisions of 18 U.S.C. 1905 (1948). These sanctions include a fine of not more than $1,000, or imprisonment for not more than one year, or both, and removal from office or employment.

The House version of S.426 contains provisions relating to confidentiality of data in §15. These provisions require that information which contains or relates to a trade secret or other matter referred to in §1905 of title 18, United States Code, shall be considered confidential and disclosed only under certain circumstances. The House version of S.426 specifies that such information may be made available to other officers or employees concerned with carrying out the Act (including the Chemical Substances Board and Committees formed under §10), or when relevant in any proceeding under the Act, except that disclosure in such proceedings shall preserve the confidentiality to the extent possible without impairing the proceeding. Such information shall also be made available upon requests of duly authorized Committees of Congress. The confidentiality provisions of S.426 clearly indicate that OTS must ensure that the information processing system provides simple and efficient access to the information by authorized personnel, yet controls access to and use of this information so that it is not used in a manner unauthorized by the Act.

*A trade secret may be defined as "any formula, pattern, device or compilation of information which is used in one's business, and which gives him an opportunity to obtain an advantage over competitors who do not know or use it. It may be a formula for a chemical compound..." [5].
2. Threats to Data

OTS must assume the responsibility for maintaining the confidentiality of data from collection through final analysis and dissemination, regardless of whether the data is processed manually or by a computer, within the Agency or under contract. Procedures must be implemented which insure that trade secret data is specifically identified and tagged at all stages in processing, especially during output, in order that no unauthorized dissemination is made of this data.
A risk analysis will provide the basis upon which a valid security program can be established. The recommended steps in this process include:

o  An estimate of all potential damages resulting from (1) loss or destruction of data and program files; (2) theft of confidential or trade secret information; and (3) physical destruction or theft of physical information resources.

o  An estimate of the probability of occurrence for potential threats and their effect on the information system in terms of the classes of loss potential.

o  Development of an annual loss expectancy by combining the estimates of loss potential and threat probability.

o  Selection of the array of remedial measures which effect the greatest reduction in the annual loss expectancy at the least total cost.
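
The steps above can be sketched numerically. This is a minimal sketch, not the procedure of the cited guidelines: every dollar figure, occurrence rate, and safeguard name is hypothetical, and the selection rule (keep any measure whose estimated reduction exceeds its annual cost, and treat reductions as independent) is a simplifying assumption.

```python
# (threat, estimated loss per occurrence in dollars, estimated occurrences per year)
# All figures are hypothetical.
threats = [
    ("loss or destruction of data and program files", 50_000, 0.10),
    ("theft of confidential or trade secret information", 200_000, 0.02),
    ("physical destruction or theft of information resources", 100_000, 0.01),
]


def annual_loss_expectancy(threats):
    """Combine loss potential and threat probability into one yearly figure."""
    return sum(loss * rate for _, loss, rate in threats)


# (safeguard, annual cost, estimated reduction in annual loss expectancy)
# All figures are hypothetical.
safeguards = [
    ("separately stored duplicate files", 1_000, 4_000),
    ("access controls and security audit", 2_500, 3_000),
    ("additional guard service", 6_000, 500),
]


def select_safeguards(safeguards):
    """Keep measures whose reduction exceeds their cost, largest net saving first."""
    worthwhile = [s for s in safeguards if s[2] > s[1]]
    return sorted(worthwhile, key=lambda s: s[2] - s[1], reverse=True)


print(annual_loss_expectancy(threats))
for name, cost, reduction in select_safeguards(safeguards):
    print(name, reduction - cost)
```

A fuller analysis would account for safeguards that overlap in the threats they address; the sketch treats each reduction as additive only to keep the arithmetic visible.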
Detailed procedures for performing a risk analysis are contained in the Federal Information Processing Standard publication entitled "Guidelines for ADP Physical Security and Risk Management" [11].

Application of risk management techniques to the design and development of the OTS information system will result in a comprehensive security plan based on foreseeable threats against the security of the system. Once the threats are identified and an appropriate set of safeguards is selected to counter those threats, the implementation of safeguards can be assured whether the information system is developed in-house or by an outside contractor.
Regardless of where the information system selected is implemented, the responsibility for assuring data confidentiality remains the primary function of OTS. This function must consequently be assigned a place in the administrative structure charged with the development of the system. Administrative functions and other data security considerations are addressed in the succeeding sections.
3. Data Security and Confidentiality: Considerations and Recommendations for Implementation

Figure 1 depicts the areas of primary consideration in planning for data security in information systems. This section discusses the relationships of the factors which have been checked in that matrix.

a. Administration

Specific security measures such as indoctrination materials and a security audit program can be developed either by OTS or by a contractor with OTS supervision. Security audit consists of producing a detailed record of the computer system's activity and analyzing this record, both manually and automatically, in order to signal possible security violations, produce security status summaries and generate damage assessment reports. Control and application programs should maintain duplicate sets of data, which should be separately stored and protected by management. Finally, automated techniques should implement the data security requirements for personal accountability for system usage and data access.
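
The automatic part of a security audit can be sketched as a pass over the activity record. The record layout, user names, file names, and the rule for signalling a possible violation (more than one refused access attempt by the same person) are all hypothetical; the report does not specify them.

```python
from collections import Counter

# Hypothetical activity record: one entry per attempted access.
audit_log = [
    {"user": "analyst_a", "file": "annual_reports", "action": "read", "granted": True},
    {"user": "clerk_b", "file": "premarket_reports", "action": "copy", "granted": False},
    {"user": "clerk_b", "file": "premarket_reports", "action": "copy", "granted": False},
    {"user": "analyst_a", "file": "annual_reports", "action": "read", "granted": True},
]


def possible_violations(log, limit=1):
    """Users with more than `limit` refused attempts (an assumed signalling rule)."""
    refused = Counter(entry["user"] for entry in log if not entry["granted"])
    return sorted(user for user, count in refused.items() if count > limit)


def status_summary(log):
    """Counts of granted and refused accesses for a security status report."""
    return Counter("granted" if entry["granted"] else "refused" for entry in log)


print(possible_violations(audit_log))
print(dict(status_summary(audit_log)))
```

Because each entry names the person responsible, the same record also supports the personal accountability requirement; the manual review the text calls for would start from the flagged entries.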
Administrative controls on data and on the people handling the data are the simplest and most economical methods of protecting data. These controls must be adequately specified and uniformly applied. Such controls include:

o  A precise designation of personnel who have access to the facilities, equipment, programs, storage media, and data;

o  Precise specification of data access privileges: read, copy, modify and delete;

o  Designation of a senior OTS staff member, whose duties include the role of security manager, independent of ADP operations, with direct responsibility and authority in security matters;

o  Periodic independent internal security audits and aperiodic external audits of security procedures.
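
The first two controls amount to a table of who may do what to which data. The sketch below is one way to write such a table down, using the four privileges named in the list; the personnel names, file names, and table layout are hypothetical.

```python
from enum import Flag, auto


class Privilege(Flag):
    """The four data access privileges named in the list of controls."""
    NONE = 0
    READ = auto()
    COPY = auto()
    MODIFY = auto()
    DELETE = auto()


# Hypothetical designation of personnel and their privileges per data file.
access_table = {
    ("analyst_a", "annual_reports"): Privilege.READ | Privilege.COPY,
    ("records_clerk", "annual_reports"): Privilege.READ | Privilege.MODIFY,
}


def is_permitted(person, data_file, requested):
    """True only if every requested privilege has been explicitly granted."""
    granted = access_table.get((person, data_file), Privilege.NONE)
    return (granted & requested) == requested


print(is_permitted("analyst_a", "annual_reports", Privilege.COPY))
print(is_permitted("analyst_a", "annual_reports", Privilege.DELETE))
print(is_permitted("visitor", "annual_reports", Privilege.READ))
```

Anyone not listed receives no privileges by default, which matches the requirement that access be precisely designated rather than assumed. The same table could equally be kept on paper in a manual system.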
Recommended administrative steps to organize the OTS security program include the following:

o  Assignment of OTS managerial responsibility for information security and establishment of a task force to prepare a security program plan.

o  Performance of a preliminary risk analysis to identify major problem areas and select top priority security measures as needed to correct major problem areas.

o  Initiation of a security program, including:

   Preparation of a plan and a schedule for implementing selected remedial measures identified in the preliminary risk analysis.

   Preparation and maintenance of a policy and plans handbook to include: (1) a security and data confidentiality policy statement; (2) mandatory security procedures; and (3) guidelines for incorporating security measures in system design, to include programming, testing, and maintenance for the automated portions of the system.
Internal procedural security should be implemented by:
o Determination of potential targets for fraud, theft, or
  misuse of information resources by analyzing the work
  flow and the nature of the tasks performed by in-house
  or contractor personnel. This requires the development of
  procedures that will minimize exposure to loss of information
  resources. Such procedures may include (1) requiring
  cooperation between two individuals to perform critical
  tasks; (2) performance of additional checks and bounds
  comparisons; (3) formalization of standards for high
  risk operations; and (4) independent quality control checks.
o Designation of critical positions in the automated portion
  of the system for management, system programming, program
  library control, input/output control, exception processing,
  applications programming, data base management, quality
  control, internal audit, hardware maintenance, and definition
  of appropriate pre-employment screening requirements.
o Maintenance of up-to-date and accurate organization charts,
  delineations of responsibilities for functions performed
  either in-house or by contractor personnel, and work
  statements for appropriate key positions.
o To prevent unauthorized processing, implementation of control
  and record keeping procedures for job initiation, scheduling
  and distribution of output.
o Control of access to physical data files in order to maintain
  data integrity, to protect storage media, to provide
  audit trails of custody and use of data files, and to prevent
  unauthorized use of data files.
o Utilization of audit trails.
o Establishment of policy and procedures for program and data
  file retention to satisfy requirements for (1) backup
  operation; (2) audit and management review of operations;
  (3) control of program maintenance; (4) quality controls
  on input data; and (5) non-dependence on one individual's
  knowledge of systems and programs.
o Establishment of procedures to assure personal accountability
  for confidential information both in OTS and in contractor
  facilities.
o Assignment of several people to each sensitive task to reduce
  threats of collusion.
o Rotation of personnel among sensitive tasks.
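The procedure of requiring cooperation between two individuals for a critical task can be sketched as a simple check. This is a hedged illustration only: the function name, the task label, and the staff names are hypothetical and do not come from the report.

```python
def authorize_critical_task(task, approvers, eligible_staff):
    """Allow a critical task only when at least two different
    eligible people have approved it (two-person control)."""
    distinct = set(approvers) & set(eligible_staff)
    if len(distinct) < 2:
        raise PermissionError(f"{task!r} needs two distinct eligible approvers")
    return sorted(distinct)


# Hypothetical staff roster for the sensitive task.
staff = {"kim", "lee", "roy"}
print(authorize_critical_task("erase tape", ["lee", "kim"], staff))
```

The same person approving twice, or an approval from someone outside the eligible roster, does not satisfy the check.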
b.  Personnel
Figure 1 shows that personnel security factors are pervasive throughout
all information processing activities. These factors are especially
important where there is close personal access to the information in
human readable form. The most serious personnel threats to information
are modification or disclosure, either accidentally or maliciously. In
computer based information systems data is also vulnerable to personnel
threats during processing and, to a lesser extent, while in storage.

OTS should develop selection criteria for all personnel who are involved
in all stages of information processing identified in Figure 1. The
integrity of an employee is likely to be influenced by factors which can
be evaluated before employment, including personal background, financial
situation and motivation. Through initial orientation and periodic
security indoctrination, personnel training and education should
emphasize the importance of security procedures, in addition to
adequately preparing the individual to perform assigned tasks correctly
and efficiently.

Personal accountability and responsibility for confidential information
is a key to ensuring data security. Automated techniques for satisfying
this requirement when processing computerized data are subsequently
discussed. Every employee handling confidential information must be
aware that he is personally accountable for such information under his
control and that he is responsible for its protection.
[Figure 1. Data Security Considerations. Matrix rows: Data Collection,
Data Entry, Data Storage, Data Processing, Data Output, Data Dissemination.]
To minimize or obviate personnel factors which could contribute to compromise
of data, the following steps are recommended:
o Determination of the security training requirements for
  OTS and contractor senior management, operations staff,
  support staff, etc.
o Selection and implementation of appropriate security
  awareness techniques such as training lectures and seminars;
  orientation booklets; amendments to job descriptions making
  employees responsible for security; and specification of
  individual responsibilities for data security.
o Development of personnel selection criteria that include
  consideration of factors such as personal background,
  financial situation and motivation.
o Dissemination of some (but obviously not all) of the security
  measures in force at the contractor facility as well as
  at OTS.
o Publicizing of selected cases of disclosure of confidential
  data at other installations when the penalties imposed were
  severe. Details of perpetration, however, should be omitted.
o Continuation of security awareness and education programs
  throughout system operation.
c.  Media
The integrity of data to be processed under the provisions of S.426 is
dependent on the types of media used for transmission and storage. The
media of data collection include physical media (paper form), the format
of the data (representation, coding, description), and the method of
transmission (postal service, courier). Confidentiality of the
transmitted information must be ensured for all media employed.
Vulnerability of unprocessed data to unauthorized access is perhaps
highest during the collection phase. Data entry media involve an
intermediate process (card punching, microfilming, key-to-tape, on-line
keying) which transfers the data from its collection media to computer
entry or storage media. Data is particularly vulnerable to accidental
modification during the entry phase. Data integrity assurance is
discussed under quality control.
Data storage media (punched cards, magnetic tape and disk, microforms,
mass stores) have vastly different characteristics with respect to data
security. Both archival storage and rapid access requirements must be
considered. In particular, media degradation due to aging and heavy usage
must be compared against anticipated data life and usage needs. (Geller
discusses these considerations in reference [4].)

Data media for output and dissemination must be designed, procured, and
protected. Paper forms used for report printing may be designed to
facilitate analysis of the processed data and may be pre-printed with
distribution and confidentiality instructions.
Recommendations under this heading include the following:
o Adequate provisions for protection of source documents,
  input/output data, and programs throughout the entire
  processing cycle in the information facility, whether
  internal or contracted.
o Preprinting of forms with confidentiality instructions.
o Adequate provisions for protection of source documents,
  data, and programs while in transit.
o Adoption of procedures to ensure that all trade secret
  data is specifically identified and tagged at all stages
  of processing.
o Maintenance of an inventory of the accession numbers and
  titles of report and data files in a location separate
  from the data itself.
o Provision of a capability for erasing magnetic media,
  either selectively or totally.
d.  Facilities
Considerations for data processing facility security are important
whether the processing is done by OTS or a contractor. Data processing
facilities and equipment to be used are likely to vary greatly in
specific proposals received from prospective vendors. Any contractual
endeavor should emphasize the need for data security and specify the
types of protection which will be provided by the contractor to satisfy
OTS security requirements. Inadequate specification of security
requirements may result in insufficient security to meet confidentiality
needs, and, conversely, overspecification may result in fewer responsive
bids with higher costs.

Facilities are subject to environmental conditions (hurricane, tornado,
flood, humidity, etc.), physical destruction (by fire, smoke, water,
explosion, personal assault), and unauthorized personnel entry.
Environmental and physical hazards may be anticipated and designed for,
but they cannot be prevented. Contingency plans may be prepared, however,
to minimize the effects of environmental losses. Unauthorized personnel
entry can generally be prevented.

Physical security of facilities should be specified for the processing of
confidential, and in most cases all, data. Good engineering management
should be evident in facility location, design, construction, and
installation and should be considered in selecting contractors to process
the OTS data. Personnel safety and comfort, utility availability, and
physical access control should all be adequately engineered. Site
preparation will vary greatly among candidate facilities for data entry,
storage, processing and output; however, all must follow applicable
guidelines for safety and security. [11]
Plans for security of the facility should include:
o Selection of a facility or modification to an existing
  facility so that it can be easily protected against
  unauthorized entry.
o Minimizing threats from fire, water, wind, etc.
Provisions for physical protection should include the following:
o Identification of critical ADP areas to include the computer
  room, the mechanical equipment room, data control and
  conversion area, data file storage area, programmers' area,
  forms storage area, and provision of adequate physical
  protection and access control for each area.
o Protection against theft, vandalism, sabotage, and forced
  intrusions by use of adequate lighting, intrusion detection
  systems, physical barriers at doors, windows, and other
  openings, and guards as required.
o Control of access to critical areas and ADP facilities by
  installation of conventional or electronic door locks,
  supervision by guards or receptionists over movement of
  people and materials, administrative procedures (sign-in
  logs, identification cards or badges, property passes and
  shipping and receiving forms), and similar measures.
e.  Equipment
The equipment employed in data processing will have a significant effect
on the ease of providing for data security. Since data security includes
preventing accidental modification, the adequacy of equipment and its
state of maintenance should be considered. Equipment managers may perform
maintenance locally, contract for preventative maintenance and repair
from the original vendor, or simply request ad-hoc repair from the
supplier when the equipment fails. Data entry equipment (card punches and
readers, teleprinters, CRT terminals, optical scanners, microfilm readers)
are combinations of mechanical and electronic machines. Even though
technology has improved both disciplines, electronics has far surpassed
mechanics in equipment reliability. Data integrity will depend on the
design and condition of the data entry equipment and the man-machine
interface between the operator and the equipment.
Data storage equipment varies in the protection provided to stored data
through the physical and electronic data handling techniques employed:
reading and writing techniques, error detection/correction techniques, and
storage media handling techniques. In addition, data storage equipment and
facilities include those which store removable media. Tape libraries,
card and forms storage, etc., must be physically protected from theft
of the media and must be provided with fire, water, temperature, and
humidity detection and control facilities. Provision should be made for
cleaning, copying, and degaussing (magnetic erasure) equipment to be used
for erasing storage media no longer required for archival magnetic storage.

Data security is of lesser concern in output equipment, unless this
equipment is also the dissemination system (remote terminals, computer and
terminal networks, telecommunications). Impact printers that are located
in the data processing facility may be physically protected against
unauthorized viewing of the printing. Data dissemination through reports
is generally a manual task and hence regulated by administrative controls
and procedures.
Data processing equipment and control and application programs which
process the data should have a minimal set of safeguards incorporated in
them, including means of user identification and verification, and
isolation of specific data from users not authorized to access it. The
implementation of these safeguards will vary among systems. Isolation
techniques may range from dedicated use of the system by one user, to a
very complex memory segregation algorithm. Personnel identification may
be performed via personal recognition, badges, ID cards, or individual
passwords.
Equipment security recommendations include:
o Selection of a computer system configuration which can
  operate at the required security level. Factors to
  consider in this selection include the availability of
  operating systems incorporating state-of-the-art terminal
  access to the system and data files. Such a selection
  may require use of a dedicated computer system or use of
  a general system operated in a restricted manner (e.g.,
  no user programming concurrent to sensitive processing).
o Utilization of user and data isolation capabilities in
  the system (memory protection and segmentation, data
  storage segregation).
o Implementation of a computerized authorization mechanism
  for controlling access to sensitive data files. Data
  authorization should include explicit data handling
  permissions such as read only, read/write, append, and
  delete.
o Adequate provisions for equipment maintenance and backup.
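The computerized authorization mechanism recommended above, with explicit data handling permissions, might look like the following Python sketch. It is an illustration under stated assumptions, not a design from the report: the user names, the file name, and the table `AUTHORIZATIONS` are hypothetical.

```python
from enum import Flag, auto


class Permission(Flag):
    """Explicit data handling permissions named in the recommendation."""
    READ = auto()
    WRITE = auto()
    APPEND = auto()
    DELETE = auto()


READ_ONLY = Permission.READ
READ_WRITE = Permission.READ | Permission.WRITE

# Hypothetical authorization table: (user, data file) -> granted permissions.
AUTHORIZATIONS = {
    ("analyst1", "manufacturer_reports"): READ_ONLY,
    ("dataentry2", "manufacturer_reports"): Permission.READ | Permission.APPEND,
    ("dbadmin", "manufacturer_reports"): READ_WRITE | Permission.APPEND | Permission.DELETE,
}


def is_authorized(user, data_file, requested):
    """Grant access only if every requested permission was explicitly given.
    Users or files missing from the table receive no permissions."""
    granted = AUTHORIZATIONS.get((user, data_file), Permission(0))
    return (requested & granted) == requested


print(is_authorized("analyst1", "manufacturer_reports", Permission.READ))
print(is_authorized("analyst1", "manufacturer_reports", Permission.DELETE))
```

Denying by default (an empty permission set for anyone not listed) keeps the sketch consistent with the requirement that permissions be explicit.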
f.  Quality Control
The implementation of quality control procedures throughout all
information processing operations is a fundamental component of a data
security program. Data integrity is a necessary prerequisite to the
quality control process and is variously defined to mean correct data or
data that is both correct and uncompromised. Adequate quality control
procedures must be implemented to ensure that the data collected and
entered is accurate, and to detect and prevent unauthorized or accidental
modifications to the data. Quality control procedures should include
provisions to assure verification of the accuracy and integrity of the
data at all steps in the information system. Assurance of integrity may
include use of redundant storage and processing facilities. To catch
errors introduced by faulty machines or noisy communication channels,
extremely important data may be transmitted twice and the two
transmissions compared, whether this transmission is within a single
computer system or among a plurality of systems. Encryption techniques
may be employed if transmission of data that is highly sensitive to
disclosure is contemplated.
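The transmit-twice-and-compare check described above can be sketched in a few lines of Python. The function names and the sample message are illustrative assumptions; the report does not prescribe an implementation.

```python
def compare_transmissions(first: bytes, second: bytes):
    """Return the byte offsets at which two copies of a message differ."""
    if len(first) != len(second):
        raise ValueError("copies differ in length; retransmit")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


def receive_verified(first: bytes, second: bytes) -> bytes:
    """Accept the message only when both copies agree byte for byte."""
    mismatches = compare_transmissions(first, second)
    if mismatches:
        raise ValueError(f"copies disagree at offsets {mismatches}; retransmit")
    return first


# Hypothetical message sent twice over a noisy channel.
print(compare_transmissions(b"CAS 50000", b"CAS 50008"))
```

The comparison detects a discrepancy but cannot tell which copy is correct, so the sketch asks for retransmission rather than attempting a repair.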
Quality control procedures should be developed to provide for:
o Establishment of an internal audit team to audit either
  OTS or contractor data processing procedures with
  representatives from the EPA audit, building safety and
  security, ADP unit, and users units.
o An audit plan and schedule for systematic validation of
  all critical security and emergency measures and reports
  to OTS project management.
o An audit reporting system to monitor quality control
  procedures in order to detect emerging deficiencies and
  to assure their prompt correction. A check list may be
  used for this purpose.
o Establishment of certification procedures to determine
  that security features of the system comply with OTS
  specifications which in turn satisfy the security requirements.
  Certification is required in order to check that the design
  is complete, to confirm that the implementation is correct,
  to determine that the installation meets all design standards
  and requirements, to establish that a system is secure after
  system modification, or failure, or after penetration (either
  actual or suspected) has taken place.
o Establishment of procedures to resecure the system if a
  penetration does take place, and to verify that the data
  base and processing programs are unaffected.
4.  Summary
This Section has presented the basic considerations of data
confidentiality which OTS must take into account under the provisions of
the pending legislation. The trade-offs among generality, costs,
efficiency, and security must be made when specifying initial conditions
of contracts in data processing and in evaluating various responses from
equipment and services suppliers. Defining and understanding the final
goals of environmental protection will lead to realistic predictions of
the needed results of data processing and hence to the data collection
and processing requirements. This Section presented recommendations which
may be implemented by OTS to effectuate these basic considerations.
References and Suggested Reading
(1)  Branstad, Dennis K. and Susan K. Reed, "Executive Guide to
     Computer Security", U.S. Department of Commerce, National
     Bureau of Standards, Washington, D.C., NBS Special Publication,
     1974.
(2)  Davis, Ruth M., "Privacy and Security in Computer Systems: An
     Overview", CBEMA Privacy Series 2, Computer and Business Equipment
     Manufacturers Association, Washington, D.C., February 1974, 21 p.
(3)  Geller, Sydney B., "The Effects of Magnetic Fields on Magnetic
     Storage Media Used in Computers", U.S. Department of Commerce,
     National Bureau of Standards, Washington, D.C., NBS Technical
     Note 735, July 1972, 30 p.
(4)  Geller, Sydney B., "Factors in Archival Data Storage Systems",
     U.S. Department of Commerce, National Bureau of Standards,
     Washington, D.C., Working paper submitted to Datamation, 1974, 16 p.
(5)  Milgrim, Roger M., "Trade Secrets", Matthew Bender, New York, N.Y.,
     1972.
(6)  Parker, Donn B., Susan Nycum, and S. Stephen Oura, "Computer
     Abuse", Stanford Research Institute, Menlo Park, California,
     1973, 131 p.
(7)  "Records, Computers and the Rights of Citizens, Report of the
     Secretary's Advisory Committee on Automated Personal Data Systems",
     U.S. Department of Health, Education and Welfare, Washington, D.C.,
     July 1973, 346 p.
(8)  Reed, Susan K. and Dennis K. Branstad, "Controlled Accessibility
     Workshop Report", U.S. Department of Commerce, National Bureau of
     Standards, Washington, D.C., NBS Technical Note 827, May 1974, 82 p.
(9)  Reed, Susan K. and Martha M. Gray, "Controlled Accessibility
     Bibliography", U.S. Department of Commerce, National Bureau
     of Standards, Washington, D.C., NBS Technical Note 780, June
     1973, 11 p.
(10) Renninger, Clark R. and Dennis K. Branstad, "Government Looks
     at Privacy and Security in Computer Systems", U.S. Department of
     Commerce, National Bureau of Standards, Washington, D.C., NBS
     Technical Note 809, February 1974, 37 p.
(11) "Risk Management and Physical Security", U.S. Department of
     Commerce, National Bureau of Standards, Washington, D.C.,
     FIPS PUB 31.
C.  System Considerations
In order to meet the requirements posed by the new legislation, OTS
will need to seek services encompassing: (1) design and implementation of
a manufacturer reporting system, as required by S.426; (2) a self-contained
information management system to assist the Office of Toxic Substances in
the control and management of the data base that will be suitable for
storage, retrieval and reporting of data generated by the manufacturer
reporting system; and (3) establishment and operation of a facility for
processing and safeguarding confidentiality of data reported.
A candidate system must be flexible to evolve gradually as required to
accommodate an initial volume of 10,000 manufacturer reports annually on
approximately 500 chemical substances. As the initial volume of reports
grows and as OTS information needs become more crystallized, a Chemical
Abstracts file may be needed for chemical structure searching.

Some indeterminate proportion of the manufacturer reports will be
involved in administrative transactions that will trigger signals for
organizational units within OTS/EPA to take appropriate action: prepare a
report, publish a notice in the Federal Register, etc.
1.  Reporting Form Structure
The actual design of the form to be used by the Office of Toxic
Substances requires complete information about the eventual reporting
regulations that will be issued by OTS. Therefore, the final design of the
report form is left as a task for the providers of the data processing
services. In this discussion, some statements of OTS requirements are
expressed and sufficient detail about the reporting form defined in order
to provide a basis for the system description and cost estimates provided
in this report.

The Bureau of Census form CB-50L (Exhibit 1-1) provides an excellent
example of a reporting form to be considered by the Office of Toxic
Substances. There is a great deal of information on the form that assists
the respondent in understanding how, when, and to whom he should submit
the forms. Similar information should be provided in the OTS form, so that
respondents will have clear instructions on what data to report, what
codes to use, etc.

Exhibit 2 shows the data elements to be included in the reporting form.
Presented are five items with several fields in each item. The items are:
manufacturer data, chemical substances, end-uses, by-products, and
production features.
Manufacturer data is that portion of the reporting form that contains all
the information needed to identify the manufacturer, processor, or importer
who has submitted the report. There is only one such item needed for each
annual report. Further, it can be assumed that subsequent years' reports
will contain substantially identical information.

Chemical substance items contain information about a specific chemical
substance. Like the information identifying the manufacturer, the
information about the chemical substance can be assumed to remain constant
over several annual reports.
EXHIBIT 1-1
SAMPLE CENSUS FORM
[Reproduction of U.S. Department of Commerce, Bureau of the Census, Form
CB-50L, 1967 Census of Business: Petroleum Bulk Stations, Terminals.
Due date: April 30, 1968. Legible sections: notice of the legal reporting
requirement and of confidentiality; General Instructions; 1. Name and
Physical Location; 2. Employer Identification Number; line instructions
for the sales and excise tax items.]
EXHIBIT 1-2
SAMPLE CENSUS FORM
[Form CB-50L, continued. Legible sections: 6. Company Affiliation;
7. Payroll During 1967; 8. Inventories of This Establishment;
9. Your Business Locations.]
[Form CB-50L, continued. Legible sections: 11. Kind of Business (gasoline,
kerosene, distillate or residual fuel oil; liquefied petroleum gas; other
kind of business); 12. Capital Expenditures of This Establishment, 1967;
14. Certification.]
Exhibit 2-1
Manufacturer Report
Manufacturer Item
Data Element Name               Field Type   Field Length
Corporate Name                  AN           30*
Corporate Address:
  Street Number                 AN           22*
  Street Name
  City                          A            20*
  State                         A            2
  Zip Code                      N            5
  County                        A            15*
Contact Name                    AN           20*
Contact Phone No.               AN           10
Reporting Estab. Name           AN           30*
Address:
  Street Number                 AN           22*
  Street Name
  City                          A            20*
  State                         A            2
  Zip Code                      N            5
  County                        A            15*
Contact Name                    AN           20*
Contact Phone No.               AN           10*
Corporate ID No.                N            9
Function:
  Manufacturer
  Processor                     A            3
  Importer
Date Submitted                  N            6 1/
Data Restriction Indicators     N            20 2/
Report No.                      AN           10 1/
Test Data ID                    AN           20 1/
TOTAL CHARACTERS                             316
*Average Length
1/ To be supplied by OTS.
2/ It is assumed that an average of 20 boxes will be checked as confidential,
trade secret, etc.
-32-
-------
Exhibit 2-2
Manufacturer Report
Chemical Substance Item
Data Element Name               Field Type   Field Length
Substance Common Name           A            30*
CAS Registry Number             N            9
CAS Name                        AN           30*
Trade Names:
  Trade Name 1                  AN           30*
  Trade Name 2                  AN           30*
  Trade Name 3                  AN           30*
Synonym 1                       AN           30*
Synonym 2                       AN           30*
Synonym 3                       AN           30*
Census Material Code            N            9
Two-D Representation            Graphic
Molecular Weight                N            6
Test Protocol Available?        A            1 1/
TOTAL CHARACTERS                             265
*Average Length
1/ To be supplied by OTS.
-33-
-------
Exhibit 2-3
Manufacturer Report
Production Item
Data Element Name               Field Type   Field Length
Manufactured Quantities:
  1st Quarter                   N            9
  2nd Quarter                   N            9
  3rd Quarter                   N            9
  4th Quarter                   N            9
  Total Annual                  N            10
Processed Quantities:
  1st Quarter                   N            9
  2nd Quarter                   N            9
  3rd Quarter                   N            9
  4th Quarter                   N            9
  Total Annual                  N            10
Imported Quantities:
  1st Quarter                   N            9
  2nd Quarter                   N            9
  3rd Quarter                   N            9
  4th Quarter                   N            9
  Total Annual                  N            10
TOTAL CHARACTERS                             138
Comment:
It should be noted that not all of the above fields will
contain data. An establishment that manufactures only
will obviously not report production data for processed
or imported quantities.
';'34-
-------
Exhibit 2-4
Manufacturer Report
End Uses Item
Data Element Name               Field Type   Field Length
SIC Code                        N            9
Description                     A            100
Annual Production               N            10
TOTAL CHARACTERS                             119
x 25 end uses = 2,975 characters/chemical substance
Note:
It is estimated that an average of 25 end uses will be reported
for each chemical substance.
-35-
-------
Exhibit 2-5
Manufacturer Report
By-products Item
Data Element Name               Field Type   Field Length
By-products:
  Chemical Name                 A            30*
  Annual Production             N            9
  Disposal Method               A            30*
  By-product of Use?            A            1
  By-product of Disposal?       A            1
  By-product of Manufacture?    A            1
Effluent - Gallons per Day      N            8
TOTAL CHARACTERS                             80
x 4 by-products = 320 characters/chemical substance
Note:
It is estimated that 4 by-products will be reported for each
chemical substance.
*Average Length
-36-
-------
Exhibit 2-6
Premarket Report
Manufacturer Item
Data Element Name               Field Type   Field Length
Corporate Name                  AN           30*
Corporate Address:
  Street Number                 AN           22*
  Street Name
  City                          A            20*
  State                         A            2
  Zip Code                      N            5
  County                        A            15*
Contact Name                    AN           20*
Contact Phone No.               AN           10
Reporting Estab. Name           AN           30*
Address:
  Street Number                 AN           22*
  Street Name
  City                          A            20*
  State                         A            2
  Zip Code                      N            5
  County                        A            15*
Contact Name                    AN           20*
Contact Phone No.               AN           10*
Corporate ID No.                N            9
Function:
  Manufacturer
  Processor                     A            3
  Importer
Date Submitted                  N            6 1/
Data Restriction Indicators     N            20 2/
Report No.                      AN           10 1/
Test Data ID                    AN           20 1/
TOTAL CHARACTERS                             316
*Average Length
1/ To be supplied by OTS.
2/ It is assumed that an average of 20 boxes will be checked as confidential,
trade secret, etc.
-37-
-------
Exhibit 2-7
Premarket Report
Chemical Substance Item
Data Element Name               Field Type   Field Length
Substance Common Name           A            30*
CAS Registry Number             N            9
CAS Name                        AN           30*
Trade Names:
  Trade Name 1                  AN           30*
  Trade Name 2                  AN           30*
  Trade Name 3                  AN           30*
Synonym 1                       AN           30*
Synonym 2                       AN           30*
Synonym 3                       AN           30*
Census Material Code            N            9
Two-D Representation            Graphic
Molecular Weight                N            6
Test Protocol Available?        A            1 1/
TOTAL CHARACTERS                             265
*Average Length
1/ To be supplied by OTS.
-38-
-------
Exhibit 2-8
Premarket Report
By-products Item
Data Element Name               Field Type   Field Length
By-products:
  Chemical Name                 A            30*
  Annual Production             N            9
  Disposal Method               A            30*
  By-product of Use?            A            1
  By-product of Disposal?       A            1
  By-product of Manufacture?    A            1
Effluent - Gallons per Day      N            8
TOTAL CHARACTERS                             80
*Average Length
-39-
-------
Exhibit 2-9
Premarket Report
End Uses Item
Data Element Name               Field Type   Field Length
SIC Code                        N            9
Description                     A            100
Annual Production               N            10
TOTAL CHARACTERS                             119
-40-
-------
End-use items contain the information about the manner of use intended
for each chemical substance. That is to say that end-uses exist only in
connection with a particular chemical substance. Again, this information
is not likely to change from one annual report to the next.
By-products are very similar to end-uses in that they exist only in
connection with a particular chemical substance. By-products may be
chemical substances or products. There may be several by-products reported
for a chemical substance, and this information may be assumed to remain
unchanged over several annual reports.
Production items contain seasonal and annual production figures for a
specific chemical substance. It should be noted, however, that production
figures are also required for each end-use and each by-product reported.
It is anticipated that production figures will change with each annual
report, and that all annual reports will be kept on file for checking of
trend information.
Exhibit 2 tabulates descriptions of the data elements associated with each
item for annual and premarket reports.
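The per-item totals in Exhibits 2-1 through 2-5 can be combined into a
character count for one complete annual report. The Python sketch below is
illustrative only: the names are not from this report, and it assumes that
every annual report carries the average 25 end-use items and 4 by-product
items estimated in the exhibit notes.

```python
# Character totals per item, taken from Exhibits 2-1 through 2-5.
ITEM_CHARACTERS = {
    "manufacturer": 316,        # Exhibit 2-1
    "chemical_substance": 265,  # Exhibit 2-2
    "production": 138,          # Exhibit 2-3
    "end_use": 119,             # Exhibit 2-4, per end use
    "by_product": 80,           # Exhibit 2-5, per by-product
}

# Assumed repetitions per chemical substance (notes to Exhibits 2-4 and 2-5).
REPEATS = {"end_use": 25, "by_product": 4}


def annual_report_characters():
    """Characters in one annual report for one chemical substance."""
    return sum(
        size * REPEATS.get(item, 1) for item, size in ITEM_CHARACTERS.items()
    )


print(annual_report_characters())  # 4014
```

The result, 4,014 characters, is 316 + 265 + 138 + 2,975 + 320.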
-41-
-------
2. Data and Report Volumes
In the following discussion of data volumes, the parameters used
are estimates representing the best information available from the
Office of Toxic Substances. These parameters are presented in Table 1.
Currently, it is anticipated that 3 to 4 reporting regulations and 3 to
4 restriction regulations will be issued each year.
-------
Parameters Used in Estimating Report and File Data Volumes
Chemical substances under regulation:
  Year     New     Total
  1975     500       500
  1976     500     1,000
  1977     500     1,500
  1978   1,000     2,500
  1979   1,000     3,500
  1980   1,000     4,500
Average number of respondents for each chemical substance: 20
Average number of by-products for each chemical substance: 4
Average number of end-uses for each chemical substance: 25
Table 1
-43-
-------
Annual Reporting Elements
                              1975            1976            1977            1978            1979            1980
                             New   Total     New   Total     New   Total     New   Total     New   Total     New   Total
1. Regulations
2.   Reporting               3-4             3-4             3-4             3-4             3-4             3-4
3.   Restriction             3-4             3-4             3-4             3-4             3-4             3-4
4.   Total                   6-8     6-8     6-8   12-16     6-8   18-24     6-8   24-32     6-8   30-40     6-8   36-48
5. Chemical Substances       500     500     500   1,000     500   1,500   1,000   2,500   1,000   3,500   1,000   4,500
6. By-products             2,000   2,000   2,000   4,000   2,000   6,000   4,000  10,000   4,000  14,000   4,000  18,000
7. End-uses               12,500  12,500  12,500  25,000  12,500  37,500  25,000  62,500  25,000  87,500  25,000 112,500
8. Production Reports     10,000  10,000  10,000  20,000  10,000  30,000  20,000  50,000  20,000  70,000  20,000  90,000
Table 2
-------
Another major data element is the end use reported for each chemical
substance. Each chemical substance is estimated to have about 25 distinct
end uses reported. The volume of end uses reported will grow from an
initial 12,500 in 1975 to 112,500 end uses by 1980. Row 7 shows the
number of posting actions that will be required to place the end use
information onto separate catalogues for sorting by names of the end uses.
This process is similar to indexing a book. The inclusion of the phrase
"yellow dye" in the index requires posting of the page number in the index
for each use of the phrase "yellow dye" in the book.
Posting names of by-products is similar to that for end uses. Row 6
shows that the estimated 4 by-products reported for each chemical substance
will produce 2,000 initial reporting elements which will grow to 18,000
by 1980.
Production figures were estimated at 20 respondents for each chemical
substance. Row 8 shows an initial report volume of 10,000 reports which
will reach a total of 90,000 reports in 1980. These numbers show the growth
of the annual report load as the number of regulations in effect grows
with the consequent increase in the number of chemical substances under
study.
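The rows of Table 2 follow from the Table 1 parameters by simple
multiplication. The sketch below, in Python, is one way to reproduce them;
the function and variable names are illustrative and not part of this report.

```python
# Table 1 parameters.
NEW_SUBSTANCES = {1975: 500, 1976: 500, 1977: 500,
                  1978: 1000, 1979: 1000, 1980: 1000}
RESPONDENTS_PER_SUBSTANCE = 20
BY_PRODUCTS_PER_SUBSTANCE = 4
END_USES_PER_SUBSTANCE = 25


def reporting_elements():
    """Return {year: {row: (new, total)}} for rows 5 through 8 of Table 2."""
    rows = {}
    total = 0
    for year, new in NEW_SUBSTANCES.items():
        total += new  # substances under regulation accumulate year to year
        rows[year] = {
            "chemical_substances": (new, total),
            "by_products": (new * BY_PRODUCTS_PER_SUBSTANCE,
                            total * BY_PRODUCTS_PER_SUBSTANCE),
            "end_uses": (new * END_USES_PER_SUBSTANCE,
                         total * END_USES_PER_SUBSTANCE),
            "production_reports": (new * RESPONDENTS_PER_SUBSTANCE,
                                   total * RESPONDENTS_PER_SUBSTANCE),
        }
    return rows


for year, row in reporting_elements().items():
    print(year, row["end_uses"], row["production_reports"])
```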
The impact of the premarket reports on the volume of the overall reported data
is negligible because the nature of premarket reports is estimated to be such
-45-
-------
that only one respondent will be reporting a new chemical substance or a
new use of an existing chemical substance. Consequently, the impact on
OTS will be essentially linear over the six year period while the total
number of "new" uses will not increase the end use list significantly.
Table 3, "Estimated Reporting Volumes," shows the impact on the Office of
Toxic Substances of the incoming annual reports. For example, in 1980 the
Office of Toxic Substances must be geared to handle the reporting
requirements resulting from 36 to 48 existing regulations. This means that
information about 4,500 distinct chemicals will be on file and distributed
among 270,000 annual reports. Similarly, 112,500 end-uses will be distributed
among the 270,000 reports. This highlights the need for good indexing
procedures to provide effective tools for finding desired information.
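The 270,000 figure can be checked from the Table 1 parameters: each year
every substance under regulation draws a report from each of its assumed 20
respondents, and the reports accumulate on file. A short Python sketch, with
illustrative names only:

```python
from itertools import accumulate

YEARS = [1975, 1976, 1977, 1978, 1979, 1980]
SUBSTANCES_UNDER_REGULATION = [500, 1000, 1500, 2500, 3500, 4500]  # Table 1
RESPONDENTS_PER_SUBSTANCE = 20

# Annual reports received in each year.
reports_this_year = [n * RESPONDENTS_PER_SUBSTANCE
                     for n in SUBSTANCES_UNDER_REGULATION]
# Running total of annual reports on file.
reports_accumulated = list(accumulate(reports_this_year))

for year, this_year, accum in zip(YEARS, reports_this_year, reports_accumulated):
    print(year, this_year, accum)
```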
To draw sharply the distinctions being made, reporting volumes are the
number of distinct documents received by the Office of Toxic Substances
while the file volumes (Table 4) are the items actually filed for eventual
retrieval. The actual files may not contain much of the redundant
information reported each year. For instance, information identifying the
respondent need not be repeated in the file if no change was reported.
Since the reports received constitute legal documents, they must all be
preserved in what shall be referred to as the archival files. Whether the
archival files will also be the working files is a point for discussion
and recommendation. If the archival files and working files are separate
files, then additional processes are required with the implied additional
-46-
-------
Estimated Reporting Volumes
                                1975                1976                1977                1978                1979                1980
                         This year     Accum This year     Accum This year     Accum This year     Accum This year     Accum This year     Accum
1. Regulations
2.   Reporting                 3-4                 3-4                 3-4                 3-4                 3-4                 3-4
3.   Restrictions              3-4                 3-4                 3-4                 3-4                 3-4                 3-4
4.   Total                     6-8       6-8       6-8     12-16       6-8     18-24       6-8     24-32       6-8     30-40       6-8     36-48
5. Chemical Substances         500       500     1,000     1,500     1,500     3,000     2,500     5,500     3,500     9,000     4,500    13,500
7. Product Figures          10,000    10,000    20,000    30,000    30,000    60,000    50,000   110,000    70,000   180,000    90,000   270,000
8. End-uses                 12,500              25,000              37,500              62,500              87,500             112,500
9. Premarket Report            500       500       500     1,000       500     1,500       500     2,000       500     2,500       500     3,000
Table 3
-------
FIle StJorage p~
                                1975                1976                1977                1978                1979                1980
                         This year     Accum This year     Accum This year     Accum This year     Accum This year     Accum This year     Accum
1. Reports in file          10,500    10,500    20,500    31,000    30,500    61,500    50,500   112,000    70,500   182,500    90,500   273,000
2. Pages                              21,000              62,000             123,000             224,000             365,000             546,000
3. Characters*              40,530    40,530    80,670   121,200   120,810   242,010   201,090   443,100   281,370   724,470   361,650 1,086,120
* in 1,000's
Table 4
-------
costs and personnel. A working file separate from the archival file may
have a different structure than the structure in which the incoming reports
are received. The new structure would reflect the interrogating or query
functions anticipated for the file.
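The file volumes in Table 4 can be reproduced from the exhibit character
totals. The Python sketch below rests on assumptions that are inferred from
the table rather than stated in the text: 500 premarket reports are filed
each year, a premarket report carries one by-product item and one end-use
item (780 characters), an annual report carries the full 4,014 characters,
and each filed report occupies two pages. All names are illustrative.

```python
# Annual reports received each year (Table 3, row 7, "this year").
ANNUAL_REPORTS = {1975: 10_000, 1976: 20_000, 1977: 30_000,
                  1978: 50_000, 1979: 70_000, 1980: 90_000}
PREMARKET_REPORTS_PER_YEAR = 500                          # Table 3, row 9
ANNUAL_CHARACTERS = 316 + 265 + 138 + 25 * 119 + 4 * 80   # 4,014
PREMARKET_CHARACTERS = 316 + 265 + 80 + 119               # 780 (assumed)
PAGES_PER_REPORT = 2                                      # inferred from Table 4


def file_volumes():
    """Return {year: volumes} with per-year and accumulated figures."""
    rows = {}
    accum_reports = 0
    accum_characters = 0
    for year, annual in ANNUAL_REPORTS.items():
        reports = annual + PREMARKET_REPORTS_PER_YEAR
        characters = (annual * ANNUAL_CHARACTERS
                      + PREMARKET_REPORTS_PER_YEAR * PREMARKET_CHARACTERS)
        accum_reports += reports
        accum_characters += characters
        rows[year] = {
            "reports": reports,
            "accum_reports": accum_reports,
            "accum_pages": accum_reports * PAGES_PER_REPORT,
            "characters": characters,
            "accum_characters": accum_characters,
        }
    return rows


for year, row in file_volumes().items():
    print(year, row["accum_reports"], row["accum_pages"], row["accum_characters"])
```

Under these assumptions the 1980 row comes to 273,000 reports, 546,000
pages, and 1,086,120,000 characters, matching the tabulated values.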
The test data file will consist of the actual test data submitted by man-
ufacturers in support of their claims, or at the request of the Administrator.
Data items in this file can range from five to six pieces of paper to one
or two feet of documents.
Test protocols may also be included in the Test Data File. Anticipated
volume of test data is unknown at this time. For estimating purposes,
an annual volume of 1,000 reports is assumed for the first 3-year period
and 2,000 test data reports are assumed for the second 3-year period.
3. OTS Use of the Data Base
The manufacturer report and test data files will form the nucleus
of a data base which will, as it grows, provide for the information needs
of organizational units within OTS.
Three of the seven OTS organizational units expect to have requirements
on the manufacturer report file: Early Warning, Restrictions, and Testing.
The Restrictions Unit will use data reported by manufacturers to justify
regulations, assist in writing them, or assess their impact. The Testing
Unit will use the data collected as a basis for issuing notices that
particular chemicals are being tested. The Early Warning Unit expects to use
the premarket reports for issuing warning notices for some substances, as
required.
-49-
-------
User units will have certain requirements imposed on them, in order to
ensure an orderly and useful development of the data base. These
requirements include:
(1) identification of chemical substances (or uses thereof) for
    which test protocols exist; and
(2) identification and collection of existing test data and protocols
    for inclusion in the OTS file.
As both the data base and OTS experience with it grow, particular attention
should be given to the gathering of information on the types of inquiries
addressed to the data base by OTS user units. Such user data will provide
"profiles" of questions asked by OTS; the profiles can then be used to
monitor the responsiveness of the data base to operational needs. The
volume of such activity against the data base is also an important factor
which measures the work burden placed on the information system and forms
a substantial portion of the costs. OTS personnel will also need to
monitor administrative activity on the files so that a preliminary estimate
can be made of when automating this activity becomes feasible.
-50-
-------
III. FEASIBILITY OF ALTERNATIVE APPROACHES
A. Manual System
1. Information Flow
OTS will publish in the Federal Register a reporting regulation
which will list chemical substances subject to study and regulation and
will require manufacturers, importers, and processors of the listed substances
to report certain information on these substances to OTS.
Upon receipt of a report, a "jacket" is created and a report identification
number is assigned for each submission. This ID number is unique for
each report and serves as a shelf locator as well as an identifier.
In addition to this, the ID number may be used as a "base number" for
associating and referencing further submissions or correspondence regarding
the report to which it was assigned. The jacket folders are filed numerically
by the report ID number; they are then flagged, as appropriate, to indicate
when the annual report is due or some other action is required.
Annual reports are reviewed for compliance and correctness. Reports
which are incomplete, or which appear to be inconsistent, are forwarded
to OTS for verification and/or validation by contacting the reporting
establishment. Reports that appear incomplete or otherwise unsatisfactory
are verified or corrected by contacting the reporting establishment.
A copy of the transmittal letter is filed with the jacket, which is flagged
accordingly.
-51-
-------
Satisfactory reports are forwarded to the cataloging section for indexing
and posting of document ID numbers to appropriate catalogs. Six indexing
points or descriptors will be required in order to provide an adequate
retrieval tool to the file by a logical coordination of the desired terms.
These six terms are: (1) names of manufacturers; (2) names of individual
chemical substances; (3) names of end-uses; (4) names of by-products;
(5) names of new chemical substances; and (6) names of new end-uses of
existing chemical substances.
There are many ways of implementing such a manual coordinate indexing
system, all of them variants of essentially a single operation, the posting
of document ID numbers to the appropriate terms. One of the simplest
ways of implementation involves the use of a catalog of 5" X 8" cards
with the digits 0 through 9 preprinted across the top of the cards to provide
10 columns. As an example, a report whose ID number is A0432 from the
ABC Chemical Company reporting the use of vinyl chloride in the manufacture
of paint would be posted in the column indicated by the terminal digit
of the ID number, in this case 2, on three cards: one for the manufacturer,
the ABC Chemical Company; one for the chemical substance, vinyl chloride;
and the third for the end-use, paint.
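The terminal-digit posting just described can be modeled in a few lines of
Python. This is a sketch for illustration only: the class and variable names
are hypothetical, and a card is represented simply as ten lists, one per
preprinted column.

```python
class IndexCard:
    """One catalog card for a single indexing term, with columns 0 through 9."""

    def __init__(self, term):
        self.term = term
        self.columns = {digit: [] for digit in range(10)}

    def post(self, report_id):
        # The column is selected by the terminal digit of the report ID number.
        column = int(report_id[-1])
        self.columns[column].append(report_id)


class Catalog:
    """A card catalog holding one IndexCard per term."""

    def __init__(self, name):
        self.name = name
        self.cards = {}

    def post(self, term, report_id):
        # Create the card on first use of a term, then post the ID to it.
        if term not in self.cards:
            self.cards[term] = IndexCard(term)
        self.cards[term].post(report_id)


manufacturers = Catalog("Manufacturer File")
substances = Catalog("Chemical Substance File")
end_uses = Catalog("End Use File")

# Report A0432: ABC Chemical Company, vinyl chloride, used in paint.
manufacturers.post("ABC Chemical Company", "A0432")
substances.post("vinyl chloride", "A0432")
end_uses.post("paint", "A0432")

print(substances.cards["vinyl chloride"].columns[2])  # ['A0432']
```

Keeping the ten columns separate mirrors the card layout: two cards can be
compared one column at a time rather than entry against every entry.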
It is estimated that approximately 32 entries will need to be made in the
various catalogs in order to record each submission. Report ID numbers
will need to be posted to cards for the manufacturer, for the chemical
substance, for an average of 25 end-uses and an (assumed) 4 by-products
-52-
-------
for each annual report.
Individual test data received will be assigned an appropriate locator number.
This number will be recorded on the reporting form submitted by the
manufacturer; the test data report itself will be filed by its locator number,
apart from the annual (or premarket) report. A test data catalog will
need to be maintained in order to provide a search tool for locating desired
test data reports, as well as for identifying duplicate test data submissions.
It is suggested that the test data catalog be a card file, in test data
number order, of short bibliographic descriptions for each test data report.
Each test data catalog card will be posted with the ID numbers
of annual or premarket reports which submitted the test data. A single
summary card for test data, posted with ID numbers of all submitting reports,
will also be necessary for coordination with chemical substance names,
as described immediately below.
It is very likely that there will be a need for access to the test data
file by chemical substance. Such access could be provided by posting the
test data ID number to the appropriate card in the chemical substance
file. This is not recommended, however, since it is anticipated that
the chemical substance cards will be densely posted. A preferable
alternative would be a separate card catalog, with cards of a different color,
for posting of test data ID numbers. Test protocols could also be
accommodated and searched by means of such a test data catalog.
A chronological record will need to be maintained on a
current basis.
-53-
-------
Each entry will contain the ID number of all reports received, plus a
short title for each report. For security purposes, the chronological
log should not be a card file but rather a book-type record designed to
discourage additions or deletions. The purpose of this file is principally
to provide full identification of submissions which are recorded in the
search files simply as ID numbers.
To minimize the number of catalogs required, it is suggested that new
substances be accommodated in the chemical substance file. New substances
can be identified by prefixing "New Substance" to the name of the
substance (as is done in Figure 2) or by the use of cards of a different
color. New uses can be accommodated in a similar manner in the end-use
catalog. The by-products catalog will contain names of chemical substances
as well as names of products. The distinction can be preserved by the
techniques described immediately above. Or, if volume of by-products is
extremely low, by-products which are chemical substances can be merged
with the chemical substance file, and the remainder with the end-use file.
To control access to the catalogs and to safeguard confidentiality of
reported information, several techniques are available. One would be to
restrict access to catalogs and files to specified personnel only and
record all information--both confidential and non-confidential--in the
same way. Another technique would be the maintenance of separate catalogs
for confidential data. A third technique would be the use of a prefix
to the ID number of a submission to indicate that a particular report
contains confidential data. The third technique is simplest, and recommended
-54-
-------
[Figure: sample catalog cards, each with columns 0 through 9 across the top;
report ID numbers are posted in the column matching their terminal digit.]
Manufacturer File
  Jones Chemical Co.:   A2073, P0125
  ABC Chemical Co.:     A0432, P5418
Chemical Substance File
  Vinyl Chloride:       A0432, A2073, P0125
  New Substance X:      P5418
End Use File
  New Uses (Varnish):   (no postings shown)
  New Uses (Paint):     P0125
  Paint:                A0432, A2073, P5418
Figure 2
-55-
-------
for that reason.
ID numbers must be assigned and posted in sequential order. It is suggested
that prefixes be used to differentiate between annual and premarket reports,
as well as to indicate confidential material. ID numbers can also be suffixed,
in order to link any subsequent reports to the original submission, or
to reference correspondence regarding a particular report.
2. Retrieval of Information
OTS personnel in need of information from the files would use
the catalogs described above by coordinating appropriate catalog cards
to retrieve specific documents from the file. To illustrate how this would
be done, let us assume that the chronological log of documents contains,
among others, the following reports:
P0125 - Premarket Report of Jones Chemical Co., reporting new
        use of Vinyl Chloride in Paint
A0432 - Annual Report of the ABC Chemical Co., on Vinyl Chloride
        in Manufacturing Paint
A2073 - Annual Report of Jones Chemical Co., on use of Vinyl
        Chloride in Paint
P5418 - Premarket Report on New Substance X announced by ABC
        Chemical Co., for use in Paint
The entries generated in the card catalogs to record the above documents
are illustrated in Figure 2. A researcher who needs production figures
for vinyl chloride used in paint would pull out the appropriate cards from
the chemical substance and end-use catalogs and scan both cards for common
entries. He will note that two document numbers are common to both cards:
A0432 and A2073. Since both are annual reports they would presumably
-56-
-------
contain production figures. The researcher then pulls the reports from
the files, totals the production figures, and returns the reports to the
file. In the course of the search he may note the P5418 entry on the
"paint" card. Since the prefix indicates a Premarket Report, the researcher
may go to the chronological log of incoming documents to get the identity
of the report in order to determine whether or not the report may be of
interest to this particular query.
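Coordinating two cards amounts to taking the intersection of their postings.
The Python sketch below works through the vinyl chloride and paint search
using the four sample reports; the postings are as they can best be read from
the sample log and Figure 2, and the function names are illustrative.

```python
# Postings per card: term -> set of report ID numbers.
CHEMICAL_SUBSTANCE_FILE = {
    "Vinyl Chloride": {"A0432", "A2073", "P0125"},
    "New Substance X": {"P5418"},
}
END_USE_FILE = {
    "Paint": {"A0432", "A2073", "P5418"},
    "New Uses (Paint)": {"P0125"},
}


def coordinate(*cards):
    """Return the report IDs posted on every one of the given cards."""
    return set.intersection(*cards)


def split_by_report_type(report_ids):
    """Separate annual (prefix A) from premarket (prefix P) report IDs."""
    annual = sorted(i for i in report_ids if i.startswith("A"))
    premarket = sorted(i for i in report_ids if i.startswith("P"))
    return annual, premarket


common = coordinate(CHEMICAL_SUBSTANCE_FILE["Vinyl Chloride"],
                    END_USE_FILE["Paint"])
annual, premarket = split_by_report_type(common)
print(annual)  # ['A0432', 'A2073']
```

The premarket entry P5418 appears only on the "Paint" card, so it drops out
of the intersection; as the text notes, the researcher would consult the
chronological log to decide whether it is relevant.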
3. Personnel Considerations
In the Manual System, facility personnel will provide essentially
library-type services from a library consisting of manufacturer reports,
premarket reports, and test data reports. These services can be broadly
classified into three types: (1) accessioning and cataloging; (2) information
retrieval; and (3) file maintenance.
Accessioning and cataloging personnel will be responsible for the processing
of incoming documents and reports. The processing will include functions
such as the following:
(1) preparation of jackets for incoming reports;
(2) assignment of proper accession numbers to manufacturer and
    test data reports to preserve identity and completeness of
    manufacturer submissions;
(3) maintenance of chronological record of receipts;
(4) editing of data reported, for example, translation of
    SIC codes to end-use names; checking and correcting
    spelling of names of manufacturers, chemical substances,
    etc.;
(5) posting of ID number to appropriate descriptor catalogs;
(6) validation of data reported, including identification of
-57-
[Figure 3. Manual Processing: flowchart. Manual processing: accession
control, jacket preparation, catalog posting, flagging of any actions
required, and filing. Weekly file processing: process file for actions
required, prepare notification of required actions and of noncompliance,
and produce summary reports.]
-------
    reports for which contact with reporting establishment is
    necessary;
(7) preparation of short bibliographic citations for test data
    reports;
(8) maintenance of all catalogs on a current basis;
(9) maintenance of auxiliary catalogs, for example, "next number"
    record.
To perform the accessioning and cataloging services, three types of personnel
will be required: Technical Information Specialist, Data Technician, and
Library Technician. The Technical Information Specialist will need sufficient
background in chemistry in order to direct and provide guidance to operations
in this Section. The Technical Information Specialist will also need to
be familiar with the total content and organization of the data base, as
well as with the organization and structure of the various catalogs. The
Data Technician will review and edit all incoming reports for correct spelling
of all entries, for the validity and proper range of production figures
reported, and for the forwarding of invalid submissions to appropriate
OTS personnel. The Library Technician will be responsible for the preparation,
posting, and maintenance of the various card catalogs. The Library Technician
will also be responsible for accessioning of incoming reports and for the
preparation of jackets for each.
In order to estimate the number of each of the above types of personnel
which will be required to perform the necessary services, the following
assumptions are made. Each submission will require 40 entries or postings to
-59-
-------
various catalog cards or logs as described in the Information Flow Section of
this report. In that Section, approximately 32 such entries were identified;
this figure is rounded upward to the next multiple of ten for easy estimating
purposes, and also in order to provide for additional catalogs which may
become desirable once operations are underway.
Time estimates for performing these various functions are presented below.
Minutes
(a) accessioning (entry in chronological
log, assignment of ID number)
4
(b) preparation of jacket
(c) preparation of catalog card (typing
of indexing point to card, from
appropriate catalog)
3
1
(d) posting of indexing terms (remove
proper card from catalog, post ID
number, refile card in catalog)
(e) preparation of bibliographic citation
for test data reports
5
(f) editing/validation of reports
It should be noted here that the above time estimates are intuitive and
not supported by hard data. Literature on this subject is generally
unavailable.
The chronological log will contain one entry per each document received;
test data report volumes are estimated at 1,000 per year for the first
three years and 2,000 per year for the next three years. The card catalogs
-60-
-------
may contain approximately 400 different manufacturers, 4,500 different
chemical substances, an estimated 50,000 different names of end-uses, and
an equal number of different by-products.
Estimated numbers of postings
to each card catalog are presented in Table 3.
Personnel in the information retrieval section will be responsible for
servicing all requests addressed to the facility for any information stored
in the data base.
These services include the following:
(1) retrieval of specific documents (manufacturer, premarket,
or test data reports) from the files;
(2) use of the coordinate indexes to identify documents with
specific characteristics, for example, all reports of
manufacturers of a particular chemical substance;
(3) totaling of production figures from the documents retrieved in
(2) ;
(4) preparation of mailing lists;
(5) reception of visitors and servicing their requests;
(6) flagging of documents to indicate action-required items;
(7) maintenance of tickler files;
(8) periodic alerting of OTS personnel that preparation by OTS
of a particular report or letter is necessary, or publication
in the Federal Register is required;
(9) accessioning and referencing of all correspondence required.
Two types of personnel will be required to perform the above services.
A Technical Information Specialist will direct and provide guidance to
information retrieval activities and perform some of the more complicated
searches of the catalogs. Record clerks will be responsible for the
remaining services specified above.
-61-
-------
The following assumptions are made in order to estimate the number of people
that will be required for this Section.
                                                     Minutes
(a) retrieval of specific document from file            2
(b) preparation of charge record                        2
(c) servicing of queries based on coordination
    of required terms (analyzing question,
    structuring query, coordination of
    appropriate term cards, listing ID numbers)        15
(d) maintenance of tickler files, flagging of
    items, issuing notices of required action           5
The volume of each of the above functions is unknown at this time. For
estimating purposes, it is assumed that one-half of any given year's
accumulation will require services (a), (b), and (d), and that approximately
50 questions will be serviced daily.
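These assumptions translate directly into a workload estimate. The Python
sketch below shows the arithmetic; the 250 working days per year and 2,000
working hours per staff-year are assumptions introduced here for
illustration and do not come from this report, and the accumulation figure
to use for a given year is left as a parameter.

```python
# Minutes per function, from the list above.
MINUTES = {
    "retrieve_document": 2,     # (a)
    "charge_record": 2,         # (b)
    "coordinate_query": 15,     # (c)
    "tickler_and_flagging": 5,  # (d)
}


def retrieval_minutes_per_year(year_accumulation,
                               questions_per_day=50,
                               working_days=250):
    """Estimated minutes of retrieval-section work in one year.

    working_days is an illustrative assumption, not a figure from the report.
    """
    # One-half of the year's accumulation needs services (a), (b), and (d).
    documents_serviced = year_accumulation / 2
    per_document = (MINUTES["retrieve_document"]
                    + MINUTES["charge_record"]
                    + MINUTES["tickler_and_flagging"])
    query_minutes = questions_per_day * working_days * MINUTES["coordinate_query"]
    return documents_serviced * per_document + query_minutes


def staff_years(minutes, hours_per_staff_year=2000):
    """Convert minutes to staff-years; hours_per_staff_year is assumed."""
    return minutes / (hours_per_staff_year * 60)


minutes_1975 = retrieval_minutes_per_year(10_500)
print(minutes_1975, round(staff_years(minutes_1975), 2))
```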
Duties of file maintenance personnel are self-explanatory. These people
will be responsible primarily for the return to the proper shelves of all
material used during a given period, and for ensuring that the file(s)
returned to the shelves are complete. This section will also be responsible
for the maintenance of the log of who had access to which records.
Supervisor and File Clerk personnel types will staff this Section. The
filing unit is assumed to be a complete manufacturer report (all 4 items,
as presented in Exhibit 2), either annual or premarket, submitted by a
-62-.
-------
manufacturer for one chemical substance.
Each report will be filed in its
own jacket which will be labeled' wi th the report ID number 0
The jacket
label will serve as a shelf locator.
In the case of reports submi tted by
a single manufacturer for multiple chemical substances, each report for a
particular chemical substance will be a filing uni t wi th a different ID
number.
The jacket for a multiple submission could, but need not be,
labeled with the range of numbers included in the jacket.
Since report
numbers will be assigned sequentially, the multiple submission will still
be filed together, either in a common jacket or in individual jackets.
Individual jackets do mean increased supplies costs; however, they offer
advantages in preserving file integrity and in proper filing of the collection.
Individual jackets are assumed here for estimating purposes.
To estimate personnel needed, it is assumed that one-half of each year's
accumulation of reports will need to be removed from the files in response
to a request and then re-filed after the documents have served their
purpose. The charge records for the documents will then need to be disposed
of in accordance with whatever procedure is established. Time estimated
per document is 4 minutes. No account is made of the possible overlap
between file maintenance and retrieval activities, though it is obvious
that the file maintenance section will, more often than not, be re-filing
documents.
B. Enhancements to the Manual System
1. Description
This intermediate system is a series of independent progressive
steps to enhance the manual system and to move towards a computerized system
in a controlled sequence. The steps are arranged in two broad categories
and are implicitly batch processes with respect to the Office of Toxic
Substances data base. The need to batch process follows from the low volume
of input data anticipated for the six year period. This period provides an
opportunity to perfect the processing procedures before creating the
computerized data base.
a. Automation of the Clerical Tasks Associated with the
   Indexing of Data Base Documentation
   o Use of automated data entry systems to facilitate
     indexing.
   o Automation of catalog posting and publication.
   o Construction and sorting of a vocabulary of data
     descriptors.
   o Development of a comprehensive Data Element Dictionary
     across all OTS operation units and in cooperation
     with other EPA offices.
   o Preparation of selected and sorted bibliographies.
b. Mechanization of Administrative Clerical Tasks
   o Automation of mailing list and tickler file for
     action-required correspondence.
   o Use of microfilm as a storage medium for working files.
   o Development of simple information retrieval through
     a computerized microfilm retrieval system.
It should be noted that these steps do not require reducing the annual
reports or test data reports to machine-readable form and constructing a
data base from them. Keypunching the reports themselves marks the start of
the Computerized System.
The steps described above require the accumulation of machine-readable
input over a period of time and then performing relatively straightforward
sorting and listing functions. The period of performance could be monthly
or even quarterly as need and costs determine. It should be noted, too,
that the archival paper files are outside these processes and thus their
integrity, completeness, and security are safeguarded.
2. Indexing Aids
Several existing systems ranging from mini-computers to self-standing
data entry systems permit the user to compose his data on the face of a
CRT and then preserve this data on machine-readable media. The storage
media range from "floppy disc" and cassette tapes to standard computer
magnetic tapes and disc units.
Essentially these systems give the user
great flexibility in correcting and formatting his ideas as compared to
a typewriter.
Further, the systems can initiate any input process with
a structured "form" projected on the screen to prompt the user to provide
a complete and more accurate input.
Beyond 1980 the system would first call for the information necessary to
prepare a standard OTS bibliographic citation, then the system would permit
the abstractor to compose an abstract which will be stored with the citation.
Finally, the abstractor would be required to insert one or more descriptors
which will be used to index and catalog the document.
At the outset the
abstractor may only index and catalog.
3. Catalog Posting and Publication
In a manual system the indexing and posting onto several catalogs
is a tedious and time-consuming task with an inevitable chance of error.
The costs remain a fairly constant function of the number of additions
to the data base. The addition of a new indexing point and, consequently,
a new catalog represents a major cost increment. A catalog need not be
a card file, however, and several libraries have replaced card files with
bound computer listings. After the indexer has assigned values for the
various indexing points, the values can be machine entered, rearranged,
sorted, and listed to form the several catalogs. The programming is
straightforward and operational costs represent a few hundred dollars of
processing time.
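The enter, rearrange, sort, and list sequence described above can be sketched in a few lines. This is a minimal illustration, not the report's design; the record layout and the field names `manufacturer` and `substance` are assumptions made for the example.

```python
from collections import defaultdict


def build_catalogs(entries, index_points):
    """Return one sorted listing per indexing point.

    entries: list of dicts, each holding a report "id" plus one value
    (or a list of values) for every indexing point. The layout is
    illustrative only.
    """
    catalogs = {}
    for point in index_points:
        postings = defaultdict(list)
        for entry in entries:
            values = entry.get(point, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                postings[value].append(entry["id"])
        # Headings in alphabetical order, report IDs ordered under each.
        catalogs[point] = [(heading, sorted(ids))
                           for heading, ids in sorted(postings.items())]
    return catalogs


entries = [
    {"id": 2, "manufacturer": "Beta Co", "substance": ["toluene", "benzene"]},
    {"id": 1, "manufacturer": "Acme", "substance": "benzene"},
]
catalogs = build_catalogs(entries, ["manufacturer", "substance"])
```

Adding a new catalog in this arrangement means adding one name to `index_points`, which is the contrast with the manual system's major cost increment.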
4. Data Descriptor Vocabulary
A means to control the use of descriptors in the indexing process
is the development and definition of a standard vocabulary of data descriptors.
The standard vocabulary is published and all indexers are constrained to
use the terms in the vocabulary.
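One way such a constraint might be enforced at data entry is to screen each indexer's descriptors against the published list. The sketch below assumes, for illustration, that terms are compared after trimming blanks and converting to upper case; the sample vocabulary is hypothetical.

```python
def check_descriptors(assigned, vocabulary):
    """Split an indexer's descriptors into accepted and rejected terms."""
    normalized = {term.strip().upper() for term in vocabulary}
    accepted, rejected = [], []
    for term in assigned:
        key = term.strip().upper()
        if key in normalized:
            accepted.append(key)
        else:
            rejected.append(key)
    return accepted, rejected


# Hypothetical vocabulary and indexer input.
vocabulary = ["CARCINOGEN", "SOLVENT"]
accepted, rejected = check_descriptors(["solvent", " Pesticide "], vocabulary)
```

Rejected terms would go back to the indexer, or forward as candidates for the next edition of the vocabulary.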
5. Data Element Dictionary
Language (and its use) presents one of the major obstacles to
the successful development of data base systems.
Wherever possible the
OTS should ensure that all its components are using the same terms
in the same ways.
Experience indicates that the most successful way to
achieve this objective lies in carefully prepared and maintained vocabulary
control procedures.
A data element dictionary provides a specific name for a datum that the
data base will eventually contain and a written, universally available
definition of that datum. For example,
COMPANY-NAME     From line 1 of the annual report. This
                 field contains the official and legal name
                 of the company to which this data applies.
CONTACT-POINT    From line 12 of the annual report. This
                 field contains the name and mailing address
                 of the person who is the official contact
                 point between the OTS and the company
                 identified in this record.
The above example shows the manner in which a field receives a unique
identifier and how the definition describes the data that will be in that
field.
In a data base as large as that anticipated by the OTS, the Data
Element Dictionary will reach a size that justifies using computer techniques
to maintain, update and report its content. Such a file is usually called
the Data Directory.
The Data Directory file also has a structure that can be described.
Description of all the information needed for the Data Directory about
each datum in the OTS Data Base will provide a useful technique for
collecting the information and an important management tool for controlling
the OTS Data Base.
For management control, the Data Directory may contain the names of the
organizational unit responsible for maintaining the accuracy of each datum
or for requesting its inclusion in the data base.
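A Data Directory of this kind can be pictured as a small keyed file of element descriptions. The sketch below is one possible layout, not the report's specification; the class names and the `responsible_unit` value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str                   # unique identifier, e.g. COMPANY-NAME
    source: str                 # where the datum is found
    definition: str             # written definition of the datum
    responsible_unit: str = ""  # organizational unit (hypothetical values)


class DataDirectory:
    """Keyed collection of data element descriptions."""

    def __init__(self):
        self._elements = {}

    def add(self, element):
        # Each field must receive a unique identifier.
        if element.name in self._elements:
            raise ValueError(f"duplicate data element: {element.name}")
        self._elements[element.name] = element

    def lookup(self, name):
        return self._elements[name]

    def report(self):
        # Alphabetical listing of names and definitions.
        return [f"{e.name}: {e.definition}"
                for e in sorted(self._elements.values(), key=lambda e: e.name)]


directory = DataDirectory()
directory.add(DataElement(
    "CONTACT-POINT", "line 12 of the annual report",
    "Name and mailing address of the official contact person."))
directory.add(DataElement(
    "COMPANY-NAME", "line 1 of the annual report",
    "Official and legal name of the company.",
    responsible_unit="Unit A (hypothetical)"))
```

Rejecting duplicate names at `add` time is what keeps every identifier tied to exactly one definition.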
6. Bibliography Preparation
Simple bibliographies of test data documents can be prepared
from computerized lists of bibliographic citations and indexing points or
terms. By sorting and/or selecting on keyword-in-title or indexing terms,
lists of similar documents can be produced.
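Selection on keyword-in-title can be sketched as follows. The citation layout (`id`, `title`) and the sample titles are invented for illustration; matching is on whole words, so a search for "benzene" does not pick up "chlorobenzene".

```python
import re


def title_words(title):
    """Lower-cased words of a title, with punctuation dropped."""
    return re.findall(r"[a-z0-9]+", title.lower())


def select_by_keyword(citations, keyword):
    """Return citations whose title contains the keyword, sorted by title."""
    key = keyword.lower()
    hits = [c for c in citations if key in title_words(c["title"])]
    return sorted(hits, key=lambda c: c["title"].lower())


# Invented sample citations.
citations = [
    {"id": 3, "title": "Toxicity of Benzene in Rats"},
    {"id": 1, "title": "Benzene: Inhalation Studies"},
    {"id": 2, "title": "Chlorobenzene Production"},
]
bibliography = select_by_keyword(citations, "benzene")
```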
7. Automated Mailing List and Tickler File
The mailing list of respondents and the action-required information
can be combined in a single periodic computer run to provide a management
report that can highlight potential problem areas and can assist in planning.
8. Microfilm Working Files
The creation and use of microfilm working files lessens the threat
to the integrity of the archival paper files and introduces a degree of
redundancy that permits the reconstruction of the OTS data base in
case of catastrophe.
9. Microfilm Retrieval Systems
Several existing microfilm retrieval systems permit topical
searches either through information coded on the microfilm or through
computer-assisted indexes stored in computer memory. The latter class
of computer-assisted microfilm systems generally uses minicomputers but
can also be part of a time-shared environment with large computer systems
and distributed terminals. The number of indexing points encoded is
limited and access times are constrained by microfilm format and mechanical
limitations.
10. Personnel Considerations
The tasks described in this Section are separate and independent
functions which can be satisfied by normal service functions. Therefore,
no specific skills need be hired to accomplish the tasks but the service
organization working on the tasks will require skills as described in the
discussion of the computerized system.
However, support of some of the systems on a continuing basis after the
initial development will require (or at least be improved by) experience
in toxicological indexing and abstracting. These skills may be acquired
and kept within the OTS. The indexing keypunchers are an example of such
skills.
To avoid the recurring problem associated with introducing automation to
existing functions, OTS would do well to seek employees possessing manual
skills who are also familiar with automation and its impact. Successful
development of the computerized enhancements to the manual system will be
very dependent on the support and cooperation of the existing personnel.
C. The Computerized System
Basically, all industrial reports are processed in a similar fashion.
However, the periodicity chosen for the process in each case will reflect
the volume of input for each report type. Annual reports with the largest
input volume may require daily processing while premarket reports may
require weekly or even monthly processing.
Each period's collection of computer input is "batched" into one "input
transactions" file and applied against the OTS data base in a serial fashion.
As the transactions are applied, prestored programs can perform such tasks
as:
o validity check of transactions;
o validity check of the effect of the transaction on the
  data base;
o checking and reporting any aberrant or noteworthy
  results from the update (e.g., new totals exceed
  defined acceptable norms);
o providing administrative reports on actions required by
  OTS or manufacturers;
o preparation of mailing lists.
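The first three tasks in the list above can be illustrated with a short serial update routine. This is a sketch under stated assumptions: the transaction fields (`report_id`, `substance`, `quantity`), the duplicate-report rule, and the `norm` threshold are all hypothetical stand-ins for whatever checks OTS would actually define.

```python
def apply_batch(data_base, transactions, norm=1000.0):
    """Apply one period's transactions serially against the data base.

    Returns (accepted, rejected, warnings). Field names and the norm
    are illustrative assumptions.
    """
    accepted, rejected, warnings = [], [], []
    for t in transactions:
        # Validity check of the transaction itself.
        quantity = t.get("quantity")
        if (not t.get("report_id") or not t.get("substance")
                or not isinstance(quantity, (int, float)) or quantity < 0):
            rejected.append((t, "malformed transaction"))
            continue
        record = data_base.get(t["substance"], {"total": 0, "reports": []})
        # Validity check of the transaction's effect on the data base.
        if t["report_id"] in record["reports"]:
            rejected.append((t, "report already posted"))
            continue
        record["reports"].append(t["report_id"])
        record["total"] += quantity
        data_base[t["substance"]] = record
        accepted.append(t)
        # Report noteworthy results, e.g. new totals beyond the norm.
        if record["total"] > norm:
            warnings.append(f"{t['substance']}: total exceeds norm")
    return accepted, rejected, warnings


data_base = {}
batch = [
    {"report_id": "R1", "substance": "benzene", "quantity": 600},
    {"report_id": "R1", "substance": "benzene", "quantity": 600},
    {"report_id": "R2", "substance": "benzene", "quantity": 500},
    {"report_id": "", "substance": "toluene", "quantity": 1},
]
accepted, rejected, warnings = apply_batch(data_base, batch)
```

The rejected list, with its reasons, is the raw material for the administrative reports on actions required.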
[Figure 4. OTS/DMS Schematic Data Flow]
At the end of the periodic processing, the computer can provide administrative
information for planning purposes in the form of summary reports or audit
trails.
Queries can be applied against the data base in two general forms.
Inquiries
which are frequently requested can be prestored as "canned" routines.
These can be used periodically to provide the kinds of reports required
by the operating units.
Other special reports will require specific coding
tasks to provide the necessary responses but this process will be materially
eased by the data management system with its precisely structured data
and available software components.
Most systems will provide results in
one or two days.
1. Information Flow
a. Mailroom
Because of the confidential nature of many of the reports,
special precautions will begin with the arrival of registered mail and
a system of accountability traced from the first person signing for the
registered mail. The receipted document will be given a control (or
accession) number which will uniquely identify the document and be
associated with it throughout its existence within the data base. As the
number of reports and the number of people using the reports increase, the
need to formalize the control of access to the reports will increase, also.
This may lead to the first of several administrative files to permit the
ready location of a certain document or to reveal who has had access to
that document.
The mailroom will also provide the first sort of the incoming material
into the various categories. This discussion is based on the flow of
annual reports.
b. Microfilming
The previously assigned control number will be used to
identify the microfilmed report and will become a part of the microfilm
record, the original document, and subsequent machine-readable records.
The microfilm copy will become the working file and will be available to
users in various forms as the needs of the users and the size of the data
base evolve. The form of the microfilm storage will determine the manner
in which the machine-readable records direct the user to a particular
microfilm. For example, if the data is stored on reels of microfilm,
the computer record could identify the reel number and position on the
reel.
c. Data Entry
After microfilming, the original documents will be forwarded
to the data entry section where certain data from the document will be
converted to a structured, machine-readable form for entry into the computer
for processing.
Proper design of the input document will reduce decisions made at this
level to the absolute minimum. Keystroking, in any of its several forms,
should be a fairly mechanical task. Several of the newer forms of
keystroking permit the operator to record directly on magnetic tape and
present the input on a CRT screen so that corrections are immediately
and easily made.
Data validation highlights the importance of reducing the keystroking
process to a mechanical task. The most common way to insure that the
data input to the computer correctly represents the original document is
to keystroke the data twice and then compare the two records. In most
systems the operator is notified immediately of a discrepancy and is
forced to correct it. Data validation has a profound effect on reducing
the number of data errors in the data base.
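The key-twice-and-compare technique amounts to a field-by-field comparison of two independently keyed records. A minimal sketch, with invented field names and values:

```python
def verify_keying(first_pass, second_pass):
    """Compare two keyings of the same document.

    Returns a list of (field, first_value, second_value) tuples, one
    for every field on which the two passes disagree.
    """
    discrepancies = []
    for field in sorted(set(first_pass) | set(second_pass)):
        a, b = first_pass.get(field), second_pass.get(field)
        if a != b:
            discrepancies.append((field, a, b))
    return discrepancies


# Invented sample record keyed by two operators.
first = {"COMPANY-NAME": "ACME CHEMICAL", "QUANTITY": "1500"}
second = {"COMPANY-NAME": "ACME CHEMICAL", "QUANTITY": "1800"}
problems = verify_keying(first, second)
```

An empty result lets the record pass; anything else goes back to the operator for correction against the original document.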
Keystroking reduces the input into a set of machine-readable characters
acceptable to the target computer. This set is usually restrictive in
the sense that many characters normally found in the original documents
are not available in the machine. For example, the absence of Greek
letters will constrain the description of certain chemicals. Standard
rules for recovery from such difficulties should be prepared for use
by keystroking operators. (For example, using "alpha", "beta" instead
of their symbols.) The set of characters chosen should not unnecessarily
limit the ability of OTS to change vendors. Therefore, the chosen
character set should be one that is widely available on many computers.
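A standard substitution rule of the kind suggested above might look like the following sketch. The four-letter Greek table and the restricted `ALLOWED` character set are assumptions chosen for illustration, not a character set named in the report.

```python
import string

# Spelled-out replacements for symbols the target machine lacks.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta"}

# Hypothetical restricted character set: upper-case letters, digits,
# blank, and a few punctuation marks.
ALLOWED = set(string.ascii_uppercase + string.digits + " -,.()/'")


def to_keyable(text):
    """Apply the substitution rule, then list characters still unkeyable."""
    for symbol, name in GREEK_NAMES.items():
        text = text.replace(symbol, name)
    text = text.upper()
    unkeyable = sorted({ch for ch in text if ch not in ALLOWED})
    return text, unkeyable


keyed, leftover = to_keyable("β-naphthylamine")
```

Characters reported in `leftover` mark the cases that still need a written rule for the keystroking operators.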
Certain data which may appear on the original reporting document are
not readily reducible to machine-readable form. This data includes
graphics, mass spectrograms, chemical structure figures, charts, and
drawings. Additionally, cost effective considerations will argue against
inputting to the computer lengthy narrative descriptive material or
even abstracts of such material for full text searching.
d. Encoding, Abstracting, and Quality Control of Data
Encoding is the reduction of a concept, whether simple or
complex, to a single datum, either a word, alphanumeric coded character
string, or numeric value. Abstracting is the reduction of a lengthy
narrative description into a succinct equivalent. Both processes ought
to reduce the size of the explanation without losing any information or
meaning. Quality control of the data base is the continuous process of
assuring that the data base is complete, accurate, and current.
Computers can check for clerical errors such as missing data, erroneous
or nonsensical codes, and numbers outside valid ranges but cannot check
miscoding or make correlations that human checkers can. The quality of
the data base should be the specific responsibility of a quality control
unit that will make decisions and use all available resources to insure
the highest integrity and completeness of the data base.
Clearly, maintaining the integrity of the data base requires a resource
expenditure which increases as the requirement for accuracy increases;
the more spent, the better the integrity. Indirectly, the data base is
quality checked by the users who point out inaccuracies and incompleteness.
But it should be noted that increased usage justifies increased quality
control -- often as a result of user screams.
Certain portions of the annual reports will require human intervention
in order to understand and abstract the reports being submitted. For
example, the Bureau of Census has a very structured reporting form
reflecting their wealth of experience in collecting and building data
bases of information, but they have found the need to use a category
coded "other" for the inevitable surprises that respondents produce.
The "other" code requires a technician to analyze and record the unexpected
answer. Furthermore, the use of "other" codes should be monitored in
order to provide for new fields or changes in existing ones, as soon
as such needs arise.
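Monitoring the use of "other" codes can be as simple as a periodic tally per field. In the sketch below the code value `"OTHER"`, the 5 percent default threshold, and the sample field names are assumptions made for the example.

```python
from collections import Counter


def fields_needing_review(records, threshold=0.05, other_code="OTHER"):
    """Return {field: share coded "other"} for fields above the threshold."""
    totals, others = Counter(), Counter()
    for record in records:
        for field, value in record.items():
            totals[field] += 1
            if value == other_code:
                others[field] += 1
    return {field: others[field] / totals[field]
            for field in totals
            if others[field] / totals[field] > threshold}


# Invented sample of encoded records.
records = [
    {"use": "OTHER", "state": "NJ"},
    {"use": "SOLVENT", "state": "NY"},
    {"use": "OTHER", "state": "OTHER"},
    {"use": "DYE", "state": "PA"},
]
flagged = fields_needing_review(records, threshold=0.3)
```

A field that stays on this list from run to run is a candidate for a new code or a new field.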
2. Data Entry Considerations
Data entry, the point where information enters the system,
represents a highly crucial point in the flow of information from the
manufacturers of toxic substances to the OTS.
Data entry is the largest
cost element in any machine-oriented system.
It is prone to exaggerated
errors because of the cascading misinformation resulting from bad input.
Bad design in the data entry aspects of a computer system foredooms the
entire system.
This Section discusses various methods of data entry in
order to review options without a close scrutiny of the feasibility or
economics of the methods for the OTS application.
To place data entry in context, data entry begins at the point where
OTS acknowledges receipt of incoming data.
At some point, some agent
of OTS must decide whether the data falls within the scope of the OTS
data base and should, therefore, be included.
Data entry ends with the
insertion of all pertinent data within one or more storage units of the
system.
In terms of design, data entry costs should help determine the eventual
scope and size of the OTS data base as well as the frequency of input
processing. Similarly, the design of the data file structures,
interrelationships, and storage media will affect the design of data
entry systems. This means an inevitable iterative approach to designing
both.
[Figure 5. Computerized Processing. Legible box labels: batch week's input
reports; update OTS data base; special queries; special reports.]
From a procedural standpoint, data entry rests heavily on the quality
of the input documents, the training and capabilities of the data entry
personnel, and the accession methods chosen in each distinct file. No
manager should be surprised that computer system design must reflect the
question of personnel availability and that using the computer to replace
clerical help often results in higher operating costs. Whether made
consciously or not, the decision between faster response and further
dependence on clerical staff often ignores the costs involved.
Data entry methods fall into two broad areas: keystroke and optical
character recognition (OCR).
Keystroke methods depend on the human eye, brain, and fingers.
All
data flow from the document to the storage medium, therefore, rests on
the inherent weaknesses and strengths of people.
Examples of keystroke
methods are:
o Cardpunching. This classic method of data entry places
  data on 80-column punch cards. This method is characterized
  by many inherent limits, such as difficulty in correcting
  errors, ease of misarrangement of data, large storage
  requirements, very slow input rates and consequent high
  costs. But this method is universally used, nevertheless.
  This last fact, always surprising, means the ready
  availability of personnel, job shops for peak loads, and
  competitive pricing. (Some shops now use Korean and
  Taiwanese help to cut costs.)
o Key-to-tape. The keystrokes record directly on a
  magnetic tape and result in much larger records, ready
  error correction, reduced media and computer entry costs,
  and very reduced storage requirements. Though usage is
  growing rapidly, key-to-tape systems are several orders
  of magnitude less available than card punching systems.
o Key-to-drum. Though quite similar to the key-to-tape
  method in that the keystrokes record directly on a magnetic
  storage medium, the random storage structure of the drum
  permits the "clustering" of many input terminals around
  one drum. The necessary control for such clusters of
  one device is often a minicomputer.
o Intelligent terminal. This method resembles the key-to-
  drum above except that the control computer can be used to
  structure the input in an interactive dialogue with the
  terminal operator. In effect, the terminal asks questions,
  displays the proper format, and indicates detected errors.
  Additionally, the configuration of the intelligent terminal
  may contain communication links for direct transmission
  of data to the main system and its storage facilities.
o Re-typing in OCR font or format. This is a special form
  of keystroking or mark-sensing which reformats documents
  not originally prepared in a form acceptable to OCR. Its
  main advantage over other forms of keystroking lies in
  the output which both humans and machines can read.
Mark-sensing methods permit a machine to read directly a previously prepared
document. Mark-sensing includes such options as sensing special pencil
marks, Magnetic Ink Character Recognition (MICR), point-of-sale data
recognition devices, etc. However, this report considers only optical
character recognition.
OCR permits the scanning of printed material directly by mechanical devices
to translate the data into machine-readable form with little or no human
intervention. One group of OCR devices uses microfilm as input.
A major disadvantage of OCR lies in the initial capital expenses of
acquiring this fairly sophisticated equipment. However, current technology
indicates significant economies for unit processing costs. For OTS this
means using vendors who have absorbed the initial costs or who have amortized
them over a period of time. Such vendors do exist in the Washington, D. C.
area.
OCR offers particular advantages when the user has control over the format
and structure of the input material. Further, the process permits scanning
only a portion of the material first and performing a more complete scan
at a later time.
3. Data Management System Selection
The data management system selected to provide the major portion
of the processing should be a proprietary package available to all potential
vendors and to ADP facilities within EPA or other important government
agencies. By selecting a proprietary package from a reputable supplier,
OTS is reasonably assured that the critical kernel of its information
handling capability will have continuing maintenance and support. Further,
certain proprietary data management systems have implementations on several
vendors' equipment and offer some degree of flexibility in selecting
contractors.
As a guide to selecting one data management system, from among the many
available, to serve the OTS data base, note the following important
considerations:
o Interface with a common procedural computer language;
o Flexible report writer capability;
o Basic orientation to sequential file processing, but
  extensible to random accessing;
o System support of variable length files and fields,
  with grouping in a hierarchical data structure;
o Extensible to remote processing;
o Audit trails of data processing activities;
o Extensive data validation features;
o Ease of file loading (i.e., inputting the initial
  values to a newly defined file);
o Flexible repertoire of file commands;
o Output control with selective language facilities
  which permit constraining the output to pertinent
  areas of the files.
4. Data Management System Review
Of the approximately 40 data management systems reviewed,
three were selected as candidate systems with the necessary capabilities
and features required by OTS data processing needs. These candidate
systems are GIS, RAMIS, and System 2000; summary descriptions of each
are presented separately at the end of this Section.
Common to all of these systems are the capabilities to perform the
functions basic to all data base management systems. Each has a data
description language which enables the user to define both logical and
physical characteristics of the OTS data base. Each has a capability to
coordinate and supervise the hardware and software components necessary
to process the data. Each provides the capability for all users to
process required data without concern for its physical location or for
its logical relationships with various user application programs.
In addition to the capabilities common to all data management systems,
the candidate system must have other capabilities to satisfy requirements
peculiar to OTS. These capabilities are: (1) computational facilities
which include arithmetic, logical, and relational operations; (2)
hospitality to user application programs necessary to provide other
computational services not included in the candidate system but required
by OTS; (3) multi-file processing; (4) sorting of records in user-specified
order; (5) flexible, user-specified reporting formats; (6) hospitality to
a hierarchical data structure, with variable length records, permitting
multiple occurrences of fields at all hierarchical levels; (7) audit
trails; and (8) data validation facilities.
The above specifications are based on current understanding of OTS
requirements. These requirements are reviewed below, with discussions keyed
to the numbers in the preceding paragraph, as well as to paragraph numbers
in the summary descriptions.
(1) Computational facilities provided with many data
management packages include arithmetic operations (add, subtract, multiply,
divide); Boolean operations (and, not, or, and combinations of these);
and relational operations (equal to, less than, greater than, etc. and
combinations of these). OTS has indicated a need for all of these
capabilities; and all candidate systems have them.
(2) Hospitality to user application programs is necessary
to supplement computational facilities not provided by the data management
system in order to enable the use of existing (or proposed) OTS application
programs with the OTS data base. To date, one application program has
been identified - an EPA program which relates CAS numbers to names and
synonyms of chemical substances. All candidate systems can accommodate
application programs written in procedure-oriented languages.
(3) Multi-file processing, or the capability to open more
than one file at a time, is necessary principally for efficient file
management. Two files have been identified as probable initial
components of the OTS data base. These files comprise records of disparate
structures and lengths; individual records within files may have
disparate rates of activity. For example, address portions of annual reports
will need to be retrieved much less frequently than annual production
figures for a given chemical substance. There may be other such records
whose volume of activity or similarity of structure will warrant file
segmentation in order to optimize overall storage and processing efficiency.
(4) Sorting of records in user-specified order is a
requirement included principally to provide some control over the information
retrieved from the data base in response to a query. As an example,
information satisfying search criteria in a particular query may be output
in the order in which the source data was entered, or in some order
peculiar to the data management system in question. This may not always
be the best way to present the information to the user who may need mailing
lists, for example, arranged alphabetically by manufacturer, or annual
production figures for a chemical substance listed in descending order
of magnitude.
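The two orderings mentioned above (alphabetical by manufacturer, descending by production) can be combined in a single user-specified sort. The sketch below is illustrative; the field names `manufacturer` and `production` and the sample rows are invented.

```python
def sort_records(records, keys):
    """Sort records on several keys.

    keys: list of (field, descending) pairs, most significant first.
    """
    result = list(records)
    # The built-in sort is stable, so sorting on the least significant
    # key first and the most significant key last gives the combined order.
    for field, descending in reversed(keys):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result


# Invented sample rows retrieved by a query.
rows = [
    {"manufacturer": "Beta Co", "production": 10},
    {"manufacturer": "Acme", "production": 5},
    {"manufacturer": "Acme", "production": 9},
]
ordered = sort_records(rows, [("manufacturer", False), ("production", True)])
```

Here each manufacturer's figures appear together, largest first, while the stored order of the source rows is left untouched.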
(5) Flexible, user-specified formats will provide additional
latitude in specifying print or display formats for computer output.
This includes options such as user-supplied report titles or column headings,
PACKAGE:  GIS (Generalized Information System)
SUPPLIER: IBM Corp., White Plains, N.Y. 10601
COST
  Purchase:
  Rental:   $450 per month for Basic Retrieval System, prerequisite for
            all features
SYSTEM REQUIREMENTS
  Computer(s):       IBM 360/70 (with decimal arithmetic feature)
  Core Storage:      113K bytes; additional 10K bytes with teleprocessing
                     feature plus 15-30K bytes if online terminals are used
  Auxiliary Storage: Minimum of three 2311 disks recommended
  Input/Output:      Card, printer, disk
  Operating System:  OS
(1) Computational facilities: "Standard logical operators (e.g., greater
than, not equal, between)." Other operators included with system options:
increase, decrease, multiply, divide; total; average; unicount (number of
unique values of specified fields); scanning for occurrences of specified
strings; detection of increases/decreases in field contents since last
tested; and detection of fields with blanks or zeroes.
(2) User Program Interface: User-written code must perform file maintenance
with the Basic Retrieval System; file modify and update options necessary
for GIS to perform file maintenance. The BRS can pass control to user
programs written in COBOL, FORTRAN, PL/1, or assembler language and provide
the program with data retrieved in a query.
(3) Multi-file Processing: Up to 3 files concurrently; optional feature
extends multiple processing to a maximum of 16 files.
(4) Sorting: Alphabetic, ascending, descending on up to 64 keys.
Maximum of 115 fields can be extracted from up to 3 files and assigned to
maximum of 16 temporary files for sorting and printing independently.
(5) Reports: Optional features control page formatting (size, spacing,
labels), subtotals, data suppression, and offline printing capability when
operating from online terminals. Utility option provides capability to
maintain library of source language programs and precompiled procedures.
(6) Data Structure: Basic system supports non-hierarchical files processed
in sequential or indexed-sequential mode. Hierarchic file support option
permits repeating groups at up to 15 levels, with only 1 repeating group
permitted at any particular level, except the lowest.
(7) Audit Trails: Access control is provided at both file and field
levels.
(8) Input Checks: The Edit and Encode option provides for editing of input
data which includes range and validity checks based on specifications from
the file definition. It also provides for the automatic encoding of source
data values. File and field access controls govern authority to retrieve
and modify data.
COMMENTS: IBM Class A Program Product. Current applications are in
industries requiring querying of large data bases; number of installations
not indicated.
PACKAGE: RAMIS (Rapid Access Management Information System)
SUPPLIER: Mathematica, Inc., Princeton, N.J. 08540
COST
Purchase: $21,000
Rental: $500 per month; $3,000 extra for TS version
SYSTEM REQUIREMENTS
Computer(s): IBM 360/40 and up; IBM 370
Core Storage: 128K
Auxiliary Storage: 1 disk
Input/Output: terminal; tape; card reader; printer; disc
Operating System(s): OS
(1) Computational Facilities: Mathematical operations permit running
calculations to be performed on data in given fields. These operations include
average, maximum, minimum, average sum of squares, and percentages. Other
operators include arithmetic (plus, minus, divide, multiply); relational
(equal to, less than, greater than, not equal to, and combinations of these);
functional (minimum, maximum, square root, exponentiation, log, absolute value,
integer part) and logical (and, or, if, then). Other facilities offered include
concatenation, date conversion, and field masking.
(2) User Program Interface: Host language interface allows programs
(COBOL, FORTRAN, and PL/I) to access (by using field names) and change
RAMIS records directly. Data returned from one or more files can be retained
for further processing within RAMIS, saved as sequential records that can
be passed to other programs, or incorporated into reports.
(3) Multi-file Processing: Up to 10 files can be linked together for
processing as a set.
(4) Sorting: "Various types" (from next paragraph).
(5) Reports: May be produced in standard formats or in formats specified
by the user, and may include results of calculations which provide column
totals, sub-totals, row totals, percentages; as well as various sorts and
edits. An option is an output package that generates histograms and plots
with two or more axes.
(6) Data Structure: Hierarchies are established through tree-structured
organization of files with cross referencing. Linked lists of fields are
maintained and accessed by the system through indexes. A maximum of 24
levels is allowed with repeating groups with variable numbers of occurrences
for each group.
(7) Audit Trails: Maintenance of log of rejected records (in file updates).
(8) Input Checks: Edit and audit checks on all files; directory name
checked against field names in search requests.
COMMENTS: First installed in 1967; 500 copies now in operation throughout
the world. Available through National CSS time-sharing service at $600 per
month.
USER COMMENTS: "A well-established package that has been enthusiastically
received in the marketplace. Users contacted are extremely happy with
system. They say data can be easily obtained when needed by nonprogrammers -
including technical, administrative, and clerical personnel. . . . Installation
and training support are rated good to very good; ongoing maintenance very
good to excellent. Supplier personnel respond rapidly and effectively to
all customer problems. Documentation is considered adequate but improving.
Minor deficiencies surfaced, most seem unique to a particular user. One
user has 100K-byte partitions, and these are too small for RAMIS. Other
problems encountered were associated with the system's inability to handle
other data files without previously converting to RAMIS format, which means
that duplicate files must be maintained when files are used by both RAMIS and
other applications. However, a host language interface can be used to
tie RAMIS files to the system." (Auerbach)
PACKAGE: System 2000
SUPPLIER: MRI Systems Corp., Austin, Texas 78766
COST
Purchase: $25,000 (paid-up lease, one time charge)
Rental: $740 per month; $960 rental with option
SYSTEM REQUIREMENTS
Computer(s): CDC 6000 and Cyber 70 Series; IBM 360/40 and up;
IBM 370/135 and up; Univac 1100 series
Core Storage: 16K decimal words, 90-150K bytes, 22K words
Auxiliary Storage: disk and at least 2 mag tape drives if Sequential
File Option is used (for all computer systems)
Input/Output: standard I/O for each computer system
Operating System: Scope, Kronos, OS, DOS, Exec 8
(1) Computational Facilities: The following relationships may be specified:
logical (and, or, not), comparative (equal, not equal, greater, less, or
combinations of these), range (for example, from 30 to 40 lbs), and existence
(whether date of a transaction is entered). With the Immediate Access
Module, calculations such as tally, sum, average, maximum, and standard
deviation can also be specified.
(2) User Program Interface: The Procedural Language Feature enables users
to process System 2000 data with application programs written in any part
of the data base, retrieve data in desired sequence and format, and update
the data base with the application programs.
System 2000 statements are embedded in these programs and, when encountered,
they activate System 2000 through the Interface Module.
(3) Multi-file Processing: Can process entire files from tape, or files
that are divided between tape and disk. Efficient support is available for
data bases that range in size from a few thousand to hundreds of millions
of characters.
(4) Sorting: The Immediate Access Module permits users to specify sorting
on up to 40 keys.
(5) Reports: The Report Writer provides extensive formatting flexibility
including column, row, and page headings; dates; footnotes and other
explanatory notes; nested control breaks with totals and sub-totals; and
ordering of report contents. Reports can also contain computed values whose
formats are derived from their field contents and literals. Report formats
can be cataloged and stored to avoid repetitive redefinition.
(6) Data Structure: Inverted file structure oriented around the repeating
group can theoretically accommodate 32 levels. Practical limit, however, is
8 to 10 levels. Up to 430 data elements may be included in one or more
repeating groups.
(7) Audit Trails: Creates machine-legible file of updating transactions
that can be used with archive copy of data base for audit or backup purposes.
Password control for entire files; additional passwords may be used to
control access to individual fields.
(8) Input Checks: Source data checks against user's file definition.
COMMENTS: The package was first released in June 1970 and is now (October
1973) installed in more than 40 locations. Also offered through various
service bureaus nationally, and internationally through CDC's Cybernet
time-sharing network.
USER COMMENTS: "System 2000 users who were contacted feel that it is an
outstanding package. They indicate that the package is easily used for
all data management functions, emphasizing its flexibility in redefining
file formats and its retrieval capabilities that permit nonprogrammers to
structure rapidly both batch and on-line requests. The package is used
easily and effectively with programs written in other procedural languages.
These users indicate that System 2000 accomplishes its functions rapidly;
they attribute this primarily to indexes and other structural pointers in
the data base. However, the increase in speed is achieved at the cost
of storage space. In this tradeoff, which is almost unavoidable, the users
prefer the speed of System 2000.
Support provided with the package is rated very good. Supplier personnel
are knowledgeable and cooperative, and they maintain close contact with
System 2000 users." (Auerbach)
"Users of System 2000 contacted by Datapro 70 were unanimous in their
praise for the product and its vendor. The users stated that System 2000
performs as advertised and yields data base expansion ratios of 1.5:1 or
less when the user keeps within the 20% keyed item ratio recommended by
MRI. Datapro interviewed users with IBM System 370 and CDC Cyber 70
Systems. One user had particular praise for two features in the latest
release he's received: support for 1000 elements (up from 430) and multiple
data base support in procedural languages (which allows the user to work with
second, third, and fourth data bases while System 2000 automatically keeps
track of the user's position in each data base). This user also plans to
adopt the multiple-user option of System 2000. All users said that response
times for data base inquiries were very fast. In general, System 2000 offers
excellent potential for high-speed on-line information retrieval based on
complex requests." (Datapro)
5. System Configuration
The crucial kernel of the computerized information handling
capabilities is an off-the-shelf proprietary data management system selected
for use on the OTS data base. In this discussion, the System 2000 Data
Management System is used as an example of a typical data management system
and is used to provide the parameters for estimating the system configuration
necessary to support OTS computerized operations described herein.
According to available data, the System 2000 Data Management System
requires the following computer configuration:
Mainframe: Univac 1110
Memory: 131,000 words
Peripherals: 2 Magnetic Tape Handlers; Univac FASTRAND III
Teleprocessing: Equipment available should such an extension
become necessary.
It should be noted that System 2000 can operate on several distinct
vendor computer systems but the Univac 1110 is available to EPA and provides
a basis for cost estimates.
6. Personnel Considerations
In the Computerized System, facility personnel will provide the
same library-type services from the same library as in the Manual System,
with assistance from a computer to automate many tasks as described below.
Accessioning and cataloging personnel will be responsible for the processing
of incoming documents and reports and conversion to machine-readable
form of all document descriptions and of entire annual and premarket reports.
Functions outlined for this Section in the manual system will also need
to be performed here, except that manual posting of indexing terms to
appropriate catalogs will be unnecessary. In addition to this, maintenance
of catalogs will be automated. However, some maintenance activity will
be required in regard to catalogs, for there will be a need to maintain
a catalog of records in process or ready to be processed in the next
scheduled output of a computer-produced catalog.
Additional functions for accessioning and cataloging personnel include:
(a) construction and sorting of a vocabulary of data descriptors;
(b) preparation and scheduling of documents for microfilming;
(c) indexing of documents;
(d) development of a comprehensive Data Element Dictionary for all OTS
operational units.
Personnel types required to fulfill accessioning and cataloging functions
in the Computerized System include Technical Information Specialist; Data
Technician; Analyst/Programmer; Keypunch Operator; and Library Technician.
Required competence of Data Technician personnel will remain essentially
the same as that in the Manual System. Library Technician personnel will, in
addition to their accessioning responsibilities cited above, have the added
responsibility of control and scheduling of documents for microfilming.
Analyst/Programmer personnel will guide and participate in the preparation of
the Data Element Dictionary. Keypunch operators will transcribe all
necessary records for machine storage. In order to estimate the number
of people required, the following assumptions are made.
All data elements reported in the annual and premarket reports will be
converted to machine-readable form. Total number of characters needed
is 4014 for manufacturer reports, and 780 for premarket reports as
described in Exhibit 2.
Time estimates for performing these various functions are presented below.
(a) preparation of document for microfilming: 10 minutes
(b) maintenance of controlled vocabulary: 1.5 minutes
(c) data conversion: 150 keystrokes/minute
Information retrieval personnel will be responsible for servicing all
requests addressed to the facility for any information stored in the data
base. Though the services provided will remain essentially the same as
in the Manual System, the tasks associated with information retrieval
activities will be automated. Information retrieval personnel will,
therefore, use different tools and have three files to work with: the
physical store of documents; the machine file of manufacturer report
surrogates; and the microfilm store. The services that will be required
are the following:
(1) retrieval of specific documents from any of the files;
(2) reception of visitors and servicing their requests;
(3) scheduling computer operations to provide:
    o periodic indexes to holdings
    o vocabulary of data descriptors
    o selective bibliographies
    o Data Element Dictionary updates
    o periodic and aperiodic mailing lists
    o notices of action required by OTS or manufacturers
    o indexes to microfilm stores
(4) preparation of routine queries for interrogating the computer file
("canned" search routines); and
(5) preparation of non-routine queries for retrieval of documents or
information from the machine file.
The types of personnel required to service these information retrieval
operations include: Technical Information Specialist; Analyst/Programmer;
Equipment Operator; and Record Clerk. The Technical Information Specialist
will be responsible for structuring technical search questions addressed to
the computer. The Analyst/Programmer will convert the questions to appropriate
programs. This person will also be responsible for analyzing other problems
presented for solution by computer and for preparing the required computer
programs. This person will also be responsible for the production and
periodic updating of the Data Element Dictionary. Equipment operators will
be needed to operate and maintain microfilm readers, printers, computer
terminals, and data conversion and communications equipment. Record clerks
IV. EVALUATION AND COSTING CONSIDERATIONS
A. Evaluation Considerations
1. Evaluation Criteria
In comparing the various options available to OTS, it is important
to understand the various facilities provided by the options as well
as the cost of the optional method. The manual system can be compared
in function and cost to the computerized system. The enhanced manual system
necessarily contains the manual system but the computerized enhancements
may reduce clerical tasks. On the other hand, the computerized approach
may provide a capability that is new or even impossible under a purely
manual system. An excellent case in point is the number of indexing points
available under the various systems.
An indexing point represents a pre-stored answer to an anticipated question.
For example, to file all the manufacturer annual reports alphabetically
by manufacturer name anticipates the need in the future to find
a specific report if one is given the manufacturer's name. To keep a card
catalog of chemical substances which are filed under the various annual
reports anticipates that someone may seek information about specific
chemical substances. A manual system with indexing points requires manual
posting of reference points onto the indexing terms represented by the
catalog cards. Consequently, each new requirement for an indexing point
means a significant increase in the labor of posting new entries but it
also means a significant task of retrospectively re-indexing all previously
indexed file entries. A computerized system can trivially increase its
indexing capability by one indexing term.
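The contrast between manual and computerized indexing points can be illustrated with a short sketch. It is only an illustration under stated assumptions: the record layout, field names, and sample values are hypothetical and are not taken from the OTS report forms. Each indexing point is modeled as a lookup table from a field value to the reports that carry it, so adding an indexing point amounts to one more pass over the stored file.

```python
from collections import defaultdict

# Hypothetical report surrogates; field names and values are illustrative only.
REPORTS = [
    {"id": 1, "manufacturer": "Acme Chemical", "substance": "benzene", "end_use": "solvent"},
    {"id": 2, "manufacturer": "Baker Corp", "substance": "toluene", "end_use": "solvent"},
    {"id": 3, "manufacturer": "Acme Chemical", "substance": "toluene", "end_use": "fuel additive"},
]


def build_index(reports, field):
    """Build one indexing point: a map from each field value to report ids."""
    index = defaultdict(list)
    for report in reports:
        index[report[field]].append(report["id"])
    return dict(index)


# Two indexing points chosen in advance, much as a card catalog would be.
indexes = {field: build_index(REPORTS, field) for field in ("manufacturer", "substance")}

# A new requirement arrives. Retrospective re-indexing of every stored report
# is a single pass over the file rather than a manual re-posting job.
indexes["end_use"] = build_index(REPORTS, "end_use")

print(indexes["end_use"]["solvent"])
```

In a manual catalog the equivalent step would mean preparing and filing a new card entry for every report already on hand.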
The criteria used in comparing the cost/benefits of the three systems
are listed below with an explanation of each.
a. Indexing points capacity. The number of search term points
available to the user who is seeking information from the
data base. Each search point or indexing point may have
many possible values. For example, the indexing point,
Chemical Substances, may have as many different values as
there are distinct chemical substances entered in the data
base.
b. Input lag. The amount of time that the data base is behind
the actual status of the real world.
c. Response time. The length of time necessary to receive an
answer to a query.
d. Data extensibility. The ability to receive new kinds of data.
e. Query extensibility. The ability to answer new kinds of questions.
f. Reporting facilities. Various aids to present any information
from the data base.
g. Availability of system components. Whether off-the-shelf or
requiring developmental time.
h. Personnel constraints. The requirement for special or
hard-to-find skills.
i. Costs.
j. Security. The ability to meet the confidentiality requirements
of OTS.
2. Facilities Evaluation
In order to perform the operations envisioned under the various
systems, the following facilities are required:
o Administrative processing and clerical support;
o Forms design capability;
o Photocopying and binding;
o Secure storage facilities;
o Performance site;
o Indexing.
Each of these facilities will be discussed in sufficient detail to permit
the Office of Toxic Substances to evaluate compliance of offerors of these
services.
a. Administrative Processing and Clerical Support
In a manual system the need for clerical support is quite obvious.
Less obvious and requiring special underscoring is the rather large
administrative burden represented in the Toxic Substances Control Act. The
Office of Toxic Substances faces a maze of interlinking dialogues with
its respondents, both individually and collectively. It can reasonably
be expected that many of the respondents will be hostile and uncooperative
and will require carefully collected correspondence files to protect the
goals of the Office of Toxic Substances. A fairly typical scenario might
begin with a query from a manufacturer as to whether his use of a certain
substance constitutes a new use. OTS's answer must include justification
which must come from its data base. If the new use requires test data,
OTS must show that such data is not presently available and must provide
the manufacturer with a test protocol, if a protocol is requested. From
the time the respondent returns the required test data, OTS must insure
that future use of the test data by other respondents will result in their
paying a proportionate share of the testing costs. This entire string of
events is punctuated by administrative actions to insure compliance,
follow-up, valid records as the basis of decisions, legally structured
demands, and fair proceedings.
To a large extent, this administrative burden can be satisfied and antic-
ipated with good procedures. It cannot be ignored without jeopardizing
OTS's credibility.
b. Forms Design Capability
The correctness and completeness of the OTS data depend to a large
extent on the clarity and precision of the data requesting medium. Even
without the eventual goal of computerization, OTS must develop standard
respondent forms that help to discipline and structure the growing data
base. Forms design will become a continuing iterative task as procedures
develop and the need for more specialized forms increases.
c. Photocopying and Binding
The very nature of the OTS responsibilities argues for the fact
that the physical construction of the many reports and much of the corre-
spondence prevent treating this facility as an aside. The area of dissem-
ination, the volume of data, its economic and social importance, the poten-
tial of legal action, all argue for the availability of this capability.
The need to present OTS's position in a correct and professional manner
requires special attention to this facility.
d. Secure Storage Facilities
OTS must insure that its responsibilities and proprietary rights
to the information being collected and generated are being satisfied.
The storage facilities must protect OTS's data base from intentional and
accidental loss or damage. The problems and considerations in this area
are discussed fully in the Confidentiality Section. However, to provide
some indication of storage requirements, the following estimates are pre-
sented.
For the volume of annual reports estimated to be 10,000 reports in 1975,
and assuming one-eighth of an inch for each report, the storage required
is 12.5 four-drawer filing cabinets or 38 square feet of floor space. By
1980 there will be 270,000 reports on file. That amount will require 338
four-drawer cabinets or 1000 square feet of floor space. The retention
term of the various data base documents is not known.
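The storage figures above can be reproduced with a short calculation. The report thickness and the four-drawer cabinet come from the text. The usable drawer depth (25 inches) and the floor space per cabinet (3 square feet) are assumptions chosen because they give results close to the stated figures; the report does not state them.

```python
import math

REPORT_THICKNESS_IN = 1 / 8   # from the text: one-eighth inch per report
DRAWERS_PER_CABINET = 4       # from the text: four-drawer cabinets
DRAWER_DEPTH_IN = 25          # assumption: usable filing inches per drawer
FLOOR_SQFT_PER_CABINET = 3    # assumption: about 38 sq ft / 12.5 cabinets


def storage_needed(report_count):
    """Return (cabinets, floor space in square feet) for a number of reports."""
    filing_inches = report_count * REPORT_THICKNESS_IN
    cabinets = filing_inches / (DRAWERS_PER_CABINET * DRAWER_DEPTH_IN)
    return cabinets, cabinets * FLOOR_SQFT_PER_CABINET


for count in (10_000, 270_000):
    cabinets, floor = storage_needed(count)
    print(count, cabinets, math.ceil(cabinets), floor)
```

Under these assumptions 10,000 reports need 12.5 cabinets and 37.5 square feet, and 270,000 reports need 337.5 cabinets (338 whole cabinets) and about 1,012 square feet, which is in line with the rounded figures in the text.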
e. Performance Site
Office of Toxic Substances requirements for quality control and
confidentiality lead to certain conclusions about the performance site
of the several steps involved in processing the information. To meet its
responsibilities, the Office of Toxic Substances must know precisely where
the processing steps are executed and what physical elements exist to
provide the protection needed.
To provide an outline for the following discussion, the possibility of
five discrete sites is assumed but the reader should realize that certain
of the steps are electives OTS may not select, and all steps might be
performed at one site. The five sites are:
o Office of Toxic Substances;
o Mail receipt and accession control point;
o Microfilming;
o Reduction to machine-readable form (machine operation);
o Computer processing.
Needs applicable to all sites are discussed first, and then special
requirements, if any, for each of the five sites are discussed.
Quality control and confidentiality, though distinct, have a common
requirement to maintain a controlled data base. Quality control assures the
integrity of the data base by developing monitoring techniques and procedures
that prevent the loss of all or part of any information. This assurance
generally results from good control and monitoring procedures that permit
the location on demand of any document wherever it may be in the processing
cycle. Confidentiality needs this control to prevent the compromise of
any data by limiting inadvertent or intentional access to the controlled
data. Further, such controls permit the detection of any loss at the soonest
possible time together with the information of who last held the document
and was responsible for it.
The performance site should be specifically stated and in terms that delimit
the boundaries for moving the data. Each site should have one individual
responsible for confidentiality and inquiries about the site. The precision
in stating the site location provides the basis for the remaining
discussions. It is self-defeating to state the performance site as being the
seventh floor, for example, when there exists no physical barrier to prevent
access to the seventh floor from anywhere else in the building. To
state the performance site as being the building does state more accurately
the actual location but vastly enlarges the area to be protected. The
Office of Toxic Substances must insure that it has access to the site for
inspection in order to determine compliance with its requirements.
The Office of Toxic Substances must determine and promulgate the procedures
for controlling access to its site and any remote site. Access control
procedures will define the boundaries, material, documents, personnel, and
accountability rules in force at each site. The boundaries (e.g., one
room set aside as a restricted area) should be clearly marked and made
known to all personnel. The material and documents subject to control
should be clearly marked and identified. The extent of control reflects
a cost decision which must be dealt with in an evolving system as an
incremental process. Personnel must be trained and selected with needs of
controlled access in mind. Again such training will evolve as the system
evolves from a manual system to a computer data base. Accountability requires
the designing of the entire system in a way that each document becomes the
responsibility of one individual at a particular point in time; every
document is so assigned and a record of the assignment is kept in some form.
A library check-out system is an example of such an accountability system;
the range of sophistication of library checkout systems indicates the range
of choices available.
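The accountability rule described above (one responsible individual per document at any time, with a record of each assignment) can be sketched as a simple check-out log. This is an illustrative sketch only; the class name, document identifiers, and role names are hypothetical and do not come from the report.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CustodyLog:
    """Tracks which one individual is currently responsible for each document."""
    holders: dict = field(default_factory=dict)   # document id -> current holder
    history: list = field(default_factory=list)   # (time, document id, holder)

    def assign(self, doc_id, person, when=None):
        """Hand a document to one person and keep a record of the assignment."""
        when = when or datetime.now()
        self.holders[doc_id] = person
        self.history.append((when, doc_id, person))

    def last_holder(self, doc_id):
        """Who last held the document, or None if it was never assigned."""
        return self.holders.get(doc_id)

    def trail(self, doc_id):
        """Every assignment of one document, in the order recorded."""
        return [(when, person) for when, doc, person in self.history if doc == doc_id]


log = CustodyLog()
log.assign("AR-0001", "mail clerk")
log.assign("AR-0001", "microfilm operator")
print(log.last_holder("AR-0001"))
```

If a document is found missing, the trail shows who last held it and when, which is the detection property the text asks of a controlled data base.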
The performance site should provide the necessary storage facilities
consistent with the media being used. The storage facilities should provide
protection for the material both during the workday and after hours.
Furthermore, the protection should extend to the physical facilities as well
as to confidentiality of data. Loss through fire would impact the integrity
of the data base as well as provide a mask for the compromising of
information. Storage facilities often have inherent flaws which render them
less than desirable. For example, basement or sub-surface storage is more
vulnerable to flooding than above ground storage.
The mail room will require a special accounting function in order to process
fees payable to EPA, once operations are underway.
Microfilming operations will require special storage capabilities for the
film media; data entry and computer processing will also require several
special storage capabilities.
f. Indexing
Although the annual reports from manufacturers, importers, and
processors are amenable to structuring through good forms design and careful
administrative control, the test data file will probably contain many
forms and structures. Every attempt should be made to develop Test Protocols
that necessarily lead to structured test data reports, but the manufacturer
is not limited to any required input structure and may submit test
data results in any conceivable form, even including reports or citations
from the open literature. Though immediate requirements do not include
abstracting and indexing, future plans should include the construction of
an abstract for ease of dealing with the data base, the assignment of
descriptors which serve to provide the necessary retrieval capabilities, and
the posting of these descriptors to the various cataloging files. All of
these tasks are highly intellectual tasks requiring specific skills and
training in the areas of toxicology, chemistry, and information sciences.
This assertion does not mean that the personnel required represent a small
or expensive subportion of the employment force. The experience of the
New York Times Information Bank and the Institute for Scientific Information
shows that rather ordinary people can do the abstracting task quite adequately
and, while special technical skills are needed to abstract technical subjects,
the skill required is not extraordinary.
B. Comparative Cost/Benefit Analysis
1. Cost Estimates
Based on the reported requirements, and the possible systems offered
as satisfying these requirements, the following cost estimates can be
used to compare the options. At the same time it must be recognized that
cost alone cannot be the determining factor, and that good planning requires
a consideration of many additional criteria relevant to a task as important
as the Office of Toxic Substances data base. The figures shown reflect
estimates for the years 1975 through 1980. Additional comments are offered
for the period beyond 1980.
a. Report Page Estimates
In order to compute the page count of the average report for
use in estimating costs, the following estimates were made. Since the
respondents would report 10,000 production items on 500 chemicals, the
assumption is made that each respondent will report on one chemical
substance in one report. Further, we assume that for each annual report there
will be one page for the manufacturer identification and certification
and one page for each chemical and its end-uses, by-products, and production
figures.
For 1975 this means the 10,000 reports will have 20,000 pages. Similarly,
the number of pages for each subsequent year's reports will be computed
by doubling the number of production reports received each year. The total
page volume is presented in Table 5 for each of the six years in the period
1975 through 1980.
b. Report Character Estimates
Estimates for the number of characters in an average report
are based on the data elements listed in Exhibit 2. An average annual
report is estimated to require a total of 4014 characters distributed
among the following 5 items:
o Manufacturer          316 x  1 =  316
o Chemical Substance    265 x  1 =  265
o Production            138 x  1 =  138
o End-use               119 x 25 = 2975
o By-products            80 x  4 =  320
  Total Characters                 4014
An average premarket report is estimated to require a total of 780
characters distributed among the following 4 items:
o Manufacturer          316 x  1 =  316
o Chemical Substance    265 x  1 =  265
o End-use               119 x  1 =  119
o By-products            80 x  1 =   80
  Total Characters                  780
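The two character totals can be checked with a few lines of arithmetic. The item sizes and occurrence counts below are copied from the lists above; the dictionary and function names are hypothetical and serve only this illustration.

```python
# (characters per occurrence, occurrences per report), as listed in the text
ANNUAL_ITEMS = {
    "Manufacturer": (316, 1),
    "Chemical Substance": (265, 1),
    "Production": (138, 1),
    "End-use": (119, 25),
    "By-products": (80, 4),
}
PREMARKET_ITEMS = {
    "Manufacturer": (316, 1),
    "Chemical Substance": (265, 1),
    "End-use": (119, 1),
    "By-products": (80, 1),
}


def report_characters(items):
    """Sum the characters contributed by each item of one report."""
    return sum(size * count for size, count in items.values())


print(report_characters(ANNUAL_ITEMS), report_characters(PREMARKET_ITEMS))
```

Both sums agree with the stated totals of 4014 and 780 characters.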
Table 5 - Annual Number of Reports Processed
                                      1975      1976      1977      1978      1979      1980
 1. Annual Reports                  10,000    20,000    30,000    50,000    70,000    90,000
 2. Number of Characters*           40,140    80,280   120,420   200,700   280,980   361,260
 3. Number of Pages                 20,000    40,000    60,000   100,000   140,000   180,000
 4. Premarket Reports                  500       500       500       500       500       500
 5. Number of Characters*              390       390       390       390       390       390
 6. Number of Pages                  1,000     1,000     1,000     1,000     1,000     1,000
 7. Total Characters*               40,530    80,670   120,810   201,090   281,370   361,650
 8. Keypunching Costs ($.30-$.50/1000)
      Low                          $12,159   $24,201   $36,243   $60,327   $84,411  $108,495
      High                         $20,265   $40,335   $60,405  $100,545  $140,685  $180,825
      (130 char/min), m/yr             2.6       5.2       7.7      12.9      18.0      23.2
 9. Total Pages                     21,000    41,000    61,000   101,000   141,000   181,000
10. Microfilm Costs ($.25 to $.50/page)
      Low                           $5,250   $10,250   $15,250   $25,250   $35,250   $45,250
      High                         $10,500   $20,500   $30,500   $50,500   $70,500   $90,500
* in 1,000's
c. Other Estimates
o Keypunching: $.30 to $.50 per 1000 keystrokes;
  130 to 150 keystrokes per minute
o Microfilm: $.25 to $.50 per page
o Microfilm Viewer with Automatic Retrieval: $1,500 to $6,000
o Computer Processing Time: $350 to $375 per hour
o Programming/Consulting Rates: $15 to $30 per hour
All figures are rough approximations based on published material or
previous experience.
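The unit rates above, combined with the report volumes and character counts given earlier, are enough to recompute the cost rows of Table 5. The sketch below is one way to do that arithmetic. The 2,000-hour working year is an assumption (the report does not state it) chosen because it gives keypunching effort figures close to those in the table; all names are hypothetical.

```python
ANNUAL_REPORTS = {1975: 10_000, 1976: 20_000, 1977: 30_000,
                  1978: 50_000, 1979: 70_000, 1980: 90_000}
PREMARKET_REPORTS = 500               # per year, from Table 5
CHARS_PER_ANNUAL = 4014               # from the character estimates
CHARS_PER_PREMARKET = 780
PAGES_PER_REPORT = 2                  # from the page estimates
KEYPUNCH_CENTS_PER_1000 = (30, 50)    # low and high rate per 1000 keystrokes
MICROFILM_CENTS_PER_PAGE = (25, 50)   # low and high rate per page
KEYSTROKES_PER_MINUTE = 130
HOURS_PER_MAN_YEAR = 2000             # assumption, not stated in the report


def table5_row(year):
    """Recompute totals and (low, high) costs in dollars for one year."""
    annual = ANNUAL_REPORTS[year]
    total_chars = annual * CHARS_PER_ANNUAL + PREMARKET_REPORTS * CHARS_PER_PREMARKET
    total_pages = (annual + PREMARKET_REPORTS) * PAGES_PER_REPORT
    keypunch = tuple(total_chars * c // 100_000 for c in KEYPUNCH_CENTS_PER_1000)
    microfilm = tuple(total_pages * c // 100 for c in MICROFILM_CENTS_PER_PAGE)
    man_years = total_chars / KEYSTROKES_PER_MINUTE / 60 / HOURS_PER_MAN_YEAR
    return {"total_chars": total_chars, "total_pages": total_pages,
            "keypunch_dollars": keypunch, "microfilm_dollars": microfilm,
            "keypunch_man_years": round(man_years, 1)}


for year in ANNUAL_REPORTS:
    print(year, table5_row(year))
```

Working in whole cents keeps the dollar figures exact. Because keypunching and microfilming costs scale linearly with volume, each year's cost range follows directly from that year's report count.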
2. Comparative Analysis
The following presentation compares the cost/benefits of the
manual system with the computerized system. In areas where appropriate
as a result of the comparison, the enhanced manual system is considered.
a. Indexing Point Capacity
Manual System
Pro's: The manual system, because of its inherent limitations,
uses fewer people at lower levels to maintain the six indexing points
proposed. Similarly, the start-up cost for initiating the various catalogs
is negligible.
Con's: Because of the high risk of error in manual operations,
particularly in high volume tedious work, the manual system may produce
intolerable errors in the OTS data base through faulty catalog posting,
filing, and the necessity for hand computations.
The nuiriber of indexing
points is inflexible and even the addition of"onem,re ~ndexing'point to
the existing data base is a significant task.
The need :to perform coordi-
nated searches necessarily implies "pulling""two or1OC)re catCilog cards to
perform searches, and compounds the problem of maintaining data -base integ-
rity.
'. . .. ""
Six reference points must be considered a-restricted or Timited search
capabili ty.
Refiling pulled cards is as prone to error as theorig!nal
filing.
The ~nual system necessarily becomes an in-house capability be-
cause it manipulates the ori-ginal documents and opportUnities to sub-contract
functions becomes 1OC)re difficult to find.
""Computerized System
Pro's: The computerized system's indexing points are almost unlimited and
all fields in the file can become indexing points, at increased
processing costs. The computerized system offers automatic catalog posting
in the sense that catalogs become unnecessary or, at least, invisible to
the users. Users work with copies of the file rather than with catalog
cards and therefore remove nothing from the indexing catalogs which might
interfere with the searches of other users. Though computers, too, can
commit blunders, the mechanized system repeats proven functions accurately
and is much less prone to a multiplicity of minor errors. The computerized
system lends itself to sub-contracting the indexing and posting tasks since
copies of data can be generated and combined easily.

Con's: The indexing points and the terms contained therein require careful
coding since the computer is very literal and sees differences between
terms like sodium chloride and salt.
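
The literal-matching problem can be illustrated with a short sketch. The synonym table and the `normalize` helper below are hypothetical; they only show how coding terms against a preferred form lets "salt" and "sodium chloride" meet at the same indexing point.

```python
# Hypothetical synonym table: every entry maps to one preferred term.
PREFERRED_TERM = {
    "salt": "sodium chloride",
    "table salt": "sodium chloride",
    "sodium chloride": "sodium chloride",
}


def normalize(term):
    """Return the preferred indexing term for `term`."""
    key = " ".join(term.lower().split())   # fold case and extra spaces
    return PREFERRED_TERM.get(key, key)


# A literal comparison treats the two terms as different ...
literal_match = "salt" == "sodium chloride"
# ... while coded terms compare equal.
coded_match = normalize("Salt") == normalize("sodium  chloride")
print(literal_match, coded_match)
```

Terms missing from the table pass through unchanged, so uncoded vocabulary still fails to match until someone adds it.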
-106-
-------
b. Input "Lag"

Manual System

Pro's: Since the bulk of the reports are annual reports, the amount of
time between receipt and posting of incoming reports is not critical.
However, premarket reports with a response time of 90 days require a
priority system to insure compliance.

Con's: In a manual system, overview and tickler file functions
necessarily are manual, too. The onerous administrative functions must
be borne by the clerical staff.

Computerized System

Pro's: The computerized system has the capability to provide any degree
of currency with parallel increase in costs. In addition to this, once
the data base is in machine-readable form a variety of administrative
reports can be prepared to permit management to monitor the data base's
currency.

Con's: All input must be reduced to machine-readable form.
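
An automated tickler file for the 90-day premarket response time could look roughly like the following sketch. The report identifiers, the record layout, and the 14-day warning window are assumptions made for illustration.

```python
from datetime import date, timedelta

RESPONSE_DAYS = 90   # premarket response time stated in the text


def due_date(received):
    """Date by which a premarket report must be answered."""
    return received + timedelta(days=RESPONSE_DAYS)


def tickler(reports, today, warn_days=14):
    """List (report_id, due) for premarket reports due soon or overdue."""
    flagged = []
    for report_id, kind, received in reports:
        if kind != "premarket":
            continue                      # annual reports are not time critical
        due = due_date(received)
        if due - today <= timedelta(days=warn_days):
            flagged.append((report_id, due))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical receipts log: (report id, report type, date received).
reports = [
    ("A-1", "annual", date(1975, 1, 10)),
    ("P-1", "premarket", date(1975, 1, 2)),
    ("P-2", "premarket", date(1975, 3, 1)),
]
for report_id, due in tickler(reports, today=date(1975, 3, 25)):
    print(report_id, due.isoformat())
```

Only P-1 is flagged in this run, since its due date falls inside the warning window while P-2 still has about two months.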
c. Response Time

Manual System

Pro's: The system proposed provides an adequate response time for a very
limited set of queries. In most cases the users may perform their own
data base searches.

Con's: The preparation and performance of any complex query can stretch
into several days.

Computerized System

Pro's: The response time can be reduced to meet any future needs of OTS.
The proposed system will provide 24 to 48 hour response time for
-107-
-------
a virtually unlimited set of queries including computational and complex
boolean search capabilities.

Con's: None.
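
A boolean search of the kind mentioned above can be sketched with one index per field and ordinary set operations. The records and field names are invented for the example and do not come from the report.

```python
from collections import defaultdict

# Hypothetical report records keyed by report number.
RECORDS = {
    1: {"substance": "benzene", "end_use": "solvent", "year": 1975},
    2: {"substance": "benzene", "end_use": "solvent", "year": 1976},
    3: {"substance": "benzene", "end_use": "fuel additive", "year": 1976},
    4: {"substance": "toluene", "end_use": "solvent", "year": 1976},
}


def build_index(records, field):
    """Map each value of `field` to the set of report numbers holding it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        index[record[field]].add(record_id)
    return index


by_substance = build_index(RECORDS, "substance")
by_end_use = build_index(RECORDS, "end_use")
by_year = build_index(RECORDS, "year")

# benzene AND solvent AND NOT 1975
query_1 = (by_substance["benzene"] & by_end_use["solvent"]) - by_year[1975]
# (benzene OR toluene) AND solvent
query_2 = (by_substance["benzene"] | by_substance["toluene"]) & by_end_use["solvent"]
print(sorted(query_1), sorted(query_2))
```

Each index plays the role of one catalog in the manual system, but a coordinated search here pulls no cards out of the file.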
d. Data Extensibility

Manual System

Discussion: The manual system offers little help in restructuring the
data files to take advantage of future goals. Any reordering may require
manual sorts with that file unavailable for accessing during the sort.

Computerized System

Pro's: Once computerized, the data base can be extended, restructured,
reorganized, and expanded in many different ways. This attribute should
be particularly desirable if the data base and its usage are subject to
increasing demands and evolving needs. New fields can be added trivially
and references to data can reflect information about the data (for
example, "not collected prior to 1976", etc.).

Con's: The flexibility described above can become its own worst enemy
and knowledgeable control must be exerted to prevent the destruction or
confusion of information already in the data base.
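
As a rough illustration of adding a field after the fact, the sketch below extends existing records and attaches a "not collected prior to" note to the older ones. The record layout, the `end_use_code` field, and the cutoff year passed in are all hypothetical.

```python
def add_field(records, name, first_year, values):
    """Add field `name` to every record, annotating years with no data."""
    for record in records:
        if record["year"] < first_year:
            record[name] = None
            record.setdefault("notes", {})[name] = (
                f"not collected prior to {first_year}"
            )
        else:
            record[name] = values.get(record["report_id"])
    return records


# Hypothetical existing records.
records = [
    {"report_id": "R-001", "year": 1975, "substance": "benzene"},
    {"report_id": "R-002", "year": 1977, "substance": "toluene"},
]
add_field(records, "end_use_code", 1976, {"R-002": "SOLV"})
for record in records:
    print(record)
```

Because the helper never touches a field that already exists, the stored data survives the extension, which is the kind of control the Con's paragraph calls for.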
e. Query Extensibility

Manual System

Pro's: Queries can always be processed by onerous serial searches of the
data base.

Con's: The indexing points anticipate the possible queries which may be
addressed to the data base, but subsequent changes to them are difficult
and require significant expenditure of clerical effort.
-108-
-------
Computerized System

Pro's: As in the data extensibility discussion above, the computerized
system gives great flexibility to query responses to anticipated
questions but even novel searches requiring serial processing of the
entire data base can still be accomplished in the 24 to 48 hour response
time required by OTS.

Con's: Query construction requires specialized skills but technical
users can be trained to perform fairly complex searches.
f. Reporting Facilities

Manual System

Discussion: The manual system offers no aids to the preparation of the
reports resulting from searches on the data base.

Computerized System

Pro's: The computerized system offers a variety of reporting
capabilities. At the same time the data base is searched, the computer
can perform computational tasks, formatting, cross-footing of columns,
pagination and high-lighting or summarization of the entire data base.

Con's: As in the preparation of computer queries, the structuring of
computerized reports requires specialized skills in complex cases, and
training of OTS personnel in the simple queries entered directly by
users.
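
Cross-footing, one of the reporting aids named above, means checking that row totals and column totals agree. A minimal sketch follows, using the 1975 and 1976 manual-system personnel costs from Table 6 as sample data; the function name and error handling are illustrative choices.

```python
# Personnel costs for 1975 and 1976, taken from Table 6.
COSTS = {
    "Facility Manager": [18_000, 18_000],
    "Technical Information Specialist": [15_000, 15_000],
    "Data Technicians": [4_500, 9_000],
    "Library Technicians": [14_000, 24_000],
    "Record Clerks": [3_500, 7_000],
    "File Clerks": [3_500, 3_500],
}
STATED_TOTALS = [58_500, 76_500]   # TOTALS row of Table 6


def cross_foot(rows, stated_column_totals):
    """Return (row_totals, grand_total); raise if a column total disagrees."""
    column_totals = [sum(column) for column in zip(*rows.values())]
    if column_totals != list(stated_column_totals):
        raise ValueError(f"column totals {column_totals} do not match")
    row_totals = {name: sum(values) for name, values in rows.items()}
    return row_totals, sum(column_totals)


row_totals, grand_total = cross_foot(COSTS, STATED_TOTALS)
print(row_totals["Library Technicians"], grand_total)
```

A posting error in any cell shows up as a mismatch against the stated column total.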
g. Availability of System Components

Manual System

Discussion: The manual system offers no problems in finding the
equipment, personnel, or the development of procedures.
-109-
-------
Computerized System

Pro's: The proposed computerized system offers all off-the-shelf
components. There is no development time necessary for the basic
software. The necessary hardware can be provided through utilization of
the existing in-house computer installation, of computers at other EPA
sites, or through contractor services. Aside from OTS's desire to keep
the archival files stored within OTS's on-site storage facilities, the
computerized system may be permanently sited at the contractor site if
the proper precautions are provided.

Con's: Development lead time is required for the OTS application
programs. These programs will be prepared in a high-level programming
language and will undoubtedly be very straightforward and require less
than two weeks for each such application or query. In fact, some simple
queries could be prepared in a day. However, some problems may require
significant lead time because of the problem complexity.
h. Personnel Constraints

Manual System

Discussion: The manual system, because of its inherent limitations,
requires only clerical level help with skills in library or information
sciences. Though experience in toxicological science will be useful to
these people, undoubtedly an adequate level of training can be
accomplished in-house.

Computerized System

Pro's: The computerized system replaces the need for much of the
clerical labor required to perform the basic requirements of the
-110-
-------
OTS. However, the staff of computer oriented personnel needed will be
significantly higher paid than the clerical staff it replaces. Though
more expensive, the skills necessary should not be difficult to acquire.
Alternatively, with a computerized system the ability to sub-contract
tasks becomes feasible. The trade-off of replacing clerks with
programmers should always be thought of as necessarily unbalanced. The
programmer is a toolmaker who builds tools for tasks that would be
infeasible with the same dollar's worth of clerks. Furthermore, the task
can be repeated over and over again with the same accuracy; it can be
done at speeds unthinkable with clerks; and it can be done at any time,
day or night.

Con's: The requirement to maintain the archival file of reports
necessitates the parallel structure of a manual file even if the
computerized approach is taken. Further, the archival data base must
also reflect the administrative actions taken by the OTS, such as
notification of non-compliance.
i. Security

Manual System

Pro's: The simplicity of the manual system offers the easiest path to a
totally secure system. As the Section on confidentiality states, the
entire file is placed in a secure storage place and submitted to
security procedures similar to those applied to any valuable item.

Con's: Manual systems are prone to errors, undetected losses, misfiles,
and other human-related frailties which may mask or even create the
opportunity for the compromise of confidential data.

Computerized System

Pro's: The computerized system can provide many supervisory
-111-
-------
or monitoring capabilities that facilitate the detection of unauthorized
access, document losses, and improper data insertions. These functions
can be performed in conjunction with the normal operation of the system
or quickly and accurately performed for ad hoc purposes, a capability
impossible with manual systems.

Con's: In contrast to the simplicity of the manual system, the
complexity of computer systems (especially time-shared systems using
communications lines) makes any assurance of complete computer security
very difficult.
3. Comparative Costs

The comparative costs for the systems proposed are presented in the
following tables. The costs of the enhanced manual system are discussed
under the cost/benefit analysis of each function. Costs associated with
overhead have not been included in any of the estimates, since they
remain fairly constant across the various phases. These expenses include
such items as forms printing, office equipment, floor space rental,
supplies and photocopying.

Table 6 presents the personnel costs for the manual system over the six
year period 1975 to 1980. Personnel costs represent the total cost
associated with the manual system.

Table 7 presents the personnel costs associated with the computerized
system. The personnel costs are entered into Table 8 to provide the
total costs for the computerized system. It should be noted that the
personnel described may be within the staff of the service organization
providing the functions.
-112-
-------
                                    Phase I Personnel Roster and Costs 1/

                                                    1975             1976             1977             1978             1979             1980
Facility Manager (GS-12)                     1   $18,000      1   $18,000      1   $18,000      1   $18,000      1   $18,000      1   $18,000
Technical Information Specialist (GS-7)    1.5   $15,000    1.5   $15,000    1.5   $15,000    1.5   $15,000    1.5   $15,000    1.5   $15,000
Data Technicians (GS-5)                     .5    $4,500      1    $9,000   1.25   $11,250      2   $18,000      3   $27,000      4   $36,000
Library Technicians (GS-4)                1.75   $14,000      3   $24,000   4.75   $38,000    7.5   $60,000     10   $80,000     13  $104,000
Record Clerks (GS-3)                        .5    $3,500      1    $7,000   1.25    $8,750      2   $14,000      3   $21,000    3.5   $24,500
File Clerks (GS-3)                          .5    $3,500     .5    $3,500     .5    $3,500      1    $7,000    1.5   $10,500    1.5   $10,500

TOTALS                                    5.75   $58,500    8.0   $76,500  10.25   $94,500     15  $132,000     20  $171,500   24.5  $208,000

1/ Personnel figures are stated in person-years.

                                                  Table 6
-------
                                   Phase III Personnel Roster and Costs 1/

                                                    1975             1976             1977             1978             1979             1980
Facility Manager (GS-13)                     1   $21,000      1   $21,000      1   $21,000      1   $21,000      1   $21,000      1   $21,000
Systems Analyst (GS-12)                     .5    $9,000     .5    $9,000      1   $18,000      1   $18,000    1.5   $27,000    1.5   $27,000
Programmer (GS-12)                          .5    $9,000     .5    $9,000      1   $18,000      1   $18,000    1.5   $27,000    1.5   $27,000
Technical Information Specialist (GS-7)      1   $10,000      1   $10,000      1   $10,000    1.5   $10,500    1.5   $10,500    1.5   $10,500
Data Technicians (GS-5)                     .5    $4,500      1    $9,000   1.25   $11,250      2   $18,000      3   $27,000      4   $36,000
Library Technicians (GS-4)                  .5    $4,000   1.25   $10,000      2    $8,000      3   $12,000   4.25   $34,000    5.5   $44,000
Keypunch Operators (GS-4) 2/               2.6   $20,265    5.2   $40,335    7.7   $60,405   12.9  $100,545     18  $140,685   23.2  $180,825
Record Clerks (GS-3)                        .5    $3,500    .75    $5,250      1    $7,000      2   $14,000   2.75   $19,250    3.5   $24,500
File Clerks (GS-3)                          .5    $3,500     .5    $3,500     .5    $3,500      1    $7,000    1.5   $10,500    1.5   $10,500
Microfilm Clerks                            .5    $3,500    1.5   $10,500    2.5   $17,500      4   $28,000    5.5   $38,500    7.5   $52,500

TOTALS                                     8.1   $88,265   13.2  $127,585  18.95  $174,655   29.4  $247,045   40.5  $355,435   50.7  $433,825

1/ Personnel figures are stated in person-years.
2/ From Table 5, Annual Costs of Report Processes

                                                  Table 7
-------
a. Computerized System

Data base maintenance costs for the computerized system are based on a
weekly pass of the entire data base on magnetic tape as a batch process.
The parameters used are: a magnetic tape reading speed of 45,000
characters per second; and computer processing costs of $375 per hour.
Reducing the periodic processing rate to monthly would reduce the cost
to one quarter of the values shown.

Information retrieval costs necessarily reflect the number and
complexity of the questions applied to the data base. Many of the
recurring queries can be processed during the file maintenance
processes. However, an arbitrary one hour of processing time per month
is included in the cost estimates for special queries. In a batch
system, query processing time may increase with the growth of the data
base, but this factor was ignored.

The costs for archival file maintenance are taken from the manual system
costs. The costs are based on a judgement that the filing, accession
control, and maintenance of the archival file can be accomplished by
three record clerks.
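
The maintenance and retrieval estimates follow from the stated parameters with simple arithmetic. The sketch below is one reading of that calculation, assuming each weekly pass reads the whole file once at tape speed. The 121,200,000-character file size is the 1976 figure from Table 8; the function names are illustrative.

```python
TAPE_CHARS_PER_SECOND = 45_000   # magnetic tape reading speed from the text
DOLLARS_PER_HOUR = 375           # computer processing cost from the text


def pass_cost(characters):
    """Cost of one full read of the data base at tape speed."""
    hours = characters / TAPE_CHARS_PER_SECOND / 3600
    return hours * DOLLARS_PER_HOUR


def annual_maintenance_cost(characters, passes_per_year=52):
    """Cost of a year of periodic passes (52 = weekly, 12 = monthly)."""
    return pass_cost(characters) * passes_per_year


def annual_retrieval_cost(hours_per_month=1):
    """Cost of the special-query allowance over twelve months."""
    return hours_per_month * 12 * DOLLARS_PER_HOUR


file_size = 121_200_000   # characters, 1976 file size from Table 8
weekly = annual_maintenance_cost(file_size)
monthly = annual_maintenance_cost(file_size, passes_per_year=12)
print(round(weekly), round(monthly), annual_retrieval_cost())
```

Under these assumptions the weekly schedule costs about $14,589 a year, matching the 1976 entry in Table 8, and the monthly schedule costs 12/52 of that, or a little under one quarter.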
-115-
-------
                                1975        1976        1977        1978        1979        1980

Report Volume
  1. Annual                   10,500      20,500      30,500      50,500      70,500      90,500
  2. Daily                        42          82         122         202         282         362

File Storage
  3. Pages                    21,000      62,000     123,000     224,000     365,000     546,000
  4. Characters (1,000's)     40,530     121,200     242,010     443,100     724,470   1,086,120

Personnel Costs              $88,265    $127,585    $174,655    $247,045    $355,435    $433,825

Computer Processing
  Data Base Maintenance       $4,878     $14,589     $29,130     $53,336     $87,205    $130,736
  Info Retrieval              $4,500      $4,500      $4,500      $4,500      $4,500      $4,500

Microfilming Cost             $7,875     $15,375     $22,875     $37,875     $52,875     $67,875

TOTAL

                      Table 8 - Computerized System Costs
-------
b. Enhanced Manual System

In comparing the rather basic manual system against a computerized
system dependent on the full reduction of annual and premarket reports
to machine-readable form, the recurring theme is the low cost of the
limited manual system against the flexibility of the expensive computer
system. The enhanced manual system described various options which
permitted the use of the computer to extend or expand the manual
capabilities without taking the full step to a computerized system. The
following discussion gives a cost/benefit analysis of each enhancement
described in Section III(B).

1. Indexing Aids

The benefits realized through this approach over the manual system
include the savings inherent in the avoidance of several typing and
proofreading operations on each document indexed. Final hardcopy can be
produced on a variety of devices with an accuracy and cleanness not
possible with typescript. Subsequent corrections can be applied to the
basic document without re-typing the entire document, while still
producing a clean hardcopy. A further benefit is the development of a
machine-readable data file which can later be used in the OTS Data Base
or, with suitable data conversion, in existing data bases such as
TOXLINE. Costs range from approximately $4500 for a Datapoint 2200
minicomputer to several dollars of computer processing time from a
terminal that can access System 2000. This approach could extend by
several fold the number of input reports processed by one clerk. This
approach would involve a negligible cost increment.
-117-
-------
2. Catalog Posting and Publication

Cost considerations could never justify this approach, particularly in a
slowly growing data base. Emphasis must be placed on the capability of
having a very large number of indexing points, and the availability of
other computer oriented tools, such as Key-Word-In-Context listings.
Computer processing costs will actually increase exponentially as the
total number of documents in the data base increases linearly. However,
the tasks of assigning new indexing points, regrouping old indexing
points, assigning synonyms, and reordering the minor sort-keys can be
accomplished with very little additional cost at the next periodic
catalog preparation.

The factors affecting the cost of this approach include the indexing of
incoming reports onto a machine-readable form. This value is a direct
function of the number of indexing points selected and the consequent
number of characters required to be keystroked. The skills required
exceed that of an ordinary keypunch operator and will undoubtedly
require in-house training. The processing costs necessary to sort and
selectively print the various catalog listings are another cost factor.
The keystroking of the catalog indices is estimated at one tenth the
cost of keystroking the entire report. Personnel costs are estimated to
exceed other manual personnel costs by 30%. Processing time is estimated
to be approximately one hour per month.
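
A Key-Word-In-Context listing can be sketched in a few lines. The stop-word list, the column width, and the sample titles below are assumptions made for the example; the report does not specify a listing format.

```python
# Hypothetical list of words too common to serve as indexing points.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic(titles, width=30):
    """Return KWIC lines, one per significant word, sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for position, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            left = " ".join(words[:position])
            right = " ".join(words[position + 1:])
            # Right-justify the left context so keywords line up in a column.
            line = f"{left[-width:]:>{width}}  {word.upper()}  {right[:width]}"
            entries.append((word.lower(), line))
    entries.sort(key=lambda entry: entry[0])   # stable sort on the keyword
    return [line for _, line in entries]


listing = kwic(["Toxicity of Sodium Chloride", "Chloride Levels in Water"])
for line in listing:
    print(line)
```

Every significant word becomes an indexing point without extra keystroking, which is why the number of indexing points can grow so cheaply in this approach.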
3. Data Descriptor Vocabulary

A major benefit is the control exerted on the ambiguities of the English
language and the development of an organized way to address
-118-
-------
the problems of synonyms, classification of concepts, and neologisms.
The indexers and future abstractors must continuously battle these
problems and the periodic publication of the data descriptor vocabulary
in dictionary form will assist them.

The cost of a controlled vocabulary is estimated at $4000 to $5000 a
year in computer processing time. The benefit is too intangible to
permit comparing a controlled vocabulary to an uncontrolled one.

4. Data Element Dictionary

A Data Element Dictionary provides a means to control the data files and
a means of communication to other potential users of the data base. It
eases eventual computerization by formalizing data fields.
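
One way to picture how a Data Element Dictionary formalizes data fields is the sketch below. The three elements, their lengths, and the validation rules are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    kind: type          # Python type standing in for the field format
    max_length: int     # maximum number of characters when written out
    description: str


# Hypothetical dictionary entries.
DICTIONARY = {
    element.name: element
    for element in [
        DataElement("cas_number", str, 12, "CAS Registry number of the substance"),
        DataElement("report_year", int, 4, "Year covered by the report"),
        DataElement("end_use_code", str, 4, "Coded end use of the substance"),
    ]
}


def validate(record):
    """Return a list of problems found when checking `record`."""
    problems = []
    for field, value in record.items():
        element = DICTIONARY.get(field)
        if element is None:
            problems.append(f"{field}: not in dictionary")
        elif not isinstance(value, element.kind):
            problems.append(f"{field}: expected {element.kind.__name__}")
        elif len(str(value)) > element.max_length:
            problems.append(f"{field}: longer than {element.max_length} characters")
    return problems


print(validate({"cas_number": "7647-14-5", "report_year": 1975}))
print(validate({"report_year": "1975", "colour": "red"}))
```

The same dictionary can be printed for human readers and consulted by input programs, which is how it serves as both a control and a means of communication.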
5. Bibliography Preparation

The cost of maintaining an updated list of test data documents at the
estimated input volume is approximately $4000. The cost of producing
this program from existing generalized programming aids is $3000. The
utility of such ad hoc special lists rests in the ability to satisfy
unanticipated needs.

6. Automated Mailing List and Tickler File

The cost of programming using generalized programming tools is estimated
to be $4000. The cost of each run will be a direct function of the
number of respondents in the file. At the rate of 10,000 to 90,000
annual reports estimated for the six year period, the annual cost each
year would be less than $10,000.
-119-
-------
7. Microfilm Working Files

Microfilming has been justified in several installations in terms of the
storage space saved, the insurance provided against catastrophe, savings
over paper reproduction costs, and reduced office space requirements.
Such an analysis awaits more precise data on the actual form of many of
the documents OTS will be receiving. However, the primary benefit to be
emphasized here is its potential for union with future computer systems.

Microfilming costs $.25 to $.50 per page for the master film.
First year costs are estimated to be $5000 to $10,000. The sixth year
costs, which may be considered indicative of the annual rate, are
estimated at approximately $45,000 to $90,000.
8. Microfilm Retrieval Systems

Such a system would be primarily intended for the test data file, which
is unlikely ever to be fully inputted into a computer retrieval system.
Computerized bibliographic search systems, if implemented, would augment
the benefits achieved from such a system.
C. Guidance for RFP

1. Recommendation for Standard Proposal Outline

The Request for Proposals under consideration by the Office of Toxic
Substances will seek to satisfy the data management needs under the
proposed Toxic Substances Control Act.

By seeking contractor services to provide the administrative and
clerical functions associated with the extremely important toxic
substances data base, OTS will free its staff for the oversight and
regulatory responsibilities
-120-
-------
under the proposed Act. Since the basic responsibilities must remain
with OTS, every effort must be made to insure that the ultimate vendor
can, indeed, meet his delegated responsibilities.

Because of the unique nature of the new EPA responsibility, the
uncertainty of tasking parameters, the possible changes in the proposed
legislation before passage, and the almost immediate production
requirements of the legislation, the Office of Toxic Substances must
seek a contractor who can satisfy a variety of possible approaches to
this task. At the outset, the requirements of the task may not justify
full time assignments of personnel. However, the continued growth of the
data base is assured and the contractor must have the ability to meet
the growing demands. Further, the contractor must have the experience
needed to contribute to the development of enhancements and extensions
to the evolving system.

The Request for Proposals must require all bidders to respond in
conformance with the standard proposal outline. Adherence to this
requirement will facilitate proposal evaluations, insure complete
responses, and satisfy the need for assuring each bidder of
impartiality.

The content of the standard proposal outline is intended to provide OTS
with the necessary information to evaluate the ability of the bidder to
perform adequately the tasks proposed. The weights assigned to the
various sections indicate NBS recommendations for the relative
importance of each of the sections.
-121-
-------
                        Standard Proposal Outline

I.    Statement of the Problem                                         5%

II.   Overview of Suggested Processing Flow                           10%
      A.  Initial Manual System
      B.  Potential Manual System Enhancements
      C.  Computerized Information Processing System

III.  Proposed Manual System                                          20%
      A.  Forms Design
          1.  Source for Services
              (in-house, sub-contracted, etc.)
          2.  Performance Site
              (actual site where function is performed)
          3.  Facilities Available
              (description of equipment, procedures, and special
              skills required for this function)
          4.  Personnel Assigned
              (statement of individuals assigned and percent of
              time required)
          5.  Confidential Provisions
              (statement of confidentiality provisions specific
              to this function)
      B.  Mail Receipt and Accession Control 1/
      C.  Report Preparation, Compilation, Binding, and
          Dissemination 1/

IV.   Potential Manual System Enhancements                            10%
      A.  Microfilm 1/
      B.  Reduction to Machine-readable Form 1/

V.    Computerized Information Processing System                      20%

1/ The subsections in III (A) are repeated here.
-122-
-------
      A.  Computer Processing in Support of Manual System
          Enhancements 1/
      B.  Generalized Data Management System Package 1/

VI.   Proposed Confidentiality and Security Procedures                15%
      A.  Contractor statement on the ownership of all data
          developed under this contract: compiled lists; catalogs;
          reports; computer programs; and all storage media.
      B.  Detailed Plan for Meeting Confidentiality Requirements

VII.  Personnel                                                        5%
      A.  Organization Plan
          (describe relationships, reporting channels, and
          relationship with the Office of Toxic Substances)
      B.  Resumes of all Non-clerical Personnel
          (include relationship with contractor [full-time
          employee, consultant, etc.], length of service
          with contractor, pertinent experience, and position
          for which offered)

VIII. Corporate Experience with Similar Tasks                          8%

IX.   Government Supplies or Services Required by the
      Contractor                                                       5%

X.    Equipment Necessary to Support Proposal                          2%

2/ Subject to final determination by OTS.
3/ This entire form is repeated for the first year, second year, and
   third year costs as major headings A, B, and C respectively.
-123-
-------
XI.   Cost Figures

      A.  First Year
      B.  Second Year
      C.  Third Year
                                                          Annual Costs 3/
      1.  Parameters used for costs
            Annual Reports 2/
            Premarket Reports 2/
            Test Data Reports 2/
            Activity against data base
               Premarket queries
               Summary Reports

      2.  Anticipated Costs Under the Proposed Manual System
            Forms Design                                        $
            Mail Receipt and Accession Control                  $
            Technical Analysis                                  $
            Report Preparation, Compilation, etc.               $

      3.  Enhancement Costs
            Microfilming
               Master film cost per page
               Annual Costs                                     $
               Working or retrieval copy/page
               Annual Costs                                     $
            Reduction to Machine-Readable Form
               Cost per 1000 characters of data
               Annual Costs                                     $
-124-
-------
      4.  Computer Processing Costs
            Enhancement Support Costs
               Computer Processing Costs
                  Per hour
                  Annual Costs                                  $

      5.  Data Base Processing Costs
            Computer Processing Costs
               Per hour
               Annual Costs                                     $
            On-line Storage Costs
               Appropriate unit cost
               Annual Costs                                     $
            Software Development Costs (annual)                 $
               List of Products Developed

      6.  Equipment Costs                                       $
            List all equipment required under the proposal
-125-
-------
V. RECOMMENDATIONS

A. Overview

Current estimates of reports that will be received as a result of the
regulations issued by the Office of Toxic Substances reveal that by
1980, OTS will be receiving an annual influx of 90,500 reports (or 362 a
day) and will be dealing with a file of over a half million pages of
data. This, combined with the planning constraint of limiting OTS's
participation in this work to four people over this six year time frame,
strongly suggests that computerized assistance will be mandatory by
1980. Furthermore, there is no reason to assume that the 1980 input
report volume will not continue to grow in following years. However, the
initial volume of 10,500 reports (or 40 a day) does not justify the use
of automation.
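
The daily figures are consistent with spreading the annual volume over about 250 working days. That divisor is an assumption (the text does not state it), as the sketch below makes explicit.

```python
WORKING_DAYS_PER_YEAR = 250   # assumption; the report does not state a divisor


def reports_per_day(annual_reports, working_days=WORKING_DAYS_PER_YEAR):
    """Average number of reports arriving per working day."""
    return annual_reports / working_days


print(reports_per_day(90_500))   # 1980 volume
print(reports_per_day(10_500))   # 1975 volume
```

With this divisor the 1980 volume works out to 362 a day, as quoted, and the 1975 volume to 42 a day, in line with the rounded figure of 40.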
" .
Therefore, it is recommended that the manual system be installed with a
life time expectancy of four years (1975-1978) and that this time be
used to gain experience with real data, collect statistics, reaffirm the
estimates, develop assessments of the available encoding schemata, and
"shake-down" the reporting form design. However, the planning for the
eventual automation should begin at once with services sought from EPA's
Management Information and Data Systems Division on appropriate
standardization and development plans for the enhancements to the manual
system. It is also recommended that OTS use the on-board computer system
and the System 2000 Data Management system for the enhancements. Both of
these systems are currently in use in EPA's Management Information and
Data Systems Division.
-126-
-------
Though experimental or pilot programs are encouraged, production
keystroking and microfilming should be delayed until the third year when
the aggregate of the three years will offer sufficient volume for
sub-contracting (206,000 pages to microfilm; 404 million characters to
keystroke). Microfilming cannot be justified if actual experience shows
few retrievals of the annual reports. Microfilming of the test data file
does not seem justified on the basis of current volume and requirements.
B. Recommended Program

1. First Year (1975)

With assistance (possibly of the Bureau of the Census), design a
reporting form which supports and explains the informational goals of
the published regulations. Initiate the manual system by hiring the
identified skills. Begin a monitoring program to develop statistics of
actual experience with respondents and OTS data users. Collect and
analyze end-use responses for refinement of reporting requirements and
possible development of other end use codes. Begin collection of data
descriptors to provide the basis for a controlled vocabulary.
Investigate possible keystroking and microfilming services.

2. Second Year (1976)

Continue use of manual system, statistics collection, and monitoring of
actual experience. Resolve end-use coding question. Develop experimental
programs for microfilming formats if test data file growth
-127-
-------
[Figure: recommended program schedule, 1975 to 1980]

1975:       Design Report Form; Initiate Manual System; Establish
            Monitoring Program; Investigate End Use Codes; Begin Manual
            Collection of Data Descriptors; Investigate Keystroking and
            Microfilming Services
1976:       Initiate Data Element Dictionary; Complete End Use Study;
            Perform Experimental Microfilming; Initiate Automated
            Catalog Posting
1977:       Initiate Production Keystroking; Review Microfilm; Initiate
            Automated System
1978:       Manual System in Parallel with Computerized System; Review
            User Requirements
1979-1980:  Automated System in Full Operation; Develop Library of
            Recurring Queries
-------
justifies it. Begin development of automated catalog posting and
publishing in parallel with manual system posting. Develop automated
indexing aids when catalog posting process is operating fairly reliably.
Expand the number of indexing points. Begin development of Data Element
Dictionary.

3. Third Year (1977)

Continue use of manual system, statistics collection, and monitoring of
actual experience. Review microfilm usage for test data and for reports
on basis of retrieval activity. Begin production keypunching of annual
and premarket reports in parallel with manual system enhancement
processes. Begin development of computerized system in parallel with
manual system.

4. Fourth Year (1978)

Continue use of manual system in parallel with computerized system but
set date for switch to full computerized system for beginning of fifth
year. Begin intensive review of system satisfaction of user needs.

5. Fifth and Sixth Years (1979-1980)

Continue use of computerized system. Develop library of recurring
queries and review completeness of data base. Review data base growth
and activity.
-129-
-------
C. Specific Topics

1. Chemical Substance Searching

By including the CAS Registry number in the manufacturer reports, the
OTS will have the ability to search for one or more unique chemical
substances in either the manual or computerized system. Chemical
substructure searching will not be possible with any of the systems
described. However, it is recommended that this capability be obtained
through the use of existing EPA chemical searching capabilities as an
auxiliary, independent operation until the utility of duplicating the
chemical structure data within the OTS data base can be established.
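
Searching by CAS Registry number amounts to an exact-match lookup on a well-formed key. The sketch below pairs such a lookup with the Registry number's standard check-digit rule (the final digit equals the weighted sum of the preceding digits, with weights 1, 2, 3, ... counted from the right, modulo 10). The report identifiers and function names are hypothetical.

```python
def cas_is_valid(cas_number):
    """Check the format and check digit of a CAS Registry number."""
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


def build_cas_index(reports):
    """Map each CAS number to the reports that name it."""
    index = {}
    for report_id, cas_number in reports:
        if not cas_is_valid(cas_number):
            raise ValueError(f"{report_id}: bad CAS number {cas_number!r}")
        index.setdefault(cas_number, []).append(report_id)
    return index


def search(index, *cas_numbers):
    """Look up one or more substances by exact CAS number."""
    return {cas: index.get(cas, []) for cas in cas_numbers}


# Hypothetical manufacturer reports: (report id, CAS Registry number).
index = build_cas_index([
    ("R-001", "7647-14-5"),   # sodium chloride
    ("R-002", "71-43-2"),     # benzene
    ("R-003", "7647-14-5"),
])
print(search(index, "7647-14-5", "7732-18-5"))
```

Rejecting a mis-keyed number at posting time keeps it from becoming an indexing point that no later search would ever hit.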
2. Confidentiality

Under the manual system the confidential data will be completely located
within the OTS and protected as any valuable item should be. However,
the series of recommendations made in the Confidentiality Section
represents a significant effort which must be addressed. Under the
computerized system, OTS must seek a totally dedicated system and must
bear the subsequent expense.

3. Reporting Forms

OTS should seek competent assistance in the development of a reporting
form which satisfies informational needs, manual and computerized
processing considerations, and the task of reducing the reports to
machine-readable form.
-130-
-------
4. Interface with Existing Data Bases

OTS should participate in and encourage efforts by EPA Management
Information and Data Systems to develop standards for data base
management.
-131-
-------
Appendix A. Existing Bibliographic Data Bases

The following list of bibliographic data bases which are pertinent to
the needs of the Office of Toxic Substances was compiled from NBS
Technical Note 814, "A Mechanized Information Services Catalog." A
description of the entries can be found in the parent document.

1. ABSTRACTS ON HEALTH EFFECTS OF ENVIRONMENTAL POLLUTANTS
2. INDEX TO API ABSTRACTS OF REFINING LITERATURE
3. INDEX TO API ABSTRACTS OF REFINING PATENTS
4. BIOLOGICAL ABSTRACTS REVIEWS
5. CA INTEGRATED SUBJECT FILE
6. CHEMICAL ABSTRACTS SERVICE SOURCE INDEX
7. CHEMICAL ABSTRACTS CONDENSATES
8. CHEMICAL-BIOLOGICAL ACTIVITIES
9. CHEMICAL MARKET ABSTRACTS TAPE
10. CHEMICAL TITLES
11. COMPREHENSIVE DATA BASE OF PATENTS
12. COMPUTER BASED NUCLEAR MAGNETIC RESONANCE LITERATURE
    RETRIEVAL SYSTEM
13. COMPUTERIZED INFORMATION RETRIEVAL SYSTEM OF THE GAS
    CHROMATOGRAPHY LITERATURE
14. CPI MAGNETIC TAPE FILE
15. CURRENT PROGRAMS
16. EXCERPTA MEDICA
17. FOOD SCIENCE & TECHNOLOGY ABSTRACTS
18. GEOLOGICAL REFERENCE FILE
19. GOVERNMENT REPORTS ANNOUNCEMENTS
20. INDEX CHEMICUS REGISTRY SYSTEM
21. INFORMATION SYSTEM
22. INTERNATIONAL TREE DISEASE REGISTER
23. IOWA DRUG INFORMATION SERVICE
24. ISI CITATION MAGNETIC TAPES
25. ISI SOURCE INDEX MAGNETIC TAPES
26. MASS SPECTROMETRY BULLETIN
27. METALS ABSTRACTS INDEX
28. PANDEX CURRENT INDEX TO SCIENTIFIC AND TECHNICAL LITERATURE
29. ABSTRACT BULLETIN OF THE INSTITUTE OF PAPER CHEMISTRY
30. P.A.S.C.A.L.
31. PATENT CONCORDANCE IN COMPUTER - READABLE FORM
32. PATENT OFFICE MECHANIZED SEARCH SYSTEMS
33. PESTDOC
34. PETROLEUM ABSTRACTS MASTER RECORD TAPES
35. POLYMER SCIENCE AND TECHNOLOGY
36. RINGDOC
37. S.A.B.I.R.
38. TEXTILE TECHNOLOGY DIGEST KEYTERM INDEX
-132-
-------
39. TOXICOLOGY INFORMATION CONVERSATIONAL ON-LINE NETWORK
40. UNION CATALOG OF MEDICAL PERIODICALS
41. VETDOC
-133-
------- |