MITRE Technical Report
MTR-7558
Vol. I
Chemical Substances Information Network
Volume I
User Requirements and Systems Development Options
M. BRACKEN
J. DORIGAN
J. HUSHON
J. OVERBEY II
CONTRACT SPONSOR: Council on Environmental Quality, Environmental Protection Agency, National Library of Medicine
CONTRACT NO.: CEQ7A010
PROJECT NO.: 15360
DEPT.: W-56
JUNE 1977
THE MITRE CORPORATION
METREK DIVISION
McLean, Virginia
-------
Department Approval:
MITRE Project Approval:
-------
ABSTRACT
Under a joint contract with the Council on Environmental Quality,
the Office of Toxic Substances, Environmental Protection Agency, and
the National Library of Medicine, METREK has surveyed potential users
of chemical substance information and has analyzed the ability of
existing data bases, both Federal and private, to meet these expressed
needs. In order to provide information on chemical substances in an
optimal manner, METREK has proposed the development of a Chemical
Substances Information Network. This network is designed to contain
both core component systems and external systems locatable through a
directory. Modifications to existing systems which meet user require-
ments are suggested as well as general specifications for new systems.
Strategies for management and implementation of the various components
of the network are also presented.
-------
EXECUTIVE SUMMARY
Under a joint contract with the Council on Environmental Quality,
the Office of Toxic Substances of the Environmental Protection Agency
and the National Library of Medicine, METREK has surveyed potential
users of chemical substance information within EPA, elsewhere in the
Federal establishment, and among industry, educational and consumer
action groups. User requirements for information have been characterized with
respect to subject matter and application to existing legislated
authorities and new mandates under the Toxic Substances Control Act
(TSCA).
In addition, METREK has collected data on various systems which
could supply some of this requested chemical substance information in
the categories of Substance Identification, Production Aspects, Market-
ing, Exposure, Epidemiology, Biological Effects, Environmental Effects,
and Standards and Regulations. Information collected by the Council
on Environmental Quality as a part of its inventory of Federal Chemical
Data Bases was supplemented with information on private as well as
additional Federal data bases related to chemical substances. The
data bases were then considered with regard to their relevance and
breadth of coverage in each of the information categories. Those data
bases determined to be most important were then labeled primary systems
and were evaluated more fully in those categories where they might
supply useful information.
Other METREK efforts involved a detailed analysis of alternative
approaches for satisfying user requirements. The information available
in each category was compared to the needs expressed during the
interviews, and the applicable existing data bases and the need for
additional data bases were identified.
Based on those existing primary systems identified above as best
able to supply essential information, the specifications for a Chemical
Substances Information Network are presented. The Chemical Substances
Information Network is designed to contain both core component systems
and external systems, locatable through a directory file. The core
systems include the Chemical Data Bases Directory, the Chemical
Structure/Nomenclature System, the TSCA Chemical Data Systems (Pro-
prietary and Public), the TSCA Reports Management System, the Toxicology
Data System, the Chronic Testing Support System, the Bibliographic
Literature Scanning System, the Laboratory Animal Data System and the
Regulated Chemicals Standards System. The content, management and
time-phased implementation of these core components are considered.
Where existing data systems can be used directly or modified to provide
the basic needs of these core systems, they are presented. In those
areas essential to chemical evaluation and regulation, where informa-
tion does not exist or is inadequate, new systems are recommended for
development. In addition, where existing systems are found to be useful
but redundant, consolidations are suggested.
To investigate alternative strategies for establishing the Chemi-
cal Substances Information Network and its member data systems, three
scenarios for systems development are examined. These scenarios are
based on various TSCA implementation strategies and differ with regard
to the nature of the information requested from industry and the timing
of these requests. For each scenario, different systems development
options are presented due to the variance in dependence on external
files to supply data potentially obtainable under TSCA.
Volume II of this report contains the appendices to Volume I,
including detailed documentation of the user requirements interviews
and background data for each of the primary data systems.
-------
TABLE OF CONTENTS
List of Figures
List of Tables
1.0 INTRODUCTION
  1.1 Scope of Work
  1.2 Limitations of the Study
2.0 USER REQUIREMENTS FOR INFORMATION CONCERNING CHEMICAL SUBSTANCES
  2.1 Introduction and Approach
  2.2 Scope and Limitations of the User Requirements Study
  2.3 Legislative Authority of Regulatory Agencies in Controlling Chemical Substances
  2.4 Integration and Prioritization of Individual Requirements
    2.4.1 Identification of the Functional Areas
    2.4.2 Functional Groupings
  2.5 Analysis and Integration of User Requirements
    2.5.1 Prioritization of Requirements Integrated Across All Users
    2.5.2 Prioritization with Respect to TSCA Authority
    2.5.3 Prioritization of Requirements with Respect to EPA's Strategy for Implementing TSCA
3.0 EXISTING FILES APPLICABLE TO TSCA
  3.1 Introduction
  3.2 Criteria Used to Select Files of Maximum Usefulness
  3.3 Characterization of Selected Systems
4.0 IDENTIFICATION AND EVALUATION OF DATA FILES CONSISTENT WITH USER REQUIREMENTS
  4.1 Introduction
  4.2 Substance Identification
    4.2.1 Basic Identification Data
    4.2.2 Chemical/Physical Properties
    4.2.3 Composition Data
    4.2.4 Compound Impurities
    4.2.5 Chemical Analysis Techniques
  4.3 Production Aspects
    4.3.1 Production Quantity, Plant Location and Manufacturer
    4.3.2 Production Process and Control Technology
    4.3.3 By-Products and Impurities
  4.4 Marketing
    4.4.1 Usage Information
    4.4.2 Economic Information
  4.5 Exposure
    4.5.1 Occupational Exposure
    4.5.2 Environmental Exposure
    4.5.3 Consumer Exposure
  4.6 Epidemiology
  4.7 Biological Effects
    4.7.1 Acute Toxicity
    4.7.2 Chronic Toxicity
    4.7.3 Metabolism
  4.8 Environmental Effects
  4.9 Standards and Regulations
  4.10 Summary and Conclusions
5.0 DEVELOPMENT OF AN INTEGRATED RETRIEVAL SYSTEM
  5.1 Background
  5.2 Approach to Defining Systems Development Options
  5.3 Long-Range Objective of a Comprehensive Chemical Substance Information System
    5.3.1 Requirement for Integrated Computer Network
    5.3.2 Individual Components of the Chemical Substance Information Network
  5.4 Supporting Rationale for the Recommended Network Design
  5.5 Data Base Administration Responsibilities
6.0 RECOMMENDED SYSTEMS DEVELOPMENT OPTIONS
  6.1 Clarification of Scenarios and Their Systems Development Implications
  6.2 Scenario I Systems Options
    6.2.1 Directory Development Recommendations
    6.2.2 Nomenclature and Structure Development Recommendations
    6.2.3 Toxicology Data Systems Development Recommendations
    6.2.4 Exposure/Use Systems Development Recommendations
    6.2.5 Development Recommendations for Other Systems
    6.2.6 Limitation on Recommendations
  6.3 Scenario II and III Systems Options
    6.3.1 Scenario II Systems Implications
    6.3.2 Scenario III Systems Implications
  6.4 Other Considerations of Systems Development Options
    6.4.1 Systems Options, Their Compatibility and Development
    6.4.2 Time-phase Implementation of the Core Component System
    6.4.3 Compatibility of Component Systems
  6.5 Network Development and Management
-------
LIST OF FIGURES
Figure Number
2-1 Illustration of Data Required and Associated Attributes
2-2 Venn Diagram of Chemical Substances
5-1 Recommended Long Term Chemical Substances Information Network Concept
5-2 Data Involvement of Selected Regulatory Agencies
5-3 General Scheme for the TSCA Chemical Data System File Structure
6-1 Recommended Chemical Substances Information Network Concept Given Scenario I
6-2 Potential Linkage Between Data Bases
6-3 Timing of Critical Events Associated with Evolution of the Chemical Substances Information Network
LIST OF TABLES
Table Number
2-1 Chemical Information Requirements for Environmental and Health Hazard Analysis
2-2 Legislative Responsibilities of Agencies in the Control of Chemicals
2-3 Offices/Agencies and Their Functional Activities
2-4 Information Requirements Integrated Across All Users
2-5 Information Requirements by Specifying Agency
2-6 Requirements Integrated Across Functions Within Categories for All Users
2-7 Information Requirements Integrated Across EPA
2-8 Requirements Integrated Across Functions Within Categories for EPA
3-1 Data System Scoring
3-2 Data Systems Applicable to Substance Identification
3-3 Data Systems Applicable to Production
3-4 Data Systems Applicable to Marketing
3-5 Data Systems Applicable to Exposure
3-6 Data Systems Applicable to Epidemiology
3-7 Data Systems Applicable to Biological Effects
3-8 Data Systems Applicable to Environmental Effects
3-9 Data Systems Applicable to Standards and Regulations
3-10 Source of Data and the Proprietary Status of the Primary Systems
6-1 Selective Comparison of Structure Searching Techniques
-------
1.0 INTRODUCTION
The Toxic Substances Control Act (TSCA) was signed by the
President on October 11, 1976 and became effective January 1, 1977.
This Act provides EPA with the authority to regulate chemicals in
commerce not covered by existing Federal regulatory authorities.
One of the main thrusts of the Act is that it provides for a vital
source of new data with which to assess the possible risks and bene-
fits of chemicals in the environment. Under TSCA, manufacturers,
processors, exporters, and importers are required to report:
(1) information on new chemical substances proposed for commercial
production and selected new uses of existing substances, (2) annual
production activities for selected existing substances as listed in
EPA reporting regulations, and (3) health and safety data. "Trade
secrets" and other confidential information may be included and must
be protected against unauthorized disclosure.
In addition to the specific reporting requirements of the Act,
elements of EPA will require supporting information from a variety
of external existing sources for decision-making purposes, particu-
larly in developing regulations calling for testing or restrictions
on the manufacture, use or distribution of certain substances.
Furthermore, EPA has stated in its draft strategy document,
Assessment and Control of Chemical Problems, EPA, February 1977,
that information obtained under the Act will be made available as
promptly and widely as possible so that other Federal, state and
local agencies, as well as the private sector, can use it as fully
as practical in meeting the purposes of the Act.
Section 25(b) of TSCA requires the Council on Environmental
Quality to coordinate, within 18 months, a study of the feasibility of
establishing (1) a standard classification system for chemical
substances and related substances, and (2) a standard means for storing
and for obtaining rapid access to information on these substances.
This study was undertaken in support of CEQ's responsibilities
as stated above and the responsibility of the Information Management
Unit of the Office of Toxic Substances, EPA to design and establish
an effective system for the retrieval of toxicological and other
scientific data as called for by the Toxic Substances Control Act.
This effort is also supporting the National Library of Medicine in
its requirements to assemble and make available information concerning
chemical substances.
1.1 Scope of Work
Task 1 involved the identification and characterization of
groups that have regulatory responsibilities to control toxic sub-
stances and/or concern with the general goal of protecting human
health and the environment from unreasonable risks presented by
chemical substances. Those groups considered included:
1. The Office of Toxic Substances (OTS) in EPA
2. Other EPA Headquarters Offices
3. EPA regional offices
4. EPA laboratories
5. Other Federal agencies and departments
6. State and local government agencies
7. International organizations
8. Other interest groups (industry, universities, public and
private interest organizations)
METREK characterized the information requirements of these
users with respect to:
1. Subject matter (e.g., physical properties, production data,
toxic effects, etc.).
2. How the information relates to the TSCA mission.
3. Who would use the information and for what application
(preparation of the regulations, creation of criteria
documents, etc.).
4. Characteristics of data required to satisfy the need
(e.g., timeliness, volume, accuracy, quality).
5. When the information is needed and how rapidly.
6. The kinds of manipulations of the data required to pro-
duce useful information.
7. The form in which the Information need would be expressed
(telephone query, written request, etc.).
8. The form in which the need could be satisfied (ad hoc
report, annual report, on-line interactive retrieval,
etc.).
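The eight attributes listed above could be captured as one record per requirement. The following is a minimal sketch in Python, not a structure defined by the study: the class name, the field names, and the example values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserRequirement:
    """One information requirement, described by the eight attributes above.

    Field names are hypothetical; the study does not define a record layout.
    """
    subject_matter: str        # 1. e.g. "physical properties"
    tsca_relationship: str     # 2. how the information relates to the TSCA mission
    user: str                  # 3. who would use the information
    application: str           # 3. the application it would serve
    data_characteristics: List[str] = field(default_factory=list)  # 4.
    needed_when: str = ""      # 5. when it is needed and how rapidly
    manipulations: List[str] = field(default_factory=list)         # 6.
    request_form: str = ""     # 7. form in which the need is expressed
    response_form: str = ""    # 8. form in which the need could be satisfied


# Hypothetical example record; the values are invented for illustration only.
example = UserRequirement(
    subject_matter="toxic effects",
    tsca_relationship="supports testing rules",
    user="EPA Office of Toxic Substances",
    application="preparation of regulations",
    data_characteristics=["timeliness", "accuracy"],
    needed_when="within days of the request",
    manipulations=["sorting by substance"],
    request_form="telephone query",
    response_form="on-line interactive retrieval",
)
```

Keeping the request form and the response form as separate fields mirrors items 7 and 8, which the study treats as distinct characteristics.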
For each user or user group, METREK considered the importance of each
requirement with respect to (1) the particular use or user for which
it is intended (e.g., early warning, research, monitoring, etc.) and
(2) EPA priorities under TSCA. Section 2 of this report presents
the findings concerning the user requirements task.
Task 2 involved the identification and evaluation of potential
information sources to satisfy user requirements. Specifically,
METREK created an inventory of existing files, both Federal and
private, containing applicable information concerning chemical sub-
stances. The results of these findings are discussed in Section 3.
METREK characterized the information activities of each source
with respect to:
1. Ownership (agency, public domain, private, foreign).
2. Who uses that information and for what purpose or applica-
tion (preparation of regulations, creation of criteria
documents, etc.).
3. Type of information (bibliographic or numeric).
4. Mode of retrieval (batch, on-line interactive, manual).
5. Subject matter (physical properties, production data, etc.).
6. Characteristics of data (timeliness, volume, accuracy,
quality).
7. Kinds of manipulation of data available.
8. Form in which requests for information must be expressed.
9. Form in which information is disseminated.
10. Maintenance (Is the file being maintained now or is it a
"dead" file that still contains useful information? Does
maintenance involve "updating" or "rebuilding"? Who maintains
the file, and who pays for maintenance? How often
is it done?).
Task 3 matched the user requirements identified in Section 2
with the evaluated existing systems identified in Section 3, and
clarified those areas where user requirements are not being met by
existing files. In Section 4, METREK demonstrated the need for new
files as a result of their user requirements analysis and character-
ized those files which should be established to satisfy TSCA's
requirements.
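The matching step in Task 3 amounts to comparing the categories users require against the categories existing files cover. The following is a minimal sketch in Python under stated assumptions: the function name and the system names are hypothetical, and only the category names are taken from this report.

```python
from typing import Dict, Set


def unmet_requirements(required: Set[str], systems: Dict[str, Set[str]]) -> Set[str]:
    """Return the required categories that no existing system covers."""
    covered: Set[str] = set()
    for categories in systems.values():
        covered |= categories          # accumulate everything any system supplies
    return required - covered          # what remains is not being met


# Hypothetical inputs: "System A" and "System B" are placeholders, not real files.
required = {"Substance Identification", "Exposure", "Epidemiology"}
systems = {
    "System A": {"Substance Identification"},
    "System B": {"Exposure", "Marketing"},
}
gaps = unmet_requirements(required, systems)
```

In this invented example the only gap is Epidemiology, which is the kind of result that would point to a need for a new file.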
Task 4 involves the analysis of the results of the first three
tasks and the development of an integrated information systems plan
from a user requirements point of view. METREK inventoried existing
and proposed systems for linking the files identified in Tasks 2 and
3, evaluated their strengths and weaknesses in the context of the
user requirements analysis of Task 1, and made recommendations as to
various system development options. These systems are, or are
envisioned as, on-line interactive retrieval systems that could (1) link
directly with a series of computerized information files, and (2) direct
the on-line user to other external information sources, with or without
on-line access, that are not physically linked to the central file.
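The two functions described above, searching linked files directly and referring the user to unlinked sources, can be illustrated with a small directory lookup. This is a sketch in Python only; the names FileEntry and route_query, the record fields, and the sample entries are assumptions made for illustration and do not describe any system in the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileEntry:
    name: str
    linked: bool        # True if physically linked to the central file
    contact: str = ""   # how to reach an unlinked, external source


def route_query(category: str,
                directory: Dict[str, List[FileEntry]]) -> Tuple[List[str], List[str]]:
    """Split the files listed for a category into (searchable now, referrals)."""
    entries = directory.get(category, [])
    searchable = [e.name for e in entries if e.linked]
    referrals = [f"{e.name}: {e.contact}" for e in entries if not e.linked]
    return searchable, referrals


# Hypothetical directory contents, invented for illustration.
directory = {
    "Biological Effects": [
        FileEntry("Core toxicology file", linked=True),
        FileEntry("External laboratory file", linked=False, contact="request by mail"),
    ]
}
searchable, referrals = route_query("Biological Effects", directory)
```

A category with no directory entries returns two empty lists, so the caller can tell that no source is known at all.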
Several levels of systems development are presented which depend
on the TSCA implementation strategy and the timing and nature of
data requests from industry. These scenarios provide the basis for a
discussion of location and structure of those files required by EPA
to fulfill its mandate under TSCA.
1.2 Limitations of the Study
The time frame for completion of the study was constrained by
previously scheduled activities of EPA which were dependent on the
output of this effort. The user requirements study, the identifica-
tion of existing systems, and their evaluation (Tasks 1, 2 and 3)
were completed in two months and Task 4, two months later. Conse-
quently, the number of interviews that could be conducted was dependent
on the available time and the funding allocated to this task by the
project officer.
The selection of the groups to be interviewed was determined
by the Government project officer and the METREK project officer.
In addition, the quality of the interview was dependent on the
representatives chosen by the various agencies/institutions to dis-
cuss their respective user requirements and use of existing data
systems. In some cases, the representatives felt they could not
address the total needs of their organization due to the size and
diversity of programs. In several cases, additional interviews
were held or further clarification was sought through telephone
interviews.
The same limitations existed with respect to the quality of
information obtained concerning existing data systems. CEQ conducted
a survey of Federal data bases, but some agencies contacted by CEQ
failed to return their questionnaires, and knowledge of their systems
had to be gained through telephone interviews. The information that
was provided on the data systems varied greatly in its degree of
completeness. Again, efforts were made to obtain more information
about relevant systems.
-------
2.0 USER REQUIREMENTS FOR INFORMATION CONCERNING CHEMICAL SUBSTANCES
2.1 Introduction and Approach
In this section, user requirements for information on chemical
substances are discussed first. Second, the importance of each
requirement is ranked with respect to both the particular use of the
data and EPA priorities for information under its strategy for
implementing the TSCA authorities. In
ranking the relative importance of information requirements, partic-
ular emphasis has been given to that information needed to support
the testing and pre-manufacturing activities under TSCA section 4
and section 5 respectively.
To identify the requirements for these types of data, a large
number of face-to-face interviews were conducted in a structured man-
ner with representatives of the EPA Office of Toxic Substances, other
EPA Headquarters Offices, EPA Regional Toxic Coordinators, EPA Labo-
ratories, other Federal agencies and departments, international
organizations, and other interest groups representing the viewpoints
of industry, universities and other groups in the private sector. A
list of the specific organizations contacted is presented in
Appendix A.
During the interviews, the representatives of each organization
were asked to characterize their specific responsibilities and to
describe ongoing or anticipated actions or programs in response to
their respective existing responsibilities and/or their responsibilities
as mandated by the passage of TSCA. It was felt that by initially
obtaining a comprehensive understanding of the organizations' respon-
sibilities, we could better discriminate between solicitations for
information justifiable by specific identifiable functions performed
by the organization and those which were less relevant. On this
basis, specific information requirements and the characteristics of
these requirements could then be identified for each potential user.
Additionally, during the interviews, specific information sources
were identified which are used currently to satisfy the need for
data. Currently unmet information needs, and which of these
needs could be fulfilled by the authorities for information collection
mandated by TSCA, were also discussed.
User requirements were divided into nine general categories.
These include Substance Identification, Production Aspects, Marketing,
Exposure, Epidemiology, Biological Effects, Environmental Effects,
Standards/Regulations, and Managerial/Administrative. Within the broad
categories, requirements for specific data elements were defined. The
particular elements of each category are listed in Table 2-1. For each
of these categories of data, it is also necessary to determine the
characteristics of both actual and anticipated usage, including the
data's accuracy and currency, access frequency, access mode, retrieval
mode, application or purpose, relationship to TSCA mission and manip-
ulations required to enhance the data's utility.
-------
TABLE 2-1
CHEMICAL INFORMATION REQUIREMENTS
FOR ENVIRONMENTAL AND HEALTH HAZARD ANALYSIS
I. Substance Identification
A. Descriptive Identification
1. Nomenclature
a. CAS Registry Number
b. CAS Preferred Name
c. Synonyms
e. Trade Names
f. Wiswesser Line Notation
g. Other Codes
2. Chemical Structure/Form
a. Chemical Structure
b. Molecular Formula
c. Formula Weight
3. Composition
a. Methods of Determination
b. Impurities
(1) identification (same as I.A.1. and I.A.2.)
(2) detection limits
(3) percent
(4) source
B. Chemical Properties
1. pH
2. Reactivities
a. With Water
b. Oxidation-Reduction
c. With Acid
d. With Base
e. Photoreactivity
f. Nucleophilicity
g. Electrophilicity
h. Thermal
3. Dissociation Constants
a. Organic Bases
b. Organic Acids
C. Physical Properties
1. State/Color/Texture
2. Density
3. Index of Refraction
4. Melting Point
5. Boiling Point
6. Freezing Point
7. Flash Point
8. Volatility
a. Vapor Pressure
b. Vapor Density
9. Solubility
a. Water
b. Organic Solvents
c. Octanol/Water Partition Coefficient
10. Spectral Properties
a. Absorption Spectroscopy
(1) ultraviolet range
(2) visual range
(3) infrared (IR) spectroscopy
b. NMR Spectroscopy
c. Fluorescence Spectroscopy
d. Optical Rotation, Optical Rotatory Dispersion
or Circular Dichroism
e. X-Ray Diffraction
f. Mass-Spectroscopy
11. Persistence (half-life)
a. Hydrosphere
b. Atmosphere
c. Lithosphere
d. Shelf-life
D. Methods of Identification
a. Suitable Analytical Techniques
b. Standard Protocols
(1) AOAC methods
(2) ASTM methods
(3) other methods
II. Production Aspects
A. Production Source
1. Name and Location of Manufacturers
2. Amount Produced by Site
3. Fraction of Production Lost
4. Process
5. Control Technology
6. By-Products
(1) identity
(2) amounts
(3) disposal methods
7. Impurities
B. Commerce
1. Annual U.S. Production
2. Annual U.S. Imports
3. Annual U.S. Exports
4. Annual U.S. Consumption
C. Shipping Procedures
1. Handling
2. Storage
3. Transport
4. Fire Danger Rating
III. Marketing
A. Uses
1. Amounts by Use
2. Trend Data
B. Users
1. Amounts by Use
2. Place of Use
C. Substitute Chemicals
D. Economic Information
IV. Exposure
A. Occupational
1. Total Work Force
2. Occupational Group
3. Duration and Frequency
4. Route of Exposure
B. Consumer
1. Food
2. Drugs
3. Cosmetics
4. Pesticides
5. Other Products
6. Exposure Rate and Duration by Route
C. Environmental
1. Air
2. Water
a. Surface
b. Ground
c. Marine or Estuarine
d. Drinking Water
3. Soil
4. Plants
5. Wildlife
V. Epidemiology
A. General Population
B. Occupational Population
VI. Biological Effects
A. Clinical Studies
1. Exposed Populations
2. Procedures
3. Results
B. Toxicology (Human/Animal)
1. Acute Toxicity
a. Study Characteristics
2. Sub-chronic Toxicity (experimental conditions)
a. Study Characteristics
3. Chronic Toxicity
a. Carcinogenicity
b. Teratogenicity
c. Mutagenicity
d. Other
C. Metabolism (Human/Animal)
1. Blood and Other Organ Levels
a. Parent Compound
b. Metabolites (with CAS numbers)
2. Excretion Rates
a. (as above)
3. Absorption (gut, skin, respiratory tract)
a. (as above)
4. Distribution
a. Organ/Tissue Sites
5. Chemical Interactions
VII. Environmental Effects
A. Degradation
1. Biodegradation
a. Organism
b. Products (with CAS number)
2. Chemical Degradation
a. Rates
b. Products
B. Environmental Transport and Fate
C. Ecological Effects
a. Effects on Vertebrates (birds, fish, amphibians, and
reptiles)
b. Effects on Invertebrates (annelids, arthropods, and
crustaceans)
c. Effects on Plants
d. Effects on Microorganisms
D. Materials Effects
E. Weather and Atmospheric Modification
F. Bioaccumulation/Bioconcentration
VIII. Standards and Regulations
A. Federal Standards and Regulations
B. State Standards and Regulations
C. Local Standards and Regulations
D. Non-U.S. Standards and Regulations
E. International Standards and Regulations
-------
Upon the completion of the interviews, it was necessary to
determine whether the information requested by an individual was
actually required to fulfill his responsibilities. There was the
additional task of determining if the justifiable requirements could
be satisfied by some legislative authority other than TSCA so that
one could effectively prioritize these requirements with respect to
EPA priorities under their strategy for implementing TSCA.
2.2 Scope and Limitations of the User Requirements Study
So that the results of this user requirement analysis can be
viewed in the proper context, it is necessary to highlight some
specific considerations which were associated with this task:
• In determining user information requirements, the types of
information which could be obtainable under authorities in
addition to TSCA were considered. This was done so that a
more comprehensive characterization of requirements for data
on chemical substances could be developed to aid CEQ in fulfilling
its requirement under section 25 of TSCA.
• The time period within which the requirements analysis was
to be completed was constrained by the timing of other EPA
on-going, related studies. For example, the results of the
requirements analysis were not only to provide input to the
second phase of this effort but also to EPA's activities
associated with developing the information system to handle
data being provided in response to TSCA.
• In order to obtain an assessment of the user requirements of
industry and other interest groups including consumers, the
Government project officer directed METREK to meet with rep-
resentative groups such as industrial trade associations,
select public interest groups and a representative of the
university community. In some cases, the groups were respon-
sive and provided representatives who had considerable know-
ledge of the concerns and user requirements of the group they
represented, and in other cases, less information was obtained
during the interview situation. The list of interviewed
groups is not meant to be comprehensive and, in fact, could
not be, due to funding limitations, time constraints and the
impracticability of meeting with large numbers of similar
groups. The groups selected are representative of their con-
stituents and have provided a valid assessment of user require-
ments and existing sources of information which they presently
use.
• The specific policies, actions and assignment of responsibilities
of EPA were evolving during the time period in which the
interviews were conducted. As a result, the relative priorities
of the identified user requirements, while representing the most
accurate determination at this point in time, must not be
considered static; their relative emphasis may change.
It is unlikely, however, that major changes in user requirements
will occur.
The remaining portions of this Section contain a summary dis-
cussion of the major features of the integrated requirements. A
summary of the content and conclusions from each of the individual
interviews is presented in Appendix B of Volume II.
2.3 Legislative Authority of Regulatory Agencies in Controlling
Chemical Substances
Although a number of Federal regulatory agencies are involved with
controlling chemical substances, their legislative mandates vary in
terms of the specific chemical substances involved, the stage during
the chemical life cycle (e.g., pre-manufacturing, production, trans-
portation, use, disposal) or the application (e.g., industrial, consumer,
commercial). It is difficult to set forth a definitive list of specific
jurisdictional involvements of the relevant agencies since there are a
number of overlapping jurisdictions. Moreover, policies for implementing
the legislative authorities change as the agencies analyze and clarify
their positions.
Table 2-2 presents an indication of the legislative responsibilities
for various types of chemicals by agency. Several of the legislative
authorities impose provisions requiring manufacturers, processors, and
distributors to maintain records and report various types of information.
Production information, health and safety data and environmental
effects data are examples. The passage of the Toxic Substances Control
Act provides for imposition of additional research, record-keeping and
reporting requirements which extend the Federal government's
information-gathering and regulatory authorities. The implementation of TSCA
provides the opportunity to coordinate the collection of information
among the Federal agencies regulating similar areas of chemical
substances so timely and accurate information can be obtained with the
least possible burden on business and industry.
-------
TABLE 2-2
LEGISLATIVE RESPONSIBILITIES OF AGENCIES IN THE CONTROL OF CHEMICALS
AGENCIES: FDA, CPSC, OSHA (2), ERDA, DOT (3), EPA (4), USDA, DOD
TYPES OF CHEMICALS (entries as marked; assignment to agency columns not recoverable):
FOODS: X, X, X
DRUGS: X, X (1), X
COSMETICS: X, X
PESTICIDES: X, X, X
OTHER CONSUMER PRODUCTS: X, X, X
INDUSTRIAL: X, X, X
RESEARCH: X, X, X, X
1) Child Resistant Packaging Regulations
2) Concerned with protection of workers exposed to all chemicals
3) Transportation regulations
4) Also responsible in terms of plant emissions and effluents for all types of chemicals
-------
One purpose of this study is to identify specific, justifiable
Federal and private sector requirements for chemical substance infor-
mation which may or may not be addressable under existing legislation.
2.4 Integration and Prioritization of Individual Requirements
To aid in characterizing user requirements, a comprehensive
understanding of the potential applications of this information is necessary.
Examining these applications, together with the budget categories
of the regulatory agencies, made it possible to identify common
functional responsibilities and their chronological sequence.
2.4.1 Identification of the Functional Areas
Within EPA, and also to a large extent within other Federal regu-
latory agencies, the functional responsibilities of individual offices
fall among ten general categories. These functional categories include:
• Hazard Identification/Prioritization and Early Warning of
Potential Risks
• Hazard Analyses
• Research/Development
• Development of Decision Packages (Criteria Documents)
• Preparation of Regulations and Guidelines
• Monitoring/Testing
• Enforcement/Compliance
• Information/Education
• Support to Other Agencies/Organizations
• General Administration and Management
The typical decision-making pattern involves initially identify-
ing a hazard, followed by a "hazard analysis" which in some cases
must be conducted within a short time period. In other cases, devel-
opment of testing protocols and research are necessary to adequately
assess the hazard to humans and the environment from exposure to chemicals.
A "decision package," examining alternative regulatory options, is pre-
pared once the hazards are clearly identified. This package is then
forwarded to an action group for decision-making concerning regulatory
resolution of the problem. Monitoring data may need to be collected and
analyzed to determine the extent of the exposure. Should the decision
be made to regulate the substance/item (whether by labeling, banning,
limiting or controlling its manufacture, etc.), a comprehensive data gathering
activity occurs which includes a more thorough economic analysis of
the impacts associated with individual regulatory actions. In some
cases more research, monitoring and data analyses are required to
support the regulation preparation stage. Subsequently, compliance
and enforcement of the regulation is the primary functional activity
in conjunction with an evaluation component to determine the
effectiveness of the regulation in reducing the risk to the public
and the environment from exposure to that chemical substance.
For characterizing and integrating the requirements of individ-
ual offices according to their application of the data, the above
mentioned functions have been used. No single office performs each
of these functional activities. Some offices, depending on their
respective mandate, perform several of the functions (e.g., research
and hazard analysis) in support of other agencies or groups. In
Table 2-3, the specific responsibilities of individual EPA offices
and other agencies are identified.
Hazard Identification/Prioritization involves selection from the
universe of chemicals of those with which the agency or group will be
concerned. This category includes the function of early warning which
attempts to restrict the total number of chemicals by calling atten-
tion to those which may have significant potential for risk. The
types of chemicals are different depending on the particular mandate
of the agency or group. For example, the Consumer Product Safety
Commission examines chemicals used in consumer products; the Food and
Drug Administration focuses on chemicals used in foods, drugs, and
cosmetics. Typically, the agency, using various criteria and various
types of data, selects a subset of chemicals for which there is
greater concern about the risk of exposure.
Hazard analysis includes surveying the literature and analyzing
health and environmental effects test data submitted in response to
TABLE 2-3
OFFICES/AGENCIES AND THEIR FUNCTIONAL ACTIVITIES RELATED TO TOXIC SUBSTANCES
OTS: Regulation; Testing; Coordination*; Hazard Assessment;
  Special Actions; Pre-manufacturing; Information Management*;
  Program Management*
OTHER EPA: Enforcement; Water and Hazardous Materials; Air and Waste;
  Research and Development; Regions; Laboratories
OTHER FEDERAL: FDA; OSHA; NIOSH; CPSC; DOC*; DOI; DOD; NIEHS; NCI; ERDA;
  Interagency Testing Committee; NAS; DOT; NLM
INDUSTRIAL/TRADE ASSOCIATION/CONSUMER: SOCMA*; CSMA*; CIIT; MCA; NRDC;
  Cons. Found.; Labor Unions*
UNIVERSITY: NYU
[The functional-activity column headings and X entries are illegible in this copy.]
mandates requiring testing of selected chemicals. It includes a pre-
liminary hazard assessment in response to a citizen's petition or a
substantial risk notification as well as the assessment made with respect
to a pre-manufacturing notice under TSCA. It further includes a limited
economic analysis of the impact of alternative regulatory options. In
performance of the activities of hazard analysis, similar compounds
are often structurally compared.
Research/Development includes conducting the fundamental research
necessary to define, measure and control the effects of chemicals, to
understand their biological interactions, and to provide a basis for
the elimination or reduction of the exposure to those chemicals which
are deleterious to human health and the environment. Test method
development and research on the applicability of various control tech-
nologies are also included in this function.
Development of "Decision Packages" (Criteria Documents) includes
the development of comprehensive documentation which serves as the
basis for a decision concerning the need to regulate a chemical. No
single group has complete responsibility for the entire function but
rather contributes component parts of such a package.
Preparation of Regulations, Standards, and Guidelines is defined
here to include the analysis of and selection from the various regulatory
options available to the different agencies (e.g., banning,
seizing, labeling, packaging requirements, controlling the exposure
limits, etc.). The function of the development of regulations is
consistent with the need for a comprehensive data package which will
document and substantiate the recommended regulatory strategy.
Monitoring/Analyses includes monitoring and subsequent raw data
analysis of chemical concentrations in the air, water, and soil. Also
included in this functional activity is the analysis of epidemiological
studies to identify the effects of exposure on human health and other
species*.
Enforcement/Compliance involves enforcing compliance with the
particular laws that the agencies administer. It includes compliance
monitoring to identify violators, laboratory analysis to substantiate
violations, and compilation of evidence to support legal action when
violators are found.
It is further recognized that three other functions exist for
which requirements for information could be identified. These include
the function of Information and Education**, Support to Other Agencies/
Organizations, and General Administration and Management. However, the
decision was made not to include these functions in this effort since
they did not impose unique data requirements separate from those already
identified for program responsibilities.
2.4.2 Functional Groupings
When these functions are analyzed, they can be grouped into
three categories which have common data requirements or data
attributes, and common characteristics with respect to the time frame
within which actions are required.
*It is recognized that many of the epidemiological studies and/or systems
are also used for purposes of hazard identification and establishing
program priorities.
**The function of Information and Education incorporates the critical
activity of making the chemical information data bases available to the
scientific and academic communities for further enrichment and confirmation.
The first category includes the function of Hazard Identifica-
tion/Prioritization of chemical substances. When conducting these
activities, all chemical substances must be considered, the time
frame is typically long (or not a constraining parameter) and the
information need not be highly specific or detailed.
The second category of functions is associated with actions
which often occur in response to an external stimulus such as notifications
of intent to manufacture a new chemical, substantial risk,
imminent hazard or citizens' petitions. Typically, the identity of the
chemical substance cannot be anticipated and the time frame within
which actions occur is generally short. The data must be sufficiently
specific and accurate to permit a fairly comprehensive assessment of
risk. It must also be defendable with respect to possible resulting
litigation. The functions in this category include Hazard Analysis,
Preparation of Decision Packages, Monitoring/Testing, Preparation of
Regulations, and Enforcement/Compliance.
The third category includes the same functions as the second
category with the additional activity of conducting Research and
Development. However, the characteristics of the data needed to
support functions for this category differ from the characteristics
of Category II. The distinction is that the particular chemicals
for which these functions are being performed are those which were
identified either as a result of the Hazard Identification/Prioritiza-
tion process or those chemicals which were identified through imminent
hazard notifications or pre-manufacturing notices for which additional
assessment and review is required. The time frame available for
actions with respect to Category III functions is considerably longer
than that associated with Category II functions. The data developed
for supporting these functions must also be more defendable (i.e.,
accurate) than that required for Category II functions.
The functions of Information and Education, Support to Other
Agencies, and General Administration are not included in this effort,
although vital and essential, since they are not a direct part of the
regulatory chain of events.
The differences between these three categories are illustrated
conceptually in Figure 2-1. At the highest level, Category I,
there is a requirement for the least specific information for the
largest number of chemical substances. At lower levels, Categories
II and III, information is required for fewer chemical substances
but the need arises for data in additional categories and for more
specific and accurate information within each category.
It should also be noted that there is a normal progression from
Category I to Category III activities. There may also be a progres-
sion from Category II to Category III depending on the type and ade-
quacy of the regulatory action selected in the Category II regula-
tion step.
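The three functional groupings described above can be summarized as a small lookup structure. The following is only a sketch: the record layout (`FunctionalCategory`) and the helper `categories_for` are hypothetical names introduced for illustration, and the attribute wording is paraphrased from the text rather than taken from Figure 2-1.

```python
from dataclasses import dataclass

# Functions shared by Categories II and III, as listed in the text.
RESPONSE_FUNCTIONS = (
    "Hazard Analysis",
    "Preparation of Decision Packages",
    "Monitoring/Testing",
    "Preparation of Regulations",
    "Enforcement/Compliance",
)


@dataclass(frozen=True)
class FunctionalCategory:
    """Hypothetical record for one functional grouping."""
    name: str
    functions: tuple
    chemical_scope: str
    time_frame: str
    data_quality: str


CATEGORIES = (
    FunctionalCategory(
        "I",
        ("Hazard Identification/Prioritization",),
        "all chemical substances",
        "long, or not a constraining parameter",
        "need not be highly specific or detailed",
    ),
    FunctionalCategory(
        "II",
        RESPONSE_FUNCTIONS,
        "substances whose identity cannot be anticipated",
        "generally short",
        "specific, accurate and defendable",
    ),
    FunctionalCategory(
        "III",
        RESPONSE_FUNCTIONS + ("Research and Development",),
        "priority substances needing additional assessment",
        "considerably longer than Category II",
        "more defendable than Category II",
    ),
)


def categories_for(function_name):
    """Return the names of the categories that include the given function."""
    return [c.name for c in CATEGORIES if function_name in c.functions]


print(categories_for("Hazard Analysis"))           # ['II', 'III']
print(categories_for("Research and Development"))  # ['III']
```

The same function can appear in more than one grouping; what changes between Categories II and III is the time frame and the quality of data demanded, not the activity itself.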
FIGURE 2-1
ILLUSTRATION OF DATA REQUIRED AND ASSOCIATED ATTRIBUTES
2.5 Analysis and Integration of User Requirements
It is within this context that the requirements for information
concerning chemical substances are discussed. First, the categorized
information requirements are integrated across all users. Next, they
are integrated across EPA users. Finally, the categorized require-
ments are integrated across EPA users according to specific priorities
identified in the EPA strategy for implementing TSCA as reflected in
Assessment and Control of Chemical Problems - "An Approach to Imple-
menting the Toxic Substances Control Act"; Environmental Protection
Agency, February, 1977.
2.5.1. Prioritization of Requirements Integrated Across All
Users
In Table 2-4, the requirements for information within each cate-
gory, together with the requisite attributes of the data, are listed
as they relate to each function of responsibility for all users. The
scoring shown on this chart reflects the responses obtained from
representatives of these agencies during the interviews. When a cate-
gory or item is blank, there is no justifiable requirement cited. In
a few select cases, certain data elements listed in Table 2-1 have
been either eliminated or combined to form those listed in Table 2-4.
The source of the requirement is identified in Table 2-5.
2.5.1.1 Requirements Associated with Category I Functions. In
conducting an initial screening of all chemical substances to identify
a restricted set of substances for which a more detailed examination
will be conducted, there is a consensus among the regulatory agencies
in their requirement for selective Substance Identification information
for a large number of chemical substances updated on an annual
basis.
[PAGE NOT AVAILABLE DIGITALLY]
As can be seen by examining the left column of Table 2-4, the
number of data elements required within this category is limited.
However, those which are required must be available for a large number
of chemicals. To have this information on only a restricted set of
substances would severely restrict their ability to conduct meaningful
hazard identification, early-warning, and prioritization in any
systematic manner. Without this information, substances of similar
molecular structure could not be grouped.
Beyond Substance Identification data, requirements for additional
categories of data vary somewhat according to the mandate of the re-
questor and the approach employed in conducting initial screenings of
all substances. The most frequently cited requirements are those for
Production, Use, and Exposure information. This consensus was supported
by EPA, NCI, OSHA, NIOSH and the Interagency Testing Committee.
In conducting initial screening, the above agencies have require-
ments for data on the quantity of each substance produced. For most,
this need can be adequately met by range type of data, indicating the
total amount produced. For this reason we have indicated that there
is a requirement for summary data (least specific). This does not
imply, however, that a lack of accuracy is acceptable. In the
instance where a large number of manufacturers are engaged in the
production of a specific substance, highly accurate information must
be obtained from each manufacturer to avoid a highly imprecise total
when the individual production quantities are aggregated. Information
regarding the amount produced by "small manufacturers"* must also be
obtained to ensure the accuracy of aggregated statistics.
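The effect of aggregating range-type production data can be illustrated with simple interval arithmetic. The sketch below is an assumption-laden example, not a method from the study: the function name `aggregate_ranges`, the worst-case summing of bounds, and all quantities are hypothetical.

```python
def aggregate_ranges(reported):
    """Sum (low, high) production ranges into a worst-case national range."""
    total_low = sum(low for low, _ in reported)
    total_high = sum(high for _, high in reported)
    return total_low, total_high


# Hypothetical case: 20 manufacturers each report a coarse range (tons/year).
coarse = [(1_000, 10_000)] * 20
# Same 20 manufacturers reporting to within roughly 5 percent of 5,000 tons.
precise = [(4_750, 5_250)] * 20

coarse_total = aggregate_ranges(coarse)
precise_total = aggregate_ranges(precise)

print(coarse_total, coarse_total[1] - coarse_total[0])
print(precise_total, precise_total[1] - precise_total[0])
```

With the coarse ranges the national total is known only to within 180,000 tons; with the tighter reports the spread falls to 10,000 tons. This is the sense in which summary data for screening still depends on accurate reporting by each manufacturer.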
Within EPA and OSHA there is a justifiable requirement to obtain
site specific production data to be used in the initial prioritization
of substances. The EPA regions, in particular, stated a requirement
for this data for establishing resource allocation priorities in a
predictive rather than reactive manner. Aggregations of the amount
produced on a geographical or corporate entity basis will not satisfy
their requirement. Similarly, OSHA requires site specific production
information for establishing priorities for executing its responsibil-
ities.
General indications of changes in the production process or
technologies for controlling emissions and effluents resulting from
chemical production are required as the state-of-the-art evolves.
This information is used as an early warning indicator of a potential
new hazard.
Information regarding the usage of substances is required in
general categories sufficiently specific to permit the identification
of new usages. Baseline usage data with amounts are needed to assess
significant new usages.
*Currently, EPA is engaged in developing a quantifiable definition
of this term.
In addition to production and usage information, data on the
workforce exposed to substances during their manufacture, and environ-
mental and consumer exposure are required as an initial indication of
the extent to which humans are exposed to the substance. An aggre-
gated national figure updated annually will satisfy the requirements
cited by EPA, OSHA, NCI, and the Interagency Testing Committee.
With the exception of information on changes in production
processes, control technologies and site specific production informa-
tion, it is required that the above information be accessible in an
interactive mode to facilitate the screening process. Non-automated
access to information regarding changes in production processes and
control technologies is adequate. However, due to the large amounts
of data associated with site specific production information, it is
recommended that this data should also be automated to facilitate
updating and maintaining the currency of that information.
Having restricted the total number of chemical substances from
the thousands which exist to a limited number of perhaps a few hundred,
additional information is required for the remaining substances to
enable a secondary screening to identify particular substances for
which a detailed hazard analysis will be conducted. For substances
selected as a result of the initial screening, both additional and
more detailed data are required.
In the Substance Identification category, information (in addition
to that previously cited) is required on the chemical and physical
properties of each substance. There is no requirement, however, that
any of the information be automated; standard reference handbooks
are adequate for physical properties data. Chemical property data
are not currently available in an easily retrievable and updatable
form, and automation of both types of data would greatly facilitate
access. Composition data for chemical substances are also required and,
except for product composition data required by certain regulatory
agencies, as mentioned above, do not need to be automated.
General descriptions of the particular production process employed,
control technologies available and resulting by-products are required
in addition to the information previously cited in the Production
Aspects category. It is required that this information be updated as
significant changes occur. Automation of this data is not required,
but might be desirable to facilitate access.
The total quantities associated with each use and user category
are needed along with summary economic information from the Marketing
category. Specific workforce exposure by occupational group and
consumer, and environmental exposure data are required together with
data on media-specific concentrations, environmental persistence,
and transport and fate to further assess potential exposure threats.
Summary Epidemiology and Biological Effects information are needed for
determining human health effects. Finally, information regarding
existing Standards and Regulations is required. Except for biological
data, it is not required that this additional data be automated.
2.5.1.2 Requirements Associated with Functional Categories II
and III. When dealing with chemical substances in Category II whose
identity is unanticipated until a request such as a pre-manufacturing
or imminent hazard notification is received, or even the priority
chemicals (Category III), requirements for substance identification
are generally similar to those cited for Category I activities. For
both Categories II and III, there is a requirement for information
regarding impurities present in the marketed grade of the substance
to aid in the evaluation of potential human health and environmental
effects. For the same purposes, there is a requirement to know the
place of use of the substance. Substance substitute information is
necessary for identifying the consequences of alternative regulatory
options from health and economic aspects. The requirement for inter-
active access capability is much stronger for Category II than for
Category III, however, due primarily to the shorter time within which
these functions must be performed*. The need for interactive systems
for Category II chemicals can be further justified by the increased
requirement for data manipulation and correlation capabilities to
facilitate hazard analysis and decision making.
*Normally, policy decisions for pre-manufacturing are required within 10
days according to the EPA/OTS Strategy Document. This can be extended
up to 90 days when a detailed analysis is required. In this instance
the remaining functions would be conducted as Category III functions.
However, it is important to realize that while, with the above
exceptions, no major difference occurs for the data required for
Categories II and III as opposed to Category I, a major difference does
exist with respect to the chemical substance for which those data are
required. This difference is illustrated by a Venn diagram, Figure 2-2.
The large circle represents all chemicals. The circle labeled A
represents those Category I chemicals for which secondary screenings
of hazard identification are performed. The circle labeled B repre-
sents those unanticipated chemical substances (i.e., Category II)
identified through pre-manufacturing notices and substantial risk
notifications. The circle labeled C represents selected priority
chemicals (i.e., Category III) for which detailed hazard analyses are
performed. While systems can be designed to handle the large set of
data (i.e., the union of circles A, B, and C) which are responsive
to time frames associated with these functions, resources must be
provided for analyses and interpretation of the data to adapt it
for the purpose of regulatory decision-making.
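The relationship among the circles can be expressed with ordinary set operations. In this sketch the substance identifiers and set contents are invented for illustration, and no assumption is made about whether circles A, B and C overlap in the original figure.

```python
def substances_needing_data(secondary_screening, unanticipated, priority):
    """Return the union of circles A, B and C as one set of substances."""
    return secondary_screening | unanticipated | priority


# Hypothetical identifiers standing in for the universe of chemicals.
all_chemicals = {f"S{i}" for i in range(1, 11)}

circle_a = {"S1", "S2", "S3"}   # Category I: secondary screening
circle_b = {"S3", "S4"}         # Category II: unanticipated substances
circle_c = {"S2", "S5"}         # Category III: selected priority chemicals

covered = substances_needing_data(circle_a, circle_b, circle_c)

print(sorted(covered))
print(covered <= all_chemicals)  # every circle lies inside the large circle
```

A system sized for the union must hold data for every substance in any of the three circles, even though each circle is served on a different time frame.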
2.5.1.3 Summary of Requirements By and Across Functional
Categories. In developing an information system to maintain the re-
quired data and to be responsive to the access characteristics of all
users, it is unnecessary to consider the functional application of
information, given that all applications are considered as being valid
and must be satisfied. The implication of this is that the charac-
teristics of any data category or specific item which represent the
most stringent requirement or demand on a system's capability become
the system design parameter.
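The "most stringent requirement" rule can be sketched as taking, for each attribute of a data item, the maximum level demanded by any functional category. The attribute names, the ordered scales `Access` and `Specificity`, and the example levels below are assumptions made for illustration; they are not the scoring used in the tables of this report.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical ordering of access modes, least to most demanding."""
    NON_AUTOMATED = 0
    AUTOMATED = 1
    INTERACTIVE = 2


class Specificity(IntEnum):
    """Hypothetical ordering of data specificity."""
    SUMMARY = 0
    DETAILED = 1


def design_parameters(requirements_by_category):
    """Keep, for each attribute, the most stringent level cited anywhere."""
    merged = {}
    for attributes in requirements_by_category.values():
        for name, level in attributes.items():
            if name not in merged or level > merged[name]:
                merged[name] = level
    return merged


# Illustrative requirements for one data item across the three categories.
example = {
    "I": {"access": Access.INTERACTIVE, "specificity": Specificity.SUMMARY},
    "II": {"access": Access.INTERACTIVE, "specificity": Specificity.DETAILED},
    "III": {"access": Access.AUTOMATED, "specificity": Specificity.DETAILED},
}

params = design_parameters(example)
print(params["access"].name, params["specificity"].name)
```

Because every application is treated as valid, the merge never averages or drops a requirement; a single category asking for interactive access is enough to make that the design parameter.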
Table 2-6 was developed to aid in evaluating the degree to which
existing data sources and systems could be utilized for satisfying
information requirements of all users. The data characteristics of
this table represent both the summary of individual functional
categories and the system design parameters (integrated across functional
categories), to identify the most stringent requirement.
FIGURE 2-2
VENN DIAGRAM OF CHEMICAL SUBSTANCES
[PAGE NOT AVAILABLE DIGITALLY]
For certain data categories there is a requirement for accessing
data in an interactive mode. This requirement exists for Substance
Identification (molecular formula through chemical structure), Pro-
duction Aspects (site specific production quantity), Marketing (users
with amounts and uses with amounts), Exposure (workforce, air and water,
environmental and consumer), and Biological Effects data. In general,
the greatest degree of specificity is required for these items. These
requirements arise partially from the need for a capability to manip-
ulate, within short periods of time, large volumes of data associated
with many chemical substances. The requirement also arises from the
need to review, assess, and summarize biological activity data indi-
cating tests conducted, method of testing utilized and summary abstracts
of the results.
For several other data categories, computerization of data,
although not justified by cited requirements, would enhance the utility
of the data for functions associated with developing pre-manufacturing
decisions and responding to unanticipated substantial risks. Such
categories include physical and chemical property data and environmental
degradation and bioaccumulation. For example, it would be useful to
develop computer files of baseline information on chemical and physical
property data so that correlations with biological activity data can
be assessed for use in predicting biological effects of new sub-
stances. The EPA Strategy Document has stated that response to pre-
manufacturing notices must be made in a very short time. Therefore,
systems for assessing the completeness of a pre-manufacturing notice
must be developed as well as a system to assist in the analysis of
the data submitted. The development of similar analytical techniques
will be required to assist in the review of testing data.
2.5.2 Prioritization of Requirements with Respect to TSCA
Authority
Since the results contained in Table 2-5 were derived by integrating
requirements from all offices within EPA with those from other
Federal agencies, it is possible that requirements not directly
related to TSCA functional responsibilities are the main driving force
in determining the data items and their associated characteristics.
To aid in examining the extent to which this situation had occurred,
Table 2-7 was constructed by integrating over only those EPA offices
which have, or will have, a direct connection with implementing TSCA
responsibilities.
As can be seen by comparing these two tables, relatively few dif-
ferences exist either in the data required, the functional usage of
it, or in the attributes of the individual items: the requirement for
workforce exposure by occupation, the requirement for economic infor-
mation in Category I, and the request for Biological Effects data in
support of the Research and Development function are eliminated from
the EPA table. Nowhere in EPA was there a cited requirement for
information on the identity of individuals capable of providing expert
witness testimony. It would appear, however, that this would be a
justifiable requirement.
[PAGE NOT AVAILABLE DIGITALLY]
Table 2-8 was developed to aid in evaluating the degree to which
existing data sources and systems could be utilized for satisfying
information requirements prioritized with respect to the TSCA author-
ity. As before, the characteristics of the data items represent, both
for each functional category and across categories, the most stringent
requirement in terms of systems capabilities. In comparing Tables 2-6
and 2-8, no major differences can be found.
2.5.3 Prioritization of Requirements With Respect to EPA's
Strategy for Implementing TSCA
As stated in its approach to implementing TSCA, EPA has divided
the activities it will be conducting under TSCA into four major func-
tional areas and several supporting areas, all of which are inter-
related. The major functional areas are:
1. Acquisition of Information and Assessment of Risks to
Health and the Environment;
2. Necessary Control of New Chemicals through TSCA Authorities;
3. Necessary Control of Existing Chemicals through TSCA
Authorities; and
4. Dissemination of Information and Assessments to Other
Programs and Interested Parties.
Supporting activity areas include the conduct of research, assistance
to interested parties and implementation of TSCA procedural aspects.
[PAGE NOT AVAILABLE DIGITALLY]
During the initial three years of TSCA implementation, EPA has
assigned top priority to the following operational activities:
establishment and implementation of a Pre-manufacturing Review System;
establishment of initial testing requirements; regulatory actions to
control a limited number of environmental problems associated with
existing chemicals; and assessment and control of unanticipated prob-
lems of urgent concern. With respect to collecting information in
support of these top priority activities, it is the policy of EPA to
gather data on a highly selective basis to serve specific purposes.
Confidentiality considerations are to be a major factor influencing
data collection, use and dissemination activities and strategies. In
selecting priorities among the potential environmental problems, EPA
has established the following principles:
• National or global toxic substance problems receive
priority over localized problems,
• Human health effects of toxic substances receive special
attention*, and
• Discharges into the environment of substances in significant
quantities or those which persist and/or bioaccumulate are
of particular concern.
*Recognition is given to ecological impacts that affect human health.
In light of these priorities, as set by EPA in its implementation
strategy for TSCA, requirements for data to support pre-manufacturing
review, development of testing requirements and regulatory actions for
priority selected chemicals and unanticipated problems could be ranked
by relative importance. When this is done, there is no change in the
data items or their characteristics from that of Table 2-7. This
finding should be differentiated from any determination regarding the
provisions of that implementation strategy to satisfy these needs.
3.0 EXISTING FILES APPLICABLE TO TSCA
3.1 Introduction
METREK has attempted to assemble complete information on as many
files containing data relevant to toxic substances assessment as
possible.
Under their mandate in section 25(b) of TSCA to study the feasi-
bility of establishing a standard means for storing and for obtaining
rapid access to information concerning toxic substances, the Council
on Environmental Quality (CEQ) conducted a survey of Federal data
bases. In order to locate these data bases, CEQ combined the results
of two previous environmental data system surveys: the "Study of
Environmental Quality Information Program" prepared by EPA in 1971
but never published, and the "Survey of Environmental Data Systems"
prepared in 1974 by GAO. In early 1977, the heads of the relevant
agencies were then sent lists of systems attributed to their agency
along with a two-page questionnaire to be completed on each system.
Extra questionnaires were also included to cover new systems. When
necessary, a follow-up was performed by CEQ. It was discovered that
some of the systems for which information was sought were no longer
in existence or had been incorporated into other existing systems.
Two hundred twenty four completed questionnaires were made avail-
able to METREK by CEQ. Where information on a given system proved to
be inadequate, a telephone call was placed to the person designated
as a contact for that system to provide additional clarification.
More Federal data systems were also uncovered during the inter-
views described in Section 2 of this report. These were then followed
up and a questionnaire completed for them. The Directory of the Con-
gressional Referral Center, Library of Congress "Federal Information
Sources and Systems" also provided information on approximately twenty
additional Federal data systems, bringing the total number of Federal
systems to 239.
METREK included 55 private and foreign systems, as well as the Federal
systems, in its inventory. A number of files applicable to toxic
substances are available on private systems and are heavily used by both
the Federal and private sectors. Many of the private data systems
contain large numbers of files covering varied subject areas. This
means that the data held in these systems generally provides a broader
spectrum of information than that in the Federal data bases. They
also have the advantage of being available to anyone willing to pay
for the services.
Through this searching, METREK, with the aid of CEQ, has attempted
to assemble the maximum amount of data on all aspects of chemical
substances. Much of the material collected in this initial compilation
was duplicative or very highly specialized in nature. The subsequent
sections of this chapter describe how a narrowed list of files
was selected to supply the information required to support
TSCA-dictated activities and similar activities in other
Federal agencies with mandates to regulate toxic chemicals. The types
of information required to fulfill the various TSCA-related data needs
identified in Section 2 are discussed and the data systems most capa-
ble of supplying that information are identified.
3.2 Criteria Used to Select Files of Maximum Usefulness
In order to design an efficient data management system for infor-
mation concerning chemical substances, it was necessary first to
determine which of the existing Federal and private data systems
could provide useful data. This condensation of files was accomplished
in several stages.
First, the 260 files containing information in one or more of the
eight toxic substances categories explained in detail in Section 2
were segregated from those 34 files described in Appendix C which were
considered irrelevant. All files containing information pertinent to
toxic substances were retained.
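The counts quoted so far can be cross-checked with a few lines of arithmetic. The Python sketch below assumes that the report counts "systems" and "files" in the same units, which the text does not state outright; the variable names are illustrative only.

```python
# Figures quoted in the text (the unit equivalence is an assumption).
federal_systems = 239
private_and_foreign_systems = 55
relevant_files = 260      # retained: data in one or more of the eight categories
irrelevant_files = 34     # set aside and described in Appendix C

inventory_total = federal_systems + private_and_foreign_systems

# The inventory total should equal retained plus set-aside files.
assert inventory_total == relevant_files + irrelevant_files
print(inventory_total)
```

Under that assumption the two breakdowns agree on the size of the inventory.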
In order to further limit the number of files needed to supply
relevant information, a dual scoring methodology was developed to
better characterize the individual data files. The first element of
the score denotes the importance of the information to toxic substances
research and regulation. This "importance factor" varies from a high
of "1" to a low of "4".
The methodology was subjective, and in some cases scoring was based
on insufficient information about the system. Efforts were made to
obtain adequate knowledge of each system in order to make valid
judgments; when in doubt, systems were retained until the second stage
of the project, when more specific attention will be given to the
feasibility of systems integration.
-------
The second element of the score is a measure of the value of the
data and is determined by the following criteria:
• The number of records contained in the system;
• The specificity of the information;
• The extent to which the data were evaluated;
• The ease in accessing the data by both system and subject; and
• The breadth of coverage by the information in the data system.
The "value factor" was scored from a high of "a" to a low of "d". For
example, BIOSIS, a bibliographic file, received the highest possible
score (1a) for information on Exposure, Epidemiology, Biological
Effects, and Environmental Effects. This was predicated on the
relevance of these categories of data to toxic substances assessment
(earning a "1") and on the exceedingly large number of records in the
system, the extensive and in-depth coverage of all biological topics,
and the ready availability of refereed journal citations in a
computerized file (adding up to a score of "a"). Substance
Identification was given a score of 2b for BIOSIS because chemical and
physical characteristics of compounds, or information other than
common names, are not expected to be found in the BIOSIS files. On the
other hand, the Chemical Information System contains extensive data on
Substance Identification, making chemical identification possible from
varied inputs. Its score of "1a" was based on these highly specific
and comprehensive records, including mass spectrometry data, CAS
registry numbers, Wiswesser line notations, X-ray diffraction patterns,
CNMR values, and two-dimensional representations of molecules.
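The two-part score described above can be modelled as an ordered pair. The following Python sketch is one possible representation, not the procedure METREK used; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Grades as described in the text: importance 1 (high) to 4 (low),
# value "a" (high) to "d" (low).
IMPORTANCE_LEVELS = (1, 2, 3, 4)
VALUE_LEVELS = ("a", "b", "c", "d")


@dataclass(frozen=True, order=True)
class Score:
    """Two-part score; comparison is field by field, so 1a sorts first (best)."""
    importance: int
    value: str


def parse_score(text: str) -> Score:
    """Parse a table cell such as '1a' or '2b' into a Score."""
    text = text.strip().lower()
    if len(text) != 2 or not text[0].isdigit():
        raise ValueError(f"not a two-part score: {text!r}")
    importance, value = int(text[0]), text[1]
    if importance not in IMPORTANCE_LEVELS or value not in VALUE_LEVELS:
        raise ValueError(f"score out of range: {text!r}")
    return Score(importance, value)


# The two BIOSIS scores quoted in the text.
biosis_exposure = parse_score("1a")
biosis_identification = parse_score("2b")
```

Because the importance factor is compared before the value factor, any score with importance "1" ranks ahead of every score with importance "2", whatever the value letters are.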
-------
Each data base was ranked according to this methodology in each
of the eight subject categories. The two-part scores awarded to each
data system are included in Table 3-1. Some systems received a high
rating in several areas, some in only one, and some not at all. A low
score implies that the system does not contain data of the highest
value to TSCA-related activities.
Two additional columns have been included in Table 3-1 which pro-
vide supplemental file characteristics. One shows whether the system
is manual or automated, while the other indicates the data base owner-
ship. These facts are useful in determining the ease of accessing
the information.
Based on this two-part ranking scheme, it was possible to select
those data bases of highest applicability to TSCA-related activities.
All data systems receiving a score of "1b" or better in any data
category were selected for further consideration. These files are
considered to be of primary importance and are designated by an
asterisk in Table 3-1.*
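The selection rule just stated can be sketched in Python. This is an illustration under stated assumptions: scores are stored as strings such as "1a", a dash marks a category with no data, and the second system and its scores are invented purely for the example.

```python
def is_primary(scores: dict[str, str], threshold: str = "1b") -> bool:
    """Return True if any category score is at least as good as the threshold.

    Plain string comparison is enough here because every score is one digit
    followed by one letter, and lower digits and earlier letters are better.
    """
    return any(s != "-" and s <= threshold for s in scores.values())


# BIOSIS scores as quoted in the text.
biosis = {
    "Substance Identification": "2b",
    "Exposure": "1a",
    "Epidemiology": "1a",
    "Biological Effects": "1a",
    "Environmental Effects": "1a",
}

# Hypothetical system that never reaches 1b in any category.
example_secondary = {
    "Substance Identification": "2b",
    "Production": "-",
    "Exposure": "3c",
}

print(is_primary(biosis), is_primary(example_secondary))
```

A single qualifying category is enough for selection, which is why a system with a modest Substance Identification score can still be designated primary.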
In a number of cases it was discovered that individual files were
completely contained in, and accessed through, a major system. For
example, AEROS contains NEDS, SAROAD, EDS, and HATREMS, all of which
contain information applicable to toxic substances. Only AEROS was
designated in Table 3-1 as a primary system because the subsystems are
available through it. It was also discovered that some identified
systems were merely subject-specific subfiles of other systems. For
example, CANCERLIT and CANCERPROJ are subfiles of CANCERLINE. As above,
only CANCERLINE was designated as primary.
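The consolidation of contained files into their parent systems can be sketched as a simple lookup. The mapping below uses only the two examples given in the text; the function name and the sample candidate list are hypothetical.

```python
# Parent system -> files reachable through it, as named in the text.
CONTAINED_IN = {
    "AEROS": ["NEDS", "SAROAD", "EDS", "HATREMS"],
    "CANCERLINE": ["CANCERLIT", "CANCERPROJ"],
}


def consolidate(candidates: list[str]) -> list[str]:
    """Replace each contained file by its parent, keeping order, dropping repeats."""
    parent_of = {
        child: parent
        for parent, children in CONTAINED_IN.items()
        for child in children
    }
    result: list[str] = []
    for name in candidates:
        top = parent_of.get(name, name)  # systems with no parent stand for themselves
        if top not in result:
            result.append(top)
    return result


print(consolidate(["NEDS", "AEROS", "CANCERLIT", "BIOSIS", "HATREMS"]))
```

Only the parent appears in the consolidated list, which mirrors the decision to designate AEROS and CANCERLINE alone as primary.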
-------
TABLE 3-1 DATA SYSTEM SCORING
SYSTEM
*Advisory Center on Toxicology
*Aerometric and Emission Reporting System
*Agricultural On-Line Access
Agricultural Research Service
*Air Pollution Technical Information Center
Air Quality Implementation Planning Program
American International Traders Index Register
American Statistical Index
Animal History Data System
*Annual Survey of Injuries and Illness
*Annual Survey of Manufacturers
APILIT
Army Chemical Information and Data System
Association of Data Base Producers
*Astro-4 Drug Information System
*Atlas of Cancer Mortality
ACRONYM
AEROS
AGRICOLA
APTIC
AITR
ASI
ADP
OWNER
NAS/NRC
EPA
NAL/USDA
ARS/USDA
EPA
EPA
DOC
Cong. Info.Serv.
FDA/HEW
BLS/DOL
DOC
Am. Petrol. Inst.
Army/DOD
Asso. DBP
FDA/HEW
NCI/NIH/HEW
DATA TYPE
I
lb
2b
2b
2b
3c
2c
2b
3c
-
2c
3c
2b
-
la
-
II
-
lb
2b
2c
3b
2b
2b
-
-
lb
lb
2b
-
-
lb
-
III
3c
-
la
-
-
-
2b
-
-
-
3c
2b
-
-
2a
-
IV
2b
la
la
3c
2a
2b
-
-
-
-
-
2b
-
-
2c
-
V
la
-
-
-
-
-
-
-
-
la
-
-
-
-
-
lb
VI
la
-
-
-
-
-
-
-
3c
-
-
-
-
-
-
-
VII
la
-
la
-
la
.
-
-
-
-
-
-
-
-
-
-
VIII
la
-
-
la
2b
-
-
-
-
-
-
-
-
2a
-
(C)(M)
M
c
c
c
c
c
c
c
c
H
c
c
C/M
C
C
M
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*Biological Sciences Information Service
*Biological Sciences Information Service
*Biomedical Studies Group
Bird Toxicity & Repellency Data Base
Boston Collaborative Drug Surveillance Program
*Cancer Information On-Line
CANCERLIT
Cancer Projects
Carbon-13 Nuclear Magnetic Resonance
Spectral Search System
Carcinogen Use Registry
*Carcinogenesis Bioassay Data System
Catalog of Information on Water Data
*Census Bureau Foreign Trade Statistics
*Centre Information de Securite
*Census of Manufacturers
CG-388 Chemical Data Guide for Bulk Shipment
by Water
Chemical Abstracts Condensates
ACRONYM
BIO-STORET
BIOSIS
CANCERLINE
CANCERPROJ
CNMR
CBDS
CIS
CA-CON
OWNER
EPA
Biosciences
Info. Service
EPA
FWS/DOI
Boston Univ.
NCI/NIH/HEW
NCI/NIH/HEW
NCI/NIH/HEW
NIH/EPA
NIH/HEW
NCI/NIH/HEW
USGS/DOI
Census/DOC
Intl. Labor
Office, Zurich
Census/DOC
USCG/DOT
Amer.Chem.Soc.
DATA TYPE
I
2b
2b
2b
2b
3c
2a
2b
2b
la
2b
la
3c
2b
3c
2b
2c
2b
II
-
_
la
-
-
2b
-
-
_
-
-
-
la
-
la
—
2b
III
-
_
la
3c
-
-
-
-
_
-
2c
-
-
-
_
-
IV
-
2a
la
-
2a
la
-
-
-
-
2c
-
-
-
-
3c
-
V
-
la
la
-
-
la
-
-
-
-
-
-
-
2a
-
_
-
VI
3c
la
la
3c
2b
la
2b
la
_
-
la
-
-
-
-
__
3c
VII
la
la
la
-
-
-
-
-
-
-
-
2c
-
-
-
_
-
VIII
-
„
-
-
-
-
-
-
-
-
-
-
-
2b
-
—
-
(C)(M)
C
c
H
C
C
C
C
C
C
C
c
H
M
H
C
M
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*Chemical Abstracts Service Chemical
Registry System
*Chemical Abstracts Service Information System
Chemical-Biological Data Base for
Herbicidal Information
Chemical Data Center
*Chemical Dictionary of the U.S. International
Trade Commission
*Chemical Dictionary On-Line
*Chemical Economics Handbook
Chemical Hazard Response Information System
Chemical Industry Notes
*Chemical Information Data System
*Chemical Information System
*Chemical Monograph Referral Center
Chemical Mutagenesis: A Survey of the 1971
Literature
*Chemical Names File
Chemical Structure Index
Chemical Toxicological Data Retrieval System
ACRONYM
CHEMLINE
CHRIS
CIS
CIDS
CIS
CHEMKiC
ORNL/EMIC-2
2SI
OWNER
Amer.Chem.Soc.
Amer.Chem.Soc.
Army/DOD
Chem. Data Ctr.
U.S. ITC
NLM/NIH/HEW
SRI
USCG/DOT
Predicasts/
Chem. Abs. Serv.
Army/DOD
NIH/EPA
CPSC
EMIC/ORNL
NCI/NIH/HEW
ISI
FWS/DOI
DATA TYPE
I
la
la
2a
Ib
Ib
la
2b
2b
-
la
la
la
2b
Ib
2a
4d
II
^
2a
-
2b
la
-
la
-
-
-
-
-
_
-
-
-
III
_
2b
-
-
2b
-
la
-
la
-
-
-
^
-
-
-
IV
_
2b
-
-
_
-
-
3c
-
-
2b
-
_
2b
-
-
V
_
-
_
-
_
-
-
-
-
-
-
-
_
-
-
-
VI
_
-
3c
-
_
-
-
-
-
-
-
-
2b
Ic
-
4c
VII
_
-
21i
-
_
-
-
-
-
-
- -
-
_
-
-
-
VIII
3b
_
-
Ib
-
-
-
-
-
-
-
_
-
-
-
(C)(M)
M
C
C
C
C
C
M
C/M
C
C
C
C
M
C
M
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*Chemical Transportation Emergency Center
Chemistry and Effects of Biocides in
Aquatic Systems
Chemistry Data System
Chick Embryo System
*Clinical Toxicology of Commercial Products
Clintox Literature System
Combination Chemotherapy Master File
Compendium of Toxicology
Compliance Data System
*Component Information for Chemical
Consumer Products
Computerized Engineering Index
Comprehensive Dissertation Index
Conformational Analysis of Molecules in
Solution
*Congressional Information Service Index
*Congressional Record Abstracts
Cosmetics Information System
ACRONYM
CHEMTREC
CTCP
COMPENDEX
CDI
CAMSEQ
CIS Index
CRECORD
OWNER
Mfg. Chem. Asso.
ESIC/ORNL
FDA/HEW
FDA/HEW
U. of Rochester
CDC/HEW
NCI/NIH/HEW
AFIP/DOD
EPA
CPSC
Eng. Index, Inc.
Univ. Microfilms
International
NIH/EPA
Cong.Info.Serv.
Inc.
Capitol Services
FDA/HEW
DATA TYPE
I
Ib
2b
3c
3c
2b
2c
3c
Zb
3c
la
-
2b
Ic
-
-
2b
II
-
_
-
-
2c
-
-
-
3c
3c
3b
-
_
-
-
2c
III
-
_
-
-
-
-
-
-
-
_
-
-
.
-
-
2c
IV
-
2a
2c
-
-
-
-
2a
2c
3b
-
2b
-
-
-
V
-
2b
-
~
-
2c
-
-
-
_
-
2b
-
-
-
VI
2b
2b
-
3b
Ib
2c
3c
2b
-
_
-
2b
-
-
2b
VII
2b
2a
_
-
-
-
-
-
-
.
- .
2b
-
-
-
VIII
_
_
-
-
_
-
-
2a
.
-
-
la
la
3b
(C)(M)
M
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
-------
TABLE 3-1
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
Drug Experience Information System
Drug Experience Reports
*Drug Registration & Listing System
Drug Research & Development Biological Data
Drug Research & Development Chemical
Information Bibliography File
*Drug Research & Development Chemical
Information System
*Dun's Market Identifiers
Effluent Data System
EIS Industrial Plants
Emissions Data System
Energy Data System
Energy Information
Energy Line
Energy Research and Development Inventory
Environmental and Health Aspects of Selected
Organohalide Compounds
Environmental Chemical Data and Information
Network
Environmental Contaminant Evaluation Program
*Environmental Contaminant Monitoring Program
ACRONYM
EMI
EDS
EDS
ECDIN
OWNER
FDA/HEW
FDA/HEW
FDA/HEW
NCI/NIH/HEW
NCI/NIH/HEW
NCI/NIH/HEW
Dun & Bradstreet
EPA
Predicasts
EPA
EPA
ERDA
Env.Info.Ctr.
ORNL/ERDA
ERC/ORNL
OECD
FWS/DOI
FWS/DOI
DATA TYPE
I
2b
3b
la
2b
2b
la
3c
2c
-
3c
3d
2a
-
-
3c
-
3b
3b
II
-
2b
2b
-
-
-
2b
2b
la
2b
3c
-
-
-
-
-
-
-
III
-
-
-
-
-
-
Ib
3c
2b
-
-
-
2b
-
-
-
. -
-
IV
2c
3c
-
-
-
-
Ib
2b
-
-
3c
2b
-
-
-
-
2b
2b
V
-
-
-
-
-
-
-
-
-
-
-
2b
-
-
-
-
-
-
VI
2c
3c
-
3c
-
-
-
-
-
-
-
2b
-
2c
-
-
-
2c
VII
-
-
-
-
-
-
-
-
-
2b
-
2a
2b
2c
-
-
-
Ib
VIII
-
-
-
-
-
-
-
-
-
3c
-
-
2b
-
-
-
-
-
(C)(M)
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
H
M
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
Environmental Data Index
Environmental Data System
Environmental, Health, and Control Aspects
of Coal Conversion
Environmental Information System
*Environmental Mutagen Information Center
Environmental Mutagen Information Center
Agent Registry File
Environmental Pollution Effects on
Aquatic Resources
*Environmental Reports Summaries
Environmental Residual Information System
Environmental Resource Center
Environmental Science Information Center
*Environmental Teratology Information Center
Environmental Teratology Information Center
Agent Registry File
EPA Reports System
Epidemiological Studies Program System
Establishment/Product Licensing System
Establishment Registration Support System
ACRONYM
ENDEX
EDS
EIS
EMIC
EMICARD
ESIC
ETIC
ETICABF
ERSS
OWNER
NOAA/DOC
NOAA/DOC
ERC/ESIC/ORNL
Swedish CEI
NIEHS/NIH/HEW
EMIC/ORNL
NOAA/DOC
EPA
EPA
ORNL/ERDA
NOAA/DOC
NIEHS/NIH/HEW
ETIC/ORNL
NTIS/DOC
EPA
FDA/HEW
EPA
DATA TYPE
I
2b
2b
3c
3b
Ib
2b
_
2c
3c
-
3b
Ic
2b
2b
3c
2b
2c
II
-
-
3c
2b
-
_
_
-
2b
4c
-
-
_
2a
-
2b
2a
III
-
-
-
-
-
_
_
-
2b
-
-
-
_
Ib
-
-
-
IV
la
2b
-
2b
-
_
_
-
-
2b
2a
-
_
Ic
-
-
-
V
-
-
_
2b
-
_
_
-
-
-
-
-
_
la
2b
-
-
VI
-
-
3b
-
la
2b
_
-
-
3b
3c
Ib
2b
la
-
3c
3b
VII
la
2a
3c
2b
-
2c
2a
-
-
2b
2b
-
2c
la
-
-
-
VIII
-
-
_
-
-
_
_
Ib
-
-
-
-
_
-
-
3b
-
(C) (M)
C
C
C
C
C
C
H
C
C
C
C
C
C
C
C
C
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
Excerpta Medica
Export Monitoring and Control System
*Exposure Dictionary for National
Occupational Hazards Survey
*Federal Inventory on Environmental and
Safety Research
*Fish Control Laboratory - Data Base
Information
*Fish-Pesticide Research
Food Information Storage and Retrieval
Foreign Trade of Member Countries of the OECD
Foreign Traders Index
Fuel Additive Registration
Funk & Scott (F&S) Indexes
Geophysical Monitoring for Climate Change
Global Environmental Monitoring System
Graphical Interactive NMR Analysis Program
Great Lakes Environmental Contaminant Survey
Great Lakes Fishery Information
Hazardous and Trace Emissions System
ACRONYM
EDSOHS
Date Base
FTI
GEMS
GLECS
HATREMS
OWNER
Information
System
OEA/DOC
NIOSH/HEW
ERDA
FWS/DOI
FWS/DOI
FDA/HEW
ERS/DOC
DOC
EPA
Predicasts
NOAA/DOC
UNEP
NIH/EPA
FWS/DOI
EPA
EPA
DATA TYPE
I
2b
3c
la
2c
2b
2b
2b
3b
3c
2b
3c
-
3c
3c
3c
3c
2b
II
-
2c
-
-
-
-
-
2b
2c
2b
Ib
-
-
-
-
-
-
III
-
2c
-
-
2c
' -
-
2b
2c
2c
la
-
-
-
-
-
-
IV
2c
_
-
2b
-
2c
-
-
-
3b
-
-
3c
-
3c
-
Ib
V
2c
-
-
2b
-
-
-
-
-
3b
-
-
-
-
-
-
-
VI
2a
-
-
2a
2b
Ib
2b
-
-
2b
-
-
2c
- '
-
2b
-
VII
-
-
-
Ib
la
la
-
-
-
3c
. -
2b
2b
-
2b
2a
-
VIII
-
-
-
-
-
-
-
-
-
2b
-
-
-
-
-
-
-
(C)(M)
M
C
C
C
M
M
C
C
C
M
C
C
M
C
C
C
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*Health Hazard Evaluations
Heavy Metals
Heavy Metals and Related Trace Elements in
Aquatic Environments
*Index Chemicals Registry System
Industrial Hygiene Automated Data System
Industry Surveys
*Industrywide Studies
Information Analysis Centers
Information and Documentation System for
Environmental Planning
Information Bulletin of the Survey of
Chemicals Being Tested for Carcinogenicity
Information Center for Energy Safety
*Information Storage and Referral Section
*Inorganic Chemical Computer Toxicology
Parameter Data Base
INSPEC Science Abstracts
*International Cancer Epidemiology Clearing
House
ACRONYM
ICRS
OMPLIS
OWNER
PHS/CDC/HEW
TVA
ESC/ORNL
ISI
TVA
Standard &
Poor's
NIOSH/CDC/HEW
DSA/DOD
WHO
ORNL/ERDA
NIEHS/NIH/HEW
EPA
Inst. of Elec.
Engineers, U.K.
ICRDB/IARC/CCR
DATA TYPE
I
4d
3c
3b
la
2b
4d
3b
-
Ib
2c
-
3b
2b
2b
II
23
-
_
-
2b
-
3a
-
_
-
-
la
2b
-
III
2a
-
_
-
3c
3c
3a
-
_
-
2b
_
-
-
IV
2b
2c
3c
-
Ic
-
Ib
-
_
-
2b
_
-
-
V
2a
-
_
-
2b
-
Ib
-
_
-
Ib
_
-
la
VI
3c
2b
2b
-
3b
-
3b
-
Ib
-
la
Ib
-
Ib
VII
-
2b
2b
-
-
-
-
-
_
2b
-
la
-
-
VIII
Ib
-
_
-
2a
3c
2b
-
_
2b
-
_
-
-
(C)
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
International Classification of Diseases
for Oncology
International Joint Commission Coordinated
Program on Fish Contaminants
International Referral System for Sources
of Environmental Information
International Registry of Potentially Toxic
Substances
Investigational New Animal Drug Index
Iowa Drug Information Service
*IPC Chemical Data Base
Isotopic Label Incorporation Determination
*Kirk-Othmer Encyclopedia of Chemical
Technology
Laboratory Analysis Data Base
*Laboratory Animal Data Base
Laboratory Management System
Lower Lakes Reference Group
*Mammal Toxicity and Repellency Data Base
Marine Ecosystem Analysis Program
Mass Spectrometry Data Centre
ACRONYM
ICD-0
IDIS
LADB
MESA
OWNER
WHO
FWS/DOI
UNEP
UNEP
FDA/HEW
U. of Iowa
IPC Industrial
Press, U.K.
NIH/EPA
Interscience
Publishers
CPSC
NIH/HEW
EPA
FWS/DOI
FWS/DOI
NOAA/DOC
Atomic Weapons
Res. Estab., U.K.
DATA TYPE
I
_
_
_
-
2b
Zb
2b
3c
2b
2c
2c
3c
3c
2c
2b
la
II
_
_
_
-
2b
-
Ib
-
la
-
-
-
-
-
-
-
III
_
_
_
-
3c
-
la
-
Ib
-
-
-
-
3c
-
-
IV
_
_
_
_
2b
-
-
-
-
-
-
3c
2b
-
2b
-
V
_
_
_
_
-
-
-
-
-
-
-
-
-
-
-
-
VI
2b
_
..
_
-
2b
-
-
-
2b
la
-
3c
Ib
la
-
VII
_
2b
_
_
-
-
-
-
-
-
-
2a
-
la
-
VIII
_
_
_
_
-
-
-
-
-
2b
-
-
-
-
-
-
(C)(M)
M
_
M
C
C
M
C
C
M
C
C
C
M
C
C
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
Mass Spectrometry Bulletin Search
Mass Spectral Identification
Mass Spectral Search System
Masters Theses in the Pure and Applied
Sciences
*Meat & Poultry Inspection Monitoring Program
*Medical Literature Analysis and Retrieval
System On-Line
Medical Subject Headings Vocabulary
*The Merck Index Text Editing System
Michigan Dept. of Natural Resources
Fisheries Division
*Microconstituents in Fish and Fishery
Products
MI-KOM Environmental Information Services
*Military Entomology Information Service
*Mineral Commodity Survey System
Multilateral Trade Negotiations Data Base
Multistation Atmospheric Pollution from
Power Production Study
The Mutagenicity and Teratogenicity of a
Selected Number of Food Additives
ACRONYM
MEDLINE
MESH
MEIS
MTNDB
MAPPPS
ORNL-EMIC-1
OWNER
NIH/EPA
NIH/EPA
NIH/EPA
Plenum Publ.
APHIS/USDA
NLM/NIH/HEW
NLM/NIH/HEW
Merck
Michigan State
NOAA/DOC
Swedish CEI
Army/DOD
BOM/DOI
DOC
ERDA/NOAA
EMIC/ORNL
DATA TYPE
I
2a
2a
la
3c
2b
2a
Ic
Ib
2c
3c
-
2b
Ib
2c
3d
2b
II
-
-
-
- _
-
_
-
-
-
_
-
-
Ib
Ib
_
_
III
-
-
-
_
-
_
-
-
_
_
-
-
Ib
Ib
-
-,
IV
-
-
-
3c
Ib
_
-
-
2c
2b
-
2a
-
-
_
3b
V
-
-
-
3c
-
la
-
-
_
_
-
la
-
-
-
_
VI
-
-
-
3c
-
la
-
2b
3d
_
-
la
-
-
-
2b
VII
-
-
-
3c
-
_
-
-
3c
3b
-
la
-
-
3c
_
VIII
-
-
-
_
2b
_
-
-
_
_
-
-
-
-
_
_
(C) (M)
C
C
C
M
M
C
C
C
M
C
M
C
C
C
C
M
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*NASA Scientific and Technical Information
System
National Air Surveillance Network
National Cancer Institute (NCI) Carcinogenesis
Program File
*National Center for Health Statistics
*National Center for Toxicological Research
(NCTR) Integrated Research Support System
National Clearinghouse for Mental Health
Information
*National Electronic Injury Surveillance
System
National Emissions Data
National Fire Data Center
National Index of Energy and Environmental
Related Data
National Index of Energy and Environmental
Related Models
*National Occupational Hazard Survey File
National Park Service (NPS) Pest Control
System
National Pollutant Discharge Elimination
System
National Referral Center
ACRONYM
HASH
NCHS
NEISS
NEDS
NOHS
NPDES
OWNER
NASA
EPA
NCI/NIH/HEW
HEW
FDA/NCTR
NIMH/NIH/HEW
CPSC
EPA
DOC
ERDA
ERDA
NIOSH/CDC/HEW
NPS/DOI
EPA
Library of
Congress
DATA TYPE
I
Ib
2b
3c
-
3c
2b
2b
2b
3c
3d
4d
2b
3c
3c
3b
II
-
-
-
-
-
-
-
Ib
-
_
_
2b
3c
2b
_
III
-
-
-
-
-
-
2b
-
_
_
la
3b
-
_
IV
2b
Ib
3c
-
-
-
la
2a
-
3c
3b
la
ltd
3c
-
V
-
-
-
la
-
-
la
-
-
_
3b
-
-
-
-
VI
2b
-
2b
-
Ib
2c
-
-
3c
2b
3b
-
-
-
-
VII
2b
-
3c
-
-
-
-
-
-
3b
3b
-
3c
-
-
VIII
-
-
-
-
-
-
2b
-
-
-
-
-
-
2b
-
(C)(M)
C
c
C
c
c
c
c
c
M
C
c
c
c
c
c
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*National Technical Information Service
National Water Data Exchange
Navy Environmental Protection Support Services
Nevada Applied Ecology Information Center
New Animal Drug Applications
New York Times Information Bank
*NIOSH Technical Information Center
Occupational Safety and Health
*Oceanic Abstracts
*Oceanic and Atmospheric Scientific
Information Service
*Office of Standard Reference Data
Chemical Files
*Oil & Hazardous Materials Technical
Data System
*Organic Chemical Producers Data Base
Paper Chem
Parklawn Health Library
Pathology Data System
Permit Compliance System (Water)
ACRONYM
NTIS
NAWDEX
NIOSHTIC
OASIS
OHM-TADS
Xtrfk Index
OWNER
DOC
USGS/DOI
Navy/DOD
ERDA
FDA/HEW
New York Times
NIOSH/HEW
OSHA/DOL
Data Courier,
Inc.
NOAA/DOC
NBS/DOC
EPA
EPA
Inst. of Paper
Chemistry
PHS/HEW
FDA/HEW
EPA
DATA TYPE
I
2b
3c
2b
2b
2b
3c
3c
3c
2b
2c
Ib
2a
2c
3c
3c
3c
3c
II
2a
-
-
3c
2b
3c
2b
2c
-
_
4d
2b
la
2b
-
-
3c
III
la
-
2b
-
3v
3c
3c
-
-
_
_
2b
2c
2b
-
-
-
IV
la
-
2b
2a
2b
3c
3b
3c
2b
2b
_
2b
2c
-
-
-
2c
V
la
-
-
-
-
3c
la
2c
-
_
_
_
-
-
-
-
-
VI
la
-
2b
-
-
3c
la
-
-
_
_
Ib
ib
-
2b
2e
-
VII
la
2c
-
Ib
-
2c
4d
3c
la
Ib
_
2a
-
-
-
-
-
VIII
la
-
-
-
-
2b
-
2b
-
_
_
_
-
-
-
-
-
(C)(M)
K
H
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
P/E News
*Pesticide and Industrial Chemicals
*Pesticide Enforcement Management System
Pesticide Import File Region X
Pesticide Registration Systems
*Pesticide Reporting System
Pesticide Sampling Information System -
Region X
*Pesticides Analysis Retrieval and Control
System
Pharmaceutical News Index
Pilot Data Base for Hazardous Substances
*POISINDEX
Poison Control Centres of Canada
*Poison Control Online Inquiry System
*Pollution Abstracts
*Population Studies System
Predicasts Domestic Statistics
ACRONYM
PEMS
(now PARCS)
PARCS
PNI
OWNER
Amer. Petrol
Institute
FDA/HEW
EPA
EPA
EPA
FDA/HEW
EPA
OPM/EPA
Data Courier,
Inc.
CPSC
Micromedex
Consumer & Corp.
Affairs, Canadian
Govt.
FDA/HEW
Data Courier,
Inc.
EPA
Predicasts
DATA TYPE
I
3c
Ib
2b
2b
2b
2d
2c
la
-
2c
Ib
2a
Ib
-
2c
3c
II
2b
-
2b
2b
2c
2c
2c
la
2b
-
-
-
-
-
-
2b
III
2b
-
2b
2c
-
3c
2c
Ib
2b
-
-
-
-
-
-
la
IV
-
-
-
3b
-
2b
2b
-
-
-
-
3c
-
2b
la
-
V
-
-
-
-
-
-
-
-
-
- '
-
2c
Ib
-
la
-
VI
2b
3c
-
-
-
-
-
2b
-
2b
Ib
2a
Ib
-
Ib
-
VII
-
-
-
-
-
-
-
-
-
-
- '
—
_
la
-
-
VIII
-
-
Ib
2b
-
la
2b
-
2a
2b
-
~
-
-
-
-
(C)(M)
C
c
C
K
C
C
C
C
c
c
M
C/M
C
C
c
c
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
Predicasts Federal Index
Predicasts International Statistics
Predicasts Market Abstracts
*Predicasts Marketing Systems
Product Safety Indexed Document Collection
Program for Toxicology of Combustion Products
Proton Affinity Retrieval
Psychological Abstracts
Registry of Toxic Effects of Chemical
Substances
*Reporting of Economic Data for Negotiation
of International Transportation Conventions
Research Information Services for the
Agricultural Sciences
Research Materials Information Center
*Research Program of Chemicals That Impact Man
Retirement History Study
RINGDOC
Science & Technical Division
Science Citation Search
Scientific Manuscript Bibliographic System
ACRONYM
RTECS
REDNITRAC
SCISEARCH
OWNER
Predicasts
Predicasts
Predicasts
Predicasts
CPSC
NBS/DOC
NIH/EPA
Am. Psych. Assn.
CDC/HEW
DOC
SSIE
ORNL/ERDA
NCI/NIH/HEW
OPP/HEW
Derwent Publ.
Lib. of Cong.
ISI
FDA/HEW
DATA TYPE
I
-
3c
-
3c
-
2a
Ib
-
la
2b
2b
2a
la
- .
2b
2c
2b
3c
II
-
Zb
2a
2a
3b
2b
-
-
_
la
_
-
la
-
-
-
-
-
III
-
la
la
la
2b
-
-
-
_
la
_
-
la
-
-
-
-
-
IV
-
-
-
-
3b
-
-
-
4d
_
_
-
Ib
2c
-
-
2b
4d
V
-
-
-
-
3b
-
-
-
_
_
_
-
la
2b
-
-
2c
-
VI
-
-
-
-
3b
2c
-
3d
la
_
3c
-
la
2c
2b
-
Ib
4d
VII
-
-
-
-
-
2a
-
-
2b
_
3b
-
la
-
-
-
-
-
VIII
2b
-
-
-
2b
-
-
-
la
_
_
-
-
-
-
2b
'
-
(C)(M)
C
C
C
C
C
C
C
C
M
C
C
M
C
C
C
C
C
C
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
Scientific Reference Services Branch
Selective Dissemination of Information
On-Line
Single Drug Master File
*Smithsonian Scientific Information Exchange
Soil, Water, Estuarine Monitoring System
*Solid Waste Information Retrieval System
Special Reports - Grant Supported Literature
Index
*Special Trade Representatives Centralized
Data Bank
*Standards Completion Program
State Implementation Plans
Statistical Center for the Tyler Texas
Asbestos
Strategic Environmental Assessment System
*Storage and Retrieval for Water Quality Data
Storage and Retrieval of Aerometric Data
*Subject Content Oriented Retriever for
Processing Information On-Line
Substructure Searching System
ACRONYM
SDILINE
SSIE
SWEMS
SWIRS
GENIUS
STRCDB
SIPS
SEAS
STORET
SAROAD
SCORPIO
CIS-SSS
OWNER
CDC/HEW
NLM/NIH/HEW
NCI/NIH/HEW
SSIE
EPA
EPA
NCI/NIH/HEW
Off. of Spec.
Representative
for Trade Neg.
NIOSH/CDC/HEW
EPA
NCI/NIH/HEW
EPA
EPA
EPA
Lib. of Cong.
NIB/EPA
DATA TYPE
I
4c
2b
3c
2b
2a
2b
2b
2b
4d
3c
4d
4d
2b
2b
2c
Ib
II
-
-
-
-
-
-
-
la
-
-
-
-
-
-
-
-
III
-
-
-
-
-
-
-
Ib
-
-
-
-
-
-
-
-
IV
-
-
-
la
2b
-
-
—
Ib
2c
4d
-
la
2a
-
-
V
-
Ib
-
-
2a
-
2a
_
Ib
-
4d
-
-
-
-
-
VI
2b
Ib
3b
2b
2a
-
2a
~
4d
-
4d
-
» .
-
-
-
VII
2b
_
-
la
2a
Ib
-
~
-
-
-
2c
-
-
-
-
VIII
-
-
-
-
-
-
-
~
Ib
-
-
-
-
-
2b
-
(C)(M)
M
C
c
M
M
C
C
C
c
c
c
c
c
c
c
c
-------
TABLE 3-1 (Cont'd.) DATA SYSTEM SCORING
SYSTEM
*Supplementary Data System
*Survey of Compounds Which Have Been Tested
for Carcinogenic Activity
Swedish Register of Environmental Research
*Technical Data Center
Technical Files
Technical Library Information Office
The Environment Information Retrieval System
*Thermophysical Properties Research Center
Toxic Materials Information Center
Toxic Substances Information Act
Toxicological Studies
*Toxicology Data Bank
*Toxicology Information On-Line
Toxicology Information Response Center
Toxicology Research Projects Directory
*Toxicology Testing in Progress
Toxline Backfile
Trace Contaminants Abstracts
ACRONYM
PHS-149
TDC
TEIRS
TDB
TOXLINE
TIRC
TOX-TIPS
TOXBACK
TCA
OWNER
BLS/DOL
NCI/PHS/HEW
Swedish CEI
OSHA/DOL
TVA
TVA
Army/DOD
Purdue U.
ERDA/NSF
Virginia State
NIOSH/CDC/HEW
NLM/NIH/HEW
NLM/NIH/HEW
ERDA
NLM/NIH/HEW
NLM/NIH/HEW
NLM/NIH/HEW
TMIC/ORNL
DATA TYPE
I
2b
2b
3c
Ib
2b
-
2b
la
2b
2b
2a
la
2b
2b
2b
2a
2b
3c
II
-
-
-
-
3c
-
-
-
2b
3b
2c
la
2b
-
-
-
2b
-
III
-
-
-
-
3c
-
-
-
3c
-
2b
2a
2b
-
-
-
2b
-
IV
Ib
-
2b
la
2b
-
-
-
2a
-
2b
2a
la
2b
la
2a
la
4d
V
la
-
2b
la
2a
-
-
-
2a
-
-
la
la
2b
la
2a
Ib
-
VI
la
Ib
2b
la
2a
' -
-
-
2a
-
2a
la
la
2b
la
la
la
3c
VII
-
-
2b
-
3c
-
-
-
2a
-
-
2a
la
2b
la
2b
la
3c
VIII
-
-
-
Ib
2b
- '
-
-
-
-
2b
-
-
-
-
-
-
-
(C) (M)
C
M
C
M
M
M
C
M
C
C/M
H
C
C
C
M
C
C
H
-------
TABLE 3-1 (Concluded) DATA SYSTEM SCORING
SYSTEM
*Trade Name Ingredient Clarification
Upper Lakes Reference Group
USDA-ERS Use of Pesticides
VIOLOG
Walter Reed Army Institute of Research
Biological Data System
Walter Reed Army Institute of Research
Chemical Inventory System
Walter Reed Army Institute of Research
Chemical Structure System
Walter Reed Army Institute of Research
Index File
Water Quality Data Base
Water Resources Scientific Information Center
Water Storage Data and Retrieval System
X-Ray Crystal Data Retrieval System
X-Ray Crystal Structure Retrieval System
X-Ray Powder Diffraction Retrieval System
ACRONYM
TNIC
WRSIC
WATSTORE
OWNER
CDC/HEW
FWS/DOI
ERS/USDA
EPA
Army/DOD
Army/DOD
Army/DOD
Army/DOD
TVA
DOI
USGS/DOI
NIH/EPA
NIH/EPA
NIH/EPA
DATA TYPE
I
la
3c
3c
3c
2b
2b
2a
2a
2b
2b
2b
la
la
la
II
2c
-
-
2c
-
-
-
-
-
-
-
-
-
III
-
-
2b
-
-
-
-
-
-
-
-
-
-
IV
-
2b
3c
3c
-
-
-
-
2b
2a
2b
-
-
V
- -
-
-
-
-
-
-
-
-
-
-
-
-
VI
-
3c
-
-
2a
-
-
-
-
2b
-
-
-
VII
-
2a
-
2b
-
-
-
-
-
2b
-
-
-
VIII
-
-
-
2b
-
-
-
-
-
2a
-
-
-
(C)(M)
C
M
M
C
C
C
C
C
C
C
C
C
C
C
-------
This narrowed the list of Federal and private files under
consideration to 100. These primary files will be used in the second
stage of the effort under this contract, which calls for recommending
a basic methodology for accessing and linking existing toxic
substances information and for identifying any new files needed. Due
to the short time frame of this project, the initial survey of
potentially useful data files could not be exhaustive.
3.3 Characterization of Selected Systems
When selecting files for inclusion in an information system, it
is necessary to compare those systems containing data in similar
subject areas. Tables 3-2 through 3-9 contain descriptions of the
selected data files by subject category. The eight information
categories used in Table 2-1 are broken down into subcategories to
permit a rapid comparison of the systems containing data in each
category. The primary systems designated in Table 3-1 by a "1a" or
"1b" in a given data category are included in these category-specific
tables. A column is also included for comments. If a more in-depth
comparison is desired, all primary data systems are described in
detail in Appendix D.
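The way a category-specific table is drawn from the scoring table can be sketched as a filter on one column. The Python below is illustrative only: the function name is hypothetical, and the sample scores are limited to those the text quotes for BIOSIS and the Chemical Information System.

```python
def systems_for_category(table: dict[str, dict[str, str]], category: str) -> list[str]:
    """Return, alphabetically, the systems scored "1a" or "1b" in one category."""
    return sorted(
        name
        for name, scores in table.items()
        if scores.get(category) in ("1a", "1b")
    )


# Scores quoted in the text; categories not mentioned there are left out.
scoring_table = {
    "BIOSIS": {"Substance Identification": "2b", "Exposure": "1a"},
    "Chemical Information System": {"Substance Identification": "1a"},
}

print(systems_for_category(scoring_table, "Substance Identification"))
print(systems_for_category(scoring_table, "Exposure"))
```

A system therefore appears only in the tables for the categories in which it earned a primary score, so the same system can be present in one category table and absent from another.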
In addition, the primary systems were examined to determine
(1) whether their data were generated internally or merely compiled
from external sources of information, (2) whether they contain
proprietary information, and (3) whether their data are collected as a
result of a mandatory solicitation. This information is included in
Table 3-10.
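The three characteristics examined for each primary system can be held as yes/no fields in a small record. The sketch below is a hypothetical layout, not the format of Table 3-10; the field names and both example systems are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Characterization of one primary system (field names are illustrative)."""
    name: str
    generates_own_data: bool       # False means compiled from external sources
    holds_proprietary_data: bool
    mandatory_collection: bool


def compiled_only(profiles: list[SystemProfile]) -> list[SystemProfile]:
    """Return the systems whose data are compiled from external sources."""
    return [p for p in profiles if not p.generates_own_data]


# Placeholder entries; they do not describe any real system in the report.
profiles = [
    SystemProfile("Example System A", True, True, True),
    SystemProfile("Example System B", False, False, False),
]

print([p.name for p in compiled_only(profiles)])
```

Keeping the three answers as separate fields lets each question be queried on its own or in combination.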
-------
TABLE 3-2
DATA SYSTEMS APPLICABLE TO SUBSTANCE IDENTIFICATION
SYSTEM
Advisory Center on Toxicology
Astro-4 Drug Information System
Carcinogenesis Bioassay Data System
Chemical Abstracts Service Chemical Registry System
Chemical Abstracts Service Information System
Chemical Dictionary of the U.S. ITC
Chemical Dictionary On-Line
Chemical Information & Data System
Chemical Information System
Chemical Monograph Referral Center
Chemical Names File
Chemical Transportation Emergency Center
Component Information for Chemical Consumer Products
Defense Documentation Center
Drug Registration and Listing System
*
0
X
X
X
X
X
X
X
X
X
X
X
1
s
i
X
X
X
X
x
X
X
X
X
X
X
X
X
§
S
1
a,
H
m
WLN
WLN
X
WLN
X
WLN
SSS
WLN
x : x
Drug Research & Development Chemical Information System 1 X 'X
SSS
j
0
en
S
eiS
3£
u S
M [d
eg
X
X
X
X
X
X
X
W
Ed
M
H
M
P
Ejl
§ o
us
t-1 p
CO S
gd
U --'
X
X
X
x
X X
x i x •
X
*
en
en
2
§
M
S
1
^»
o
GRAPHI
o
3
09
X
X
X
i v
t A
1
COMMENTS
Manual card file
Drug production and registration
information
Lab experiment data
Tariff information
CIDS registration system
Also x-ray CNMRs and
Mass spec.
Referral system to monographs with
these data
Compounds tested for carcinogenicity
File used in case of accidental spills
Formulation of 15,000 products to
0.1% level
1
i ;
Cn
WLN = Wiswesser Line Notation
SSS = Substructure Searching
* Not on original questionnaire
-------
TABLE 3-2 (CONCLUDED)
SYSTEM
Environmental Mutagen Information Center
Exposure Dictionary for NOHS
Index Chemicals Registry System
Information Bulletin of the Survey of Chemicals Being Tested
for Carcinogenicity
IPC Chemical Data Base
Mineral Commodity Survey System
NASA Scientific and Technical Information Center
Office of Standard Reference Data Chemical Files
Pesticides and Industrial Chemicals
Pesticides Analysis Retrieval and Control System
Poison Control On-Line Inquiry System
POISINDEX
Registry of Toxic Effects of Chemical Substances
Research Program of Chemicals that Impact Man
Technical Data Center
Thermophysical Properties Research Center
Toxicology Data Bank
Trade Name Ingredient Clarification
I
s
B
d
3 1
X i X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
w
g
u
§
WLN
WLN
WLN
X
WLN
u
M
C/t
5E
pj CO
•*-~ w
tj M
512
M a
S OH
[x] O
X
X
X
X
X
j
i
CO
H
t-t
a
£
^
S §
in §
§,-5
S
O M
X
X
X
X
X
X
X
X
X
*w
M
to
S
*5
<
u
H
1
X
i
b'
|
ti
u
3C
&
0
O
M
S
X
X
X
X
COMMENTS
May be expanded for data sources of
mutagen information
12,000 chemical names
Imports/Exports
Survey of mineral industry
Environmental information
Pesticide chemistry
New system, use and formulation
Contains 10,000 household products and
drugs
Contains 160,000 entries
Basic toxicology of 22,000 chemicals
(3,200 chemicals by SRI)
Documentation on occupational safety
and health
New system, on-line access to
toxicology data
* Not on original questionnaire
-------
TABLE 3-3
DATA SYSTEMS APPLICABLE TO PRODUCTION
SYSTEM
Aerometric and Emission Reporting System
Annual Survey of Injuries and Illnesses
Annual Survey of Manufacturers
Astro-4 Drug Information System
Biomedical Studies Group
Census Bureau Foreign Trade Statistics
Census of Manufacturers
Chemical Economics Handbook
Current Industrial Reports
Data Base of the U.S. International Trade Commission
Directory of Chemical Producers
Employment and Earnings
Inorganic Chemical Computer Toxicology
Parameter Data Base
IPC Chemical Data Base
Kirk-Othmer Encyclopedia of Chemical Technology
Mineral Commodity Survey System
Multilateral Trade Negotiations Data Base
Organic Chemical Producers Data Base
Pesticides Analysis Retrieval and Control System
Predicasts Marketing Systems
Reporting of Economic Data for Negotiation of
International Transportation Conventions
Research Program of Chemicals That Impact Man
Special Trade Representatives Centralized Data Bank
Toxicology Data Bank
X
X
X
X
X
X
X
X
X
X
X
x I x
H
M
o5
&
1
CW
S
X | X
X !
o
COMMENTS
NEDS
All establishments >111 employees by SIC code
By SIC code
Drug producers and amounts
For 14 compounds
Imports/exports
By SIC code
By SIC code
Manufacturers and importers in summary form
Manual
Size of workforce
172 inorganics (new system)
Imports/exports on 100 chemicals
Manual
200 mineral industries
Imports/exports
400 chemicals
Formulation information by producer
F & S, EIS of Predicasts
Import/export
On 3,200 chemicals (SRI)
Imports/exports
1,000 chemicals (new system)
*Not on original questionnaire
-------
TABLE 3-4
DATA SYSTEMS APPLICABLE TO MARKETING
SYSTEM
Agricultural On-Line Access
Biomedical Studies Group
Chemical Economics Handbook
Data Base of U.S. International Trade Commission
Dun's Market Identifiers
IPC Chemical Data Base
Kirk-Othmer Encyclopedia of Chemical Technology
Mineral Commodity Survey System
National Occupational Hazard Survey
National Technical Information Service
Pesticide Analysis Retrieval and Control System
Predicasts Marketing Systems
Reporting of Economic Data for Negotiation of International
Transportation Conventions
Research Program of Chemicals that Impact Man
Special Trade Representatives Centralized Data Bank
[Column headings and applicability marks (X) are not recoverable from the scanned source.]
COMMENTS
Agricultural chemicals
On 14 chemicals
Manual
8,000 chemicals - some manufacture, some imports
Import/export on 100 chemicals
Survey of 200 industries
Workplace uses
Government reports
Pesticides
All systems
Imports/exports
SRI file on 3,200 chemicals
Import/export
-------
TABLE 3-5
DATA SYSTEMS APPLICABLE TO EXPOSURE
SYSTEM
Aerometric and Emission Reporting System
Agricultural On-Line Access
Biomedical Studies Group
Cancer Information On-Line
Current Employment Statistics
Dun's Market Identifiers
Industrywide Studies
Meat and Poultry Inspection Monitoring Program
National Electronic Injury Surveillance System
National Occupational Hazard Survey
National Technical Information Service
Oceanic and Atmospheric Scientific Information Service
Population Studies Program
Research Program of Chemicals that Impact Man
Smithsonian Scientific Information Exchange
Standards Completion Program
Storage and Retrieval for Water Quality Data
Supplementary Data Center
Technical Data Center
Toxicology Information On-Line
[Column headings and applicability marks (X) are not recoverable from the scanned source.]
COMMENTS
Includes NEDS, SAROAD, HATREMS, EDS, NASN
14 chemicals only
100 occupational studies
Levels of pesticides, drugs, metals and
residues
Emergency room injuries associated with
consumer products
Government reports
ENDEX
3,200 chemicals - SRI file
Research in progress
400 chemicals - includes WATSTORE, ECMS,
NPDES, LAM
5,000 chemicals
Including TOXBACK
*not included on original questionnaire
-------
TABLE 3-6
DATA SYSTEMS APPLICABLE TO EPIDEMIOLOGY
SYSTEM
COMMENTS
Advisory Center on Toxicology
Annual Survey of Injuries and Illnesses
Atlas of Cancer Mortality
Biomedical Studies Group
Biological Sciences Information Service
Cancer Information On-Line
Industrywide Studies
Information Storage and Referral Section
International Cancer Epidemiology Clearinghouse
Medical Literature Analysis and Retrieval System On-Line
Military Entomology Information Service
National Center for Health Statistics
National Electronic Injury Surveillance System
National Occupational Hazard Survey File
National Technical Information Service
NIOSH Technical Information Center
Poison Control On-Line Inquiry System
Population Studies System
Research Program of Chemicals that Impact Man
Standards Completion Program
Supplementary Data System
Technical Data Center
Toxicology Data Bank
Toxicology Information On-Line
[Column headings and applicability marks (X) are not recoverable from the scanned source.]
Manual (minimal added data)
BLS biannual survey
14 Chemicals
100 studies performed by NIOSH
New system
Includes SDILINE
Baseline information
Consumer epidemiology
Plant profiles
Government reports
8,000 chemicals
Procuring incidence reports
CHESS
SRI
Surveillance re. 400 chemicals with standards
State Unemployment Insurance Records
OSHA Data Bank
New system - now covers 1,000 chemicals and drugs
Including TOXBACK
-------
TABLE 3-7
DATA SYSTEMS APPLICABLE TO BIOLOGICAL EFFECTS
SYSTEM
Advisory Center on Toxicology
Biomedical Studies Group
Biological Sciences Information Service
Cancer Information On-Line
Carcinogenesis Bioassay Data System
Clinical Toxicology of Commercial Products
Environmental Mutagen Information Center
Environmental Teratology Information Center
Fish Pesticide Research
Information Bulletin of the Survey of Chemicals Being Tested
for Carcinogenicity
Information Storage and Referral Section
International Cancer Epidemiology Clearing House
Laboratory Animal Data Base
Mammal Toxicity and Repellency Data Base
Medical Literature Analysis and Retrieval System On-Line
Inorganic Chemical Computer Toxicology Parameter Data Base
[Column headings and applicability marks (X) are not recoverable from the scanned source.]
COMMENTS
Manual card index
14 chemicals
CANCERPROJ
20,000 trade names with toxicity
New system
50,000 animals
172 inorganics
*not included on original questionnaire
-------
TABLE 3-7 (CONCLUDED)
SYSTEM
Military Entomology Information Service
National Technical Information Service
National Center for Toxicology Experiment Integrated
Research Support System
NIOSH Technical Information Center
Oceanic and Atmospheric Scientific Information Service
Oil and Hazardous Materials
Organic Chemical Producers Data Base
POISINDEX
Poison Control On-Line Inquiry System
Population Studies Program
Registry of Toxic Effects of Chemical Substances
Research Program of Chemicals That Impact Man
Smithsonian Scientific Information Exchange
Supplementary Data Base
Survey of Compounds That Have Been Tested for Carcinogenicity
Technical Data Center
Toxicology Data Bank
Toxicology Information On-Line
Toxicology Testing in Progress
[Column headings and applicability marks (X) are not recoverable from the scanned source.]
-------
TABLE 3-8
DATA SYSTEMS APPLICABLE TO ENVIRONMENTAL EFFECTS
SYSTEM
Advisory Center on Toxicology
Agricultural On-Line Access
Air Pollution Technical Information Center
Biomedical Studies Group
Biological Sciences Information Service
Biological Data Storage and Retrieval System
Defense Documentation Center
Distribution Register of Organic Compounds in Water
Environmental Contaminant Monitoring System
Federal Inventory on Environmental Safety and Health Research
Fish Control Laboratory Data Base Information
Fish Pesticide Research
Military Entomology Information Center
National Technical Information Service
Oceanic Abstracts
Oceanic and Atmospheric Scientific Information Service
Pollution
Research Program of Chemicals that Impact Man
Smithsonian Scientific Information Exchange
Solid Waste Information Retrieval System
Toxicology Information On-Line
Inorganic Chemical Computer Toxicology Parameter Data Base
COLUMNS: BIOACCUMULATION; ECOLOGICAL EFFECTS; PHYSICAL EFFECTS; DEGRADATION; MONITORING AND ANALYSIS TECHNOLOGY; BIBLIOGRAPHIC ONLY
[Applicability marks (X) for each system are not recoverable from the scanned source.]
COMMENTS
Manual file
For 14 chemicals
Biological effects of water quality(new system)
New system
Fish bioaccumulation studies
2,466 projects
1,500 chemicals in 8 species manual
500 chemicals in 100 species manual
Government reports
SRI file of 3,200 chemicals
Research in progress
172 inorganics (new system)
* Not on original questionnaire
-------
TABLE 3-9
DATA SYSTEMS APPLICABLE TO STANDARDS AND REGULATIONS
SYSTEM
[Table body not recoverable from the scanned source.]
-------
TABLE 3-10
SOURCE OF DATA AND THE PROPRIETARY STATUS OF THE PRIMARY SYSTEMS
PRIMARY SYSTEMS
Advisory Center on Toxicology
Aerometric and Emission Reporting System
Agricultural On-Line Access
Air Pollution Technical Information Center
Annual Survey of Injuries and Illnesses
Astro-4 Drug Information System
Atlas of Cancer Mortality
Biological Data Storage and Retrieval System
Biological Sciences Information Service
Biomedical Studies Group
Cancer Information On-Line
Carcinogenesis Bioassay Data System
Census Bureau Foreign Trade Statistics
Census of Manufacturers
Chemical Abstracts Service Chemical Registry System
Chemical Abstracts Service Information System
Chemical Dictionary of the U.S. ITC
Chemical Dictionary On-Line
Chemical Economics Handbook
Chemical Information and Data System
Chemical Information System
Chemical Monograph Referral Center
Chemical Names File
Chemical Transportation Emergency Center
Clinical Toxicology of Commercial Products
Component Information for Chemical Consumer Products
Congressional Information Service Index
Congressional Record Abstracts
CPSC Chemical Abstracts
Current Employment Statistics
Data Base of the U.S. ITC
Defense Documentation Center
Directory of Chemical Producers
Distribution Register of Organic Pollutants in Water
Drug Registration and Listing System
Drug Research and Development Chemical Information
System
ACRONYM
AEROS
AGRICOLA
APTIC
BIO-STORET
BIOSIS
CANCERLINE
CBDS
CHEMLINE
CIDS
CIS
CHEMRIC
PHS-149
CHEMTREC
CTCP
CIS INDEX
CRECORD
DDC
DCP
WATERDROP
DR&D CIS
COLUMNS: INTERNALLY GENERATED DATA; EXTERNALLY GENERATED DATA; PROPRIETARY INFORMATION; MANDATORY SOLICITATION DATA
[Applicability marks (X) for each system are not recoverable from the scanned source.]
-------
TABLE 3-10 (Continued)
SOURCE OF DATA AND THE PROPRIETARY STATUS OF THE PRIMARY SYSTEMS
PRIMARY SYSTEMS
Dun's Market Identifiers
Environmental Contaminant Monitoring Program
Environmental Mutagen Information Center
Environmental Reports Summaries
Environmental Teratology Information Center
Exposure Dictionary for the National Occupational
Hazards Survey
Federal Inventory of Environmental and Safety
Research
Fish Control Laboratory-Data Base Information
Fish-Pesticide Research
Health Hazard Evaluations
Index Chemicals Registry System
Industrywide Studies
Information Bulletin of the Survey of Chemicals
Being Tested for Carcinogenicity
Information Storage and Referral Section
Inorganic Chemical Computer Toxicology Parameter
Data Base
International Cancer Epidemiology Clearinghouse
IPC Chemical Data Base
Kirk-Othmer Encyclopedia of Chemical Technology
Laboratory Animal Data Base
Mammal Toxicity and Repellency Data Base
Meat & Poultry Inspection Monitoring Program
Medical Literature Analysis and Retrieval System
On-Line
Microconstituents in Fish and Fishery Products
Military Entomology Information Service
Mineral Commodity Survey System
NASA Scientific and Technical Information Service
National Center for Health Statistics
National Center for Toxicology Integrated Research
Support System
National Electronic Injury Surveillance System
National Occupational Hazard Survey File
National Technical Information Service
NIOSH Technical Information Center
ACRONYM
DMI
EMIC
ETIC
EDNOHS
ICRS
LADB
MEDLINE
MEIS
NCHS
NEISS
NOHS
NTIS
NIOSHTIC
COLUMNS: INTERNALLY GENERATED DATA; EXTERNALLY GENERATED DATA; PROPRIETARY INFORMATION; MANDATORY SOLICITATION DATA
[Applicability marks (X) for each system are not recoverable from the scanned source.]
-------
TABLE 3-10 (Concluded)
SOURCE OF DATA AND THE PROPRIETARY STATUS OF THE PRIMARY SYSTEMS
PRIMARY SYSTEMS
Oceanic Abstracts
Oceanic and Atmospheric Scientific Information
Service
Office of Standard Reference Data Chemical Files
Oil & Hazardous Materials Technical Data System
Organic Chemical Producers Data Base
Pesticide and Industrial Chemicals
Pesticide Enforcement Management System
Pesticide Reporting System
Pesticides Analysis Retrieval and Control System
POISINDEX
Poison Control On-Line Inquiry System
Pollution
Population Studies System
Predicasts Marketing Systems
Registry of Toxic Effects of Chemical Substances
Reporting of Economic Data for Negotiation of
International Transportation Conventions
Research Program of Chemicals That Impact Man
Smithsonian Scientific Information Exchange
Solid Waste Information Retrieval System
Special Trade Representatives Centralized Data Bank
Standards Completion Program
Storage and Retrieval for Water Quality Data
Subject Content Oriented Retriever for Processing
Information On-Line
Supplementary Data System
Survey of Compounds Which Have Been Tested for
Carcinogenic Activity
Technical Data Center
Thermophysical Properties Research Center
Toxicology Data Bank
Toxicology Information On-Line
Toxicology Testing In Progress
Trade Name Ingredient Clarification
ACRONYM
OASIS
OHM-TADS
PEMS
PARCS
RTECS
REDNITRAC
SSIE
SWIRS
STRCDB
STORET
SCORPIO
TDC
TDB
TOXLINE
TOX-TIPS
TNIC
COLUMNS: INTERNALLY GENERATED DATA; EXTERNALLY GENERATED DATA; PROPRIETARY INFORMATION; MANDATORY SOLICITATION DATA
[Applicability marks (X) for each system are not recoverable from the scanned source.]
-------
4.0 IDENTIFICATION AND EVALUATION OF DATA FILES CONSISTENT WITH USER
REQUIREMENTS
4.1 Introduction
This section presents a summary of user requirements for informa-
tion concerning chemical substances and compares these with the capa-
bilities of existing files. The primary files identified in Section 3
are evaluated with respect to their characteristics and attributes
(e.g., accuracy of data, specificity of data, degree of mechanization
and access). The characteristics of these files are compared with
those characteristics associated with the functional categories in
the User Requirements Analysis (Section 2).
Following a discussion of the primary files applicable to each
subject area, those primary files best able to supply the information
requirements are presented. The strong points and inadequacies of
each primary file are then analyzed. In the following sections of this
report, these applicable files are combined with new files which must
be created because the primary files are inadequate to meet the user
needs. The result is an integrated systems plan for supplying infor-
mation on chemical substances.
4.2 Substance Identification
The discussion of systems applicable to substance identification
data is divided into five sections. These are Basic Identification
Data, Chemical and Physical Properties, Composition Data, Compound
Impurities, and Chemical Analysis Techniques.
4.2.1 Basic Identification Data
Basic identification data for chemical substances include
molecular formula, chemical structure, CAS registry number,
CAS-preferred name, and synonyms. Molecular formula and chemical
structure are required for all chemicals in commerce for all three
functional categories (Categories I, II, and III). These data must be
available on an interactive basis, be updated annually, and possess a
high degree of specificity.
There are a number of files which contain varying amounts of
this information. The NIH/EPA Chemical Information System (CIS) now
has the "candidate list" on-line through the TYMSHARE System, which
provides access to the CAS registry number, the preferred name, the
chemical structure, and the molecular weight. CIS is searchable by
chemical structure, substructure and CAS number. CIS can be used to
search for every occurrence of a complete structural formula or frag-
ment in its file, as opposed to a molecular formula. This procedure
is termed substructure searching and involves a search through a file
of connection tables for the part that has been specified by the user.
A number of additional externally generated files (e.g., OHM-TADS,
Merck Index, etc.) have been registered and are structurally searchable
through the CIS substructure searching system. CIS will update this
file when the final inventory is published by EPA, thereby providing
access to these data elements for all chemicals on the inventory. To
remain current, the file will have to be updated annually to include
changes in CAS registry numbers, names, etc. Changes in CAS numbers will
affect all systems that use CAS numbers as an access key. Manufacturers
who may be required to report annually on such items as changes in
production, use, etc., should be aware that CAS numbers do change as new
information about chemical structure is reported.
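The substructure search described above can be illustrated with a minimal sketch. It assumes that a connection table is simply a list of atom symbols plus a list of bonded index pairs, with hydrogens omitted and bond orders ignored. The names `ConnectionTable`, `contains_substructure` and `substructure_search` are hypothetical and do not describe the actual CIS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionTable:
    """Hypothetical connection table: atom symbols plus bonded atom-index pairs."""
    atoms: tuple  # element symbol of each non-hydrogen atom, e.g. ("C", "C", "O")
    bonds: tuple  # pairs of atom indices that are bonded, e.g. ((0, 1), (1, 2))

    def neighbors(self, index):
        """Return the set of atom indices bonded to atom `index`."""
        linked = set()
        for a, b in self.bonds:
            if a == index:
                linked.add(b)
            elif b == index:
                linked.add(a)
        return linked


def contains_substructure(structure, fragment):
    """Return True if `fragment` can be mapped atom-for-atom onto part of `structure`."""

    def extend(mapping):
        position = len(mapping)  # index of the next fragment atom to place
        if position == len(fragment.atoms):
            return True
        for candidate in range(len(structure.atoms)):
            if candidate in mapping.values():
                continue  # each structure atom may be used only once
            if structure.atoms[candidate] != fragment.atoms[position]:
                continue  # element symbols must agree
            # Every fragment bond to an already placed atom must exist in the structure.
            placed = [n for n in fragment.neighbors(position) if n in mapping]
            if all(mapping[n] in structure.neighbors(candidate) for n in placed):
                mapping[position] = candidate
                if extend(mapping):
                    return True
                del mapping[position]  # backtrack and try another candidate
        return False

    return extend({})


def substructure_search(file, fragment):
    """Return the keys of all records in `file` whose structure contains `fragment`."""
    return [key for key, table in file.items() if contains_substructure(table, fragment)]


# Illustrative two-record file (hydrogens omitted, bond orders ignored).
FILE = {
    "ethanol": ConnectionTable(("C", "C", "O"), ((0, 1), (1, 2))),
    "acetic acid": ConnectionTable(("C", "C", "O", "O"), ((0, 1), (1, 2), (1, 3))),
}
C_O = ConnectionTable(("C", "O"), ((0, 1),))
O_C_O = ConnectionTable(("O", "C", "O"), ((0, 1), (1, 2)))
```

Searching `FILE` for the `C_O` fragment retrieves both records, while the `O_C_O` fragment retrieves only acetic acid. A molecular formula search could not make this distinction, since it carries no information about how atoms are connected.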
CHEMLINE is another file which provides basic identification data
for a large number of chemicals (100,000) and, in addition, provides
a locator designator which points to other files in the NLM system
which have information on this chemical. Where applicable, each CAS
number record in CHEMLINE contains ring information. At the present
time, the CHEMLINE system can be searched by this ring information or
by name fragments. NLM is considering loading the candidate list into
CHEMLINE, thereby providing access to the large numbers of users already
having access to the NLM data bases.
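A record of the kind described for CHEMLINE, with identification data plus a locator pointing to other files, might be sketched as follows. The record layout, the function names and the locator values are assumptions for illustration only; they do not reproduce the actual CHEMLINE record format or its contents.

```python
from dataclasses import dataclass, field


@dataclass
class SubstanceRecord:
    """Hypothetical identification record keyed by CAS registry number."""
    cas_number: str
    preferred_name: str
    synonyms: list = field(default_factory=list)
    locators: list = field(default_factory=list)  # other files holding data on this substance


def search_by_name_fragment(records, fragment):
    """Return records whose preferred name or any synonym contains `fragment`."""
    wanted = fragment.lower()
    return [
        record for record in records
        if any(wanted in name.lower() for name in [record.preferred_name, *record.synonyms])
    ]


def files_for(records, cas_number):
    """Return the locator list for the record with the given CAS number, if any."""
    for record in records:
        if record.cas_number == cas_number:
            return list(record.locators)
    return []


# Locator values below are illustrative, not actual file contents.
RECORDS = [
    SubstanceRecord("71-43-2", "benzene", ["benzol"], ["TOXLINE", "TDB"]),
    SubstanceRecord("108-88-3", "toluene", ["methylbenzene"], ["TOXLINE"]),
]
```

A name-fragment search for "benz" returns both records here, because the fragment also occurs in a synonym of toluene. The locator list then tells the user which other files to consult for each substance.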
The systems discussed in more detail in Sections 5 and 6 include:
(1) CHEMLINE
(2) Chemical Information System (CIS)
(3) Army's Chemical Information and Data System (CIDS)
4.2.2 Chemical/Physical Properties
The user analysis study indicated that chemical and physical property
data were not necessary for first level screening of chemicals. They
were necessary, however, for second level screening and for Category II
and Category III functions, although there was no justifiable
requirement for an interactive system.
The following existing data systems may be able to supply relevant
chemical property data: the Chemical Information System (CIS), Chemical
Abstracts Service information files (e.g., CA Condensates, CBAC), the
NASA Scientific and Technical Information Data Base, and the Toxicology
Data Bank (TDB). Physical property data are available from the
Pesticides and Industrial Chemicals file, the Toxicology Data Bank, the
Office of Standard Reference Data Chemical Files and the Thermophysical
Properties Research Center.
The Chemical Information System contains extensive files of
mass spectral data, x-ray diffraction, and CNMR data which are avail-
able on-line through a commercial system making it widely accessible.
The Toxicology Data Bank presently contains selected chemical and
physical data on approximately 1000 chemicals. These data have been
extracted from various handbooks and published sources and have been
evaluated before being entered into the system. TDB provides a poten-
tial focal point for physical and chemical data. The anticipated file
is expected to contain data on 4000-5000 chemicals. Selected chemicals
(Category III) for which hazard analysis, criteria documents, and/or
regulations are planned by various agencies, could be primary candi-
dates for inclusion into TDB, thereby enlarging the file and central-
izing such information.
Relevant data found in the Standard Reference Chemical Data file
and the Thermophysical Property Research Center already serve as
sources for much of this information, but for purposes of establishing
a centralized file, TDB provides an established mechanism for such data.
Pre-manufacturing data, substantial hazard notifications, etc.,
received by OTS which fall into Category II will be handled by the EPA
Reports Management System. Plans to coordinate these data with data
in CIS and TDB will be addressed in a later report.
CAS files such as CA Condensates and NASA files can serve as
sources of physical and chemical data for chemicals not included in
TDB or CIS.
4.2.3 Composition Data
CPSC, FDA, NIOSH, OSHA, and OPP/EPA require product composition
data for chemical formulations that fall under their respective
authorities. These agencies utilize the composition data to accom-
plish first level screening since they are concerned about chemicals
in products that are manufactured in large quantities and/or offer
potentially high human exposure levels. The regulatory agencies also
use the files extensively in hazard analysis and enforcement activities.
The chemical composition of feedstocks (i.e., ingredients) and process
intermediates is required in order to set screening priorities for
more intensive second level testing. This information is needed on
an annual basis but is not required to be automated. The Office of
Enforcement within EPA, however, stated a need for chemical composition
data on an interactive basis to be responsive to short term or emergency
situations, where the identification of all components in a particular
substance, as formulated, is important to establish or substantiate
violations.
Files which contain product composition data include: Astro-4
Drug Information System, Component Information for Chemical Consumer
Products (CPSC), Trade Name Ingredient Clarification File (NIOSH),
PARCS, Pesticides and Industrial Chemicals, Research Program of
Chemicals That Impact Man (SRI/NCI), Clinical Toxicology of Commercial
Products (CTCP), Poison Control On-Line Inquiry, and POISINDEX.
Most of the existing files of composition data respond to a
specific Federal mandate and describe end-product formulation rather
than providing the detailed chemical composition of all components
of a process or product mixture. In addition, product composition
files, such as those in FDA, CPSC, and NIOSH, contain a large percent-
age of data which are confidential and cannot be made available to
other agencies. The NCI file maintained by SRI (Research Program of
Chemicals that Impact Man) has general composition data but it is
limited both in coverage and specificity. POISINDEX, CTCP and the
Poison Control file of FDA provide composition data, but only for
products that are typically ingested. Data in these files are general,
usually presented as ranges, and focus on the active ingredients.
POISINDEX has composition data for the broadest coverage of products
(160,000 entries).
PARCS provides the composition of pesticide products, but primarily
for active ingredients only; its operators are looking to OTS to
obtain information on the "inerts".
4.2.4 Compound Impurities
Data on compound impurities are required for all analysis func-
tions associated with Category II and Category III type data. There
is no requirement for an automated system but there is an increasing
requirement for specificity, particularly for research and monitoring
functions. Groups conducting extensive testing (NCI and the Testing
Group [OTS]) were particularly concerned about adequate characterization
of impurities before compounds enter long-term testing. This information
is generally not available except on a limited basis. PARCS
contains limited information on impurities in pesticides. Component
Information for Chemical Consumer Products contains formulary informa-
tion to the 0.1% level for consumer products. The NIOSH Trade Name In-
gredient Clarification File has formulary information to the 1.0% level
for industrial products. However, the primary intent of these files
is to provide product composition data on purposely included chemicals.
Firms who provided information to these agencies may not have reported
impurities if they were insignificant or not recognized.
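The effect of the two reporting levels cited above (0.1% for consumer products, 1.0% for industrial products) can be shown with a short sketch. The formulation, the component names and the function `reported_components` are hypothetical, and the sketch assumes that a component at exactly the stated level is reported.

```python
# Reporting levels taken from the text; the mapping keys are illustrative labels.
REPORTING_LEVEL_PERCENT = {
    "consumer": 0.1,    # Component Information for Chemical Consumer Products
    "industrial": 1.0,  # NIOSH Trade Name Ingredient Clarification File
}


def reported_components(formulation, product_class):
    """Return the components at or above the reporting level for `product_class`."""
    level = REPORTING_LEVEL_PERCENT[product_class]
    return {name: percent for name, percent in formulation.items() if percent >= level}


# Hypothetical formulation, in percent by weight.
FORMULATION = {
    "solvent": 94.5,
    "active ingredient": 5.0,
    "stabilizer": 0.45,
    "trace impurity": 0.05,
}
```

Under these assumptions the trace impurity appears in neither file, and the stabilizer appears only at the consumer-product level. This is consistent with the observation that files built to record purposely included chemicals give limited coverage of impurities.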
Generally, this information is received from the chemical manu-
facturer on a case-by-case basis. Impurities in technical grade
chemicals will vary depending on the purity of the feedstock and on
the process. Most agencies stated that they needed knowledge of
impurities for proper assessment of risk. They were particularly
concerned that reports of testing results include detailed chemical
analyses identifying the chemical and its purity.
4.2.5 Chemical Analysis Techniques
Knowledge of methods for chemical analysis, including suitable
techniques and standard protocols, was cited as a requirement for
Category II and Category III functions. Greater specificity is
required, in a manual file updated annually or as changes in
methodology occur.
Several sources of this information are available such as hand-
books of standard protocols (ASTM, AOAC). The Pesticides and Indus-
trial Chemicals file provides some information, but most is obtained
from searching bibliographic Chemical Abstract files. Several agencies,
including NCI and EPA, indicated the need for development of a cen-
tralized file of analytical techniques for determining impurities in
chemicals and methodologies for decontaminating chemicals.
4.3 Production Aspects
The discussion of systems applicable to production has been
divided into three subsections. These are: Production Quantity,
Plant Location and Manufacturer; Production Process and Control
Technology; and By-Products and Impurities.
4.3.1 Production Quantity, Plant Location and Manufacturer
Production information is needed on a site specific basis for
all three functional categories with provisions for an annual update.
For Category I first screening, range data would be sufficient for
site as well as quantity. However, as a chemical proceeds through
Categories II and III the need for more exacting information becomes
imperative. Hazard identification, hazard analysis and enforcement/
compliance have the greatest needs. Because of the volume of data and
the short time frame required for response, an interactive computerized
system will be required.
The following existing data files or systems may be able to supply
information on production: SRI's Directory of Chemical Producers and
Chemical Economics Handbook, the Predicasts Marketing Systems, the Data
Base of the U.S. ITC, the IPC Chemical Data Base, the Mineral Commodity
Survey System, the Organic Chemical Producers Data Base, PARCS, the
Research Program of Chemicals That Impact Man, the Toxicology Data
Bank, the Census of Manufacturers, the Annual Survey of Manufacturers,
the Current Industrial Reports of the Bureau of the Census, and the
Annual Survey of Injuries and Illnesses.
No file provides site specific information on all chemicals in
commercial production. The Data Base of the U.S. ITC contains quantities
of synthetic organic chemicals produced, but the information is
confidential when there are fewer than three manufacturers or production
volumes of less than 1000 lbs. per year. The ITC file also has
manufacturer and plant location information, but it does not have
chemical information by plant. The Current Industrial Reports of the
Bureau of the Census do contain production quantities by location,
but only in terms of SIC code. This information is also proprietary
and only summary statistics are released annually. The SRI files
have production information by site, but only for a limited number
of chemicals and the accuracy of some of their values has been ques-
tioned. All of the other data bases contain some pertinent information
on production but the coverage is uneven.
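The confidentiality condition cited above for the ITC data (fewer than three manufacturers, or production below 1000 lbs. per year) can be expressed as a simple release test. This is a sketch only: the function names and sample rows are hypothetical, and the actual ITC disclosure procedure is not described in this report.

```python
MIN_MANUFACTURERS = 3   # fewer manufacturers than this: quantity is withheld
MIN_VOLUME_LBS = 1000   # annual production below this: quantity is withheld


def is_releasable(manufacturer_count, annual_volume_lbs):
    """Return True when a production quantity may be released under both conditions."""
    return manufacturer_count >= MIN_MANUFACTURERS and annual_volume_lbs >= MIN_VOLUME_LBS


def publishable_rows(rows):
    """Replace the quantity with 'withheld' wherever the release test fails."""
    published = []
    for name, manufacturer_count, annual_volume_lbs in rows:
        if is_releasable(manufacturer_count, annual_volume_lbs):
            published.append((name, annual_volume_lbs))
        else:
            published.append((name, "withheld"))
    return published


# Hypothetical rows: (chemical, number of manufacturers, annual production in lbs.)
ROWS = [
    ("chemical A", 5, 2_000_000),
    ("chemical B", 2, 750_000),
    ("chemical C", 4, 800),
]
```

In this example only chemical A would be published with its quantity, which illustrates why a file subject to such rules cannot by itself supply site specific data for all chemicals.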
Import quantities are also available through the IPC Data Base
and others, but again the information is by generic class rather than
specific chemical. The Bureau of the Census reports inorganic chemical
production and shipment data on both a monthly and annual basis in its
Current Industrial Reports series, and value-of-shipment data for major
organic and inorganic product classes in its Annual Survey of
Manufacturers. These data series do not list separate information on
all the chemicals covered by the TSCA inventory.
A large number of agencies are looking to OTS to provide this
site specific production information under their TSCA mandate. All
would like to have access to an interactive computer file in order to
reduce response time and to increase the ease with which the data can
be accessed, but they realize that much of the data in the file would
be of a proprietary nature. They are currently using these other
systems, but the procedure is time consuming, often costly, and may
not produce the desired information.
4.3.2 Production Process and Control Technology
Process information is required for Category I activities only with
respect to the identification of evolving technological changes.
Specific process and control technology information, however, is
required for Category II and III activities. This information could
exist in manual form, but users requested that it be regularly updated.
Existing sources of process and accompanying control technology
information include: the Kirk-Othmer Encyclopedia of Chemical
Technology, the NEDS subsystem of AEROS, the Organic Chemical Pro-
ducers Data Base, the EIS and F & S subsystems of the Predicasts
Marketing Systems and the Toxicology Data Bank. There is much less
collected information on available control technology than on pro-
duction processes. Control technology is often only included in the
above sources as it affects the process being discussed.
Probably the most complete existing source of production process
information is the Kirk-Othmer Encyclopedia of Chemical Technology.
This is, however, manual and somewhat dated. The Organic Chemical
Producers Data Base is probably the best mechanized file, but it is
limited in scope to 400 chemicals. The Predicasts EIS and F & S
systems available through Lockheed could supplement the above systems.
In the near future, it is expected that process and control tech-
nology information would only be required on a case-by-case basis. In
order to update process trends as requested, especially for the Early
Warning function, however, it might be necessary to organize a baseline
process file. Process information is regarded as highly proprietary
by a number of manufacturers, so if it were decided to set up a
manual process file, strong industry resistance could be expected.
The need for this information will have to be carefully evaluated as
the plans for the implementation of TSCA become more firm.
4.3.3 By-Products and Impurities
Information on by-products and impurities is not required for the
primary screening under the Category I functions. For the secondary
screening, however, range data are required. For Categories II and
III functions, greater specificity is required as to the nature of
the by-products and the impurities. In all functional areas a manual
file would be sufficient, with provisions for a regular updating.
Two existing data systems could be accessed to provide some of
the required information. The Organic Chemical Producers Data Base
does contain by-product information on the 400 chemicals which it
covers. The Research Program of Chemicals that Impact Man prepared
by SRI for NCI also contains this type of information on 3200 com-
pounds, but the file is incomplete in that not all information is
included for all compounds.
Several Government agencies, among them NCI, NIEHS and OSHA,
would like access to this sort of information were it to be available.
The Interagency Testing Committee is also looking to OTS to provide
by-products information since no data base exists and OTS has the
unique authority to collect this information under section 8(a) of
TSCA.
4.4 Marketing
The systems applicable to Marketing can best be discussed if
they are divided into two areas. The first area covers Use Information
and includes information on uses, users and places of use. The second
area includes Economic information and covers sales volumes, costs,
and market trend data.
4.4.1 Usage Information
Range use data are required for all chemicals for the initial
screening step required to perform Category I functions. By the
second screening, information is required on uses including amounts
and how much of a chemical is involved per use. This same level of
specificity is required for Categories II and III. The majority of
the functional areas require this use information to be available in
an interactive mode. Updating use information annually will assist
in providing indicators of "significant new use."
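One way an annual update could yield indicators of new uses is to compare each year's reported uses against a baseline after mapping them to a common vocabulary. The following is a minimal sketch under that assumption; the vocabulary entries, chemical names and function names are hypothetical, and no regulatory definition of "significant new use" is implied.

```python
# Hypothetical controlled vocabulary mapping reported terms to standard use terms.
VOCABULARY = {
    "paint thinner": "solvent",
    "degreaser": "solvent",
    "weed killer": "herbicide",
}


def normalize(term):
    """Map a reported use term to its standard form (unknown terms pass through)."""
    key = term.strip().lower()
    return VOCABULARY.get(key, key)


def new_uses(baseline, current):
    """Return, per chemical, the standardized uses in `current` absent from `baseline`."""
    flagged = {}
    for chemical, terms in current.items():
        known = {normalize(term) for term in baseline.get(chemical, [])}
        added = sorted({normalize(term) for term in terms} - known)
        if added:
            flagged[chemical] = added
    return flagged


# Hypothetical baseline file and a later annual update.
BASELINE = {"chemical A": ["solvent"], "chemical B": ["herbicide"]}
CURRENT = {
    "chemical A": ["Paint thinner", "fuel additive"],
    "chemical B": ["weed killer"],
    "chemical C": ["plasticizer"],
}
```

Here "Paint thinner" and "weed killer" are recognized as existing uses once normalized, so only the fuel additive use of chemical A and the first report for chemical C are flagged. Without the shared vocabulary, differently worded reports of the same use would be flagged as new.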
Existing data systems and files which could supply useful information
include: PARCS (for pesticides), the Data Base of the U.S. ITC, the
Mineral Commodity Survey System, the Predicasts Marketing Systems, the
Research Program of Chemicals That Impact Man, the SRI Chemical
Economics Handbook and the Kirk-Othmer Encyclopedia of Chemical
Technology.
No comprehensive file of uses of all chemicals in commerce
currently exists. Those files described above which will probably
be most useful in supplying usage information are the SRI Chemical
Economics Handbook, Research Chemicals That Impact Man and the Kirk-
Othmer Encyclopedia of Chemical Technology. The Chemical Economics
Handbook and Kirk-Othmer, however, are not automated at the present
time so they could not fulfill the interactive requirement expressed
during the interviews. The file of Research Chemicals That Impact
Man is automated, but was reported as covering only 3200 chemicals. The
National Occupational Hazard Survey contains occupationally oriented
use information, but it was the result of a one-time plant survey
conducted in 1973 and is thus dated and the duplicability of its
results might be questionable.
Some of the composition data discussed in detail in Section 4.2
which might aid in the defining of uses and amounts is contained in
machine searchable files by use category. The CTCP and POISINDEX
files have this capability for a number of consumer products. CPSC
and NIOSH also have composition data by use code but due to the
proprietary nature of their files they are not publicly available.
Some additional use data is contained in the Predicasts Marketing
Systems and the Data Base of the U.S. ITC, but the uses are generally
consolidated into generic categories.
All of the above mentioned files employ different terminologies
to denote use. The creation of an interactive file which would deter-
mine "significant new uses" would require the existence of a base-
line use file and a standardized vocabulary for reporting use. A num-
ber of government agencies including OSHA, CPSC, NCI, and DOD, in
addition to several consumer action groups, are looking to OTS to
provide this base line use information in an easily accessible form
for all chemicals in commerce. Some agencies are currently using
contractors to supply use data on a compound by compound basis which
is both costly and time intensive. TSCA's authority could be effec-
tively utilized to provide a centralized file of usage information.
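The base-line comparison described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
vocabulary table, the source file names, the use terms and the chemical
name are all hypothetical, and no particular file design is implied by
the interviews. The sketch flags candidate new uses only; whether a use
is "significant" remains a regulatory judgment.

```python
# Hypothetical mapping from (source file, local use term) to a standardized term.
STANDARD_USE_TERMS = {
    ("file_a", "paint thinner"): "solvent",
    ("file_a", "degreasing agent"): "solvent",
    ("file_b", "SOLV"): "solvent",
    ("file_b", "PEST"): "pesticide",
}


def standardize(source, term):
    """Return the standardized use term, or None if the term is unmapped."""
    return STANDARD_USE_TERMS.get((source, term))


def new_uses(baseline, reported):
    """Return standardized uses in `reported` that are absent from `baseline`.

    baseline: dict mapping chemical -> set of standardized use terms
    reported: iterable of (chemical, source file, local use term) tuples
    """
    found = {}
    for chemical, source, term in reported:
        std = standardize(source, term)
        if std is None:
            continue  # unmapped terms would need manual review; skipped here
        if std not in baseline.get(chemical, set()):
            found.setdefault(chemical, set()).add(std)
    return found


# Hypothetical base-line file and two newly reported uses.
baseline = {"chemical X": {"solvent"}}
reports = [
    ("chemical X", "file_a", "paint thinner"),  # already in the base line
    ("chemical X", "file_b", "PEST"),           # not in the base line
]
print(new_uses(baseline, reports))
```

Because every source file is translated into one vocabulary before the
comparison, two files that describe the same use in different words do
not produce a false "new use."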
4.4.2 Economic Information
Marketing Information is required in a non-specific form for the
second screening under the Category I functions. Increasingly more
specific data are required for Categories II and III functions. The
information can be collected in a manual form, but there is a require-
ment that it be updated annually.
The following existing files may be helpful in supplying the
required information: the Data Base of the U.S. ITC, the IPC
Chemical Data Base, the Mineral Commodity Survey System, Predicasts
Marketing Systems, the Reporting of Economic Data for Negotiation of
International Transportation Conventions, Dun's Market Identifiers
and the SRI Chemical Economics Handbook.
Of the above files, those which are probably most useful are the
Data Base of the U.S. ITC and the Chemical Economics Handbook. The
Data Base of the U.S. ITC contains data on synthetic organics but it
is publicly releasable only for those chemicals produced by more than
three manufacturers and in quantities over 1000 lbs. per year. The
SRI Chemical Economics Handbook is manual and again is not exhaustive
in its coverage. The Predicasts Marketing Systems and Dun's Market
Identifiers are commercial systems which supply valuable supplementary
information. Their coverage is uneven, however, since they rely on
the release of this type of economic information in journals, govern-
ment reports, corporate annual reports, etc. Import/export information
is available through the IPC Data Base and the Reporting of Economic
Data for Negotiation of International Transportation Conventions - the
latter of which is concerned with commodities relative to the tariff
quotas.
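The public-release restriction on the Data Base of the U.S. ITC noted
above (more than three manufacturers, and quantities over 1000 lbs. per
year) can be expressed as a simple filter. The following Python sketch
is illustrative only: the record layout, field names and sample values
are assumptions, not the actual structure of the ITC file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionRecord:
    """Hypothetical record layout; not the real ITC file format."""
    chemical: str
    manufacturer_count: int
    annual_quantity_lbs: float


def is_publicly_releasable(rec):
    """Apply the release rule as stated in the text.

    Both thresholds are exclusive: more than three manufacturers and
    more than 1000 lbs. per year.
    """
    return rec.manufacturer_count > 3 and rec.annual_quantity_lbs > 1000


# Hypothetical sample data.
records = [
    ProductionRecord("chemical A", 5, 250000.0),
    ProductionRecord("chemical B", 3, 900000.0),  # too few manufacturers
    ProductionRecord("chemical C", 8, 800.0),     # quantity too small
]
releasable = [r.chemical for r in records if is_publicly_releasable(r)]
print(releasable)
```

A filter of this kind makes clear why coverage of low-volume or
single-producer chemicals is limited in the publicly available data.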
Most of the agencies queried do require this type of information,
and are currently obtaining it through the use of contractors. The
fact that only a manual file is required for this type of information
may suggest that OTS might cooperate with other agencies in defining
common information needs for these data, developing locator files to
existing data and designing access patterns.
4.5 Exposure
Information requirements and systems to fulfill these requirements
with respect to Exposure can best be discussed in terms of Occupational
Exposure, Environmental Exposure and Consumer Exposure. Eventually it
would be desirable to be able to also discuss cumulative exposure doses
due to a variety of these sources, but these types of data are very
difficult to obtain at the present time.
4.5.1 Occupational Exposure
To perform Category I functions, occupational exposure informa-
tion is required at a moderate degree of specificity. This specificity
requirement increases with passage to a Category II or III function.
It has been requested that this information be available through an
interactive mechanized file.
The following files and data systems may be of use in providing
this type of occupational exposure information: AEROS, Dun's Market
Identifiers, the National Occupational Hazard Survey, the Research
Program of Chemicals That Impact Man and several bibliographic systems
such as CANCERLINE, TOXLINE and the OSHA Technical Data Center.
No one comprehensive source of occupational exposure exists. Of
the above systems, the most applicable is that associated with the
National Occupational Hazard Survey conducted by NIOSH in 1973. This
survey covered approximately 5000 of the estimated 5 million work-
places in the United States and was a one-time effort. Information
was collected on products, processes, number of workers, exposure,
presence of medical exams, protective equipment, etc. The exposure
data generated in this survey are mechanized and being used by NIOSH
in establishing program priorities. These data are also being used
by the Interagency Testing Committee as a primary source of occupa-
tional exposure data. The Dun's Market Identifiers and the BLS
Annual Survey of Injuries and Illnesses also contain information on
the number of workers per workplace as long as it exceeds eleven. In
addition, AEROS in its NEDS subsystem contains emissions and resultant
work force exposure data.
Several Federal agencies requested occupational exposure infor-
mation including OSHA, DOD, and NCI. The Interagency Testing Committee
was also concerned about occupational exposure. OSHA has a mandate to
safeguard worker health and has required industry to maintain health
and safety files including work assignments, exposures in excess of
the TLVs, adverse reactions, etc., for at least 30 years. OSHA merely
retains the ability to request this information from industry on an
as needed basis. OTS has a justifiable requirement for occupational
exposure information. It may be that OTS's requirements for occupa-
tional exposure could be combined with those of OSHA. Certainly a
common request format for information storage and retrieval relative
to industry should be considered.
4.5.2 Environmental Exposure
For the first level screening under the Category I function,
environmental exposure data are required of a moderate specificity.
For Categories II and III functions, more specificity is required.
The environmental exposure data available should cover air, water,
soil and plants. As with the occupational exposure information, an
interactive automated file would be required.
The following existing data systems and files can potentially
supply environmental exposure information: AEROS, AGRICOLA, WATERDROP,
STORET, the Research Program of Chemicals That Impact Man, CANCERLINE
and TOXLINE.
For non-criteria pollutants there is no comprehensive file of
environmental exposure. EPA's AEROS system collects air pollution
data from a number of state and local agencies as well as from the
EPA monitoring network. This includes air quality as well as emissions
monitoring information. STORET serves a similar function for water.
The Research Program of Chemicals That Impact Man contains available
environmental exposure information on selected chemicals. In addition,
for water WATERDROP plans to collect monitoring data to determine the
presence of organic chemicals in water. AGRICOLA, formerly known as
CAIN, is owned by the USDA and contains bibliographic references to the
effects of various emissions and effluents on crops and livestock.
TOXLINE and CANCERLINE would similarly cite wildlife and plant effects
due to environmental exposure to toxic substances.
Environmental exposure information was widely requested, but
minimally available from present systems. Various Federal agencies,
including DOI, DOD, NCI, EPA, ERDA, in addition to the Interagency
Testing Committee, expressed a justifiable need for environmental
exposure information. EPA seems to be a logical focal point for the
collection of this sort of information and for the synthesis of
their existing raw data into a more reliable and usable form.
4.5.3 Consumer Exposure
Generalized consumer exposure information is sufficient to meet
Category I activities, but for Categories II and III more explicit
information is required. Again an interactive data file capability
has been requested for accessing this type of exposure data.
Several existing systems could help to supply some of the required
information on consumer exposure. They are: the National Electronic
Injury Surveillance System, the Meat and Poultry Inspection Monitoring
System, the Research Program of Chemicals That Impact Man and the
bibliographic files of CANCERLINE and TOXLINE.
There is no primary information system in the area of consumer
exposure. The best existing system is probably the Research Program
of Chemicals That Impact Man developed by SRI for NCI. This file is
limited to nine categories of information. The Meat and Poultry
Inspection Monitoring and a similar fish monitoring system supply
concentrations of a number of pesticides and heavy metals in animals
and can be used as an indicator of human exposure due to their
ingestion. The National Electronic Injury Surveillance System con-
tains information on accidents associated with consumer products
reported to hospital emergency rooms. The information is reported by
generic classes of chemicals and is useful for acute type injuries
only. Several of the poison information systems such as POISINDEX,
and the Poison Control On-Line Inquiry System might also contain some
information on adverse consumer reactions, but the bulk of these data
relate to ingestion of substances by small children.
NCI, CPSC and ERDA, in addition to the Interagency Testing Com-
mittee, voiced a need for this type of information. There is no ade-
quate source to meet this sort of request in either a manual or
interactive mode. It therefore remains that a new system will be
required to fulfill this justifiable need.
4.6 Epidemiology
Epidemiology studies are concerned with identification of popula-
tions exposed to toxic substances and their resulting adverse reactions.
These studies generally deal either with an occupational population or
with an identified section of the general population.
Epidemiological information is required for Category I secondary
screening functions though it need not be highly precise. In order to
complete Categories II and III activities greater specificity is
required. A manual system would be sufficient to meet the expressed
user information needs.
A number of existing files contain epidemiological information.
These include: BLS's Annual Survey of Injuries and Illnesses and
Supplementary Data System, the Atlas of Cancer Mortality, the Inter-
national Cancer Epidemiology Clearinghouse, the National Center for
Health Statistics, the National Electronic Injury Surveillance System
(NEISS), the National Occupational Hazard Survey, Poison Control
On-Line Inquiry, the Population Studies System, the Standards Comple-
tion Program, the Toxicology Data Bank and a number of bibliographic
systems including BIOSIS, CANCERLINE, NIOSHTIC, the Technical Data
Center, and TOXLINE.
With regard to supplying occupational epidemiology information,
there are several important systems. BLS's Annual Survey of Injuries
and Illnesses and Supplementary Data System together provide biannual
information on all work-related adverse effects requiring workman's
compensation by facility in addition to supplying total workforce
numbers for comparison. These data are proprietary in nature which
means that only summarized statistics by state and SIC code are pub-
licly available. In addition, the illness codes are not very
specific. For example, it is impossible to differentiate a liver
cancer from an ulcer. The National Occupational Hazard Survey pro-
vides data on 5000 establishments collected during a single survey
conducted in 1973. Worker health data were collected as well as
information on exposure levels to various potentially toxic substances.
It is extremely difficult to obtain chronic occupational effects data
and most of the above collected data are of an acute nature. In
general, only where individual industries have been surveyed in
detail for a long number of years can valid conclusions be drawn.
Epidemiology information on the general population is also
available, but again it is usually based on acute toxic reactions.
Information on consumer-related adverse reactions to toxic substances
is available from the National Electronic Injury Surveillance System
which collects hospital emergency room data relative to injuries
associated with consumer products. Similar information is available
from the poison control adverse report systems. The International
Cancer Epidemiology Clearinghouse and the Atlas of Cancer Mortality
produced by NCI provide geographic as well as body site, sex, and race
information on cancer development and death.
The National Center for Health Statistics has collected U.S.
prevalence data on a number of conditions based on a nationwide survey
system. They are also one of the best sources of mortality informa-
tion. Their disease prevalences can be used to provide a baseline
against which alternative incidences can be measured. The conditions
on which they have collected data, however, are limited in number
and often generic in nature.
If baseline demographic data were required for comparison of
exposed and unexposed populations, this could be obtained from the
Census Bureau by all geographic divisions down to census tract.
Need for epidemiological information was expressed by OSHA, DOI,
DOD, CPSC, NIEHS, NIOSH and NCI as well as the Interagency Testing
Committee.
OTS has the authority to request health and safety information
from industry under section 8(d) of TSCA. TSCA also has a provision
which requires industry to maintain this sort of data in an accessible
form. OSHA may also solicit health and safety information, and indus-
try has begun to design computer systems to provide this information
to OSHA in an approved format. Care must be taken in designing OTS's
industry request format to ensure that it is compatible with OSHA's.
4.7 Biological Effects
The systems that provide relevant data concerning biological
effects are discussed in terms of systems which provide data on Acute
Toxicity, Chronic Toxicity, and Metabolism. Both acute and chronic
toxicity data are necessary for second level screening in Category I;
however, the data are not required to be precise. Data are required
to be available on an interactive basis and the information should be
as current as possible. NCI, OSHA, and various offices within EPA
preferred the data to be available in a centralized file for easy
accessibility. Category II functions required access to toxicity data
on an interactive basis with a little more specificity. Category III
functions, however, required greater specificity to substantiate the
need for regulations. Since more time is available for preparation
of information for selected high priority chemicals, interactive
access to the data was not justified for this function.
4.7.1 Acute Toxicity
There are a number of systems which contain Acute Toxicity
information. Systems which contain relevant information are the
Advisory Center on Toxicology file, CTCP, POISINDEX, Poison Control
On-Line Inquiry, the Fish Pesticide Research System, the Mammal Toxicity
and Repellency Data Base, the Military Entomology Information System,
OHM-TADS, the Organic Chemical Producers Data Base, the Supplemental
Data System, the Research Program of Chemicals That Impact Man, the
Registry of Toxic Effects and the Toxicology Data Bank. Bibliographic
files which are frequently the source of acute toxicity data include
BIOSIS, NTIS, MEDLINE, TOXLINE and the Toxicology Research Projects
Directory.
CTCP, POISINDEX and the Poison Control On-Line Inquiry System
provide acute toxicity data collected from the published literature,
which have been evaluated before entry into the system. In addition,
these systems provide antidotal information for treatment of poisonings
involving the referenced chemicals. These systems are limited in size
(although POISINDEX now has 160,000 entries) and have been designed
primarily to assist physicians in the treatment of poisoning cases.
The Registry of Toxic Effects is the largest mechanized file of
acute toxicity data. Data are extracted from the published literature
and the sources are cited. There has been no evaluation made of the
data before entry into the system, but as stated, the reference is
cited so it is possible to obtain the original source for evaluation.
This file will be available on-line in the near future through the
National Library of Medicine.
The TDB is a smaller file, now providing data on 1000 chemicals.
This file provides acute toxicity data, all of which are evaluated
before entry into the file. Also, the Research Program of
Chemicals That Impact Man developed by SRI for NCI provides data on
acute toxicity. It is used primarily by NCI to assist in selection
of chemicals for entry into the Carcinogenesis Bioassay Program.
OHM-TADS provides acute toxicity data for 1000 chemicals fre-
quently transported and, therefore, subject to spills, fires, etc. In
addition, PARCS provides acute toxicity data for all registered pesti-
cides.
The Registry of Toxic Effects and TDB provide immediate sources
of relevant information pertinent to the needs of agencies involved
in controlling toxic substances.
4.7.2 Chronic Toxicity
Systems which provide data relevant to chronic toxicity are
CANCERLINE, CANCERPROJ, the Carcinogenesis Bioassay Data System, the
Information Bulletin of the Survey of Chemicals Being Tested for
Carcinogenicity, Information Storage and Referral Section, the Inter-
national Cancer Epidemiology Clearinghouse, the Laboratory Animal Data
Base, the NCTR Experiment Information System, the Organic Chemical Pro-
ducers Data Base, the Registry of Toxic Effects of Chemical Substances,
the Research Program of Chemicals That Impact Man, the Survey of
Compounds that Have been Tested for Carcinogenicity, the Toxicology
Research Project Directory, EMIC, ETIC, TOX-TIPS and TOXLINE.
The number of systems mentioned provides evidence that there is
no one file which provides information on an interactive basis which
would be responsive to the stated need for a coordinated file. The
concern was expressed by many offices and agencies that a coordinated
file is critical to discover what testing is being conducted in order
to reduce duplication. The coordinated file should contain as a
minimum, the type of test, the test method utilized, the investigator's
name and association, the species utilized and the results. The file
would serve as a validation both of testing methodology and of the
effectiveness of in vitro tests as a predictor of in vivo test results.
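The minimum content of the coordinated file listed above can be pictured
as a simple record type. The Python sketch below is an assumption-laden
illustration, not a proposed design: the class name, field names, the
added chemical identifier (needed for lookup but not in the minimum list
as stated) and the sample entry are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToxicityTestRecord:
    chemical: str      # assumed lookup key; not in the stated minimum list
    test_type: str     # type of test
    test_method: str   # test method utilized
    investigator: str  # investigator's name
    affiliation: str   # investigator's association
    species: str       # species utilized
    results: str       # results, or a status if testing is under way


def build_index(records):
    """Group records by (chemical, test type) for duplication checks."""
    index = defaultdict(list)
    for rec in records:
        index[(rec.chemical, rec.test_type)].append(rec)
    return index


def existing_tests(index, chemical, test_type):
    """Return tests already recorded for this chemical and test type."""
    return list(index.get((chemical, test_type), []))


# Hypothetical sample entry.
index = build_index([
    ToxicityTestRecord("chemical X", "carcinogenicity", "two-year feeding study",
                       "J. Doe", "Laboratory A", "rat", "in progress"),
])
print(len(existing_tests(index, "chemical X", "carcinogenicity")))
print(len(existing_tests(index, "chemical X", "teratogenicity")))
```

Checking the index before commissioning a study is one way the file
could serve the stated goal of reducing duplicate testing.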
TOX-TIPS and the IARC Information Bulletin of Chemicals Tested
for Carcinogenesis provide the basis for a coordinated file of Carcino-
genesis Testing Information. EMIC is the focal point for mutagenic
testing data. However, with the widespread use of mutagenic screening
tests, much of the data that were formerly being published in the
journal literature and collected by EMIC are not now being published,
but are remaining part of company or governmental files. Testing
data submitted in response to testing regulations and as part of
pre-manufacturing notifications will be entered into the EPA/OTS Reports
Management System. These data will need to be analyzed and made
publicly available. In addition, EPA can require health and safety
data to be submitted under 8(d) of TSCA and, again, there would be the
requirement to make these data publicly available. It is clear that
to be responsive to the user requirement for an interactive file of
chronic toxicity information, considerable planning is necessary to
link existing files of data with files to be generated as a result of
TSCA regulations.
Testing regulations for submission of chronic toxicity data will,
by necessity, have to include standard formats for reporting of the
data. These formats should be consistent with data submissions re-
quired by other government agencies. Industry is most agreeable to
working out standardized reporting systems for both acute and chronic
toxicity test reporting, particularly with so many of the larger firms
considering the development of in-house information systems. Systems
such as CBDS and the NCTR Experiment Information System will be
examined for possible utilization by EPA for storage and retrieval of
long term testing data.
ETIC provides the best source of teratology data, even though it
is a very new system. TDB has assembled information on carcinogenicity,
mutagenicity and teratogenicity in one file, but only for selected
chemicals.
4.7.3 Metabolism
There is no cited requirement to have data with respect to metab-
olism in a computerized file. TDB does, however, provide an on-line
source for metabolism data for approximately 1000 chemicals. Other
sources of this type of data are CANCERLINE, BIOSIS, Fish Pesticide
Research, Information Storage and Referral Section, MEDLINE, NTIS,
TOXLINE, and the Toxicology Research Project Directory. Most of these
files are bibliographic and provide references to journal articles.
Very little effort has been spent in terms of putting this type of
information into a machine file, although the value of access to this
data has become increasingly important.
4.8 Environmental Effects
Environmental Effects data can be divided into three main classes:
Degradation, Transport and Fate, and Disposal Procedures; Ecological
Effects and Bioaccumulation; and Weather and Materials Effects. Of
the above, the information required for first screening Category I
functions is degradation, transport and fate and bioaccumulation data,
with weather and materials effects also being required for the second
screening. For Categories II and III functions data in all areas are
needed with increasing degrees of specificity. A manual system would
be sufficient to supply these needs.
Most of the existing applicable files are bibliographic or referral
in nature. These include AGRICOLA, APTIC, BIOSIS, the Defense Docu-
mentation Center, The Federal Inventory of Environment and Safety
Research, NTIS, OASIS, SSIE, SWIRS, Toxicology Research Projects Direc-
tory and TOXLINE. The only identified non-bibliographic source of
information on degradation, transport and fate and disposal procedures
is the Research Program of Chemicals That Impact Man which has collected
information, when it is available, on 3200 chemicals in commerce. Some
information on bioaccumulation levels of heavy metals and pesticides
in several animal species can be gained from the Fish Control Laboratory
Data Bank, the Fish Pesticide Research, the Environmental Contaminant
Monitoring Program and the Military Entomology Information Service.
These data are usually collected only in a few species and testing is
done on only a few prescribed chemicals. Their general usefulness to
the broad consideration of the effects of toxic substances on the
environment is questionable. In the area of weather and materials
effects, no useful non-bibliographic systems have been uncovered.
From the above discussion of existing data sources, it becomes
apparent that no primary file exists which adequately addresses the
area of environmental effects. This information was requested, how-
ever, during almost all of the Federal interviews and those with the
Interagency Testing Committee.
EPA has had a history of being the focus for the collection of
environmental effects-type data especially regarding selected criteria
pollutants. It seems reasonable to consider that they would provide
the central point for the collection of the types of data required in
the area of environmental effects for toxic substances regulation and
decision making. Only ranges of bioaccumulation data were required
for all chemicals, and the other types of environmental effects infor-
mation were only required for a narrowed list. A manual file could
therefore exist which would collect this sort of information using
the applicable bibliographic files to make it more readily accessible.
4.9 Standards and Regulations
Information concerning relevant Federal, state, local and inter-
national standards and regulations would be required to perform the
second level of screening under Category I and for all subsequent
Categories II and III activities. For all purposes, a manual file
could be sufficient.
Several existing interactive files, such as CRECORD and the
Congressional Information Service, Inc., contain Federal regulations
and are searchable by the chemical names under which the regulations were
promulgated. The NIOSH Registry of Toxic Effects of Chemical Sub-
stances contains all OSHA occupational exposure standards in both an
automated and a manual form. Other sources of Federal standards
information include the Standards Completion Program and the Technical
Data Center. Specific pesticides information is contained in the
Pesticide Reporting System and the Pesticides Enforcement Management
System. Import/Export information is also available in the Data Base
of the U.S. ITC and the IPC Data Base and the Multilateral Trade
Negotiations Data Base.
Only the Advisory Center on Toxicology and the Pesticide Report-
ing System have been identified as containing state regulations on a
national basis. The files of the Advisory Center on Toxicology are
manual, making them not readily available for reference. Likewise,
international regulations and standards only exist in a bibliographic
form in EPA's Environmental Reports Summaries.
There does not seem to be a good centralized source for obtaining
all relevant standards and regulations concerning a given chemical.
All government agencies and the Interagency Testing Committee contacted
during the course of this project expressed a desire to have access
to this sort of a file, though a manual file would be adequate.
4.10 Summary and Conclusions
The following discussion summarizes the conclusions drawn in
Section 4 regarding those systems most capable of providing informa-
tion in a given subject area. In Sections 5 and 6 of this report
those data bases selected for inclusion in the core systems will be
more closely analyzed. Other data bases, including many of those
mentioned in this summary section, should be peripherally available
but a need does not exist for them to be directly linked.
Substance identification is required for all chemicals. The
Chemical Information System and CHEMLINE, both of which will have
incorporated the candidate inventory list, are potentially able to
satisfy this user requirement. Information on chemical and physical
properties, composition and impurities is required for only a selected
subset of chemicals and access to it need not be mechanized. The
Chemical Information System and the Toxicology Data Bank do provide
selected data concerning physical and chemical properties, but
more attention is required in this area. With regard to composition,
several agencies have responsibility within their mandated areas of
concern to collect composition information. A major distinction must
be made, however, between chemicals and products. When areas of re-
sponsibility overlap, plans for cooperation and file linkage need to
be developed. An additional difficulty associated with composition
information, regardless of the ownership, is that it is generally con-
fidential in nature. No comprehensive file exists of impurities in
commercial chemicals. Similarly, regarding chemical analysis techniques,
no data base exists which can adequately satisfy the identified user
requirements.
There is a justifiable need for information on production quan-
tity, plant location and identity of the manufacturers for all chemicals
in commerce. This production quantity information is not currently
available for all chemicals. The best available sources are the
Data Base of the U.S. ITC, which only covers synthetic organics, and the
Census of Manufacturers, whose data are collected solely by SIC code.
In addition, much of the data in the above two data bases are confi-
dential, with only summary statistics available for public release.
The SRI Directory of Chemical Producers has only selected site specific
production information.
Information has been requested on changes in processes and con-
trol technology. The Kirk-Othmer Encyclopedia of Chemical Technology
is probably the best source covering the largest number of chemicals,
but it is manual and somewhat dated. The Organic Chemical Producers
Data Base constructed by Radian for EPA is probably the best automated
file of this type of data.
Range information on chemical by-products and impurities is re-
quired for a large number of chemicals for prioritization and hazard
identification. The Organic Chemical Producers Data Base and the SRI
files are the only real sources of by-products and the number of
chemicals covered is limited. No comprehensive source of information
on impurities exists.
In order to perform hazard identification and early warning
functions, there is a justifiable requirement for usage information
on all chemicals in commerce. No file contains comprehensive use
information for all chemicals, but the SRI files, the National Occu-
pational Hazard Survey files and those of the CPSC and the poison
control centers such as POISINDEX and the Poison Control On-Line
Inquiry System could provide some relevant information. One of the
greatest problems associated with the utility of these various files
is that they all have unique use terminologies. There is a critical
need to adopt a common use terminology to permit multiple file access.
Economic data are typically required on a case-by-case basis to
assess the impact of a proposed regulation. No justifiable need
exists to have a comprehensive file. In general, highly specific
data are required only for a particular chemical or chemical group
and include market share by use, the availability of substitute chemi-
cals, etc. What is required, however, is an awareness of the existence
of such data in other agencies where it has been collected for their
mandated purposes.
The need for summarized data with respect to occupational, environ-
mental and consumer exposure is justified for all chemicals for hazard
identification and early warning. There is no comprehensive file of
occupational exposure. The Occupational Hazard Survey is useful
though limited to data collected during a one-time walk-through of 5000
workplaces and extrapolated statistically to cover all workplaces.
The SRI Research Program of Chemicals That Impact Man provides
exposure data for select categories of chemicals. Monitoring files can
be used to derive some exposure information, but they are generally
structured on a priority and criteria pollutant basis. There is no
general consumer exposure file, although the CPSC file provides some
range data.
There is no expressed requirement for epidemiological data for
all chemicals in commerce. Studies of this type are usually required
to substantiate regulatory activities and for that purpose are per-
formed on a case-by-case basis. There is, however, a need to know
what previously conducted studies are available and their results.
There is also a need to collect baseline data for comparison with
observed results in order to perform early warning and hazard identi-
fication functions. The National Center for Health Statistics collects
information only for the presence and progression of certain diseases.
NCI's Atlas of Cancer Mortality is also highly specific.
There is a justifiable requirement for a comprehensive index of
all types of acute and chronic toxicity testing for the purposes of
(1) identification of those chemicals which have been tested, (2) the
validity of the test methods, and (3) the results. However, much of
this type of information is unevaluated and would require a reference
to the original source for verification. Several existing files such
as the NIOSH Registry of Toxic Effects and the Toxicology Data Base
could be used to build a comprehensive toxicity index. TOX-TIPS and
the IARC file of Chemicals Being Tested for Carcinogenicity could
prove useful in identifying compounds under test. Currently, EMIC
provides the only centralized collection point for results of muta-
genic testing, but with the wide use of bacterial screening tests,
much of this information will probably go unreported and hence un-
collected. ETIC serves a similar function for teratological testing.
Information on bioaccumulation, degradation and transport and
fate is required for all chemicals in order to prioritize them and
for Early Warning and evaluation of pre-manufacturing notices. Environ-
mental effects information is not available in a coordinated fashion
for all chemicals, though a number of relevant bibliographic files do
exist. Several files contain bioaccumulation data for pesticides or
heavy metals in a selected list of species but both the chemicals and
species are very limited in number. A baseline file of normal accumu-
lation, transport, degradation, etc., levels is required on a large
number of chemicals to provide a basis for comparison of values
submitted by industry as a part of a pre-manufacturing notification
data package. Such data do not presently exist in a collected form.
Several Management Systems have been identified in the user re-
quirements study for purposes of assisting OTS and other offices in
EPA to more efficiently manage activities associated with TSCA. They
are primarily tracking systems for decision packages, petition and
substantial risk notifications, and correspondence. In addition, a
compliance and monitoring management system is required. The require-
ments for automation are not defined yet because the volume of trans-
actions is not clearly identified at this time.
Several areas have been identified above where there are not
adequate files to meet justifiable user requirements. Other areas
have been identified where there are existing files of information
which satisfy all or some of the total requirements for specific
types of information.
In the next sections of this report, METREK will present sug-
gestions for the creation of new files and the agency which should
have lead development responsibilities. In addition, recommendations
will be made for linking existing and proposed files with various
systems development options.
-------
5.0 DEVELOPMENT OF AN INTEGRATED RETRIEVAL SYSTEM
5.1 Background
Section 10(b) of TSCA provides the EPA Administrator with the
authority to establish a system within EPA to collect, use and dissem-
inate data submitted to the Administrator under this Act. Section
10(b)(2)(A) authorizes the Administrator, with the cooperation of the
Secretary of HEW and other heads of appropriate agencies, to develop
an efficient and effective system for the retrieval of toxicological
and other scientific data necessary to carrying out the purposes of
this Act. The Act also explicitly states that systematized retrieval
shall be developed for use by all Federal agencies with responsibilities
in the area of regulation or study of chemicals and their effects on
health or the environment.
The legislative intent is clear in calling for EPA to establish
a system which will collect, store and disseminate data received in
response to regulations promulgated under TSCA. It is also clear that
EPA is to use the information gathering authorities of the Act not
only to assist other agencies in carrying out their respective re-
sponsibilities under TSCA, but also to apply this information to the
regulation of chemicals under various other legislative mandates.
The implementation of TSCA provides a unique opportunity for EPA
to design and build an information system capable of being responsive
to the needs of decision-makers in all government agencies. The
Act provides extensive authorities to collect information necessary to
assess the environmental aspects of industrial chemicals. It provides
the authority to fill in the "information gaps" that exist in the
authorities of such Acts as the Federal Insecticide, Fungicide and
Rodenticide Act and the Food, Drug and Cosmetic Act which focus on
regulation of chemicals for specific uses. Table 2-2 in Section 2.3
demonstrates the overlapping authorities of existing legislation, but
also shows the impact that TSCA will have on the "universe" of chemicals.
EPA has stated in its "strategy document" published in February
1977, that it intends to utilize TSCA as an "important tool for develop-
ing the information base which will undergird many major decisions of
the future." It is further stated, that the explicit provisions of
TSCA underscore the clear intent of Congress that this legislation
serve the interests of many organizations in a variety of ways,
particularly with regard to acquisition and dissemination of data.
Furthermore, EPA recognizes that just as a coordinated approach
to the assessment and control of toxic substances is necessary, a
coordinated approach to data systems development is also necessary.
A critical first step was the assessment of user requirements of EPA,
other Federal agencies and private groups for information concerning
chemical substances, with particular attention being given to common
information requirements that could best be satisfied through TSCA.
It was also clearly recognized by EPA and the designers of the
legislation that a multiplicity of data activities presently exist
which collect, store, and disseminate data relevant to toxic substances
regulation. Coordination within the government is critical to:
(1) assess the information requirements; (2) assess the existing
systems which satisfy these requirements; (3) identify the gaps in
information needed for regulatory purposes; (4) limit the total
reporting burden on industry; and (5) identify ways to make the in-
formation acquired under TSCA available as widely and as promptly as
possible.
Section 2 of this report presents the results of the user re-
quirements study. These users are looking to EPA to exercise TSCA's
information collection authorities and provide a comprehensive
system capability that permits access to these data. Furthermore,
they are looking for a capability to perform data correlations to
assist in the assessment of health and environmental effects. EPA
and other agencies plan to use the information system to support
decision-making activities such as early warning, and selection of
chemicals for test, risk assessment, etc. Furthermore, they are
expecting to use this information to assist in the prediction of
health and environmental effects, in establishment of priorities for
long-term testing and in development of regulations.
For the desired systems capability to be responsive to user re-
quirements, it must be a comprehensive, integrated system capable of
providing a variety of data on a large number of chemicals. The
system must permit public access to the non-proprietary information
obtained under TSCA, but still provide full confidential protection
to the data that are "trade secret." TSCA specifically excludes
from claims of confidentiality health and safety studies on chemicals
offered for commercial distribution and on chemicals subject to
pre-market notification or testing requirements. Other data, such as
process information, may be considered confidential and will require
protection from disclosure. Systems development options responsive
to user requirements and consistent with EPA strategy to implement
TSCA are presented in the following sections.
5.2 Approach to Defining Systems Development Options
The results presented in Sections 2 and 3 identified the user
requirements for data and inventoried the currently available data
bases and systems which are potentially able to satisfy those needs.
Certain information gaps were found as were some apparent duplications
of effort. The objective of this section is to define system design
concepts which will satisfy the user requests.
At the direction of the project officers, the METREK analysis
of systems development options was confined to examining feasible
concepts of systems integration. Consideration was given at the out-
set to an on-line retrieval system that would link a series of com-
puterized information files and direct an on-line user to other
external information files (which may or may not have on-line access).
Recommendations were to be formulated with the ultimate goal of
achieving a system usable by the "end-user" rather than being limited
to information specialists or librarians. The priorities and policies
of EPA/OTS were to be considered in terms of their impact on informa-
tion needs, data acquisition and system implementation. Scheduling
considerations from the user point of view were to be emphasized in
recommending systems development options.
A complementary effort by another independent contractor
addresses the development of a program to implement the systems
development recommendations presented in this report. Included in
the complementary effort are an analysis of software and hardware
characteristics and requirements, system maintenance requirements,
detailed systems specifications, and the costs associated with imple-
mentation of the recommended plan.
A first step in defining systems development options is to
formulate the long-range goals and objectives for a comprehensive
integrated system that is responsive to satisfying information re-
quests. Once the long-term capability has been formulated, alterna-
tive approaches for achieving that capability by utilizing or modify-
ing currently existing systems can be developed and evaluated.
Alternative approaches are dependent, however, upon EPA policies and
priorities for exercising the data gathering authorities granted by
TSCA. The extent to which EPA issues regulations requiring the
submission of various categories of data greatly affects the nature of
any data base or system capability at a given point in time.
To limit the number of possible alternatives which could be
considered and to provide a framework for recommending systems
development options, three scenarios have been developed. The first
is based on EPA information gathering policies as stated in their
strategy for implementing TSCA and current plans for section 8 rule-
making. The second is based on an incrementally increased information
gathering policy of EPA in terms of sections 8(a) and 8(b). The
third is based on an EPA policy to fully implement all data gathering
authorities listed in 8(a)(2).
Within each of these scenarios, specific system development plans
are presented. The designs are presented in terms of definitions of
component files, system linkages, file ownership and accessibility.
The relationships with other Federal files are defined in a way con-
sistent with their functional responsibilities. In developing the
system plans, a number of considerations are addressed. These include:
the current stage of development of data systems and bases; the degree
to which information requirements can be fully satisfied; the system's
ability to facilitate analysis of potential hazards and to disseminate
information to a large community of users while simultaneously pro-
viding for protecting confidential information; and the impact on the
users of the time frame within which implementation of enhancements
is possible.
-------
5.3 Long-Range Objective of a Comprehensive Chemical Substance
Information System
5.3.1 Requirement for Integrated Computer Network
When examining the information requirements integrated across
functions for all users (Table 2-6), it can be seen that there is a
justifiable requirement for an interactive system containing substance
identification data, production data by plant location, use data,
exposure data and biological effects data for Category I chemicals.
After analysis of existing systems, it is clear that this informa-
tion, for Category I chemicals, is not presently available in existing
data bases with the one exception of substance identification data
(that is, molecular formula, CAS registry number, CAS name, synonyms,
and chemical structure) for chemicals that are presently on the
"candidate list."*
It is also clear upon careful review of the legislation concern-
ing regulation of chemical substances, that the authority for obtain-
ing such information (production, use and exposure) resides in EPA.
EPA, utilizing the industrial reporting and recordkeeping provisions
of TSCA, is in the position to build a comprehensive data facility
required by EPA and other Federal agencies with responsibilities in
the area of regulation or study of chemical substances and mixtures
in commerce and their effects on health or the environment. EPA,
therefore, has a major role in the creation of such a comprehensive,
integrated system that provides data on all chemical substances.
*The assumption is made that the inventory of chemical substances
authorized under section 8(b) is an adequate definition of
Category I chemicals for most users.
When one examines the long-term need for an integrated system to
support EPA and other Federal agencies and one which permits rapid
access to information on chemical substances for purposes of making
risk assessments, predicting toxicity, selecting chemicals for test-
ing, approving chemicals for pre-manufacturing, etc., certain compo-
nents appear to be critical. These components vary in size and detail.
Extensive, detailed information similar to that outlined in Table 2-1
may eventually be collected by EPA for all chemicals in the inventory
(50,000 to 100,000) for regulatory purposes. On the other hand, selec-
tive data such as substance identification and structure information
may be obtained on as many as 500,000 chemicals by various agencies
involved in research or regulation under other legislative mandates.
MITRE recommends the following:
(a) That the information required to support TSCA
activities be implemented in a set of
function-specific on-line data bases;
(b) That all data bases which are of primary importance
to TSCA activities and which are likely to be
accessed as part of a coordinate search, should
have compatible data structures and should utilize
common, standardized nomenclature. These primary
data bases are identified on Page 5-15, and are
referred to as "core components;"
(c) That a network of data bases called the Chemical
Substances Information Network (CSIN) be developed.
This network system shall have the capability to
greatly facilitate access to core component systems,
and to direct users to other, non-core component
data bases which contain useful information, but
which are not part of the network system.
A diagram of the proposed CSIN is shown in Figure 5-1. CSIN has as
its primary objective the service of those Federal agencies involved
in the study and regulation of chemical substances. A secondary goal
must be for CSIN to become a fundamental new information tool for
R&D activities in the biomedical community. This systems network as
shown in Figure 5-1 builds on existing systems, where appropriate, and
provides those additional analytical capabilities necessary to support
the decision-making activities and other governmental functions
previously described in Section 2.4.1.
In developing the network concept illustrated by Figure 5-1,
recognition has been given to the fact that the legislative responsi-
bilities of Federal agencies vary. In some instances, the agencies
are concerned with different types of chemicals (e.g., food, drugs, or
pesticides) although they may require similar categories of data. In
other instances, the agencies may be concerned with different aspects
of regulating the same chemicals and hence could use a common data
facility. This circumstance is perhaps better illustrated by consid-
ering Figure 5-2 and the following example. In support of TSCA-
related regulatory activities, there is a requirement for general
information for Category I chemicals (i.e., all chemicals subject to
regulation under TSCA) in a large number of data categories. There is
also a requirement for more detailed information within all categories
of data but for fewer chemicals (i.e., Categories II and III). To
regulate pesticides, there is a similar requirement for data for
chemicals used as pesticides. Simultaneously, NCI is concerned with
-------
PAGE NOT
AVAILABLE
DIGITALLY
-------
[Figure 5-2: matrix not recoverable from the scan. Columns (types of
chemicals): Pesticides, Other Agriculturals, Drugs, Foods, TSCA, Other.
Rows (data categories): Substance ID, Production, Exposure, Biological
Effects, Environmental Effects. Labeled regions: EPA/OPP Regulatory
Responsibilities; EPA/OTS Regulatory Responsibilities; NIOSH/OSHA
Involvement; DHEW Programs (NCI, NIEHS, NIOSH, FDA).]
FIGURE 5-2
DATA INVOLVEMENT OF SELECTED REGULATORY AGENCIES
-------
carcinogenic effects of chemicals, such as pesticides, food additives,
drugs, other agricultural, "TSCA," or other types of chemicals,
while NIOSH/OSHA are concerned with occupational exposure and asso-
ciated health effects of chemicals regardless of their type.
The system design implications of this are that (1) no single
data base can fully satisfy all user requirements, and (2) multiple
data bases must be designed and developed in a manner that facilitates
cross-exchanges of data and retrieval of particular data by chemical
substance identifiers. A further implication is that the ultimate
direction in which various information systems will evolve is dependent
upon the requirements to be placed upon them by many and varying users,
each with his own unique data requirements. Although it is beyond the
scope of this effort to conduct a rigorous and comprehensive analysis
of alternative network systems responsive to these varying require-
ments (since the primary intent of this effort has been directed
at TSCA requirements), a general awareness of their implications has
been incorporated into the analysis used in developing the concept
expressed in Figure 5-1.
The concept of the CSIN proposed herein consists of a set of
core component data bases which are distributed over a network, and a
set of non-core component files, which are known and referenced in
the network, but physically do not reside within it. Several options
are available for linkage of the core-component systems. At the most
sophisticated level, data bases are directly linked so that they
appear to the user as a single, coherent system. The user, in essence,
deals with a data resource executive (a piece of computer software),
which in turn deals with the component data bases and their data base
management systems. This is relatively easy to implement if all the
core component systems reside at a single computer facility and under
control of the same data base management system (DBMS). It becomes
more difficult if the data bases are on different computers, and it is
prohibitively complex if the data bases are under the control of dif-
ferent data base management systems. It is probably necessary to
require that all directly linked core component systems be implemented
under the same data base management system. (It should be noted that
several commercial DBMS's are implemented on a variety of hardware
systems). Direct linkage appears to be appropriate for some
component systems, but is probably not required for all. For those
systems which are not directly linked, a user would access the net-
work directory, which would inform him of the location of the data
of interest, and would transfer him to the site of the data base.
From that point on, however, the user would be interacting directly
with the target data base. In order to access another data base,
the user would have to access the directory again, and be redirected.
The precise method of linkage used and the important issues of file
backup and security must be addressed during the CSIN design phase.
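The two linkage modes described above can be illustrated with a minimal
sketch in Python. It is not part of the CSIN design: the class names,
site labels and record contents are hypothetical, and a real data
resource executive would sit on top of a data base management system
rather than in-memory dictionaries. The sketch only contrasts a direct
link (one query answered from several components) with a directory
referral (the user is told where to go and must search there himself).

```python
from dataclasses import dataclass


@dataclass
class DataBase:
    name: str
    site: str              # hypothetical host facility
    directly_linked: bool  # True if reachable through the executive
    records: dict          # CAS number -> data held by this file


class NetworkDirectory:
    """Knows where each data base lives; does not hold the data itself."""

    def __init__(self, data_bases):
        self._by_name = {db.name: db for db in data_bases}

    def locate(self, name):
        return self._by_name[name]


class DataResourceExecutive:
    """Presents the directly linked data bases as one coherent system."""

    def __init__(self, directory, linked_names):
        self._linked = [directory.locate(n) for n in linked_names]

    def query(self, cas_number):
        # One request fans out to every directly linked component.
        return {db.name: db.records[cas_number]
                for db in self._linked if cas_number in db.records}


def referral(directory, name):
    """Non-linked case: the directory only tells the user where to go."""
    db = directory.locate(name)
    return f"{db.name} is held at {db.site}; connect there to search it."


# Placeholder contents, for illustration only.
tox = DataBase("Toxicology Data System", "site-A", True,
               {"71-43-2": "toxicity summary"})
tsca = DataBase("TSCA Chemical Data System (Public)", "site-A", True,
                {"71-43-2": "production range"})
ext = DataBase("External monitoring file", "site-B", False, {})

directory = NetworkDirectory([tox, tsca, ext])
executive = DataResourceExecutive(directory, [tox.name, tsca.name])
```

In this sketch `executive.query("71-43-2")` returns entries from both
linked files in a single call, whereas `referral(directory, "External
monitoring file")` returns only a pointer to the other site.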
Another alternative might involve component systems utilizing
a variety of software packages linked to a minicomputer that provides
a common "macro language" and query capability which makes accessing
a variety of different systems transparent to the user. Detailed
decisions regarding the type of hardware and software are beyond the
scope of this report, and will be considered in the subsequent analy-
sis. Those data bases which are not core components will be referenced
in the network directory, and their location and method of access will
be given. The actual access will be left to the user.
5.3.2 Individual Components of the Chemical Substance Information
Network
Long-term user requirements for chemical substances information
can best be satisfied by the development, in an evolutionary manner,
of a distributed network of data bases and systems. Within the recom-
mended network, certain files and systems are of primary importance
and are core components. For these core components, user requirements
are best satisfied if these components are structured using common,
standardized nomenclature for data elements and categories. The data
bases of the core components should be maintained by a single data
base management system (DBMS) to facilitate cross exchanges of data
and retrievals of particular data. This, however, does not imply that
a single repository is required. In fact, user requirements are
probably best satisfied by maintaining the core components in differ-
ent computer facilities which provide time-shared systems capable of
supporting large numbers of terminals of various degrees of sophisti-
cation. Actual location of core components is not critical as long
as their access is equitable and widely available to the public.
The core components of the recommended distributed network are:
• Chemical Data Bases Directory
• Chemical Structure/Nomenclature System
• TSCA Chemical Data System (Proprietary)
• TSCA Chemical Data System (Public)
• TSCA Reports Management System
• Toxicology Data System
• Chronic Testing Support System
• Bibliographic Literature Scanning System
• Laboratory Animal Data System
• Regulated Chemicals Standards System
Other data bases and systems must also exist to provide access
to information on additional chemical substances and other categories
of data. Access to these non-core components is by referral with
coordination provided by the Chemical Data Bases Directory. Compati-
bility between data formats, nomenclature, data base management systems,
and overall system capabilities is less critical for these non-core
systems. In some cases, these non-core components are repositories
for categories of data similar to those contained in the core com-
ponents, but the set of chemical substances for which the data are
maintained is specific to certain legislative mandates or research
responsibilities. The specific contents of each of the "core"
components of the network are discussed in what follows.
-------
5.3.2.1 Chemical Data Bases Directory. Within the core compo-
nents, the Chemical Data Bases Directory (CDBD) is the pivotal
file in that it is a "help file" and provides detailed information
on the nature of the data bases/systems in the network. It directs
the user to data systems which will satisfy his requirements for
information. It includes component file identifiers, data element
identifiers, and a general discussion of the types of compounds for
which there is data coverage. It does not identify specific chemicals
for which there is coverage. It indicates the specific mode of access,
including file names, telephone numbers, file ownership, file location,
system characteristics, size of file, update frequency, searching
capability, and output media. The CDBD provides standardized data
element terminology for all data elements in the core component systems.
The Directory also includes references to non-core files that may
maintain other data element names, with the Directory indicating the
necessary cross-reference terminology.
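As an illustration only, a Directory entry and its cross-reference
terminology might be represented as follows. The field names, the two
sample entries and the standardized element name `CAS_NUMBER` are
assumptions made for this sketch; the report specifies the kinds of
information the CDBD holds, not a record layout.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    file_name: str
    owner: str
    location: str
    telephone: str
    update_frequency: str
    output_media: list
    compound_types: str  # general coverage only, not specific chemicals
    # Maps the file's own element names to the standardized terminology.
    data_elements: dict = field(default_factory=dict)


def files_holding(entries, standard_element):
    """Return (file name, local element name) pairs for a standard element."""
    hits = []
    for entry in entries:
        for local, standard in entry.data_elements.items():
            if standard == standard_element:
                hits.append((entry.file_name, local))
    return hits


# Hypothetical entries: one core component, one non-core file whose
# element name differs and is mapped by the Directory.
entries = [
    DirectoryEntry("Toxicology Data System", "(owner)", "(location)",
                   "(telephone)", "(frequency)", ["terminal"],
                   "all types of chemicals",
                   {"CAS_NUMBER": "CAS_NUMBER"}),
    DirectoryEntry("Example non-core file", "(owner)", "(location)",
                   "(telephone)", "(frequency)", ["printout"],
                   "pesticides",
                   {"CASRN": "CAS_NUMBER"}),
]
```

A search on the standardized name then reaches both files, with the
Directory supplying the local name each file actually uses.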
The Directory file must be widely available to the general pub-
lic, and structured for easy access. Maintenance responsibilities
will be shared by individual file owners, but the data resource ad-
ministrator of the network will have full responsibility for updating
and maintenance of file integrity. Section 5.5 contains a further
discussion of the data base administrator and network management.
5.3.2.2 Chemical Structure/Nomenclature System. The Chemical
Structure/Nomenclature System is the second critical element of the
comprehensive Chemical Substance Information Network. This system
provides chemical identification data for approximately 500,000
chemical substances. It provides a sub-structure searching capa-
bility and a locator designator which points to other files in the
system containing information on that particular chemical substance.
The size of the file is important, because this file must serve all
agencies concerned with the study and/or regulation of chemicals.
It must contain chemicals that are used as drugs, pesticides, indus-
trial chemicals or those of research interest. The file must be
searchable by CAS number, CAS preferred name, synonyms, structure,
structure fragment, nucleus probe, molecular weight, etc. System
output must include display of the structure. The system must also
be usable without extensive knowledge of chemistry. The locator
designator (referencing all relevant files which contain the chemical)
is clearly feasible and should be an integral part of this system.
Updating such a system is clearly a sizeable responsibility since this
system contains the critical data linkage elements (CAS number,
synonyms, other identification codes and structure).
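The locator designator can be pictured with a small sketch, again in
Python and again hypothetical: the class and method names are invented
for illustration, sub-structure searching is omitted entirely, and the
files listed in the sample locator are placeholders rather than a
statement of actual coverage. The point shown is only that a search by
CAS number, preferred name or synonym resolves to one record, and that
the record names the other files holding data on the substance.

```python
class StructureNomenclatureIndex:
    """Toy index: identification data plus a locator per substance."""

    def __init__(self):
        self._by_cas = {}   # CAS number -> record
        self._by_name = {}  # lower-cased name or synonym -> CAS number

    def add(self, cas_number, preferred_name, synonyms=(), locator=()):
        self._by_cas[cas_number] = {
            "cas_number": cas_number,
            "preferred_name": preferred_name,
            "synonyms": list(synonyms),
            "locator": list(locator),  # files containing this substance
        }
        for name in (preferred_name, *synonyms):
            self._by_name[name.lower()] = cas_number

    def find(self, term):
        """Look up by CAS number first, then by name or synonym."""
        cas = term if term in self._by_cas else self._by_name.get(term.lower())
        return self._by_cas.get(cas)


index = StructureNomenclatureIndex()
index.add("71-43-2", "Benzene", synonyms=["Benzol"],
          locator=["TSCA Chemical Data System (Public)",
                   "Toxicology Data System"])  # placeholder locator
```

Because every name resolves to the CAS number before the record is
fetched, the CAS number acts as the single linkage element, which is
why changes to CAS numbers must be propagated promptly.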
A continual interface with Chemical Abstracts Service will be
necessary to allow for updates to the file as CAS numbers change, and
new chemicals with their respective registry numbers and structures
are added to the file. Industry files and other government files will
also require updates when there are changes in the Chemical Structure/
Nomenclature System.
The Chemical Structure/Nomenclature System must be publicly and
widely available. As defined above, this system builds on and includes
features of both the present CHEMLINE and CIS/SSS systems.
5.3.2.3 TSCA Chemical Data Systems. The TSCA Chemical Data
Systems are also major components of the network and essentially
provide much of the critical data associated with chemical compounds
in commerce. The systems use a hierarchical file structure with the
chemical compound being the key data element. It is envisioned that
the record hierarchical structure would be similar to the general
scheme illustrated in Figure 5-3. The systems contain varying amounts
of data on the approximately 50,000-100,000 chemical compounds in
commerce and those chemicals that are subject to pre-manufacturing
review.* The systems are constructed primarily from data submitted
as a result of regulations promulgated by EPA under TSCA and may
contain extensive amounts of confidential information. The TSCA
Chemical Data Systems contain both unevaluated and evaluated data,
(e.g., reviewed testing data). They are the source of, and home for,
chemical information necessary for environmental and health hazard
analyses (i.e., as defined in Table 2-1). Beyond providing a
structured data base, the TSCA Chemical Data Systems must provide
an analytical data manipulation capability to permit the system user
to identify correlations and interactions between various categories
of data by allowing the creation of specific temporary subsets of the
data file.
*Although data bases similar to the TSCA Chemical Data Systems are
required to contain similar data for other types of chemicals (e.g.,
pesticides), it is not recommended that these other data bases be
considered core components of the network. Cross-reference linkage
to these systems, provided by the Directory, results in their
inclusion in the Chemical Substances Information Network.
-------
PAGE NOT
AVAILABLE
DIGITALLY
-------
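The hierarchical record keyed on the chemical compound, and the
creation of a temporary subset for analysis, might look like the
following Python sketch. The category names under each compound, the
identifiers `CHEM-A` through `CHEM-C` and all values are invented for
illustration; the actual record hierarchy is the one given in
Figure 5-3, which is not reproduced here.

```python
# Each compound is the key; data categories hang beneath it.
records = {
    "CHEM-A": {"production": {"plants": 3},
               "use": {"categories": ["solvent"]},
               "biological_effects": {"tests_reported": 2}},
    "CHEM-B": {"production": {"plants": 1},
               "use": {"categories": ["pigment"]},
               "biological_effects": {"tests_reported": 0}},
    "CHEM-C": {"production": {"plants": 5},
               "use": {"categories": ["solvent"]},
               "biological_effects": {"tests_reported": 0}},
}


def subset(records, predicate):
    """Build a temporary subset; the master file is left untouched."""
    return {cas: rec for cas, rec in records.items() if predicate(rec)}


# Example correlation across categories: solvents with no reported tests.
untested_solvents = subset(
    records,
    lambda r: "solvent" in r["use"]["categories"]
    and r["biological_effects"]["tests_reported"] == 0,
)
```

The subset is an ordinary copy of the selected keys, so it can be
discarded after the analysis without any effect on the underlying file.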
Because of the state-of-the-art of the technology associated
with protecting confidential data and the potential for inadvertent
disclosure, METREK recommends creating two systems: one that is a
proprietary system and one that is a public system. Direct access to
the proprietary system is limited to those persons in EPA and other
government agencies who are "approved users." EPA is responsible for
releasing non-proprietary data immediately into a second file for
repackaging to make it publicly available to a large number of users
simultaneously. EPA is also responsible for making decisions concerning
repackaging of the proprietary data. Industry representatives have
expressed considerable concern about the release of confidential data,
but agree that summarized or tabulated data or data aggregated in
ranges are acceptable. Consumer groups and environmentalists are
seeking release of as much data as possible in order to have that
information available for scientific review and assessment. A balanced
publicly available data base is the long-range goal to provide protec-
tion to the data claimed as confidential on the one hand, and on the
other hand, make much of the data publicly available.
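One way to picture the repackaging of proprietary data into ranges is
the short Python sketch below. The range boundaries, the field names
and the sample record are assumptions chosen for illustration; the
report does not define the ranges, and the choice of which fields may
be released rests with EPA.

```python
import bisect

# Hypothetical range boundaries (units per year); not from the report.
BOUNDS = [1_000, 10_000, 100_000, 1_000_000]
LABELS = ["<1,000", "1,000-9,999", "10,000-99,999",
          "100,000-999,999", ">=1,000,000"]


def to_range(value):
    """Replace an exact figure with the label of the range it falls in."""
    return LABELS[bisect.bisect_right(BOUNDS, value)]


def repackage(proprietary_record):
    """Copy only releasable fields, with production reduced to a range."""
    return {
        "cas_number": proprietary_record["cas_number"],
        "production_range": to_range(proprietary_record["annual_production"]),
    }


# Placeholder proprietary record.
confidential = {"cas_number": "71-43-2",
                "manufacturer": "Example Corp",
                "annual_production": 250_000}
public = repackage(confidential)
```

The public record carries the substance identifier and a production
range only; the manufacturer and the exact figure remain in the
proprietary system.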
5.3.2.4 Reports Management System. The Reports Management
System, while not representing a major system from the point of view
of providing either data or an analytical capability to support risk
assessment-related activities, is critical to EPA's TSCA activities
since it provides a record of individual corporate submissions and
references to stored industrial health and safety studies and other
reports. Its primary function is to provide a reports locator and
tracking capability. The file is organized on a corporation basis and
contains corporation identification information (name, address, etc.),
plant identification and location data, and references to reports
both requested and submitted on individual chemical substances.
5.3.2.5 Toxicology Data System. A Toxicology Data System is
another critical element of a comprehensive chemical substance infor-
mation network. The purpose of this system is to provide a structured
and consolidated source of biological effects data (e.g., acute tox-
icity, carcinogenicity, mutagenicity, teratogenicity, and other chronic
toxicity data). The system makes available test results and biological
effects data for all types of chemicals or those intended for general
research. It serves as a source for research data from government,
industry, academics, and other international sources. It permits a
user to examine chemicals analyzed using mutagenic screening tests and
compare the results with in vivo carcinogenic testing. Verification
of methodologies across laboratories and/or species will be facilitated.
The system contains the type of study, methodology, race/age/sex,
species/strain, route, site, effects, investigator, length of test,
degree of evaluation of the data, and a reference. The system will
evolve by combining, restructuring and enhancing capabilities currently
available in TDB, EMIC, ETIC, TOX-TIPS, the IARC Bulletin of Chemicals
Tested for Carcinogenicity, the Survey of Chemicals which have been
Tested for Carcinogenicity (PHS-149), the Registry of Toxic Effects
of Chemical Substances, the Fish Control Laboratory Data Base and the
Fish-Pesticide Research. Access to this system is through on-line
terminals with the files directly linked to the TSCA Chemical Data
System (Public).
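The data elements listed above for the Toxicology Data System can be sketched as a simple record structure. This is an illustration only: the field names and types are assumptions, the `cas_number` field is an assumed key for the link to the TSCA Chemical Data System (Public), and the helper `missing_elements` is hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class ToxicologyRecord:
    """One test result, using the data elements listed in the text.

    Field names and types are illustrative assumptions.
    """
    cas_number: str         # assumed link to the TSCA Chemical Data System
    study_type: str         # e.g., acute toxicity, carcinogenicity
    methodology: str
    race_age_sex: str
    species_strain: str
    route: str
    site: str
    effects: str
    investigator: str
    length_of_test: str
    evaluation_degree: str  # degree of evaluation of the data
    reference: str


def missing_elements(record):
    """List the data elements left blank in a record."""
    return [name for name, value in asdict(record).items() if not value]
```

A check of this kind could flag records that lack a citable reference, so that additional detail can be requested for them.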
5.3.2.6 Chronic Testing Support System. The Chronic Testing
Support System provides a software capability and storage and re-
trieval module for the results of long term chronic toxicity monitor-
ing studies. The system may be used by government agencies in the
conduct of long term carcinogenesis bioassays (e.g., NCTR, NCI), by
EPA in carrying out its testing responsibilities under TSCA or its
other Acts, or by industry when required to conduct chronic tests in
response to government regulation. The system incorporates the require-
ments of the Carcinogenesis Bioassay Program of the National Cancer
Institute and the integrated laboratory support capability required
by the National Center for Toxicological Research. It is designed to
support private, independent agency or industry files with access and
update privileges limited to "approved users." The primary intent of
the system is to provide a computer utility for collection, monitoring,
evaluating, and reporting of bioassay information. The system permits
collection of data on chemicals and chemical preparations, the experi-
mental procedures and test environment, the observation data and
complete pathology reports on individual animals. The system interfaces
with various statistical application programs and a report generator.
Use of such a system by government agencies and industry encourages
standardization of testing protocols, forces standardization of report-
ing, and incorporates concepts of good laboratory practice. Summary
results from bioassays should be structured for entry into the Tox-
icology Data System.
5.3.2.7 Bibliographic Literature Scanning System. Another major
component of the distributed network is a Bibliographic Literature
Scanning System containing references to toxicological and biomedical
journals. It is designed to assist researchers and other health pro-
fessionals in ascertaining what has been published on any specific bio-
medical subject, including results of human and animal toxicity studies,
effects of environmental chemicals and pollutants, cancer research,
and analytical methodologies. The system is searchable by CAS number,
chemical name, and citation (title, author, journal, etc.). Text
searching of the abstract is also permitted. This component is struc-
tured around existing systems including TOXLINE, MEDLINE, CANCERLINE
and CHEMRiC.
5.3.2.8 Laboratory Animal Data System. The Laboratory Animal
Data System is also recommended for inclusion in the network. This
system contains information on control animals including species,
strain, colony and observed terminal pathology collected from numerous
government and private sources. It provides baseline information on
control animals and is useful in designing test systems and selecting
appropriate species. For increased compatibility with the other com-
ponents of the network, the Laboratory Animal Data System should be
transferred to the data base management system selected for the network
where it will be widely accessible to the public.
5.3.2.9 Regulated Chemical Standards System. Also incorporated
in the network is the Regulated Chemical Standards System which pro-
vides the user with information on standards or regulations which have
been proposed or promulgated concerning individual chemical substances
or classes of chemicals. The system incorporates occupational stan-
dards, transportation, packaging, and labeling requirements, threshold
levels, and various procedural regulations which impose industrial re-
porting requirements with respect to individual chemical substances or
classes of substances. State, Federal and international standards are
all included. The system is implemented on a data base management
system that is publicly available, thereby providing information to
manufacturers and processors as to their respective responsibilities
under various legislative authorities. Government agencies and inter-
national organizations require this system to maintain awareness of
proposed and promulgated standards in order to minimize the develop-
ment of conflicting standards.
5.4 Supporting Rationale for the Recommended Network Design
The network as defined in Figure 5-1 responds to the user require-
ments for an integrated, comprehensive data network that can be used
for hazard identification and hazard assessment in the control of
chemicals affecting health and the environment. The network is de-
signed to coordinate collection and storage of like kinds of data and
to make as much of the data available to the public as possible. It
permits comparison of diverse elements of information, provides easily
updated systems and on-line interactive access.
The network provides a system for OTS to maintain information
collected under TSCA and make available the health and safety data in
a manner consistent with the requirements stated in the EPA/OTS RFP
No. WA 77-D072. It also facilitates access to a sub-structure and
chemical nomenclature system for a large number of chemicals. The
use of a common data base management system for all applicable compo-
nent members of the network permits efficient storage of the data,
eliminates redundancy of data items in separate data files, and pro-
motes more efficient processing and accessing of information. It also
enables a user to integrate information across many files providing a
much broader analytical capability. The network design shows direct
linkage of the TSCA Public Chemical Data System and the Toxicology
Data System since much of the data residing in these systems will be
needed simultaneously to respond to the type of queries where correla-
tion among varying types of data is needed. For example, a query
might require the system to identify high volume, high exposure chemi-
cals correlated with chronic toxicity data.
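The kind of correlated query described above can be sketched with Python's standard `sqlite3` module. Everything in the sketch is an assumption made for illustration: the table and column names, the "high"/"low" classes, and the placeholder substance identifiers are invented, and SQLite merely stands in for whatever common data base management system is selected for the network.

```python
import sqlite3


def build_demo_db():
    """Create two hypothetical files in one in-memory data base."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tsca_public (
            substance_id     TEXT PRIMARY KEY,
            production_class TEXT,
            exposure_class   TEXT
        );
        CREATE TABLE toxicology (
            substance_id TEXT,
            study_type   TEXT,
            effects      TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO tsca_public VALUES (?, ?, ?)",
        [("CHEM-A", "high", "high"),
         ("CHEM-B", "high", "low"),
         ("CHEM-C", "low", "high")],
    )
    conn.executemany(
        "INSERT INTO toxicology VALUES (?, ?, ?)",
        [("CHEM-A", "chronic", "placeholder finding"),
         ("CHEM-A", "acute", "placeholder finding"),
         ("CHEM-B", "chronic", "placeholder finding")],
    )
    return conn


def high_volume_high_exposure_chronic(conn):
    """Join the two files on the shared substance identifier."""
    return conn.execute(
        """
        SELECT t.substance_id, x.study_type, x.effects
        FROM tsca_public AS t
        JOIN toxicology AS x ON x.substance_id = t.substance_id
        WHERE t.production_class = 'high'
          AND t.exposure_class = 'high'
          AND x.study_type = 'chronic'
        ORDER BY t.substance_id
        """
    ).fetchall()
```

Because both files sit under one data base management system, the correlation is a single join; with sequential access, the user would instead search one file and carry the resulting identifiers into the other.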
There is no direct linkage of the Chemical Structure/Nomenclature
System with the TSCA Chemical Data System and the Toxicology Data
System since sequential searches are acceptable to most users. How-
ever, if the Chemical Structure/Nomenclature System utilized a common
DBMS facility and did not require unique software, direct linkage would
be an automatic by-product available at no additional cost other than
that of converting the nomenclature system to the common DBMS.
Direct linkage of systems is thus recommended only for the
Chemical Data Base Directory, the TSCA Public Chemical Data Systems,
and the Toxicology Data System. The Chemical Structure/Nomenclature
System can be directly linked to these files if it resides in the
same data base management system at the same computer facility or if
the selected DBMS permits distributed data base management at differ-
ent computer facilities. Direct linkage of the other files is not
necessary since sequential accessing is adequate.
The systems selected as core components are included primarily
because 1) the data contained therein are critical to the study and
regulation of chemicals or 2) the data system's software is critically
needed to store and retrieve necessary data. The core component
systems potentially provide the data necessary for hazard identifica-
tion, hazard analysis, and support for regulations regarding commercial
chemicals as well as for enforcement activities. Coverage of large
numbers of chemicals in the Structure/Nomenclature Systems and in the
Toxicology Data System fulfills requirements of research groups to
look at structural relationships regardless of the use of a chemical.
The Chronic Testing Support System is included to provide a sophisti-
cated data handling capability for groups involved in long term testing
resulting in large amounts of monitoring data on many individual
animals.
A data system of environmental monitoring data consolidated
across all media was frequently mentioned as being "desirable" by
several groups interviewed. This type of system was not included as
a core component of the network since creating such a system appears
difficult and the requirement does not exist at this time. Monitoring
data for select chemicals are contained in
the TSCA Chemical Data System, and other existing systems are refer-
enced by the Directory. The UPGRADE System, developed by CEQ, pro-
vides an analytical capability to retrieve data from such files
as SAROAD and STORET and may answer the requirement to link environ-
mental monitoring data. As these systems include coverage of larger
numbers of chemicals, consideration may be given to consolidating
summary data into a core component system.
Private files of agencies which contain large amounts of pro-
prietary data are not included as core components but are referenced
by the Directory. Such files as the product composition files of
CPSC and NIOSH, the pesticide registration files of EPA and the drug
application systems of FDA are examples. However, consideration should
be given by these agencies to "spinning off" publicly accessible files
similar to that suggested for the TSCA Chemical Data System. Agencies
with proprietary data have a responsibility to protect such data, but
they have an additional responsibility to make non-proprietary data
available if its release contributes to science.
Bibliographic files sponsored by Federal agencies other than
those in the NLM System are not included as core components. Files
such as NIOSHTIC and SWIRS should be made available on-line through
a time-shared network. If public usage is too limited to support
a file in this manner, then the file should be dropped unless the
respective agency finds it critical to its operations.
Inclusion of core system components in the network and actual
development of the systems themselves will result from a dynamic
decision process. Not only do policy decisions within EPA and other
Federal agencies dictate program planning, they also impact on net-
work development. It is important to recognize the impact of these
policy decisions so that subsequent adjustments to the network design
can be made as required.
New data bases, responsive to particular requirements, must
continue to be developed. They do not, however, always have to be
developed under the umbrella of the network or as part of an existing
system (even though cognizance should be given to inclusion of
standardized nomenclature and compatible file structures, etc.).
5.5 Data Base Administration Responsibilities
Management of the network is best provided by an independent
organization having a mandate to apply its resources to the advancement
of science by collecting, storing and disseminating chemical and
toxicological information to investigators, educators, government
regulatory agencies and the public at large. Responsibility for over-
all development and maintenance of the comprehensive network as defined
in this report should be placed in an organization where crises,
emergencies and program activities will not take priority over the
information dissemination function. A regulatory agency typically
must respond to situations somewhat beyond its control (e.g.,
citizens' petitions, court decisions, emergency situations) which
cause continual shifts in program activities. Historically, informa-
tion activities in regulatory agencies have been neglected and resources
cut back or reprogrammed in times of crisis.
In the case of CSIN, EPA will have the responsibility for main-
tenance of the proprietary data collected under TSCA. Furthermore,
EPA will have the responsibility to separate the publicly releasable
information from the proprietary data. However, the maintenance of
the resulting public file does not need to be an EPA function. It
can physically reside in a government-owned or a privately-owned
computer accessed through a time-shared network.
The interagency committee authorized in section 10(b) or the
Council on Environmental Quality as designated by section 25(b) can
provide advice concerning which office should have the designated
responsibility for the network, and can continue to serve in an
advisory capacity as the network develops. A data resources
administrator should be selected who is responsible for the design,
development, operation and maintenance of the system. Cognizance
should be given by the data resources administrator to the relationship
between the implementation of the reporting provisions of the Toxic
Substances Control Act and its impact on the network development.
In addition, development of other component systems of the network
must be appropriately scheduled in a manner consistent with the user
requirements.
During the development of the network, considerable attention
must be given by the network management to the creation of publicly
available data bases and packaging of data to serve the diverse
community of users. Consequently the data resources administrator
should possess sufficient knowledge of user applications to perform
a satisfactory trade-off among user demands.
The evolution of a standardized nomenclature is a requirement
for the continued maintenance of the directory and locator designators.
Sensitivity to the problem of unevaluated data versus evaluated data
must be recognized and handled. Where possible, references and
sources must be tagged. Where there are no citable references,
greater detail must be provided in the systems record to allow user
evaluation of the data. Maintenance of data integrity and data
currency of the core component systems are additional responsibili-
ties of the data resources administrator.
6.0 RECOMMENDED SYSTEMS DEVELOPMENT OPTIONS
In Section 5.0 a comprehensive information network to satisfy
user requirements for information on chemical substances is defined.
It was noted that the specific data gathering policies and plans of
EPA will have a direct impact on recommendations concerning develop-
ment options for data systems. To provide a framework within which
specific and detailed recommendations could be formulated, three data
collection scenarios are defined. In this section, systems development
recommendations are made in response to those scenarios.
6.1 Clarification of Scenarios and Their Systems Development
Implications
Prior to discussing the specific system development recommendations,
the implications of the various data collection scenarios must
be considered. Each scenario must be analyzed both with respect to
its specific data base and system options and to the setting of
priorities for systems development.
The first scenario assumes that EPA will collect site specific
production information as a part of the inventory reporting under
section 8(b) of TSCA and that it can be processed within the next
three years. It is further assumed that a regulation under TSCA
section 8(a)(2) will require submission of information on amounts
produced by each category of use, descriptions of by-products result-
ing from production, uses, environmental and health effects, exposure
information and the methods used for disposal for approximately 1,000-
2,000 chemical substances of particular interest to EPA.
At the end of the initial three-year time period, EPA will reach
a decision point. At this time, a choice will have to be made between
continuing the limited data gathering activities described under
Scenario I or initiating an increased data collection policy, i.e.,
the second scenario. Should EPA decide not to increase its data
acquisition activities, no changes in its systems development plans
would be required. If, however, the decision favors adoption of the
second scenario with its increased data collection requirements,
changes in systems development activities must occur.
Under the second scenario, it is assumed that EPA will initiate
a policy requiring submission of information on use, users and expo-
sure in addition to the information already being collected. Further,
it is assumed that the list of chemical substances for which reporting
under TSCA section 8(a)(2) is required will be extended to include a
total of 7,000 to 10,000 chemical substances. External information
files, previously developed for other purposes, are used under
Scenario I to provide some use, user and exposure data. Since Scenario
II provides for these data to exist in the TSCA Chemical Data Systems,
access to external files containing use/exposure data would no longer
be critical. Continued maintenance of these files would have to be
predicated on a justification other than TSCA.
After approximately five years, EPA is assumed to make a second
policy decision. This decision involves a choice between continuing
the Scenario II data collection activities or initiating a policy to
fully implement all data collection activities authorized under TSCA
section 8(a)(2). This would add to the TSCA Chemical Data Systems
information on by-products, environmental and health effects and
disposal methods for all chemical substances on the inventory. It
is further assumed that under this third scenario, EPA will implement
regulations requiring reporting of new uses of chemicals already on
the inventory in accordance with section 5(a) of TSCA.
Initiation of the third scenario will result in a major expansion
of the volume of data held for all chemicals on the inventory in the
TSCA Chemical Data Systems. This expansion will permit the satisfac-
tion of user requirements for all inventory chemicals which were pre-
viously only satisfied for selected chemicals. It will also facilitate
the data searching activities required to access the information, for
under Scenario III, most information previously only available from
external files will be available in a single system. However, EPA
and other regulatory agencies will always have to rely on outside
sources such as the scientific literature, reports from the research
agencies, epidemiological studies, etc., for science-based decision-
making.
The implications of the three scenarios with regard to information
required from industry, and external files needed to supplement this
information are integral to the developmental systems recommendations
presented in Section 5.3. A greater emphasis is placed on the conse-
quences of interactions between the first two scenarios since adoption
of these by EPA is considered most likely. It should be noted that
full development of the Chemical Substances Information Network is the
ultimate goal of all recommendations regardless of the specific data
gathering scenario.
It is apparent that as EPA collects data under the TSCA reporting
provisions, including sections 4, 5, and 8, the TSCA Chemical Data
Systems will increase in size and dependence on other systems which only
partially satisfy their information needs will diminish. Data bases,
however, will continue to be developed which respond to specific Federal
responsibilities with respect to the TSCA chemicals and others not
covered under this Act (e.g., drugs and pesticides). Network com-
ponents which satisfy multiple user requirements and which contribute
to the satisfaction of EPA stated goals for implementing TSCA, are
given priority in the design of the network. In the long term,
network development will be accomplished by increased data collection
and enhancement of the TSCA Chemical Data System. Concurrent
enhancement or development of other core components must
occur in a manner consistent with 1) TSCA implementation plans,
2) network user requirements, 3) available funding, and 4) a willingness
on the part of concerned Federal agencies to cooperate in data
acquisition and data base development.
6.2 Scenario I Systems Options
As noted above, the TSCA implementation strategy, which impacts
the design of the first scenario, includes the collection of site
specific production data for all chemicals on the inventory during the
next year. Therefore, the user requirements for this information
would be satisfied by the TSCA Chemical Data Systems to be developed
by the Office of Toxic Substances. Under the assumptions of the first
scenario, EPA obtains information on use, exposure, and biological
effects data for only about 1,000 - 2,000 chemicals on the inventory
under 8(b) or 8(a)(2). This, then, places greater near-term reliance
on existing data bases to satisfy the identified user needs expressed
in Section 2.5.1.
In Figure 6-1 existing data systems and systems to be developed
in the future are displayed. The relationships of existing data bases
to the planned data bases are presented in such a way as to illustrate
the modular development of the network. Systems that are required
during the interim to at least partially satisfy user requirements
are discussed below, as are those data bases which are recommended
as integral components of the planned network.
In terms of responding to identified user requirements, under
Scenario I, within the next two or three years site specific produc-
tion information will be contained in the TSCA Chemical Data System
with a public file being developed. Chemical substances identification
data are available from CHEMLINE or from the Chemical Information
System/Substructure Search System. Marketing and use data, exposure
data, biological and epidemiological data, and environmental effects
data for chemicals of concern to the interviewed community of users
will be only partially available from a variety of systems.
[Page not available digitally]
6.2.1 Directory Development Recommendations
The Chemical Data Base Directory is of great importance since it
will be the focal point for information on data bases and reference
sources. Construction of the Directory should be given the overall
highest priority and should begin as soon as possible. Although re-
sponsibility for construction can be decided by the TSCA section 10
interagency committee or the section 25(b) CEQ Committee, responsi-
bility for administration and maintenance of the Directory is logically
within the National Library of Medicine since their system currently
provides terminal access to a large number of potential users of such
a directory. They have already initiated preliminary design work for
a Directory. The Chemical Information System, through its time-shared
network, could also serve as a temporary residence of the Directory.
The only differences are that the existing users of the CIS tend to
be a limited subset of the potential user community, and that the
system utilizes a private computer network rather than a
Federally-funded computer network.
6.2.2 Nomenclature and Structure Development Recommendations
As the next highest priority for the network, it is recommended
that CHEMLINE and CIS/SSS be enhanced along the lines of the present
planning for these files. Beyond those plans, it is recommended that
CHEMLINE include a locator designator for all files identified in
Figure 6-1 with primary attention being given to those files which
become merged or contribute to "core component" files. Improvements
to the CHEMLINE structure searching capability should also continue.
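The locator designator recommended above can be pictured as an index from a substance identifier to the files that hold data on it. The following Python sketch is hypothetical: the class name `Locator`, its methods, and the placeholder identifier are assumptions, and the file names are used only as labels, with no claim about which substances those files actually cover.

```python
from collections import defaultdict


class Locator:
    """Hypothetical index from substance identifier to holding files."""

    def __init__(self):
        self._files_by_substance = defaultdict(set)

    def register(self, substance_id, file_name):
        """Record that a file contains data on the given substance."""
        self._files_by_substance[substance_id].add(file_name)

    def locate(self, substance_id):
        """Return the names of files holding data on the substance."""
        # .get avoids creating an empty entry for unknown substances.
        return sorted(self._files_by_substance.get(substance_id, ()))


locator = Locator()
locator.register("CHEM-A", "TDB")   # placeholder identifier
locator.register("CHEM-A", "EMIC")
```

With such an index, a single search on a substance tells the user which files are worth searching next.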
For CIS/SSS, it is recommended, beyond the current plans to
increase the chemical substance coverage, that a nomenclature search
capability and a locator designator be provided. Enhancements to
substructure searching features of CIS/SSS are also necessary at this
time since the desirable state-of-the-art has not been reached.
Substructure searching is also inherent in the Army's CIDS
system. It is recommended that a coordinated activity in terms of
funding and development of a unified Chemical Structure/Nomenclature
System be initiated in the near future with the specific objective of
planning for development of the more comprehensive system described
in Section 5.3.1. An indication of the advantages and disadvantages
of these systems with respect to nomenclature and structure searching
is presented in Table 6-1. A more definitive evaluation of these
systems with respect to their structure search capabilities is desir-
able and is recommended.
Moreover, for all existing systems, emphasis must be placed on
chemical substance identification, since these data elements become
the critical linkages or connections between existing data bases.
Chemical Abstracts Service preferred names (the widely accepted
standardized nomenclature for chemical files) are used in a number of
files, but the majority of files have not been name-matched and pro-
vided with CAS numbers and names. Clearly, use of a CAS number
provides a universally acceptable standardized nomenclature and its
use should be encouraged. EPA, through the Chemical Information
System, has registered a large number of files and made the CAS
number, name and structure available through CIS/SSS. This has been
extremely useful in terms of standardizing nomenclature and making
structure information for these chemicals available. It is
important that owners of the files follow the registration of the CAS
name and number with incorporation of this information into the files.
Priority must be given to name matching files which will be needed
in the interim and which are identified in Figure 6-1.

TABLE 6-1
SELECTIVE COMPARISON OF STRUCTURE SEARCHING APPROACHES

ADVANTAGES

CHEMLINE
• LARGE NUMBER OF CHEMICALS
• LOCATOR FILE
• SEARCHABLE BY CAS NUMBER, CAS NAME, SYNONYM, WLN, MOLECULAR
  FORMULA/WEIGHT, RING CHARACTERISTICS, NAME FRAGMENTS
• PUBLICLY AVAILABLE

CIS/SSS FILE
• SEARCHABLE BY CAS NUMBER, SUBSTRUCTURE COMPONENT, CIDS KEYS,
  NUCLEUS PROBE, ATOM BY ATOM APPROACH, MOLECULAR FORMULA,
  MOLECULAR WEIGHT
• COMPOUNDS FROM MANY FILES INCLUDED
• PUBLICLY AVAILABLE
• SYSTEM BASED ON CAS CONNECTION TABLES

CIDS
• SEARCHABLE BY MOLECULAR FORMULA, STRUCTURAL FRAGMENTS
• SHORT STRUCTURE SEARCH LEARNING TIME
• GOOD STRUCTURAL DISPLAY CAPABILITY

DISADVANTAGES

CHEMLINE
• NON-CYCLIC STRUCTURES SEARCHABLE ONLY BY NOMENCLATURE
• FEW FILES INCLUDED IN LOCATOR
• NO STRUCTURE DIAGRAM ENTRY AND RETRIEVAL CAPABILITY

CIS/SSS FILE
• NO NOMENCLATURE SEARCHING CAPABILITY
• MUST SEARCH EACH FILE INDEPENDENTLY
• STRUCTURE SEARCH METHODOLOGY DIFFICULT TO LEARN AND REQUIRES
  MORE ADVANCED CHEMICAL KNOWLEDGE
• STRUCTURAL DISPLAY NEEDS IMPROVEMENT

CIDS
• LIMITED KEYS
• LIMITED TYPE OF CHEMICALS INCLUDED
• SPECIAL HARDWARE REQUIRED FOR PRINTOUTS
• NOT WIDELY AVAILABLE TO PUBLIC

Data elements such as the CAS number, CAS name, or Wiswesser
line notation code (WLN), when present in more than one file, can
provide a linkage between those files and other files containing
these data. Figure 6-2 examines the substance identification data elements
included in a number of relevant files as reported in the CEQ Survey
and identifies the common data elements which would permit file inter-
connections. It can be clearly seen that the common link is the non-
standardized chemical name or synonym.
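The idea of linking files through shared identification data elements can be sketched in Python as set intersections. The file names and element sets below are invented for illustration and do not reproduce Figure 6-2; they are arranged only so that, as in the observation above, the synonym is the one element every file shares.

```python
from itertools import combinations

# Hypothetical files and the identification elements each one carries.
FILE_ELEMENTS = {
    "FILE_A": {"cas_number", "synonym"},
    "FILE_B": {"synonym", "wln"},
    "FILE_C": {"cas_number", "cas_name", "synonym"},
}


def pairwise_links(file_elements):
    """For each pair of files, return the elements usable as a link."""
    links = {}
    for a, b in combinations(sorted(file_elements), 2):
        shared = file_elements[a] & file_elements[b]
        if shared:
            links[(a, b)] = shared
    return links


def universal_links(file_elements):
    """Return the elements present in every file."""
    return set.intersection(*file_elements.values())
```

In this invented example only the pair FILE_A and FILE_C could be joined on a standardized element (the CAS number); every other connection falls back on the non-standardized synonym.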
A data base mapping model and search scheme was developed at the
University of Illinois under National Science Foundation support
in order to test the feasibility of data element linkage among
various chemical files. Results demonstrated that use of a con-
sistent scheme for classification of data bases by subject and
common data elements greatly increases the potential for accessing
data bases.
In developing Figure 6-2, no data items other than the CAS No.
were cited unless it was definitely known that those
additional items had been incorporated into the file after it
was name matched.
[Page not available digitally]
6.2.3 Toxicology Data Systems Development Recommendations
Biological effects data for selected chemicals are available
during the interim from TDB, EMIC, ETIC, TOX-TIPS, the Registry for
Toxic Effects, the IARC Bulletin of the Survey of Chemicals Being
Tested for Carcinogenicity (PHS-149), the Fish Control Laboratory
Data, and Fish-Pesticide Research. Physical/Chemical property data
are available from TDB, CIS, and the Thermophysical Properties Research
Center. Since biological effects data were cited in the user require-
ments study as being necessary, immediate consideration should be
given to the feasibility of developing an interactive system containing
data on a wide variety of chemicals. The 1,000 - 2,000 chemicals for
which OTS is considering requesting 8(a)(2) data in this calendar year
are prime candidates for inclusion in TDB. TDB should continue to be
enhanced during the interim period, with major attention being devoted
to the chemical and biological effects data, and minimal effort made
to include production data since this will be available in the TSCA
Chemical Data System.
6.2.4 Exposure/Use Systems Development Recommendations
Other critical data categories identified in the user require-
ments survey include use and exposure data. Systems which provide
some of these data include the NCI/SRI Research Chemicals That Impact
Man, the U.S. International Trade Commission Data Base, the Mineral
Commodity Survey, the Chemical Economics Handbook, Dun's Market
Identifiers, and the National Occupational Hazard Survey. Decisions
to incorporate these systems into the network and enhance them by
extending their coverage are predicated on EPA's strategy relative
to data collection activities. For example, the NCI/SRI system pro-
vides exposure profile data for approximately 3,200 compounds (some
of which are pesticides, cosmetics and drugs). The system provides
the best attempt to date to model the uptake of chemicals by biological
systems as a result of use and exposure data which SRI collects from
various sources. It would provide the network with a source for limited
amounts of these data. Consideration must also be given to the econom-
ics of enhancing this data base to assist in satisfying user require-
ments and the recommended coverage of chemicals.
The use and exposure data collected under section 8(a)(2) will
need to be supplemented with body uptake information. The NCI/SRI
data base, since it now includes a methodology for generating these
uptake data, would be a logical candidate for federal support. If
this were decided to be the case, the NCI/SRI data base should focus
on generating uptake information on those chemicals selected by EPA
for section 8 reporting, excluding from their operation the obtaining
of the use and exposure data (these would be supplied to them by EPA).
An alternative to support of the NCI/SRI data base would be the development
of an uptake algorithm through interagency R&D funding by those
agencies requiring this information (e.g., EPA, NIOSH, NCI, FDA, CPSC)
with lead responsibility assigned to one agency. The value and cost of
generating these uptake data through NCI/SRI or through an interagency
agreement will have to be considered in light of EPA's expected
decision to require use and exposure data.
If EPA defers the decision to collect these data for all chemicals
in 1980 and continues in a first scenario data collection mode, then
enhancement of the use and exposure as well as uptake components of the
NCI/SRI file as part of the network would be a viable alternative to
reaching the long-term objective. It is thus recommended that the NCI/SRI
file be referenced as a relevant file for Scenario I. In addition,
consideration should be given to carefully enhancing its coverage in
such a way that it supports and does not overlap with EPA's industrial
reporting plans. As decisions are made in EPA regarding data collection
of production and use data, additional adjustments can be made concern-
ing further enhancement of the NCI/SRI file or the initiation of a
new interagency research effort.
Furthermore, the Directory should provide pointers to the files
and reference tools mentioned above as being potential sources of pro-
duction and use data during the interim stages in the development of
the Chemical Substances Information Network.
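The pointer function of the Directory can be illustrated with a minimal sketch. The record layout, the category label and the function name `find_sources` are assumptions made for this illustration; the report does not specify a Directory record format. The entries are the systems named earlier in this section as partial sources of use and exposure data.

```python
from dataclasses import dataclass

# Hypothetical category label; the report names the data category but
# does not define a controlled vocabulary for it.
USE_EXPOSURE = "use and exposure"


@dataclass(frozen=True)
class DirectoryEntry:
    """One pointer record: a file or reference tool and what it covers."""
    name: str
    categories: frozenset


# Systems named in this section as providing some use and exposure data.
DIRECTORY = [
    DirectoryEntry(name, frozenset({USE_EXPOSURE}))
    for name in (
        "NCI/SRI Research Chemicals That Impact Man",
        "U.S. International Trade Commission Data Base",
        "Mineral Commodity Survey",
        "Chemical Economics Handbook",
        "Dun's Market Identifiers",
        "National Occupational Hazard Survey",
    )
]


def find_sources(category, directory=DIRECTORY):
    """Return the names of all entries that cover the given data category."""
    return [entry.name for entry in directory if category in entry.categories]
```

A call such as `find_sources("use and exposure")` returns the six names above. A category with no registered source returns an empty list, which is how a gap in coverage would show up to the user.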
6.2.5 Development Recommendations for Other Systems
Other data requirements for physical/chemical property data,
environmental effects data, epidemiological data and additional
economic data can be partially satisfied by results published in
the open literature and by current research studies. The literature
scanning activities of the NLM and the other agencies, professional
societies, etc., which are made publicly available through time-shared
government and private networks are a vital part of the interim system.
Relevant bibliographic files such as TOXLINE, MEDLINE, CANCERLINE,
SWIRS, NIOSHTIC, and those available through SDC, LOCKHEED and BRS,
are to be referenced by the Directory.
The two files developed for EPA by Radian and PEDCO provide
process data for selected chemical industries. As noted in Sections
4.5.2 and 4.8, environmental effects data and environmental monitoring
data are not readily available for a wide range of chemicals. As
additional data are collected, efforts should be made by EPA to
incorporate them into AEROS, STORET or other appropriate systems for
wider dissemination.
During this interim period, product composition data can be
obtained in varying degrees of specificity from the CPSC System,
the NIOSH System, CTCP, POISINDEX and the Poison Control On-line
Inquiry System. EPA users felt that these data bases were particularly
valuable for obtaining use information, but as specific use information
became available to them through the TSCA Chemical Data Systems, they
would not use these systems as extensively.
Existing sources of some relevant epidemiology data include NEISS,
the National Center for Health Statistics, the Atlas of Cancer Mortality
and other systems identified in Section 4.6. None of these sources
specifically responds to user requirements for epidemiological or adverse
effects data. These needs would only be met if EPA implements the
section 8 reporting and recordkeeping requirements.
The Directory and ultimately the locator in the Chemical Struc-
ture/Nomenclature System will provide references to manual files as
well as automated files. The Merck Index, the Chemical Economics
Handbook, etc., continue to be needed during this interim period to
provide varying types of data.
6.2.6 Limitation on Recommendations
The additional components typically included in a system develop-
ment plan (such as cost considerations, personnel requirements, spe-
cific recommendations for both software and hardware capabilities,
and required storage capacities) are not addressed here since they
are not in the scope of this effort. These components are the subject
of a second, concurrent effort by an independent analysis team and
will be published separately.
6.3 Scenario II and III Systems Options
6.3.1 Scenario II Systems Implications
A Scenario II assumption is that EPA will initiate a policy of
requiring submission of information on use, users and exposure for all
chemicals in the inventory in addition to site specific production
information. It is further assumed that the list of chemical substances
for which 8(a)(2) reporting is required will be extended to include a
total of 5,000 to 10,000 chemicals. With the increased data being collected
by EPA, the content of the TSCA Chemical Data System will be greatly
increased. Less reliance will therefore be placed on external systems
capable of providing limited interim use and exposure data for those
chemicals that fall under the jurisdiction of TSCA. Data bases of
lesser concern include the NCI/SRI, the U.S. ITC Data Base, Dun's
Market Identifiers and The Mineral Commodity Survey. Reference tools
which diminish in need include the Chemical Economics Handbook and
the Kirk-Othmer Encyclopedia of Chemical Technology.
The core components described as part of the long range objective
and illustrated in Figure 5-1 are still required to satisfy user
requirements to provide a more comprehensive system that will be use-
ful in carrying out the purposes of TSCA. The priorities for imple-
mentation of the core components do not change from those identified
in Scenario I. The major difference between the first and second
scenarios is that under the second scenario, use and exposure data
requirements for commercial chemicals will be more adequately satis-
fied by the increased reporting requirements and less dependence on
the other systems is necessary. That, in essence, is the only signi-
ficant change in the Scenario II systems development option.
6.3.2 Scenario III Systems Implications
In Scenario III it is assumed that EPA exercises its full
8(a)(2) reporting requirements for all chemicals in the inventory. In
addition, it is assumed EPA implements significant new use reporting
under section 5(a). Implications of these assumptions are that
both the proprietary and public TSCA Chemical Data Systems will be
greatly expanded as far as the number of chemicals included. In
addition, stated user requirements for data previously available only
for selected chemicals (e.g., by-products data) will be available
on a large number of chemicals. Scenario III assumptions do not
impact on the systems included in the network or referenced by the
Directory. Reliance on external systems to partially satisfy data
needs is required to the same extent as in Scenario II. The Scenario
III increased reporting causes previously unmet data needs to be
satisfied more fully. Development of the core components of the
network is just as critical under Scenario III assumptions as under
those of I and II, and furthermore the priorities for implementation
remain unchanged.
6.4 Other Considerations of Systems Development Options
6.4.1 Systems Options, Their Compatibility and Development
Comparing Figure 5-1 with Figure 6-1, one can see the similarity
in basic design structure from a user point of view between a network
which has the potential to be responsive to user requirements and the
currently existing systems which are partially responsive to some
data requirements and unresponsive to others. Previous discussion has
emphasized that satisfaction of user requirements is predicated on
the ability to obtain access to varying types of information necessary
to make assessments concerning the hazards of chemicals and their
impact on man and the environment. Although much of this information
can be obtained by EPA using the industrial reporting provisions of
TSCA, much of this information must be generated from additional
testing and research.
As new data become available, they must be collected, structured
and made available in systems for easy retrieval. The Chemical Sub-
stances Information Network provides the potential structure for these
systems. It potentially satisfies needs for substance identification
data, production, marketing, and exposure data. It will also provide
a centralized source of existing epidemiological data, biological
effects data and environmental effects data for commercial chemicals.
Information on standards and regulations with respect to chemical
substances that have been promulgated by international, Federal, state
and local governments will be available.
The Scenario I systems development option satisfies user requirements
for substance identification data (that is, chemical nomenclature and a
structure search capability) and for site specific production data
for chemicals in the Inventory. It does not satisfy requirements for
use, exposure and biological effects data nor does it provide adequate
data on epidemiology or environmental effects. It provides for
development of a Directory file which points to existing systems where
useful data can be found. However, coverage of these data bases is
very weak with respect to some categories of data (e.g., environmental
effects data) and not well coordinated for others (e.g., biological
effects data).
The Scenario II systems development option satisfies the user
requirements for substance identification data, site specific production,
use and exposure data. These data fulfill some users' specific
requirements associated with the hazard identification function. Other
user requirements are still unmet by Scenario II system options (e.g.,
biological effects data, epidemiology data and environmental effects
data for all chemicals on the Inventory).
Scenario III satisfies previously unmet requirements and provides
for collection of all 8(a)(2) data for all chemicals on the Inventory.
However, these data, available in a structured data base, do not in
themselves respond to all user requirements. Linkage with other com-
ponent systems of the network is critical to coordination of the entire
spectrum of data which must be considered when making hazard evalua-
tions on chemicals or establishing regulations affecting their control
or release into the environment.
Network development is evolutionary and is dependent on EPA decisions
to implement TSCA. The approach EPA takes toward implementation of
section 8 rulemaking will probably most closely resemble the data
collection activities described in Scenario I.
If this is the case, development of the network must proceed by build-
ing on existing systems capabilities. EPA may make a decision to
increase section 8 data collection activities some time in the future
and it could well choose an incremental approach such as is suggested by
Scenario II. The actions of EPA significantly impact on the design
of the Proprietary and Public TSCA Chemical Data Systems. In addition,
EPA's actions affect the design of the network in terms of decisions
to enhance existing data bases or to build new data bases to obtain
information that might otherwise be collected under section 8 of
TSCA*.
EPA actions do not, however, impact on the design of the other
core component systems described in Section 5.3.1. Development of these
systems must be concurrent with development of the TSCA Chemical Data
System no matter what data collection scenario is in place.
6.4.2 Time-phase Implementation of the Core Component Systems
The general events associated with the concurrent development of
core component systems and associated time frames are presented in
Figure 6-3 which assumes Scenario I data collection option as the
initial starting point. The figure also includes the events associated
with Scenario II and III systems development recommendations. In some
cases, lead agency responsibilities for systems development are
identified.
The figure presents a definition of existing systems on the
left-hand side and the network component objective on the right.
The figure indicates the point in time at which the existing systems
are consolidated, restructured, or enhanced. This is illustrated by
the merging of horizontal systems lines or the positioning of vertical
lines indicating initiation or termination of specific events. Events
relating to more than one system are indicated by a box overlaid onto
all affected systems.
*These decisions are further complicated by the fact that EPA has
the unique authority to collect these data. Any data bases developed
or enhanced would require extensive contractor support with no
Federal authority to obtain such data from industry.
[Page not available digitally]
The time-phased implementation for the core components shows
development of an interim directory after one and one-half years.
This capability is augmented with the ability to access CHEMLINE and
CIS/SSS for chemical identification and structural data on specific
chemicals. The TSCA Reports Management System is to be operational
in 1978 to be responsive to the assumed initial submissions of 8(a)(2)
data, inventory data, pre-manufacturing and testing data. The require-
ments for a system, as expressed in the RFP No. WA77-D072, are not
inconsistent with the recommendations made in this report. The RFP's
work statement specifies a system which can provide for storage and
retrieval of data submitted as a result of regulations promulgated
under TSCA. This system would encompass the Reports Management System
and the TSCA Proprietary Chemical Data System as described in this
report. Recommendations for a subsequent public system are not
explicitly stated in the scope of work of the envisioned contract.
The recommended major events associated with the consolidation
of existing systems containing biological effects data are also pre-
sented in Figure 6-3. A feasibility study to consolidate existing
systems with biological effects data into the Toxicology Data Bank is
recommended for initiation within year one. Subsequently, software
modifications to TDB are required. It is recommended that EPA and
NIEHS take the lead responsibility in coordinating mutagenic data,
and NCI in consolidating existing carcinogenic data. NIEHS would be
the appropriate agency to coordinate teratology data; NLM and NIOSH
would take the responsibility for structuring acute toxicity data
using TDB and the Registry of Toxic Effects. Overall responsibility
for assuring the timely development of the Toxicology Data System
rests with the network management. It is proposed
that the Registry of Toxic Effects be an integral part of the
Toxicology Data System. The yearly publication of the Registry (as
mandated by the Occupational Safety and Health Act of 1970) in the long term
will be a product of the Toxicology Data System. EMIC and ETIC, as
analysis centers, may still be critically needed operations in the
long term with most of the efforts being devoted to evaluation and
review of data.
Development of the Toxicology Data System is dependent on the
availability of resources. Figure 6-3 indicates mutagenic and carcino-
genic data are input into the system within three years. Acute toxi-
cology data and teratology data are loaded into the system in year four.
Metabolism data are entered in the system during year five. It is con-
ceivable that all of these data could be entered into the system
concurrently if funds are available, but the assumption is made that
resources for development of this system are limited. Consequently,
priority was given to mutagenic and carcinogenic data since the
existing data are not well coordinated and such a priority best satis-
fies user needs.
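The loading sequence described above can be restated as a small lookup, shown here only as an illustrative sketch. The dictionary name `LOAD_YEAR` and the function `available_by` are assumptions of this sketch, and "within three years" is encoded as year 3.

```python
# Year by which each data category is loaded into the Toxicology Data
# System, as read from the paragraph above. "Within three years" is
# represented as year 3 (an assumption of this sketch).
LOAD_YEAR = {
    "mutagenic": 3,
    "carcinogenic": 3,
    "acute toxicity": 4,
    "teratology": 4,
    "metabolism": 5,
}


def available_by(year, schedule=LOAD_YEAR):
    """Return, in alphabetical order, the categories loaded by the given year."""
    return sorted(cat for cat, load_year in schedule.items() if load_year <= year)
```

Under this reading, `available_by(3)` returns only the carcinogenic and mutagenic categories, reflecting the priority given to them when resources are limited.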
A study to examine the feasibility of developing a Chronic Testing
System capable of being responsive to the requirements of NCI, NCTR,
EPA and industry is recommended, and NCI should take the lead
responsibility. The system will incorporate the best features of the
Carcinogenesis Bioassay Data System developed by NCI and the National Center
for Toxicological Research Integrated Laboratory Support System.
EPA is recommended as being the lead agency for the development of
the Regulated Chemicals Standards System. Priority for inclusion in
the data base is given to Federal standards associated with commercial
chemicals, with subsequent attention given to state, local and
international standards. Standards affecting pesticides should be entered
next into the system.
6.4.3 Compatibility of Component Systems
It is recommended that most of the core systems be connected by a
common data base management system. This serves to facilitate cross
exchange of information, direct linkage of files when necessary and
retrieval using a common command language. Similarly, it is recommended
that standardized nomenclature or data element terminology be utilized
by all core systems in the network. Standardization of nomenclature
is never easy to implement and many systems are usually affected. The
difficulties usually result from the inability of all system partici-
pants to agree on a standardized vocabulary. In the case of CSIN,
arriving at standardized nomenclature may be somewhat easier since
most of the components are new or are being developed from existing
files. This difficulty must be addressed in the development of the
Directory and the subsequent development of other core components.
An alternative to achieving complete conversion of existing
nomenclature is the use of minicomputers. These would function essen-
tially as "black boxes" which provide conversion routines to first
translate user preferred terminology to the standardized terminology
employed by the systems in the network and second, translate into the
corresponding terminology employed by the individual data systems to
be accessed. This conversion is transparent to the user. Use of a
minicomputer increases the flexibility of the users of the system by
not requiring their learning the standardized network terminology.
Conversion routines can be written to update core system components
for feeder systems which employ different nomenclature. As systems
are designed for the network, standardized terminology would be
used.
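The two-step conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a specification: the vocabularies, the system names `SYSTEM_A` and `SYSTEM_B`, and the function `translate` are all hypothetical placeholders.

```python
# Step 1 table: user-preferred terms -> standardized network terminology.
# All terms below are hypothetical placeholders.
USER_TO_STANDARD = {
    "ld50": "ACUTE_TOXICITY",
    "acute tox": "ACUTE_TOXICITY",
    "trade name": "SUBSTANCE_NAME",
}

# Step 2 tables: standardized terminology -> the term each individual
# data system expects. System names and terms are hypothetical.
STANDARD_TO_SYSTEM = {
    "SYSTEM_A": {"ACUTE_TOXICITY": "ATOX", "SUBSTANCE_NAME": "NM"},
    "SYSTEM_B": {"ACUTE_TOXICITY": "acute-toxicity"},
}


def translate(user_term, system):
    """Translate a user term to the term used by one target system.

    The user never sees the intermediate standardized term, which is
    what makes the conversion transparent.
    """
    key = " ".join(user_term.lower().split())  # normalize case and spacing
    if key not in USER_TO_STANDARD:
        raise KeyError(f"no standardized term for {user_term!r}")
    standard = USER_TO_STANDARD[key]

    system_vocabulary = STANDARD_TO_SYSTEM.get(system, {})
    if standard not in system_vocabulary:
        raise KeyError(f"{system} has no term for {standard}")
    return system_vocabulary[standard]
```

Routing every term through the standardized vocabulary means that each added system needs only one new table, rather than one table per pair of systems.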
The same approach can be utilized for systems that require
unique software and do not convert to the common DBMS. A "front end"
or "black box" can be employed which permits interrogation of the
system through a "macro query language" which in essence connects the
specialized software into what appears to be the DBMS.
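The "front end" idea can be illustrated with a small dispatch sketch. The command syntaxes, the system names and the function `macro_query` are invented for this illustration and do not describe any actual system; the point is only that one query form is rewritten into whatever each system's own software expects.

```python
# Each back end turns the common (field, value) query into its own
# command syntax. Both syntaxes are hypothetical.
def common_dbms_command(field, value):
    """Command form for systems already under the common DBMS."""
    return f"FIND {field} = '{value}'"


def specialized_command(field, value):
    """Command form for a system that keeps its unique software."""
    return f"SRCH/{field.upper()}:{value}"


# The front end's routing table: system name -> command builder.
BACKENDS = {
    "common_dbms": common_dbms_command,
    "specialized_system": specialized_command,
}


def macro_query(system, field, value):
    """Accept one query form and emit the command the target system needs."""
    try:
        build_command = BACKENDS[system]
    except KeyError:
        raise ValueError(f"unknown system: {system!r}") from None
    return build_command(field, value)
```

The user issues the same `macro_query` call regardless of target, so the specialized system appears to be part of the common DBMS.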
6.5 Network Development and Management
Development of the Chemical Substances Information Network is
operationally feasible and clearly within the state-of-the-art of
computer technology. Success of such a network as far as the users
are concerned is predicated on their ability to obtain data necessary
to carry out their functional responsibilities of hazard identification,
hazard analysis, research, regulation development, and compliance and
enforcement. The authority to obtain much of the data, which
previously had not been available, now exists. Difficulties of pro-
tecting confidentiality of such data exist, but are not insurmountable
with proper data handling procedures. Difficulties are also encountered
in packaging proprietary data to make it publicly available. Some of
these have been handled before to the satisfaction of concerned parties,
but this is clearly an area where more innovation is required. Clear
delineation of how data are to be used will assist in data aggregations
and data packaging.
Success of CSIN is also dependent on the management of the net-
work development and financial support provided. This critical area,
discussed briefly in Section 5.5, involves the designation of an agency
with the responsibility for data base administration. A decision re-
garding network development and management responsibilities should be
made as soon as possible. CEQ, under its section 25(b) responsibili-
ties, or EPA through the section 10(b) Interagency Committee might make
recommendations as to the appropriate agency to undertake this respon-
sibility.
EPA has an explicit responsibility to develop a system for the
data submitted under TSCA. However, there is no requirement that the
public system developed from these data or the other components of the
network reside in EPA. Arguments are presented in Section 5.5
as to the merits of having the network management placed in an inde-
pendent organization which is not subject to frequent shifts in program
priorities.
The decision concerning the physical location and selection of
the executive computer and its backup capability is also important.
Implementation of the network could be modelled after the National
Library of Medicine's system which uses a Federally funded computer
and provides for its own program and systems support, or it could be
modelled after CIS which utilizes a private contractor who is respon-
sible for the system's support and marketing.
There are pros and cons to both approaches. One could argue
that an internally supported system would result in increased control
over network development and, consequently, the assurance of adequate
systems maintenance. On the other hand, the private contractor would
be responsive to the market demands, and would provide for continued
enhancements to the components of the network that prove to be self-
supporting or profitable. The final decision should depend on which
one is more economical in satisfying the performance standards.