SUPERFUND CHEMICAL ANALYSIS DATA SYSTEM
Mission Needs Statement
Contract #68-01-7282
February 1989
Prepared for:
U.S. ENVIRONMENTAL PROTECTION AGENCY
Office of Information Resources Management
401 M Street, S.W.
Washington, DC 20460
Prepared by:
BOOZ-ALLEN & HAMILTON Inc.
4330 East West Highway
Bethesda, Maryland 20814-4455
(301) 951-2200
-------
TABLE OF CONTENTS
Chapter I. INTRODUCTION
A. Mission Need 2
B. Evolution of System Need 6
Chapter II. BACKGROUND 8
A. System Description 8
B. Operational Environment 10
C. Current System Constraints 13
Chapter III. INITIAL SYSTEM CONCEPT AND INFORMATION FLOW 15
A. Input Requirements 15
B. Processing Requirements 25
C. Output Requirements 26
D. Information Flow 33
E. Developmental Constraints 36
Chapter IV. SUMMARY CONCLUSIONS 39
-------
LIST OF EXHIBITS
Exhibit I-1 Potential Users and Applications of CARD Data 5
Exhibit II-1 Overview of the CARD Database 9
Exhibit II-2 CARD Process Flow 12
Exhibit III-1 Initial Systems Concept 16
Exhibit III-2 Data Integration Requirements 24
Exhibit III-3(1) Output Mockups -- Listings and Reports 28
Exhibit III-3(2) Output Mockups -- Maps 29
Exhibit III-3(3) Output Mockups -- Plots and Graphs 31
Exhibit III-4 Flow of Information into CARD 34
Exhibit III-5 Sample Menus and Screens 38
-------
CHAPTER I
INTRODUCTION
There is a strong conviction that the need for high-quality, easily accessible, and
comprehensive environmental monitoring data cuts across all EPA programs. EPA
managers need environmental monitoring data to implement programs, conduct program
oversight, and develop regulations and policy. Previously, there was no centralized
repository of environmental monitoring data collected under the Superfund program for
all media (air, soil, and water).
Such a repository now exists. In January 1988, a pilot system was developed to
automate the collection of environmental sampling data taken at hazardous waste sites and
analyzed through EPA's Contract Laboratory Program (CLP). This system is called
CARD (CLP Analytical Results Database). In addition to analytical results, CARD also
contains associated quality assurance/quality control (QA/QC) data generated by the CLP.
(See Chapter II.)
Except for support of Superfund cleanup efforts, use of the CARD database has in the past
been limited to contract compliance screening; that is, determining whether contract
laboratories have analyzed environmental samples in the manner specified in the contract to
assure payment for work done. Prior to January 1988, CLP analytical results were
available only through paper files in the EPA Regional Superfund program offices and
were difficult to access.
The accessibility of the data in CARD, however, prompted the Office of
Information Resources Management (OIRM) to conduct a Mission Needs Analysis for a
Superfund chemical analysis data system. The purpose of the analysis, as described in the
EPA System Design and Development Guidance (Volume A), was to determine the
potential Agency users and uses of CARD and to also determine the constraints
surrounding its use. In the first step of the analysis, approximately 150 Headquarters and
Regional personnel were interviewed. Results were compiled in a report entitled: Mission
Needs Analysis for a Superfund Chemical Analysis Data System: Report of the Data
Collection Effort (October 1988).
In a second step, interview findings were analyzed to determine the operational
environment and information flow that would best serve the potential CARD user
community. A conceptual model of the new system was also developed and presented in a
report entitled: Mission Needs Analysis for a Superfund Chemical Analysis Data System:
Initial System Concept (December 1988).
This Mission Needs Statement summarizes the results of the completed Mission
Needs Analysis and references the earlier two reports. The remainder of Chapter I
discusses the reasons for creating CARD and the circumstances surrounding the need for
its wider use. Other chapters are as follows:
• Chapter II, BACKGROUND, presents an overview of CARD and its operating
environment;
• Chapter III, INITIAL SYSTEM CONCEPT AND INFORMATION FLOW,
describes the inputs, processes, and outputs of an enhanced system which
meets prioritized and practical user requirements;
• Chapter IV, CONCLUSIONS, summarizes the conclusions which may be
drawn from the Mission Needs Analysis and discusses possible next steps.
A. MISSION NEED
The CLP supports the Superfund Program pursuant to the Comprehensive
Environmental Response, Compensation and Liability Act (CERCLA) of 1980. The
mission of this program is to identify and initiate cleanup of uncontrolled releases of
hazardous materials that pose a potential threat to human health or the environment. EPA,
in cooperation with state governments and responsible parties, must identify these sites and
mitigate their hazards. In addition, EPA has the authority to recover the cleanup costs from
the responsible parties.
Cleanup activities are conducted in several stages, which may vary depending on
the type of cleanup required. Cleanup may be either a remedial or removal action. A
remedial action is a long-term cleanup project, during which detailed studies to characterize
the problems and select and implement an appropriate remedy are performed. Removal
actions are usually short-term activities in which imminent dangers are immediately
addressed.
During the course of all cleanup activities, samples of the wastes and the
surrounding environment are collected to assess the nature and extent of the hazardous
conditions and to monitor the effects of the remedy. These samples are analyzed in several
ways depending on time constraints and the demand for a particular level of quality to be
achieved. The primary source of sample analysis is laboratories under contract to EPA.
These laboratories are participants in the CLP.
The CLP was instituted in the early 1980s to support the Superfund Program. The
purpose of this program is to provide analytical sources to Superfund program personnel
so that data from laboratory analysis of environmental samples are of adequate and
documented quality. These data support key conclusions and decisions during the cleanup
process such as:
• site characterization,
• cleanup prioritization,
• remedy selection,
• risk assessment,
• cleanup effectiveness evaluation, and
• PRP association and enforcement support.
Data generated by the CLP are reviewed by the Sample Management Office (SMO)
to determine whether the laboratories under contract to EPA upheld the terms set by EPA in
their contracts. This Mission Needs Analysis showed OIRM, however, that CLP data have
many other potential applications, and that program personnel outside the
Superfund program have expressed a keen interest in the data.
Exhibit I-1 shows the key functions and potential users of CLP data that were
identified. Potential users were identified in every program area of EPA. For example, the
following potential applications were identified:
• Office of Water (OW) personnel identified the potential application of CLP data
in the NPDES Program, to determine sources of contamination to a stream
reach when determining wasteload allocation in the permit issuance process.
• Office of Air and Radiation (OAR) personnel expressed a need for CLP data in
determining the compliance of Superfund cleanup remedies with air pollution
protection regulations.
• Office of Pesticides and Toxic Substances (OPTS) personnel identified the
importance of access to CLP data in meeting their Community-Right-to-Know
responsibilities under the Superfund Amendments and Reauthorization Act
(SARA) Title III or their risk assessment activities under the Toxic Substances
Control Act (TSCA).
• Office of Solid Waste and Emergency Response (OSWER) personnel indicated
the need for CLP data in their Superfund site characterization and cost recovery
activities and in their monitoring of containment of hazardous materials under
the Resource Conservation and Recovery Act (RCRA) program.
• Other users, including the Office of General Counsel (OGC), the Office of
Research and Development (ORD), and the Office of Policy and Program
Evaluation (OPPE) described the value of CLP data in litigation support, risk
and methods assessments, and program effectiveness evaluations.
• Regional program personnel identified the use of CLP data in data validation,
cleanup remedy selection, permit issuance support and compliance evaluations.
-------
EXHIBIT I-1
POTENTIAL USERS AND APPLICATIONS OF CARD
-------
It became apparent, therefore, that CLP data could be very valuable to EPA
program implementors for uses other than contractor compliance. The benefits to EPA of
expanded access to CARD include:
• a new, additional source of data to EPA's data pool,
• time savings in information access,
• time savings in report production,
• improved report production capabilities,
• a larger information base for decisions,
• improved management of analytical results,
• time savings due to the access of data from similar sites,
• facilitated multi-media analyses,
• uniformity of data and data-based decisions, and
• less of a need to re-key data.
Other needs for enhancing CARD and expanding its access are discussed below.
B. EVOLUTION OF SYSTEM NEED
Many factors contribute to the growing demand among EPA personnel for access to
CLP analytical results data:
• The trend for cross-programmatic, multimedia environmental analyses which is
rapidly gaining momentum, and the availability of tools such as geographic
information systems (GIS) to perform them;
• The responsibilities EPA has to provide answers to public inquiries under
SARA Title III, and the need for high quality data to meet those
responsibilities; and
• The growing trend towards multi-program compliance inspections and
enforcement actions.
However, the utility of CLP data in their current form is greatly limited because of
constraints such as:
• The absence of certain data which would add meaning and context to the
findings, such as site and location identifiers or validation flags; and
• The difficulty of access through Regional paper files or the mainframe where
CARD resides.
Thus, EPA initiated this life-cycle development effort to plan for the redesign of
CARD and enhance its availability to a highly interested user community.
-------
CHAPTER II
BACKGROUND
Currently, no system has the accessibility or the required data to fulfill the information
needs specified during the Mission Needs Analysis. Program managers have relied on site
files or have done without useful Superfund sampling data. Although the CARD system
contains sampling information from Superfund sites, it is not in a format that can be readily
applied to programmatic needs. The CARD system, though, has many of the data elements
of the required system. This chapter describes those data elements in some detail and also
describes the CLP operating environment.
A. SYSTEM DESCRIPTION
CARD resides on an EPA IBM mainframe computer at the Research Triangle Park
(RTP) facility, running under the ADABAS database management system. It contains all
the CLP analytical results generated by the program since January 1988. The data includes
the actual analytical findings and their supporting QA/QC data. Samples are uniquely
identified by case number, Sample Delivery Group (SDG) number and lab sample number.
Information that is later provided by the lab includes date of laboratory receipt, date of
sample analysis, and physical descriptors (weight, volume, etc.).
Organic and inorganic samples are the two basic categories of chemicals on the
CLP's Target Compound List. The organic samples are divided into three fractions
according to chemical make-up. They are volatile organic compounds (VOCs), semi-
volatile or base neutral acids (BNA) compounds, and pesticide/PCB compounds. The
inorganic samples are divided into two fractions - metals (consisting of 23 elements) and
cyanides. In the future, CARD also will contain results of the analysis of dioxins. Exhibit
II-1 illustrates the contents of the CARD Database.
-------
EXHIBIT II-1
OVERVIEW OF THE CARD DATABASE
CARD DATA
ORGANICS
VOLATILE AND SEMI-VOLATILE ANALYTICAL RESULTS:
• LAB NAME
• CONTRACT NUMBER
• EPA SAMPLE NO.
• LAB CODE
• CASE NUMBER
• SAS NUMBER
• SDG NUMBER
• MATRIX
• SAMPLE WEIGHT
• SAMPLE UNITS
• DATE RECEIVED
• DATE ANALYZED
• DILUTION FACTOR
• CAS NUMBER
• COMPOUND
• CONCENTRATION
• NUMBER OF TICs
• INSTRUMENT ID
• TUNE TABLE
• CALIBRATIONS
• INTERNAL STANDARDS
• SURROGATES
• MATRIX SPIKES
• DUPLICATES
• METHOD BLANK TABLE
PESTICIDE ANALYTICAL RESULTS:
• LAB NAME
• CONTRACT NUMBER
• EPA SAMPLE NUMBER
• LAB CODE
• CASE NUMBER
• SAS NUMBER
• SDG NUMBER
• MATRIX
• SAMPLE WEIGHT
• SAMPLE UNITS
• DATE RECEIVED
• DATE EXTRACTED
• DATE ANALYZED
• CAS NUMBER
• COMPOUND
• CONCENTRATION
• INSTRUMENT ID
• PCB STANDARDS
• SURROGATES
• MATRIX SPIKES
• DUPLICATES
• LINEARITY CHECK
• BREAKDOWN MONITORING
• METHOD BLANK TABLE
• RETENTION TIME SHIFT OF DSC
INORGANICS
• LAB NAME
• CONTRACT NUMBER
• LAB CODE
• CASE NUMBER
• SAS NUMBER
• SDG NUMBER
• EPA SAMPLE NUMBER
• COMMENTS
• MATRIX
• PCT SOLIDS
• DATE RECEIVED
• CONCENTRATION UNITS
• CAS NUMBER
• ANALYTE
• COLOR
• CLARITY
• TEXTURE
(TO BE DETERMINED)
-------
CLP laboratories are required to follow the strict analytical and reporting guidelines
specified in the CLP contract when analyzing environmental samples. The majority of
requirements related to the implementation of QA/QC protocols are necessary to certify the
quality of data generated during sample analysis. For instance, instrument calibration must
be performed before, during, and after the analysis of the field samples. Several QA/QC
samples also are run concurrently during the analysis. They include but are not limited to
blanks, duplicates, and matrix spikes. The number of QA/QC samples per case is
determined by the number of SDGs. An SDG consists of up to twenty samples or the
number of samples received at the lab over a fourteen-day period, whichever is less. One
full set of QA/QC samples is analyzed per SDG.
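The SDG rule above can be illustrated with a minimal Python sketch. It is not part of CARD or the CLP contract. The function names are hypothetical, and the reading of the fourteen-day period (a window that opens when the first sample of a group is received) is an assumption made only for illustration.

```python
from datetime import date, timedelta

MAX_SAMPLES_PER_SDG = 20           # "up to twenty samples"
SDG_WINDOW = timedelta(days=14)    # "fourteen-day period"


def group_into_sdgs(receipt_dates):
    """Split one case's sample receipt dates into SDG-sized groups.

    Assumption: the fourteen-day window opens on the receipt date of the
    first sample in a group and covers that day plus the next thirteen.
    A group closes when either limit is reached, whichever comes first.
    """
    groups = []
    current = []
    for received in sorted(receipt_dates):
        window_closed = bool(current) and received - current[0] >= SDG_WINDOW
        if len(current) == MAX_SAMPLES_PER_SDG or window_closed:
            groups.append(current)
            current = []
        current.append(received)
    if current:
        groups.append(current)
    return groups


def qaqc_sets_required(receipt_dates):
    """One full set of QA/QC samples is analyzed per SDG."""
    return len(group_into_sdgs(receipt_dates))
```

Under these assumptions, twenty-five samples received on one day would form two SDGs, and therefore two sets of QA/QC samples.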
The analytical results and the respective QA/QC data produced by each contract
laboratory are entered on the appropriate CLP data reporting form and the hardcopy and
electronic versions (on floppy disk) of the form are sent to the Sample Management Office
(SMO) for analysis. The SMO judges from those data whether the laboratory's
performance was in compliance with the contract.
B. OPERATIONAL ENVIRONMENT
The process flow of CLP data can be divided into three steps - data collection, data
manipulation, and data analysis and storage:
Step 1: Data generated by each contract laboratory is sent, simultaneously, to
three different destinations:
• The Sample Management Office (SMO) in Alexandria, Virginia,
which receives data in both hardcopy form and on diskette;
• The Environmental Monitoring and Support Lab (EMSL) in Las
Vegas, Nevada, which receives data in hardcopy form; and
• The Environmental Services Division (ESD) of the EPA Regional
office in which the site is located (in Region IX, which has no ESD,
CLP data are sent directly to Superfund Program personnel).
Step 2: The SMO loads the data diskette onto a microcomputer in order to run a
format and completeness check of the data. If it passes this preliminary
test, the data is uploaded to the logical mainframe (IBM 4381) in
Cincinnati, Ohio. Using procedures running under a statistical analysis
package called SAS, SMO then checks the data to assure that the
laboratory's analytical methods and results comply with the contract. This
process is called "contract compliance screening." Data not in compliance
is flagged. All data is then transmitted electronically to the EPA
mainframe (IBM 3090) in Research Triangle Park (RTP), North Carolina.
There it is converted to the format required by ADABAS, a widely used
database management system, and appended to the existing CARD data.
Step 3: Once the data is available in CARD, it is accessed electronically by the
Data Audit Group of EMSL for QA/QC purposes. Their staff review the
data to assess the adequacy of different analytical techniques which may
have been used. They also monitor the overall effectiveness of the CLP
program.
Finally, the personnel in the Regional ESDs review the CLP data from
hardcopy for acceptability for use in their cleanup efforts. This process is
called data validation. A determination is made as to whether the data is
"good", "unacceptable" or "acceptable with reservation" to serve the
various analyses that are performed pursuant to cleanup activities. Data
that is acceptable is passed on to the Superfund site managers in hardcopy
form or entered into a local database. Rejected data is sent back to the
laboratory from which it came, and the sample can either be re-analyzed
by the lab or forfeited. Data that is "acceptable with reservation" is
flagged to assist program personnel in interpreting its limitations. Flagged
data is then also passed to Superfund personnel. Currently, these
validation flags are not entered into CARD.
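The three data validation outcomes described in Step 3 can be summarized in a short Python sketch. The class and function names are hypothetical and the destination labels are shorthand for the text above; this is an illustration of the routing, not a description of any existing Regional system.

```python
from enum import Enum


class Disposition(Enum):
    GOOD = "good"
    UNACCEPTABLE = "unacceptable"
    WITH_RESERVATION = "acceptable with reservation"


def route(disposition):
    """Return (destination, carries_flag) for one validated result.

    Acceptable data goes to the Superfund site manager; data accepted
    with reservation goes there too, but flagged; rejected data goes
    back to the laboratory, which may re-analyze or forfeit the sample.
    """
    if disposition is Disposition.GOOD:
        return ("Superfund site manager", False)
    if disposition is Disposition.WITH_RESERVATION:
        return ("Superfund site manager", True)
    if disposition is Disposition.UNACCEPTABLE:
        return ("originating laboratory", False)
    raise ValueError(f"unknown disposition: {disposition!r}")
```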
Exhibit II-2 illustrates the flow of data into CARD.
-------
EXHIBIT II-2
CARD PROCESS FLOW
-------
The CARD system is designed to have three automated functional capabilities: (1)
automated data capture from diskette, (2) automated data review by the Data Audit Group,
and (3) reporting capabilities.
C. CURRENT SYSTEM CONSTRAINTS
There are two principal constraints to the current CARD system which, if
overcome, would significantly increase its value to the environmental data user community.
These constraints include:
• Absence of data validation flags -- CARD data can be of varying acceptability to
those who access it. ESD personnel in each Regional office review data
analyzed by contract labs and assign flags indicating data limitations. These
flags are critical to the appropriate use of CARD data and the confidence in
decisions based upon it. However, data validation flags are not currently added
to CARD, but to the hardcopy CLP data given by the ESD to the Superfund
program personnel. In addition, the flags are not used consistently among
Regional ESDs, leading to a variety of interpretations throughout the Agency.
A process to ensure that the flags are incorporated into CARD and consistently
interpreted would add value to the data in the database.
• Absence of critical site and sampling data — There are many types of data about
the sites from which samples were collected and analyzed through the CLP that
exist in paper files but are not in electronic format. This data would greatly
enhance existing CARD data by adding meaning and context and would enable
CARD data to be interfaced to other EPA databases. Such data include, at a
minimum:
Site Identification - A data element, such as the EPA ID code or site name,
which identifies the site from which the sample was collected.
Sampling Location -- Identification of the location from which the sample
was taken and of which the findings are representative. Data that would
address this constraint include latitude/longitude coordinates of the sampling
point, address of the site, hydrologic unit (surface water, aquifer, etc.) in
which the sampling point is located, or relative position to other points of
concern (e.g., distance and direction from drinking water source or focus of
the release).
Some of these data, particularly site ID code and name, can be found in the
CERCLA Information System (CERCLIS) for Superfund analyses, or the
Hazardous Waste Data Management System (HWDMS) for RCRA analyses.
Others may be found in site files or the cleanup contractor's manual or
automated files. Adding these data to CARD would require an independent effort,
the size of which would depend on the
availability and accessibility of the information.
Input, processing, and output requirements for the new system are described fully
in the next chapter.
-------
CHAPTER III
INITIAL SYSTEM CONCEPT AND INFORMATION FLOW
Interviews conducted during an earlier stage of the Mission Needs Analysis yielded a
conceptual view of the inputs, processes, and outputs of a system which would address
prioritized and practical user needs. This conceptual view of the system is known as the
Initial Systems Concept (as depicted in Exhibit III-1). The Initial Systems Concept shows
CARD data and other inputs incorporated into a sophisticated system capable of performing
statistical calculations on the analytic data, producing charts and geographic overlays, and
interfacing to other large data systems.
In the sections that follow, each of the inputs, processes, and outputs shown in
Exhibit III-1 will be described further. For a more complete description of these items and
the feasibility of incorporating them into the proposed Superfund Chemical Analysis Data
System, refer to the Report of the Data Collection Effort and the Initial System Concept,
as cited earlier.
A. INPUT REQUIREMENTS
There are several data inputs required in the proposed Superfund Chemical Analysis
Data System. These inputs would add context and meaning to the existing analytical results
and QA/QC data, thus widening their utility to many potential users. The key types of data
input are described below.
1. CARD Analytical Results and QA/QC Data
CARD data would continue to flow into the Superfund Chemical Analysis
Database. Analytical results, QA/QC data and flags assigned by SMO would be included.
The sample number, which had been assigned by the Regional Sample Control Center
(RSCC) at the time that the sample was collected, would serve as the link between these
data and other incoming data relevant to the same sample.
-------
EXHIBIT III-1
INITIAL SYSTEMS CONCEPT
-------
2. Site Identification Code
Every hazardous waste site that receives attention by EPA is assigned an
identification code called the "EPA ID code." This identification code serves as a link to
information from other sources.
EPA ID codes are assigned through the Facility Index Data System (FINDS).
Program personnel in the Regional Offices provide basic site descriptive information
(typically name and address) to on-site contractors responsible for upkeep of FINDS.
These contractors research the FINDS database to determine whether an ID code has
already been assigned to the site in question. If so, the existing ID code is communicated
back to the program personnel requesting it. If not, a new one is generated by the FINDS
contractors and then passed back to the program personnel.
The EPA ID code is twelve characters long. The first two characters are the
alphanumeric Federal Information Processing Standards (FIPS) code for the state in which
the site is located. The next nine digits are an arbitrary number generated by FINDS. The
last digit is a check digit calculated by FINDS to assure that the ID code is unique and
accurate.
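The layout of the ID code described above (two state characters, nine FINDS-generated digits, one check digit) can be expressed as a small Python sketch. The names are hypothetical. Only the layout is checked: the text does not give the check-digit calculation, so the sketch returns that digit without verifying it.

```python
import re
from typing import NamedTuple

# Layout as described in the text: 2 alphanumeric state characters,
# 9 digits generated by FINDS, and 1 check digit.
EPA_ID_PATTERN = re.compile(r"([A-Z0-9]{2})([0-9]{9})([0-9])")


class EpaId(NamedTuple):
    state_code: str
    serial: str
    check_digit: str


def parse_epa_id(code):
    """Split a twelve-position ID code into its three parts.

    The check digit is returned as-is; how FINDS calculates it is not
    described in the text, so it is not validated here.
    """
    match = EPA_ID_PATTERN.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a twelve-position EPA ID code: {code!r}")
    return EpaId(*match.groups())
```

For example, the made-up code "XX1234567890" would split into state code "XX", serial "123456789", and check digit "0".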
An EPA ID code may be assigned to a site before sampling ever occurs. However,
this ID code is omitted from the sample package sent to the contract laboratory to ensure
that no bias occurs. Instead, a "case number" is associated with a sample and currently
used for unique identification in CARD. This case number is not, however, the
identification code used throughout the rest of the EPA to associate data for a single site.
Therefore, the EPA ID code would be part of the site and sampling descriptors appended to
the analytical and QA/QC data in CARD by the Regional program personnel.
3. Geographic Locators
Information defining the exact location from which a sample was collected is critical
to the utility of CARD data. This information can be used to:
• Interpret the significance of analytical results from samples taken at two
different places;
• Enable the integration of analytical results data with other location-based
information such as soil maps, aerial photographs, drinking water source
locations, etc., in the performance of "whole ecosystem" analyses of
environmental conditions; or
• Facilitate association of a sample with the site for which it was collected.
OIRM is in the process of developing an Agency-wide policy for the collection and
communication of location information. The draft policy would require that, at a minimum,
the following data be collected for EPA-regulated and tracked facilities and monitoring
points:
• Latitude/longitude ("lat/long") coordinates and the level of precision and
accuracy necessary to support the use for which they were collected;
• Identification of the method used to determine lat/long coordinates;
• Description of the site represented by the lat/long coordinates; and
• Estimation of the quality in terms of precision of the lat/long coordinates.
An enhanced CARD database should include location identifiers in accordance with
OIRM policy. In addition to lat/long coordinates, other locators would be helpful, such as
site address, hydrologic unit (surface water, aquifer, etc.), and relative position to other
points of concern (e.g., distance and direction from drinking water source or focus of the
release). This information would be collected at the time the sample is taken, although
approximate sampling locations may have been determined during the development of the
site sampling plan and/or data quality objectives. This information would be provided by
the on-scene coordinator, associated with the appropriate sample numbers, sent to Regional
program personnel for review and then transmitted to CARD.
4. Sampling Descriptors
The addition of sampling descriptors to convey the method and circumstances under
which the sample was taken would greatly enhance the utility of CARD data by providing
users with more information to better understand and assess the quality of the data. The
QA/QC data generated in the lab is useful in determining the factors which may affect the
data quality at the time of analysis, but the whole picture cannot be drawn without an
understanding of the factors which may have been introduced during sampling.
Sampling descriptors would be of two types:
• Those which describe the sampling methodology, such as collection method,
sample depth, collection date and time, etc., which would help users understand
how the sample was taken; and
• Those which describe the "context" in which the sample was collected, such as
relative location (e.g., "upgradient") to the site, distance from water supply,
etc., which would help users understand why the sample was taken.
Most of these pieces of information may be routinely collected by the clean-up
contractors during the sampling episode, or determined prior to sampling in the
development of the site sampling plan and/or data quality objectives. They are often
maintained in the site (paper) file or the contractor's local database. The on-scene
coordinator would assure the documentation of sampling descriptors and link them to a
sample identification number. They would then be passed to the Regional program
personnel for review and forwarded for inclusion in the CARD database.
5. Data Validation Flags
Flags communicating the limitations of each analytical result are appended to the
CLP data by Regional ESD personnel. The appropriate flags are determined by the
comparison of the raw instrument data (printouts of data produced directly by the analytical
instrumentation) and the data summary sheets. The flags indicate that a value might be
approximate, low, undetected, etc. They are critical to the understanding and proper
application of the data.
Guidelines for the assignment of validation flags can be found in the Functional
Guidelines for Data Review. These guidelines are selectively followed by Regional data
reviewers. There is, therefore, inconsistency among the Regions in the assignment of data
validation flags which would need to be overcome if the flags are entered into the system
and interpreted Agency-wide.
Under the proposed system, validation flags would continue to be assigned by
Regional ESD personnel. These flags would be transmitted to CARD after assignment.
Those Divisions which have automated the management of CLP data could electronically
transmit the data to CARD; those without such a system would enter the validation flags
into CARD via remote access. Unvalidated data in CARD would be identified as such, and
users would have the option to omit such data from any analyses.
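The option to omit unvalidated data could be as simple as a filter on a per-result validation indicator. The following Python sketch assumes a hypothetical record layout; none of the field names come from CARD itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    # All field names are illustrative assumptions, not CARD data elements.
    sample_number: str
    compound: str
    concentration: float
    validation_flag: Optional[str] = None  # flag assigned by the ESD reviewer, if any
    validated: bool = False                # True once Regional validation is recorded


def select_results(results, include_unvalidated=True):
    """Return the results to analyze, optionally omitting unvalidated ones."""
    if include_unvalidated:
        return list(results)
    return [r for r in results if r.validated]
```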
6. Special Analytical Services Results
Often, analyses performed by contract labs are not of a routine nature. For
example, analyses of air samples and some dioxin samples are not routinely performed and
are considered "special analytical services" (SAS). Results from special analytical services
are not currently included in CARD. They may be, however, paramount to understanding
the nature of a hazard at a site.
No consistent way of managing SAS data has yet been developed. The types of
quality control, the disposition of the information, the analytical techniques used, and other
factors are determined by program personnel at the time that the special service is
requested. This poses a difficulty for incorporation into a database. A uniform format for
common, key data elements necessary for management of SAS data should be developed
by CLP implementors.
The process for incorporation of SAS data into CARD would not differ from that of
routine analytical services (RAS) data. On-scene coordinators would associate each sample
with a sample number, and all sample numbers with an EPA ID code. Site and sampling
descriptors would be passed to the Regional program personnel for review. Regional ESD
personnel would validate the data and organize it into a format compatible with CARD,
including flags. These data would then be transmitted to CARD either electronically or via
remote terminal entry.
7. Reference Data or Flags
Many decisions made during the Superfund cleanup process are based upon the
comparison of analytical results generated from the CLP to established limits or criteria.
Such limits or criteria might include reportable quantities of hazardous material on the
Superfund list or maximum levels of contaminants for safe drinking water (MCLs). Levels
which exceed these limits dictate certain necessary responses. CARD data would be of
great value to many users if the various limits and criteria were accessible in CARD as
reference material so that analytical results could be compared to them automatically and
exceedances could be flagged.
These reference materials are readily available from EPA and other program
sources. As might be expected, they come in a wide variety of formats and would require some effort
to enter into the new system.
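The automatic comparison of analytical results to reference limits described in this section can be sketched in a few lines of Python. The function name and data shapes are assumptions, and any limit values passed in would have to come from the actual reference sources; the sketch itself contains no regulatory values.

```python
def flag_exceedances(results, limits):
    """Return the results whose concentration exceeds the reference limit.

    results: iterable of (sample_number, analyte, concentration) tuples
    limits:  dict mapping analyte name to its limit, in the same units
             as the concentrations (an assumption of this sketch)

    Analytes with no entry in `limits` are skipped rather than guessed at.
    """
    flagged = []
    for sample_number, analyte, concentration in results:
        limit = limits.get(analyte)
        if limit is not None and concentration > limit:
            flagged.append((sample_number, analyte, concentration, limit))
    return flagged
```

Keeping the limits in a separate lookup, rather than in the results themselves, would let the same analytical results be compared against different sets of criteria.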
8. Additional Data Fields for Miscellaneous Entries
Some miscellaneous types of information were identified as of value to potential
CARD users. These include the name and phone number of a contact and the location of
CLP-generated data sheets. This information would be particularly useful to those users
trying to obtain more detailed data than would be available in CARD. It may, however, be
subject to frequent change and would therefore need careful maintenance.
9. Creation of Interfaces to Other Data Systems
The trend in the interpretation of environmental data is toward multi-media
analyses, where data from many sources and representative of several media are integrated
to produce "whole-ecosystem" analyses of environmental conditions. Geographic
information systems will be particularly valuable in performing these types of analyses.
CARD data would be a significant addition to this type of analysis if they could be
interfaced to other types of data collections. The types of data collections that have been
identified include:
• Non-CLP Sample Analysis Data -- Many of the samples that are collected at
Superfund sites are not analyzed through the CLP, but rather in the field or in
other laboratories. These data are, however, equally valuable to a complete
understanding of environmental conditions. They are available from many
sources, including state and contractor databases, and could be linked to CARD
data by site ID code or sample location.
• Superfund Management Information -- The status and activities of cleanup
explain why many of the samples analyzed through the CLP were collected and
add value to interpretation of the results. These data may be found in databases
such as CERCLIS and ROD (Record of Decision). These data could be linked
to CARD data by site ID code.
• Hydrogeologic Data -- Many parameters describing the water bodies that may be
affected by the site from which samples were collected can be found in non-EPA
databases. These databases, such as WATSTORE and the Ground Water
Site Inventory (GWSI), are available through the U.S. Geological Survey
(USGS). The data in these databases could be linked to CARD data primarily
through sampling site location information (which has yet to be entered in
CARD).
• Ambient Monitoring Data -- Data representative of environmental conditions
unaffected by the hazardous waste site are needed as a comparison to determine
the relative severity of the hazardous conditions. These data may have been
collected by other organizations pursuant to other activities, or by the cleanup
team to determine background conditions. Sources of ambient monitoring data
include EPA's STORET and River Reach File and the USGS National Water
Use and WATSTORE systems. These data could be linked to CARD via
geographic locators such as hydrologic unit and latitude/longitude coordinates.
• Demographic and Land Use Data -- CARD data could be very effective in the
determination of risks if they were integrated with data that described the
potentially affected populations and surrounding environment. Databases that
could provide this information include the OW's Industrial Facilities Discharge
File and OSWER's HWDMS. Geographic locators could serve as the link
between the data in these systems and the data in CARD.
• General Site Information — Data about the general ecological characteristics of
the area in which the sampling site is located would be of great value to the
interpretation of CARD data. These data are general rather than specific in
nature and are representative of Regions or areas rather than points. Sources of
general ecological information include the USGS Land Use/ Land Cover
Database and the National Oceanic and Atmospheric Administration's
climatological databases. These data could be linked to CARD data through
geographic locators.
• Toxicological and Risk Information -- A critical part of the Superfund (and
other environmental protection) decision-making process is the determination of
potential risks to human health or the environment posed by an uncontrolled
release of hazardous materials. Assessments of risk are based upon an
understanding of the toxicological properties of the hazardous material as well
as the routes of exposure to potential targets. The interface of CARD analytical
results with toxicological and risk data would be valuable for performance of
risk assessments. Databases of this information include IRIS, PHRED, and
TOXCAT. The information linking these databases to CARD could be the
Chemical Abstracts Service (CAS) number and name of the hazardous
substance.
Exhibit III-2 summarizes the data integration requirements of the Superfund
Chemical Analysis Data System.
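Each interface described above relies on a data element held in common between CARD and the other collection (a site ID code, a geographic locator, or a CAS number). The Python fragment below is a minimal sketch of such a link, assuming both collections can be read as lists of records. The record layouts and identifier values are hypothetical.

```python
# Illustrative sketch: link CARD results to records from another data
# collection through a shared key (here, a site ID code).
# Record layouts and identifier values are hypothetical.
from collections import defaultdict


def link_by_key(card_rows, other_rows, key):
    """Pair each CARD row with every row in other_rows sharing the key value.

    Where both records carry a field of the same name, the CARD value is kept.
    CARD rows with no match in the other collection are omitted.
    """
    index = defaultdict(list)
    for row in other_rows:
        index[row[key]].append(row)

    linked = []
    for row in card_rows:
        for match in index.get(row[key], []):
            linked.append({**match, **row})
    return linked


card_rows = [
    {"site_id": "SITE-A", "analyte": "lead", "value": 12.0},
    {"site_id": "SITE-B", "analyte": "toluene", "value": 3.5},
]
status_rows = [
    {"site_id": "SITE-A", "cleanup_status": "remedial design"},
]
linked_rows = link_by_key(card_rows, status_rows, "site_id")
```

The same function could be called with a different key, such as a CAS number field, for the toxicological databases. Links based on latitude/longitude would need a tolerance-based match rather than exact equality.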
EXHIBIT III-2
DATA INTEGRATION REQUIREMENTS
HYDROGEOLOGIC DATA
Ground and surface water information to [illegible] possible risks to water
supply. Databases include:
• WATSTORE
• GWSI
The common data element between these databases would be a geographic locator
(i.e., latitude/longitude).

DEMOGRAPHIC AND LAND USE INFORMATION
Land use information for remedy selection. Databases include:
• HWDMS
• BBS
• USGS Land Use/Land Cover
• MARF2
The common data element between these databases would be a geographic locator
(i.e., latitude/longitude).

AMBIENT MONITORING INFORMATION
Ambient monitoring data to determine possible affected [illegible]. Databases
include:
• STORET
• USGS National Water Use
• River Reach
Common data elements between these databases would be geographic locators
and, for STORET only, the EPA Site ID number.

TOXICOLOGICAL AND RISK INFORMATION
Toxicity information to identify hazardous substances. Databases include:
• IRIS
• PHRED
• TOXCAT
• Toxline
The common data elements are the CAS number and the compound name.

SUPERFUND MANAGEMENT INFORMATION
Status of cleanup for trend and program analysis. Sources include:
• CERCLIS
• ROD database
The common data element between these databases would be the EPA Site
ID Number.

GENERAL SITE INFORMATION
Ecologic, biologic, meteorologic data, etc., for programs. Databases include:
• NPL Technical Database
• USGS Land Use/Land Cover
  (CIRAS)
• NOAA databases
Common data elements are the EPA Site ID Number for EPA-specific databases and
geographic locators for other databases.

NON-CLP SAMPLE ANALYSIS DATA
Sampling data from sources other than the contract labs. Sources include the
following:
• Regional automated systems
• RCRA program
• PRPs
• State programs
The common data element between these databases would be the EPA Site
ID Number or a geographic location (i.e., latitude/longitude coordinate).
B. PROCESSING REQUIREMENTS
The enhancement of CARD would enable users to perform various basic analytical
functions, including ad-hoc queries involving sorts and selections of data, statistical
summarization and other specialized functions. These are discussed in more detail below.
1. Sorting and Selecting Capabilities
SCADS would support users' frequent ad-hoc queries by permitting the use of key
data fields as a basis for information sorting and selection. Among the types of information
by which users need to sort and select are:
• Geographic location, including state, county, site, etc.;
• Type of site (e.g., oil spill, wood treatment facility, chemical plant, etc.);
• Hazardous materials identified; and
• EPA ID code.
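An ad-hoc query of this kind amounts to filtering records on one or more key fields and then ordering what remains. The Python fragment below is a minimal sketch under that reading. The field names and sample records are hypothetical and do not reflect the actual CARD record layout.

```python
# Illustrative sketch: select records on key fields, then sort them.
# Field names and record values are hypothetical.
from operator import itemgetter


def select_and_sort(rows, criteria, sort_keys):
    """Keep rows matching every criterion, then sort by the given fields."""
    selected = [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]
    return sorted(selected, key=itemgetter(*sort_keys))


rows = [
    {"epa_id": "ID-003", "state": "AR", "county": "Pulaski",
     "site_type": "wood treatment facility", "analyte": "lead"},
    {"epa_id": "ID-001", "state": "AR", "county": "Benton",
     "site_type": "chemical plant", "analyte": "lead"},
    {"epa_id": "ID-002", "state": "PA", "county": "Crawford",
     "site_type": "chemical plant", "analyte": "toluene"},
]

# All sites in one state where lead was identified, ordered by county.
query_result = select_and_sort(
    rows, {"state": "AR", "analyte": "lead"}, ["county", "epa_id"]
)
```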
2. Statistical Analytical Capabilities
The capability to perform various types of statistical analyses of CARD data would
allow users to interpret and report analytical results and assess data quality. Statistics, such
as the determination of average, minimum, maximum, standard deviation, and rate of
violation were identified as useful in meeting analytical requirements. Statistical software
can be interfaced to SCADS to provide these capabilities. Types of software that may
provide the necessary functions include the Statistical Analysis System (SAS), STORET,
and Lotus 1-2-3.
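The statistics named above can be computed directly from a list of results. The Python fragment below is a sketch that uses only the standard library. It assumes that "rate of violation" means the fraction of results exceeding a given limit, which is an interpretation and is not defined in this document. The values and the limit are hypothetical.

```python
# Illustrative sketch: summary statistics for a set of analytical results.
# "Rate of violation" is assumed here to mean the fraction of values that
# exceed a limit; the values and the limit are hypothetical.
import statistics


def summarize(values, limit):
    """Return basic descriptive statistics; requires at least two values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "violation_rate": sum(v > limit for v in values) / len(values),
    }


summary = summarize([2.0, 4.0, 4.0, 6.0], limit=5.0)
```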
3. Specialized Functions
There are some special processes that users would have available on SCADS which
would enable particular types of data processing. These include:
• Automated translation of case number to EPA ID number — This capability
allows the association of samples with a particular site. Users may then
aggregate data from samples taken at a site, or perform other types of data sorts
and selections based on site.
• Interface to a Geographic Information System (GIS) -- Users may perform
graphic analyses of data using the mapping capabilities of a GIS if analytical
results are associated with locational identifiers such as latitude/longitude
coordinates and if an interface between SCADS and GIS is created. EPA has
the ARC/INFO system which serves as a GIS in all the Regional offices. Use
of GIS would allow the integration of CARD data with other types of
locational-based data such as soil or vegetation maps, or drinking water source
locations.
• Data Security -- Occasions may arise when it becomes necessary to limit
access to a subset of CARD data. For example, some data may be
enforcement-sensitive some or all of the time. SCADS could grant read,
write, and delete privileges only to authorized users.
The feasibility of the special functions described above depends on the availability
of other data to SCADS. For example, the feasibility of the translation of case number to
EPA ID depends on the availability of a file with EPA ID codes. The feasibility of
interfacing with GIS depends upon the addition of location identification data to SCADS.
Data security would involve development of a system and procedure for authorizing access
to certain subsets of CARD data.
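The dependence of case-number translation on a file of EPA ID codes can be pictured as a simple table lookup. The Python fragment below is a sketch under that assumption. The lookup table, field names, and identifier values are hypothetical. Samples whose case number is missing from the table are set aside and not guessed at.

```python
# Illustrative sketch: translate CLP case numbers to EPA ID codes so that
# samples can be aggregated by site. The lookup table, field names, and
# identifier values are hypothetical.
from collections import defaultdict


def attach_epa_id(samples, case_to_epa_id):
    """Split samples into those with a known EPA ID and those without."""
    translated, unmatched = [], []
    for sample in samples:
        epa_id = case_to_epa_id.get(sample["case_number"])
        if epa_id is None:
            unmatched.append(sample)
        else:
            translated.append({**sample, "epa_id": epa_id})
    return translated, unmatched


def group_by_site(translated):
    """Collect sample numbers under each EPA ID code."""
    sites = defaultdict(list)
    for sample in translated:
        sites[sample["epa_id"]].append(sample["sample_number"])
    return dict(sites)


case_lookup = {"CASE-100": "EPA-ID-A", "CASE-101": "EPA-ID-B"}
samples = [
    {"sample_number": "S001", "case_number": "CASE-100"},
    {"sample_number": "S002", "case_number": "CASE-100"},
    {"sample_number": "S003", "case_number": "CASE-999"},
]
translated, unmatched = attach_epa_id(samples, case_lookup)
by_site = group_by_site(translated)
```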
C. OUTPUT REQUIREMENTS
There are four types of output that an enhanced CARD system would provide.
These include data listings and reports, maps, charts, and formatted files. These output
types are discussed in greater detail below.
1. Data Listings and Reports
Exhibit III-3(1) shows examples of required listings and reports. Listings and
reports are presentations of raw or analyzed CARD data in non-graphic form.
Users would be able to produce standard and ad hoc listings and reports of CARD
information that had been selected for meeting user-specified criteria. For example, a user
may want to see a list of the observations in which lead was detected in all the samples at a
single site, or a list of the locations of all the sites in the nation in which lead was detected.
These could be produced in either hard-copy form or viewed directly on a computer screen
in response to a user query.
In addition, users could produce output that contains a statistical summary of data
rather than a raw data listing. Users would be able to select the data to be included in an
analysis and the type of analysis to be performed.
All database management systems can provide users with capabilities to meet this
output requirement. There should be no impediments to having SCADS offer this
capability.
2. Maps
Exhibit III-3(2) provides examples of mapped output using CARD data. Maps are
the geographic representation of data (analysis-based) or the identification and depiction of
locations meeting user requirements (non-analysis-based). Data can be presented as:
• Points plotted to represent the precise location where sampling occurred; for
example, a map may be produced showing all the sampling locations where
toluene was detected at a site.
• Areas shaded to a pattern that represents the combined findings of all points
within an area; for example, a map may be produced showing the average
values of toluene levels in groundwater from 1988 to 1989 in all the counties in
a state.
EXHIBIT III-3(1)
OUTPUT MOCKUPS -- LISTINGS AND REPORTS
[Two mock report pages (PAGE 1 and PAGE 4) for ABC MANUFACTURING, ANYTOWN, USA,
each with the header line 39 01 16.3 077 02 15.8 2. The mockups show a raw data
listing with columns for date, time, media, boron (UG/L), and CL (MG/L), and a
statistical summary by media (bottom, core, dredge, water) with standard
deviation, maximum, and minimum columns.]
Standard and ad hoc listings and reports
EXHIBIT III-3(2)
OUTPUT MOCKUPS -- MAPS
[Map 1: LEAD IN GROUNDWATER IN ARKANSAS (BY COUNTY), AMBIENT TREND 1972-1977
VS 1978-1985, scale 47.35 miles per inch. Legend: over 25% improvement; 10-25%
improvement; under 10% change; 10-25% worsening; over 25% worsening.]
[Map 2: EDB Application Area PC-22, with a 10 km scale bar.]
Maps overlaid with analytical data
• Contours consisting of lines of equal concentration for a specified geographic
area; for example, a map could be produced showing the concentration
distribution of toluene at a site during a particular sampling event.
Mapping capabilities are most effective when adequate locational data are available.
Location data must be complete, precise and accurate enough to produce a meaningful
analysis, and consistent with other locational data. Therefore, this capability would be
most feasible upon the adoption of an Agency-wide policy for the uniform collection and
communication of location.
To produce a map, data would be reduced using simple statistical analyses and then
plotted on a map using a Geographic Information System (GIS) such as ARC/INFO.
Mapping capabilities offer users the opportunity to integrate CARD data with other
geographical-based data such as soil maps, drinking water source maps, etc. Types of
maps would include both incident location maps (i.e., plotting of the concentrations of
various hazardous materials at the locations they were detected) and contour maps (i.e.,
lines of equal concentration of hazardous materials at a site).
The success of mapping analysis using CARD data depends on the availability and
quality of location identification data and the compatibility of these data with location data in
other data sets.
Mapping offers many benefits to users. In particular, it will:
• Facilitate multi-media analysis,
• Enhance integration of CARD data with other locational-based data, and
• Produce easily-interpreted visual analyses of data.
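The data reduction that precedes area mapping can be sketched as a grouping of observations by geographic unit followed by a simple statistic for each unit. The Python fragment below illustrates this with a per-county average. The field names and values are hypothetical, and the step of passing the result to a GIS is not shown because no interface between SCADS and a GIS has been defined.

```python
# Illustrative sketch: reduce point observations to one value per area
# (for example, a county average) before mapping.
# Field names and values are hypothetical; the GIS hand-off is not shown.
from collections import defaultdict
from statistics import mean


def average_by_area(observations, area_field="county"):
    """Return the mean observed value for each area."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs[area_field]].append(obs["value"])
    return {area: mean(values) for area, values in groups.items()}


observations = [
    {"county": "County A", "analyte": "toluene", "value": 10.0},
    {"county": "County A", "analyte": "toluene", "value": 20.0},
    {"county": "County B", "analyte": "toluene", "value": 5.0},
]
county_averages = average_by_area(observations)
```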
3. Charts (Plots and Graphs)
Exhibit III-3(3) shows examples of charts, including plots and graphs that could be
produced using CARD data. Types of charts might include bar graphs, pie charts and
regression lines.
EXHIBIT III-3(3)
OUTPUT MOCKUPS -- PLOTS AND GRAPHS
[Two charts, each titled "Residual Dioxin Detected in Conneaut Lake, Taken From
Two Depths at Carson's Bridge." Horizontal axis: monthly intervals beginning
8/85. Vertical axis: concentration (label partly illegible). Legend: samples
taken at or above 10 feet; samples taken below 25 feet.]
Plots and graphs
Charts are particularly useful in:
• Depicting trends, including change in concentration over time; for example, a
graph could be produced showing the change in monthly average heptachlor
values from cleanup initiation to post-cleanup operation and maintenance.
• Depicting comparisons, such as differences in concentration values in different
locations; for example, a bar chart could be produced showing dioxin levels at
two different locations for several months.
Benefits offered to users by having chart-producing capabilities include:
• a visual means of statistically interpreting data,
• a format that can readily be incorporated into decision documents such as a
Record of Decision (ROD), and
• an effective means of reducing and analyzing raw data.
Chart-producing capabilities are closely tied to the availability of statistical analysis
software. As this type of software is highly developed and readily available, there are few
problems anticipated in providing this output capability through SCADS.
4. Formatted Files
Users could produce files of selected CARD data in compatible formats for
downloading to other applications. This would enhance the ability to share CARD data
with other data collections. For example, users would be able to download a subset of
CARD data into files in the formats of popular commercial packages such as SAS, dBase,
and Lotus 1-2-3.
Benefits offered to users by this capability include:
• Reduced time and effort required to download data,
• Fewer opportunities for the introduction of errors in the transference of data
from one collection to another,
• Greater utility of existing data, and
• Reduced need to collect new data.
The most direct method to perform this function is to produce ASCII (American
Standard Code for Information Interchange) files. Many database management systems
have the capability to export and import ASCII files, and there would therefore be few
limitations to offering this capability in SCADS. It has not yet been determined,
however, whether ASCII files are sufficient for the most needed
applications, or whether other download capabilities need to be examined.
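An ASCII download of the kind discussed here can be sketched as a delimited text export of selected fields. The Python fragment below uses the standard `csv` module and writes comma-delimited text. The choice of comma delimiting, the field names, and the sample record are assumptions made for illustration and are not format decisions taken in this document.

```python
# Illustrative sketch: export selected fields of selected records as
# comma-delimited ASCII text. The delimiter, field names, and sample
# record are assumptions made for illustration only.
import csv
import io


def export_ascii(rows, fieldnames, stream):
    """Write a header line and one line per row, restricted to fieldnames.

    Characters outside the ASCII range are replaced with '?'.
    """
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            name: str(row.get(name, "")).encode("ascii", "replace").decode("ascii")
            for name in fieldnames
        })


buffer = io.StringIO()
export_ascii(
    [{"sample": "S001", "analyte": "lead", "value": 12.5, "note": "internal"}],
    ["sample", "analyte", "value"],
    buffer,
)
ascii_text = buffer.getvalue()
```

Fields that are not listed in `fieldnames` (here, `note`) are left out of the file, which is one simple way to download only a subset of the data.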
D. INFORMATION FLOW
Exhibit III-4 depicts the proposed organizational process flow for an enhanced
CARD data system. This process would be implemented in four steps: (1) initial sample
collection, (2) data review and sample analysis, (3) flag assignment, and (4) update of the
database. During some steps, several activities would be performed concurrently.
Step 1: The first stage in the management of CLP analytical results data would be
the initial sample collection and documentation of sampling information.
Personnel from the Regional Sample Control Center (RSCC) would
assign a sample number, obtained from the SMO, to each sample. Any
other data about the site relevant to the sample would have that number
associated with them. The samples would be sent to a contract lab.
Descriptive information associated with each sample would be
documented and sent to the Regional program personnel. This includes
descriptive data about the site, such as facility ID code and location. It
also includes descriptive data about the sampling process, such as depth,
distance from source, method used, etc.
EXHIBIT III-4
FLOW OF INFORMATION INTO CARD
Step 2: Two activities would be performed concurrently in Step 2. One activity
would be the review of the sampling descriptive data by Regional program
personnel. Data would be screened for completeness, accuracy, etc.
Those descriptors, once approved, would be transmitted to the CARD
database to be associated with the appropriate samples.
The other activity would be the analysis of the samples by the contract lab.
Results would be sent to two destinations:
• the Regional ESD for validation and
• the Sample Management Office for contract compliance screening.
Data would be reviewed by these organizations before becoming part of
CARD.
Step 3: The third step of this CARD process would be the assignment of various
types of flags to the analytical results produced by the contract lab.
Regional ESD personnel would assign flags that describe the limitations, if any,
of every sample. SMO would assign flags that describe the degree of
contract compliance under which the lab was operating when the sample was
analyzed. Both sets of flagged data would be transmitted to the CARD
database.
Step 4: The last step in the operational process would be the update of CARD.
The CARD database would be updated with the analytical results and three
other types of information:
• site and sampling descriptors from the Regional program personnel,
• data validation flags from the Regional ESD personnel, and
• contract compliance screening flags from SMO.
Together, these types of information would make up the entire CARD
database.
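The update in Step 4 can be pictured as a merge of four information streams on the sample number assigned in Step 1. The Python fragment below is a minimal sketch of that merge, with an in-memory dictionary standing in for the database. The record layout, field names, and flag code are hypothetical.

```python
# Illustrative sketch: merge analytical results with site and sampling
# descriptors, data validation flags, and contract compliance screening
# flags, keyed by sample number. A dictionary stands in for the database;
# the record layout, field names, and flag code are hypothetical.

def update_card(database, analytical_results, descriptors,
                validation_flags, compliance_flags):
    """Add or update one record per analyzed sample and return the database.

    Information that has not yet arrived for a sample is stored as None.
    """
    for sample_no, result in analytical_results.items():
        record = database.setdefault(sample_no, {})
        record["results"] = result
        record["descriptors"] = descriptors.get(sample_no)
        record["validation_flag"] = validation_flags.get(sample_no)
        record["compliance_flag"] = compliance_flags.get(sample_no)
    return database


card_db = {}
update_card(
    card_db,
    analytical_results={"S001": {"lead_ug_l": 12.0}},
    descriptors={"S001": {"depth_ft": 10}},
    validation_flags={"S001": "J"},
    compliance_flags={},
)
```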
This process flow differs from the current one in a number of ways. Specifically,
Regional program personnel would add site and sampling descriptors to the database,
adding context to the analytical results. The descriptors would be linked to the analytical
results by the sample identification code that would be assigned by the on-scene
coordinator in the first step. These site and sampling descriptors would be sent to and
reviewed by Regional program personnel, a step that is not now performed. In addition,
the descriptor data found acceptable by the Regional program personnel would be
transmitted to the database and added to the analytical results generated during other stages
of the process. CARD does not now contain site and/or sampling descriptors. Lastly, the
flags assigned at various stages of the process would also be added to the database.
E. DEVELOPMENTAL CONSTRAINTS
There are several factors that could constrain the development of a Superfund
chemical analysis data system. These factors include:
• Mainframe Access — There are several factors to be considered when
developing SCADS on EPA's mainframe computer. These include potential
storage and telecommunications limitations and timesharing costs.
Most of EPA's major systems, such as CERCLIS, FMS, and STORET, are on
the mainframe computer. A Superfund chemical analysis data system is
anticipated to be a large system demanding significant computer resources to
run. The possibility exists that capacity limitations on the mainframe could be
encountered with the expected expansion of CARD. In addition, users would
be dependent on a telecommunications system to access the system. Costs
incurred with use of the mainframe and the telecommunications network will
likely be significant.
• Timing -- There are several ongoing activities within OIRM that could affect the
way that information resources are managed throughout EPA. Among them are
policy development activities for identification codes and locational identifiers.
Ideally, these policies should be fully developed and implemented before CARD
is changed, so that CARD enhancements are in compliance with these Agency-
wide policies.
• System Security and Data Sensitivity -- Data that have been collected
exclusively for litigation support are highly confidential to EPA. They will be
used to support EPA's case preparation and criminal processing or cost
recovery activities. Some of these data may have been generated through the
CLP. Their inclusion with non-sensitive data, such as those data
collected to support the RI/FS and which are part of the Administrative Record,
could compromise EPA's chances for success in litigative actions.
• User Interface -- Potential CARD users have expressed reluctance to learn a
new computer system. Many have requested use of an existing system for
CARD access. Existing systems may not, however, adequately support CARD
users. A new system may have to be developed to meet their needs. Users
have expressed feeling overwhelmed by the number of systems they must be
able to operate and have identified this as something that would discourage them
from making the most use of a new system. Exhibit III-5 shows some simple
interactive menus and screens that could be incorporated into a Superfund
chemical analysis data system.
EXHIBIT III-5
SAMPLE MENUS AND SCREENS
Menus and screens
CHAPTER IV
SUMMARY CONCLUSIONS
The following may be concluded from the analysis of the mission needs for a Superfund
Chemical Analysis Data System:
• A definite need exists outside of the Superfund Program for access to CLP
analytical results data, particularly in the areas of permit issuance, enforcement
support and environmental trends analysis.
• The addition of data validation flags and location identifiers is crucial to the
utility of CARD data because without them CARD cannot support uses other
than determination of contractor compliance.
• The addition of key data elements, particularly EPA ID code and
latitude/longitude coordinates of the sampling point, would enhance the
capability to integrate CARD data with other data collections by serving as
information held in common with them.
• Overcoming the current constraints on the utility of CARD, particularly adding
validation flags and other missing data and creating necessary interfaces, would
entail a sizable effort by OIRM and Superfund system managers.
• Mapping of CARD data cannot be performed without the addition of locational
identifiers.
• Serious consideration should be given to the ability of existing data
management/analysis systems such as STORET to support CARD data and the
needs identified during the mission needs analysis.
• Output from SCADS needs to be in several formats, including hardcopy for
inclusion in reports and electronic for integration with other systems.
There is a clear need to move forward with the enhancement of CARD. The
recommended next step is to conduct a preliminary requirements and options analysis
for the system.
This concludes this Mission Needs Analysis for a Superfund chemical analysis data
system, in accordance with the EPA Design and Development Guidance. The next phase
of the project, the Preliminary Design and Options Analysis, will define system
requirements, identify hardware/software options, evaluate the costs and benefits of each
option, and select the best option for the eventual development of the system.