                OOOR90101A
                                U.S. ENVIRONMENTAL PROTECTION AGENCY
                           OFFICE OF INFORMATION RESOURCES MANAGEMENT
                           RESEARCH TRIANGLE PARK, NORTH CAROLINA 27711
                        AUTOMATED LABORATORY STANDARDS:
             EVALUATION OF THE USE OF AUTOMATED
                     FINANCIAL SYSTEM PROCEDURES
                           CONTRACT 68-W9-0037, DELIVERY ORDER 035
                                                  JUNE 1990

-------
        Automated Laboratory Standards:

Evaluation of the Use of Automated Financial
             System Procedures
                Prepared for:

Office of Information Resources Management
   U.S. Environmental Protection Agency
Research Triangle Park, North Carolina  27711

                June 25, 1990
                                             Prepared by:

                          BOOZ-ALLEN & HAMILTON Inc.
                                   4330 East-West Highway
                                 Bethesda, Maryland 20814
                                            (301) 951-2200

                                  Contract No. 68-W9-0037
                            Computer Sciences Corporation
                                    79 T.W. Alexander Dr.
                Research Triangle Park, North Carolina 27709
                                            (919) 541-9287

                                   Contract No. 68-01-7365

-------
                      Acknowledgments
      This report reflects the combined efforts of Computer Sciences
Corporation, Booz-Allen & Hamilton Inc., EPA staff, and outside experts.
Richard Trilling of CSC researched and prepared the draft for public review.
Jennifer Abrams, Marguerite Jones, and Marcia Balestri of Booz-Allen
evaluated the comments and completed this final report.  Numerous EPA
staff and outside experts provided substantial critical reviews and valuable
technical comments.  Richard Johnson of the Scientific Systems Staff of EPA's
Office of Information Resources Management directed the contractors' work
and managed the review process.

-------
                       Table of Contents


Executive Summary

Background

      Exhibit 1: Need for EPA's Automated Laboratory
            Standards Program

      Exhibit 2: Considerations in Developing Automated
            Laboratory Standards

Findings

      System Introduction and Comparison

            Exhibit 3: Generic View of an Automated Laboratory System

            Exhibit 4: Generic View of a Savings Account System

      Risks to Data Integrity

      Controls That Can Help Protect Against Loss of Integrity

      The Importance of Backing Up the Data Base and
      Maintaining an Audit Trail

      The Importance of the Auditing Function

      Legal Validity

Summary and Conclusions

Glossary

References

-------
                      Executive Summary
      The U.S. Environmental Protection Agency (EPA) has initiated a
program to ensure the integrity of computer-resident data in laboratories
performing analyses in support of EPA programs.  The possession of sound
technical data provides a fundamental resource for EPA's mission to protect
public health and the environment.

      This report describes the findings of a review of standards used in
existing automated systems in the financial industry.  EPA has chosen to
study financial system standards because the financial industry has had many
years of experience with automated systems, and the reliability and validity of
financial data are critical to the success of financial institutions.  In addition,
auditors have developed a broad system of controls designed to ensure the
integrity of financial data compiled in automated systems.

      The  financial  industry's experience in preserving data  integrity in
automated financial systems can  well be applied to the automated laboratory
environment. The main sources  of risk to data integrity are present in both
automated  financial  and automated laboratory systems.  These include the
following:

      •     Adding incorrect data to the data base

      •     Having multiple applications affecting a single data base

      •     Intentional or inadvertent acts

      •     Failure  of the  data base management  system to  function as
            specified.

Because automated financial systems  have been developing for  a number of
years, much research has been conducted on controls to help protect a system
against these risks.  Controls that can aid in minimizing these four significant
risks include:

       •     Verifying input data

       •     Securing data by restricting access

       •     Permitting write access to only one application/user at a time

       •     Instituting a program for detecting/reporting/correcting system
            problems.

Finally, the need to back up the data base regularly and maintain an accurate
and  complete  audit  trail, in  addition to  employing  these  data integrity
controls, has been  demonstrated in the financial environment and  will be
equally important in the automated laboratory environment.

-------
                           Background
      The  U.S.  Environmental Protection Agency (EPA)  has initiated a
program to ensure the  integrity of computer-resident data in laboratories
performing analyses in support of EPA programs by developing standards for
automated laboratory processes.  The possession  of  sound technical data
provides a fundamental resource for EPA's mission  to protect the public
health  and environment,  regardless  of the activities  of  the  specific
environmental programs.  The activities of these environmental programs
are diverse, and include  basic research at EPA's environmental research
centers, environmental sample analyses at EPA's  regional laboratories and
contractors' laboratories, and product  registration relying on analytical data
submitted by the private sector.

      EPA recognizes that implementing an automated laboratory standards
program will require each laboratory to commit money and time to the
program's execution.  Experience has shown, however, that developing and
using a proper standards program may yield a net savings, because the
acquisition, recording, and archiving of data improve and test duplication
is reduced.

      Within EPA, the Office of Information Resources Management (OIRM)
has assumed the objective  of establishing an automated laboratory standards
program. The need for this program is evidenced by several factors.  Exhibit 1
illustrates these  factors,  which include  the  rising use of  computerized
operations  by laboratories,  the  lack  of uniform standards  developed or
accepted by EPA, evidence of problems associated  with  computer-resident
data, and the evolving needs of EPA auditors and inspectors for guidance in
evaluating automated  laboratory operations.

[Exhibit 1: Need for EPA's Automated Laboratory Standards Program]

      Laboratories collecting data for EPA's programs have taken advantage
of advancing technology to streamline their analytical processes.  Initially,
automated instrumentation entered the laboratories to increase productivity
and enhance the accuracy of reported results.  Computers maintaining data
bases of results were then used for data management and sample tracking.
These computer systems were integrated into more sophisticated laboratory
information management systems (LIMS).  Methods for data reporting
include electronic mail, electronic bulletin boards, and direct links between
central processing units.  Each of these advances necessitates thorough quality
control procedures for data generation, storage, and retrieval to ensure the
integrity of computer-resident data.

      Currently, EPA has no Agency-wide  guidelines  for laboratory
information integrity that laboratories collecting and evaluating computer-
resident data must follow.  The requirements  that  must be considered in
developing automated laboratory standards come from a variety of sources, as
Exhibit  2 illustrates, including the requirements of the Computer Security Act
of 1987  (P.L.  100-235, January 8, 1988) and various EPA program-specific data
collection requirements  under  Superfund, the  Resource Conservation  and
Recovery Act, the Clean Water Act, and the Safe Drinking Water Act, among
others.  Additionally, OIRM has developed electronic transmission standards
and  is  developing  a strategy  for electronic recordkeeping and electronic
reporting standards that will affect all Agency activities.  The development of
uniform principles  for automated data in EPA laboratories, regardless of
program,  will take  into account the  common  elements  of all these data
collection  activities, and provide a minimum standard that each laboratory
should  achieve.

[Exhibit 2: Considerations in Developing Automated Laboratory Standards]

      There is increasing evidence of problems associated with the collection
and use of computer-resident laboratory data supporting various EPA
programs.  To illustrate, as of November 1989, EPA's Office of the Inspector
General was investigating between 10 and 12 laboratories in Superfund's
Contract Laboratory Program (CLP) for a variety of allegations, including
"time traveling" and instrument calibration violations.  In "time traveling,"
sample testing dates are manipulated, either by adjusting the internal clock of
the instrumentation performing the analyses or by manipulating the resultant
computer-resident data.  (Hazardous waste samples must be assayed within a
prescribed time period or the results may be compromised.)  Additionally,
calibration standard results have allegedly been electronically manipulated,
and other calibration results substituted, when the actual results did not meet
the range specifications of the CLP procedure being followed.
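
      The holding-time requirement described above lends itself to a simple
automated check.  The following is a minimal sketch, assuming a hypothetical
record layout (a collection date and an analysis date) and an illustrative
14-day holding time; neither comes from the CLP procedures.  It flags analyses
dated outside the allowed window, and instrument log entries whose timestamps
run backward, which is one possible sign that a clock was reset.

```python
from datetime import datetime, timedelta

# Illustrative value only; actual holding times depend on the method used.
HOLDING_TIME = timedelta(days=14)

def holding_time_violation(collected, analyzed, limit=HOLDING_TIME):
    """Return True if the analysis date falls outside the holding window."""
    return analyzed < collected or analyzed - collected > limit

def backward_timestamps(log_times):
    """Return indexes of log entries stamped earlier than the entry before them."""
    return [i for i in range(1, len(log_times)) if log_times[i] < log_times[i - 1]]

# Hypothetical sample dates and instrument log, for demonstration.
collected = datetime(1989, 11, 1)
on_time = datetime(1989, 11, 10)
late = datetime(1989, 11, 20)
log = [
    datetime(1989, 11, 20, 9),
    datetime(1989, 11, 20, 10),
    datetime(1989, 11, 10, 11),   # stamped ten days before the previous entry
]
```

A check of this kind only detects inconsistencies in the recorded dates; it
cannot by itself show why a timestamp is out of order.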

      Because  the  introduction of automation  is relatively new and still
evolving, no definitive guidelines for EPA auditors and inspectors have been
developed.  Inspectors must be alert to the steps in those procedures used by
the laboratories generating and using computer-resident data where the
greatest  risk exists.  These critical process points indicate the magnitude of
control that should be placed on each step of the process. If adequate controls
are not present, the remainder of the process cannot correct a deviation, and
the entire  process  will  provide no reliable  conclusions.   Automation
introduces many new variables into a system, each with its own set of critical
process  points.  Inspectors must verify  that laboratory management has
recognized the various  risks  and  has  instituted  an  appropriate  risk
management program.

      As part of its program to ensure the integrity of computer-resident
data, EPA reviewed the policies and procedures in place in automated
financial systems.  The purpose of the review was twofold.  First, EPA
hoped to identify possible risks to data integrity and controls to protect
against them.  Second, EPA hoped that some of the standards in place in
automated financial systems could be applied to the automated laboratory
environment.  As a result of its research in automated financial systems, EPA
identified many procedures that could be applied to automated laboratories.

      Other areas of evaluation in developing the standards program include
a review of current technology,  a survey of current  automated  laboratory
practices, and  an analysis of  the applicability  of EPA's  Good  Laboratory
Practice regulations to automated laboratories.  The findings  of each of these
evaluations are provided in separate reports.

      The purpose of this paper is to report the findings of EPA's research
into financial systems and associated procedures.  It is intended to provide
guidance for people developing standards for the automated laboratory
setting by indicating sources of risk and procedures to control those risks.

      The findings reported in this paper are based on both library research
and interviews with laboratory and financial systems experts.  Because little
has been written specifically on the preservation of data integrity in
automated financial systems, interviews with local bank systems operations
managers were used to learn more about the practical application of the theory.

      This paper is organized as follows.  First, a side-by-side introduction
and comparison of a  generic laboratory system and a generic  automated
banking system will be presented to illustrate the similarities between the two
systems and thus the applicability of procedures used in the financial arena to
the laboratory environment.  Then, the paper  will present a discussion of
risks to data integrity and controls to counter the risks. Next, the paper will
discuss the importance of backup, audit trail, and auditing functions. Finally,
a brief introduction to the questions of the legal validity of computer-resident
data will be presented.

-------
                             Findings
System Introduction and Comparison

      An example of a generic laboratory system is presented in Exhibit 3.  In
practice, the degree of automation varies widely across laboratories; however,
at a base level, most systems have the general components and interfaces
depicted in the exhibit.  In summary, when a sample is received at a
laboratory, sample identification data are entered into the system (usually by
manual keying, although bar codes are becoming increasingly common).
Analyses are then performed on the sample; these will be done completely
without human interaction in the most automated of systems, or entirely "by
hand" in the most elementary of systems.  Sample analysis results are then
validated (again automatically or manually, depending on system capabilities)
and either posted to the data base or reprocessed.  The data base is then
accessed to produce desired reports, such as sample analysis reports.
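
      The flow described above (validate each result, then either post it or
send it back for reprocessing) can be sketched in a few lines.  This is a
minimal illustration, not a description of any particular laboratory system;
the field names and the plausibility range used by the validation step are
assumptions made for the example.

```python
def is_plausible(result, low=0.0, high=100.0):
    """Hypothetical validation rule: the value must fall within [low, high]."""
    return low <= result["value"] <= high

def handle_result(result, database, reprocess_queue, validate=is_plausible):
    """Post a validated result to the data base; otherwise queue it for reprocessing."""
    if validate(result):
        database[result["sample_id"]] = result
        return "posted"
    reprocess_queue.append(result)
    return "reprocess"

# Demonstration with two hypothetical results, one of them out of range.
database, reprocess_queue = {}, []
outcomes = [
    handle_result({"sample_id": "S-001", "value": 7.2}, database, reprocess_queue),
    handle_result({"sample_id": "S-002", "value": -3.0}, database, reprocess_queue),
]
```

Keeping the validation rule as a separate function means the same flow could
serve an automatic check or a manual review step.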

      Exhibit 4 illustrates a generic automated savings account system.  As
with automated laboratory systems, the degree of automation varies across
banks, although much less than it does across laboratories, because basic
automated financial systems have been in place for a number of years.  The
system depicted in Exhibit 4 details the manual deposit and withdrawal
system; several other applications, such as automatic deposit, automatic
mortgage payment withdrawal, and wire transfer, may affect the account
balance data base as well, and are therefore noted in the exhibit.

      The deposit or withdrawal process begins as account change data are
keyed  to the  system  either  by a teller or by the  account owner at the
Automated Teller Machine (ATM). The account change data are verified by
daily teller  and ATM  reconciliations.   The account change data  are then
posted to the main data base, where  changes to  the appropriate account
balances are made. The data base is then accessed to produce desired reports,
such as monthly savings account statements.
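
      The daily reconciliation step can be pictured as a comparison between
the paper slips and the entries keyed to the system.  The sketch below assumes
each transaction is reduced to a hypothetical (account, amount) pair; it
illustrates the idea only and is not the procedure used by any particular bank.

```python
from collections import Counter

def reconcile(slips, keyed_entries):
    """Compare paper slips with keyed entries, treating both as multisets."""
    slip_counts, keyed_counts = Counter(slips), Counter(keyed_entries)
    missing = sorted((slip_counts - keyed_counts).elements())     # on paper, never keyed
    unexpected = sorted((keyed_counts - slip_counts).elements())  # keyed, no matching slip
    return missing, unexpected

# Hypothetical shift: one withdrawal was keyed as -200 instead of -20.
slips = [("A100", 50), ("A200", -20), ("A300", 75)]
keyed = [("A100", 50), ("A200", -200), ("A300", 75)]
missing, unexpected = reconcile(slips, keyed)
```

An empty result on both sides corresponds to a balanced drawer; anything else
must be resolved before the changes are posted to the main data base.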

-------
                     EXHIBIT 3
Generic View of an Automated Laboratory System

[Figure; legible labels: Sample Identification Information, Posting to
Main Data Base, Main Data Base, Modification, Audit Trail, Reports]

-------
                                    EXHIBIT 4
                   Generic View of a Savings Account System

[Figure; legible labels: Wire Transfer Application, Direct Deposit Application,
Automatic Withdrawal Application, Savings Account System, Deposit or
withdrawal information, Entry to system (keyed by teller or through ATM),
Savings account change file, Savings account change validation, Teller drawer
reconciliation report, Validated savings account changes, Savings account
master file, Update savings account master file, Report of savings account
changes, Updated savings account master file]

-------
      Automated laboratory and financial systems are similar in many ways.
Both systems build a data base of information on which important decisions
will be based, so the integrity of the data is critical to the usefulness of both
systems.  Both systems are dependent on human input.  Savings account
systems receive data input from tellers and ATMs; most laboratory systems
require human input  of  either sample  identification information or test
results. In addition, both systems require a well-developed system support
plan,  which  includes  staffing requirements and procedures, and defined
performance control criteria. Finally, system planning for both systems must
include procedures for  establishing and maintaining an audit trail.

      The systems differ in ways that may at first appear relatively
minor but ultimately have important implications for the application of
financial system data integrity standards and procedures to the automated
laboratory environment.  First, and most importantly, an audit trail is easier
to establish with an automated financial system than with an automated
laboratory system because paper records of transactions (such as deposit and
withdrawal slips) are generated in the course of initiating a change to the
financial data base in the first place.  With automated laboratory systems,
unless a well-defined process is adhered to (such as a manual logging process
that tracks sample identification and test results data), it may be harder to
establish an audit trail.  There are two primary reasons: first, the sample is
discarded (or degrades naturally), and second, the results of analyses may be
inadequately documented.

      The second difference between  the systems is that financial systems are
more  subject  to fraud and embezzlement than laboratory systems; therefore,
some  of the procedures used to ensure security of the data base in financial
systems may not be required  in automated laboratory systems.  However,
Pincus  (1989) recommends that several financial auditing  techniques be
applied to scientific data to detect fraud.

      The final significant difference between the systems  is that a financial
system will  have a  much higher  volume of transactions than will  a
laboratory system.

-------
Risks to Data Integrity

      After  reviewing  available  information,  we have identified  four
primary risks to data integrity that would be present in both automated
laboratory and  automated  financial  systems.  The following, summarized
from Perry (1983), presents a description and defines the implications of each
type of risk.

1.    Incorrect data are added to the data base. Data added to the data base
      can be incorrect due to data entry error and/or lack of data verification
      and validation.  Data entry errors are easy to make, especially when
      data are manually keyed in.  And unless data are verified or validated
      before being posted to the main data base, incorrect data can be keyed in
      correctly,  and still be incorrect when added to the data base.  The
      implications of adding incorrect data to the data base are obvious. At
      best, with a good audit trail,  corrections can be made to the data base,
      and the  integrity of  the data preserved  (however,  not without
      considerable extra effort).  At worst,  the incorrect data may never be
      discovered, and the  data base could provide  bad information and
      ultimately lead to incorrect decisions.

2.    The data base is  interfered with and data integrity  is damaged.
      Through intentional or inadvertent acts, the integrity of the data in the
      data base may be damaged.  Fraud and embezzlement in financial
      systems are examples of intentional acts; as discussed in the Background
      section of this paper, "time traveling" and instrument calibration
      allegations are examples of incidents of potential laboratory fraud
      currently being investigated by EPA's Office of the Inspector General.
      It is also easy to imagine accidental interferences with the data base
      that would compromise data integrity.  Again, the implications of these
      risks are that they would affect data integrity and that their occurrence
      could easily go unnoticed.

3.    Data base is affected by multiple applications, and the data resident in
      data base may not reflect the  effect of  all applications at all times. The
      use of multiple applications on a single data base is common in the
      banking industry and is introduced in Exhibit 4.  The risk to integrity is
      that one  application will change the original data, and another
      application will access and make decisions based on data that do not
      reflect changes made  via the  first application.  For example, consider
      the number of applications that can affect a  savings account balance
      (see Exhibit 4). Suppose an automatic withdrawal application (for an
      automatic mortgage payment) is initiated to effect a withdrawal from
      an account.  However, at the  same time, the account owner,  using an
      ATM, attempts to withdraw all of the money from the account. If there
      were not procedures in  place to ensure that the account balance data
      reflect the effect of all  the applications at all times, clearly, more money
      could be withdrawn from the  account than is actually available.  While
      the use of multiple applications on the same data base may be less
      common in laboratories, the type of analysis to be run could well depend
      on the results of a previous analysis, and, if the data base does not
      reflect those results, the wrong analysis could be run.

4.     Failure of the data base management system  to function as  specified.
      The data base management system includes all of the algorithms that
      are required to update,  sort, reproduce, and maintain the data in the
      data base. If the system does not function as it is designed to  or if there
      are flaws in the data base design so that it does not achieve  the same
      results as a manual system would have (for example, if withdrawal
      information is posted to  the wrong account), the resulting data will not
      be useful as a source of  information because their integrity will not be
      assured.
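
      The third risk can be made concrete with the savings account scenario
from Exhibit 4.  The following sketch replays, step by step, what happens when
two applications each read the balance before either one writes; the amounts
and variable names are illustrative assumptions.

```python
balance = {"acct": 500}

# Both applications read the balance before either one writes.
mortgage_view = balance["acct"]      # automatic withdrawal application
atm_view = balance["acct"]           # account owner at the ATM

# Each approves its withdrawal against its own, now stale, copy.
mortgage_ok = mortgage_view >= 400
atm_ok = atm_view >= 500

# Each then writes back a balance computed from that stale copy.
balance["acct"] = mortgage_view - 400
balance["acct"] = atm_view - 500     # overwrites the mortgage update

total_paid_out = 400 + 500
```

Both withdrawals are approved, so 900 is paid out of an account that held
500, and the stored balance of 0 reflects only the second application.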

Controls That Can Help Protect Against Loss of Integrity

      The following, also summarized from Perry  (1983), presents a list  of
possible controls that can be used  to  help protect  against the loss of data
integrity for each of the types of risk identified above.  Where relevant, actual
procedures used by local banks in the administration of  savings account data
bases are included.  The controls included are intended not to be prescriptive
but to provide suggestions.  The system configuration, potential risks, and
data characteristics must be considered when designing controls for a specific
system.

1.     Incorrect data are added to the data base. Several control procedures
      exist that could be used to help ensure that the data added to the data
      base are correct:

      •     Re-key the data

      •     Produce system control report listing updates to be posted to the
            data base

      •     Include in the system design a validation step using computer-
            resident  validation  data  and/or  verification   routines and
            procedures.

      Banking systems ensure that the data are correct by applying a version
      of the third suggested  control: requiring  tellers  to  "balance  their
      drawers" at the end of every shift, before the data are posted to the
      main data base.  To  do so, the teller must reconcile the deposit and
      withdrawal slips with the  data entered  on  the terminal.  The same
      reconciliation process is conducted for ATM activity.

2.     The data base is interfered with and data integrity is damaged.  The
      following procedures help protect against this sort of risk:

      •     Restrict access to the system; allow different people different
            degrees of access; employ password control over access

      •     Bond  employees

      •     Enable a  security  officer  to oversee  all activity and report
            suspicious activity

      •     Train personnel on how to use the system (to avoid
            unintentional acts).

      Bank systems typically use all four of these controls, because
      financial data are especially susceptible to fraud and embezzlement.
      Although it is less likely that a laboratory system would require bonded
      employees or a security officer function, the other controls listed above
      may be beneficial.

3.     Data base is affected by multiple applications, and data resident in data
      base may not reflect the effect of all applications at all times.  Several
      control procedures exist that could be used to help protect against the
      risks associated with multiple applications affecting a single data base:

      •     Development of data base control standards for entry, validation,
            use, and deletion of data

      •     Concurrent data control whereby if one user is changing a data
            element, another user is temporarily denied write access

      •     Data ownership,  where one  individual is  responsible for  each
            data element

      •     Development of a data element reconciliation utility, including
            a library of locations for redundant data elements.

      Bank  systems have well-developed data base control standards for
      different applications that interface with and affect a  data base.   For
      example, in the automatic withdrawal scenario described earlier, banks
      have  daily ATM and teller "shutdown" periods during  which other
      applications (such as automatic mortgage payment withdrawals) are
      performed. As a result,  the data are never being affected by more than
      one  application  at the same time.   Additionally,  procedures for
      ensuring that a wire transfer is reflected in the data base are  well
      defined, and only staff with a higher level of responsibility are able to
      execute transfers.

4.    Failure of the data base management system to function as specified.
      The primary control for protecting against the loss of integrity resulting
      from this risk is the development of a tool for formal reporting of data
      base problems.  The automated financial systems we considered had
      routine procedures and forms to report problems with the data base,
      such  as  reporting to the  data base administrator any  unusual or
      problematic occurrence in posting data to the main account data base.
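
      Three of the controls listed above (re-keying, graded access, and
denying concurrent write access) can be sketched together.  This is a minimal
illustration under stated assumptions: the access level names, the record
layout, and the return strings are hypothetical, and password checking is
omitted for brevity.

```python
import threading
from enum import IntEnum

def rekey_mismatches(first_pass, second_pass):
    """Control 1: compare two independent keyings; return the IDs that disagree."""
    ids = set(first_pass) | set(second_pass)
    return sorted(i for i in ids if first_pass.get(i) != second_pass.get(i))

class Access(IntEnum):
    """Control 2: graded access levels (names are illustrative)."""
    READ = 1
    WRITE = 2
    ADMIN = 3

class Account:
    """Control 3: only one writer at a time may change the balance."""

    def __init__(self, balance):
        self.balance = balance
        self._write_lock = threading.Lock()

    def withdraw(self, amount, user_level):
        if user_level < Access.WRITE:
            return "denied"                    # insufficient access level
        if not self._write_lock.acquire(blocking=False):
            return "busy"                      # another writer holds the lock
        try:
            # The balance is checked and updated while the lock is held,
            # so a second writer always sees the first writer's change.
            if amount > self.balance:
                return "insufficient funds"
            self.balance -= amount
            return "ok"
        finally:
            self._write_lock.release()

mismatches = rekey_mismatches({"S-001": "7.2", "S-002": "0.15"},
                              {"S-001": "7.2", "S-002": "0.51"})
account = Account(500)
results = [
    account.withdraw(400, Access.READ),    # read-only user
    account.withdraw(400, Access.WRITE),   # automatic mortgage payment
    account.withdraw(500, Access.WRITE),   # ATM request for the original balance
]
```

The bank "shutdown" periods described above achieve the same end by
scheduling rather than locking: only one application is allowed to run
against the data base during a given period.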

The Importance of Backing Up the Data Base and Maintaining an Audit Trail

      In addition to using data base integrity controls, the importance of
backing up the data base and maintaining an audit trail cannot be
overemphasized.  Despite even the most careful use of controls, there is
always the possibility that something could happen to disrupt data
availability, which may necessitate that data be recreated.  In addition, data
resident in a data base do not suffice as legal evidence; the court looks to
hard-copy data documenting the complete chain of custody for legal evidence.
Regular maintenance of backup files and strict adherence to procedures
designed to maintain an audit trail are fundamental to ensuring that a data
base can be relied upon for correct information now and in the future.

      Two aspects of providing backup for a  data base are important.  First,
the backup should be made as often as is required to maintain the usefulness
of the backup data base. In theory, the frequency of backing up the data base
depends on the cost of losing the data. In the banking industry, backups are
made at least daily; in automated laboratories,  the frequency with which
backups should be made depends on the  degree of activity in the laboratory,
the sensitivity and difficulty of the activity being tracked, and the size and
type of hardware.  Second, the  backup copy  of the data base should be
maintained  on  a different storage device from  the original  file; in other
words, the  backup should not be made on the same disk (or tape), but on a
different disk (tape).  It is also advisable to store the medium with the backup
in a  different location to protect against risks  that might affect both the
original and the backup, such as fire or intentional  acts.
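
      The second point, keeping the backup on a separate device and
confirming that it is usable, can be sketched as a copy followed by a checksum
comparison.  The sketch below is an assumption-laden illustration: the file
name is hypothetical, and two temporary directories stand in for separate
storage devices, which the code itself cannot verify.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source, backup_dir):
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError("backup does not match original")
    return target

# Demonstration: temporary directories stand in for separate devices.
with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as offsite:
    original = Path(primary) / "results.db"
    original.write_bytes(b"sample S-001, lead, 0.15 mg/L\n")
    copy = back_up(original, offsite)
    backup_matches = copy.read_bytes() == original.read_bytes()
```

How often such a routine runs remains the policy decision discussed above,
driven by the cost of losing the data.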

      In designing procedures to maintain an audit trail, it is important to
consider hard-copy data to document the complete chain of custody for every
data element added to the data base. In the banking industry, that hard-copy
chain is documented by deposit and withdrawal slips, teller reconciliation
reports, monthly statements,  and the like.  In the laboratory environment,
procedures must include maintaining  backup of all sample identification
information, analysis results, validation results, calibration results, and other
ancillary information  to recreate the  complete  chain  of  custody  of  the
laboratory results.  From an  evidentiary standpoint, a hard-copy  chain of
custody surpasses a computer-resident trail.
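
      A computer-resident trail can still complement the hard-copy chain by
recording who changed each data element, when, and why.  The following is a
minimal sketch of such a trail, assuming a hypothetical key-value store and
illustrative field names; it is not drawn from any system reviewed for this
report.

```python
from datetime import datetime, timezone

class AuditedStore:
    """A key-value store that records every change in an append-only trail."""

    def __init__(self):
        self._data = {}
        self.trail = []

    def set(self, key, value, user, reason):
        """Record the change in the trail, then apply it."""
        self.trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "key": key,
            "old": self._data.get(key),   # None for a first entry
            "new": value,
            "reason": reason,
        })
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def history(self, key):
        """Return every recorded value of key, oldest first."""
        return [entry["new"] for entry in self.trail if entry["key"] == key]

# Hypothetical correction of a mis-keyed result.
store = AuditedStore()
store.set("S-001/lead", 0.51, "jdoe", "initial entry")
store.set("S-001/lead", 0.15, "asmith", "corrected transposed digits")
```

Printing the trail at regular intervals would be one way to tie it back to
the hard-copy record that the preceding paragraph emphasizes.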

The Importance of the Auditing Function

      Although the emphasis in auditing is on controls, the typical steps in
the process  of auditing  resemble  the steps in traditional automated data
processing  (ADP)  systems  analysis, including  EPA's  own   system
methodology, as the following demonstrates:

1.     The role of the auditor in controlling  ADP activities  to ensure data
      integrity:

      •     Understand  the purpose of the system and  its requirements —
            the need for the system;

      •     Design, implement, and test an effective system of control; and

      •     Test the system for compliance — that is,  determine how well
            the outputs of the system meet expectations.

2.     The role of the systems analyst in controlling ADP activities to ensure
      data integrity:

      •     Understand  the  purpose of  the  system and specify  its
            requirements;

      •     Design, implement,  and  test  a  system  to   satisfy those
            requirements; and

      •     Evaluate the ability of the system to satisfy those requirements.

      The EPA System Design and  Development Guidance (OIRM,  1989)
describes and documents the steps in traditional systems analysis and design.
EPA's methodology for systems development specifies the following stages in
the software life cycle:

      1)    Mission Needs Analysis
      2)    Preliminary Design and Options Analysis
      3)    System Design
      4)    System Development
      5)    System Implementation
      6)    System Improvement Plan
      7)    Software Improvement Increment
      8)    Software Obsolescence and Disposal

      The  ADP  auditor  must "safeguard   software,  trace  computer
transactions, review the systems development cycle, and monitor adherence
to administrative policy" (Gallegos and Bieber, 1986, p. 2).

      To make sound managerial  decisions, organizations  need properly
      authorized, complete, accurate, and reliable data.  Achieving the data
      integrity objective requires that systems have adequate controls over
      how data are entered, communicated, processed, stored, and reported
      (GAO, 1986, p. 11).

      Auditing imposes controls to maximize one's assurance of data
integrity.  The concepts of risk and control provide a useful perspective for
reviewing laboratory automated data processing.

Legal Validity

      Ultimately, data integrity is important so that one may have confidence
in the conclusions that are drawn from the data.  In general, automated data
base systems must maintain  audit  trails and provide a complete chain of
custody for data  from  which conclusions  with legal implications will be
drawn (see Glover et al., 1982).  At present, the chain of custody typically
must be documented in court using hard-copy data.

      In automated financial systems, banking institutions are currently
required by Federal statute to generate paper trails of all financial
transactions.  In other words, no matter how carefully financial
systems implement procedures to control risks,  they  must provide  written
documentation  of all  activity.   The Electronic  Fund Transfer Act, which
regulates ATMs  and other aspects of the "cashless society," states:

      For  each  electronic fund  transfer initiated by  a consumer from an
      electronic terminal, the financial institution holding such consumer's
      account shall, directly or indirectly, at the time the transfer is initiated,
      make  available  to  the consumer  written documentation  of such
      transfer [EFTA, 1979: Paragraph 906, Section (a)].

For a listing of the Federal Reserve Board's regulations in conformance with
the Electronic Fund Transfer Act, see the Board of Governors of the Federal
Reserve System (1979).  For further comment on the Act, see Schroeder (1983).

-------
                 Summary and  Conclusions
      The financial industry's experience with automated financial systems
standards applies well to the automated laboratory environment.  The
financial industry demonstrates that reasonable levels of assurance of data
integrity can be achieved through careful use of controls and backups.
Integrity of computer-resident data in automated laboratory systems depends
on whether systems are designed with appropriate controls.  Although the
specific controls applied will differ between the two applications, in
general, the risks to which the data bases are subject and the controls that can
be considered to counter those risks are similar.  Examples are shown
below:
RISKS

      •     Incorrect data are added to the data base.
      •     The data base is interfered with and data integrity is damaged.
      •     Data base is affected by multiple applications.
      •     Failure of data base management system to function as specified.

CONTROLS

      •     Re-key the data
      •     Produce system control report
      •     Include a validation step
      •     Restrict access
      •     Employ a security officer
      •     Train personnel
      •     Develop standard procedures
      •     Use concurrent data control
      •     Create data ownership
      •     Develop data reconciliation utility
      •     Develop routine procedures and forms to report problems
      Integrity of computer-resident data is important so that one may draw
valid conclusions from such data — including conclusions that will hold up
in court.  Even  ADP systems in the financial and banking industry must
maintain paper trails of transactions should those transactions be challenged
in court.  Despite rapid and complex developments in automated data
processing, the expectation persists that only hard-copy paper trails
constitute generally accepted evidence in support of conclusions drawn from
computer-resident data.

      Systems concerned with financial matters have developed standards to
achieve reasonable levels of assurance of data integrity. Thus, it seems fair to
conclude that systems concerned with issues of public health and the
environment (e.g., automated laboratory data processing) would require
similar levels of assurance of data integrity and the controls needed to
achieve such levels.

-------
                Automated Laboratory Standards Program


                               GLOSSARY
Application  controls - one of the two sets or types of controls recognized by
the auditing  discipline.  They are specific for each  application and include
items such as data entry verification procedures (for instance, re-keying all
input); data base recovery and roll back procedures that permit the data base
administrator to recreate any desired state of the data base; audit trails that not
only assist the  data base administrator in recreating  any desired state of the
data base, but also provide documentary evidence of a chain of custody for
data; and use of automated reconciliation transactions that  verify the  final
data base results against the results as reconstructed through the audit trail.
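As an illustration of the first application control named above, re-keying can be sketched as a comparison of two independently keyed copies of the same record. The function name and the field names in the example are hypothetical; they are not drawn from any particular laboratory system.

```python
def rekey_discrepancies(first_pass, second_pass):
    """Compare two independently keyed versions of one record.

    Returns a dict mapping each disagreeing field to the pair
    (value from first pass, value from second pass). A field missing
    from one pass is reported with None for that pass.
    """
    fields = sorted(set(first_pass) | set(second_pass))
    return {
        field: (first_pass.get(field), second_pass.get(field))
        for field in fields
        if first_pass.get(field) != second_pass.get(field)
    }


# Hypothetical record keyed twice by different operators.
first = {"sample": "S-001", "ph": "7.2", "lead_ppb": "15"}
second = {"sample": "S-001", "ph": "7.2", "lead_ppb": "51"}
print(rekey_discrepancies(first, second))  # {'lead_ppb': ('15', '51')}
```

A record would be accepted into the data base only when the returned dict is empty; otherwise the disagreeing fields are checked against the source document.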

Application  software - a program  developed, adapted, or  tailored to the
specific  user  requirements  for  the  purpose of data  collection,  data
manipulation, data output, or  data archiving [Drug Information Association].

Audit trail - records of transactions that collectively provide documentary
evidence of processing, used  to trace from original  transactions forward to
related records and reports or backwards from records and reports to source
transactions.  This series of records documents the  origination and flow of
transactions  processed through  a  system [Datapro]. Also,  a  chronological
record of system  activities that is sufficient to enable the  reconstruction,
reviewing, and examination of the sequence of environments  and  activities
surrounding or leading  to an  operation, a procedure, or  an event  in  a
transaction from its inception to final results [NCSC-TG-004].

Auditing - (1)  the process of establishing that prescribed procedures and
protocols have been followed; (2) a technique applied during or at the end of a
process to assess the acceptability of the product [Drug Information
Association]; (3) a function used by management to assess the adequacy of
control [Perry].  That is, auditing is the set of processes that evaluate how well
controls  ensure data integrity.   As a  financial example, auditing  would
include those activities that review whether deposits have been attributed to
the proper accounts; for  example, providing an  individual with a hard-copy
record of the transaction at the time of deposit  and sending the individual  a
monthly  statement that lists all transactions.

Automated laboratory  data  processing - calculation, manipulation, and
reporting of analytical results using computer-resident data, in either a LIMS
or a personal computer.

Availability - see "data availability."

Back-up - provisions made for the recovery of data files or software, for restart
of processing, or for use of alternative computer equipment after a system
failure or disaster [Drug Information Association].

Change  control -  ongoing evaluation of system operations and  changes
during the production use of a system, to determine when and if repetition of
a validation process or a specific portion of it is necessary. This includes both
the ongoing, documented evaluation, plus any validation testing necessary to
maintain a product in a validated state [Drug Information Association].

Checksum  -  an error-checking method  used in data communications in
which groups of digits are summed, usually without regard for overflow, and
that sum checked against a previously  computed sum to verify that no data
digits have been changed [Drug Information Association].
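A minimal sketch of such a checksum follows. The group size of two digits and the modulus used to discard overflow are assumptions made for the example; the definition above does not fix either value.

```python
def checksum(digits, group_size=2, modulus=10**4):
    """Sum fixed-size groups of digits, discarding overflow past `modulus`."""
    if not digits.isdecimal():
        raise ValueError("expected a non-empty string of decimal digits")
    total = 0
    for i in range(0, len(digits), group_size):
        # Keep only the low-order part of the running sum ("without
        # regard for overflow").
        total = (total + int(digits[i:i + group_size])) % modulus
    return total


def verify(digits, expected):
    """Return True if the recomputed checksum matches the stored one."""
    return checksum(digits) == expected


stored = checksum("12345678")          # 12 + 34 + 56 + 78 = 180
print(verify("12345678", stored))      # True
print(verify("12345679", stored))      # False: one digit changed
print(verify("34125678", stored))      # True: reordered groups go undetected
```

The last line shows a limitation of a simple sum: it detects changed digits but not groups that have merely been rearranged.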

Cipher - a method of transforming a text in order to conceal its meaning.

Confidentiality - see "data confidentiality."

Control - "that  which prevents, detects, corrects, or reduces a risk" [Perry,
p.45], and thus  reasonably ensures that data  are complete, accurate,  and
reliable.  For instance, any system that verifies the sample  number against
sample identifier  information  would  be  a control  against inadvertently
assigning results to the wrong sample.
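The sample-number example can be sketched as follows. The registry layout, the exception name, and the sample values are hypothetical; the point is only that a result is refused unless the identifier information supplied with it matches what was recorded for that sample number.

```python
class SampleMismatchError(Exception):
    """Raised when a result cannot be safely attributed to a sample."""


def record_result(registry, results, sample_number, identifier, value):
    """Store `value` for `sample_number` only if `identifier` matches the registry."""
    if sample_number not in registry:
        raise SampleMismatchError(f"unknown sample number: {sample_number}")
    if registry[sample_number] != identifier:
        raise SampleMismatchError(
            f"identifier does not match the registry for sample {sample_number}"
        )
    results[sample_number] = value


# Hypothetical log-in data: sample number -> identifier recorded at receipt.
registry = {
    "S-001": ("Well 7", "1990-04-25"),
    "S-002": ("Well 9", "1990-04-25"),
}
results = {}
record_result(registry, results, "S-001", ("Well 7", "1990-04-25"), 0.12)
```

Attempting to record a result for "S-001" with the identifier of "S-002" raises `SampleMismatchError` and leaves `results` unchanged.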

Computer system - a group of hardware components assembled to perform in
conjunction with a set of software programs that are collectively designed to
perform a specific function or group of functions  [Drug  Information
Association].

Data  - a  representation of facts, concepts, or  instructions  in a formalized
manner suitable for communication, interpretation, or processing by human
or automatic means [ISO, as reported by Drug Information Association].

Data availability - the state when data are in the place needed by the user, at
the time the user needs them, and in the form needed by the user
[NCSC-TG-004-88]; the state in which information or services that must be
accessible on a timely basis to meet mission requirements or to avoid other
types of losses are in fact accessible [OMB].  Access to data stored
electronically requires that the system itself be available.  Data availability
can be affected by several factors, including system "down time," data
encryption, password protection, and system function access restriction.

Data Base Management System  (DBMS) - software that allows one or many
persons to create a data base, modify data in the data base, or use data in the
data base (e.g., to produce reports).

Data base - a collection of data having a structured format.

Data confidentiality - the ability to protect the privacy of data; protecting data
from unauthorized disclosure [OMB].

Data element (field) - contains a value with a fixed size  and data type (see
below). A list of data elements defines a data base.

Data integrity - ensuring the prevention of information corruption [modified
from EPA  Information Security  Manual];  ensuring the prevention  of
unauthorized modification [modified from OMB]; ensuring that data are
complete, consistent, and without errors.

Data record - consists of a  list of values possessing fixed sizes and data types
for each data element in a particular data base.

Data types -  alphanumeric (letters, digits,  and special characters), numeric
(digits only), boolean (true or false), and specialized data types such as date.

Electronic  data integrity  - data integrity protected by a computer system;
automated data integrity refers to the goal of complete and  incorruptible
computer-resident data.

Encryption - the translation of one character string into another by means of a
cipher, translation table, or algorithm, in  order to render the information
contained therein meaningless to anyone who does  not possess the decoding
mechanism [Datapro].

Error - accidental mistake caused by human action or computer failure.

Fraud - deliberate human action to cause an inaccuracy.

General  controls - one of the two sets or types of controls recognized by the
auditing discipline. These operate across  all  applications.  These  would
include developing  and staffing a quality assurance program that works
independently of other staff;  developing and enforcing documentation
standards; developing standards for data transfer and manipulation, such as
prohibiting the same individual  from both  performing and  approving
sample testing; training individuals to perform data  transfers; and developing
hardware controls, such  as writing different backup cycles  to different disk
packs and developing and enforcing labelling conventions for all cabling.

Integrity - see "data integrity."

Journaling - recording all significant  access or file activity events in  their
entirety.  Using a journal plus earlier copies of a file, it would be possible to
reconstruct the file at any point and identify the ways it has changed over a
specified period of time [Datapro].
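The reconstruction described above can be sketched as replaying journal entries on top of an earlier copy. The entry layout (sequence number, operation, key, value) and the two operation names are assumptions made for the example, and the journal is assumed to be ordered by sequence number.

```python
def replay(snapshot, journal, up_to):
    """Rebuild the file state as of sequence number `up_to`.

    `snapshot` is an earlier copy of the file (a dict); `journal` is a
    list of (seq, op, key, value) tuples ordered by seq.
    """
    state = dict(snapshot)  # never modify the earlier copy itself
    for seq, op, key, value in journal:
        if seq > up_to:
            break
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {op}")
    return state


def changes_between(journal, start, end):
    """Return the journal entries with start < seq <= end."""
    return [entry for entry in journal if start < entry[0] <= end]


# Hypothetical earlier copy and journal.
snapshot = {"S-001": 0.12}
journal = [
    (1, "set", "S-002", 0.40),
    (2, "set", "S-001", 0.15),
    (3, "delete", "S-002", None),
]
print(replay(snapshot, journal, 2))  # {'S-001': 0.15, 'S-002': 0.4}
```

Calling `changes_between(journal, 1, 3)` lists the entries that account for the difference between the states at sequence numbers 1 and 3.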

Laboratory Information Management System  (LIMS)  - automation of
laboratory  processes under  a single unified  system.  Data  collection, data
analysis, and data reporting are a few examples of laboratory processes that
can be automated.

Password  - a unique word or string of characters used to authenticate an
identity.  A program, computer operator, or user may be required to submit a
password to meet security requirements before gaining access to data.  The
password is confidential, as opposed to the user identification [Datapro].

Quality assurance - (1) a process for building quality into  a system; (2) the
process of ensuring that  the  automated data  system  meets  the  user
requirements for the system and maintains  data integrity; (3) a planned and
systematic pattern of all actions necessary to provide adequate confidence that
the item  or product conforms  to established technical requirements
[ANSI/IEEE Std 730-1981, as reported by Drug Information Association].

Raw data - "... any laboratory worksheets, records, memoranda, notes, or
exact copies thereof, that are the result of original observations and activities
of a study and are necessary for the reconstruction and evaluation of that
study. ... 'Raw data' may include photographs, microfilm or microfiche
copies, computer printouts, magnetic media, ... and recorded data from
automated instruments" [40 CFR 792.3].  Raw data are the first or primary
recordings of observations or results.  Transcribed data (e.g., manually keyed
computer-resident data taken from data sheets or notebooks) are not raw data.

Risk  - "the probable result of the occurrence of an adverse event..."  [Perry,
p.45].  An  "adverse  event"  could be either accidental (error) or deliberate
(fraud). An example of an adverse event would be the inaccurate assignment
of an accession number to a test sample.  Risk, then, would be the
likelihood that  the results of an analysis would be attributed  to the wrong
sample.

Risk analysis  -  a  means of measuring  and  assessing the  relative
vulnerabilities and threats to a collection of sensitive data and the people,
systems, and installations involved in storing and processing those data.  Its
purpose is  to determine how security measures can be effectively  applied to
minimize potential loss.  Risk analyses may vary from an informal,
qualitative review of a microcomputer installation to a formal, fully
quantified review of a major computer center [EPA IRM Policy Manual].

Security - the protection of computer hardware and software from accidental
or malicious access, use, modification, destruction, or disclosure.   Security
also pertains to personnel, data, communications, and the physical protection
of computer installations [Drug Information Association].

System  - (1) a collection of people,  machines, and methods organized to
accomplish  a set of specific functions;  (2)  an  integrated  whole  that is
composed of diverse, interacting, specialized structures and subfunctions; (3) a
group of subsystems  united by  some  interaction or  interdependence,
performing  many duties but  functioning as a single unit [ANSI  N45.2.10,
1973, as reported by Drug Information Association].

System Development Life Cycle (SDLC) - a series of distinct phases through
which development projects progress.  An approach to computer system
development that begins  with an  evaluation of  the user  needs  and
identification of the user requirements and continues through system design,
module design, programming and  testing, system integration and  testing,
validation, and operation and maintenance, ending only when use of the
system is discontinued [modified from Drug Information Association].

Transaction log - also keystroke capture, report, and replay - the technique of
recording and storing keystrokes as entered by the user for subsequent replay
to enable the original sequence to be reproduced exactly [Drug Information
Association].

Valid - having legal strength or  force, executed with proper  formalities,
incapable of being rightfully overthrown or set aside [Black's Law Dictionary].

Validity - legal sufficiency, in contradistinction to mere regularity (being
steady or uniform in course, practice, or occurrence) [Black's Law Dictionary].

-------
                            References
Black, Henry C. (1968), Black's Law Dictionary, Revised Fourth Edition (West
Publishing Co., St. Paul, Minnesota).

Board of Governors of the Federal Reserve System (1979), "Electronic Fund
Transfers," Regulation E (12 CFR Part 205), Effective March 30,  1979  (as
amended effective May 10, 1980).

Datapro Research (1989),  Datapro Reports on Information Security (McGraw-
Hill, Inc., Delran, New Jersey).

Dice, Barry,  Operations Manager, Sovran Financial Corp., Telephone
Interview, April 25, 1990 (Hyattsville, Maryland).

Drug Information  Association  (1988),  Computerized Data Systems  for
Nonclinical  Safety Assessment:  Current  Concepts and  Quality Assurance
(Drug Information Association, Maple Glen, Pennsylvania).

Electronic Fund Transfer Act (1979), 15 USC sec. 1693 et seq.

Gallegos, Frederick, and Doug Bieber (1986), "What Every Auditor Should
Know  about  Computer  Information  Systems," available  as  Accession
Number  130454 from the General Accounting Office  (GAO)  and reprinted
from p. 1-11 in EDP Auditing  (Auerbach Publishers, Inc., 1986).

Glover,  Donald E., Robert G. Hall, Arthur W. Coston, and Richard J. Trilling
(1982), "Validation of Data Obtained During Exposure  of Human Volunteers
to Air Pollutants," Computers and Biomedical Research 15(3):240-249.

National  Bureau of Standards (1976), Glossary for  Computer Systems Security
(U.S. Department of Commerce, FIPS PUB 39).

National Computer Security Center (1988), Glossary of Computer Security
(U.S. Department of Defense, NCSC-TG-004-88, Version 1).

Office of Information Resources Management (1989), EPA System Design and
Development Guidance, Vols. A, B, and C (U.S. Environmental Protection
Agency, Washington, D.C.).

Perry, William E. (1983),  Ensuring Data Base Integrity (John Wiley and Sons,
New York).

Pinkus, Karen V. (1989), "Financial Auditing and Fraud Detection:
Implications for Scientific Data Audit," Accountability in Research 1:53-70.

Schroeder, Frederick J. (1983), "Developments in Consumer Electronic Fund
Transfers," Federal Reserve Bulletin 69(6):395-403.

U.S. General Accounting  Office  (1986), Evaluating  the  Acquisition and
Operation of Information  Systems  (General Accounting Office, Washington,
D.C.).

U.S. General Accounting Office (1987), Bibliography of GAO Documents, ADP,
IRM,  &  Telecommunications 1986 (General Accounting Office, Washington,
D.C.).
