OOOR90101C

Automated Laboratory Standards:
RESULTS FROM THE SURVEY OF CURRENT TECHNOLOGY FOR AUTOMATED LABORATORIES

Prepared for:
Office of Information Resources Management
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711

June 15, 1990

Prepared by:
BOOZ-ALLEN & HAMILTON Inc.
4330 East-West Highway
Bethesda, Maryland 20814
(301) 951-2200
Contract No. 68-W9-0037

Computer Sciences Corporation
79 T.W. Alexander Dr.
Research Triangle Park, North Carolina 27709
(919) 541-9287
Contract No. 68-01-7365

Acknowledgments

This report was the combined effort of Computer Sciences Corporation, Booz-Allen & Hamilton Inc., EPA staff, and outside experts. Richard Trilling, Will Harrelson, and Trevor Elliott of CSC researched and prepared the draft for public review. Marguerite Jones, Ronald Ross, and Lynn Eberhardt of Booz-Allen evaluated the comments and completed this final report. Numerous EPA staff and outside experts provided substantial critical reviews and valuable technical comments. Richard Johnson of the Scientific Systems Staff of EPA's Office of Information Resources Management directed the contractors' work and managed the review process.

Table of Contents

Executive Summary
Background
  Exhibit 1 - Need for EPA's Automated Laboratory Standards Program
  Exhibit 2 - Considerations in Developing Automated Laboratory Standards
Review of the Literature
Survey of LIMS Vendors
  Development and Administration
  Results of the Survey
Other Sources of New Technology
Conclusions
Glossary
References
Appendix A: Survey Questionnaire
Appendix B: Summary of Results

Executive Summary

The U.S. Environmental Protection Agency (EPA) has initiated a program to ensure the integrity of computer-resident data in laboratories performing analyses in support of EPA programs by developing standards for automated laboratory processes.
The activities of these environmental programs are diverse and include basic research at EPA's environmental research centers, environmental sample analyses at EPA's regional laboratories and contractors' laboratories, and product registration relying on analytical data submitted by the private sector. This report investigates whether currently available automated technology can provide adequate assurance that computer-resident data are reliable.

Several vendors of laboratory automation and laboratory information management systems (LIMS) were surveyed to determine whether standards and controls are available that will ensure the reliability and validity of the data generated. Neither the vendor survey nor an extensive search of the literature revealed any hardware or software currently on the market that will guarantee the integrity of the data produced. Vendors already offer a variety of control techniques, such as audit trails and password protection, and provide customizable systems to meet the varying needs of each of their customers. Most vendors rely on existing control features (e.g., password protection and system backup) provided by operating systems rather than duplicating them.

Of the technological advances identified, the following can be considered in developing standards for automated laboratories:

• Magnetic ink character recognition (MICR), which permits characters in labels to be read by magnetic scanners when written in a standard format and standard location
• Optical scanning, which permits the recognition of patterns of ink, such as those used in bar codes (universal product codes)
• "Smart cards," or credit cards that communicate with a remote computer from a sample analysis station via an embedded processing chip.

These technologies can be tailored to the laboratory environment to assist in data management operations.

Background

The U.S.
Environmental Protection Agency (EPA) has initiated a program to ensure the integrity of computer-resident data in laboratories performing analyses in support of EPA programs by developing standards for automated laboratory processes. Sound technical data are a fundamental resource for EPA's mission to protect public health and the environment, whatever the activities of the specific environmental program. The activities of these environmental programs are diverse and include basic research at EPA's environmental research centers, environmental sample analyses at EPA's regional laboratories and contractors' laboratories, and product registration relying on analytical data submitted by the private sector.

EPA recognizes that implementing an automated laboratory standards program will require each laboratory to commit money and staff time. Although some may consider the program too expensive, laboratory managers should recognize that by developing and using a proper standards program they will achieve a net savings, because information processes do not have to be repeated and expensive mistakes can be avoided.

Within EPA, the Office of Information Resources Management (OIRM) has taken on the objective of establishing an automated laboratory standards program. Several factors demonstrate the need for this program. As Exhibit 1 illustrates, these factors include the rising use of computerized operations by laboratories, the lack of uniform standards developed or accepted by EPA, evidence of problems associated with computer-resident data, and the evolving need of EPA auditors and inspectors for guidance in evaluating automated laboratory operations.

Laboratories collecting data for EPA's programs have taken advantage of advancing technology to streamline their analytical processes.
Initially, automated instrumentation entered the laboratories to increase productivity and enhance the accuracy of reported results. Next, computers maintaining data bases of results were used for data management and tracking. Computer systems were then integrated into more sophisticated laboratory information management systems (LIMS). Each of these advances necessitates thorough quality control procedures for data generation, storage, and retrieval to ensure the integrity of computer-resident data.

[Exhibit 1 - Need for EPA's Automated Laboratory Standards Program]

Currently, EPA has no Agency-wide guidelines that laboratories collecting and evaluating computer-resident data must follow. The requirements that must be considered in developing automated laboratory standards come from a variety of sources, as Exhibit 2 illustrates. These include the requirements of the Computer Security Act of 1987 (P.L. 100-235, January 8, 1988) and various EPA program-specific data collection requirements under Superfund, the Resource Conservation and Recovery Act, the Clean Water Act, and the Safe Drinking Water Act, among others. Additionally, OIRM has developed electronic transmission standards and is developing a strategy for electronic record keeping and electronic reporting standards that will affect all Agency activities. The development of uniform principles for automated data in EPA laboratories, regardless of program, will take into account the common elements of all these data collection activities and provide a minimum standard that each laboratory should achieve.

There is increasing evidence of problems associated with the collection and use of computer-resident laboratory data supporting various EPA programs.
To illustrate, as of November 1989, EPA's Office of the Inspector General was investigating between 10 and 12 laboratories in Superfund's Contract Laboratory Program (CLP) for a variety of allegations, including "time traveling" and instrument calibration violations. In "time traveling," sample testing dates are manipulated, either by adjusting the internal clock of the instrumentation performing the analyses or by manipulating the resulting computer-resident data. (Hazardous waste samples must be assayed within a prescribed time period or the results may be compromised.) Additionally, calibration standard results have allegedly been manipulated electronically, and other calibration results substituted when the actual results did not meet the range specifications of the CLP procedure being followed. If true, these allegations may be treated as felonies.

[Exhibit 2 - Considerations in Developing Automated Laboratory Standards]

Because the introduction of automation is relatively new and still evolving, no definitive guidelines for EPA auditors and inspectors have been developed. Inspectors must be alert to the steps in laboratory procedures for generating and using computer-resident data where the greatest risk exists. These critical control points indicate the degree of control that should be placed on each step of the process. If adequate controls are not present at such a point, the remainder of the process cannot correct a deviation, and the entire process will yield no reliable conclusions. Automation introduces many new variables into a system, each with its own set of critical control points. Inspectors must verify that laboratory management has recognized the various risks and has instituted an appropriate risk management program.
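The "time traveling" allegations above suggest two simple automated checks that a laboratory or inspector could run over instrument logs: whether each analysis falls within the prescribed holding time after collection, and whether instrument timestamps ever run backwards relative to the order in which runs were actually produced. The following Python sketch is illustrative only. The record fields, the function name `flag_time_anomalies`, and the 14-day holding time in the example are hypothetical assumptions, not CLP requirements, and a check of this kind can itself be defeated if the run order is falsified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AnalysisRun:
    """One instrument run as it might be logged (hypothetical fields)."""
    sample_id: str
    collected: datetime  # collection time recorded at sample receiving
    analyzed: datetime   # timestamp written by the instrument clock


def flag_time_anomalies(runs, holding_time):
    """Return (sample_id, reason) pairs for runs that deserve review.

    `runs` must be listed in the order the instrument actually produced
    them (e.g., by an independent run sequence number), so a timestamp
    that goes backwards suggests the instrument clock was reset.
    """
    flags = []
    previous = None
    for run in runs:
        if run.analyzed < run.collected:
            flags.append((run.sample_id, "analyzed before collection"))
        elif run.analyzed - run.collected > holding_time:
            flags.append((run.sample_id, "holding time exceeded"))
        if previous is not None and run.analyzed < previous:
            flags.append((run.sample_id, "clock moved backwards"))
        previous = run.analyzed
    return flags


# Example with a hypothetical 14-day holding time.
t0 = datetime(1989, 11, 1, 9, 0)
runs = [
    AnalysisRun("S-001", t0, t0 + timedelta(days=3)),
    AnalysisRun("S-002", t0, t0 + timedelta(days=20)),
    AnalysisRun("S-003", t0, t0 + timedelta(days=2)),
]
print(flag_time_anomalies(runs, timedelta(days=14)))
# [('S-002', 'holding time exceeded'), ('S-003', 'clock moved backwards')]
```

Flagging rather than rejecting reflects the inspector's role described above: the check only points to critical control points that need human review.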
As part of EPA's program to ensure the integrity of computer-resident laboratory data, the Agency is investigating what automated data processing (ADP) systems exist, what controls and standards are feasible, and how vendors have identified and/or developed the standards they implement. Of particular importance is whether there have been recent technological advances in devices or subsystems that provide full assurance of integrity for computer-resident data. To investigate these issues, the following activities were performed:

1) Reviewed professional journals to identify articles introducing or describing such advances
2) Developed and administered a survey to five (5) vendors of LIMS to determine what data integrity control features these vendors' products provide
3) Conducted telephone interviews with representatives of two major laboratory instrumentation manufacturers to obtain information on the flexibility of currently available laboratory instrumentation
4) Explored new technologies used in the banking, retail, and manufacturing industries that have the potential to enhance data integrity in the laboratory environment.

This report complements our earlier report reviewing data processing standards for data integrity in automated financial systems (OIRM, 1989b). These standards include:

• Use of logon/password security
• Data entry verification
• Flagging of changes made and retention of both original and altered data (audit trails)
• Protecting reliability of data by prohibiting the same person from both authorizing and allocating funds
• Maintaining hard-copy data outputs.

That report concludes that the financial auditing discipline offers reasonable levels of assurance of the integrity of computer-resident data, and it recommends considering certain standards used in automated financial systems when developing standards for automated chemistry laboratories.
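The audit trail standard listed above (flagging changes and retaining both original and altered data) can be illustrated with a small Python sketch. It keeps every value ever written to a field and refuses to change an existing value without a recorded reason. The class names, fields, and the in-memory store are hypothetical assumptions for illustration. A real system would also have to protect the trail itself from modification, for example through operating system access controls.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One audit-trail record; frozen so it cannot be edited in place."""
    timestamp: datetime
    user: str
    record_id: str
    field_name: str
    old_value: object
    new_value: object
    reason: str


@dataclass
class AuditedStore:
    """A toy data store that logs every write and keeps original values."""
    data: dict = field(default_factory=dict)
    trail: list = field(default_factory=list)

    def set(self, user, record_id, field_name, value, reason=""):
        key = (record_id, field_name)
        if key in self.data and not reason:
            # Altering an existing value must be justified (a flagged change).
            raise ValueError("a reason is required to change an existing value")
        self.trail.append(AuditEntry(
            timestamp=datetime.now(timezone.utc),
            user=user,
            record_id=record_id,
            field_name=field_name,
            old_value=self.data.get(key),
            new_value=value,
            reason=reason or "initial entry",
        ))
        self.data[key] = value

    def history(self, record_id, field_name):
        """Every value the field has held, oldest first."""
        return [e.new_value for e in self.trail
                if (e.record_id, e.field_name) == (record_id, field_name)]


store = AuditedStore()
store.set("analyst_a", "S-001", "lead_ppb", 12.0)
store.set("analyst_a", "S-001", "lead_ppb", 12.5, reason="transcription error")
print(store.history("S-001", "lead_ppb"))  # [12.0, 12.5]
```

Because the current value and the full history live side by side, an auditor can trace any reported result back to its original entry, which is the chain-of-custody property the financial standards rely on.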
Review of the Literature

Twelve months (fall 1988 to fall 1989) of professional journals dealing with laboratory science, laboratory automation, and laboratory information management were reviewed. These journals include Analytical Chemistry, American Laboratory, Chemical Industry, Laboratory Practice, and Science. The review searched for articles on laboratory automation, laboratory information management, scientific computing, and related topics. Four articles were found:

• Sandowski, C., and G. Lawler, "A Relational Data Base Management System for LIMS," American Laboratory 21:3 (March 1989), pp. 70-79
• Merrer, Robert J., and Peter G. Berthrong, "Academic LIMS: Concept and Practice," American Laboratory 21:3 (March 1989), pp. 36-45
• Megargle, Robert, "Laboratory Information Management Systems," Analytical Chemistry 61:9 (May 1989), pp. 612A-621A
• Anon., "Products - Information Management," Laboratory Practice 38:5 (May 1989), pp. 87-91.

On-line search tools at a major university library were also used for these topic areas and related key words, but no references ("hits") were found. A similar search was performed using EPA's Online Library System (OLS), which covers not only articles from professional journals but also all EPA items registered with the National Technical Information Service (NTIS). This search identified two references: Dessy, Raymond E., The Electronic Laboratory (Washington, D.C.: American Chemical Society), and McDowall, R.D., ed., Laboratory Information Management Systems (Wilmslow, U.K.: Sigma Press, 1987). The McDowall (1987) volume contains two LIMS articles of interest: Mattes, D.C., 1987, "LIMS and Good Laboratory Practice," and Brown, Elizabeth H., 1987, "Procedures and their Documentation for a LIMS in a Regulated Environment."
These searches suggest that laboratory automation and laboratory information management are not yet common topics and are probably not yet part of the mainstream laboratory literature. Further, the literature search identified no existing standards for laboratory automation.

The journals were also reviewed for advertisements for laboratory automation and laboratory information management systems. Advertisements from the following vendors were found:

• PE Nelson
• CI Beckman
• Varian Associates, Inc.
• FIAtron Laboratory Systems
• Axiom Systems, Inc.
• Laboratory MicroSystems, Inc.
• Radian Corporation
• Advanced Systems Management, Inc.
• VG Instruments, Inc.
• Harley Systems, Inc.

Hewlett-Packard and Digital Equipment Corporation were also known to offer LIMS. This substantial number of vendors shows that, even if laboratory automation and LIMS are not heavily discussed in professional periodicals, vendors have nevertheless found a market.

Survey of LIMS Vendors

Development and Administration

The purpose of the survey was to identify LIMS that provide reasonably high levels of assurance of data integrity. Consequently, the questionnaire items deal with a variety of ADP controls that reduce the risk of threats to data integrity. The survey elicits information about system documentation; security; data integrity; data reduction and analysis; and backup, archiving, and recovery. The full questionnaire appears in Appendix A.

In developing the questionnaire, the following sources of topics and questionnaire items were consulted:

• A checklist of ADP audit features already used by EPA during laboratory site visits to determine which such features are in place or feasible (OIRM, 1989a)
• Standard systems analysis and design techniques (OIRM, 1987)
• EPA LIMS functional specifications (OIRM, 1988).
The survey was administered by telephone to the following vendors:

• CI Beckman
• Varian Associates, Inc.
• VG Instruments, Inc.
• Hewlett-Packard
• PE Nelson

The only significant problem encountered in administering the survey was that the vendors' products were typically highly customizable and therefore not easily characterized by a structured survey.

Results of the Survey

MAJOR FINDINGS

There are four major findings from the survey:

• Vendors offer a variety of features, such as passwords and records of data changes, that can be customized to provide assurance of data integrity.
• System vendors offer system-specific data integrity features; however, there is no required standard set of data integrity features.
• There is no "magic box" or technological advance that guarantees absolute data integrity.
• Most vendors rely on existing control features (e.g., password protection and system backup) provided by operating systems rather than duplicating them.

Each finding is discussed in more detail below. The discussion is supplemented with information obtained from telephone conversations with representatives of two LIMS vendors.

Customizable Products

Vendors offer a relatively wide variety of control techniques for ensuring data integrity. They do not rely on or refer to a set of standards in deciding which features to deliver. Instead, vendors respond to requests from their customers and deliver systems that provide the data integrity features each customer specifies.

No Standard Configuration

Presently, manufacturers of automated laboratory systems design their instrumentation to incorporate their own individual data integrity controls. With so many manufacturers of this equipment in the marketplace and no universal standard to adhere to, it follows that there is no standard configuration of data integrity controls.
No "Magic Box"

The vendors surveyed have not made advances in hardware or software that would guarantee full data integrity. It was thought that vendors might be using optical disk technology such as write once/read many (WORM) storage, or a similar, highly controlled method, to minimize risk to data integrity. The vendors surveyed are not incorporating this type of technology in their systems. Reasonable levels of data integrity can be achieved through traditional controls such as re-keying for data entry verification, logon and password security, and an audit trail that implements a chain of custody.

After answering all of the survey questions, vendors were asked to mention any additional features of their systems that are designed for, or useful in, ensuring data integrity. Vendors did not identify any additional methods. It can therefore be assumed that no such methods were overlooked in the design of the survey. Confidence is high that the commercial laboratory automation market in general has not made technological advances in hardware or software that would guarantee full data integrity.

Vendors Rely on Operating Systems

Vendors typically rely on the existing control features of the operating system rather than duplicating them. Password access control, for instance, usually consists of whatever the operating system provides. Likewise, backup typically occurs at whatever frequency and on whatever medium the operating system provides.

SUMMARY OF RESULTS

Appendix B presents detailed results of the survey. The systems are generally capable of meeting whatever needs the customer has. Even if a system does not offer a particular option as a feature, it is usually flexible enough to allow the customer or vendor to provide that option through programming or third-party software.
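Re-keying, named above as one of the traditional controls that achieve reasonable data integrity, amounts to comparing two independent keyings of the same source document and resolving every disagreement before the data are accepted. The Python sketch below is a minimal illustration under the assumption that each keying is captured as a dictionary mapping field names to the text an operator typed. The function name and the example values are hypothetical.

```python
def verify_double_entry(first_pass, second_pass):
    """Compare two independent keyings of the same source document.

    Each argument maps field names to the text one operator typed.
    Returns {field: (first, second)} for every field that differs or was
    keyed only once; an empty dict means the entry is verified.
    """
    mismatches = {}
    for name in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(name)
        b = second_pass.get(name)
        # Ignore leading/trailing blanks only; any other difference,
        # such as transposed digits, must be resolved against the source.
        if a is None or b is None or a.strip() != b.strip():
            mismatches[name] = (a, b)
    return mismatches


operator_1 = {"sample_id": "S-001", "lead_ppb": "12.5"}
operator_2 = {"sample_id": "S-001", "lead_ppb": "21.5"}
print(verify_double_entry(operator_1, operator_2))
# {'lead_ppb': ('12.5', '21.5')}
```

Comparing the typed text rather than converted numbers keeps the check close to what each operator actually entered, so a transposition is caught even when both values are plausible.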
Other Sources of New Technology

The laboratory automation/LIMS vendors do not appear to be applying new technology to the laboratory environment. In the review of other automated systems, however, a number of technological advances were identified. In particular, banking, retailing, and manufacturing have seen technological advances that have potential for the laboratory environment. The following advances were identified and are discussed below:

• Magnetic ink character recognition (MICR)
• Optical scanning and bar codes (universal product codes)
• Magnetic cards
• "Smart cards."

Technological advances have been made in magnetic ink and in sensitive scanners capable of MICR. Using this ink and MICR, checks can be processed by machine because the banking industry has adopted a standard format for labeling checks with bank and account information. The magnetic ink, written onto checks in a standard format and standard location, can be read by most new scanners.

Carrying the scanning technology further, optical scanners can read ink-printed merchandise codes that conform not only to a standard location and format but also to a universal product code, which was developed so that merchandise can be labeled unambiguously. (Universal product codes are most familiar as the "bar codes" on grocery products. They are read by checkout registers that are really terminals connected to central processing units; these terminals read the codes from merchandise, keep running totals of the customer's bill, and may even perform automatic inventory control and reporting.)

This technology could be adopted in the laboratory as a method of labeling and reading samples that enter the laboratory. Labels could be affixed to sample containers during sample processing. Scanners would be installed at every station in the laboratory at which sample identification information is important.
The scanner would read the sample identification from the physical sample container and pass that information to software. The software would compare the sample identification read from the container with that entered at sample receiving time to verify that results were being attributed to the proper sample.

Magnetic stripes on credit cards provide a bank with information about an individual's account. Some transportation systems (notably the Metro subway system in Washington, DC) also use magnetic stripes to record fare information that can be linked to distance and time of day. In some implementations, "smart cards" (credit cards with an embedded processing chip as well as the traditional magnetic stripe) can communicate with and provide additional information to the host computer in a number of applications.

Card-assisted ADP in the laboratory might work in the following manner. A physical sample would move through the laboratory, and its identification information would be checked at each station, as described above. At one or more of these stations, an authorized individual (perhaps a laboratory director or principal investigator) might insert a magnetic ("mag") card that would authorize posting of sample information to the data base and would retrieve from the data base the information required for the next posting (the result of an analysis or the status of the experiment). Without the mag card, the information from the sample could not be posted to the data base.

Additionally, a smart card containing stored information on a sample, or on a number of similar samples being run together, could be inserted into an instrument to record the results of the sample analyses. Smart cards can be pre-formatted to receive data in any configuration, such as tabular, and are well suited to transmitting data from remote instrumentation to a central data management system.
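The station-level checks described in this section (comparing the bar code read from the container with the identification entered at sample receiving, and refusing to post results unless an authorizing card is presented) can be sketched in a few lines of Python. The data structures are assumptions made for illustration: a set of received sample IDs, a set of authorized card IDs, and a dictionary standing in for the data base. All names are hypothetical, and no real scanner or card-reader interface is modeled.

```python
class PostingError(Exception):
    """Raised when a result may not be posted to the data base."""


def post_result(database, received_ids, authorized_cards,
                attributed_id, scanned_id, card_id, result):
    """Post `result` only if every station-level check passes.

    received_ids:     sample IDs entered at sample receiving time
    authorized_cards: IDs of cards held by authorized individuals
    attributed_id:    sample the result is being attributed to
    scanned_id:       ID read by the bar code scanner from the container
    card_id:          ID read from the card inserted at the station
    """
    if scanned_id not in received_ids:
        raise PostingError(f"{scanned_id!r} was never logged at receiving")
    if scanned_id != attributed_id:
        raise PostingError(
            f"container reads {scanned_id!r}, result is for {attributed_id!r}")
    if card_id not in authorized_cards:
        raise PostingError("no valid authorizing card presented")
    database.setdefault(scanned_id, []).append(result)


db = {}
post_result(db, {"S-001", "S-002"}, {"CARD-07"}, "S-001", "S-001", "CARD-07", 12.5)
print(db)  # {'S-001': [12.5]}
```

Raising an error instead of silently skipping the posting makes each refused attempt visible, so it can be recorded and reviewed like any other exception in the chain of custody.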
Smart cards can also be erased and reformatted for use with a new sample or set of samples, which makes them more cost effective.

In general, automation technology that uses standards and is implemented adequately performs its tasks more reliably, and perhaps more cost effectively, than manual performance of the same tasks. The technology described above therefore has significant implications for the laboratory environment.

Conclusions

Automation technology that uses standards and is implemented adequately performs its tasks more reliably, and perhaps more cost effectively, than manual performance of the same tasks. The literature review and the vendor survey showed that commercial vendors of laboratory automation and LIMS have not developed a standard set of controls that provides full assurance of the integrity of computer-resident data. Vendors typically deliver systems customized to their customers' specifications, but no standards define the default, baseline system each vendor delivers. Nor have these vendors made hardware or software advances that guarantee data integrity. Standards for laboratory automation would provide a common denominator for software design and other technological advances.

Technological devices developed for a variety of other fields could be applied in the laboratory setting, but these devices have gained little acceptance in this environment. They include the following:

• Magnetic ink character recognition (MICR)
• Optical scanning and bar codes
• Magnetic cards
• "Smart cards."

Universal product codes (bar codes) have been used for sample identification in a few laboratories, and acceptance of that technology may be increasing.
It is worth noting, however, that the technological advances in the banking, retailing, and manufacturing industries can be used only because each industry has developed standards for use of the technology. The technology for reading magnetic ink from checks works only because the banking industry has developed a standard format and a standard location for writing information onto checks. Similarly, the retailing and manufacturing industries have developed standards for the format of universal product codes.

The results of the survey of five LIMS vendors indicate that the vendors are not currently standardizing their systems and technology. Until the vendors either voluntarily work in concert or are provided with a set of standards from outside sources, little progress can be made in incorporating these techniques into the analytical chemistry laboratories of concern to EPA. By tailoring existing technologies to the laboratory setting and setting standards for the operation of automated equipment, laboratories can produce data with greater efficiency and integrity.

Automated Laboratory Standards Program

GLOSSARY

Application controls - one of the two sets or types of controls recognized by the auditing discipline. They are specific to each application and include items such as data entry verification procedures (for instance, re-keying all input); data base recovery and rollback procedures that permit the data base administrator to recreate any desired state of the data base; audit trails that not only assist the data base administrator in recreating any desired state of the data base but also provide documentary evidence of a chain of custody for data; and automated reconciliation transactions that verify the final data base results against the results as reconstructed through the audit trail.
Application software - a program developed, adapted, or tailored to the specific user requirements for the purpose of data collection, data manipulation, data output, or data archiving [Drug Information Association].

Audit trail - records of transactions that collectively provide documentary evidence of processing, used to trace from original transactions forward to related records and reports or backwards from records and reports to source transactions. This series of records documents the origination and flow of transactions processed through a system [Datapro]. Also, a chronological record of system activities that is sufficient to enable the reconstruction, reviewing, and examination of the sequence of environments and activities surrounding or leading to an operation, a procedure, or an event in a transaction from its inception to final results [NCSC-TG-004].

Auditing - (1) the process of establishing that prescribed procedures and protocols have been followed; (2) a technique applied during or at the end of a process to assess the acceptability of the product [Drug Information Association]; (3) a function used by management to assess the adequacy of control [Perry]. That is, auditing is the set of processes that evaluate how well controls ensure data integrity. In a financial setting, for example, auditing would include activities that review whether deposits have been credited to the proper accounts, such as giving the depositor a hard-copy record of the transaction at the time of deposit and sending the depositor a monthly statement that lists all transactions.

Automated laboratory data processing - calculation, manipulation, and reporting of analytical results using computer-resident data, in either a LIMS or a personal computer.

Availability - see "data availability."
Back-up - provisions made for the recovery of data files or software, for restart of processing, or for use of alternative computer equipment after a system failure or disaster [Drug Information Association].

Change control - ongoing evaluation of system operations and changes during the production use of a system, to determine when and if repetition of a validation process or a specific portion of it is necessary. This includes both the ongoing, documented evaluation, plus any validation testing necessary to maintain a product in a validated state [Drug Information Association].

Checksum - an error-checking method used in data communications in which groups of digits are summed, usually without regard for overflow, and that sum checked against a previously computed sum to verify that no data digits have been changed [Drug Information Association].

Cipher - a method of transforming a text in order to conceal its meaning.

Confidentiality - see "data confidentiality."

Control - "that which prevents, detects, corrects, or reduces a risk" [Perry, p. 45], and thus reasonably ensures that data are complete, accurate, and reliable. For instance, any system that verifies the sample number against sample identifier information would be a control against inadvertently assigning results to the wrong sample.

Computer system - a group of hardware components assembled to perform in conjunction with a set of software programs that are collectively designed to perform a specific function or group of functions [Drug Information Association].

Data - a representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by human or automatic means [ISO, as reported by Drug Information Association].
Data availability - the state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user [NCSC-TG-004-88]; the state in which information or services that must be accessible on a timely basis to meet mission requirements or to avoid other types of losses are in fact accessible [OMB]. Electronically stored data can be accessed only when the system holding them is available. Data availability can be affected by several factors, including system "down time," data encryption, password protection, and system function access restriction.

Data Base Management System (DBMS) - software that allows one or many persons to create a data base, modify data in the data base, or use data in the data base (e.g., reports).

Data base - a collection of data having a structured format.

Data confidentiality - the ability to protect the privacy of data; protecting data from unauthorized disclosure [OMB].

Data element (field) - contains a value with a fixed size and data type (see "data types" below). A list of data elements defines a data base.

Data integrity - ensuring the prevention of information corruption [modified from EPA Information Security Manual]; ensuring the prevention of unauthorized modification [modified from OMB]; ensuring that data are complete, consistent, and without errors.

Data record - consists of a list of values possessing fixed sizes and data types for each data element in a particular data base.

Data types - alphanumeric (letters, digits, and special characters), numeric (digits only), boolean (true or false), and specialized data types such as date.

Electronic data integrity - data integrity protected by a computer system; automated data integrity refers to the goal of complete and incorruptible computer-resident data.
Encryption - the translation of one character string into another by means of a cipher, translation table, or algorithm, in order to render the information contained therein meaningless to anyone who does not possess the decoding mechanism [Datapro].

Error - accidental mistake caused by human action or computer failure.

Fraud - deliberate human action to cause an inaccuracy.

General controls - one of the two sets or types of controls recognized by the auditing discipline. These operate across all applications. They include developing and staffing a quality assurance program that works independently of other staff; developing and enforcing documentation standards; developing standards for data transfer and manipulation, such as prohibiting the same individual from both performing and approving sample testing; training individuals to perform data transfers; and developing hardware controls, such as writing different backup cycles to different disk packs and developing and enforcing labeling conventions for all cabling.

Integrity - see "data integrity."

Journaling - recording all significant access or file activity events in their entirety. Using a journal plus earlier copies of a file, it would be possible to reconstruct the file at any point and identify the ways it has changed over a specified period of time [Datapro].

Laboratory Information Management System (LIMS) - automation of laboratory processes under a single unified system. Data collection, data analysis, and data reporting are a few examples of laboratory processes that can be automated.

Password - a unique word or string of characters used to authenticate an identity. A program, computer operator, or user may be required to submit a password to meet security requirements before gaining access to data. The password is confidential, as opposed to the user identification [Datapro].
Quality assurance - (1) a process for building quality into a system; (2) the process of ensuring that the automated data system meets the user requirements for the system and maintains data integrity; (3) a planned and systematic pattern of all actions necessary to provide adequate confidence that the item or product conforms to established technical requirements [ANSI/IEEE Std 730-1981, as reported by Drug Information Association].
Raw data - "... any laboratory worksheets, records, memoranda, notes, or exact copies thereof, that are the result of original observations and activities of a study and are necessary for the reconstruction and evaluation of that study. ... 'Raw data' may include photographs, microfilm or microfiche copies, computer printouts, magnetic media, ... and recorded data from automated instruments." [40 CFR 792.3] Raw data are the first or primary recordings of observations or results. Transcribed data (e.g., manually keyed computer-resident data taken from data sheets or notebooks) are not raw data.
Risk - "the probable result of the occurrence of an adverse event..." [Perry, p. 45]. An "adverse event" could be either accidental (error) or deliberate (fraud). An example of an adverse event would be the inaccurate assignment of an accession number to a test sample. Risk, then, would be the likelihood that the results of an analysis would be attributed to the wrong sample.
Risk analysis - a means of measuring and assessing the relative vulnerabilities and threats to a collection of sensitive data and the people, systems, and installations involved in storing and processing those data. Its purpose is to determine how security measures can be effectively applied to minimize potential loss. Risk analyses may vary from an informal, quantitative review of a microcomputer installation to a formal, fully quantified review of a major computer center [EPA IRM Policy Manual].
Security - the protection of computer hardware and software from accidental or malicious access, use, modification, destruction, or disclosure. Security also pertains to personnel, data, communications, and the physical protection of computer installations [Drug Information Association].
System - (1) a collection of people, machines, and methods organized to accomplish a set of specific functions; (2) an integrated whole that is composed of diverse, interacting, specialized structures and subfunctions; (3) a group of subsystems united by some interaction or interdependence, performing many duties but functioning as a single unit [ANSI N45.2.10, 1973, as reported by Drug Information Association].
System Development Life Cycle (SDLC) - a series of distinct phases through which development projects progress. An approach to computer system development that begins with an evaluation of the user needs and identification of the user requirements and continues through system design, module design, programming and testing, system integration and testing, validation, and operation and maintenance, ending only when use of the system is discontinued [modified from Drug Information Association].
Transaction log (also keystroke capture, report, and replay) - the technique of recording and storing keystrokes as entered by the user for subsequent replay, so that the original sequence can be reproduced exactly [Drug Information Association].
Valid - having legal strength or force, executed with proper formalities, incapable of being rightfully overthrown or set aside [Black's Law Dictionary].
Validity - legal sufficiency, in contradistinction to mere regularity (being steady or uniform in course, practice, or occurrence) [Black's Law Dictionary].
References
Anon. (1989), "Products - Information Management," Laboratory Practice, 38:5 (May 1989), 87-91.
Black, Henry C. (1968), Black's Law Dictionary, Revised Fourth Edition (St. Paul, Minnesota: West Publishing Co., 1968).
Brown, Elizabeth H. (1987), "Procedures and their Documentation for a LIMS in a Regulated Environment," Pp. 346-358 in R.D. McDowall, ed., Laboratory Information Management Systems (Wilmslow, U.K.: Sigma Press, 1987).
Datapro Research (1989), Datapro Reports on Information Security (Delran, New Jersey: McGraw-Hill, Inc., 1989).
Dessy, Raymond E. (1985), The Electronic Laboratory (Washington, D.C.: American Chemical Society, 1985).
Drug Information Association (1988), Computerized Data Systems for Nonclinical Safety Assessment: Current Concepts and Quality Assurance (Maple Glen, Pennsylvania: Drug Information Association, 1988).
Mattes, D.C. (1987), "LIMS and Good Laboratory Practice," Pp. 332-345 in R.D. McDowall, ed., Laboratory Information Management Systems (Wilmslow, U.K.: Sigma Press, 1987).
McDowall, R.D. (1987), ed., Laboratory Information Management Systems (Wilmslow, U.K.: Sigma Press, 1987).
Megargle, Robert (1989), "Laboratory Information Management Systems," Analytical Chemistry, 61:9 (May 1989), 612A-621A.
Merrer, Robert J., and Peter G. Berthrong (1989), "Academic LIMS: Concept and Practice," American Laboratory, 21:3 (March 1989), 36-45.
National Bureau of Standards (1976), Glossary for Computer Systems Security (U.S. Department of Commerce, FIPS PUB 39).
National Computer Security Center (1988), Glossary of Computer Security (U.S. Department of Defense, NCSC-TG-004-88, Version 1).
Office of Information Resources Management (1987), EPA Systems Design and Development Guidance, Vols. A, B, and C (Washington, D.C.: U.S. Environmental Protection Agency, 1987).
Office of Information Resources Management (1988), "EPA LIMS Functional Specifications" (Washington, D.C.: U.S. Environmental Protection Agency, March 1988).
Office of Information Resources Management (1989a), Survey of Laboratory Automated Data Management Practices (Research Triangle Park, N.C.: U.S. Environmental Protection Agency, 1989).
Office of Information Resources Management (1989b), Automated Laboratory Standards: Evaluation of the Use of Automated Financial System Procedures (Research Triangle Park, N.C.: U.S. Environmental Protection Agency, 1989).
Perry, William E. (1983), Ensuring Data Base Integrity (New York: John Wiley and Sons, 1983).
Sandowski, C., and G. Lawler (1989), "A Relational Data Base Management System for LIMS," American Laboratory, 21:3 (March 1989), 70-79.
Appendix A: Survey Questionnaire
Interviewer Name: ________
Date and Time: ________
Name of Respondent: ________
Firm: ________
System Description
1) What kind of system is in use? (Describe the hardware manufacturer and model.)
   Manufacturer:
   Model:
   Name of LIMS Product:
   Describe the DBMS and other software in use by the system:
The following questions are to determine what mechanisms are used to prevent unauthorized access to the system and data.
System Security (Yes/No)
1) Were specific standards or other guidance used in the design or implementation of security measures? If yes, what reference?
2) Does the system require a personalized logon for each user?
3) Does each user have a password?
4) Are any group user identifications or passwords used by members of a functional group?
5) How often does the system require passwords to be changed?
6) Are there established password standards?
7) Does the data management system track changes to the data? If so, how?
8) Does the system automatically flag data as having been edited? If so, how?
9) Is there a record maintained of the unaltered data?
10) Are there any additional security mechanisms not covered in the previous questions?
The next series of questions relates to the documentation provided to the customer by the vendor about the installed LIMS computer system.
System Documentation (Yes/No)
1) Does the vendor provide for each installation/system a
   a) System Implementation Plan?
   b) System Detailed Requirements Document?
   c) Software Management Plan?
   d) Software Test and Acceptance Plan?
   e) Software Preliminary Design Document?
   f) Software Detailed Design Document?
   g) Software Maintenance Document?
   h) Software Operations Document?
   i) Software User's Guide?
   j) System Integration Test Reports?
2) What additional documentation do you provide the customer?
The following series of questions addresses the data entry function within the LIMS.
Data Entry (Yes/No)
1) Does the data entry individual use a personalized logon to access the system?
2) Is a password required to access the data entry module?
3) Is the individual entering data
   a) from a hard copy?
   b) by prompting the system to access an existing data file?
   c) by prompting the system to access data directly from another system or instrument?
4) Does the system alert the data entry personnel if an error is made in data entry (e.g., values out of date range or incorrect flags)?
5) Does the system prevent entry of incorrect or out-of-range data? Are the errors logged?
6) Does the system prompt the individual entering data if there are missing fields?
The next series of questions evaluates mechanisms that may be used to verify the integrity of data as the data are entered into the system.
Data Verification (Yes/No)
1) Is the screen used for data entry
   a) designed to match the forms used for entering data?
   b) convenient for the individual responsible for data entry?
2) If data are manually entered from a hard copy, are the data validated by
   a) re-keying by the same person?
   b) re-keying by another person?
   c) review by the same person?
   d) review by another person?
3) Does the system verify data entered based on
   a) data type?
   b) matches against predefined values?
   c) matches to keys of a preexisting record?
   d) legal value assigned to wrong unit of analysis?
   e) quality control limits?
4) Are there additional mechanisms in use to ensure the quality of the data at the point of entry?
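The entry-time checks listed in question 3 of the Data Verification section above (data type, predefined values, matches to keys of a preexisting record, and quality control limits), together with the missing-field prompt and error logging asked about under Data Entry, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any vendor's system: the `FieldRule` class, the `verify_entry` function, the logger name, the field names, and the 75-125% spike-recovery limits are all hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple

logger = logging.getLogger("lims.data_entry")  # hypothetical logger name


@dataclass
class FieldRule:
    """Hypothetical entry rule for one data element."""
    data_type: type                               # e.g. str or float
    allowed: Optional[Set[Any]] = None            # predefined values, if any
    limits: Optional[Tuple[float, float]] = None  # inclusive (low, high) QC limits


def verify_entry(entry: Dict[str, Any], rules: Dict[str, FieldRule],
                 existing_keys: Set[str]) -> List[str]:
    """Return a list of problems found in one keyed entry (empty if none)."""
    errors: List[str] = []
    # Key check: the entry must refer to a sample already registered in the system.
    sample_id = entry.get("sample_id")
    if sample_id not in existing_keys:
        errors.append(f"unknown sample_id {sample_id!r}")
    for name, rule in rules.items():
        value = entry.get(name)
        if value is None:                          # missing field
            errors.append(f"{name}: missing")
            continue
        if not isinstance(value, rule.data_type):  # data type check
            errors.append(f"{name}: expected {rule.data_type.__name__}, "
                          f"got {type(value).__name__}")
            continue
        if rule.allowed is not None and value not in rule.allowed:
            errors.append(f"{name}: {value!r} is not a predefined value")
        if rule.limits is not None:
            low, high = rule.limits
            if not low <= value <= high:
                errors.append(f"{name}: {value} outside limits [{low}, {high}]")
    for message in errors:                         # keep a log of rejected input
        logger.warning("sample %s: %s", sample_id, message)
    return errors


rules = {
    "units": FieldRule(str, allowed={"mg/L", "ug/L"}),
    "spike_recovery_pct": FieldRule(float, limits=(75.0, 125.0)),
}
registered = {"S-001", "S-002"}

print(verify_entry({"sample_id": "S-001", "units": "mg/L",
                    "spike_recovery_pct": 98.5}, rules, registered))
# []
print(verify_entry({"sample_id": "S-999", "units": "ppm",
                    "spike_recovery_pct": 140.0}, rules, registered))
```

Returning every problem at once, rather than stopping at the first, lets the data entry screen alert the operator to all errors in a single pass.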
Data Integrity (Yes/No)
1) When data are manually entered into the data base and changes are required due to clerical errors, are the changes made by
   a) the data entry operator?
   b) the data entry supervisor?
   c) the systems group?
   d) the QA group?
2) If the data are committed to the data base, can further changes be made to the data?
3) If a change is made to data after they have been committed to the data base, does the system maintain a log of
   a) who made the change?
   b) when the change was made?
   c) a record of both the unchanged and changed data?
4) If data are entered into the central data base via a data set on computer-readable media, can further changes be made to the data?
5) Is there additional information that you can provide relating to data integrity in your product?
The next series of questions is directed toward functions in the system that have the potential to modify or alter the data.
Data Reduction and Analysis (Yes/No)
1) Are the algorithms or formulas used for data manipulations performed by the system available in a written format?
2) How many data records are processed to test each algorithm?
3) Are the analysis test results documented?
4) How many data records are processed to test each validation algorithm?
5) Are the validation test results documented?
6) Are these checks done
   a) during system development?
   b) whenever changes are made in the data base?
   c) periodically by quality assurance staff?
   d) through the use of internal quality control samples?
7) If algorithms or formulas are modified
   a) is this documented?
   b) is it possible to determine which data sets were processed with which version of the calculations?
   c) are old results recalculated with new formulas? How?
   d) are changes reflected in the detailed design documentation?
Data Review (Yes/No)
1) Are there facilities to allow the analyst to examine and review results data? If yes, explain.
2) Are there facilities to allow the analyst to examine and review quality control data?
If yes, explain.
3) Are there facilities to allow the analyst to examine and review instrument calibration data? If yes, explain.
4) Do supervisors need to approve results? If so, what facilities are available to allow the analyst and supervisor to review and approve results data online?
The following questions relate to system backup and recovery in the event of a failure.
Backups/Archival
1) What areas of the system are backed up?
2) How often are backups performed?
   a) daily?
   b) weekly?
   c) monthly?
   d) other:
3) Are the backups
   a) partial?
   b) total?
4) Who is authorized to perform system backups?
5) On what media are the backups stored?
   a) magnetic tapes?
   b) disks?
   c) diskettes?
   d) other:
6) When the system is backed up, is this documented on the system log?
7) Are command files written to drive backup operations?
8) Can data and analysis programs be restored in a logically related manner so that the results may be regenerated?
Recovery From System Failure (Yes/No)
1) If the system fails due to a power failure or glitch, does the system
   a) restart automatically?
   b) have a manual restart?
   c) other:
2) Does the system lose the data being processed? If yes, how much data?
3) Does the system start from where it left off?
4) If data are lost, can the system show the loss and identify which data were lost?
5) Does the system journal?
6) Is there a recovery procedure for data retrieval?
7) Is there additional information that you can provide on data recovery in your system?
The following sections address the issue of record and data tracking in the LIMS.
Records Tracking (Yes/No)
1) Which of the following records are maintained on the data system?
   a) results of instrument calibrations?
   b) results of instrument blanks?
   c) results of additional quality control samples such as duplicates, spikes, etc.?
   d) laboratory identification of case samples?
   e) flags associated with problems found during initial sample receipt (such as missing client information, leakage, etc.)?
   f) flags associated with quality control problems?
   g) records of individuals who review data?
   h) any modifications of data flags made by data review staff?
   i) evidence that data review was completed and samples were released for reporting?
3) If the data system tracks both case samples and their associated quality control samples, is there a pointer used in the system to link the case sample with
   a) standards?
   b) blanks?
   c) instrument calibrations?
   d) instrument conditions?
   e) duplicates?
   f) spikes?
   g) internal standards in sample?
   h) surrogate standards in sample?
   i) compounds under investigation?
   j) unknown compounds found in sample?
4) Is it possible using the data system to change any of these key links (i.e., could a case sample be linked to a different quality control set than the one with which it was run)? If yes, does the system maintain a record
   a) of who made the change?
   b) of who authorized the change?
   c) of both the unchanged and changed case/quality control link?
5) What additional mechanisms are available for data and data change tracking in your product?
Records Audit (Yes/No)
1) Does the system perform any of the following data reduction functions?
   a) linear or quadratic reduction for standard curves?
   b) quantitative analysis for unknowns utilizing formulas derived in a)?
   c) flagging of data to indicate
      i) standards outside of quality control acceptance criteria?
      ii) sample results outside linear range?
      iii) sample results below detection limits?
      iv) sample results below reporting limits?
      v) blanks with compounds above acceptable limits?
      vi) comparison of duplicate results outside acceptable limits?
      vii) comparison of spiked and non-spiked samples outside acceptable limits?
      viii) other:
2) If flags are changed on the system, is documentation kept of both the changed and unchanged flags?
3) Are the flags of sufficient detail to characterize problems with the data (i.e., a flag merely setting the sample as invalid without providing detail as to the nature of the problem may not be sufficient)?
4) Are technical records maintained on the data system sufficiently complete to allow scientific review of the data?
Other
1) Do you have any suggested literature (references, meeting proceedings, etc.) on these topics?
Appendix B: Summary of Results
Features and characteristics offered by all five vendors:
• A personalized logon is required for each user.
• Each user has a password.
• The data base management system tracks changes to the data.
• Edited data are automatically flagged, and a record of the unaltered data is maintained.
• Data can be entered from a hard copy or an existing data file.
• The system alerts the data entry personnel if a detectable error is made during data entry.
• The data entry screen can be designed to match the data entry forms.
• The data can be validated by a review by the same person or by a different person.
• The system can verify data based on: data type, matches against a predefined value, a legal value assigned to the wrong unit of analysis, and quality control limits.
• When data are manually entered into the data base, changes required due to clerical errors can be made by the data entry operator, the data entry supervisor, and the quality assurance (QA) group.
• When data are entered into the central data base via a data set on a computer-readable medium, further changes can be made.
• Algorithms and formulas used for data manipulation are available in hard copy.
• Analysis and validation test results are documented.
• The analyst has facilities to examine and review results data, quality control (QC) data, and instrument calibration data.
• Data and analysis programs can be restored in a logically related manner so results can be regenerated.
• The system restarts automatically or manually after a power failure or interruption.
• The system loses the data being processed at the time of the failure.
• The system journals.
• There is a recovery procedure for data retrieval.
• The following records are maintained on the data system: results of additional quality control samples (duplicates, spikes, etc.), laboratory identification of case samples, and any modifications of data flags made by data review staff.
• The system performs the following data reduction functions: linear or quadratic reduction for standard curves, quantitative analysis for an unknown utilizing the linear and quadratic formulas, flagging of data to indicate standards outside of QC acceptance criteria, and flagging of data to indicate sample results outside the linear range.
• Technical records maintained on the data system are sufficiently complete for scientific review.
Features and characteristics offered by four of the vendors:
• Groups can have group user identification or passwords.
• The data entry individual uses a personalized logon.
• A password is required to access the data entry module.
• Data can be entered from another system or instrument.
• When data are manually entered into the data base, changes required due to clerical errors can be made by a systems group.
• Data reduction and analysis checks are done during system development.
• If algorithms or formulas are modified, it is possible to determine which set of data was processed with which version of the formulas.
• Supervisors need to approve results.
• Command files are written to drive backup operations.
• After a system failure, the system restarts where it left off.
• The following records are maintained on the data system: results of instrument calibrations, results of instrument blanks, flags associated with problems found during initial sample receipt, flags associated with quality control problems, records of individuals who review data, and evidence that data review was completed and samples were released for reporting.
• If the data system tracks both case samples and QC samples, there is a pointer to link the case sample with: standards, blanks, duplicates, spikes, internal standards in samples, surrogate standards in samples, compounds under investigation, and unknown compounds found in samples.
• The system flags data to indicate: sample results below detection limits, sample results below reporting limits, blanks with compounds above acceptable limits, comparison of duplicate results outside limits, and comparison of spiked and non-spiked results outside limits.
• If flags are changed on the system, documentation of both the changed and unchanged flags is kept.
Features and characteristics offered by three of the vendors:
• The system prevents entry of incorrect or out-of-range data.
• The system logs errors.
• The system prompts for missing fields.
• The system verifies data based on matches to keys of a preexisting record.
• Data reduction and analysis checks are done whenever changes are made in the data base.
• If algorithms or formulas are modified, this is documented; old results are recalculated with new formulas, and changes are reflected in detailed design documentation.
• If the data system tracks both case samples and QC samples, there is a pointer that links the case sample with instrument calibrations and instrument conditions.
• Flags are of sufficient detail to characterize the problems with the data.
Features and characteristics offered by two of the vendors:
• Data can be validated by re-keying by the same person or another person.
• Further changes can be made after data are committed to the data base.
• Data reduction and analysis checks are done periodically by a QA staff member and through the use of internal QC samples.
• When the system is backed up, this is documented on the system log.
• If data are lost, the system shows the loss and identifies which data elements were lost.
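Several of the data reduction functions summarized above (linear reduction of a standard curve, quantitation of unknowns from the fitted curve, and flagging of results outside the linear range or below detection or reporting limits) can be illustrated with a short sketch. It assumes an ordinary least-squares straight-line fit of instrument response against standard concentration; a quadratic curve would need a second-order fit plus a step to select the correct root, which is not shown. The names `LinearCurve`, `fit_standard_curve`, and `quantitate`, the flag codes, and the example standards and limits are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LinearCurve:
    """Fitted standard curve: response = slope * concentration + intercept."""
    slope: float
    intercept: float
    low_std: float    # lowest standard concentration (bottom of linear range)
    high_std: float   # highest standard concentration (top of linear range)


def fit_standard_curve(conc: Sequence[float], response: Sequence[float]) -> LinearCurve:
    """Ordinary least-squares fit of instrument response against standard concentration."""
    n = len(conc)
    if n < 2 or n != len(response):
        raise ValueError("need at least two paired standards")
    mean_x = sum(conc) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, response))
    slope = sxy / sxx
    return LinearCurve(slope, mean_y - slope * mean_x, min(conc), max(conc))


def quantitate(curve: LinearCurve, response: float, detection_limit: float,
               reporting_limit: float) -> Tuple[float, List[str]]:
    """Back-calculate an unknown's concentration from its response and flag it."""
    conc = (response - curve.intercept) / curve.slope
    flags: List[str] = []
    if not curve.low_std <= conc <= curve.high_std:
        flags.append("OUTSIDE_LINEAR_RANGE")
    if conc < detection_limit:
        flags.append("BELOW_DETECTION_LIMIT")
    elif conc < reporting_limit:
        flags.append("BELOW_REPORTING_LIMIT")
    return conc, flags


# Usage with made-up standards whose response is exactly 2 * conc + 1.
curve = fit_standard_curve([0.0, 1.0, 2.0, 4.0, 8.0], [1.0, 3.0, 5.0, 9.0, 17.0])
print(curve.slope, curve.intercept)                                       # 2.0 1.0
print(quantitate(curve, 7.0, detection_limit=0.1, reporting_limit=0.5))   # (3.0, [])
print(quantitate(curve, 21.0, detection_limit=0.1, reporting_limit=0.5))  # (10.0, ['OUTSIDE_LINEAR_RANGE'])
```

Taking the linear range from the lowest and highest standards is one simple convention; a laboratory's own method may define it differently.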