EPA
United States Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas NV 89193-3478

Research and Development
EPA 600/8-91/044
July 1991

Environmental Software From EMSL-LV

-------

June 1991

ENVIRONMENTAL SOFTWARE FROM EMSL-LV

Prepared by

Lockheed Engineering & Sciences Company
Environmental Programs Office
1050 E. Flamingo Rd.
Las Vegas, Nevada 89119

Contract Number 68-CO-0049

Work Assignment Manager

J. Jeffrey van Ee
Exposure Assessment Research Division
Environmental Monitoring Systems Laboratory - Las Vegas
P.O. Box 93478
Las Vegas, Nevada 89193-3478

ENVIRONMENTAL MONITORING SYSTEMS LABORATORY - LAS VEGAS
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
Las Vegas, Nevada 89193-3478

-------

NOTICE

Mention of trade names or commercial products does not constitute endorsement or recommendation for use. The information in this document has been funded wholly (or in part) by the U.S. Environmental Protection Agency under Contract No. 68-CO-0049 to Lockheed Engineering & Sciences Company. It has been subject to the Agency's peer and administrative review, and it has been approved for publication as an EPA document.

-------

TABLE OF CONTENTS

Introduction
Overview of Software Packages
ASSESS
CADRE
Geo-EAS
Hypertext
SCOUT
Survey

-------

INTRODUCTION

The Environmental Monitoring Systems Laboratory (EMSL-LV) is committed to the development of quality software that is critical to the monitoring and remediation efforts of the U.S. EPA Regions. EMSL-LV has undertaken the development of expert systems, computerized information systems, decision support systems, and "smart" advisors to provide easy access to the specialized knowledge necessary to meet the decision-making responsibilities in the monitoring and remediation of hazardous waste sites.
Our early work focused on the computerization of tasks that were previously performed manually and were labor intensive. These software packages, such as the widely used Computer-Aided Data Review and Evaluation (CADRE) program, met with great success at many levels and paved the way for further environmental software development. The next generation of computerization included software capable of addressing specialized skills such as geophysics, geostatistics, and sampling protocols. Our computer programmers work along with the scientists in these areas to ensure technical accuracy as well as ease of use. Now we are looking at the exciting applications that will be possible with emerging CD-ROM, Hypertext, and Multimedia technology.

As always, your needs guide our research. Following the series of fact sheets and informational material in this catalog, you will find a brief survey form. Please take a few minutes to complete it and return it to the EMSL-LV. We would like to learn of your interests.

Expert System and Environmental Software Development

EMSL-LV supports EPA decision-making and environmental data processing through the development of expert system and conventional software with environmental application. Several systems are already in use and others are under development. The applications targeted by EMSL-LV have a direct positive impact on the quality of the environmental data obtained by the Agency and the appropriateness of the data analysis and interpretation. This environmental software is customized to the needs of the regional user. EMSL-LV provides follow-up support to all users.

-------

OVERVIEW OF SOFTWARE PACKAGES

EMSL-LV software development efforts center on computerization of waste site characterization. Software packages can be used for the various phases of waste site evaluation. Some programs assist in the project planning while others are used for the analysis of collected data.
All of the packages contribute to the improvement of data quality. This section presents an overview of our software applications to different stages of environmental data collection projects.

Preliminary Evaluation: The Geophysics Advisor Expert System assists in the selection of geophysical monitoring methods.

Field Sampling: ASSESS assists in the planning of soil sampling events with appropriate quality control samples to determine sampling errors and their sources. A computerized Hypertext version of the document detailing the ASSESS approach, "A Rationale for the Assessment of Errors in the Sampling of Soils," is also available. The Environmental Sampling Expert System (ESES) is under development to assist in the planning of soil sampling for metal contaminants.

Sample Analysis: The Smart Method Index (SMI) aids in the selection of analytical methods. SMI provides access to data bases of chemical, physical, and radiological methods, as well as state action limits for drinking water quality parameters.

Data Validation: The Computer-Aided Data Review and Evaluation System (CADRE) performs data validation of Contract Laboratory Program organic analysis results. A CLP inorganic version is under development.

Data Analysis and Interpretation: Geo-EAS provides full geostatistical analysis capabilities for spatially-related data. The Scout software helps identify anomalous data points (outliers) in multivariate data. These packages work together and provide advanced graphic capabilities.

ASSESS, CADRE, Geo-EAS, Geophysics Advisor, the Hypertext Rationale, Scout, and SMI are currently available directly from EMSL-LV or through the Center for Environmental Research Information (CERI). To request a software package of interest, please fill out the enclosed survey sheet.
-------

Expert Systems and Environmental Software Developed by EMSL-LV

SYSTEM / FUNCTION AND FEATURES

Geo-EAS
    Performs geostatistical analysis on spatially-distributed environmental data. Includes kriging, graphics, and plotting capabilities.

SCOUT
    Assists in exploratory data analysis. Identifies multivariate outliers, determines the variable(s) in which the anomaly occurred, and displays the data set through interactive three-dimensional graphics.

ASSESS
    Calculates measurement errors for soil sampling based on results from appropriate quality assurance samples.

Rationale for Assessing Errors in the Sampling of Soils
    Explains a soil sampling quality assurance approach in a computerized document through the use of hypertext and provides access to the ASSESS software.

-------

Expert Systems and Environmental Software Developed by EMSL-LV

SYSTEM / FUNCTION AND FEATURES

Geophysics Expert Advisor
    Provides assistance on the use of geophysical monitoring methodology for hazardous waste site assessments. Uses expert system techniques for method selection.

Environmental Sampling Expert System (ESES)
    Assists in preparation of field sampling plans to measure ground-water contamination and metal pollution in soil. Combines expert system and hypertext techniques for decision support and help.

Computer-Aided Data Review and Evaluation (CADRE)
    Performs semi-automated data validation for the Superfund Contract Laboratory Program multi-method analytical results.

Smart Method Index (SMI)
    Provides natural language access to various EPA analytical method and standard data bases.

-------

United States Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas NV 89193-3478

TECHNOLOGY SUPPORT PROJECT

ASSESS: A Quality Assessment Program

INTRODUCTION

ASSESS is an interactive program designed to assist the user in statistically determining the quality of data from soil samples taken at a hazardous waste site.
EMSL-LV scientists have developed this public-domain, user-friendly Fortran program to assess precision and bias in the sampling of soils. The total error in a sampling regimen is the sum of measurement variability and natural variability of the contamination. It is the field scientist's challenge to mitigate the measurement variability by careful sample-taking, thoughtful sampling design, and the use of recommended quality assessment samples. The greatest potential for error, both random and bias, is in the sampling step. Field conditions, tool contamination, and operator differences can all affect variability and bias in a sample before it gets to the analytical step. The value of ASSESS is its ability to detect and isolate error at critical steps in the sampling and measurement function.

FEATURES

Installation is simple and is described in the User's Guide referenced at the end of this text. ASSESS plots graphics directly on the screen to give the user a quick look at data or results. All graphics can be formatted to give hard copy via pen plotters or other graphics printers. ASSESS checks for missing data and for data input errors of sufficient magnitude to fall outside numeric parameters that have been previously set. Reports and plots can be incorporated into WordPerfect.

SCREENS AND MENUS

After an introduction screen, ASSESS presents screens and menus beginning with the Data Quality Objectives (DQO) Screen. The user inputs known information about the site and sampling method and the desired confidence ranges. Next, the user may choose the Sampling Considerations Screen. This screen allows entry of further specifics about the field sampling, such as the number of samples taken, number of batches analyzed, cost, and batch data. The next screen is the Historical Assessment Screen, which provides options for entry of historical data that may be critical to the interpretation of this sampling.
A Quality Assessment Data Screen follows that allows the user to view and edit the quality assessment data that are called for in the parent document, A Rationale for the Assessment of Errors in the Sampling of Soils, referenced at the end of this fact sheet. These quality assessment samples are fundamental to the successful use of ASSESS. They include samples that will check for and evaluate error in every sampling step. At this point, it is possible to produce scatter plots to visually inspect the contribution that any particular quality assessment sample makes to the total error. Confidence in the error estimates is a function of the number of data.

The Transforms Screen follows, and it gives the user a method for applying unary or binary operations to the entire data set. For example, the field scientist or data interpreter may wish to truncate the data, view the plot as a log or ln function, or perform a basic mathematical operation on all data.

The Results Screen displays variances for sample collection, batch dissimilarity, subsampling error, and handling differences. This screen also shows the total measurement error. A report of the results and a list of historical information and the quality assessment data may be saved to a file or printed.

ASSESS is based on the use of field duplicates, splits, and performance evaluation samples that isolate and assess variability throughout the measurement process. An option is provided for the use of duplicates and splits in the calculation of variability when inadequate types and numbers of performance evaluation samples exist.

-------

DATA FILES

ASSESS incorporates simple ASCII text files that can be created with any text editor. Two output files can be produced by ASSESS: one can be read back by ASSESS as a data file, and the other, which is not ASSESS readable, gives a report-like document.
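The fact sheet states that total error is the sum of measurement variability and natural variability, and that duplicates and splits can be used to calculate variability. The sketch below illustrates that idea only; it is not the ASSESS algorithm, which is documented in the Rationale and the User's Guide. It assumes the standard duplicate-pair estimator (sum of squared pair differences divided by twice the number of pairs), and all function names and data values are hypothetical.

```python
from statistics import variance


def duplicate_pair_variance(pairs):
    """Estimate measurement variance from duplicate pairs: sum(d^2) / (2n)."""
    if not pairs:
        raise ValueError("at least one duplicate pair is required")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


# Hypothetical field-duplicate results (mg/kg); not data from any real site.
field_duplicates = [(10.0, 12.0), (20.0, 18.0), (15.0, 15.0), (30.0, 34.0)]

# Hypothetical routine sample results (mg/kg) from the same study.
routine_samples = [10.0, 20.0, 15.0, 30.0, 25.0]

measurement_var = duplicate_pair_variance(field_duplicates)
total_var = variance(routine_samples)  # sample variance of routine results

# Additive model from the text: total = measurement + natural.
# Clamp at zero because the difference of two estimates can go negative.
natural_var = max(total_var - measurement_var, 0.0)

print(f"measurement variance: {measurement_var}")
print(f"total variance:       {total_var}")
print(f"natural variance:     {natural_var}")
```

With only a handful of pairs the estimate is very uncertain, which matches the fact sheet's remark that confidence in the error estimates depends on the number of data.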
A third type is provided so that the user may edit an input file without entering all the data through ASSESS.

STATUS

ASSESS is currently available in Version 1.0. This is a prototype environmental software package. Further development is planned, and input from field scientists and EPA Regional personnel is solicited so that the next version may be more tailored to user needs. ASSESS is based on the EPA publication, A Rationale for the Assessment of Errors in the Sampling of Soils, and it is strongly recommended that users familiarize themselves with the concepts in that document before trying to apply ASSESS.

HARDWARE REQUIREMENTS

Hardware requirements for using ASSESS are:
• IBM PC (or compatible)
• 1.2 MB floppy disk drive, 5 1/4" (or 3 1/2" DD or HD)
• Minimum graphics hardware is Hercules graphics card, monochrome display with graphics capabilities, CGA and EGA
• Minimum 512 K RAM
• Math coprocessor chip is recommended but not required

REFERENCES

ASSESS User's Guide, U.S. EPA Report, EMSL-LV, in press.

van Ee, J. J., L. J. Blume, and T. H. Starks, A Rationale for the Assessment of Errors in the Sampling of Soils, EPA Report, EPA/600/4-90/013, May 1990.

FOR FURTHER INFORMATION

For copies of the ASSESS program, send preformatted floppy disks with a capacity of two 3 1/2" DD, one 3 1/2" HD, or one 5 1/4" HD to:

Mr. J. Jeffrey van Ee
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478

For general questions regarding the use of ASSESS at a site, contact:

Technology Support Center
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2270/734-3207
FTS 545-2270
FAX/FTS 545-2637

-------

United States Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O.
Box 93478
Las Vegas NV 89193-3478

TECHNOLOGY SUPPORT PROJECT

CADRE: A Data Validation Program

INTRODUCTION

The Environmental Monitoring Systems Laboratory - Las Vegas (EMSL-LV) has developed a computer software system to aid environmental scientists and data analysts in the evaluation of data generated by the Contract Laboratory Program (CLP). This system, CADRE (Computer-Aided Data Review and Evaluation), assists in the validation of results from various CLP methods.

CADRE provides data analysts with a quick and reliable method for examining data that will be used for decision making at hazardous waste sites. The program automates the phases of data validation that involve electronic-format data. The data validation process involves comparison of quality control (QC) indicators used in the analysis with pre-established data quality criteria. Non-compliant data are qualified with appropriate codes to indicate the severity of the defect. The final assessment of the data is made by the data reviewer, using the information provided by CADRE. Examples of QC parameters that are checked by CADRE are holding time, blanks, calibration, and precision.

FEATURES

CADRE can read data in several CLP electronic formats. It checks for data completeness and allows the user to edit data. After the validation is complete, CADRE reports the results.

CADRE can be customized by the user to validate data collected using several methods in the CLP. Users can configure CADRE to examine different compounds, alternate quantitation limits, or varying QC parameters. Another customization of CADRE involves changing data validation criteria to meet the needs of a modified method. The user can choose, for example, to allow a longer holding time if the compound of interest is unlikely to volatilize or degrade. The ability to modify CADRE's specific data quality codes provides the user with greater flexibility and responsibility.
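The validation step described above (compare a QC indicator against a pre-established criterion, qualify non-compliant data with a code, and let the user relax the criterion for a modified method) can be sketched in a few lines. This is a hypothetical illustration, not CADRE's implementation: the `Sample` record, the `check_holding_time` function, the 14-day and 21-day limits, and the "J" qualifier are all placeholders chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sample:
    """Hypothetical sample record; field names are illustrative only."""
    sample_id: str
    collected: date
    analyzed: date


def check_holding_time(sample, max_days, flag="J"):
    """Return a qualifier code if the holding time exceeds max_days, else ''."""
    held_days = (sample.analyzed - sample.collected).days
    return flag if held_days > max_days else ""


samples = [
    Sample("S-01", date(1991, 6, 3), date(1991, 6, 10)),  # held 7 days
    Sample("S-02", date(1991, 6, 3), date(1991, 6, 20)),  # held 17 days
]

# Placeholder criterion for a routine method: 14 days.
default_flags = {s.sample_id: check_holding_time(s, max_days=14) for s in samples}

# Placeholder relaxed criterion for a modified method: 21 days.
relaxed_flags = {s.sample_id: check_holding_time(s, max_days=21) for s in samples}

print(default_flags)
print(relaxed_flags)
```

Keeping the limit and the qualifier as parameters rather than constants mirrors the customization the fact sheet describes; the final judgment on qualified data still rests with the reviewer.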
To protect the data from tampering and from human error, a layered security system allows each user access to the program features he or she needs.

The program blends ease of use with a sophisticated screen system. Knowledge of data validation rationale and microcomputer operation is recommended for the effective use of CADRE. A user's guide, training courses, and technical user support are available from the EMSL-LV.

CLP ORGANIC VERSION

The CLP ORGANIC version of CADRE evaluates data from CLP analysis of volatile, semivolatile, and pesticide compounds. Volatile and semivolatile organic compounds are analyzed by gas chromatography/mass spectrometry (GC/MS). Pesticide analysis is a GC method.

CLP ORGANIC CADRE can be customized to evaluate modified versions of these routine analyses. It can use alternate data validation criteria selected by the user. Data can be read by CLP ORGANIC CADRE from the CLP Analytical Results Database (CARD) or from Agency standard format files.

Checks performed by CADRE include:
• quantitation limits
• holding time
• GC/MS tuning
• calibration
• internal standards
• system performance
• surrogate recovery
• matrix spike recovery
• precision of duplicates
• contamination of blanks

-------

QUICK TURNAROUND METHOD VERSION

The Quick Turnaround Method (QTM) version of CADRE reviews data obtained by the QTM methods. There are QTM methods available for VOC, PAH, phenols, pesticides, and PCB. These methods are based on the need for fast extraction and chromatographic analysis within 2 days. For speed and simplicity, QTM CADRE works in conjunction with other software for electronic data transmission from the laboratory to the user through the Agency communications network.

QTM CADRE is completely automated. The data reviewer needs only to set up the system and interpret the reports.

ADVANTAGES AND LIMITATIONS

The use of computerized data evaluation is changing the workplace for many data reviewers.
The automation of routine checks will give the individual more time to thoughtfully interpret the results. It is anticipated that increased accessibility of computer hardware to personnel will lead to greater demand for programs like CADRE that will streamline routine work. Currently, CADRE is being developed for inorganic methods.

Advantages
• Fast, complete, and consistent data validation
• Easy customization for modified methods
• Reduction of human error
• Automated report generation

Limitations
• Requires availability of powerful computer for efficient use
• Reviewer judgement needed for some decisions
• Available for CLP organic and QTM methods only
• Needs complete data set in electronic format

HARDWARE REQUIREMENTS

Hardware requirements for using CADRE are:
• IBM PC (or compatible)
• MS-DOS (or equivalent)
• Hard disk drive
• 640 K RAM

A math coprocessor chip is recommended but not required. For easy use, a mouse pointer is recommended.

REFERENCE

Simon, A. W., J. A. Borsack, S. A. Paulson, B. A. Deason, and R. A. Olivero, Computer-Aided Data Review and Evaluation: CADRE CLP Organic User's Guide, U.S. EPA, June 1991.

FOR FURTHER INFORMATION

For further information on CADRE, contact:

Mr. Gary Robertson
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2215
FTS 545-2215

For information about the Technology Support Center at EMSL-LV, contact:

Mr. Ken Brown, Manager
Technology Support Center
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2270
FTS 545-2270

-------

United States Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O.
Box 93478
Las Vegas NV 89193-3478

TECHNOLOGY SUPPORT PROJECT

Geo-EAS: Software for Geostatistics

INTRODUCTION

The Environmental Monitoring Systems Laboratory-Las Vegas (EMSL-LV) can meet the needs of scientists who work with spatially distributed data. The complexity of contaminant distribution and migration at hazardous waste sites requires a mathematical method that is capable of interpreting raw data and converting them to useful information. Geostatistics began in the mining industry and has grown to include applications ranging from microbiology to air monitoring.

Though the application of geostatistics is crucial to the delineation of buried contaminants, not every field scientist can be expected to develop customized geostatistical algorithms for individual sites. Geostatisticians at the EMSL-LV developed a software package, Geo-EAS, in 1988. The current version, Geo-EAS 1.2.1, was released in 1990. This program offers the environmental scientist an interactive tool for performing two-dimensional geostatistical analyses of spatially distributed data.

Geostatistical methods are useful for site assessment and monitoring where data are collected on a spatial network of sampling locations. Examples of environmental applications include lead and cadmium concentrations in soils surrounding smelters, and sulfate deposition in rainfall.

THE METHODOLOGY

Kriging is a weighted moving average method used to interpolate values from a data set onto a contouring grid. The kriging weights are computed from a variogram, which measures the correlation among sample values as a function of the distance and direction between samples.

Kriging has a number of advantages over other interpolation methods:

Smoothing: Kriging regresses estimates based on the proportion of total sample variance accounted for by random noise. The noisier the data set, the less representative the samples and the more they are smoothed.
Declustering: The kriging weight assigned to a sample is lowered to the degree that its information is duplicated by highly correlated samples. This helps mitigate the impact of oversampling hot spots.

Anisotropy: When samples are highly correlated in one direction, kriging weights will be greater for samples in that direction.

Precision: Given a variogram representative of the area to be estimated, kriging will compute the most precise estimates from the data.

Estimation of the variogram from sample data is a critical part of a geostatistical study.

Geo-EAS is designed to make it easy for the novice to use geostatistical methods and to learn by doing. It also provides sufficient power and flexibility for the experienced user to solve practical problems.

EQUIPMENT REQUIREMENTS

Geo-EAS was designed to run under DOS on an IBM PC, XT, AT, PS/2, or compatible computer. Graphics support is provided for Hercules, CGA, and EGA. At least 512 Kb of RAM is required, but 640 Kb is recommended. An arithmetic co-processor chip is strongly recommended due to the computationally intensive nature of the programs, but is not required. Programs may be run from floppy disk, but a fixed disk is required to use the programs from the system menu. The system storage requirement is approximately three megabytes. For hardcopy, a graphic printer is required. Support is provided for most plotters.

Design features such as simple ASCII file formats and standardized menu screens give Geo-EAS flexibility for future expansion. It is anticipated that Geo-EAS will become a significant technology transfer mechanism for more advanced methods resulting from the EMSL-LV research and development programs. Geo-EAS software and documentation are public domain, and may be copied and distributed freely.

-------

MAPS AND MENUS

The Geo-EAS programs use an ASCII file structure for input. The files contain a header record, the number of variables, a list of variable names and units, and a numeric data table.
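The file structure just described (header record, number of variables, variable names and units, numeric table) is simple enough to read with a short routine. The following is a minimal sketch, not code from Geo-EAS. The 10-column name field follows the description of this format in the Scout fact sheet in this catalog; treating columns 11 to 20 as the units field and the table as whitespace-separated numbers are assumptions, and the function name and sample values are hypothetical.

```python
def parse_geoeas(text):
    """Parse Geo-EAS style text into (title, names, units, rows)."""
    lines = text.splitlines()
    title = lines[0].strip()              # header record
    nvar = int(lines[1].split()[0])       # number of variables
    names, units = [], []
    for raw in lines[2:2 + nvar]:
        names.append(raw[:10].strip())    # assumed: name in columns 1-10
        units.append(raw[10:20].strip())  # assumed: units in columns 11-20
    rows = []
    for raw in lines[2 + nvar:]:
        if not raw.strip():
            continue                      # skip blank lines
        values = [float(v) for v in raw.split()]
        if len(values) != nvar:
            raise ValueError(f"expected {nvar} values, got {len(values)}")
        rows.append(values)
    return title, names, units, rows


# Hypothetical file contents; the values are illustrative only.
SAMPLE = "\n".join([
    "Hypothetical soil survey (illustrative values only)",
    "3",
    "Easting".ljust(10) + "m",
    "Northing".ljust(10) + "m",
    "Pb".ljust(10) + "mg/kg",
    "100.0 200.0 35.5",
    "150.0 250.0 48.2",
])

title, names, units, rows = parse_geoeas(SAMPLE)
print(title)
print(names, units)
print(rows)
```

The Geo-EAS User's Guide is the authority on the exact column layout and on how missing values are coded; this sketch handles neither missing-value codes nor malformed headers.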
All Geo-EAS programs are controlled interactively through menu screens which permit the user to select options and enter control parameters. The programs are structured to avoid a "black box" approach to data analysis. Several of the more complex programs permit the user to save and read parameter files, making it easy to rerun a program.

The programs DATAPREP and TRANS provide capability for manipulating Geo-EAS files. Files can be appended or merged, and variables can be created, transformed, or deleted. Transformation operations include natural log, square root, rank order, indicator, and arithmetic operations.

POSTPLOT creates a map of a data variable in a Geo-EAS data file. Symbols representing the quartiles of the data values, or the values themselves, are plotted at the sample locations.

STAT1 computes univariate statistics, such as mean and standard deviation, for variables in a Geo-EAS data file, and plots histograms and probability plots.

SCATTER and XYGRAPH both create x-y plots with optional linear regression for any two variables in a Geo-EAS file. SCATTER is useful for quick exploratory data analysis, while XYGRAPH provides additional capabilities such as multiple "y" variables and scaling options.

PREVAR creates an intermediate binary file of data pairs for use in VARIO, which computes and displays plots of variograms for specified distance and directional limits. Variogram models can be interactively fitted to the experimental points. The fitted model may be the sum of up to five independent components, which can be any combination of nugget, linear, spherical, exponential, or Gaussian models.

XVALID is a cross-validation program which can test a variogram model by estimating values at sampled locations from surrounding data and comparing the estimates with known values.

KRIGE provides kriged estimates for a two-dimensional grid of points.
A shaded map of estimated values is displayed on a Geo-EAS file of kriged grid results.

CONREC generates contour maps from a gridded Geo-EAS data file, usually the output from KRIGE. Options are provided for contour intervals and labels and degree of contour line smoothing.

REFERENCE:

Isaaks, E. H. and R. M. Srivastava, An Introduction to Applied Geostatistics, Oxford University Press, New York, 1989.

AVAILABILITY:

For further information about Geo-EAS, contact Dr. Evan Englund. Government agencies and academic or research institutions can obtain a copy of Geo-EAS with User's Guide at no charge by sending three pre-formatted high-density diskettes (5-1/4" or 3-1/2") to:

Dr. Evan J. Englund
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory-Las Vegas
P.O. Box 93478
Las Vegas, NV 89193-3478
FAX: (702) 798-2248
FTS: 545-2248

Others can obtain a copy for a distribution charge of approximately $45 (includes diskettes, User's Guide, and USA shipping) from either:

ACOGS
P.O. Box 44247
Tucson, AZ 85733-4247
FAX: (602) 327-7752

or

COGS
P.O. Box 1317
Denver, CO 80201-1317
Phone: (303) 751-8553

FOR FURTHER INFORMATION:

For information about the Technology Support Center at EMSL-LV, contact:

Mr. Ken Brown, Manager
Technology Support Center
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory-Las Vegas
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2270

-------

United States Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas NV 89193-3478

TECHNOLOGY SUPPORT PROJECT

Hypertext: A Showcase for Environmental Documents

INTRODUCTION

The amount of "required reading" for those engaged in hazardous waste site remediation is overwhelming. Documents pile up, often leaving the scientist no option but to briefly review the abstract or the executive summary.
Fortunately, there exists a computer software tool, hypertext, that allows for documentation on disk that can provide all readers/users with various layers of information. The tiered knowledge in hypertext makes it ideal for experts in the field of the publication, who can scan through the general information and concentrate on a particular section. It is also suited to the novice in the document's area, who can access highlighted areas for in-depth definitions of unfamiliar terms, full-screen presentations of tables and figures, and references to ancillary works. Hypertext is an easy-to-use, timesaving reading tool for the overburdened scientist. The ability to read an electronic book helps each reader optimize the information-time ratio.

Scientists at the EMSL-LV have used hypertext on a frequently used document, "A Rationale for the Assessment of Errors in the Sampling of Soils" by J. Jeffrey van Ee, Louis J. Blume, and Thomas H. Starks. The original hardcopy document is about 60 pages long, and contains 4 figures and 8 tables. The document also contains several formulas that may be unfamiliar to many users. The hypertext version fits on a floppy disk, keeps general information "hidden" unless it is requested by a novice user, and highlights frequently used tables for easy access.

Hypertext can be applied to any document that exists in digital form. The level of hypertext a document needs depends on the complexity and length of the original document and the anticipated expertise of the reading audience.

THE RATIONALE DOCUMENT

The Rationale mentioned above addresses the complexity of the sampling and analysis of soils for inorganic contaminants, from experimental design to the final evaluation of all generated data. Sources of error abound, but they can be successfully mitigated by careful planning or isolated by intelligent error assessment. Error can be either biased or random.
Biased error is indicative of a systematic problem that can exist in any sector of soils analysis, from sampling to data analysis. The first step in analysis of variability is to establish a plan that will identify errors, trace them to the step in which they occurred, and account for variabilities to allow direct corrective action to eliminate them. Error assessment should be understood by the field scientist and the analyst.

To implement the ideas in the Rationale document and aid scientists in the estimation and evaluation of variability, the EMSL-LV has developed a computer program called ASSESS. By applying statistical formulas to quality assurance data entered, ASSESS can trace errors to their sources and help scientists plan future studies that avoid the pitfalls of the past.

HOW HYPERTEXT WORKS

Scientists at the EMSL-LV took the disk containing the Rationale document and extracted sections such as the Table of Contents, tables, figures, and certain equations and formulas. These sections appear separately when selected in the new hypertext version. Then, throughout the document, certain words and phrases were highlighted so definitions can be accessed by a keystroke.

When a reader receives a hypertext document on disk, he or she can look at the Table of Contents and decide which sections to read. By selecting, for example, the section entitled "Background", the reader can be briefed on the scope of the document. A term within the Background section, e.g., "representative", may be highlighted. Readers wishing the definition of "representative" as used in this document may get an immediate clarification. In traditional (linear) hardcopy documents, a reader must either wait for the definition to be clarified in text or seek an external definition through outside reference materials.
BRIDGE TO ASSESS

The Rationale document is the basis for an EMSL-LV environmental software program called ASSESS. The philosophy and statistical background in the document is exercised practically with ASSESS, which is also available on disk. The hypertext version of the Rationale document prepares the reader to use ASSESS and also serves as a physical link to the program. The last item on the Rationale document hypertext menu is "ASSESS". After becoming familiar with the concepts in the document, the user may select "ASSESS" to begin to use the software.

This hypertext linkage of two or more documents or programs can simplify and clarify many software applications for novice users. By providing ASSESS users with the technical background in its development and Rationale document readers with a viable program, hypertext serves all levels of users in error-tracing in the complex application of soil sampling.

ADVANTAGES AND LIMITATIONS

Increased availability of computer workstations and the development of user-friendly programs have made hypertext an almost unqualified bonus to busy readers/users. Hypertext is easily and effectively used for: acronyms and abbreviations, terms and phrases, tables and figures, graphics, formulas and references.

Advantages
• Streamlined and non-interruptive
• Linkage to other hypertext documents
• Time-saving for expert; instructional for novice

Limitations
• Availability of computer with appropriate hardware
• Some computer literacy required

HARDWARE REQUIREMENTS

Hardware requirements for using this hypertext package are:
• IBM PC (or compatible)
• 1.2 MB floppy disk drive, 5 1/4" (or 3 1/2" DD or HD)
• Minimum graphics hardware card, monochrome display with graphics capabilities, VGA and EGA
• Minimum 640 K RAM
• Math coprocessor chip is recommended but not required

REFERENCES

Text, ConText, and Hypertext: Writing with and for the Computer, E. Barrett, ed., The MIT Press, 1988.

van Ee, J. J., L. J. Blume, and T. H.
Starks, A Rationale for the Assessment of Errors in the Sampling of Soils, EPA Report, EPA/600/4-90/013, May 1990.

FOR FURTHER INFORMATION

For more details on Hypertext and the Rationale document, contact:

Mr. J. Jeffrey van Ee
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2367

For information about the Technology Support Center at EMSL-LV, contact:

Mr. Ken Brown, Manager
Technology Support Center
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2270

-------

TECHNOLOGY SUPPORT PROJECT
Scout: A Data Analysis Program

INTRODUCTION

The complexities of correct data interpretation challenge environmental scientists everywhere. Environmental software packages have been developed to address the various needs of data analysts and decision makers. One frequent need is for the reliable determination of outliers in a data set. Scout is a program developed to identify multivariate or univariate outliers, to test variables for lack of normality, to graph raw data and principal component scores, and to provide output of the results of principal component analysis. Scout provides interactive graphics in two and three dimensions. There are many advantages of a graphical display of data in a multidimensional format: it allows a quick visual inspection of data, it accentuates obvious outliers, and it provides an easy means of comparing one data set with another. Scout has the flexibility to allow viewing and limited editing of a data set. Scout features on-line help, with a "built-in" users guide. Scout is a valuable addition to the library of environmental software packages available from the EMSL-LV.
FEATURES/SPECIFICATIONS

Scout is a public domain, Turbo Pascal program that is user friendly and menu driven. Scout reads ASCII data files that are in Geo-EAS format. The first line of a Geo-EAS data file is a comment line, generally used to describe the origin of the data. The second line of the file must contain the number of variables, always a number greater than or equal to 1 and less than or equal to 48. The next lines contain variable names in the first 10 columns and the associated values in the next 10 columns.

Scout is compatible with most IBM personal computers that have an EGA, VGA, or Hercules graphics system. Scout will run with or without a math co-processor, but this feature is preferred for handling floating point calculations. A fixed disc drive is strongly recommended because Scout performs many transfers between memory and disc during execution. On-line help is available throughout Scout and the user can access it by selecting the "System" option in the main menu and then selecting "Information".

MENUS

There are five menus in Scout: file management, data management, outliers, principal components analysis, and graphics. After the introduction screen, the user should choose the "File Management" option on the main menu. This option allows the user to load the Scout data file or read an ASCII data file and to access various subdirectories of data. Scout saves data files in two formats: binary and the Geo-EAS ASCII format. Scout has the ability to search for file names, including wild cards. The current search string is printed at the top of the window. Other options in this area include "Write ASCII Data File" for saving the Scout file and "Merge Two Data Files" for combining two files into one.

The second menu is "Data Management", which includes options for editing data, variables, and observations. This menu also displays summary statistics, such as mean, standard deviation, and variance.
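To make the file layout and the summary statistics above more concrete, here is a minimal Python sketch of a reader for a Geo-EAS style file. It is not Scout's Turbo Pascal code. It assumes that the variable-name lines are followed by one line per observation holding one whitespace-separated number per variable, which the description above does not spell out, and it ignores whatever occupies columns 11-20 of each name line. The function names read_geo_eas and summarize and the sample data are hypothetical, and the sample (n - 1) forms of variance and standard deviation are used without knowing which forms Scout reports.

```python
import io
import statistics


def read_geo_eas(stream):
    """Read a Geo-EAS style ASCII file into (comment, names, columns)."""
    comment = stream.readline().rstrip("\n")      # line 1: comment line
    nvar = int(stream.readline().split()[0])      # line 2: number of variables
    if not 1 <= nvar <= 48:
        raise ValueError("number of variables must be between 1 and 48")
    # Variable names occupy the first 10 columns of each header line;
    # anything in columns 11-20 is ignored in this sketch.
    names = [stream.readline()[:10].strip() for _ in range(nvar)]
    columns = {name: [] for name in names}
    # Assumption: each remaining non-blank line is one observation,
    # with one whitespace-separated numeric value per variable.
    for line in stream:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != nvar:
            raise ValueError("expected %d values, got %d" % (nvar, len(fields)))
        for name, value in zip(names, fields):
            columns[name].append(float(value))
    return comment, names, columns


def summarize(values):
    """Mean, sample standard deviation, and sample variance."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "variance": statistics.variance(values),
    }


# Made-up example data, for illustration only.
sample = io.StringIO(
    "Hypothetical soil data\n"
    "2\n"
    "Pb        ppm\n"
    "Cd        ppm\n"
    "10.0 1.0\n"
    "12.0 2.0\n"
    "14.0 3.0\n"
)
comment, names, columns = read_geo_eas(sample)
for name in names:
    print(name, summarize(columns[name]))
```

Reading from a stream rather than a file name keeps the sketch self-contained; passing an open file object works the same way.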
Additionally, there is a "Transform" option which allows the user to test each variable for lack of normality, based on the Kolmogorov-Smirnov test at the five percent significance level. The critical value, test statistic, and apparent conclusion are displayed. The Anderson-Darling test is also performed and a horizontal histogram is displayed at the bottom of the screen.

Menu three is "Outliers", which applies two powerful tests for discordancy to the data: the (Mahalanobis) generalized distance and Mardia's multivariate kurtosis test. After selecting "Outliers", the user can tell Scout which variables to test, or use the default wherein Scout tests all variables. The user must then decide to use the generalized distance test or Mardia's kurtosis. If a large proportion of the data is identified as discordant, the user should be cautious that the problem may be due to lack of multinormality. The outlier report may be displayed, sent to a file, or printed. By selecting "Causal Variables" the user can test each variable for its contribution to the discordant nature of the outlier. This option can trace some independent errors, such as typographical or transcription errors.

The fourth menu is "Principal Component Analysis", which allows the user to select the variables to be used and to display covariance or correlation. By choosing the "View Components" option, the user can view the eigenvectors and eigenvalues of the PCA. Scout will prompt the user to specify whether or not to include previously determined outliers. The user can graph the component scores, which are products of the eigenvectors and the standardized observation vectors. A "Transform Data" option is available to change the data in memory from observations to component scores.

The fifth, and final, menu is "Graphics", which features two graphics systems: two-dimensional and three-dimensional.
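Before turning to the graphs, it may help to see what the generalized distance named under the "Outliers" menu measures. The Python sketch below computes the squared Mahalanobis distance of each observation from the sample mean, using the sample covariance matrix, and reports the most distant observation. It is an illustration under stated assumptions, not Scout's implementation: the function names and the data are made up, and no critical value is applied because the text does not say which one Scout uses.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, p))
        x[i] = (aug[i][p] - tail) / aug[i][i]
    return x


def squared_distances(rows):
    """Squared Mahalanobis distance of every observation from the mean."""
    mean = mean_vector(rows)
    cov = covariance_matrix(rows, mean)
    result = []
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        weighted = solve(cov, dev)  # covariance inverse times deviation
        result.append(sum(d * w for d, w in zip(dev, weighted)))
    return result


# Made-up two-variable data; the last observation breaks the trend.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (3.0, 12.0)]
d2 = squared_distances(data)
suspect = max(range(len(data)), key=lambda i: d2[i])
print("most distant observation:", data[suspect], round(d2[suspect], 2))
```

Because the distance accounts for the correlation between variables, the last point stands out even though neither of its coordinates is extreme on its own, which is also why such points are easier to see in a scatter plot than in a table.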
The two-dimensional system is used to display scatter plots and x-y plots. The three-dimensional system is used to display three variable plots, which can be rotated to illustrate the added dimension. The user can modify graph colors and shapes. Graphics screens may be saved by writing to a file on disk. The user can change the size of the graph by zooming in or out using the "+" or "-" keys. The four arrow keys are used to rotate the graph. The left and right arrows rotate the graph around the Z axis. The up and down arrows rotate the graph around an imaginary horizontal axis that passes through the origin. Another feature, "Search Observation Mode", is available and allows users to identify the individual observations shown on the graph.

REFERENCES

Massart, D. L., B. G. M. Vandeginste, S. N. Deming, Y. Michotte, and L. Kaufman, Chemometrics: A Textbook, Volume 2 in the Series "Data Handling in Science and Technology", B. G. M. Vandeginste and L. Kaufman, eds., Elsevier, Amsterdam, the Netherlands, 1988.

Garner, F. C., M. A. Stapanian, and K. E. Fitzgerald, Finding Causes of Outliers in Multivariate Data, J. Chemometrics, in press.

FOR FURTHER INFORMATION

For copies of the Scout program, contact:

Mr. Jack Teuschler
U.S. Environmental Protection Agency-CERI
26 King Drive
Cincinnati, OH 45268
(513) 569-7314
FTS 684-7314

For additional technical information about Scout, contact:

Dr. George Flatman
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2628
FTS 545-2628

For information about the EMSL-LV Technology Support Center, contact:

Mr. Ken Brown, Manager
Technology Support Center
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2270
FTS 545-2270

-------

SURVEY

1. I am familiar with (check appropriate box):
[ ] ASSESS      [ ] Geophysics Advisor
[ ] CADRE       [ ] Hypertext
[ ] Geo-EAS     [ ] SCOUT

2.
I would be interested in receiving a copy of the software package, and my preformatted disk(s) are enclosed.

3. My job is mainly in the area(s) of:
[ ] Geology     [ ] Computer Programming
[ ] Chemistry   [ ] Statistics
[ ] Biology     [ ] Sampling
[ ] Other, please specify:

4. The software program that would best help me in my job would be:

5. My computer is a:

6. You may contact me at:
Name:
Address:
State/Zip Code:
Phone #:
Fax #:
E-Mail:

Please return this survey and direct any questions to:

Mr. J. Jeffrey van Ee
U.S. Environmental Protection Agency
Environmental Monitoring Systems Laboratory
P.O. Box 93478
Las Vegas, NV 89193-3478
(702) 798-2367