United States Environmental Protection Agency
Robert S. Kerr Environmental Research Laboratory
Ada, OK 74820

Research and Development
EPA/600/S8-90/004
May 1990

Project Summary

Geostatistics for Waste Management: A User's Manual for the GEOPACK (Version 1.0) Geostatistical Software System

S.R. Yates and M.V. Yates

GEOPACK, a comprehensive, user-friendly geostatistical software system, was developed to help in the analysis of spatially correlated data. The software system was developed to be used by scientists, engineers, regulators, etc., with little experience in geostatistical techniques and still satisfy the requirements of more advanced users. By using GEOPACK, and spending a little time becoming familiar with geostatistics, end-users will be able to include these geostatistical techniques in their work and research environments.

This Project Summary was developed by EPA's Robert S. Kerr Environmental Research Laboratory, Ada, OK, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction

Using the geostatistical techniques in the analysis of spatially correlated data generally requires the use of a computer to handle the large number of samples and carry out the lengthy calculations. Unless one knows someone who is willing to provide the necessary computer programs, one is faced with the difficult task of finding, purchasing or developing the required computer software. Although there are a number of practicing geostatisticians who undoubtedly have access to the necessary programs, these programs are not generally available or they are proprietary codes. Often, the programs which are developed for research purposes are subject to limited availability and are difficult for others to use or modify for purposes other than those for which they were originally designed.
GEOPACK was developed with the philosophy that geostatistical software is needed that can be used by individuals with a minimum level of geostatistical expertise and yet can also satisfy the needs of more sophisticated users. The specific objectives in developing this program were to:

(1) develop geostatistical software for individuals without a great deal of geostatistical training and allow those individuals to learn these techniques and eventually use them in their work environment;
(2) develop a system which is adaptable in the sense that additional programs could be incorporated into the system at a later date without having to alter previous programs or recompile the entire system;
(3) develop programs which produce graphic output in a variety of forms and of publishable quality to meet the needs of research scientists and engineers;
(4) include on-line help facilities and extensive error checking in the programs.

The on-line help facilities offer information concerning the operation of the system, its capabilities and limitations, how to alter the system, as well as programming conventions and definitions. GEOPACK allows the incorporation of other programs, such as the GEO-EAS (EPA/600/4-88/033) system. Examples showing how this software can be used in the analysis of spatially correlated data can be found in the GEOPACK user's manual.

Basic System Description

The GEOPACK system includes programs to do the more common statistical and geostatistical analyses. The system is estimation oriented in that if the ordering in the menu system is followed, a grid of estimates for the selected variable in the data set will result. A description of the various components of the system follows.

Basic Statistics

Basic statistics such as the mean, median, variance, standard deviation, skew, kurtosis and maximum and minimum values can be determined for the selected data set.
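As a rough illustration of the basic statistics listed above, the following Python sketch computes them for a small set of values. It is not GEOPACK code. The function name basic_statistics and the sample values are hypothetical, and the conventions chosen here are assumptions, since this summary does not state which definitions GEOPACK uses: the variance and standard deviation use an n - 1 divisor, and the skew and kurtosis are the moment-based forms for which a normal distribution has a kurtosis of 3.

```python
import statistics


def basic_statistics(values):
    """Return summary statistics for a sequence of numbers.

    Assumed conventions (not taken from GEOPACK): sample variance with
    an n - 1 divisor; skew and kurtosis from central moments divided by n.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = statistics.fmean(values)
    # Central moments about the mean, each divided by n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("skew and kurtosis are undefined for constant data")
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "minimum": min(values),
        "maximum": max(values),
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    for name, value in basic_statistics(sample).items():
        print(f"{name:>9}: {value:.4f}")
```

Other packages may report excess kurtosis (the value above minus 3) or bias-corrected skew, so results should be compared only after checking which convention is in use.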
Routines are also available for linear regression, polynomial regression, the Kolmogorov-Smirnov test for distribution and calculating the percentiles of the data set. A user-supplied statistics package can be incorporated into GEOPACK to allow the user to access the comprehensive statistical analyses that are contained in many commercial statistics packages.

Variography

The sample semivariogram or cross-semivariogram for a two-dimensional spatially-dependent random function can be determined. The approach used in determining the sample semivariogram is similar to that outlined in Journel and Huijbregts (1978). A model can be fitted to the sample semivariogram using the nonlinear least-squares fitting procedure of Marquardt (1963). This provides a first estimate for the coefficients to be used in a cross-validation program and helps to automate the model-fitting procedure. If the least-squares technique fails, or other information is available which should be included in the model-fitting process, the traditional iterative method of selecting the model coefficients and viewing a graph comparing the sample values to the model can be utilized.

Linear Estimation

GEOPACK includes programs to calculate the ordinary kriging and cokriging estimators in two dimensions along with their associated estimation variance. The programs include punctual and block kriging and geometric anisotropy. There is a cross-validation option which uses the kriging estimator in a jackknifing mode to cross-validate the spatial correlation structure. It is possible to include indicator kriging in an analysis by first transforming the data.

Nonlinear Estimation

Nonlinear estimators such as the disjunctive kriging and cokriging estimators can be determined along with the estimation variance and the conditional probability that the value is greater than a specified cutoff level. Up to 10 cutoff levels are allowed.
As with the linear estimation method, this type of analysis can be done on punctual or block support and may include anisotropy.

Help Facilities

The program includes on-line help facilities to provide the user with information concerning the operation of the program, data requirements, conventions, definitions, and the run-time errors, missing files, etc. that are encountered during execution. At the main menu level, the help information is of a general nature. During execution of a program, the help is more specific, such as defining a term. Virtually all the information needed to operate or modify the system is available from the HELP facility.

Other Features of GEOPACK

The program also includes various graphics capabilities such as linear or logarithmic line plots, contour and pixel diagrams. The program can be interfaced with any user-supplied graphics package so that custom diagrams can be developed.

GEOPACK uses dynamic allocation of memory so that data sets with a wide range of variables and positions can be used without having to alter the program. A large storage array is partitioned based on the number of samples and variables so that there is little wasted space compared to defining the arrays to have a fixed size. One limitation is that GEOPACK allows a maximum of 10 variables plus their x and y positions and a sample or position number. If an array must be created by a program, the space is obtained from the large storage array. If attempts are made to use more memory than is available, an error message is printed out giving the memory status. From this information, a decision can be made on how to reduce the memory needs to allowable limits.

GEOPACK uses data in a standard ASCII format for data input. Data can be entered with any program (data base or spreadsheet) or word processor that supports ASCII format. There is a seven-line header associated with each data file.
This header consists of three lines of title information, the number of random variables, the total number of samples, the names to use for the random variables, and a format specifier which describes the way the data is to be read into GEOPACK. The format specifier follows the ANSI FORTRAN convention. A sample data file is shown later in this summary.

Program Structure

The program has been structured to enable the addition of programs by end-users. This has led to the development of a menu system from which a particular program is executed. Part of the system is hard coded and cannot be changed; but by using what is termed a USER'S menu, a program or another user-defined menu can be added to the system at any time. The user menu is accessed from the F5 key and reads the instructions contained in a data file. The data file can be modified to include a different set of instructions, and thus allow the system to be modified to suit the end-user's needs. Through this menu, the end-user can add any number of programs, menus and subdirectories to the system.

Program Utilities

Many utility programs are included enabling the user to access a variety of information and other computer functions while using GEOPACK. A sample of the utilities available from within GEOPACK are those to 1) select, edit or modify an existing data set, 2) pack the contents of the temporary directory into a compressed format for later use or extract all or some of the files contained in a compressed file, 3) display the program structure, 4) temporarily leave the program in a DOS shell, 5) execute a DOS command, and 6) view a file. Also included are a number of utilities which help the user to tailor GEOPACK to specific needs by facilitating the passage of information to the new user-supplied applications.
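The seven-line header described under Other Features of GEOPACK can be read along the following lines. This is a minimal Python sketch, not GEOPACK code, and it rests on several assumptions that this summary does not confirm: each header item occupies its own line, the variable names share one line separated by blanks, and each data row holds a position number, x, y and one value per variable separated by blanks. GEOPACK itself reads the rows with the FORTRAN format specifier, which the sketch stores but does not interpret. The names GeopackHeader, read_header and read_records are hypothetical, and the title lines in the example are invented; the numeric values are borrowed from the sample data file.

```python
from dataclasses import dataclass


@dataclass
class GeopackHeader:
    titles: list          # three lines of title information
    n_variables: int      # number of random variables
    n_samples: int        # total number of samples
    names: list           # names of the random variables
    fortran_format: str   # kept verbatim, not interpreted here


def read_header(lines):
    """Split the first seven lines of a data file into the header parts."""
    if len(lines) < 7:
        raise ValueError("a seven-line header is expected")
    header = GeopackHeader(
        titles=[line.rstrip() for line in lines[:3]],
        n_variables=int(lines[3]),
        n_samples=int(lines[4]),
        names=lines[5].split(),
        fortran_format=lines[6].strip(),
    )
    if len(header.names) != header.n_variables:
        raise ValueError("number of names differs from number of variables")
    return header


def read_records(lines, header):
    """Read rows as (position number, x, y, {variable name: value}).

    Assumption: fields are separated by blanks rather than read with
    the FORTRAN format stored in the header.
    """
    rows = lines[7:7 + header.n_samples]
    if len(rows) != header.n_samples:
        raise ValueError("fewer data rows than the header announces")
    records = []
    for row in rows:
        fields = row.split()
        if len(fields) != 3 + header.n_variables:
            raise ValueError(f"unexpected number of fields: {row!r}")
        position = int(float(fields[0]))
        x, y = float(fields[1]), float(fields[2])
        values = dict(zip(header.names, map(float, fields[3:])))
        records.append((position, x, y, values))
    return records


# Hypothetical two-position file; title lines are invented for illustration.
SAMPLE_FILE = """\
Hypothetical example file
Numeric values borrowed from the sample data file
Only two positions are listed
4
2
MOIST TEMP SAND OIL-%
(G5.0,12F10.3)
    1    6.0000    7.0000   46.8500  999.9990   56.5102    6.5362
    2    6.0000   10.0000   46.2900    5.9250   55.6444    5.2454
"""

if __name__ == "__main__":
    file_lines = SAMPLE_FILE.splitlines()
    header = read_header(file_lines)
    for record in read_records(file_lines, header):
        print(record)
```

Keeping the format specifier as plain text avoids guessing at FORTRAN edit descriptors; a fixed-width reader driven by that specifier would be closer to what GEOPACK does.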
Sample data file:

    This is a typical data file.
    There are 4 random variables: MOIST, TEMP, SAND and OIL-%.
    There are 119 positions where data was collected (only 4 positions are shown).
    4
    119
    MOIST TEMP SAND OIL-%
    (G5.0,12F10.3)
        1    6.0000    7.0000   46.8500  999.9990   56.5102    6.5362
        2    6.0000   10.0000   46.2900    5.9250   55.6444    5.2454
      ...
      118   24.0000   21.0000   46.3500  999.9990   54.4012    4.0463
      119   22.0000   24.0000   47.1400  999.9990   52.5845    2.5345

Computer Requirements

The programs have been written in a combination of the FORTRAN and C programming languages and run on IBM-compatible microcomputers such as the PC-AT, Compaq-286, -386, Zenith, etc., using an MS-DOS operating system (ideally version 3.2 or greater) and 640 K of memory. (PC-XT and compatibles are not recommended for GEOPACK.) A math coprocessor is recommended but not required. This is a mathematically intensive system and calculations will be significantly faster if a math coprocessor is installed. The system can support the use of a virtual disk (RAM disk) as the temporary storage device. GEOPACK also requires that the ANSI.SYS driver be installed if the screen output is to perform properly.

Because of its integrated nature, GEOPACK requires a hard disk with about 4 Mbytes of free space. A graphics monitor is required due to the graphical nature of the output. GEOPACK supports a CGA, EGA, VGA or HERCULES graphics adapter and the appropriate monochrome or color monitor. The system includes a graphics program for the printing of graphical images and supports the following devices: HPGL compatible plotters, HPCL compatible printers, and Epson compatible dot matrix printers.

Software Availability

The GEOPACK system in its executable form is in the public domain and can be obtained by sending four preformatted diskettes [either 5.25 inch high density (1.2Mb) or 3.5 inch high density (1.44Mb)] to: GEOPACK Distribution, US EPA, Robert S. Kerr Environmental Research Laboratory, P.O.
Box 1198, Ada, OK 74821-1198. Telephone: (405) 332-8800

References

Journel, A. G., and Ch. J. Huijbregts, Mining Geostatistics, Academic Press, New York, 1978.
Marquardt, D. W., An Algorithm for Least-squares Estimation of Non-linear Parameters, J. Soc. Ind. Appl. Math., 11:431-441, 1963.

S. R. Yates is with the U.S. Salinity Laboratory, Riverside, CA 92501, and M. V. Yates is with the University of California, Riverside, CA 92521.
David M. Walters is the EPA Project Officer (see below).
The complete report, entitled "Geostatistics for Waste Management: A User's Manual for the GEOPACK (Version 1.0) Geostatistical Software System," (Order No. PB 90-186 420/AS; Cost: $17.00, subject to change) will be available only from:
    National Technical Information Service
    5285 Port Royal Road
    Springfield, VA 22161
    Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
    Robert S. Kerr Environmental Research Laboratory
    U.S. Environmental Protection Agency
    Ada, OK 74820