User's Guide for the ECOSAR Class Program
MS-Windows Version 0.99d

November 1998

prepared by:
William M. Meylan and Philip H. Howard
Syracuse Research Corporation
Environmental Science Center
6225 Running Ridge Road
North Syracuse, NY 13210

prepared for:
J. Vincent Nabholz and Gordon Cash
Risk Assessment Division (7403)
U.S. Environmental Protection Agency
401 M St., SW
Washington, DC 20460

Table of Contents

1. INTRODUCTION
2. COMPUTER-SOFTWARE REQUIREMENTS
3. INSTALLING the ECOSAR Class Program
4. STARTING ECOSAR Class Program
5. DATA ENTRY and EDIT KEYS
   5.1. Entering Data
        5.1.1. SMILES Notation
        5.1.2. Individual Data Entry Fields
   5.2. Function Keys & Buttons
   5.3. Importing Structures
6. RESULTS
   6.1. Structure Window
   6.2. Konemann Equation
   6.3. Example SAR Equations
7. BATCH RUNS
   7.1. Batch Output Formats
8. SPECIAL CLASS Calculations
9. BIBLIOGRAPHY
APPENDIX A - Selected SMILES Information
APPENDIX B - Description of User Input File
APPENDIX C - CAS Number Data Base
APPENDIX D - Estimation of Water Solubility
APPENDIX E - List of ECOSAR Chemical Classes

1. INTRODUCTION

The structure-activity relationships (SARs) presented in this program are used to predict the aquatic toxicity of chemicals based on their similarity of structure to chemicals for which the aquatic toxicity has been previously measured. Most SAR calculations in the ECOSAR Class Program are based upon the octanol/water partition coefficient (Kow). Various surfactant SAR calculations are based upon the average length of carbon chains or the number of ethoxylate units. SARs have been used by the U.S. Environmental Protection Agency since 1981 to predict the aquatic toxicity of new industrial chemicals in the absence of test data.

The acute toxicity of a chemical to fish (both fresh and saltwater), water fleas (daphnids), and green algae has been the focus of the development of SARs, although for some chemical classes SARs are available for other effects (e.g., chronic toxicity and bioconcentration factor) and organisms (e.g., earthworms). SARs are developed for chemical classes based on measured test data that have been submitted by industry or they are developed by other sources for chemicals with similar structures, e.g., phenols. Using the measured aquatic toxicity values and estimated Kow values, regression equations can be developed for a class of chemicals. Toxicity values for new chemicals may then be calculated by inserting the estimated Kow into the regression equation and correcting the resultant value for the molecular weight of the compound.

To date, over 150 SARs have been developed for more than 50 chemical classes. These chemical classes range from the very large, e.g., neutral organics, to the very small, e.g., aromatic diazoniums. Some chemical classes have only one SAR, such as acid chlorides, for which only a fish 96-hour LC50 has been developed. The class with the greatest number of SARs is the neutral organics, which has SARs ranging from acute and chronic SARs for fish to a 14-day LC50 for earthworms in artificial soil.

The ECOSAR Class Program is a computerized version of the ECOSAR analysis procedures as currently practiced by the Office of Pollution Prevention and Toxics (OPPT). It has been developed within the regulatory constraints of the Toxic Substances Control Act (TSCA). It is a pragmatic approach to SAR as opposed to a theoretical approach.

This ECOSAR program is designed for the expert user. You are expected to have some knowledge of environmental toxicology and organic chemistry. It is menu-driven and contains various help functions to assist you.
You cannot change any of the equations or data stored within the program or accidentally erase any important information. The following pages show you how to install, access, and use the ECOSAR program.

If you have any questions or comments on the ECOSAR program, or find any errors, please contact:

    ECOSAR Program
    Risk Assessment Division (7403)
    U.S. Environmental Protection Agency
    401 M St., SW
    Washington, DC 20460

2. COMPUTER-SOFTWARE REQUIREMENTS

The ECOSAR Class Program is designed for use on the IBM and IBM-compatible series of personal computers running Microsoft Windows 3.1 and higher (including Windows 95 and Windows NT). Although a mouse or other pointing device is not required, it is highly recommended.

The ECOSAR Class Program requires approximately 0.5 MB of hard disk space. Use of the supplemental SMILECAS Database (a database of 103,000 SMILES notations indexed by CAS number for retrieval by the ECOSAR Class Program) requires a hard drive and approximately 9.1 MB of additional disk space.

The ECOSAR Class Program runs under Windows 95/98/NT; however, it is not currently designed to run as a multi-tasking program (e.g., running ECOSAR Class Program batch-mode runs in the background while running another program in the foreground). Batch-mode runs should be run in the foreground until completion.

3. INSTALLING the ECOSAR Class Program

The ECOSAR Class Program diskette contains an installation program that can install the ECOSAR Class Program and create a Windows Program Group with a program icon. The installation program must be started while Microsoft Windows (3.1, 95 or NT) is running.

To install, place the floppy diskette in the appropriate floppy drive. Then, (a) in Windows 3.1, select FILE, RUN from the Program Manager's menu, or (b) in Windows 95/98, press the Start button and select Run.
Then:

    If the floppy is in the a: drive, enter a:install
    If the floppy is in the b: drive, enter b:install

The FILE, RUN entry box (in Windows 3.1) may look similar to the following:

    [Run dialog box. Command Line: install. Check box: Run Minimized. Buttons: OK, Cancel, Browse..., Help]

The Run entry box (in Windows 95/98) may look similar to the following:

    [Run dialog box. Text: "Type the name of a program, folder, or document, and Windows will open it for you." Open: A:\install.exe. Buttons: OK, Cancel, Browse...]

The ECOSAR Class Program does not actually require the installation program because installation can be handled manually; the ECOSAR Class Program and its help file can be used as they exist on the floppy (that is, you can start the ECOSAR Class Program directly from the floppy if you want). However, the installation program automatically creates a hard-drive subdirectory, copies the necessary files to it, and creates a Windows program group. The ECOSAR Class Program group folder (named "Ecosar") contains an ECOSAR icon that starts the program.

The following files are installed during the installation process:

ECOWIN.EXE: the necessary ECOSAR executable file

ECOWHELP.HLP: a file containing extensive help information for SMILES notations, program execution, key & button usage, etc.

The following files are NOT installed during the installation process (and are not on the installation disk), but can be used by the ECOSAR Class Program. These files must be obtained separately:

SMILECAS.DB: a database of more than 103,000 SMILES notations indexed by CAS (Chemical Abstracts Service Registry) number. Simply entering a CAS number in ECOSAR allows automatic retrieval of available SMILES. This database can also be used to run automated batches of CAS numbers.

SMILECAS.IDX: index file for SMILECAS.DB

KOWWIN.EXE: a Syracuse Research Corporation program that estimates log Kow from SMILES. When this program (and its two library files listed below) is available in the same subdirectory as ECOSAR, ECOSAR can automatically start KOWWIN and retrieve the KOWWIN estimate. The estimation methodology is described in a journal article (Meylan and Howard, 1995). Note: the KOWWIN program located in the ECOSAR subdirectory must be closed while ECOSAR is running. Otherwise, ECOSAR cannot use it. A duplicate copy of KOWWIN can, however, be running from a different subdirectory.

CSDLL.DLL: a library file required by the KOWWIN program.

QCBASED.DLL: a library file required by the KOWWIN program.

These files are available from Syracuse Research Corporation, Environmental Science Center, 6225 Running Ridge Road, North Syracuse, NY 13212-2510 (Dr. Phil Howard, 315-452-8417).

4. STARTING the ECOSAR Class Program

The ECOSAR Class Program is started like any other Microsoft Windows program. The easiest way to start the ECOSAR Class Program is to double-click the program icon installed in the ECOSAR program group during installation. For additional information on starting Windows programs, consult your Windows documentation.

The following Introductory Screens are displayed:

[Screen: ECOSAR Class Program Information]

The ECOSAR Class program is a computerized version of the ECOSAR analysis procedures as currently practiced by the U.S. EPA Office of Pollution Prevention and Toxics (OPPT).

Please consult the User's Manual or On-Line Help for a description of this program's uses and capabilities. General On-Line Help is available from the Help option on the Main Menu Bar. Individual field information is available on the main data entry screen by pressing the F1 key while the cursor is located in a particular field.

Ecotoxicity of most ECOSAR chemical classes can be predicted by entering only a compound's structure by means of a SMILES notation (a log Kow value may be required if the KOWWIN program is unavailable).
SARs for Surfactants, Polymers, Dyes and Inorganics do not require SMILES; available SARs are accessed from the "Special_Classes" on the Main Menu Bar.

[Button: START]

[Screen: Initial Selection]

The ECOSAR Chemical Hierarchy contains six divisions as shown below in the selection list. Select the chemical division you want to start with and press the "OK" button.

Inorganics, Organometallics, Polymers, Surfactants and Dyes are the "Special Classes". Access to the "Special Class" QSARs is also available from the ECOSAR Main Menu Bar.

Select:

    ( ) Inorganics
    ( ) Organometallics
    ( ) Polymers
    ( ) Dyes
    Surfactants:
        ( ) Anionic Surfactants
        ( ) Cationic Surfactants
        ( ) Nonionic Surfactants
        ( ) Amphoteric Surfactants
    (*) All Others

The "All Others" division requires a SMILES notation for evaluation. ECOSAR's default data entry screen (requiring a SMILES) applies to the "All Others" division. The correct QSAR class is determined from the SMILES.

[Button: OK]

The "Initial Selection" screen lists the six divisions of the ECOSAR Chemical Hierarchy. The divisions are Inorganics, Organometallics, Polymers, Dyes, Surfactants (Anionic, Cationic, Nonionic, Amphoteric), and All Others. Inorganics, Organometallics, Polymers, Dyes and Surfactants are the "Special Classes" (see Section 8). The default selection is "All Others". This division requires a SMILES notation for evaluation. Appendix E lists the Chemical Classes identified in the "All Others" division.

Program execution ("All Others" division) begins at the data entry screen; an example is illustrated in Figure 1.

    Ecosar Classes v0.99d
    Menu bar: File  Edit  Functions  BatchMode  ShowStructure  Special_Classes  Help
    Buttons:  Previous | Get User | Save User | CAS Input | Calculate

    Enter SMILES:                c1ccccc1O
    Enter NAME:                  Phenol
    CAS Number:                  108-95-2
    Chemical ID 1:               Hydroxybenzene
    Chemical ID 2:
    Chemical ID 3:
    Log Kow:                     1.460
    Measured Water Sol (mg/L):
    Melting Point (deg C):       41.00
    Measured Log Kow:

    Figure 1. Example Data Entry Screen

Note: the appearance of the screen may vary somewhat due to screen resolution (e.g., 640 X 480 vs. 800 X 600) and user selection of MS-Windows attributes (e.g., colors, font size). In addition, Figure 1 illustrates how the entry screen appears when using Windows 95. Appearance in Windows 3.1 varies slightly.

5. DATA ENTRY and EDIT KEYS

The information in section 5 applies to the main data entry screen shown in Figure 1. It concerns structure estimation using SMILES notation. Information concerning data entry for "Special Classes" (calculations not using SMILES) is presented in section 8.

5.1. Entering Data

5.1.1. SMILES Notation

Calculations from the main data entry screen require the chemical structure of the compound as a SMILES notation. Users unfamiliar with SMILES notations can consult a descriptive journal article (Weininger, 1988) or the ECOSAR Class Program help file (accessed by selecting "Help" from the top menu). The following Internet web-site locations also contain extensive information about SMILES notations:

    (1) http://www.daylight.com (Daylight Information Services)
    (2) http://esc.syrres.com (Syracuse Research Corporation)

Three different methods can be used to enter the SMILES notation with chemical name:

    (1) direct entry by the user from the keyboard
    (2) entry from a previously created user file that is accessed by pressing the F4 key (or clicking the "Get User" button)
    (3) entry from a supplementary database that is accessed by pressing the F8 key (or clicking the "CAS Input" button) and entering the Chemical Abstracts Service (CAS) Registry number of the compound.

The program can estimate only one chemical at a time. Separate data entry is required for each chemical, although batch mode runs are possible (see F5, F7 function keys below and see Section 7). Estimation of the entered SMILES notation is started by pressing the PgDn key (or clicking the "Calculate" button) at any time during data entry.

5.1.2. Individual Data Entry Fields

The following is a description of the individual data entry fields on the main data entry screen (pressing the F1 key where the edit cursor is located gives a brief description of that field):

(1) SMILES: the SMILES notation of the structure to be estimated. A maximum of 360 characters is allowed. This field is required. Do not leave any blank spaces in front of a SMILES notation; a SMILES is considered finished when a blank space is encountered.

(2) Name: the name and/or description of the structure. This field is optional. A maximum of 120 characters is allowed.

(3) CAS Number: the CAS (Chemical Abstracts Service Registry) Number. This field is optional. When a SMILES is retrieved from the SMILECAS Database, the CAS number is automatically inserted in this field.

(4) Chemical ID 1: optional description / identity field; not required.

(5) Chemical ID 2: optional description / identity field; not required.

(6) Chemical ID 3: optional description / identity field; not required.

(7) Log Kow: the log octanol-water partition coefficient. A value is required unless the KOWWIN Program (Syracuse Research Corporation) is present in the same subdirectory as the ECOSAR Class Program. When KOWWIN (and its two library files) is available in the same subdirectory as ECOSAR, ECOSAR can automatically start KOWWIN and retrieve the KOWWIN estimate. Note: the KOWWIN program located in the ECOSAR subdirectory must be closed while ECOSAR is running. Otherwise, ECOSAR cannot use it. A duplicate copy of KOWWIN can, however, be running from a different subdirectory.

(8) Measured Water Solubility: the Measured Water Solubility in mg/L. This field is optional. It is NOT required! When left blank, a Water Solubility will be calculated from the log Kow value. Predicted toxicity values are compared to the Water Solubility; if toxicity exceeds Water Solubility, the toxicity value is marked with an asterisk (*) to indicate 'No Effect at Saturation'. Water Solubility is not used to calculate ecotoxicity values. The estimation methodology is described in Appendix D.

(9) Melting Point: the Melting Point (in deg C). This field is optional. It is used to calculate Water Solubility when a measured Water Solubility is unavailable. It generally helps in estimating more accurate water solubilities, but is not required to estimate Water Solubility.

(10) Measured Log Kow: the measured log Kow value, if available. This field is informational only. It is not used to calculate ecotoxicity values. The value in the Log Kow field is used to calculate ecotoxicity values.

5.2. Function Keys & Buttons

F1: Accesses a help message for the individual field where the blinking cursor is located. General Help is available from "Help" on the Menu Bar at the top of the screen. It is a standard Windows help system; to access a specific help topic, simply click on the topic (or keyword) that is highlighted in green where the mouse pointer changes to a hand.

F2: Pressing the F2 key or clicking the "Previous" button recalls the most recent SMILES and chemical name that was calculated or attempted to be calculated by the program. It can save a lot of time when making small changes to large SMILES and names. It is especially useful after a SMILES notation error occurs: the incorrect SMILES can be recalled and edited.

F3: Clears the currently displayed SMILES Notation, Chemical Name and other data. All entry fields are filled with blank spaces.

F4: Pressing the F4 key or clicking the "Get User" button displays a file selection dialog box that allows the user to open a file of previously saved SMILES notations and chemical names. The default name of the file is SMILES.INP; this is for compatibility with similar programs.
The file selection box looks for files with the extension ".INP", so it is best to name files with this extension when creating them with the F6 key ("Save User"). A "Get User" file can contain up to 1500 SMILES and names and the user can select any single SMILES and name for input. The SMILES.INP file can be created one chemical at a time by using the F6 key as described below. Also, the "Get User" option is only usable after a file has been created with the F6 key feature!! An example screen is shown in Figure 2. Selection is made by highlighting the desired line and clicking the "OK" button or by double-clicking the desired line. See Appendix B for the correct file format required!

    Figure 2. Example User Input File Selection

F5: Pressing the F5 key (or clicking the "BatchMode" option on the main menu and selecting "Batch File Input Using SMILES Strings") brings up the "Select Batch Text Format" selection box. The F5 key is used for batch entry of SMILES strings from ASCII text files. The text files MUST be in either of two formats: (1) StringFormat or (2) EcowinFormat. The selection box describes the two choices as follows:

    StringFormat: on each line, the SMILES string comes first and ends with the first blank space; name or ID can follow the blank space.
    EcowinFormat: each line must be in the format used by the "Get User" list which is kept in the SMILES.INP file.

String Format must have the SMILES string at the beginning of each line in the file; it can then be followed by a space(s) and then the name or other ID. The SMILES is considered terminated at the first space. An example String Format is as follows:

    CCCCO Butanol
    c1ccccc1 Benzene
    Fc1ccccc1 Fluorobenzene
    CC(=O)C Acetone

EcowinFormat is the same format used by the "Get User" and "Save User" button features. Therefore, the "SMILES.INP" file can be used directly to run batch file outputs. In this format, the name comes first (maximum of 60 characters) followed by a colon and one space, and then the SMILES notation. An example EcowinFormat is as follows:

    Butanol: CCCCO
    Benzene: c1ccccc1
    Fluorobenzene: c1ccccc1F
    Acetone: CC(=O)C

F6: Pressing the F6 key or clicking the "Save User" button displays a file selection dialog box that allows the user to save the SMILES notation and chemical name currently showing on the data entry screen to a file. The default name of the file is SMILES.INP; this is for compatibility with similar estimation programs. After a file is selected (or entered by the user), ECOSAR appends the SMILES notation and chemical name currently showing on the data entry screen to the file. If the file does not already exist, ECOSAR will create it and append the current SMILES and name as the first entry. The SMILES and names in a "Save User" file can be accessed from the data input screen by pressing the F4 key. See Appendix B.

F7: The F7 key is used to enter CAS numbers from an ASCII text file; the number of CAS numbers in the file is not limited. The user must enter the file name; a selection menu is not currently available. The F7 key is used primarily for batch-mode runs; output is written to files named "CASLOG#.OUT" where "#" is a number determined by the program. The format of the ASCII text file is: no spaces in front of the CAS number, hyphens and leading zeros are optional, and a trailing carriage return. Example:

    000050-00-0
    71-43-2
    108883
    000050-02-2

NOTE: the presence of SRC's SMILECAS.DB database is required! It is not included with the ECOSAR Class Program and must be acquired separately.
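
The batch input formats described for the F5 and F7 keys can be sketched as small line parsers. The following Python sketch is illustrative only and is not part of the ECOSAR Class Program; the function names (parse_string_format, parse_ecowin_format, normalize_cas) are hypothetical. The check-digit rule in normalize_cas is the standard CAS Registry Number rule; that the program applies exactly this check is an assumption.

```python
def parse_string_format(line):
    """String Format: SMILES first, ended by the first space; name/ID may follow."""
    smiles, _, name = line.rstrip("\r\n").partition(" ")
    return smiles, name.strip()


def parse_ecowin_format(line):
    """EcowinFormat: name, then a colon and one space, then the SMILES."""
    # Split at the LAST ": " so a name containing ": " stays intact.
    # (Assumption: the guide does not say how such names are handled.)
    name, sep, smiles = line.rstrip("\r\n").rpartition(": ")
    if not sep:
        raise ValueError("missing ': ' separator: %r" % line)
    return smiles.strip(), name


def normalize_cas(text):
    """Accept a CAS number with optional hyphens and leading zeros.

    Returns the hyphenated form without leading zeros, or raises
    ValueError if the final (check) digit does not match.
    """
    digits = text.strip().replace("-", "").lstrip("0")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError("not a CAS number: %r" % text)
    body, check = digits[:-1], int(digits[-1])
    # Standard CAS rule: weight the digits 1, 2, 3, ... from the right
    # (excluding the check digit), sum them, and take the result modulo 10.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    if total % 10 != check:
        raise ValueError("impossible CAS number (check digit): %r" % text)
    return "%s-%s-%d" % (body[:-2], body[-2:], check)


if __name__ == "__main__":
    print(parse_string_format("CCCCO Butanol"))
    print(parse_ecowin_format("Fluorobenzene: c1ccccc1F"))
    for cas in ["000050-00-0", "71-43-2", "108883", "000050-02-2"]:
        print(normalize_cas(cas))
```

Both SMILES parsers return the pair (SMILES, name), so lines from either batch format can be handled the same way downstream.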

F8: Pressing the F8 key or clicking the "CAS Input" button requires the presence of a supplemental database file (SMILECAS.DB) and index file in the current subdirectory. A small data entry window is created on the data entry screen which asks for the CAS number of the chemical. An error message will appear in the window if the program cannot find the database or index file. The database file contains about 103,000 entries, but not all chemicals with CAS numbers are included in the file. If the chemical is not in the database, an appropriate message is displayed. The program can identify impossible CAS numbers by examining the check digit (the final number of the CAS). The SMILECAS Database is not included with the ECOSAR Class Program installation. It must be acquired and installed separately (Syracuse Research Corp., Environmental Science Center).

PgDn: Pressing the PgDn key or clicking the "Calculate" button calculates the SMILES currently showing on the data entry screen. If an acceptable SMILES has been entered, the Results Window will either appear or be updated. If an incorrect SMILES has been entered, an error message box will appear. After removing the error message box, the incorrect SMILES can be recalled and then edited by pressing the F2 key or clicking the "Previous" button.

Esc: During data entry, pressing the Esc key exits the program. When the Results Window is active, pressing the Esc key removes the Results Window.

Enter: Pressing the Enter (Return) key sends the cursor to the next data entry field.

Tab or Shift-Tab: changes entry fields.

5.3. Importing Structures

Note: this feature is available only when the KOWWIN program is located in the same subdirectory as the ECOSAR program.

ECOSAR requires a chemical structure in a "SMILES notation" format. ECOSAR (v0.99c and above) adds an "import" feature that allows other chemical structure formats to be imported directly into ECOSAR.
The "import" feature is accessed from the Menu Bar via "File", "Import Structure", as shown in the figure below. The "import" feature uses the structure format conversion engine of the commercial software package ConSystant(tm) available from ExoGraphics, PO Box 655, West Milford, NJ 07480, (201) 728-0188. Syracuse Research Corporation has a license agreement with ExoGraphics that permits incorporation of the ConSystant(tm) DLL with SRC estimation programs.

Imported structures are converted to SMILES notations and placed in the SMILES data entry field of ECOSAR. ECOSAR filters the conversion to make the SMILES notation as compatible as possible with ECOSAR. However, some converted SMILES notations (especially SMILES with charged ions) will require some user modification before ECOSAR can estimate the structure. Importable structure formats include:

    Alchemy III MOL files
    ChemDraw files
    ChemDraw Connection Tables
    HyperChem HIN files
    MDL MOL files
    MDL ISIS SKC files
    Molecular Presentation Graphics MPG files
    PCModel files
    Beilstein ROSDAL files
    Softshell SCF files
    Tripos Sybyl Line Notations
    Tripos SYBYL MOL2 files
    BioCAD Catalyst TPL files

    [Figure: main data entry screen with the "File" menu open, showing "Import Structure" and "Exit"; the "Import Structure" submenu lists the structure formats named above]

6. RESULTS

The Results Window presents the results of the ECOSAR Class Program's estimations. It appears when a SMILES notation is calculated.
Figure 3 below illustrates an example Results Window:

    Ecowin Results
    Menu bar: Print  Save Results  Copy  Remove Window  Help

    SMILES : c1ccccc1O
    CHEM   : Phenol
    CAS Num: 108-95-2
    ChemID1: Hydroxybenzene
    ChemID2:
    ChemID3:
    MOL FOR: C6 H6 O1
    MOL WT : 94.11
    Log Kow: 1.46 (User entered)
    Melt Pt: 41.00 deg C
    Wat Sol: 3725 mg/L (calculated)

    ECOSAR Class(es) Found
    Phenols

                                                            Predicted
    ECOSAR Class        Organism       Duration  End Pt    mg/L (ppm)
    Konemann Equation : Fish (guppy)   14-day    LC50         373.245
    Phenols           : Daphnid        48-hr     LC50           8.424
    Phenols           : Daphnid        96-hr     EC50         140.460
    Phenols           : Daphnid                  ChV            3.193
    Phenols           : Fish           96-hr     LC50          29.737
    Phenols           : Fish           30-day    ChV            4.573
    Phenols           : Fish           60-day    ChV            0.196
    Phenols           : Green Algae              ChV           10.280

    Figure 3. Example Results Screen

The Results Window can be moved, sized and placed anywhere on the Microsoft Windows desktop. It does not need to be removed to calculate another SMILES notation; the Results Window will be updated when another SMILES is calculated.

The Results Window lists the SMILES (which might have been modified by the program due to aromatic detection or other conversion), molecular formula, molecular weight, and the fragments used to derive the estimation. The following menu choices are available when the Results Window is active:

Print: prints the results as shown.

Save Results: saves the summary output to a file. The output files are named ECOW*.DAT where * is a number from 1 to 100. Numbering begins at 1 and automatically proceeds to number 100. Currently, all results are appended to the same file number until the program is exited. The next time the program is started, the next available number is used; therefore, different files are used from session to session! If all numbers have been used in existing files, then number 1 will be used and the existing file ECOW001.DAT will be overwritten!!

Copy: copies the results as shown (minus the rectangle enclosing the estimate) to the Windows clipboard.
The results can then be copied into other Windows programs such as word processors. When copied to a word processor (such as Word Perfect, Ami Pro, or Microsoft Word), a non-proportional font (such as Courier) must be used for correct formatting!! Also, the page width margins must be wide enough!

Remove Window: deletes the Results Window; a new Results Window will appear with the next estimation. It may be more convenient to move and size the Results Window for personal preference (after the first estimation) rather than to remove it after each estimation. If the Results Window is left on the screen, the next estimation results will simply replace the existing results.

The Log Kow value in the Results Window designates whether it was entered by the user or calculated by the KOWWIN program. The Water Solubility value designates whether it was calculated or measured (see Appendix D for water solubility estimation methodology).

6.1. Structure Window

Note: this feature is available only when the KOWWIN program is located in the same subdirectory as the ECOSAR program.

The Structure Window shows a 2-dimensional plot of the chemical structure. The window shows the entire structure (it does not "clip" sections of the molecule). In order to fit the entire structure in the window, the aspect ratio of the MS-Windows metafile depiction has been rendered proportional (that is, by changing the height or width of the window, the structure scaling changes). At times, the height or width of the window may need to be changed to give a better structure depiction. When results from the Results Window are printed with the "Print results with structure" option, the structure will be printed (if possible) with the same aspect ratio as the Structure Window.

    [Figure: example Structure Window. Menu bar: File  Edit  Structure  Help]

The Structure Window Menu Bar gives access to printing the structure, saving the structure as an MDL MOL file or an ISIS SKC file, copying the structure to the MS-Windows clipboard, or changing selected window parameters. Changeable window parameters include background colors of the structure or bottom text areas. Double-clicking the text at the bottom of the window allows the text to be changed.

Copying the structure (from the menu bar Edit) to the Windows clipboard has two options:

(1) Copy (as placeable metafile): this copies both structure and text to the clipboard. Some word processors and drawing programs require "placeable metafiles" for graphic import. The ability of other Windows programs to use placeable metafiles varies.

(2) Copy structure (as metafile): this copies only the structure to the clipboard. Most commercial word processors will import this format.

6.2. Konemann Equation

The Konemann equation is an equation developed from a variety of different compounds (including chlorobenzenes, chlorotoluenes, chloroalkanes, diethyl ether and acetone) using guppies and 14-day exposure periods (Konemann, 1981). The equation is:

    Log (1/LC50) = 0.871 log Kow - 4.87

where LC50 is in µmol/L.

6.3. Example SAR Equations

The following are example SAR equations used by the ECOSAR Class Program to calculate ecotoxicity values. They are indicative of all SARs calculated from SMILES and log Kow values.

Acrylates:

    Log 48-h LC50 = 0.00886 - 0.51136 log Kow   (Daphnids, mortality)
    Log 96-h LC50 = -1.46 - 0.18 log Kow        (Fish, mortality)
    Log ChV       = -1.99 - 0.526 log Kow       (Fish chronic value; survival/growth)
    Log 96-h EC50 = -1.02 - 0.49 log Kow        (Green Algae, growth)

The values calculated by these equations are in units of millimoles/L.

7. BATCH RUNS

Batch runs are used to make multiple estimates from a single input file. The ECOSAR Class Program can make "batch runs" from three different types of input files.
Each input file must be in a specific format; otherwise, the batch run will fail. Program access to "batch runs" is available from (a) the top menu option "BatchMode", (b) various options under the top menu option "Functions", and (c) the F5 and F7 function keys. The following describes each "batch run" input file that the ECOSAR Class Program can use.

(1) CAS Number List - This is a plain text file (usually with a ".txt" file extension) containing a list of CAS (Chemical Abstracts Service) Registry numbers. The format of the ASCII text file is: no spaces in front of the CAS number, hyphens and leading zeros are optional, and a trailing carriage return. For example:

    000050-00-0
    71-43-2
    108883
    000050-02-2

NOTE: the presence of SRC's SMILECAS.DB database is required! The SMILECAS Database must be in the same subdirectory as the ECOSAR Class Program. There is no limit to the number of CAS numbers in the file. The F7 function key accesses the CAS batch list option.

(2) SMILES String, String Format List - This is a plain text file (usually with a ".txt" file extension) containing a list of SMILES notations. A "String Format" list must have the SMILES string at the beginning of each line in the file; it can then be followed by a space(s) and then the name or other ID. The SMILES string is considered terminated at the first space. An example String Format is as follows:

    CCCCO Butanol
    c1ccccc1 Benzene
    Fc1ccccc1 Fluorobenzene
    CC(=O)C Acetone

The F5 function key accesses the SMILES String batch option. The output file is named "BATCH#.OUT" where "#" is a number determined by the program.

(3) SMILES String, EcowinFormat List - This is a plain text file (usually with a ".inp" file extension) containing a list of SMILES notations. EcowinFormat is the same format used by the "Get User" and "Save User" button features. Therefore, the "SMILES.INP" file can be used directly to run batch file outputs.
In this format, the name comes first (maximum of 60 characters), followed by a colon and one space, and then the SMILES notation. An example Ecosar Format is as follows:

    Butanol: CCCCO
    Benzene: c1ccccc1
    Fluorobenzene: c1ccccc1F
    Acetone: CC(=O)C

The F5 function key accesses the SMILES String batch option. The output file is named "BATCH#.OUT", where # is a number determined by the program.

7.1. Batch Output Formats

Batch runs can capture results as either "Full Output" or "Summary Output". Full Output captures results for each compound exactly as they would appear in the "Result Window" if each compound were estimated individually; these output files can become very large for large numbers of compounds. Summary Output captures selected results and places them on a single line for each compound.

Before running a batch with "Summary Output", the format of the output file can be selected from the Batch Output Format dialog box. The default ("Original") is space filled, with parentheses identifiers used where needed to identify the various results. Output can also be "Comma delimited" or "Tab delimited"; these selections separate the results on each line with either commas or tabs. This is useful for importing a batch output file directly into other programs (such as Microsoft Excel™ or Lotus 123™ spreadsheets).

[Dialog box: Batch Output Format. Options: Original (spaces), Comma delimited, Tab delimited. Buttons: OK, Cancel.]

8. SPECIAL CLASS CALCULATIONS

The ECOSAR Class Program has been developed primarily for the following scenario: (1) enter a SMILES notation, (2) let the program determine the appropriate ECOSAR classes for the SMILES notation, and (3) calculate the ecotoxicity SARs using a log Kow value.
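Step (3) of this scenario can be illustrated with the equations given in Sections 6.2 and 6.3. The following is a minimal Python sketch, not the program's own code: the function and dictionary names are hypothetical, and the conversion from millimoles/L to mg/L assumes a simple multiplication by molecular weight, following the Introduction's remark that the resultant value is corrected for the molecular weight of the compound.

```python
# Illustrative sketch only; names are hypothetical and not part of ECOSAR.

def konemann_lc50_umol_per_l(log_kow):
    """Konemann equation (Section 6.2): Log (1/LC50) = 0.871 log Kow - 4.87.

    Returns the 14-day guppy LC50 in umol/L.
    """
    log_inverse_lc50 = 0.871 * log_kow - 4.87
    return 10 ** (-log_inverse_lc50)


# Acrylate SARs from Section 6.3 as (intercept, slope) pairs for
# Log value = intercept + slope * log Kow, with the value in millimoles/L.
ACRYLATE_SARS = {
    "daphnid_48h_lc50": (0.00886, -0.51136),
    "fish_96h_lc50": (-1.46, -0.18),
    "fish_chv": (-1.99, -0.526),
    "green_algae_96h_ec50": (-1.02, -0.49),
}


def acrylate_sar_mmol_per_l(endpoint, log_kow):
    """Evaluate one acrylate SAR; the result is in millimoles/L."""
    intercept, slope = ACRYLATE_SARS[endpoint]
    return 10 ** (intercept + slope * log_kow)


def mmol_per_l_to_mg_per_l(value_mmol_per_l, mol_weight):
    """Assumed molecular weight correction: mmol/L times g/mol gives mg/L."""
    return value_mmol_per_l * mol_weight


if __name__ == "__main__":
    # Made-up inputs for illustration; not measured data for any chemical.
    log_kow, mol_weight = 1.3, 100.0
    for endpoint in ACRYLATE_SARS:
        mmol = acrylate_sar_mmol_per_l(endpoint, log_kow)
        print(endpoint, mmol_per_l_to_mg_per_l(mmol, mol_weight), "mg/L")
    print("Konemann LC50:", konemann_lc50_umol_per_l(log_kow), "umol/L")
```

Keeping the coefficients in a table separate from the evaluation function mirrors the way each chemical class has its own set of regression equations with the same log-linear form.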
Several "Special Classes" of ECOSAR SARs or classifications do not use the log Kow value or cannot be adequately classified from the SMILES notation (in this ECOSAR version). These "Special Classes" include (a) Polymers, (b) Inorganics, (c) Dyes, and (d) Surfactants. The current version of the ECOSAR Class Program does not include SARs for Polymers, Dyes, or Inorganics (these may be added in the future). However, SARs are available for various Anionic, Cationic, Nonionic, and Amphoteric Surfactants. Instead of the log Kow value, these SARs use the number of ethoxylate units or the average length of a carbon chain. These "Special Classes" are accessed from the Main Menu bar (see Figure 4).

[Figure 4 shows the main window (Ecosar Classes v0.99d) with the menu bar File, Edit, Functions, BatchMode, ShowStructure, Special_Classes, Help. The Special_Classes menu lists Dyes, Polymers, Inorganics, Organometallics, and Surfactants; the Surfactants submenu lists Neutral/Nonionic, Anionic, Cationic, and Amphoteric.]

Figure 4. Special Class Menu Options

The Special Classes have their own data entry dialog box (see Figure 5). The calculated results are placed in the same Results Window as results using SMILES notations (an example is illustrated in Figure 6). Note: the Water Solubility or Water Dispersibility fields in the data entry dialogs are not used in SAR calculations.

[Figure 5 shows the Amphoteric Surfactants dialog box with the fields Chemical Name (here "Surfactant P100"), CAS Num, Chemical ID 1, Chemical ID 2, Chemical ID 3, Water Sol (mg/L), and Num Ethoxylates (here 3.00). The dialog instructs: "Highlight Amphoteric Surfactant Class. Press Calculate Button. (Note: SMILES Not Required or Used)". The class list contains Alkyl-Nitrogen-Ethoxylates and Surfactants, Ethomeen (C=08) through Surfactants, Ethomeen (C=18). Buttons: Calculate, Cancel.]

Figure 5. Example Entry Dialog Box for Surfactants

[Figure 6 shows the Results Window for "Surfactant P100" (Number of Ethoxylates: 3.00; ECOSAR Class: Surfactants, Ethomeen (C=10)). It lists predicted values in mg/L (ppm) for the Daphnid 48-hr LC50, Fish 96-hr LC50, and Algae 96-hr EC50, each about 7.7 mg/L in this example.]

Figure 6. Example Results Window for Surfactants

9. BIBLIOGRAPHY

Konemann, H. 1981. Fish toxicity tests with mixtures of more than two chemicals: a proposal for a quantitative approach and experimental results. Toxicology 19: 229-238.

Meylan, W.M. and P.H. Howard. 1994a. Upgrade of PCGEMS Water Solubility Estimation Method (May 1994 Draft), prepared for Robert S. Boethling, U.S. Environmental Protection Agency, Office of Pollution Prevention and Toxics, Washington, DC; prepared by Syracuse Research Corporation, Environmental Science Center, Syracuse, NY 13210.

Meylan, W.M. and P.H. Howard. 1994b. Validation of Water Solubility Estimation Methods Using Log Kow for Application in PCGEMS & EPI (Sept 1994, Final Report), prepared for Robert S. Boethling, U.S. Environmental Protection Agency, Office of Pollution Prevention and Toxics, Washington, DC; prepared by Syracuse Research Corporation, Environmental Science Center, Syracuse, NY 13210.

Meylan, W.M. and P.H. Howard. 1995. Atom/Fragment contribution method for estimating octanol-water partition coefficients. J. Pharm. Sci. 84: 83-92.

Meylan, W.M. and P.H. Howard. 1996.
Improved method for estimating water solubility from octanol/water partition coefficient. Environ. Toxicol. Chem. 15: 100-106.

Weininger, D. 1988. SMILES, A Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 28: 31-36.

APPENDIX A
Selected SMILES Information

A SMILES notation is considered terminated at the first blank space. Characters following the first blank space are ignored!

Entering the Nitro Function (NO2)

The nitro function (NO2) is usually written as N(=O)(=O) or N(=O)=O in SMILES notation. In this program version, the nitro function can also be designated (simply) by the capital letter T.

Entering the Sulfonic Acid Function

The sulfonic acid function (-SO2-OH) is usually written as S(=O)(=O)O in SMILES notation.

Carbonyl Function (C=O) Information

The carbonyl function (C=O) should always be entered in upper case letters. Additional information is presented in the SRC document "A Brief Description of SMILES Notation".

Metals & Charged Species

Charged species cannot be entered directly into the program with + and - signs. Compounds such as QACs (quaternary ammonium compounds) can be entered by simply attaching the charged groups as if a direct bond exists; for example, tetramethyl ammonium bromide can be entered as:

    N(C)(C)(C)(C)Br

Also, for many hydrochlorides, simply ignore the HCL portion of the structure (leave it out and enter the compound as the non-hydrochloride; alternatively, see the section on entering hydrogen directly below).

The ECOSAR Class Program can accept and evaluate the following METALS:

    Na  sodium
    Hg  mercury
    K   potassium
    Li  lithium

Use the chemical symbol to include any of these metals; for example, sodium acetate could be: NaOC(=O)C. Alternatively, the above metals and ALL OTHER metals can be put in a SMILES notation by bracketing, as follows:

    [Na]  sodium
    [As]  arsenic
    [Ca]  calcium
    [Sn]  tin
    [Pb]  lead
    ... etc.

Valence charges are NOT evaluated in brackets!!
ATTACH metals to the corresponding negatively charged species and do NOT use + and - charges in the SMILES!! Example: in some SMILES notations, sodium hexanoate would be entered as:

    [Na+][O-]C(=O)CCCCC

However, this is not allowed in this program because charges are not allowed and oxygen cannot be bracketed.

Entering Hydrogen Directly

For the ECOSAR Class Program, direct hydrogen entry in a SMILES notation is not allowed, with the exception of connection to aliphatic or aromatic nitrogen for the purpose of entering a nitrogen with a valence greater than +3 (e.g., various quaternary ammonium compounds and hydrochlorides). For nitrogens with a valence of +3 or less, direct hydrogen entries are ignored. Hydrogen is entered as an upper case H, as in the following examples:

    (1) acridine hydrochloride: c1ccc2cc3ccccc3n(H)(CL)c2c1
    (2) benzenepentanamine hydrochloride: c1ccccc1CCCCCN(H)(H)(H)CL

When to include the "HCL" in SMILES for various hydrochlorides depends upon the nature of the hydrochloride. For example, most hydrochlorides represented generically as "Formula HCL" can ignore the HCL; however, most ammonium-type compounds (such as #2 above) require the direct hydrogens.

Aromatic Selenium

Aromatic selenium can be entered as either (1) lower case se or (2) [se]. For example, selenofuran could be entered as (1) c1csecc1 or as (2) c1c[se]cc1. If entered as C1=CSeC=C1, the ECOSAR Class Program will automatically convert it to c1c[se]cc1.

Miscellaneous

In selected diazoacetyl compounds (e.g., azaserine, N2=CH-CO-O-CH2-CH(-NH2)-COOH), the N2 is commonly written as N+=N-. For the purposes of SMILES notation, the unit is considered as N#N.

APPENDIX B
Description of the User Input File

The User Input File is a file containing up to 1500 SMILES notations and chemical names that can be accessed during the execution of the ECOSAR Class Program. It can be used to enter SMILES notations and chemical names onto the data entry screen.
The User Input File is named SMILES.INP. This name must be used; it cannot be changed by the user. The 1500 entries that comprise SMILES.INP are determined by the user. This file can be useful for purposes other than data entry into the ECOSAR Class Program. For example, it can be used for record keeping. It can also be used for entering data into other estimation programs available from Syracuse Research Corporation that use SMILES.INP and SMILES notation, such as HENRYWIN (estimation of Henry's Law Constant), AOPWIN (estimation of atmospheric oxidation) and KOWWIN (estimation of octanol-water partition coefficient).

The User Input File is accessed during ECOSAR Class Program data entry by pressing the F4 key. The SMILES.INP file must exist in the subdirectory from which the ECOSAR Class Program was started. The SMILES notation and chemical name showing on the data input screen can be added to the SMILES.INP file by pressing the F6 key during data entry. If the SMILES.INP file doesn't already exist, the F6 key will create it and add the current notation and name as the first entry.

Currently, there is no way to edit or delete entries in SMILES.INP from within the ECOSAR Class Program. However, SMILES.INP is a plain text file, and it can be edited with any text editor or word processing program (as long as it is imported and saved as a DOS text file). Any text editor or word processing program can also be used to create SMILES.INP and add entries to it, as long as the format is correct. The correct format is the following: the chemical name (up to 60 characters) followed by a colon (:), then one space (and only one space), followed by the SMILES notation and a carriage return.

APPENDIX C
CAS Number Data Base

The CAS Number data base is used to input SMILES notations and chemical names onto the data entry screen by entering the Chemical Abstract Service (CAS) Registry number of a chemical.
It is available as a separate product from Syracuse Research Corporation and is not included with the ECOSAR Class Program. The CAS Number data base (SMILECAS.DB) and index file (SMILECAS.IDX) must be located in the subdirectory from which the ECOSAR Class Program was started. A hard disk is required because of the size of the data base: the SMILECAS.DB file is approximately 7.3 MB and the index file is approximately 2.4 MB.

The CAS Number data base currently contains 103,000 entries. The initial 20,000 entries were obtained from the U.S. EPA file of CAS numbers, SMILES notations and chemical names used by the GEMS program software. The entries in this file are the discrete organics listed in the U.S. EPA TSCA Inventory. Although the number of entries is large, some chemicals of interest may not be included in the data base.

The CAS Number data base is accessed by pressing the F8 key at the data entry screen. A pop-up window will appear requesting entry of the CAS number.

The SMILECAS.DB file is a translated version of a dBase® III+ DBF file. The dBase® DBF file is not used by the ECOSAR Class Program because of the inefficient space filling in a DBF file (the DBF file is about 35 MB in size compared to 7.3 MB for the DB file).

APPENDIX D
Estimation of Water Solubility

The ECOSAR Class Program estimates water solubility using methodology developed for the U.S. EPA and described in Meylan and Howard (1994a, 1994b, 1996). The estimation equations used in the current version are as follows:

No Melting Point Available:

    log WaterSol (moles/L) = -0.312 - 1.02 log Kow

Liquid at 25 deg C:

    log WaterSol (moles/L) = 0.551 - 1.091 log Kow

Solid at 25 deg C:

    log WaterSol (moles/L) = 0.2236 - 1.009 log Kow - 0.00956 (Tm - 25)

    (where Tm is the melting point in deg C)

Note: all water solubility estimates pertain to 25 deg C.
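The choice among the three water solubility equations in Appendix D can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's actual code: the function names are hypothetical, a melting point at or below 25 deg C is assumed to mean "liquid at 25 deg C", and the conversion to mg/L (the unit used in the data entry fields) is assumed to be a plain multiplication by molecular weight.

```python
# Illustrative sketch only; names and boundary handling are assumptions.

def estimate_log_water_sol(log_kow, melting_point_c=None):
    """Return the estimated log water solubility (moles/L) at 25 deg C."""
    if melting_point_c is None:
        # No melting point available.
        return -0.312 - 1.02 * log_kow
    if melting_point_c <= 25.0:
        # Assumed to be a liquid at 25 deg C.
        return 0.551 - 1.091 * log_kow
    # Solid at 25 deg C: includes the melting point correction term.
    return 0.2236 - 1.009 * log_kow - 0.00956 * (melting_point_c - 25.0)


def water_sol_mg_per_l(log_kow, mol_weight, melting_point_c=None):
    """Convert the estimate from moles/L to mg/L (assumed conversion)."""
    moles_per_l = 10 ** estimate_log_water_sol(log_kow, melting_point_c)
    return moles_per_l * mol_weight * 1000.0


if __name__ == "__main__":
    # Made-up inputs for illustration; not data for any real chemical.
    print(estimate_log_water_sol(2.0))          # no melting point
    print(estimate_log_water_sol(2.0, 10.0))    # liquid at 25 deg C
    print(estimate_log_water_sol(2.0, 125.0))   # solid at 25 deg C
    print(water_sol_mg_per_l(2.0, 150.0, 125.0), "mg/L")
```

For a solid, each degree of melting point above 25 deg C lowers the estimated log solubility by 0.00956, so a high-melting compound is predicted to be less soluble than a liquid with the same log Kow.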
APPENDIX E
List of ECOSAR Chemical Classes

The following is an alphabetic list of chemical classes identified from SMILES notations by the ECOSAR Class Program:

    Acid Chloride/Halide
    Acrylamides
    Acrylates
    Aldehydes
    Aliphatic Amines
    Anilines (amino-meta)
    Anilines (amino-ortho)
    Anilines (amino-para)
    Aromatic Amines
    Azides
    Aziridines
    Benzotriazoles
    Benzyl Alcohols
    Benzyl Amines
    Benzyl Halides
    Diazoniums
    Diepoxides
    Diketones
    Dinitro Aromatic Amine
    Dinitrobenzenes
    Epoxides
    Esters
    Esters (phosphate)
    Haloacetamides
    Hydrazines
    Imides
    Isocyanates
    Malononitriles
    Methacrylates
    Neutral Organics
    Peroxy Acids
    Phenols
    Phenols (dinitro)
    Propargyl Alcohols
    Propargyl Alcohols - Hindered
    Propargyl Ethers
    Quinone
    Salicylates
    Salicylic Acid
    Schiff Bases
    Silamines
    Silanes (alkoxy)
    Surfactants-anionic
    Surfactants-cationic
    Surfactants-nonionic
    Thiazolidinones
    Thiazolinone (iso-)
    Thiocyanates
    Thiols (mercaptans)
    Thiophenes
    Triazines
    Ureas (substituted)
    Vinyl/Allyl Alcohols
    Vinyl/Allyl Ethers
    Vinyl/Allyl Halides
    Vinyl/Allyl Ketones
    Vinyl/Allyl Sulfones