Technical Assessment of the Current Tentatively Identified Compound (TIC) Protocol


           United States
           Environmental Protection
           Agency
          Office of Research and
          Development
          Washington, DC 20460
EPA/600/R-97/011
December 1997

-------
       Technical Assessment of the Current
Tentatively Identified Compound (TIC) Protocol
                     September 1997
                      J.R. Donnelly
                       Task Lead
           Lockheed Martin Environmental Services
            U.S. Environmental Protection Agency
       National Exposure Research Laboratory-Las Vegas
              Environmental Sciences Division
              Environmental Chemistry Branch
                   G. Wayne Sovocool
                Work Assignment Manager

-------
Notice:

The U.S. Environmental Protection Agency (EPA), through its Office of Research and Development
(ORD), partially funded and collaborated in the research described here. It is intended for internal EPA use
only. Mention of trade names or commercial products does not constitute endorsement or recommendation
for use.
Acknowledgments:

Data system studies presented in this report include experimental work by D. Youngman and N.
Herron (Lockheed Martin).

The report incorporates substantial review comments from Dr. John M. McGuire (formerly, EPA,
ERD-Athens, Chair, TIC Improvement Task Force) and Mr. Gary L. Robertson (ERP, NERL-LV), two
chemists who have been intimately involved with the CLP. They reviewed both the initial draft document
(February, 1995) and the subsequently revised document (April, 1995) incorporating their earlier comments.
The report also includes suggestions and review comments from Dr. Wayne N. Marchant (former Director,
CRD-LV), Dr. Christian G. Daughton (Acting Chief, ECB, ESD-LV), and Dr. Donald F. Gurka and Mr.
Michael Hiatt, also in ECB.

Peer review external to EPA was provided in July 1997 by: Mr. David W. Bottrell, Chemist,
Data Management Program Manager, U.S. Department of Energy (DOE), Office of Environmental
Management, Germantown, Maryland; Dr. James D. Petty, U.S. Geological Survey, Chief, Chemical Fate and
Dynamics Branch, Environmental Contaminants Research Center, Columbia, Missouri; and Mr. Martin H.
Stutz, Senior Chemist, Environmental Technology Division, U.S. Army Environmental Center, Aberdeen
Proving Ground, Maryland.

The contributions of all of the above reviewers to the quality of the document are gratefully
acknowledged.

-------
                                   CONTENTS


Notice

Acknowledgments

CONTENTS

TABLES

FIGURES

EXECUTIVE SUMMARY

1.  PURPOSE

2.  TECHNICAL CONSIDERATIONS

3.  DATA REVIEW PROCEDURES

4.  SPECIFIC PROBLEMS IN DETECTING, IDENTIFYING, AND QUANTITATING TICS
      4.1 Scan range limitations
      4.2 Library deficiencies
      4.3 Limitations of low resolution quadrupole GC/MS
      4.4 Data system capabilities
      4.5 Isomer identifications

5.  FINDINGS FROM CLP DATA REVIEWS

6.  DATA SYSTEM CAPABILITY STUDIES
      6.1 NIST and Wiley libraries
      6.2 File formats
      6.3 Background Subtraction
      6.4 Isotopic ratio and elemental composition program

7.  ALGORITHMS AND PROCEDURES FOR TIC IDENTIFICATIONS
      7.1 Background Subtraction
      7.2 Biller-Biemann Spectral Isolation Algorithm
      7.3 Mass Spectral Enhancement
      7.4 Biller-Biemann Library Search Algorithm
      7.5 Probability-Based Matching Algorithm
      7.6 Performance Comparison of PBM and Biller-Biemann Algorithms
      7.7 Normalization and Tilting Algorithms

8.  ADVANCED DATA HANDLING STUDIES
      8.1 MassTransit™
      8.2 Molecular weight estimation
      8.3 Application of Colby's concept

9.  QUALITY ASSURANCE PROCEDURES

10. CONCLUSIONS

11. REFERENCES

-------
                                         TABLES

Table 2.1.  Summary of TIC Data Study
Table 6.1.  Molecular weight ranges of compounds in the NIST data base
Table 7.1.  The steps performed in the spectral enhancement operation
Table 8.1.  Data file input formats, including commercial brands, supported by MassTransit™
Table 8.2.  Output data file formats, including proprietary commercial, supported by MassTransit™
Table 8.3.  Comparison of native and enhanced mass spectral quality indicators

-------
                                           FIGURES
Figure 8.1. Comparison of native and enhanced total ion current chromatograms demonstrating
        effects of different summing intervals
Figure 8.2. Enhanced total ion chromatogram of indeno(1,2,3-cd)pyrene and dibenzo(a,h)anthracene
        showing natural peak widths of late-eluting compounds
Figure 8.3. a) Native and enhanced total ion chromatograms of a naphthalene-containing mixture; b)
        native mass spectrum of naphthalene in mixture retrieved by data system; c) enhanced mass
        spectrum of naphthalene in mixture; d) NIST reference spectrum from those chromatograms.
Figure 8.4. Plot of the standard deviations of simulated 0A under varying noise conditions versus the
        estimated 0A value
Figure 8.5. Comparison of a) spectral enhancement results using the quadratic fit, and b) spectral
        enhancement results without calculating the quadratic fit

-------
EXECUTIVE SUMMARY
The National Exposure Research Laboratory-Las Vegas (NERL-LV) conducted research on
tentatively identified compounds (TICs) using Superfund samples and data submitted by the Contract
Laboratory Program (CLP). This research effort is intended to provide valuable information for Superfund
regarding TICs, which comprise approximately 90% of the analytes detected in Superfund samples. [The
studies presented in this report on TICs used Superfund data submitted by the CLP.] These studies involved
reviewing the CLP GC/MS hard-copy data and the raw data files to assess the effectiveness of current CLP
protocols for TICs, and to provide information complementary to that reported by the CLP.

In this project, 99,513 TICs were reported in 792 Sample Delivery Groups (SDGs) studied. Of
these, the CLP reported identifications with Chemical Abstracts Service (CAS) numbers for 16%. Not all of
these identifications were correct, however. It was estimated from this study that perhaps 30% of these 16%
were correct, and possibly another 10% were correct except that the TIC was an isomer of the compound
whose CAS number was reported. Forty-one percent of the TICs were listed as being partially identified, and
the remaining 43 percent were reported as "unknown". Examples of partial identifications include "unknown
chlorinated aromatic," "unsaturated hydrocarbon," "unknown PAH" (polynuclear aromatic hydrocarbon), etc.
In many cases, TICs were reported by the CLP as "unknown" despite having a high-probability mass spectral
library match found by the data system. Data reviews conducted in this study emphasized analytical results
for TICs in soils because this matrix type was found to be much more likely than water to contain compounds
of interest.

An overview of the data indicated that the most commonly reported classes of TICs were saturated
and unsaturated hydrocarbons. The next most prominent groups of compounds were PAHs and aromatic
compounds, which were frequently substituted with aliphatic hydrocarbons. Higher molecular weight steroid
compounds (e.g., cholesterol) as well as PCBs were also reported. Elemental sulfur was reported by the CLP
laboratories in ca. 50% of soil samples. The Target Compound List (TCL) phthalates were found in ca. 80%
of the soil samples, and non-TCL phthalates were found in ca. 10%. TIC mass spectra were found, including
some with recognizable chlorine and/or bromine ion groups, for which there are no library spectra in the
computerized mass spectral data bases. Such spectra must presently be manually interpreted. Additional
compounds could be included in the NIST and Wiley mass spectral data bases to assist in identifying TICs
for environmental monitoring efforts. These compounds include higher molecular weight PAHs, and
industrial process solvents and chemicals. Additional pesticide metabolites and degradation products should
be included; some are available in hardcopy form but not in commonly used software data bases.

The reporting trends from the laboratories were highly varied with respect to accuracy of reporting
TICs. While some laboratories made honest efforts to identify the TICs, others simply labeled all TIC peaks
as unknown and made no attempt to accurately identify the TICs. Several laboratories would only commit to
identifying sulfur and reported all other TICs as unknown. Some laboratories used the name of the library
match compound with the highest score on the library search, treating that as the identification regardless of
whether that identification was reasonable based upon manual spectral interpretation.

-------
proposed by Colby and in-house developed spreadsheet-based macros and other procedures. A comparison
of the two library searching algorithms was performed on twenty CLP data files.  Library searches were
conducted with an HP DOS data system using PBM and with a Finnigan system using the Biller-Biemann
algorithm. The results demonstrated that the two algorithms performed to the same level of quality and
reliability for compounds in the molecular weight ranges associated with semivolatile TICs.

        Commercial MassTransit™ software was tested for its ability to facilitate converting data files into
formats usable by other data systems, including spreadsheets on personal computers. Interfacing the different
MS data system formats through this software effectively accomplished a standardization of TIC data. This
software converted GC/MS data from various contractors into ASCII text.

        These data were subsequently imported into an EXCEL™ spreadsheet and manipulated using a macro
written in-house to perform mass spectral resolution enhancement. The spreadsheet macro performed the
sorting and statistical and mathematical procedures necessary to separate the TICs from interfering
compounds. The extracted mass spectra were compared with reference spectra contained in the NIST
database.  Results of these comparisons showed that usually 80 to 90 percent of the ions contained in the
reference spectra were  successfully extracted using this method.  This procedure improved mass spectral
quality, and the data system's ability to perform successful  library searches. The fit quality  parameters
showed systematic improvements after subjecting the data to resolution enhancement procedures.

        In summary, this project investigated the effectiveness of current TIC reporting under the CLP
protocols. Identification of TICs was found to be of variable quality across participating laboratories. The
commonly used mass spectral data systems were found to provide essentially the same results.  The libraries
(Wiley and NIST) contain numerous entries that are not necessary for environmental studies and increase
search time somewhat, and could be improved by adding additional compounds relevant to environmental
monitoring. The available algorithms and data system procedures are satisfactory and provide virtually
identical results. Mass spectral resolution enhancement procedures could materially help in identifying TICs
by separating TIC spectra of interest from those of aliphatic hydrocarbons or other background signals.
Several recommendations were made to improve the effectiveness of TIC reporting, although the specific
mechanisms for implementation would require further study and testing:

(1)     require reporting the first library match of TICs on Form I if the library match meets a specified
        probability level, rather than "unknown" regardless of search results;
(2)     require reporting of TICs when the mass spectra, or library matches, indicate the presence of
        heteroatoms such as N, P, S, halogen, or heavy metals;
(3)     incorporate RRT data and frequency of occurrence of TICs into a data base, and add RRT criteria to
        aid in TIC identification;
(4)     extend the mass range to 600 Daltons (mass units, Da), with a tune emphasizing greater sensitivity
        for masses above 300 Da, for better detection and  identification of TICs in this higher molecular
        weight range;
(5)     recommend or require that different GC temperature programs or column phases be used in a second
        analysis to improve separation of some TICs;
(6)     because sulfur was found in approximately 50% of the soil sample data, improvements to the GPC
        procedure, a cleanup with copper, or some other improved procedure would be worthwhile; and
(7)     identify additional compounds suitable for addition to the mass spectral data bases, because they
        have been found, or would be anticipated, in known types of waste sites.

-------
We understand that the first recommendation is currently being implemented by OERR. The remaining
recommendations may not be suitable for a contract mechanism; instead, they may be better implemented by
having an experienced laboratory specialize in the identification of TICs, with the freedom to use instruments
and altered conditions for the solution of specific problems.

-------
1. PURPOSE:

The National Exposure Research Laboratory-Las Vegas (NERL-LV) conducted research on
tentatively identified compounds (TICs) using Superfund samples and data submitted by the Contract
Laboratory Program (CLP). This research effort was intended to provide valuable information for Superfund
regarding TICs. The initial effort included developing strategies and procedures necessary to assess and
enhance the TIC information obtained through the CLP on samples submitted by the Regions. State-of-the-
art approaches already developed through Office of Research and Development (ORD) research were applied
to maximize the potential for identifying compounds in samples from Superfund sites. The results of
these research-level studies on actual samples are designed to assist the Regions and the Project Officers in
the following ways:

1) measure the efficacy of current TIC identification and reporting procedures for compounds present in
Superfund samples;

2) identify improvements that may be needed for specific sample types, analyte classes, or
programmatic reporting requirements;

3) identify compounds having potential human or environmental risk;

4) detect compounds, singly or in a series or group, that may be useful for determining the source of the
pollution, or for tracking separate effluent streams or point sources.
2. TECHNICAL CONSIDERATIONS

The studies presented in this report on TICs used Superfund data submitted by the CLP. All regular
analytical services (RAS) data available at NERL-LV, i.e., those data known to be generated under the CLP
protocols, were studied. These studies involved reviewing the CLP GC/MS hard-copy data and the raw data
files to assess the effectiveness of current CLP protocols [1] for TICs, and to provide information
complementary to that reported by the CLP. The results included identifying TICs that may have value for
environmental monitoring and remediation, confirming the presence of suspected compounds, and where
necessary, correcting misidentifications. The study targeted both hazardous and potential marker (tracer)
compounds for determining the source of contamination. Some compounds may not be considered toxic or
suitable source marker compounds, but the techniques could be used equally well on other Superfund samples
for components that could have these characteristics.

In this project, 99,513 TICs were reported in 792 Sample Delivery Groups (SDGs) studied (Table
2.1). TICs were found to comprise approximately 90-95% of the analytes detected in the Superfund sample
data studied. Data reviews conducted in this study emphasized analytical results on soil, because this matrix
type was found to be much more likely than water to contain compounds of interest. Of the reported TICs, the CLP
reported identifications with Chemical Abstracts Service (CAS) numbers for 16%. Not all of these
identifications were correct, however. It was estimated from this study that perhaps 30% of these 16% were
correct, and possibly another 10% were correct except that the TIC was an isomer of the compound whose
CAS number was reported. Forty-one percent of the TICs were listed as being partially identified, and the
remaining 43 percent were reported as "unknown". As a result, 84% to 90% of the analytes in Superfund
samples remain unidentified under the current CLP reporting procedures. Examples of partial identifications
include "unknown chlorinated aromatic," "unsaturated hydrocarbon," "unknown PAH" (polynuclear aromatic

-------
hydrocarbon), etc. In many cases, TICs were reported by the CLP as "unknown" despite having a high-
probability mass spectral library match found by the data system. Long and McGuire studied the data sets on
27 samples, agreeing with the CLP TIC identifications 36% of the time. They recommended discouraging the
use of "unknown" for reporting TIC identities [2].

It should be noted that traditionally an absolute chemical structure determination is based upon
synthesis by a rational method giving unique products or by X-ray crystallography. It is generally impractical
to separate individual analytes from environmental samples to perform such structure determinations, because
of separation difficulties, low concentrations of the analytes, and cost or sample throughput considerations.
Secondary but powerful identification methods may involve selected, problem-specific combinations of
techniques such as nuclear magnetic resonance, optical spectroscopy, and mass spectrometry.

While less rigorous, the CLP identifications of Target Compound List (TCL) analytes are usually
reliable because they are derived from (a) mass spectra matched against spectra of chemical standards
obtained on the same instrument, and (b) capillary column gas chromatographic (GC) relative retention times
(RRTs) matched to those of the chemical standards obtained on the same instrument. In contrast, the
"identifications" of TICs are of lower reliability because authentic standards were not used for reference
spectra and retention times. For TICs, the library data base spectra may have been obtained under different
experimental conditions (instrumentation, sample introduction), and a GC RRT data base to assist in TIC
identifications is not specified in the CLP protocol. However, the EPA set up a TIC Work Group of involved
and interested parties, and the Group has developed a GC RRT data base for many TICs. In many cases,
partial identifications may be sufficient to decide if further investigation is needed. For example, mass
spectral data consistent with the assignment of chemical features (halogens, aromatic rings, alkyl groups, etc.)
may be sufficient for some purposes. Reinterpretations and identifications made in this project utilized
authentic standards whenever possible. In other cases, the identifications could not be considered fully
confirmed but are still not as "tentative" as those reported by the CLP.

Currently, CLP laboratories are required to provide TIC identifications with the data package
submission. For the semivolatile fraction, laboratories are required to report (per sample) a maximum of 20
TIC identifications (30 TICs for CLP OLM03.1), whose individual areas are over 10% of that of the nearest
internal standard [1]. They submit the top three data system "hits" or tentative identifications in the data
package, and are expected to interpret these results and report the most likely identification on Form I. The
quality of the interpretations reported on this summary form is important because the raw data printouts may
not be retained by many data users due to space limitations. The accuracy and completeness of these
identifications vary widely among laboratories. Problems and inconsistencies result for the users of TIC data
(see Section 5, below).

-------
Table 2.1. Summary of TIC Data Study.

TIC Reporting by CLP Laboratories

             Total TICs    Ident. with CAS #    Partial Ident.    Unknown
   Number    99,513        15,946               40,651            42,916
   Percent   100%          16.0%                40.8%             43.1%

Data Reviewed in this Study

   # of SDGs Reviewed       792
   # of Samples Reviewed    8,078

12.3 = Average number of TICs reported per sample
10.2 = Average number of samples per SDG
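
The summary figures in Table 2.1 can be cross-checked with simple arithmetic. The sketch below is written in Python purely for illustration; it is not part of the original study, and the variable and function names are hypothetical. It recomputes the percentages and averages from the tabulated counts, then applies the report's rough estimate that perhaps 30% of the CAS-numbered identifications were correct, to show what share of all reported TICs that estimate implies.

```python
# Counts taken from Table 2.1.
total_tics = 99_513
with_cas = 15_946
partial = 40_651
unknown = 42_916
sdgs = 792
samples = 8_078


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The three reporting categories should account for every reported TIC.
assert with_cas + partial + unknown == total_tics

pct_cas = percent(with_cas, total_tics)
pct_partial = percent(partial, total_tics)
pct_unknown = percent(unknown, total_tics)

tics_per_sample = round(total_tics / samples, 1)
samples_per_sdg = round(samples / sdgs, 1)

# Report's estimate: perhaps 30% of the CAS-numbered identifications were correct.
est_correct_fraction = 0.30
pct_correct_of_all = round(pct_cas * est_correct_fraction, 1)

print(pct_cas, pct_partial, pct_unknown)
print(tics_per_sample, samples_per_sdg, pct_correct_of_all)
```

Under that estimate, only about 5% of all reported TICs would carry a correct, CAS-numbered identification.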
3. DATA REVIEW PROCEDURES

The investigators examined existing hard copy data from the CLP to identify and select cases that
were likely to contain environmentally significant TICs, based upon results reported under the CLP.
Additionally, they evaluated analytical interferences or sample contamination that, while not environmentally
significant, could hamper the identification of underlying significant TICs. An initial screening of likely
data was made using the CLP data package Form I for TICs, resulting in the identification of a good data
cross-section.

For this study, "significant" TICs included those that may increase risk, or help to characterize and
differentiate wastes by source or effluent stream. Simple aliphatic hydrocarbons, for example, were not
considered significant on an individual basis, although in aggregate, they might assist in an identification
(e.g., a type of fuel) or in source fingerprinting. It was not the intent of this project to identify all TICs. No
further study was made on work conducted by Viar & Co. for EPA [3]. Viar concluded that most TICs are
believed to be relatively harmless or do not have any relevant toxicity data for risk assessment. The type of
study that they performed necessarily utilizes a subset of incomplete toxicological data, and these general
conclusions may not apply to specific situations where a TIC could be found that is "significant" for risk or
source determinations. It is well to remember that unidentified compounds with established health and
environmental concerns, e.g., PCBs, were originally recognized while monitoring for other compounds, such as
organochlorine pesticides.

Investigators looked for TICs that appeared to contain atoms and functional groups which may be
found in environmental contaminants of concern, including P, Cl, Br, F, N, S, PAHs, and organometallic
compounds containing heavy metals. Emphasis was placed on seeking spectra with significant intensities of
upper mass ions [above ca. 200 mass units, or Daltons (Da)], or masses below 200 Da that are relatively
intense in the spectrum and potentially characteristic. Library hits were verified, when questionable, by
reviewing the GC/MS raw data file tapes to obtain additional information from the original analyses, such as
better quality spectra, better background subtractions, or non-reported spectra. During this process, mass
spectral interpretations were augmented by the application of normally used data system algorithms.

-------
4. SPECIFIC PROBLEMS IN DETECTING, IDENTIFYING, AND QUANTITATING TICS

In this project, the accuracy of TIC identification and quantitation by the CLP-specified procedures
was assessed. This study also evaluated the frequency of problems seen with respect to chromatographic
resolution, such as the possibility that overlapping GC peaks prevent accurate identifications.

TICs whose areas are over 10% that of the nearest internal standard are approximately quantitated
against that internal standard. The assumption is made that the response factors for the two compounds are
the same. Several TICs that can potentially appear in sample extracts may be trace components from
required quality control (QC) solutions, e.g., low percentage impurities or degradation products in surrogates,
internal standards, etc. The nature of the TIC definition (10% of internal standard) can mean that irrelevant
compounds can become contractually significant in a clean sample. It is important to run associated blanks to
determine if TICs are actually of environmental origin.
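
The approximate quantitation described above can be sketched as follows. This is a minimal illustration in Python, not the CLP data system's implementation: the function names, the retention times, the peak areas, and the internal standard concentration are all hypothetical. The only elements taken from the text are the 10% area threshold, the use of the nearest internal standard, and the assumption that the response factor is 1.

```python
def nearest_internal_standard(tic_rt, internal_standards):
    """Pick the internal standard whose retention time is closest to the TIC."""
    return min(internal_standards, key=lambda s: abs(s["rt"] - tic_rt))


def is_reportable(tic_area, is_area, threshold=0.10):
    """A TIC qualifies when its area exceeds 10% of the internal standard area."""
    return tic_area > threshold * is_area


def estimate_concentration(tic_area, is_area, is_conc, response_factor=1.0):
    """Estimate TIC concentration in the same units as is_conc.

    The response factor is assumed to be 1, i.e. the TIC and the internal
    standard are assumed to respond identically.
    """
    if is_area <= 0 or response_factor <= 0:
        raise ValueError("area and response factor must be positive")
    return (tic_area / is_area) * is_conc / response_factor


# Hypothetical internal standards: retention time (min), peak area, concentration.
standards = [
    {"name": "IS-A", "rt": 9.8, "area": 120_000, "conc": 40.0},
    {"name": "IS-B", "rt": 14.1, "area": 100_000, "conc": 40.0},
]

# Hypothetical TIC eluting at 12.4 min with a peak area of 25,000.
ref = nearest_internal_standard(12.4, standards)
reportable = is_reportable(25_000, ref["area"])
estimate = estimate_concentration(25_000, ref["area"], ref["conc"])
print(ref["name"], reportable, estimate)
```

Because the true response factor of an unidentified compound is unknown, the result is an estimate only, and a small peak in a clean sample can cross the 10% threshold just as readily as an environmentally relevant one.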

Chromatographic peak shape and resolution are generally good. However, organic acids do not
generally chromatograph well on the GC columns commonly used for EPA analysis of semivolatile
compounds. The chromatographic peaks tend to be skewed in a way that frequently causes poor integration
by the data systems. These integration errors result in inaccurate quantitations.

Co-elution of large amounts of hydrocarbons with smaller amounts of TICs may occur. If
hydrocarbons are present in sufficiently large quantities, the portion of the composite mass spectrum due to
the lower-concentration TICs may be given minimal significance or neglected by the computerized matching
algorithm.

Gel-permeation chromatography (GPC) cleanup of soil samples is required in the March, 1990 CLP
statement of work (OLM01.0) and all subsequent statements of work. This cleanup removes large amounts
of high molecular weight hydrocarbons and sulfur from soil sample extracts. However, soil samples
containing high amounts of oil or sulfur have the potential of overloading the GPC column. These
hydrocarbons interfere with the identification and quantitation of target compounds, and the identification of
TICs. High levels of sulfur were also encountered in some water samples.

Instances of reporting and interpretation errors by CLP laboratory personnel were found during the
data reviews. A few instances were found of errors such as manual integration of a peak that eluted during an
electronic power glitch without adequate discussion of the problem, and manual integration of electronic noise
as a chromatographic peak. A more common and more significant error found in this study was reporting all TICs
as "unknown" even if the data system library match was good. This last situation may be a result of less
contractual emphasis on these compounds, where the laboratories save the time and cost that would be
expended in TIC interpretations.

The following subsections discuss several scientific issues relating to the CLP method protocol.
These issues may cause TIC data not to be reported or limit the mass spectral interpretation specialist in
assessing the data.

4.1 Scan range limitations

The CLP SOW currently states that the scan range for the analysis of semivolatile compounds is to
be 35 to 500 Da. This can cause the misidentification of compounds that have significant masses or clusters
of masses above m/z 500. Certain TICs require special analyses using higher mass ranges. Structure

-------
assignments may be made, or structural features inferred, from fragment ions of TICs having molecular
weights over 500 Da and from neutral losses calculated by mass differences among fragment ions or between
fragment ions and the molecular ion. Some compounds exhibit characteristic low mass (below 35 Da)
fragments, such as C2 at m/z 24 of highly unsaturated hydrocarbons, m/z 26 from CN, m/z 27 from HCN,
CH2NH2 at m/z 30, H2S at m/z 34, and fragments containing C, H, and N or O at m/z 29 and 30. Hence,
benefits could be obtained from extending the scan range in either direction, for certain TICs. The CLP has
not extended the mass range, both to avoid detector saturation by air peaks (m/z 28 and 32) and in recognition that
relatively few TICs of concern having molecular weights over 500 Da are chromatographed using the protocol
method.

4.2 Library deficiencies

TIC mass spectra have been found, including some with recognizable chlorine and/or bromine
clusters, for which there are no library spectra in the computerized mass spectral data bases. Such spectra
must presently be manually interpreted.

Significant differences in spectral peak intensities have resulted from the use of various tunes, types
of instruments, and inlet systems. It is normally not evident to the CLP laboratory or other library data base
user what exact analytical conditions were used to generate each NIST reference spectrum. Where seemingly
duplicate spectra are present, they may have been generated under different conditions, and one of them may
provide a better match than another to the unknown.

Additional compounds could be included in the NIST mass spectral data base to assist in identifying
TICs for environmental monitoring efforts. These compounds include higher molecular weight PAHs, and
industrial process solvents and chemicals. The inclusion of additional natural products such as vegetation
decomposition products would allow the investigator to eliminate such compounds quickly from further
study. Additional pesticide metabolites and degradation products should be included; while some are
available in the literature (hardcopy form), they are not available in commonly used software data bases.

4.3 Limitations of low resolution quadrupole GC/MS

Some GC/MS spectra do not adequately differentiate between and precisely define the number of
chlorine and bromine atoms present in an analyte. This situation was noted in this study when low intensity
responses are encountered, when the lowest intensity members of mass clusters are not observed above the
noise level, and occasionally when large numbers of these atoms are present and the differences in relative ion
intensities are small among the candidate Br(a), Cl(b), and Br(a)Cl(b) clusters. Also, quadrupole instruments
commonly employed by laboratories that work under the CLP may not have adequate sensitivity for the
detection of some analytes and/or precise measurement of smaller peaks (such as carbon-13 isotopes).
Efforts are made through use of logical neutral mass losses in the spectra and manual interpretation to resolve
this problem and obtain additional information about TICs. Molecular ions of some compounds are weak or
missing, a deficiency that has been discussed by Donald Scott of EPA-NERL-RTP [4-6].
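
The difficulty of counting halogen atoms from cluster shapes can be illustrated by computing the expected isotope patterns. The following Python sketch is an illustration only, not a procedure used in the study. It considers only the two stable isotopes of chlorine and of bromine, uses approximate natural abundances, and ignores carbon-13 and other contributions; the function names are hypothetical.

```python
# Approximate natural abundances, keyed by mass offset from the lightest isotope.
CL = {0: 0.7576, 2: 0.2424}  # 35Cl, 37Cl
BR = {0: 0.5069, 2: 0.4931}  # 79Br, 81Br


def convolve(a, b):
    """Combine two isotope distributions (mass offset -> probability)."""
    out = {}
    for offset_a, p_a in a.items():
        for offset_b, p_b in b.items():
            key = offset_a + offset_b
            out[key] = out.get(key, 0.0) + p_a * p_b
    return out


def halogen_cluster(n_br=0, n_cl=0):
    """Relative intensities (largest peak = 100) for an ion with n_br Br and n_cl Cl."""
    pattern = {0: 1.0}
    for _ in range(n_br):
        pattern = convolve(pattern, BR)
    for _ in range(n_cl):
        pattern = convolve(pattern, CL)
    top = max(pattern.values())
    return {offset: round(100.0 * p / top, 1) for offset, p in sorted(pattern.items())}


# Compare candidate compositions; offsets are in Da above the lightest peak.
for n_br, n_cl in [(0, 1), (1, 0), (2, 0), (2, 1), (5, 0), (6, 0)]:
    print(n_br, n_cl, halogen_cluster(n_br, n_cl))
```

Comparing the printed patterns for compositions with many halogen atoms shows how similar neighboring candidates become, particularly once the weakest cluster members fall below the noise level.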

-------
4.4 Data system capabilities

There are different data processing techniques among the major vendors of quadrupole GC/MS data
systems used for environmental analyses. The two firms with perhaps the longest experience in
environmental applications for their systems (Finnigan and Hewlett-Packard) provide relatively advanced
software to perform necessary data manipulations such as background subtraction and peak enhancements.
Data systems from these two companies are used to report approximately 80% of the data submitted to the
US EPA under the CLP.

The two systems noted above list the detected masses of chromatographic peaks to two decimal
places. This information is occasionally useful for determining whether a given mass spectral peak is due to a
fragment containing negative mass defect atoms yielding masses less than the integer mass (chlorine,
bromine, fluorine, etc.) as contrasted with hydrocarbon ion series having positive mass defects yielding
masses greater than the integer mass. The appearance of mass clusters is more generally useful for the
detection of bromines and chlorines in unknowns.

It was found that another data system, from a smaller vendor, had fewer software-based peak
subtraction capabilities than the above systems. Although this system reports all masses as nominal masses,
a data system command can be used to print the measured masses showing the mass defects. The mass defect
is occasionally useful; a defect near 0.5 Da sometimes occurs, generally from presence of multiple bromine
atoms in the analyte molecule. The data system may misassign the nominal mass when instrumental
variability plus the mass defect combine to reach the 0.5 Da level. In this project, misassigned masses were
found in analytes with negative mass defects, such as hexabromobenzene, and with positive mass defects,
such as arylalkylamines. In the latter case, the actual mass defect was positive, and less than 0.2 Da.
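
The nominal mass misassignment described above can be illustrated with a short calculation. The following is a minimal sketch, not code from any vendor data system. The isotope masses are standard values rounded to six decimals; the function names and the 0.02 Da measurement error are assumptions chosen only for illustration.

```python
# Monoisotopic masses (Da) of the lightest common isotopes.
ISOTOPE_MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074,
                "Cl": 34.968853, "Br": 78.918338}
# Nominal (integer) masses of the same isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "Cl": 35, "Br": 79}


def exact_mass(formula):
    """Exact mass of a composition given as {element: count}."""
    return sum(ISOTOPE_MASS[el] * n for el, n in formula.items())


def nominal_mass(formula):
    """Nominal mass of a composition given as {element: count}."""
    return sum(NOMINAL_MASS[el] * n for el, n in formula.items())


def mass_defect(formula):
    """Exact mass minus nominal mass; negative for halogen-rich ions."""
    return exact_mass(formula) - nominal_mass(formula)


hexabromobenzene = {"C": 6, "Br": 6}      # all-79Br molecular ion
defect = mass_defect(hexabromobenzene)    # close to -0.49 Da

# Hypothetical instrumental error of -0.02 Da added to the exact mass.
measured = exact_mass(hexabromobenzene) - 0.02
assigned = round(measured)                # rounds to 545; nominal mass is 546
```

With a defect this close to 0.5 Da, a small instrumental error is enough to move the rounded value to the neighboring integer, which is the behavior reported for hexabromobenzene.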

4.5 Isomer identifications

CLP sample reports often list TICs with a non-isomer-specific name such as
"dibromobenzene" because the chemical standards needed to determine relative retention times or retention indices were
not available to establish which particular isomer was present. This situation may have significance for
health concerns, in cases where different isomers have significantly different toxicities.
5. FINDINGS FROM CLP DATA REVIEWS

Samples were found with computerized library matches to sulfonamides, unusual ethers, steroids,
vitamin-E, Cellosolves, thiophenes, acridines, PAHs, carboxylic acids, polychlorinated biphenyls (PCBs),
pesticides, oxidized phosphines, natural products, and organometallics. Many of these matches were of poor
quality. Some of the spectra in the data packages were difficult to match because of apparently poor spectral
quality. The data examined in this study showed a wide variety of MS ions and fragmentation patterns
characteristic of classes including long chain dioic (dicarboxylic) acids, from soap products, phthalates,
aliphatic hydrocarbons and alcohols, and alkylated benzenes and naphthalenes.

compounds (i.e., cholesterol) and PCBs were also reported. Elemental sulfur was reported by the CLP
laboratories in ca. 50% of soil samples. The TCL phthalates were found in ca. 80% of the soil samples, and
non-TCL phthalates were found in ca. 10%. Of all the data reviewed, there was only one data case in which
an organometallic compound (tetraphenyltin) was found.

Tentative identifications by the CLP varied from the uninformative "unknown" to the more
informative, e.g., "trimethylnaphthalene isomer" or "unknown trimethylnaphthalene." Some TIC spectra
were reported as unknown with the base peak m/z listed in the CLP report summary (Form I-F). The
following are examples of reported items that were considered in this study to be reported as unknown:

Unknown
Laboratory artifact
Laboratory contaminant
Unknown (base peak m/z = 43)

The following are examples of reported items that were considered in this study to be partially identified:

Unknown aromatic
Trichlorobenzene isomer
Unknown hydrocarbon
The reporting trends from the laboratories were highly varied with respect to accuracy of reporting
TICs. While some laboratories (ca. 2%) made honest efforts to identify the TICs, others (ca. 30%) simply
labeled all TIC peaks as unknown and made no attempt to accurately identify the TICs. Some laboratories
(ca. 2%) would only commit to identifying sulfur and reported all other TICs as unknown. Some laboratories
(ca. 20%) used the name of the library match compound with the highest score on the library search, treating
that as the identification regardless of whether that identification was reasonable based upon manual spectral
interpretation. It was also noted that the method of reporting TICs varied among personnel within a given
contract laboratory. There are several examples of all TICs being reported as unknown for half of the
samples within a single SDG while efforts were made to identify or partially identify TICs for the remaining
samples in the same SDG. These percentages indicate that the reporting of TICs is not emphasized by CLP,
and the quality of TIC identifications appears to be independent of laboratory selection and payment for
analyses.

In some cases, the TIC identity is known, but unreported. For example, two isomeric
methylnaphthalenes exist; 2-methylnaphthalene is a Target Compound; 1-methylnaphthalene is a TIC. Their
mass spectra are not distinguished by the data system search, but their GC RRTs are different, and the RRT
for the TCL analyte was measured in the laboratory TCL standards analysis. Many labs report the TIC
isomer as "unknown PAH" although they have enough information for a correct identification. Non-TCL
phthalates were found as TICs in ca. 10% of the samples, but these compounds were not emphasized in this
project. They may be present in the original samples or may be laboratory artifacts, but they are not usually
considered hazardous.

The hard copy (paper) data submitted by the laboratories varied in content as well. The current
SOW for organic analysis by the CLP requires that the laboratory submit the spectrum of the unknown
followed by the spectra of the three best matches (if present). All laboratories do this but the format which is
submitted often lacks information needed for further review or manual spectral interpretation. These
problems include spectra that have no or minimal labeling on the m/z axis, making it difficult or impossible
to accurately determine the correct m/z value for a given ion. Names of spectral matches are often too long to
present the entire name of the match on the graphical display. If the CAS number is not present (this occurs
about one third of the time), the reviewer cannot determine the full name or exact structure of the compound.
6. DATA SYSTEM CAPABILITY STUDIES

GC/MS data systems used in this project included: HP RTE, HP DOS, HP UNIX, Finnigan
INCOS™, VG DOS, and Extrel DEC. The HP and INCOS™ data systems were interconnected by a
personal computer-based local area network (LAN) equipped with MassTransit™ Version 1.02a (Palisade
Corp., 1993) and Excel Version 5.0 (Microsoft Corp., 1994) software.

Senior level chemists applied existing data system procedures to aid in the manual or automated
identification of difficult or complex TICs from the existing CLP data. The procedures available included the
following techniques: background subtraction, spectral enhancement using predefined data system
algorithms combined with background subtraction, optimization of certain mass ranges, global and local
normalization factors or tilting, changes in search speed, minimum and maximum molecular weight,
minimum number of ions to search, comparison of retention times, and various types of spectral cleanup and
background subtraction before library searching.

TICs that can cause analytical interferences were evaluated. These compounds varied in direct
environmental significance, but they also can present difficulties in evaluating other closely eluting TICs.
The feasibility of applying or designing computer algorithms that subtract these interfering TIC spectra was
investigated, resulting in the development of an algorithm based on Colby's concept for spectral
"deconvolution" or spectral enhancement.

We studied related, recent work on compound identification for applicability to the goals of this
project. For example, Donald Scott (EPA-NERL-RTP) has developed a molecular weight estimator
program;4-6 Bruce Colby (Pacific Analytical Labs) has studied the concept of spectral deconvolution for
overlapping peaks;7,9 Stephen Stein (NIST) has studied the probabilities of correct identifications from mass
spectral library searches,9 and has tested search algorithms for compound identification.10

6.1 NIST and Wiley libraries

The Hewlett-Packard BigDB EI-MS library (130,000 spectra; 1986 version) was used for most
library searching. This library includes the 49,000 entry NIST and 81,000 entry Wiley libraries, both also EI
mass spectral libraries. The 75,000 entry NIST library (1992 version) was also used for searching.
Numerous cases were noted where no library match was found by the data system, or the library matches
were of such poor quality that the databases were ineffective in correctly identifying the TIC. Since the time
of this study, these libraries have been expanded by the inclusion of new entries.

Three features of the data bases are noteworthy: 1) there are cases of multiple spectra present for the
same CAS registry number (replicate spectra); 2) there are numerous instances where ions are present in the
reference spectra at m/z values higher than the molecular ion mass cluster for the molecule. These higher
mass ions indicate the presence of impurities or incorrect spectra; 3) quality indices (QI) are assigned to
spectra in the data base. Factors 1) and 3) are qualitatively useful, and data interpreters can use them to
augment manual mass spectral interpretations of unknown TICs against library spectra.

The Hewlett-Packard BigDB library has the advantage that it contains additional compounds not
present in the NIST data base. However, searches using the HP software for compounds present in both
libraries were found to be similar in effectiveness to searching the NIST library alone. This situation
indicates that spectral quality and accessibility via computerized searching are similar for the two libraries.
The HP data system output did not specify whether selected spectra were from the Wiley or NIST data bases
within the BigDB library. However, searching a larger library did take more data system time, and often gave
more apparent matches that were caused by the presence of duplicate spectra of the same compound. Library
searches using unique masses tend to be fast, whereas those using non-unique masses, such as those of
aliphatic hydrocarbons, take longer due to the presence of many spectra in the data base having those masses
in the spectra.

The distribution of compounds contained in the NIST 75K database as provided on the HP UNIX
GC/MS data system is shown in Table 6.1.

It can be seen that about 4% of the compounds present in the NIST library have molecular ions
above the CLP-required scanning range (35-500 Da) and about 17% are duplicate spectra. For example, the
75K version of the NIST database includes nine spectra for decafluorotriphenylphosphine (DFTPP), four
spectra for cholesterol, and five spectra for cholesterol trimethylether. Of the three spectra for
hexachlorophene, one does not contain the molecular ion/isotopic molecular ion group (m/z 404 ... 416), but
rather a single low-intensity peak at m/z 407 (entry #74,279). The presence of multiple spectra for a
compound results in more than one of the top three or five hits that are printed by the data system being the
same compound, rather than different possibilities. This situation results in the data package containing
fewer unique possibilities for the TIC identity, and the data user does not have potentially valuable candidates
against which the TIC identification could be reinterpreted.
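
The two percentages quoted above follow directly from the counts in Table 6.1. The short sketch below repeats that arithmetic; the variable names are ours, and the total is taken as reported in the table.

```python
# Counts copied from Table 6.1 (NIST 75K library on the HP UNIX system).
counts = {
    "15-34": 19, "35-99": 1407, "100-199": 21274, "200-299": 19959,
    "300-399": 11187, "400-450": 3229, "451-499": 1840,
    "500-599": 1936, "600-699": 799, "700-799": 338,
    "800-899": 100, "900-999": 51, "1000-1260": 57,
}
duplicates = 12631
total = 74828  # total as reported in the table

# Entries whose molecular weight range starts at or above 500 Da,
# i.e. above the CLP-required scanning range of 35-500 Da.
above_range = sum(n for rng, n in counts.items()
                  if int(rng.split("-")[0]) >= 500)

pct_above = 100.0 * above_range / total        # about 4%
pct_duplicates = 100.0 * duplicates / total    # about 17%
```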

Some hydrocarbon spectra are incomplete, showing only the lower intensity but important ions above
m/z 300 (e.g., hentriacontane and 3-methylhentriacontane) or m/z 150 (e.g., tritriacontane,
2-methyldotriacontane, 3-methyltritriacontane). Other hydrocarbon spectra are complete, showing the
characteristic saturated hydrocarbon ions such as m/z 43 and 57. This presentation makes the
lower-intensity responses for the high mass ions difficult to observe (e.g., tetratriacontane). Both the full spectrum
and the partial higher mass spectrum are useful for certain purposes, the partial spectrum for the molecular weight, and
the full spectrum for verification that the low mass saturated hydrocarbon pattern is present.

Because these mass spectral databases are intended to serve multiple purposes, they include
compounds that are intractable (not suitable for introduction through the GC, such as peptides and other naturally
occurring biological molecules of high molecular weight), and compounds that are not generally found at
hazardous waste sites. The large databases increase the amount of time it takes to perform computerized
library searches. A shorter subset of the data base without duplicate and aliphatic hydrocarbon spectra would
reduce search times by perhaps 15%. However, much of the time involved in using the larger library does not
impact the cost of analysis, because searches are usually performed automatically. Labor costs are incurred
for the time needed to interpret the results, prepare the summary form (Form I), and to make duplicate copies
of the data package. The added costs should be balanced against the potential that an unexpected compound
may be found in a sample, and could be identified if the data base were larger. As an example, a series of
arylalkylamines was identified in a CLP-RAS case, partially because one member of the series was present in
the 104,000-entry Wiley library. This library was searched in an effort to identify this series of TICs whose
elemental compositions were known from high resolution mass spectrometric (HRMS) accurate mass
measurements. Generally, when attempting to determine the chemical structure of unidentified compounds,
the largest possible library increases the probability of directly matching the compound or of recognizing
substructural features from compounds having these in common.
Table 6.1. Molecular weight ranges of compounds in the NIST data base.

Molecular Weight Range  # of compounds     % of Database
15-34                               19              0.03
35-99                             1407              1.88
100-199                          21274             28.43
200-299                          19959             26.67
300-399                          11187             14.95
400-450                           3229              4.32
451-499                           1840              2.46
500-599                           1936              2.59
600-699                            799              1.07
700-799                            338              0.45
800-899                            100              0.13
900-999                             51              0.07
1000-1260                           57              0.08
Duplicate spectra                12631             16.88
Total                            74828            100.00

6.2 File formats

At present there are 36 file formats being used by various GC/MS data system vendors. The five
most common formats applied to environmental analysis are the HP RTE, HP DOS, Finnigan INCOS™, VG
DOS, and Extrel DEC formats. Due to differences in these formats, it has been difficult to compare
quantitation and library search procedures between various software packages and hardware platforms. This
comparison is difficult even with two data formats arising from the same GC/MS data system vendor.

A universal GC/MS data format (netCDF) has been proposed, although the various instrument
vendors have not fully agreed on its exact format. This data file standard is proposed by the International
Association of Environmental Testing Laboratories as a universal file format for exchange of raw GC/MS
data between different GC/MS equipment vendors. The common format will enable transfer of GC/MS data
files to different hardware platforms. This file format will allow processing of any one raw GC/MS data file
using software from multiple vendors. To date, the netCDF format is standardized for GC data but not for
GC/MS data. Converting netCDF files to ASCII text with public domain software would create the potential
for data fraud, because a laboratory could manipulate data held in the flexible, editable ASCII format. In addition
to this concern, another limitation to the use of netCDF is that even if this format becomes available,
additional software may be needed that would not be compatible with older systems.

The issue of file formats is significant for studies of mass spectral data system capabilities. Prior to
this project, little work had been done to compare the various search algorithms for accuracy in identifying
TICs found during the analysis of environmental samples. Most laboratories lack the ability to interconnect
dissimilar data systems by means of a local area network (LAN) that could transfer data files and convert
them from one format to another. Each data system requires that the data file be in the proprietary format
used by that manufacturer. Therefore, we obtained information about available algorithms and compared the
results of using different algorithms. To make such comparisons on identical data sets, we developed the
capability to move files from one system to another through a DOS-based PC network, and subsequently
interconvert file formats.

6.3 Background Subtraction

The use of software subtractions was investigated for TIC spectra in samples that were heavily
contaminated with long-chain hydrocarbons. It was found that high concentrations of hydrocarbons would
render such subtractions of minimal value. The presence of unsaturated and saturated hydrocarbons would
further complicate the situation.

Software supplied with most systems for the analysis of environmental samples includes simple
background subtraction techniques for the removal of moderate interferences. The Finnigan system allows
for the subtraction of a single scan, multiple scans on one side of the target peak, or background subtraction
from both sides of the target peak if there are multiple interferences present. The HP system has similar
capabilities. In this study, it was found that both of these systems were fully satisfactory.
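
The subtraction options described above (a single scan, several scans on one side, or scans on both sides of the peak) can be sketched in a few lines. This is an illustrative outline only, not the Finnigan or HP implementation; the function name, the use of a simple mean, and the example intensities are assumptions.

```python
def subtract_background(scans, peak, left=(), right=()):
    """Subtract the mean background spectrum from the scan at index `peak`.

    scans: list of spectra, each a dict {m/z: intensity}
    left, right: scan indices used as background on either side of the peak
    """
    background = [scans[i] for i in list(left) + list(right)]
    if not background:
        return dict(scans[peak])
    cleaned = {}
    for mz, intensity in scans[peak].items():
        mean_bg = sum(s.get(mz, 0.0) for s in background) / len(background)
        residual = intensity - mean_bg
        if residual > 0:  # ions fully accounted for by background are dropped
            cleaned[mz] = residual
    return cleaned


scans = [
    {43: 100.0, 57: 80.0},               # background before the peak
    {43: 110.0, 57: 85.0, 128: 500.0},   # peak scan
    {43: 120.0, 57: 90.0},               # background after the peak
]
both_sides = subtract_background(scans, peak=1, left=[0], right=[2])
one_side = subtract_background(scans, peak=1, left=[0])
```

In this example the background is rising across the peak, so the two-sided subtraction removes the m/z 43 and 57 responses completely while the one-sided subtraction leaves small residuals.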

6.4 Isotopic ratio and elemental composition program

A program written by Dr. Andrew H. Grange, currently National Research Council/NERL-ESD-LV
Senior Research Associate, shows the correct peak cluster ratios for any combination of chlorine and bromine
in the range from 0 to 10 bromines and 0 to 16 chlorine atoms. This program also calculates candidate
elemental compositions of a molecular ion and isotopic molecular ions containing bromine and chlorine
atoms. This program was used to support HRMS accurate mass determinations.11,12
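
The peak cluster ratios for a given number of chlorine and bromine atoms can be obtained by repeated polynomial multiplication of the two-isotope abundance patterns. The sketch below is not Dr. Grange's program; it is a minimal illustration under stated assumptions: approximate natural abundances, chlorine and bromine isotopes only (contributions from carbon-13 and other elements are ignored), and function names of our own choosing.

```python
# Approximate natural isotopic abundances (light isotope, heavy isotope).
ABUNDANCE = {"Cl": (0.7576, 0.2424), "Br": (0.5069, 0.4931)}


def convolve(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def halogen_cluster(n_cl=0, n_br=0):
    """Fractional abundances of the M, M+2, M+4, ... peaks (sum to 1)."""
    pattern = [1.0]
    for element, count in (("Cl", n_cl), ("Br", n_br)):
        for _ in range(count):
            pattern = convolve(pattern, list(ABUNDANCE[element]))
    return pattern


def relative_to_base(pattern):
    """Scale a pattern so that its most intense peak is 100."""
    top = max(pattern)
    return [100.0 * p / top for p in pattern]


dibromo = halogen_cluster(n_br=2)                       # roughly 1:2:1
monochloro = relative_to_base(halogen_cluster(n_cl=1))  # roughly 100:32
```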
7. ALGORITHMS AND PROCEDURES FOR TIC IDENTIFICATIONS

The process for determining TICs in mass spectral data involves two separate and distinct
operations. First, a peak in the gas chromatogram must be isolated from other interfering compounds, and a
representative mass spectrum must be obtained for the peak. Second, the mass spectrum corresponding to the
peak must be searched against a mass spectral data base. There are two common methods currently being
used to isolate and remove interferences in mass spectral data. These are application of manual background
subtraction procedures and the use of various algorithms to isolate the spectra of interest.

7.1 Background Subtraction

The simplest method of mass spectral enhancement is background subtraction. This method was
useful for the elimination of background responses but provided limited success when two or more
components coeluted.

7.2 Biller-Biemann Spectral Isolation Algorithm

A more advanced background-subtraction type of method for the isolation of spectra is the Biller-Biemann
process, available on Finnigan and some other data systems. The data system searches the mass
chromatogram and flags mass spectra where a set number of ions maximize within a set number of scans of
each other. Typically the number of maximizing masses is three and the scan range is two scans. After this
has been done, these scans are processed using an enhancement routine that discards all other ions that do not
maximize within that retention time window. This algorithm has been used commercially on Finnigan™ data
systems for about 20 years with good success.
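
The flagging and enhancement steps just described can be outlined as follows. This is a simplified sketch of the idea, not the Finnigan implementation. The function names, the treatment of the scan range as a window of consecutive scans, and the example traces are assumptions.

```python
def local_maxima(trace):
    """Scan indices where a single-ion intensity trace reaches a local maximum."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i - 1] < trace[i] >= trace[i + 1]]


def flag_scans(traces, min_ions=3, scan_range=2):
    """Return {first scan of window: [m/z, ...]} for every window of
    `scan_range` scans in which at least `min_ions` ions maximize.

    traces: dict {m/z: [intensity per scan]}
    """
    maxima = [(scan, mz) for mz, trace in traces.items()
              for scan in local_maxima(trace)]
    flagged = {}
    for start in sorted({scan for scan, _ in maxima}):
        ions = sorted(mz for scan, mz in maxima
                      if start <= scan < start + scan_range)
        if len(ions) >= min_ions:
            flagged[start] = ions
    return flagged


def enhanced_spectrum(traces, start, ions, scan_range=2):
    """Keep only the co-maximizing ions; all other ions are discarded."""
    return {mz: max(traces[mz][start:start + scan_range]) for mz in ions}


traces = {
    128: [0, 10, 100, 20, 0],
    127: [0, 2, 15, 3, 0],
    102: [0, 1, 8, 9, 1],       # maximizes one scan later than 128 and 127
    57: [50, 50, 50, 50, 50],   # flat background, never maximizes
    43: [40, 30, 20, 10, 5],    # decaying background, never maximizes
}
flagged = flag_scans(traces)
spectrum = enhanced_spectrum(traces, 2, flagged[2])
```

The background ions at m/z 43 and 57 do not maximize with the analyte ions and are therefore absent from the enhanced spectrum.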

7.3 Mass Spectral Enhancement

Fourier transform of data from intensity vs. time to intensity vs. frequency was tested. It was found
that the required frequency resolution was not readily available. A two-dimensional package might prove
useful, but the only commercially available packages were one-dimensional and were found to broaden the
chromatographic peaks, decreasing the apparent resolution rather than increasing it as would be desired. A
quadratic fit was found to be key (see below) in achieving the desired spectral enhancement and background
rejection.

A method of mass spectral enhancement via background rejection was developed in this project,
employing a concept proposed by Bruce Colby of Pacific Analytical Laboratories.7-8 Colby suggested a
resolution enhancement approach related to the Biller-Biemann process. It includes features that resemble a
digital implementation of the widely used phase-locked amplifier in electronics.

This method of background rejection and spectral enhancement for the identification of unknowns is
a potentially valuable substitute for conducting further analyses of the samples. The capabilities of modern
personal computers make it worthwhile to consider the use of mathematical methods for spectral
enhancement and background subtraction to isolate the spectra of unknown compounds of interest from
interferences. This is especially evident when there are large numbers of samples present from a site
containing numerous interfering compounds.

In this study, we considerably expanded, developed, and implemented the resolution enhancement
approach as a set of macros in a Microsoft EXCEL™ spreadsheet.13 A description of the resolution
enhancement approach follows below. Results of applying these algorithms to environmental samples are
presented in Section 8.

The spectral enhancement approach was designed and implemented as a set of macros written in
Visual Basic for Applications in Microsoft EXCEL™. Mass spectral data were accessed from a Hewlett-
Packard data system and the National Institute of Standards and Technology (NIST) mass spectral data base.
The data files were converted to ASCII text with MassTransit™ Version 1.02a (Palisade Corp., 1993) and
imported into EXCEL™ Version 5.0 (Microsoft Corp., 1994). The intensity maxima (peaks) for each ion in
a mass chromatogram were determined and their raw retention times (scan numbers) were noted. The raw
scan numbers were adjusted to fractional scan numbers for each m/z to match the precise time during a scan
when that mass was measured.  This adjustment was expressed as an offset term to the scan number, Os,
defined as follows:

                              $$O_s = \frac{M_{Cur} - M_{Min}}{M_{Max} - M_{Min}} \tag{1}$$
where
                              $M_{Cur}$ is the current mass
                              $M_{Min}$ is the minimum mass sampled, usually m/z = 35
                              $M_{Max}$ is the maximum mass sampled, usually m/z = 500.
The numerator represented how far into the scan the mass occurred and the denominator represented the full
scan range. Hence, for linear scanning, Os represented the simple fraction of the full scan completed at the
given mass.
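
As a quick illustration of equation (1), the sketch below evaluates the scan offset for a linear scan from m/z 35 to 500. The function name and default arguments are ours, and the scan is assumed to run from low to high mass.

```python
def scan_offset(mass, min_mass=35.0, max_mass=500.0):
    """Fraction of a linear scan completed when `mass` is measured (equation 1)."""
    return (mass - min_mass) / (max_mass - min_mass)


start = scan_offset(35.0)      # 0.0, first mass of the scan
middle = scan_offset(267.5)    # 0.5, halfway through the scan
end = scan_offset(500.0)       # 1.0, last mass of the scan
```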

        Near the apex of a mass chromatographic peak, the peak generally appeared to be reasonably
parabolic, and could be represented by the following quadratic equation:
                               $$Y = aX^2 + bX + c \tag{2}$$

where
                               Y is the intensity
                               X is the retention time (scan numbers)
                               a, b, and c are constants to be determined.

The quadratic form was exactly fitted to each peak ion intensity and the intensities of the preceding and
succeeding scans. By selecting these three intensities, the three coefficients could be exactly determined.

        The retention time of the apex of the fitted curve was expected to be an accurate estimation of the
true retention time of the constituent of interest, within constraints imposed by signal noise from scan to scan.
This apex was readily found using the common technique of taking the first derivative of the quadratic
function and setting it equal to zero, as follows:
                               $$Y' = 2aX + b = 0 \tag{3}$$

where
                               Y' is the first derivative.

        A variety of methods, including packaged optimization routines in EXCEL™, were available for
solving this algebraic problem. A fast, extremely simple axis-translation solution to the problem was selected
that reflected the simplicity of the mass chromatographic data in a spreadsheet format. A peak intensity, I0,
presented in scan co-ordinates in the spreadsheet, was the origin for the fitted quadratic equation, and each of
the adjoining intensities was set to coordinates of 1 and -1, respectively, with values of IR(ight) and IL(eft). These
changes of variable provided a simple closed form for the offset of the apex, OA, in scan coordinates:

                               $$O_A = -\frac{I_R - I_L}{2(I_R + I_L - 2I_0)} \tag{4}$$

The optimized retention time for the mass was obtained by adding Os and OA to the raw scan number of the
peak. Once a peak was identified, the entire optimization procedure was compactly performed by a single
line of code in the macro that contained only one multiplication and two division operations.
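
The closed form of equation (4) follows from fitting a parabola through the points (-1, IL), (0, I0), and (1, IR), which gives b = (IR - IL)/2 and a = (IR + IL - 2 I0)/2, so that the apex lies at -b/2a. The sketch below is a Python illustration of that calculation, not the original EXCEL™ macro line; the function names and the test intensities are assumptions.

```python
def apex_offset(i_left, i_0, i_right):
    """Offset of the fitted parabola's apex from the peak scan (equation 4).

    The denominator is negative for a true local maximum and zero only
    when the three intensities lie on a straight line.
    """
    return -(i_right - i_left) / (2.0 * (i_right + i_left - 2.0 * i_0))


def optimized_retention(scan, mass, i_left, i_0, i_right,
                        min_mass=35.0, max_mass=500.0):
    """Raw scan number plus the scan offset Os and the apex offset OA."""
    o_s = (mass - min_mass) / (max_mass - min_mass)
    return scan + o_s + apex_offset(i_left, i_0, i_right)


# Intensities sampled from a parabola whose true apex is at +0.25 scans.
offset = apex_offset(84.375, 99.375, 94.375)
symmetric = apex_offset(50.0, 100.0, 50.0)
rt = optimized_retention(100, 35.0, 84.375, 99.375, 94.375)
```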

The quadratically optimized retention times, masses, and observed peak intensities from the mass
chromatograms were stored in a list as they were calculated. After all of the peaks were optimized, the list
was sorted with respect to retention time. The intensities of all masses falling within selected sequential
retention time windows (summing intervals) over the entire chromatographic time range were summed and
placed into a second list of retention times and intensities. The summing interval duration was set at the
beginning of the experiment, typically between 0.1 and 0.33 scans. The selection of a summing interval
shorter than one scan yielded a total ion current chromatogram with greatly enhanced resolution, showing
distinct, baseline resolved peaks under the high levels of background signals. The mass spectra obtained for
these mathematically resolved peaks were free of most background mass responses because few of the
background masses maximized coincidentally with the masses of the peaks under investigation.
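
The summing step can be sketched as a simple binning of the optimized peak list. This is an illustration only; the use of fixed-edge bins, the function name, and the example peak list are assumptions, with the 0.25 scan interval chosen from within the range stated above.

```python
from collections import defaultdict


def enhanced_chromatogram(peaks, interval=0.25):
    """Sum peak intensities into retention-time bins of width `interval` scans.

    peaks: iterable of (optimized retention time, m/z, intensity)
    Returns (chromatogram, spectra): chromatogram maps bin index to the
    summed intensity; spectra maps bin index to {m/z: intensity}.
    """
    chromatogram = defaultdict(float)
    spectra = defaultdict(dict)
    for rt, mz, intensity in sorted(peaks):
        bin_index = int(rt // interval)
        chromatogram[bin_index] += intensity
        spectra[bin_index][mz] = spectra[bin_index].get(mz, 0.0) + intensity
    return dict(chromatogram), dict(spectra)


# Two analyte ions maximize near scan 100.3; two background ions near 100.8.
peaks = [(100.30, 128, 1000.0), (100.35, 127, 150.0),
         (100.80, 57, 400.0), (100.85, 43, 600.0)]
chromatogram, spectra = enhanced_chromatogram(peaks)
```

Although all four ions maximize within the same raw scan, the sub-scan bins separate them into two resolved responses, each with its own mass spectrum.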

After the spectral enhancement procedure was complete, a peak of potential interest was selected,
giving the background-rejected mass spectrum. The mass spectrum was exported with the appropriate header
to a text file. This file was imported into a mass spectrometer data system through a program such as
MassTransit™, and the mass spectrum was searched against a reference data base such as the NIST mass
spectral library. Table 7.1 outlines the steps required to perform a spectral enhancement analysis of a mass
chromatogram.

Table 7.1. The steps performed in the spectral enhancement operation.

Step  Action
1     Convert the raw data into text format.
2     Parse data using EXCEL™ version 5.0 into a two dimensional matrix placing the reported intensity
      for each ion present in a cell indexed to the mass and scan numbers.
3     Start a sliding window looking for all ions that are present in three successive scans, and identify
      each case where the ion maximizes in the center of the window.
4     Adjust the nominal scan time for time lag due to instrument scanning, Os.
5     Adjust the peak retention to observed ion distribution using quadratic fit, OA.
6     Place the time corrected data in a new matrix and sort with respect to the adjusted retention time.
7     Apply a filter to group ions based on adjusted retention times and generate a resolution enhanced
      chromatogram.
8     Extract ions within a given time range (peak) to produce a mass spectrum with enhanced
      background rejection.

7.4 Biller-Biemann Library Search Algorithm

The Biller-Biemann type library search algorithm currently used by Finnigan and Fisons (formerly
VG Masslab) uses a "sliding window" to determine the 16 most significant ions present in the unknown.
This window is typically 20 Da wide. This width is meant to ensure that all ions contained in characteristic
tightly-grouped mass clusters (such as from chlorine, bromine, or patterns exhibited by heavy metals) are
retained as significant peaks to be searched against the reference spectra contained in the data base. The
algorithm also uses peak intensity weighting. It multiplies the intensity of each peak by its mass number to
give higher priority to low intensity peaks at higher masses. This weighting is especially important when the
molecular ion is of low intensity. The algorithm then selects the 16 most intense peaks, and searches this
reduced "chemically significant" mass spectrum against a condensed version of the specified library that has
the 16 largest "chemically significant" peaks of every entry. After determining the 20 best candidate spectra,
the algorithm then searches the full unknown spectrum against the full spectra of those 20 best candidates
from the library. The ranking can be performed against quality of fit, reverse fit, or no fit, at the discretion of
the user. During the expanded search, the algorithm again adjusts or weights the experimental ion intensities
by multiplying them by the corresponding m/z value, increasing the significance of low intensity high mass
ions.
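
The effect of the mass weighting on peak selection can be seen in a small example. The sketch below covers only the weighting and the selection of the most significant peaks; the sliding-window step is omitted, and the function name and example spectrum are assumptions rather than the vendor implementation.

```python
def significant_peaks(spectrum, n_peaks=16):
    """Select the `n_peaks` ions with the largest mass-weighted intensity.

    spectrum: dict {m/z: intensity}
    """
    weighted = {mz: mz * intensity for mz, intensity in spectrum.items()}
    ranked = sorted(weighted, key=weighted.get, reverse=True)
    return sorted(ranked[:n_peaks])


# A weak high-mass ion (m/z 450) competes with intense low-mass ions.
spectrum = {43: 100.0, 57: 90.0, 71: 50.0, 450: 12.0}
selected = significant_peaks(spectrum, n_peaks=2)
```

Ranked by raw intensity, the two largest peaks would be m/z 43 and 57; with mass weighting the low-intensity ion at m/z 450 is retained instead of m/z 43.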

During the main search, the fit (FIT), reverse fit (RFIT), and purity (PUR) are determined for the
complete experimental spectrum against each of the candidate spectra. The FIT ranking rates the degree to which
the library spectrum is present in the unknown. The RFIT ranking rates the degree to which the unknown
spectrum is contained in the library spectrum. The PUR ranking rates the resemblance of the unknown to the
library entry. The three parameters FIT, RFIT, and PUR are scaled using values which have a range of 0 to
1000. Clean spectra usually produce matches that have numerically high FIT and PUR values. Spectral
search results with high FIT but lower PUR indicate the presence of coeluting compounds or interferences
that have not been adequately subtracted out. It is left to the data system operator to decide whether the
library search results will be ranked by FIT, RFIT, or PUR.

7.5 Probability-Based Matching Algorithm

A second library search algorithm, used primarily by Hewlett-Packard and Extrel, involves
determining the uniqueness of the ions present in the unknown spectrum. This algorithm was developed by
Dr. Fred W. McLafferty at Cornell University and is called probability-based matching (PBM). Initially all
ions present in an unknown spectrum are assigned uniqueness values based on the number of times each mass
occurs in the NIST data base. The unknown is then compared against all entries in the NIST data base.
According to the HP 59872 RTE MS Data System Manual, this algorithm is based on the fact that the
probability that particular ions will occur follows a log-normal distribution and that the probability of finding
higher mass ions decreases by a factor of two for each increment of 130 mass units.
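
The stated mass dependence can be written as a simple halving rule. The sketch below illustrates only that trend; it is not the PBM uniqueness calculation itself. The function names are ours, and the reference point (a value of 1 at mass 0) is arbitrary, so only ratios between masses are meaningful.

```python
import math


def relative_occurrence(mass, halving_interval=130.0):
    """Relative probability of finding an ion at `mass`, halving every 130 Da."""
    return 2.0 ** (-mass / halving_interval)


def log2_rarity(mass, halving_interval=130.0):
    """The same trend on a log2 scale: one unit gained per 130 Da."""
    return -math.log2(relative_occurrence(mass, halving_interval))


ratio = relative_occurrence(260.0) / relative_occurrence(130.0)   # 0.5
rarity = log2_rarity(390.0)                                       # 3.0
```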

The PBM algorithm uses only a reverse search to determine the ranking of NIST candidate spectra
against the unknown. A reverse search means that each library spectrum is compared against the
experimental spectrum to determine if the library spectrum is contained in the experimental spectrum. The
PBM procedure is significantly different from the Biller-Biemann technique used by Finnigan and Fisons;
however, the output is similar to that produced by the Finnigan method. The various parameters that are
reported and are useful include the following: Prob, the probability that the NIST data base spectrum
matches the unknown spectrum; K, the confidence factor, from 15 to 250, with a high number indicating
great similarity between the unknown and the library entry; and dK, the difference between a perfect match and the
confidence factor K, with a low dK value generally indicating a good match.

The McLafferty PBM algorithm is currently in use by Hewlett-Packard and Extrel for the
identification of unknown compounds. This algorithm uses a filter to reduce the number of ions present in
the unknown spectra to a subset of between 15 and 26 chemically significant ions by eliminating three ions
(m/z 18, 28, and 32) and then eliminating fragments that represent illogical neutral losses (e.g., loss of 9 Da)
from the spectrum of the unknown. The intensities of the remaining ions, including the molecular ion, are
weighted by their masses, as occurs in the Biller-Biemann procedure. The mass peaks are assigned values
based on the probability of their occurrence (uniqueness values) in the mass spectral data base as a whole.
The filtered spectrum is used to search a condensed subset of the NIST and/or Wiley data base to choose
candidate spectra that will then be matched against the full mass spectrum of the unknown. The final
comparison uses all peaks present in the original spectrum against the full spectra of the candidate matches
present in the reference database.
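
The filtering step can be outlined as follows. This is a hedged sketch, not the Hewlett-Packard or Extrel code. The set of illogical neutral losses (4 to 14 Da and 21 to 25 Da) is an assumption based on common mass spectral interpretation guidance, the molecular ion is assumed to be known in advance, and the function name and example spectrum are hypothetical.

```python
EXCLUDED_IONS = {18, 28, 32}   # the three ions removed before searching
# Assumed set of illogical neutral losses (Da) from the molecular ion.
ILLOGICAL_LOSSES = set(range(4, 15)) | set(range(21, 26))


def filter_spectrum(spectrum, molecular_ion, max_ions=26):
    """Return mass-weighted intensities of the chemically significant ions.

    spectrum: dict {m/z: intensity}
    molecular_ion: m/z of the assumed molecular ion
    """
    kept = {}
    for mz, intensity in spectrum.items():
        if mz in EXCLUDED_IONS:
            continue
        if (molecular_ion - mz) in ILLOGICAL_LOSSES:
            continue
        kept[mz] = mz * intensity   # weight each intensity by its mass
    ranked = sorted(kept, key=kept.get, reverse=True)[:max_ions]
    return {mz: kept[mz] for mz in sorted(ranked)}


# m/z 119 would correspond to a loss of 9 Da from the molecular ion at m/z 128.
spectrum = {18: 50.0, 28: 200.0, 32: 40.0, 77: 60.0, 119: 10.0, 128: 100.0}
filtered = filter_spectrum(spectrum, molecular_ion=128)
```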

The PBM algorithm offers the advantage of fast search speed, especially for compounds that have
high-intensity molecular ions above 300 Da. The tilting function accommodates differences in instrument
tuning and variations caused by differing instrument types (magnetic versus quadrupole). This algorithm
performs less well for compounds having low molecular weights (e.g., hydrocarbons) and compounds that
have very few ions (e.g., acetone), as discussed below.

7.6 Performance Comparison of PBM and Biller-Biemann Algorithms

A comparison of the two library searching algorithms was performed on 20 CLP data files. Library
searches were conducted with an HP DOS data system using PBM and with a Finnigan system using the
Biller-Biemann algorithm. The results demonstrated that the two algorithms performed to the same level of
quality and reliability for compounds in the molecular weight ranges associated with semivolatile TICs.

The PBM algorithm performed less well for low molecular weight compounds with few ions because
PBM uses 16 to 25 ions for pre-searching and matching, and because the uniqueness values associated with
low mass fragments are usually small.
7.7 Normalization and Tilting Algorithms

The library search algorithms include the optional feature of global and local normalization factors
(Finnigan) or tilting (Hewlett-Packard) for matching the library spectral relative intensities against the
unknown spectrum. This feature adjusts relative intensities so that the spectrum of the unknown and candidate
library spectra can be compared on a common scale.

Global normalization multiplies all ion intensities in the unknown spectrum by a global
normalization factor to make the average intensity of the peaks similar to those of the library entry. In a
second step (local normalization), individual peak intensities in the unknown spectrum are normalized to the
corresponding peaks in the library spectrum. This system does not alter peak intensities by more than a
factor of two, and peaks that are not found in both spectra are not normalized. This procedure is intended to
correct for mass spectral differences that may occur when the same chemical compound is analyzed under
different experimental conditions.
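A minimal sketch of the two-step normalization follows. It assumes that "average intensity" means the mean over all peaks of each spectrum and that the factor-of-two limit applies to the local step. The function name `normalize_unknown` and the example intensities are hypothetical, and the actual Finnigan implementation may differ.

```python
def normalize_unknown(unknown, library, max_local_factor=2.0):
    """Scale an unknown spectrum toward a library entry in two steps.

    Spectra are dicts of m/z -> intensity. Returns a new dict; the inputs
    are not modified.
    """
    total_unknown = sum(unknown.values())
    if not unknown or not library or total_unknown <= 0:
        return dict(unknown)

    # Global step: one factor applied to every ion in the unknown.
    # Assumption: "average intensity" is the mean over all peaks.
    mean_unknown = total_unknown / len(unknown)
    mean_library = sum(library.values()) / len(library)
    global_factor = mean_library / mean_unknown
    scaled = {mz: inten * global_factor for mz, inten in unknown.items()}

    # Local step: move each shared peak toward its library intensity,
    # never by more than max_local_factor in either direction.
    # Peaks absent from either spectrum are left as they are.
    for mz, lib_inten in library.items():
        if mz in scaled and scaled[mz] > 0 and lib_inten > 0:
            factor = lib_inten / scaled[mz]
            factor = min(max(factor, 1.0 / max_local_factor), max_local_factor)
            scaled[mz] *= factor
    return scaled


unknown = {50: 10.0, 100: 50.0, 150: 30.0}
library = {100: 100.0, 150: 90.0, 200: 20.0}
normalized = normalize_unknown(unknown, library)
```

In the example, m/z 100 and 150 end up at the library intensities, while m/z 50, which has no library counterpart, receives only the global factor.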

In the Hewlett-Packard tilting procedure, a comparison is made between the library spectrum and the
unknown. The coefficients for a quadratic equation are determined to provide normalization factors for the
best fit of the peak intensities in the library spectrum to those of the unknown. The library entry is then
rescaled using these factors, and library search results are reported using the rescaling coefficients that gave
the best results. Unlike the Finnigan data system that adjusts the unknown mass spectrum to fit it to the
library entries, HP adjusts the library entries to get them closer to the experimental mass spectrum of the
unknown. The two procedures provided similar results in this project.
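The tilting idea can be sketched in Python under stated assumptions. The text says only that quadratic coefficients are fitted; the sketch assumes the quadratic is a function of m/z (scaled by 100 for numerical stability) fitted by least squares to the unknown-to-library intensity ratios of shared peaks. The names `fit_quadratic` and `tilt_library` are hypothetical, and this is not the Hewlett-Packard code.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]              # power sums
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    for i in range(3):                                           # Gauss-Jordan elimination
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        m[i] = [v / m[i][i] for v in m[i]]
        for r in range(3):
            if r != i:
                m[r] = [vr - m[r][i] * vi for vr, vi in zip(m[r], m[i])]
    return m[0][3], m[1][3], m[2][3]


def tilt_library(library, unknown):
    """Rescale a library spectrum with a quadratic function of m/z."""
    shared = sorted(mz for mz in library if mz in unknown and library[mz] > 0)
    if len(shared) < 3:
        return dict(library)  # too few shared peaks to fit three coefficients
    xs = [mz / 100.0 for mz in shared]
    ys = [unknown[mz] / library[mz] for mz in shared]
    a, b, c = fit_quadratic(xs, ys)
    return {mz: inten * (a + b * (mz / 100.0) + c * (mz / 100.0) ** 2)
            for mz, inten in library.items()}


# Illustrative case: the unknown is the library entry with high masses suppressed.
library_entry = {50: 20.0, 100: 60.0, 150: 100.0}
tilted_unknown = {50: 22.0, 100: 60.0, 150: 90.0}
tilted = tilt_library(library_entry, tilted_unknown)
```

Note that the library entry is rescaled and the unknown is left untouched, which mirrors the distinction the text draws between the HP and Finnigan approaches.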
8. ADVANCED DATA HANDLING STUDIES

One goal of this work was to investigate the option of using computer software and related studies of the
data to identify TICs that had been reported as unknown, as opposed to conducting further analyses
of the samples. For highly contaminated samples, it has traditionally been necessary to re-extract the sample
using different procedures, to perform alternate clean-up methods on the extracts, or to use highly specialized
instrumentation to remove interferences and separate the TICs of interest from mass spectral contaminants.
These procedures are time consuming and costly. In addition, they are possible only if samples or extracts are
available.

The capabilities of modern personal computers make it worthwhile to consider the use of
deconvolution algorithms, background subtraction techniques, and other mathematical methods to isolate the
spectra of unknown compounds of interest from interferences. This is especially evident when there are large
numbers of samples present from a site containing numerous interfering compounds.

After a given data file is translated into a suitable format, spectral isolation or enhancement can be
performed in a matter of minutes as opposed to the hours of time necessary for re-extraction/dilution, sample
cleanup, and concentration. If an effective method of spectral deconvolution were generally available in an
appropriate software format, the time necessary for the identification of many TICs could be reduced from
several labor hours to about 20 minutes. This advantage is particularly useful when limited amounts of the
initial sample are available.

In this study, advanced data system and aftermarket software-based procedures were applied to
archived data to provide a more complete assessment of current CLP TIC reporting status. We purchased the
commercial MassTransit™ software and tested its ability to facilitate converting data files into formats
usable by other data systems, including spreadsheets on personal computers. Interfacing the different MS
data system formats through this software effectively accomplished a standardization of TIC data.

8.1 MassTransit™

MassTransit™ is a commercial software product developed by Palisade Corporation. It accepts data
files in 36 different GC/MS data file formats (Table 8.1) and produces output files in any of the seven
formats listed below in Table 8.2.

Table 8.1. Data file input formats, including commercial brands, supported by MassTransit™

    Anelva AGS-7000              JEOL DA50000-DA7000
    Anelva DOS                   JEOL CAMP
    Balzers Quadstar 420         Kratos DS90
    Balzers Quadstar 421         Kratos MACH3
    EPA                          MASPEC
    Finnigan INCOS               Nermag SIDAR
    Finnigan ITS40               netCDF
    Finnigan ITS80               Netzsch
    Finnigan MAT SS300           Palisade
    Fisons/VG MassLab            Perkin-Elmer Qmass 910
    Fisons/VG Lab-Base/Trio      Shimadzu PAC200
    Fisons/Thermolab             Shimadzu QP-5000
    Fisons/VG 11/250             Shrader System
    Fisons/VG JCAMP              Teknivent Vector/1
    Hitachi                      Teknivent Vector/2
    HP Chemstation               Text
    HP RTE                       Varian Saturn
    JEOL Complement
    JEOL Mario

Of these input formats, INCOS™ (Finnigan), and two Hewlett-Packard formats were available for
testing through the DOS-based PC network. The network connection was needed to convert the data into a
personal computer DOS-based format. The Fisons data system was not connected to the network, so this
format was not tested in this study. Note that the Finnigan ITS40™, Finnigan ITS80™, and Varian Saturn™
formats are those of ion trap data systems.

Table 8.2. Output data file formats, including proprietary commercial formats, supported by MassTransit™

    HP Chemstation™
    netCDF
    EPA
    Text
    Palisade
    Fisons/VG JCAMP
    Teknivent Vector/2

For these studies, the HP Chemstation™ and Text output formats were used. Concerning the other
MassTransit™ output formats, netCDF is not finalized; the EPA format is for 9-track tape storage and is not a
true data system format for data handling. The Text format allows reading but is not used by data systems.
However, we used the Text format output files to transfer ASCII data to EXCEL™ spreadsheets to perform
spectral enhancement analyses. Palisade is a proprietary MassTransit™ format. The Fisons and Teknivent
formats were not available for data processing in this study.

We tested the capabilities of MassTransit™ on a PC-based local area network (LAN), taking raw data
files from a Finnigan INCOS™ data system and converting them to HP Chemstation™ DOS-based data
system file formats. A raw Finnigan data file from a Finnigan INCOS™ data system was converted to EPA
data file format using the Finnigan EPA utility program. The file was then downloaded onto a PC using the
trivial file transfer protocol (TFTP) and converted to the HP Chemstation™ data format using the
MassTransit™ software. Next, the file was transferred over the LAN using file transfer protocol (FTP) to the
HP UNIX data system, where library searching was performed with the PBM algorithm.
8.2 Molecular weight estimation

Lockheed requested and received information and the program for molecular weight estimation from
Donald Scott (EPA-NERL-RTP).4-6 Our preliminary review of this program indicated that it was most useful
for low molecular weight organics (e.g., gas sample analysis). In some cases, the molecular ion was estimated
only to within several Daltons. This program in its present state of development was less useful for higher
molecular weight TICs, where the correct mass of the molecular ion was very important for identification.

8.3 Application of Colby's concept

The original reports on mass spectral enhancement and background signal rejection techniques
demonstrated their potential on solutions of known reference standards.7,8 In the present study, subject data
files on actual environmental sample analyses were translated into a format suitable for a personal computer
spreadsheet. Spectral isolation or enhancement was performed in a matter of minutes, as opposed to the
hours of time necessary for re-extraction, sample cleanup, and concentration.

The capabilities of this method were used to resolve a complex chromatogram that appeared to have
several indistinct and broadly eluting components into a highly resolved elution pattern containing potentially
significant components, separated from the background contamination. In Figure 8.1, pollutants having
unique mass spectral features were separated from broad aliphatic hydrocarbon background signals that
eluted across about 20 scans. The procedures were used also to increase the quality of mass spectra selected
as a result of inspections of these chromatograms. Improvements in mass spectral quality were measured in
terms of the increase in quality of mass spectral fit parameters reported by the data system performing the
library search.

We converted GC/MS data from various contractors into ASCII text format using the MassTransit™
software as discussed above. These data files were subsequently imported into an EXCEL™ spreadsheet and
manipulated using a macro to perform the sorting and the statistical and mathematical procedures necessary
to separate the selected peaks from interfering compounds. The resulting total ion current mass
chromatograms were evaluated to determine whether the duration of summing intervals significantly affected
the results. Figures 8.1a through 8.1d show the effects of varying the summing intervals from 0.05 scan to
0.5 scan. Summing intervals in the range of 0.05 to 0.1 scan did not provide sufficient integration of
individual mass responses for reliable detection of small peaks, such as the one in the range of 7.0 to 8.0
scans (denoted #1 in Figure 8.1). Using a summing interval of 0.33 scans, this peak was readily observable.
At the wider summing intervals such as 0.5 scan, closely eluting peaks such as those occurring in the scan
range of 131 to 134 (denoted #2 in Figure 8.1) were not resolved adequately. These peaks were readily seen
to be distinct with a summing interval of 0.1 or 0.2 scans. Based on these observations, the optimum
summing interval seemed to be between 0.2 and 0.33 scans. Under these conditions the correct elution width
of a pure compound needed to contain all of the compound's ions can range up to approximately 1.5 scans, as
demonstrated by the enhanced mass chromatograms of indeno(1,2,3-cd)pyrene and dibenzo(a,h)anthracene
shown in Figure 8.2. In another case, using a 0.2 scan summing interval, the EXCEL™ macro successfully
baseline-resolved 10 known components that eluted within a period of 20 scans.
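The effect of the summing interval can be illustrated with a short Python stand-in for the spreadsheet step. The study used an EXCEL™ macro; the record layout `(apex_scan, mz, intensity)`, the function name `enhanced_tic`, and the example values below are assumptions for illustration and are not taken from the study data.

```python
import math
from collections import defaultdict


def enhanced_tic(ion_apexes, summing_interval):
    """Sum ion intensities into bins of width summing_interval (in scans).

    ion_apexes: iterable of (apex_scan, mz, intensity) tuples, where
    apex_scan is the fractional scan at which that ion's signal maximizes.
    Returns {bin_index: summed_intensity}; bin k covers scans
    [k * summing_interval, (k + 1) * summing_interval).
    """
    if summing_interval <= 0:
        raise ValueError("summing_interval must be positive")
    bins = defaultdict(float)
    for apex_scan, _mz, intensity in ion_apexes:
        # Small epsilon guards against floating-point error at bin edges.
        index = math.floor(apex_scan / summing_interval + 1e-9)
        bins[index] += intensity
    return dict(bins)


# Two hypothetical components whose ions maximize about 0.2 scan apart.
component_a = [(131.62, 128, 900.0), (131.65, 127, 110.0), (131.68, 102, 70.0)]
component_b = [(131.85, 142, 800.0), (131.88, 141, 650.0), (131.91, 115, 200.0)]
records = component_a + component_b

narrow = enhanced_tic(records, 0.2)  # components land in separate bins
wide = enhanced_tic(records, 0.5)    # components merge into one bin
```

With the 0.2 scan interval the two components occupy adjacent bins and remain distinct, while the 0.5 scan interval merges them. This is the loss of resolution at wide summing intervals that the text describes.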

[Figure 8.1: four total ion current chromatograms plotted against Scan Number. Panels: a) 0.05 Scan per Summing Unit; b) 0.2 Scan per Summing Unit; c) 0.3 Scan per Summing Unit; d) 0.5 Scan per Summing Unit.]
Figure 8.1. Comparison of native and enhanced total ion current chromatograms demonstrating effects of different summing intervals.

[Figure 8.2: chromatogram plotted against Scan Number (2270 to 2310).]
Figure 8.2. Enhanced total ion chromatogram of indeno(1,2,3-cd)pyrene and dibenzo(a,h)anthracene
showing natural peak widths of late-eluting compounds.

Initial tests of the spectral enhancement algorithm were performed using standards containing known
coeluting compounds to check the technique for accuracy in separating the spectra of coeluting analytes. The
extracted mass spectra were compared with reference spectra contained in the NIST data base. Results of
these comparisons showed that usually 80 to 90 percent of the ions in the experimental spectrum that were
common to ions in the reference spectra were successfully extracted using this method. The ions that did not
extract well were those with a low ion current and a relatively high noise level.

A dramatic demonstration of the method's ability to extract improved mass spectra from a
chromatogram with high background signal is shown in Figure 8.3. This example, based on the mass
spectrum of naphthalene, showed the value of the spectral enhancement technique for library searches. The
enhanced total ion chromatographic peak profile shown in Figure 8.3.a predicted that the mass spectrum was
contained in the scan range 17.2 to 18.0. The mass spectrum in Figure 8.3.b was obtained from a mass
spectral data system by using the standard background subtraction technique of averaging the three spectra at
the peak maximum and subtracting the average of the mass spectra at the two adjoining minima. The data system
was unable to identify this mass spectrum. The enhanced mass spectrum shown in Figure 8.3.c (scan range
17.2 to 18.0 scans) was correctly identified as naphthalene by the data system, which also produced the
reference mass spectrum in Figure 8.3.d. The failure of the standard procedure appeared to be heavily
dependent on the absence of the masses at m/z 127 and 129. The artificial enhancement of the peaks at m/z
50, 51, 61 to 64, and 74 to 78 did not hinder the search routine, because it strongly weighted the apparent
molecular ion in making identifications.
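For comparison, the standard background subtraction step mentioned above can be sketched as follows. The list-of-dicts scan representation, the function names, and the example intensities are illustrative assumptions; data system implementations differ in detail.

```python
def average_spectra(spectra):
    """Average a list of spectra (dicts of m/z -> intensity) ion by ion."""
    all_mz = set()
    for spectrum in spectra:
        all_mz.update(spectrum)
    return {mz: sum(s.get(mz, 0.0) for s in spectra) / len(spectra) for mz in all_mz}


def background_subtract(scans, apex_index, left_min_index, right_min_index):
    """Average three scans at the apex and subtract the mean of two minima scans."""
    if not 1 <= apex_index < len(scans) - 1:
        raise ValueError("apex must have a scan on each side")
    peak = average_spectra(scans[apex_index - 1:apex_index + 2])
    background = average_spectra([scans[left_min_index], scans[right_min_index]])
    result = {}
    for mz, inten in peak.items():
        net = inten - background.get(mz, 0.0)
        if net > 0:  # ions fully explained by background are dropped
            result[mz] = net
    return result


# Hypothetical scans: a constant m/z 57 background with an analyte ion at m/z 128.
scans = [
    {57: 100.0},
    {57: 100.0, 128: 10.0},
    {57: 100.0, 128: 50.0},
    {57: 100.0, 128: 10.0},
    {57: 100.0},
]
net_spectrum = background_subtract(scans, apex_index=2, left_min_index=0, right_min_index=4)
```

In this clean example the background ion cancels exactly. In real data the background varies from scan to scan, which is one way that weak analyte ions such as m/z 127 and 129 can be lost.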

Other examples of improved mass spectral quality were provided by the spectral matching quality
indicators generated by the mass spectral data system. Indicators for background-subtracted mass spectra
extracted directly by the data system were compared with those for spectra that were enhanced and extracted
using the spectral enhancement, background rejection technique described in this report. The comparisons
shown in Table 8.3 utilized the data for the mass chromatogram in Figure 8.1. Both of the examined fitting
quality parameters showed small but systematic improvements after subjecting the data to the spectral
enhancement procedures. Such apparently small improvements can, however, provide significant improvements
in mass spectral library search results, as has been demonstrated on solutions of standards.8

Table 8.3. Comparison of native and enhanced mass spectral quality indicators.

Compound                        Scan #         Native       Enhanced     Native       Resolution Enhanced
                                (Fig. 8.1.c)   Qual. Fact.  Qual. Fact.  Cross Corr.  Cross Corr.
2,5-Dimethylbenzo[b]thiophene   12             89           91           9606         9927
Tetradecane                     16             89           93           8462         9621
1-Ethylnaphthalene              19             96           97           9904         9954

[Figure 8.3: panel a) plotted against Scan Number, with a trace labeled "Resolution Enhanced Naphthalene"; panels b) through d) plotted against m/z (40 to 150). Annotations on the spectra: "The relative intensities of these masses are significantly greater than those in the reference spectrum" and "Masses missing from background-subtracted mass spectrum".]
Figure 8.3. a) Native and enhanced total ion chromatograms of a naphthalene-containing mixture; b)
native mass spectrum of naphthalene in mixture retrieved by data system; c) enhanced mass
spectrum of naphthalene in mixture; d) NIST reference spectrum from those chromatograms.

A modification of the spectral enhancement procedure by calculating the quadratically predicted
intensity at each peak was tested. The difference between any given enhanced peak relative intensity and the
relative intensity derived from raw data in the native mass chromatogram was less than 2%. Therefore, this
modification to the approach was not studied further.

The stability of the quadratic fitting procedure was tested on 13 peaks with apex offset values (OA)
in the range of ±0.45 scans. Mass spectrometer noise was simulated by randomly adding noise values in the
range ±5, 7.5, or 10% of the experimental value to each of the three experimental values used to calculate the
quadratic approximation. These treatments simulated total noise vs. signal levels of 10, 15, or 20 percent
peak-to-peak (p-p), respectively. The standard deviation was plotted against the OA value (Figure 8.4).

The data showed that for scan-to-scan noise levels up to 20 percent, p-p, the quadratically predicted
OA generally varied less than ±0.2 scans at the 95 percent confidence level (see Figure 8.4). This situation
indicated that the underlying mass spectral signal maximized within 0.2 scan of the calculated value if the
scan-to-scan noise was less than 20 percent, p-p. Although scan-to-scan noise information was not readily
available for commonly used environmental mass spectrometers, it appeared to be unlikely that scan-to-scan
noise levels would approach 20 percent in properly operating units. This value of 0.2 scans for an
approximate OA uncertainty was about the same as the optimum summing interval. Therefore, a
quadratically optimized peak would fall within one summing interval of its correct position, an acceptable
situation because mass spectral peaks were typically 4 to 6 summing intervals wide.
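The kind of stability test described above can be sketched in Python. The three-point parabola vertex formula is standard; everything else is an illustrative assumption, including the function names, the uniform noise model, the number of trials, and the example intensities (40, 100, 60). The sketch does not reproduce the data behind Figure 8.4.

```python
import random
import statistics


def apex_offset(y_prev, y_apex, y_next):
    """Offset (in scans) of the fitted parabola's vertex from the middle scan.

    For three equally spaced points, the vertex lies at
    (y_prev - y_next) / (2 * (y_prev - 2*y_apex + y_next)).
    """
    curvature = y_prev - 2.0 * y_apex + y_next
    if curvature >= 0:
        raise ValueError("middle point is not a local maximum")
    return (y_prev - y_next) / (2.0 * curvature)


def offset_spread(points, noise_fraction, trials=2000, seed=1):
    """Standard deviation of the apex offset when each point gets random noise.

    Each of the three values is perturbed independently by up to
    +/- noise_fraction of its own value (assumed uniform distribution).
    """
    rng = random.Random(seed)
    offsets = []
    for _ in range(trials):
        noisy = [y * (1.0 + rng.uniform(-noise_fraction, noise_fraction)) for y in points]
        offsets.append(apex_offset(*noisy))
    return statistics.pstdev(offsets)


# Hypothetical intensities in three consecutive scans across one mass peak.
points = (40.0, 100.0, 60.0)
clean_offset = apex_offset(*points)
spreads = {level: offset_spread(points, level) for level in (0.05, 0.075, 0.10)}
```

The noise fractions 0.05, 0.075, and 0.10 correspond to the ±5, 7.5, and 10% treatments in the text. Comparing the resulting spreads with the chosen summing interval is the check the paragraph above describes.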

A simple comparison of the results obtained by performing the OA optimization and by omitting this
step is shown in Figure 8.5. The results obtained by correcting for the scan offset, Os, but not for OA are
shown in Figure 8.5.b. These data were characterized by a series of peaks of unit scan width, as should be
observed. The procedure simply identified those ions maximizing in that mass spectrometric scan. The
appearance of doublets in most of the peaks was an artifact of the summing interval, 0.1 scans, and the
distribution of ion intensities in common environmental mass spectra. The same enhanced mass
chromatogram, but incorporating the quadratic optimization term, OA, is shown in Figure 8.5.a. The primary
effect of including inter-scan effects was a sharp reduction in the number of observable "peaks." This was
particularly apparent in the region of scan numbers 1935 and 1939. The only major observable addition to
the chromatogram was the emergence of a potential very narrow peak at scan number 1928.7. However, the
appearance of two peaks in this region resulted from the short 0.1 scan summing intervals used; the two
"peaks" merged into one when a 0.2 scan summing interval was used. This occurrence emphasized the
importance of selecting the proper sized summing interval. Examination of mass peak sequences showed that
neutral loss sequences were interspersed across both peaks so that they should be considered one peak,
reinforcing the previous conclusion about selecting the proper summing interval length. Based on preceding
results, the primary utility of the simple Os-only retention time optimization procedure appeared to be in
studying whether there were any systematic, mass-dependent principles such as low mass discrimination
involved in the inter-scan corrections of the quadratic optimization procedure.

The capabilities of the spectral enhancement concept were ultimately limited by the characteristics of
the EXCEL™ 5.0 spreadsheet. The spreadsheet permitted addressing any section of a mass chromatogram
containing up to 16,000 records. The mass chromatogram was imported, the optimized mass retention times
determined, and the resulting synthetic mass chromatogram was obtained in approximately 5 to 10 minutes.
The resulting summary data such as total ion current mass chromatograms and mass spectra were called
synthetic because they were created from selected subsets of the total data for the run of interest rather than
the complete native data set. The synthetic mass spectrum for each resulting peak of interest was readily
obtained in hard copy or electronic format that could be analyzed by a mass spectral data system.

[Figure 8.4: standard deviation plotted against Apex Offset, with curves for 20% Noise (p-p), 15% Noise (p-p), and 10% Noise (p-p).]
Figure 8.4. Plot of the standard deviations of simulated OA under varying noise conditions versus the
estimated OA value.

[Figure 8.5: two panels, a) and b), covering scan numbers 1915 to 1955.]
Figure 8.5. Comparison of a) spectral enhancement results using the quadratic fit, and b) spectral
enhancement results without calculating the quadratic fit.

9. QUALITY ASSURANCE PROCEDURES

To conduct quality assurance reviews of TIC data, at a minimum the following information for TIC
spectral matches should be included in the data package:

1) labeling of the mass axis or major fragments
2) the full name of the spectral match (if possible)
3) the CAS number associated with each spectral match
4) the molecular weight of the match
5) the composition of the spectral match
6) the ranking (score) of the spectral match, and
7) a table of ion intensities vs. m/z values.

We tested conversions through MassTransit™ software between the HP formats and INCOS™,
finding that all ion intensities and retention times remained the same. Conversions from, and back to, HP
Chemstation™ showed no changes in the data. Conversion from one HP format to the other, and then
workup in the new format gave results identical to workup in the original format. Thus, it was concluded that
MassTransit™ did not modify the data, and that the different data handling software packages available from
HP treated the data in equally authentic ways.

Quality assurance procedures for analytical determinations included acquiring accurate mass
measurements in triplicate, careful calibration of the mass range with PFK, and obtaining SIR-based accurate
mass measurements.11-12 Additionally, peak profiles were not considered valid unless points on each side of
the maximum were observed in addition to the maximum. Some TICs were not amenable to CLP-type
GC/MS analysis due to their low concentrations or to the absence of their mass spectra in the library data
base. Other techniques such as HRMS, LC/MS, and GC/FT-IR provided valuable data in some cases.12
Instances were found where the CLP MS data system rounded off masses incorrectly, where the low-intensity
isotopic members (including the one of lowest mass) of halogen ion groups were not seen, and where the scan
range of 35 to 500 Da used by CLP prevented the detection of the molecular ion.
10. CONCLUSIONS

In this project, 99,513 TICs were reported in 792 Sample Delivery Groups (SDGs) studied (Table
2.1). Of these, the CLP reported identifications with Chemical Abstracts Service (CAS) numbers on 16%.
Not all of these identifications were correct. It was estimated from this study that perhaps 30% of the 16%
portion were correct, and possibly another 10% were correct except that the TIC was an isomer of the
compound whose CAS number was reported. Forty-one percent of the TICs were listed as being partially
identified, and the remaining 43 percent were reported as unknown. Target Compound List (TCL) analytes
were 5 to 10% of the total number of analytes in the Superfund sample data studied, with the remainder being
TICs. If only about 16% of the TICs are reported with CAS number identification, and only about 30% to
40% of those identifications are correct, then perhaps only 5% to 6% of the TICs are correctly identified. The
result is that approximately 84-90% of the analytes remain unidentified under the current CLP requirements.
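The arithmetic behind these estimates can be checked with a few lines of Python. The first function restates the multiplication given in the text. The second encodes one reading that reproduces the 84-90% range, namely that TICs make up 90 to 95% of all analytes and that only correctly identified TICs count as identified. That reading is an assumption, and the function names are hypothetical.

```python
def correctly_identified(cas_fraction, correct_fraction):
    """Fraction of all TICs that carry a correct CAS-numbered identification."""
    return cas_fraction * correct_fraction


def unidentified_analytes(tic_share, identified_tic_fraction):
    """Fraction of all analytes that are TICs without a correct identification."""
    return tic_share * (1.0 - identified_tic_fraction)


low = correctly_identified(0.16, 0.30)    # 0.048, about 5% of TICs
high = correctly_identified(0.16, 0.40)   # 0.064, about 6% of TICs

# TCL analytes are 5 to 10% of all analytes, so TICs are 90 to 95%.
fewest_unidentified = unidentified_analytes(0.90, high)
most_unidentified = unidentified_analytes(0.95, low)
```

Under these assumptions the unidentified share falls between roughly 0.84 and 0.90 of all analytes, which is consistent with the range quoted above.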

An overview of the data indicated that the most commonly reported classes of TICs were saturated
(branched and straight chain alkanes) and unsaturated (alkenes, dienes, etc.) hydrocarbons. The next most
prominent groups of compounds were PAHs and aromatic compounds that were frequently substituted with
aliphatic hydrocarbons. Higher molecular weight steroid compounds (e.g., cholesterol) and PCBs were also
reported. Elemental sulfur was reported by the CLP laboratories in ca. 50% of soil samples. The TCL
phthalates were found in ca. 80% of the soil samples, and non-TCL phthalates were found in ca. 10%.

TIC mass spectra were found, including some with recognizable chlorine and/or bromine clusters, for
which there are no library spectra in the computerized mass spectral data bases. Such spectra must presently
be manually interpreted. Instances were found where the CLP MS data system rounded off masses
incorrectly, where the low-intensity isotopic members (including the one of lowest mass) of the halogen ion
groups were not seen, and where the scan range of 35 to 500 Da used by CLP prevented the detection of the
molecular ion.

Additional compounds could be included in the NIST and Wiley mass spectral data bases to assist in
identifying TICs for environmental monitoring efforts. These compounds include higher molecular weight
PAHs, and industrial process solvents and chemicals. Additional pesticide metabolites and degradation
products should be included; some are available in hardcopy form but not in commonly used software data
bases.

The reporting trends from the laboratories were highly varied with respect to accuracy of reporting
TICs. While some (ca. 2%) laboratories made honest efforts to identify the TICs, others (ca. 30%) simply
labeled all TIC peaks as unknown and made no attempt to accurately identify the TICs. Some laboratories
(ca. 2%) would only commit to identifying sulfur and reported all other TICs as unknown. Some laboratories
(ca. 20%) used the name of the library match compound with the highest score on the library search, treating
that as the identification regardless of whether that identification was reasonable based upon manual spectral
interpretation.

Data system algorithms and procedures studied in this project included background subtraction,
Biller-Biemann spectral isolation and library searching, and McLafferty Probability Based Matching (PBM).
These software-based capabilities are implemented in the mass spectrometer vendors' data systems.
Procedures for mass spectral resolution enhancement were also developed and tested, using the concept
proposed by Colby and in-house developed spreadsheet-based macros and other procedures. A comparison
of the two library searching algorithms was performed on 20 CLP data files. Library searches were
conducted with an HP DOS data system using PBM and with a Finnigan system using the Biller-Biemann
algorithm. The results demonstrated that the two algorithms performed to the same level of quality and
reliability for compounds in the molecular weight ranges associated with semivolatile TICs.

Commercial MassTransit™ software was tested for its ability to facilitate converting data files into
formats usable by other data systems, including spreadsheets on personal computers. Interfacing the different
MS data system formats through this software effectively accomplished a standardization of TIC data. This
software converted GC/MS data from various contractors into ASCII text.

These data were subsequently imported into an EXCEL™ spreadsheet and manipulated using a macro
written in-house to perform mass spectral resolution enhancement. The spreadsheet macro performed the
sorting and statistical and mathematical procedures necessary to separate the TICs from interfering
compounds. The extracted mass spectra were compared to reference spectra contained in the NIST database.
Results of these comparisons showed that usually 80 to 90 percent of the ions contained in the reference
spectra were successfully extracted using this method. This procedure improved mass spectral quality and
the data system's ability to perform successful library searches. The fit quality parameters showed systematic
improvements after subjecting the data to resolution enhancement procedures. This approach was found to
be effective in extracting mass spectra of individual compounds from background signals, including those of
hydrocarbon mixtures having broad elution profiles on the chromatographic column utilized for the GC/MS
acquisitions. The approach could have significant value for EPA and CLP as a rapid, cost-effective
alternative to special extract clean-up and reanalysis schemes for removal of chemical interferences that
render the mass spectra of selected peaks/unknowns difficult or impossible to interpret.

Many analytes of potential interest are not amenable to GC/MS analysis. Other techniques such as
LC/MS can be used to detect these compounds. LC/MS conditions can be selected for analytes that are
thermally unstable, have low-volatility or high molecular weight, or are highly water-soluble. HRMS can be
used to determine accurate masses and elemental compositions.11,12 Some of the above suggestions may be
difficult or costly to implement. An alternative would be for the CLP laboratory to "flag" samples for the
Region or other requestor to consider sending to an Agency "expert" or research laboratory for further study
of the TICs.

Seven possible improvements for TIC reporting and identification by CLP laboratories were
identified:
(1) require reporting the first library match of TICs on Form 1 if the library match meets a specified
probability level, rather than "unknown" regardless of search results;
(2) require reporting of TICs when the mass spectra, or library matches, indicate the presence of
heteroatoms such as N, P, S, halogen, or heavy metals;
(3) incorporate RRT data and frequency of occurrence of TICs into a data base, and add RRT criteria to
aid in TIC identification;
(4) extend the mass range to 600 Da, with a tune emphasizing greater sensitivity for masses above 300
Da, for better detection and identification of TICs in this higher molecular weight range;
(5) recommend or require that different GC temperature programs or column phases be used in a second
analysis to improve separation of some TICs;
(6) improve the GPC procedure, add a cleanup with copper, or adopt some other improved procedure,
because sulfur was found in approximately 50% of the soil sample data;
(7) identify additional compounds suitable for addition to the mass spectral data bases, because they
have been found, or would be anticipated, in known types of waste sites.
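
Recommendations (1) and (3) amount to a simple reporting rule, sketched below under stated assumptions. The report does not specify the probability level or an RRT window, so `min_probability` and `rrt_tolerance` are hypothetical parameters, as are the class and function names and the example hits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LibraryHit:
    name: str
    probability: float                  # estimated probability of a correct match, 0 to 1
    library_rrt: Optional[float] = None  # relative retention time on file, if any


def name_to_report(hits: List[LibraryHit],
                   min_probability: float,
                   observed_rrt: Optional[float] = None,
                   rrt_tolerance: Optional[float] = None) -> str:
    """Report the first library match if it passes the checks, else "unknown".

    `min_probability` and `rrt_tolerance` are assumed parameters; the report
    calls only for "a specified probability level" and "RRT criteria".
    """
    if not hits:
        return "unknown"
    first = hits[0]
    if first.probability < min_probability:
        return "unknown"
    # Apply the RRT check only when every piece of RRT information is available.
    if (observed_rrt is not None and rrt_tolerance is not None
            and first.library_rrt is not None):
        if abs(observed_rrt - first.library_rrt) > rrt_tolerance:
            return "unknown"
    return first.name


# Hypothetical search results, ordered best match first.
hits = [LibraryHit("naphthalene", 0.86, library_rrt=1.00),
        LibraryHit("azulene", 0.09, library_rrt=1.12)]

print(name_to_report(hits, min_probability=0.70))
print(name_to_report(hits, min_probability=0.70, observed_rrt=1.13, rrt_tolerance=0.05))
```

The second call shows why RRT data are useful: a first hit with an acceptable match probability is still rejected when its retention time disagrees with the stored value.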

The first recommendation is currently being implemented for CLP TICs. The remaining six
recommendations, (2) through (7), are offered as tools for continued strengthening of the CLP and may, as
recommended by one of our external reviewers, best be implemented by using a centralized laboratory. This
could be a federal laboratory performing research in compound identification and capable of supplementing
the above recommendations with special MS techniques (e.g., CI-MS, HRMS, and MS/MS) and other non-MS
techniques with adequate sensitivity (e.g., FT-IR).

In summary, this project investigated the effectiveness of current TIC reporting under the CLP
protocols. Identification of TICs was found to be of variable quality across participating laboratories. The
commonly used mass spectral data systems were found to provide essentially the same results. The libraries
(Wiley and NIST) could be improved by adding compounds relevant to environmental monitoring.
The available algorithms and data system procedures are satisfactory and provide virtually identical results.
Mass spectral enhancement procedures could materially help in identifying TICs by separating TIC spectra of
interest from those of aliphatic hydrocarbons or other background signals.

-------
11. REFERENCES

1.      U.S. Environmental Protection Agency, Contract Laboratory Program, Statement of Work for
       Organic Analysis, Multi-media, Multi-concentration. Document Number OLM01.0, 1990.
       Including Revisions, OLM01.1--OLM03.1, Dec 1990--Aug 1994.  U.S. Environmental Protection
       Agency, Cincinnati, OH.

2.      J. M. Long and J. M. McGuire, "Assessment of Tentatively Identified Compounds in Superfund
       Samples," U.S. Environmental Protection Agency, Environmental Research Brief
       EPA/600/M-89/030, June 1990.

3.      Viar and Company, "Evaluation of the Toxicity of 798 Commonly Occurring Semivolatile
       Tentatively Identified Compounds," Interim Report and Final Report on SMO Performance Event
       259, October 31, 1991 and December 31, 1991.

4.      Donald R. Scott, "Rapid and Accurate Method for Estimating Molecular Weights of Organic
       Compounds from Low Resolution Mass Spectra," Chemometrics and Intelligent Laboratory
       Systems, 16, 193-202 (1992).

5.      Donald R. Scott, A. Levitsky, and S. E. Stein, "Large Scale Evaluation of a Pattern
       Recognition/Expert System for Mass Spectral Molecular Weight Estimation," Analytica Chimica
       Acta, 278, 137-147 (1993).

6.      Donald R. Scott, "Empirical Pattern Recognition/Expert System for Molecular Weight Estimation of
       Low Resolution Mass Spectra," Analytica Chimica Acta, 285, 209-222 (1994).

7.      Bruce N. Colby, "Spectral Deconvolution for Overlapping GC/MS Components," Journal of the
       American Society for Mass Spectrometry, 3, 558-562 (1992).

8.      B.N. Colby and P.H. D'Arcy, "Reliable Compound Identification at Low Levels in Complex
       Environmental Samples," Proceedings of the 41st ASMS Conference on Mass Spectrometry and
       Allied Topics, San Francisco, CA, 1993, p. 813.

9.      Stephen E. Stein, "Estimating Probabilities of Correct Identification from Results of Mass Spectral
       Library Searches," Journal of the American Society for Mass Spectrometry, 5, 316-323 (1994).

10.    Stephen E. Stein, "Optimization and Testing of Mass Spectral Library Search Algorithms for
       Compound Identification," Journal of the American Society for Mass Spectrometry, 5, 859-866
       (1994).

11.    Andrew H. Grange, Joseph R. Donnelly, William C. Brumley, Stephen Billets, and G. Wayne
       Sovocool, "Mass Measurements by an Accurate and Sensitive Selected-Ion-Recording Technique,"
       Analytical Chemistry, 66, 4416-4421 (1994).

12.    A.H. Grange, J.R. Donnelly, W.C. Brumley, and G.W. Sovocool, "Determination of an Elemental
       Composition from Mass Peak Profiles of the Molecular Ion (M) and the M+1 and M+2 Ions,"
       Analytical Chemistry, 68, 553 (1996).

13.    N.R. Herron, J.R. Donnelly, and G.W. Sovocool, "Software-based Mass Spectral Enhancement to
       Remove Interferences from Spectra of Unknowns," Journal of the American Society for Mass
       Spectrometry, 7, 598 (1996).

-------