EPA
                               United States
                               Environmental Protection
                               Agency
                                Environmental Research
                                Laboratory
                                Athens GA 30613
                               Research and Development
                                EPA/600/M-89/030 June 1990
ENVIRONMENTAL
RESEARCH  BRIEF
       Assessment of Tentatively Identified Compounds in Superfund
                                        Samples
                                J. M. Long and J. M. McGuire*

*Environmental Research Laboratory, Athens, GA 30613

Abstract
Stored mass spectral data for 27 semivolatile samples
analyzed by 7 private laboratories under contract with the
U.S. Environmental Protection Agency were reanalyzed at
the Environmental Research Laboratory, Athens, GA
(AERL). Results of the reanalysis were compared with the
original contract laboratory results. In instances where
specific compound identifications had been made by a
contract laboratory, AERL agreed with the identification
36% of the time, disagreed with the identification 11% of
the time, disagreed on the presence of the GC peak 19% of
the time, and concluded that the data were insufficient for
identification 34% of the time.

Background

Public Law 96-510, entitled "The Comprehensive
Environmental Response, Compensation and Liability Act
of 1980" (commonly known as Superfund), authorized,
among other things, testing and monitoring of waste sites.
To accomplish this, the Superfund Contract Laboratory
Program (CLP) was established and comprehensive
methods were implemented for contract laboratories to use
in analyzing and reporting target analytes (1, 2). In addition
to the target analytes, the statement of work required each
contractor, for each semivolatile sample analyzed by
GC/MS, to conduct mass spectral library searches to
determine the possible identity of as many as 20
semivolatile components not listed on EPA's Target
Compound List (3). The selections were to be those
compounds having the greatest concentrations.
Substances that exhibited responses less than 10% of the
nearest internal standard were not to be considered. A
further requirement was that the 1985 or most recent
release of the National Bureau of Standards spectral
library was to be used to conduct the searches. Reporting
requirements stipulated that compounds not meeting the
complete identification requirements contained in the
statement of work should be reported as "unknown," or
"unknown hydrocarbons," or "unknown aromatic," etc. In
other words, the tentative identification should be as
specific as possible even for compounds identified as
"unknown."

                At the time the tentatively identified compounds (TIC)
                concept was  included in the  statement of work, it was
                thought that such  tentative identifications might provide
                information that would be  useful in modifying the target
                list. Since then, thousands of TIC identifications have been
                made and  a need has  appeared for assessing their
                reliability.  This  work  is the first part of  such  an
                assessment. Multi-spectral identification or confirmation of
                selected TICs is in progress  at this time and will  be
                reported separately as the second step in the assessment
                process.
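
The TIC selection requirement described in this section (as many as 20 non-target components, taken in order of concentration, ignoring any response below 10% of the nearest internal standard) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the contract laboratories' software: the `Peak` record, its field names, and the function `select_tic_candidates` are hypothetical, and the peak and internal standard responses are assumed to be on the same scale.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical record for one GC/MS peak; all field names are assumptions."""
    label: str
    response: float         # detector response of the peak
    istd_response: float    # response of the nearest internal standard
    est_conc: float         # estimated concentration
    is_target: bool = False  # True if the compound is on the Target Compound List


def select_tic_candidates(peaks, max_tics=20, min_fraction=0.10):
    """Return up to max_tics non-target peaks, highest concentration first.

    Peaks whose response is below min_fraction of the nearest internal
    standard response are not considered.
    """
    eligible = [
        p for p in peaks
        if not p.is_target and p.response >= min_fraction * p.istd_response
    ]
    eligible.sort(key=lambda p: p.est_conc, reverse=True)
    return eligible[:max_tics]


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    demo = [
        Peak("peak 1", response=50.0, istd_response=100.0, est_conc=300.0),
        Peak("peak 2", response=5.0, istd_response=100.0, est_conc=900.0),
        Peak("peak 3", response=80.0, istd_response=100.0, est_conc=700.0),
    ]
    for peak in select_tic_candidates(demo):
        print(peak.label, peak.est_conc)
```

In this sketch the 10% threshold is applied before ranking, so a weak peak never displaces a stronger one from the list of 20.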

                Approach

The TICs made by the contract laboratories were
compared to those made by AERL, using the same
mass spectral data but different analytical protocols,
in order to determine the reliability of the contract
laboratory identifications. The data for this research brief
were generated by 7 contract laboratories on a total of
27 extracts and were processed by computer programs
developed by AERL personnel (4) and others (5-11). The
programs located the best mass spectrum for each GC peak
and extracted it from its background. The concentration
was then estimated, and as many as ten possible
identifications using Probability-Based Matching (PBM)
were compiled for each resolved peak in the mass
chromatogram. The library used in this work contains
110,000 spectra; it is a subset of the complete Wiley library
and is larger and more extensive than the NBS library
used by the contract laboratories.

The complete AERL computer identification programs rely
heavily on historical relative retention data, which were not
available for this study. Accordingly, AERL developed the
following rules to facilitate selection of the best match.

1.  The value of the PBM-derived ratio k/(k + Δk) should
    be greater than or equal to 0.50, and should be
    significantly higher than that for the next best hit.

2.  The value of the PBM k should be greater than or
    equal to 40.

3.  Priority should be given to k's with "+" signs, which
    indicate the presence of an ion in the unknown
    spectrum at a mass corresponding to the molecular
    weight of the library match.

4.  Priority should be given to matches that have relative
    retention time agreement (where available).

5.  If two adjacent scans have the same value of
    k/(k + Δk) and if these are the best hits, the one with
    the higher concentration should be chosen.

6.  A match should be chosen if the value of k/(k + Δk) is
    greater than or equal to 0.95 and the contamination
    value is less than 2.

7.  No match having 3 PBM flags or one with "anhydride"
    as part of the chemical name should be chosen.

8.  The same identification should not be chosen more
    than once in the same run.
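
The rules above can be read as a filter followed by a priority ordering. The Python sketch below shows one possible encoding and is not the AERL software: the `Hit` record and its field names are hypothetical, Rule 6 is assumed to act as an acceptance that overrides Rules 1 and 2, "3 PBM flags" is assumed to mean three or more, and the judgment calls in Rule 1 ("significantly higher") and Rule 5 (adjacent scans) are left to the analyst rather than modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical record for one PBM library match; field names are assumptions."""
    name: str
    k: float                           # PBM k value
    ratio: float                       # k / (k + delta k)
    has_molecular_ion: bool = False    # the "+" sign on k (Rule 3)
    pbm_flags: int = 0                 # number of PBM flags (Rule 7)
    contamination: float = 0.0         # contamination value (Rule 6)
    rrt_agrees: Optional[bool] = None  # None when no retention data exist (Rule 4)


def passes_screen(hit, already_chosen=frozenset()):
    """Apply the exclusion rules (7, 8) and the thresholds (1, 2, 6)."""
    if hit.pbm_flags >= 3 or "anhydride" in hit.name.lower():
        return False                   # Rule 7
    if hit.name in already_chosen:
        return False                   # Rule 8
    if hit.ratio >= 0.95 and hit.contamination < 2:
        return True                    # Rule 6 (assumed to override Rules 1 and 2)
    return hit.ratio >= 0.50 and hit.k >= 40  # Rules 1 and 2


def screen_hits(hits, already_chosen=frozenset()):
    """Return surviving hits, ordered by Rule 4, then Rule 3, then the ratio."""
    survivors = [h for h in hits if passes_screen(h, already_chosen)]
    return sorted(
        survivors,
        key=lambda h: (h.rrt_agrees is True, h.has_molecular_ion, h.ratio),
        reverse=True,
    )
```

The ordering places retention time agreement ahead of the molecular ion indicator; the rules assign priority to both without ranking one above the other, so that choice is also an assumption.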

Using this screening procedure, a technician was able,
in many instances, to eliminate all but two or three of the ten
most probable identifications. From the two or three that
passed the screen, the analyst then chose the most probable
identification based on a reexamination of the data.

Results

Table 1 shows part of the information summarized in a
contract laboratory's report of TICs for an individual
semivolatile extract. Table 2, which closely resembles
Table 1, contains the AERL identifications corresponding
to those in Table 1. Table 3 summarizes, for each sample,
the comparison between the contract laboratory TIC report
(such as the one in Table 1) and the computer output
resulting from AERL processing of the contract laboratory
mass spectral data. For each TIC compound name entered
in Table 3, the purity or probability of the spectral match
is recorded, as are the identifications made by AERL
personnel and the corresponding value of the PBM
k/(k + Δk).

Table 1. Representative Identifications in TIC Report

      CAS Number  Compound Name                  Rt (min)  Est. Conc.  Q*
  1.  140-76-1    Pyridine, 5-Ethenyl-2-Methyl       9.27       26000  J**
  2.              Unknown Hydrocarbon               14.39       12000  J
  3.              Unknown Hydrocarbon               15.97       44000  J
  4.              Unknown                           16.37       11000  J
  5.              Unknown Hydrocarbon               16.90       25000  J
  6.              Unknown                           18.35       17000  J
  7.              Unknown Hydrocarbon               18.85       51000  J
  8.              Unknown                           19.07      200000  J
  9.              Unknown                           19.74       46000  J
 10.              Unknown Hydrocarbon               20.17       38000  J
 11.              Unknown Hydrocarbon               20.25       37000  J
 12.              Unknown                           21.44       40000  J
 13.              Unknown Hydrocarbon               21.54       34000  J
 14.              Unknown                           22.44       10000  J
 15.              Unknown Hydrocarbon               22.62       24000  J
 16.              Unknown Hydrocarbon               23.77       33000  J
 17.              Unknown                           25.04       25000  J
 18.              Unknown                           27.54       20000  J

 *EPA Qualifier from statement of work
**J = estimated concentration or concentration is less than the
  quantitation limit

Table 2. AERL Identifications Corresponding to TIC Report
         (Table 1)

      Compound Name                  Rt (min)  Concentration
  1.  pyridine, 5-ethenyl-2-methyl       9.27           8500
  2.  unknown hydrocarbon               14.39          11000
  3.  unknown hydrocarbon               15.97          13000
  4.  MIS                               16.37           1000
  5.  unknown hydrocarbon               16.90           4000
  6.  MIS                               18.35          18000
  7.  unknown hydrocarbon               18.85          14000
  8.  MIS                               19.07      Saturated
  9.  MIS                               19.74          20000
 10.  unknown hydrocarbon               20.17           7000
 11.  unknown hydrocarbon               20.25           8000
 12.  unknown hydrocarbon               21.44          18000
 13.  unknown hydrocarbon               21.54           9000
 14.  MIS                               22.44           2500
 15.  unknown hydrocarbon               22.62          15000
 16.  unknown hydrocarbon               23.77          13000
 17.  MIS                               25.04           9900
 18.  MIS                               27.54           5900

Specifically, Table 3 summarizes the data obtained on the
27 semivolatile extracts by the 7 contract laboratories
along with comparative data obtained by AERL by
processing the mass spectral data generated by those
contract laboratories. The information includes the number
of TICs made; the number of those that are "specific,"
"generic," and "unknown"; the range in purity for
compounds in the three groups; and a descriptor for the
overall shape of the mass chromatogram. Purities are
listed for laboratories using Finnigan instruments;
probabilities are listed for those using Hewlett Packard
instruments. For the purposes of this report, specific
identifications are those employing specific compound
names, e.g., n-hexadecane. Generic identifications are
those employing chemical family names, e.g., unknown
hydrocarbon. The remaining identifications, which employ
only the descriptor "unknown," are defined as unknown
identifications. For specific identifications, the
comparative data obtained by AERL include a breakdown
by four categories: agreement (A), disagreement (D),
no-scan (NS), and misidentified (MIS). For generic and
unknown identifications, the same categories, with the
exception of MIS, are used. The agreement and
disagreement categories need no explanation. The NS
category indicates that there was no scan in the
AERL-processed data corresponding to the contract
laboratory scan. These scans were absent due either to a
known deficiency in AERL's peak recognition program or to
the contract laboratory's reporting peaks that were not real.
The MIS category refers to spectra that were not
interpretable by AERL personnel based on GC/MS alone.
The table also includes a range of values obtained for
k/(k + Δk) for specific, generic, and unknown identifications.
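
The three identification classes defined above depend only on the reported compound name, so the definition can be sketched as a small Python function. This is an illustration of the definitions in this brief, not a tool used in the study: the function name `classify_identification` is hypothetical, and treating any name that begins with "unknown" followed by a family name as generic is an assumption based on the examples given. The list of names is transcribed from the TIC report in Table 1.

```python
from collections import Counter


def classify_identification(name: str) -> str:
    """Classify a reported TIC name as 'specific', 'generic', or 'unknown'."""
    text = " ".join(name.lower().split())  # normalize case and spacing
    if text == "unknown":
        return "unknown"    # only the descriptor "unknown"
    if text.startswith("unknown "):
        return "generic"    # chemical family name, e.g. unknown hydrocarbon
    return "specific"       # specific compound name, e.g. n-hexadecane


H = "Unknown Hydrocarbon"
U = "Unknown"

# Compound names from Table 1, in peak order (peaks 1 to 18).
TABLE_1_NAMES = [
    "Pyridine, 5-Ethenyl-2-Methyl",
    H, H, U, H, U, H, U, U, H, H, U, H, U, H, H, U, U,
]

counts = Counter(classify_identification(n) for n in TABLE_1_NAMES)

if __name__ == "__main__":
    for category in ("specific", "generic", "unknown"):
        print(category, counts[category])
```

For the report in Table 1 this gives one specific, nine generic, and eight unknown identifications out of 18.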

For "specific" identifications in Table 3, the overall range
of purity/probability values was: for contract laboratory A,
576-977; for B, 371-964; for C, 504-829; for D, 625-977; for
E, 677-873; for F, 67-95; and for G, 52-93. The AERL
k/(k + Δk) range of values for identifications corresponding
to and in agreement with those for contract laboratories
was: A, 0.40-0.94; B, 0.67-1.00; C, 0.50-0.95; D, no data;
E, 0.80-0.80; F, 0.66-1.00; and G, 0.55-0.91. Regression
analysis indicated there is no linear correlation between
either purity or probability values and k/(k + Δk) values.

For "generic" identifications, the range of purity/probability
values for contract laboratory A was 736-845; for B,
217-923; for C, 138-803; for D, no data; for E, 558-909; for F,
11-89; and for G, 15-81. The AERL k/(k + Δk) range for
identifications corresponding to and in agreement with
those for contract laboratory A was 0.23-1.00; for B,
0.33-1.00; for C, 0.15-0.75; for D, no data; for E, 0.42-1.00;
for F, 0.17-1.00; and for G, 0.50-0.91.

For "unknown" identifications, the range of
purity/probability values for contract laboratory A was
290-800; for B, 197-768; for C, no data; for D, 683-711; for
E, 216-883; for F, 20-70; and for G, 11-38. The AERL
k/(k + Δk) range for identifications corresponding to and in
agreement with those for contract laboratory A was
0.16-1.00; for B, 0.18-0.88; for C, no data; for D, 0.21-0.72;
for E, 0.29-0.82; for F, 0.19-0.86; and for G, 0.31-0.72.

There are two k/(k + Δk) ranges for each identification
category in each sample reported. The first range was
obtained from those identifications for which there is
agreement between the contract laboratory and AERL. The
second range was obtained from those identifications for
which there is disagreement.

It is evident from Table 3 that the values for AERL's
k/(k + Δk) are, for the most part, greater than or equal to
0.50 for the specific identifications. In a few instances, the
compound identification having a value less than 0.50 was
determined by the analyst to be reasonable and therefore
was selected as the best hit.

For the generic category, the range of k/(k + Δk) values for
each mass spectrum was obtained by selecting the lower
and upper values from all the identifications comprising the
generic group. The same is true for the unknown category.

 The lower purity ranges for both generic  and unknown
 identifications tend to be lower than those for  the specific
 identifications.  This is  not  unusual since  only a  poor
 correlation is expected between purity values and either
 generic or unknown identifications.

Table 4 summarizes the data contained in Table 3 and
shows overall agreement and disagreement between the
contract laboratory TICs and those determined by AERL.
For the 478 contract laboratory TICs involved, the means
across the seven laboratories were 38% specific
identifications, 39% generic identifications, and 23%
unknown identifications. AERL was in agreement with 36%
of the specific identifications and in disagreement with 11%
of them. In many instances, a disagreement on a specific
identification would be considered an agreement on a
generic basis. The designation of a GC peak as
n-hexadecane by one of the contract laboratories and as
"unknown hydrocarbon" by AERL is an example of this
situation. Table 4 indicates that AERL's rate of agreement
is roughly the same for four of the seven contract
laboratories on specific identifications and for five of the
seven on both generic and unknown identifications. The NS
category comprised 19% and the MIS category 34% of the
specific identifications. For the generic identifications,
AERL was in agreement with 48% and in disagreement with
22%. For unknown identifications, AERL was in agreement
with 54% and in disagreement with 10%. The NS category
comprised 15% of the generic and 22% of the unknown
identifications.
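
The summary percentages quoted from Table 4 are consistent with simple (unweighted) means of the seven per-laboratory percentages, and the Std. Dev. row is consistent with the sample standard deviation. The Python sketch below checks this for the three identification classes using the per-laboratory values from Table 4. The helper name `summarize` is hypothetical, and the reading that the published rows were computed this way is an inference from the arithmetic, not something the brief states.

```python
from statistics import mean, stdev

# Per-laboratory percentages from Table 4, laboratories A through G.
PERCENT_BY_LAB = {
    "specific": [35, 11, 25, 90, 17, 26, 62],
    "generic": [7, 69, 75, 0, 63, 43, 19],
    "unknown": [58, 20, 0, 10, 20, 31, 19],
}


def summarize(values):
    """Return (mean, sample standard deviation), rounded to whole percent."""
    return round(mean(values)), round(stdev(values))


if __name__ == "__main__":
    for category, values in PERCENT_BY_LAB.items():
        avg, sd = summarize(values)
        print(f"{category}: mean {avg}%, std. dev. {sd}%")
```

Because each laboratory is weighted equally regardless of how many TICs it reported, these means can differ from percentages pooled over all 478 TICs.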

[Table 3, which summarizes the TIC comparison for each of the 27 extracts, is not legible in this copy.]

Table 4. Condensed TIC Statistics on Agreement/Disagreement of Contract Laboratory/AERL

         TICs (CL)                          TICs (AERL)
                                            Specific Idents.        Generic Idents.   Unknown Idents.
Lab.                  %        %        %
(CL)       No. Specific  Generic  Unknown     % A  % D % NS % MIS     % A  % D % NS     % A  % D % NS
A          136       35        7       58      44    8   27    21     100    0    0      65   20   15
B          111       11       69       20      50   17    0    33      61   21   18      77   14    9
C           55       25       75        0      57    0    0    43      56   34   10       0    0    0
D           20       90        0       10       0    0   33    67       0    0    0      50    0   50
E           30       17       63       20      20    0   40    40      53    0   47      67   33    0
F           58       26       43       31      60    0   13    27      68   24    8      94    0    6
G           68       62       19       19      19   52   19    10       0   77   23      23    0   77
Total      478
Mean                 38       39       23      36   11   19    34      48   22   15      54   10   22
Std. Dev.            28       31       18      23   19   16    18      36   28   16      32   13   30

Legend
% Specific  Percentage of contract laboratory identifications that employ specific chemical names
% Generic   Percentage of contract laboratory identifications that employ chemical family names
% Unknown   Percentage of contract laboratory identifications that employ only the descriptor "unknown"
% A         Percentage of contract laboratory identifications agreed upon by AERL
% D         Percentage of contract laboratory identifications disagreed with by AERL
% NS        Percentage of contract laboratory identifications for which there were no corresponding scans in AERL data
% MIS       Percentage of contract laboratory identifications not interpretable by AERL

Table 5 contains concentration estimates for each sample
reported in Table 3. It appears that, for all three
identification categories, the agreement between contract
laboratories and AERL is within a factor of three.
Conclusions

Overall, the agreement between AERL and the contract
laboratories on both specific and generic identifications
appears to be fair and roughly equivalent for five of the
seven contract laboratories. The fact that generic and
unknown identifications make up 62% of the total suggests,
to some extent, that fewer samples should perhaps have
been analyzed so that the data generated could be
interpreted more thoroughly. Future statements of work
should be written to strongly discourage the use of
"unknown" identifications. Such identifications should be
used only as a last resort. In at least one instance, the same
specific compound identification appeared more than once
in a single TIC report, which suggests that this particular
report did not receive a great deal of review. Finally, in
several instances, it appeared that relative retention time
data were ignored in assigning compound identities.

Acknowledgments

Paul Kimsey's help in applying the AERL rules to screen
the computer outputs is gratefully acknowledged, as is the
advice of Dr. Susan Richardson and Al Thruston, Jr.
concerning compound identifications.

References

 1.  Fisk, J.F., A.M. Haeberer, and S.P. Kovell, Spectra,
     Volume 10, Number 4, 22 (1986).

 2.  Friedman, D., ibid., 40.

 3.  USEPA Contract Laboratory Program, Statement of
     Work for Organic Analysis, Multi-Media Multi-
     Concentration, 10/86. Rev: 1/87, 2/87, 7/87, 8/87.

 4.  Shackelford, W.M., D.M. Cline, L. Burchfield, L. Faas,
     G. Kurth, and A.D. Sauter, "Computer Survey of Gas
     Chromatography/Mass Spectrometry Data Acquired in
     the U.S. Environmental Protection Agency Screening
     Analysis: System and Results," pp. 527-554 in
     "Advances in the Identification and Analysis of
     Organic Pollutants in Water," ed. L.H. Keith, Ann
     Arbor Science, Ann Arbor, MI (1981).

 5.  Smith, D.H., M. Achenback, W.J. Yeager, P.J.
     Anderson, L. Fitch, and T.C. Rindfleisch, Anal. Chem.,
     49, 1623 (1977).

 6.  Dromey, R.G., J. Stefik, T.C. Rindfleisch, and A.M.
     Dufield, Anal. Chem., 48, 1362 (1976).

 7.  Pesyna, G.M., R. Venkataraghavan, H.R. Dayringer,
     and F.W. McLafferty, Anal. Chem., 48, 1362 (1976).

 8.  Atwater, B.L.(F.), D.B. Stauffer, F.W. McLafferty, and
     D.W. Peterson, Anal. Chem., 57, 899 (1985).

 9.  Stauffer, D.B., F.W. McLafferty, R.D. Ellis, and D.W.
     Peterson, Anal. Chem., 57, 1056 (1985).

10.  McLafferty, F.W. and D.B. Stauffer, J. Chem. Inf.
     Comp. Sci., 25, 245 (1985).

11.  Shackelford, W.M., D.M. Cline, L. Faas, and G. Kurth,
     Anal. Chim. Acta, 146, 15 (1983).

-------
Table 5. Comparison of TIC Concentrations

Contract  EPA       AERL             Conc. (CL)                   Conc. (AERL)
Lab. (CL) Sample #  Run #    Specific  Generic  Unknown   Specific  Generic  Unknown   Units
                              Idents.  Idents.  Idents.    Idents.  Idents.  Idents.
A         FG489     44121        2700       --     2800       2300     2000     2200   µg/kg
A         FG496     44140         900       --     9500        700      280     6100   µg/L
A         FG490     44128      230000       --   340000       7300       --     4400   µg/kg
A         FG488     44127       26000    33000    46000       8500    77500     6400   µg/kg
A         FG494     44138         230       --      500        200      225      290   µg/L
A         FG495     44139         900       --     6400       1200       --     6000   µg/L
A         FG493     44137         650       --     2700        400       --     2100   µg/L
A         FF397     44136         120       --      200        400       --      300   µg/L
B         AK077     44082        2400     3600     1100       1300     3000      300   µg/L
B         DH939     44080       39000    17000    16000       7700     6000     1600   µg/kg
B         ER728     44081         100       50       80         80       60       90   µg/L
B         YD028     44076          --   108000    74000         --     7000      500   µg/kg
B         YD037     44078       57000     4900       --       3200    76000       --   µg/kg
B         YD035     44073        2500     2200     1700       1000      600       25   µg/kg
C         DH441     44165        7600    11000       --       3100     7400       --   µg/kg
C         DH438     44170        1000     2200       --        700       --       --   µg/kg
C         DH444     44166       13000     3800       --       2100      400       --   µg/kg
C         DH449     44171         500      900       --        200      700       --   µg/kg
D         CQ538     44072         900       --      500         --      760      230   µg/kg
E         CR385     44075       12000    18000    12000      13000    22000     4000   µg/kg
F         GE325     44092         470       --       89        200       --      200   µg/L
F         ER837     44093       59000    25000    12000      32000    30000    14000   µg/kg
F         ER843     44094        6500     2100     2100       3200     2800     1600   µg/kg
F         ER844     44095       46000    30000    33000      16000    35000    20000   µg/kg
G         ES062     44071         900     7500      250       1100       --       --   µg/L
G         ES067     44070        1700     3900      900        600     1000      200   µg/kg
G         ES054     44068         700     1300       --        800      200       --   µg/kg

-------
Note: Mention of trade names or commercial products
does not constitute endorsement or recommendation for
use by the U.S. Environmental Protection Agency.
 United States
 Environmental Protection
 Agency
Center for Environmental Research
Information
Cincinnati OH 45268