EPA
United States
Environmental Protection
Agency
Environmental Research
Laboratory
Athens GA 30613
Research and Development
EPA/600/M-89/030 June 1990
ENVIRONMENTAL
RESEARCH BRIEF
Assessment of Tentatively Identified Compounds in Superfund
Samples
J. M. Long and J. M. McGuire*
*Environmental Research Laboratory, Athens, GA 30613
Abstract
Stored mass spectral data for 27 semivolatile samples
analyzed by 7 private laboratories under contract with the
U.S. Environmental Protection Agency were reanalyzed at
the Environmental Research Laboratory, Athens, GA
(AERL). Results of the reanalysis were compared with the
original contract laboratory results. In instances where
specific compound identifications had been made by a
contract laboratory, AERL agreed with the identification
36% of the time, disagreed with the identification 11% of
the time, disagreed on the presence of the GC peak 19% of
the time, and concluded that data were insufficient for
identification 34% of the time.
Background
Public Law 96-510, entitled "The Comprehensive
Environmental Response, Compensation and Liability Act
of 1980" (commonly known as Superfund), authorized,
among other things, testing and monitoring of waste sites.
To accomplish this, the Superfund Contract Laboratory
Program (CLP) was established and comprehensive
methods were implemented for contract laboratories to use
in analyzing and reporting target analytes (1, 2). In addition
to the target analytes, the statement of work required each
contractor, for each semivolatile sample analyzed by
GC/MS, to conduct mass spectral library searches to
determine the possible identity of as many as 20
semivolatile components not listed on EPA's Target
Compound List (3). The selections were to be those
compounds having the greatest concentrations.
Substances that exhibited responses less than 10% of the
nearest internal standard were not to be considered. A
further requirement was that the 1985 or most recent
release of the National Bureau of Standards spectral
library was to be used to conduct the searches. Reporting
requirements stipulated that compounds not meeting the
complete identification requirements contained in the
statement of work should be reported as "unknown," or
"unknown hydrocarbons," or "unknown aromatic," etc. In
other words, the tentative identification should be as
specific as possible even for compounds identified as
"unknown."
At the time the tentatively identified compounds (TIC)
concept was included in the statement of work, it was
thought that such tentative identifications might provide
information that would be useful in modifying the target
list. Since then, thousands of TIC identifications have been
made and a need has appeared for assessing their
reliability. This work is the first part of such an
assessment. Multi-spectral identification or confirmation of
selected TICs is in progress at this time and will be
reported separately as the second step in the assessment
process.
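The statement-of-work selection criteria described above (non-target peaks only, a response of at least 10% of the nearest internal standard, and at most 20 compounds taken in order of concentration) can be sketched in Python as follows. This is an illustration only: the function name, the dictionary keys, and the example peaks are hypothetical and do not come from the statement of work.

```python
def select_tic_candidates(peaks, max_tics=20, min_fraction=0.10):
    """Pick up to max_tics non-target peaks, highest estimated concentration first.

    Each peak is a dict with hypothetical keys:
    name, is_target, response, nearest_is_response, est_conc.
    """
    eligible = [
        p for p in peaks
        if not p["is_target"]  # Target Compound List analytes are reported separately
        and p["response"] >= min_fraction * p["nearest_is_response"]
    ]
    eligible.sort(key=lambda p: p["est_conc"], reverse=True)
    return eligible[:max_tics]


# Made-up peaks, for illustration only.
peaks = [
    {"name": "peak 1", "is_target": False, "response": 5000,
     "nearest_is_response": 10000, "est_conc": 26000},
    {"name": "peak 2", "is_target": False, "response": 500,
     "nearest_is_response": 10000, "est_conc": 900},
    {"name": "peak 3", "is_target": True, "response": 8000,
     "nearest_is_response": 10000, "est_conc": 40000},
    {"name": "peak 4", "is_target": False, "response": 9000,
     "nearest_is_response": 10000, "est_conc": 44000},
]
selected = [p["name"] for p in select_tic_candidates(peaks)]
```

Here peak 2 falls below the 10% response threshold and peak 3 is a target analyte, so only peaks 4 and 1 remain, in that order.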
Approach
The TICs made by the contract laboratories were
compared to those made by the AERL, using the same
mass spectral data but with different analytical protocols,
in order to determine the reliability of the contract
laboratory identifications. The data for this research brief
were generated from 7 contract laboratories on a total of
27 extracts and were processed by computer programs
developed by AERL personnel (4) and others (5-11). The
best mass spectrum for each GC peak was located and
extracted from its background by the programs. The
concentration was then estimated, and as many as ten
possible identifications using Probability-Based Matching
(PBM) were compiled for each resolved peak in the mass
chromatogram. The library, containing 110,000 spectra,
used in this work is a sub-set of the complete Wiley library
and is larger and more extensive than the NBS library
used by the contract laboratories.
The complete AERL computer identification programs rely
heavily on historical relative retention data, which were not
available for this study. Accordingly, AERL developed the
following rules to facilitate selection of the best match.
1. The value of the PBM-derived ratio k/(k + Δk) should
be greater than or equal to 0.50, and should be
significantly higher than that for the next best hit.
2. The value of the PBM k should be greater than or
equal to 40.
3. Priority should be given to k's with "+" signs, which
indicate the presence of an ion in the unknown
spectrum at a mass corresponding to the molecular
weight of the library match.
4. Priority should be given to matches that have relative
retention time agreement (where available).
5. If two adjacent scans have the same value of
k/(k + Δk) and if these are the best HITS, the one with
the higher concentration should be chosen.
6. A match should be chosen if the value of k/(k + Δk) is
greater than or equal to 0.95 and the contamination
value is less than 2.
7. No match having 3 PBM flags or one with "anhydride"
as part of the chemical name should be chosen.
8. The same identification should not be chosen more
than once in the same run.
Using this screening procedure, a technician was able, in
many instances, to eliminate all but two or three of the ten
most probable identifications. From the two or three that
passed the screen, the most probable identification was
then chosen by the analyst based on his reexamination of
the data.
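The screening rules above could be expressed roughly as in the Python sketch below. It is not the AERL software: the `Hit` record and its field names are hypothetical, the "significantly higher" part of rule 1 and rules 4 and 5 are left out because the brief gives no numeric criteria or data layout for them, and applying rule 6 only to hits that survive the other filters is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One PBM library match for a GC peak (field names are hypothetical)."""
    name: str             # library compound name
    k: int                # PBM k value
    delta_k: int          # PBM delta-k value
    has_plus: bool        # "+" sign: molecular-weight ion present in the unknown
    n_flags: int          # number of PBM flags on the match
    contamination: float  # PBM contamination value

    @property
    def ratio(self) -> float:
        return self.k / (self.k + self.delta_k)


def screen(hits, already_chosen=frozenset()):
    """Return the hits that pass rules 1, 2, 7 and 8, best candidates first."""
    kept = [
        h for h in hits
        if h.ratio >= 0.50                      # rule 1 (threshold part only)
        and h.k >= 40                           # rule 2
        and h.n_flags < 3                       # rule 7: fewer than 3 PBM flags
        and "anhydride" not in h.name.lower()   # rule 7: no anhydrides
        and h.name not in already_chosen        # rule 8: no repeats within a run
    ]
    # Rule 6: a near-perfect, uncontaminated match is taken outright.
    for h in kept:
        if h.ratio >= 0.95 and h.contamination < 2:
            return [h]
    # Rule 3: "+" hits first, then descending k/(k + delta-k).
    return sorted(kept, key=lambda h: (not h.has_plus, -h.ratio))


# Made-up candidates, for illustration only.
hits = [
    Hit("n-hexadecane", 80, 10, True, 0, 1.0),
    Hit("phthalic anhydride", 90, 5, True, 0, 1.0),
    Hit("naphthalene", 30, 5, False, 0, 1.0),
    Hit("decane", 60, 40, False, 1, 3.0),
]
survivors = [h.name for h in screen(hits)]
```

As in the procedure described in the text, the output is a short list for the analyst to review rather than a final identification.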
Results
Table 1 represents an example of part of the information
summarized in the report of TICs for an individual
semivolatile extract by contract laboratories. Table 2,
which closely resembles Table 1, contains the AERL
identifications corresponding to those in Table 1. The data
in Table 3 summarize the comparison of TIC reports, such
as the one in Table 1, for each sample, and the computer
outputs resulting from AERL processing of the contract
laboratory mass spectral data. For each TIC compound
name entered in Table 3, the purity or probability of the
spectral match is recorded, as are the identifications made
by AERL personnel and the corresponding value of the
PBM k/(k + Δk).

Table 1. Representative Identifications in TIC Report
     CAS Number  Compound Name                  Rt (minutes)  Est. Conc.  Q*
 1.  140-76-1    Pyridine, 5-Ethenyl-2-Methyl           9.27       26000  J**
 2.              Unknown Hydrocarbon                   14.39       12000  J
 3.              Unknown Hydrocarbon                   15.97       44000  J
 4.              Unknown                               16.37       11000  J
 5.              Unknown Hydrocarbon                   16.90       25000  J
 6.              Unknown                               18.35       17000  J
 7.              Unknown Hydrocarbon                   18.85       51000  J
 8.              Unknown                               19.07      200000  J
 9.              Unknown                               19.74       46000  J
10.              Unknown Hydrocarbon                   20.17       38000  J
11.              Unknown Hydrocarbon                   20.25       37000  J
12.              Unknown                               21.44       40000  J
13.              Unknown Hydrocarbon                   21.54       34000  J
14.              Unknown                               22.44       10000  J
15.              Unknown Hydrocarbon                   22.62       24000  J
16.              Unknown Hydrocarbon                   23.77       33000  J
17.              Unknown                               25.04       25000  J
18.              Unknown                               27.54       20000  J
*EPA qualifier from statement of work
**J = estimated concentration or concentration is less than the
quantitation limit

Specifically, Table 3 summarizes the data obtained on the
27 semivolatile extracts by the 7 contract laboratories
along with comparative data obtained by AERL by
processing the mass spectral data generated by those
contract laboratories. The information includes the number
of TICs made, the number of those that are "specific,"
"generic," and "unknown," the range in purity for
compounds in the three groups, and a descriptor for the
overall shape of the mass chromatogram. Purities are
listed for laboratories using Finnigan instruments;
probabilities are listed for those using Hewlett Packard
instruments. Specific identifications for the purposes of this
report are defined to be those employing specific
compound names, e.g., n-hexadecane. Generic
identifications are those employing chemical family names,
e.g., unknown hydrocarbon. The remaining identifications
are defined as unknown identifications. These employ only
the descriptor, "unknown." For specific identifications, the
comparative data obtained by AERL include a breakdown
by four categories: agreement (A), disagreement (D),
no-scan (NS), and misidentified (MIS). For generic and
unknown identifications, the same categories, with the
exception of MIS, are used. The agreement and
disagreement categories need no explanation. The NS
category indicates that there was no scan in the
AERL-processed data corresponding to the contract laboratory
scan. These scans were absent due either to a known
deficiency in AERL's peak recognition program or to the
contract laboratory's reporting peaks that were not real.
The MIS category refers to spectra that were not
interpretable by AERL personnel based on GC/MS alone.
The table also includes a range of values obtained for
k/(k + Δk) for specific, generic, and unknown identifications.

Table 2. AERL Identifications Corresponding to TIC Report
(Table 1)
     Compound Name                  Rt (minutes)  Concentration
 1.  pyridine, 5-ethenyl-2-methyl           9.27           8500
 2.  unknown hydrocarbon                   14.39          11000
 3.  unknown hydrocarbon                   15.97          13000
 4.  MIS                                   16.37           1000
 5.  unknown hydrocarbon                   16.90           4000
 6.  MIS                                   18.35          18000
 7.  unknown hydrocarbon                   18.85          14000
 8.  MIS                                   19.07      Saturated
 9.  MIS                                   19.74          20000
10.  unknown hydrocarbon                   20.17           7000
11.  unknown hydrocarbon                   20.25           8000
12.  unknown hydrocarbon                   21.44          18000
13.  unknown hydrocarbon                   21.54           9000
14.  MIS                                   22.44           2500
15.  unknown hydrocarbon                   22.62          15000
16.  unknown hydrocarbon                   23.77          13000
17.  MIS                                   25.04           9900
18.  MIS                                   27.54           5900

For "specific" identifications in Table 3, the overall range
of purity/probability values was: for contract laboratory A,
576-977; for B, 371-964; for C, 504-829; for D, 625-977; for
E, 677-873; for F, 67-95; and for G, 52-93. The AERL
k/(k + Δk) range of values for identifications corresponding
to and in agreement with those for contract laboratories
was: A, 0.40-0.94; B, 0.67-1.00; C, 0.50-0.95; D, no data;
E, 0.80-0.80; F, 0.66-1.00; and G, 0.55-0.91. Regression
analysis indicated there is no linear correlation between
either purity or probability values and k/(k + Δk) values.
For "generic" identifications, the range of purity/probability
values for contract laboratory A was 736-845; for B,
217-923; for C, 138-803; for D, no data; for E, 558-909; for F,
11-89; and for G, 15-81. The AERL k/(k + Δk) range for
identifications corresponding to and in agreement with
those for contract laboratory A was 0.23-1.00; for B,
0.33-1.00; for C, 0.15-0.75; for D, no data; for E, 0.42-1.00; for
F, 0.17-1.00; and for G, 0.50-0.91.
For "unknown" identifications, the range of
purity/probability values for contract laboratory A was
290-800; for B, 197-768; for C, no data; for D, 683-711; for E,
216-883; for F, 20-70; and for G, 11-38. The AERL
k/(k + Δk) range for identifications corresponding to and in
agreement with those for contract laboratory A was
0.16-1.00; for B, 0.18-0.88; for C, no data; for D, 0.21-0.72; for
E, 0.29-0.82; for F, 0.19-0.86; and for G, 0.31-0.72.
There are two k/(k + Δk) ranges for each identification
category in each sample reported. The first range was
obtained from those identifications for which there is
agreement between the contract laboratory and AERL. The
second range was obtained from those identifications for
which there is disagreement.
It is evident from Table 3 that the values for AERL's
k/(k + Δk) are, for the most part, greater than or equal to
0.50 for the specific identifications. In a few instances, the
compound identification having a value less than 0.50 was
determined by the analyst to be reasonable and therefore
was selected as the best HIT.
For the generic category, the range of k/(k + Δk) values for
each mass spectrum was obtained by selecting the lower
and upper values from all the identifications comprising the
generic group. The same is true for the unknown category.
The lower purity ranges for both generic and unknown
identifications tend to be lower than those for the specific
identifications. This is not unusual since only a poor
correlation is expected between purity values and either
generic or unknown identifications.
Table 4 summarizes the data contained in Table 3 and
shows overall agreement and disagreement between the
contract laboratory TICs and those determined by AERL. It
is interesting to note that of the 478 contract laboratory
TICs involved, 38% were specific identifications, 39% were
generic identifications, and 23% were unknown
identifications. AERL was in agreement with 36% of the
specific identifications and in disagreement with 11% of
them. In many instances, a disagreement on a specific
identification would be considered an agreement on a
generic basis. The designation of a GC peak as n-
hexadecane by one of the contract laboratories and as
"unknown hydrocarbon" by AERL is an example of this
situation. This table indicates that AERL is in agreement
by roughly the same percentage with four of the seven
contract laboratories on specific identification and with five
of seven contract laboratories on both generic and
unknown identifications. The NS category comprised 19%
and the MIS category 34% of the specific identifications.
For the generic identifications, AERL was in
agreement with 48% and in disagreement with 22%. For
unknown identifications, AERL was in agreement with 54%
and in disagreement with 10%. The NS category comprised
15% of the generic and 22% of the unknown
identifications.
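The Mean and Std. Dev. rows of Table 4 can be reproduced from the per-laboratory percentages. The Python sketch below assumes those rows are the unweighted mean and the sample standard deviation across the seven laboratories; that is an inference from the arithmetic, not something the brief states, and the variable and function names are illustrative. For comparison it also computes a percentage weighted by each laboratory's TIC count, which is only approximate because the tabulated percentages are already rounded.

```python
from statistics import mean, stdev

# Per-laboratory values transcribed from Table 4 (laboratories A through G).
n_tics = [136, 111, 55, 20, 30, 58, 68]
pct_specific = [35, 11, 25, 90, 17, 26, 62]
pct_generic = [7, 69, 75, 0, 63, 43, 19]
pct_unknown = [58, 20, 0, 10, 20, 31, 19]


def summarize(percentages):
    """Unweighted mean and sample standard deviation, rounded to whole percent."""
    return round(mean(percentages)), round(stdev(percentages))


def weighted_percent(percentages, counts):
    """Percentage over all TICs, weighting each laboratory by its TIC count."""
    return sum(p * n for p, n in zip(percentages, counts)) / sum(counts)


for label, pcts in [("specific", pct_specific),
                    ("generic", pct_generic),
                    ("unknown", pct_unknown)]:
    m, s = summarize(pcts)
    w = weighted_percent(pcts, n_tics)
    print(f"{label:8s} mean={m}%  std={s}%  count-weighted={w:.0f}%")
```

Under these assumptions the unweighted figures match the tabulated Mean and Std. Dev. rows (38 and 28, 39 and 31, 23 and 18), which suggests the overall percentages quoted in the text are averages over laboratories rather than fractions of the 478 pooled TICs. The count-weighted values differ somewhat, for example roughly 32% rather than 38% for specific identifications.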
Table 5 contains concentration estimates for each sample
reported in Table 3. It appears that, for all three
identification categories, the agreement between contract
laboratories and AERL is within a factor of three.

Table 3. (Table contents are illegible in the source and are not reproduced here.)

Table 4. Condensed TIC Statistics on Agreement/Disagreement of Contract Laboratory/AERL
                     TICs (CL)                  TICs (AERL)
Contract                                        Specific ident.       Generic ident.  Unknown ident.
Lab. (CL)   No. % Spec. % Gen. % Unk.           % A  % D % NS % MIS   % A  % D % NS   % A  % D % NS
A           136      35      7     58            44    8   27    21   100    0    0    65   20   15
B           111      11     69     20            50   17    0    33    61   21   18    77   14    9
C            55      25     75      0            57    0    0    43    56   34   10     0    0    0
D            20      90      0     10             0    0   33    67     0    0    0    50    0   50
E            30      17     63     20            20    0   40    40    53    0   47    67   33    0
F            58      26     43     31            60    0   13    27    68   24    8    94    0    6
G            68      62     19     19            19   52   19    10     0   77   23    23    0   77
Total       478
Mean                 38     39     23            36   11   19    34    48   22   15    54   10   22
Std. Dev.            28     31     18            23   19   16    18    36   28   16    32   13   30
Legend
% Specific  Percentage of contract laboratory identifications that employ specific chemical names
% Generic   Percentage of contract laboratory identifications that employ chemical family names
% Unknown   Percentage of contract laboratory identifications that employ only the descriptor "unknown"
% A         Percentage of contract laboratory identifications agreed upon by AERL
% D         Percentage of contract laboratory identifications disagreed with by AERL
% NS        Percentage of contract laboratory identifications for which there were no corresponding scans in AERL data
% MIS       Percentage of contract laboratory identifications not interpretable by AERL

Conclusions
Overall, the agreement between AERL and the contract
laboratory identifications for both specific and generic
identifications appears to be fair and also roughly
equivalent for five of the seven contract laboratories.
That generic and unknown identifications made up 62% of
the total suggests, to some extent, that fewer samples
should have been analyzed in order to obtain more
thorough interpretations of the data generated.
Future work statements should be written in a manner to
strongly discourage the use of "unknown" identifications.
Such identifications should be used only as a last resort. It
was observed in at least one instance that the same
specific compound identification appeared more than once
in a single TIC report. This suggests that this particular
report did not receive a great deal of review. Finally, in
several instances, it appeared that relative retention time
data were ignored in assigning compound identities.
Acknowledgments
Paul Kimsey's help in applying the AERL rules to screen
the computer outputs is gratefully acknowledged, as is the
advice of Dr. Susan Richardson and Al Thruston, Jr.
concerning compound identifications.
References
1. Fisk, J.F., A.M. Haeberer, and S.P. Kovell, Spectra,
Volume 10, Number 4, 22 (1986).
2. Friedman, D., ibid, 40.
3. USEPA Contract Laboratory Program, Statement of
Work for Organic Analysis, Multi-Media Multi-
Concentration, 10/86. Rev: 1/87, 2/87, 7/87, 8/87.
4. Shackelford, W.M., D.M. Cline, L. Burchfield, L. Faas,
G. Kurth, and A.D. Sauter, "Computer Survey of Gas
Chromatography/Mass Spectrometry Data Acquired in
the U.S. Environmental Protection Agency Screening
Analysis: System and Results," pp. 527-554 in
"Advances in the Identification and Analysis of
Organic Pollutants in Water," ed. L.H. Keith, Ann
Arbor Science, Ann Arbor, MI (1981).
5. Smith, D.H., M. Achenback, W.J. Yeager, P.J.
Anderson, L. Fitch, and T.C. Rindfleisch, Anal. Chem.,
49, 1623 (1977).
6. Dromey, R.G., J. Stefik, T.C. Rindfleisch, and A.M.
Dufield, Anal. Chem., 48, 1362 (1976).
7. Pesyna, G.M., R. Venkataraghavan, H.R. Dayringer,
and F.W. McLafferty, Anal. Chem., 48, 1362 (1976).
8. Atwater, B.L.(F.), D.B. Stauffer, F.W. McLafferty, and
D.W. Peterson, Anal. Chem., 57, 899 (1985).
9. Stauffer, D.B., F.W. McLafferty, R.D. Ellis, and D.W.
Peterson, Anal. Chem., 57, 1056 (1985).
10. McLafferty, F.W. and D.B. Stauffer, J. Chem. Inf.
Comp. Sci., 25, 245 (1985).
11. Shackelford, W.M., D.M. Cline, L. Faas, and G. Kurth,
Anal. Chim. Acta, 146, 15 (1983).

Table 5. Comparison of TIC Concentrations
Contract   EPA       AERL           Conc. (CL)                   Conc. (AERL)
Lab. (CL)  Sample #  Run #  Specific   Generic   Unknown  Specific   Generic   Unknown  Units
A          FG489     44121      2700        --      2800      2300      2000      2200  µg/kg
A          FG496     44140       900        --      9500       700       280      6100  µg/L
A          FG490     44128    230000        --    340000      7300        --      4400  µg/kg
A          FG488     44127     26000     33000     46000      8500     77500      6400  µg/kg
A          FG494     44138       230        --       500       200       225       290  µg/L
A          FG495     44139       900        --      6400      1200        --      6000  µg/L
A          FG493     44137       650        --      2700       400        --      2100  µg/L
A          FF397     44136       120        --       200       400        --       300  µg/L
B          AK077     44082      2400      3600      1100      1300      3000       300  µg/L
B          DH939     44080     39000     17000     16000      7700      6000      1600  µg/kg
B          ER728     44081       100        50        80        80        60        90  µg/L
B          YD028     44076        --    108000     74000        --      7000       500  µg/kg
B          YD037     44078     57000      4900        --      3200     76000        --  µg/kg
B          YD035     44073      2500      2200      1700      1000       600        25  µg/kg
C          DH441     44165      7600     11000        --      3100      7400        --  µg/kg
C          DH438     44170      1000      2200        --       700        --        --  µg/kg
C          DH444     44166     13000      3800        --      2100       400        --  µg/kg
C          DH449     44171       500       900        --       200       700        --  µg/kg
D          CQ538     44072       900        --       500        --       760       230  µg/kg
E          CR385     44075     12000     18000     12000     13000     22000      4000  µg/kg
F          GE325     44092       470        --        89       200        --       200  µg/L
F          ER837     44093     59000     25000     12000     32000     30000     14000  µg/kg
F          ER843     44094      6500      2100      2100      3200      2800      1600  µg/kg
F          ER844     44095     46000     30000     33000     16000     35000     20000  µg/kg
G          ES062     44071       900      7500       250      1100        --        --  µg/L
G          ES067     44070      1700      3900       900       600      1000       200  µg/kg
G          ES054     44068       700      1300        --       800       200        --  µg/kg

Note: Mention of trade names or commercial products
does not constitute endorsement or recommendation for
use by the U.S. Environmental Protection Agency.
United States
Environmental Protection
Agency
Center for Environmental Research
Information
Cincinnati OH 45268