SEDD SPECIFICATION
Draft Version 5.1
For the Staged Electronic Data Deliverable (SEDD)
August 2005
Table of Contents
1.0 AN INTRODUCTION TO THE SEDD SPECIFICATION
1.1 DEFINITIONS
1.2 RELATIONSHIP OF A DTD OR SCHEMA TO THE SEDD SPECIFICATION
1.3 OVERVIEW OF AN EDD FILE CREATED USING A DTD OR SCHEMA BASED ON THE SEDD SPECIFICATION
1.4 THE SEDD SPECIFICATION HIERARCHY
1.5 USE OF THE SEDD SPECIFICATION BY THE DATA REQUESTER
2.0 SEDD SPECIFICATION SYNTAX, REQUIRED DATA ELEMENTS, AND DATA ELEMENT VALUE FORMATS
2.1 SYNTAX
2.1.1 Character and Line Syntax
2.1.2 Element Syntax: Data Elements, Names, and Values
2.1.3 SEDD Hierarchy, Header and Dependent Data Element Syntax
2.1.4 XML File Syntax
2.2 TYPES OF REQUIRED DATA ELEMENTS
2.2.1 Required Data Elements
2.2.2 Conditionally Required Data Elements
2.2.3 Data Elements Required for Traceability (Including QC)
2.2.4 Data Elements Required for Portability (Including QC)
2.3 DATA ELEMENT VALUE FORMATS
2.3.1 Text
2.3.2 Identifier
2.3.3 Limited List
2.3.4 Numeric
2.3.5 Date
3.0 CONCEPTS AND RELATIONSHIPS
3.1 CONCEPTS
3.1.1 Samples
3.1.2 Instrument QC
3.1.3 Method
3.1.4 Method QC Sample
3.1.5 Analysis
3.1.6 Results
3.2 RELATIONSHIPS
3.2.1 Batches
3.2.2 Analysis Groups
3.2.3 Analyte Groups
3.2.4 QC Categories and QC Linkage
3.2.5 Comparisons
SEDD Specification Draft Version 5.1
August 2005
1.0 AN INTRODUCTION TO THE SEDD SPECIFICATION
What is the Purpose of the SEDD Specification?
The SEDD (Staged Electronic Data Deliverable) Specification is a specification for developing
standardized electronic data deliverable formats for environmental analytical data. SEDD is
designed to be Agency and Program neutral. An Electronic Data Deliverable (EDD) consists
of actual electronic data that is delivered as a unit. The analytical data delivered by laboratories
as EDDs to their data requesters includes sample identification, laboratory measurements, and
laboratory Quality Control (QC) information.
The SEDD Specification provides the framework for developing the Document Type Definitions
(DTDs) or Schemas and the resultant EDD formats by providing general specifications for the
overall data structure of the EDD, while remaining flexible enough to be tailored for present and
future individual Program or Agency needs. Figure 1 shows a general system for the transmittal
of environmental analytical data from the data generator (the analytical laboratory) to the data
requester (Federal or State Agencies, private firms, etc.). Data are generated at the analytical
laboratory and sent to the data requester in the electronic format required by the data requester.
These data are then checked by the data requester for format and content to ensure that the data
meet the data requester's contractual and technical requirements.
FIGURE 1. GENERAL SYSTEM FOR TRANSMISSION OF ENVIRONMENTAL ANALYTICAL DATA FROM THE DATA
GENERATOR TO THE DATA REQUESTER
[Figure: The Data Requester (e.g., Federal Agencies) sends the Document Type Definition
(DTD)/Schema, the format it requires, to the Data Generator (e.g., Laboratories). The Data
Generator submits the Electronic Data Deliverable (EDD) to the Data Requester.]
Each data requester must develop a guide based on the SEDD Specification and their individual
technical requirements. This guide should include the DTD or Extensible Markup Language
(XML) Schema and a complete set of instructions for developing their specific individual EDD
format. For the most common forms of environmental analysis (e.g., GC/MS, GC, ICP, ICP-
MS), DTDs and Schemas have been developed for SEDD and must be used with no modification
(see Section 1.5).
1.1 DEFINITIONS
The following terms are used throughout this document:
SEDD Specification
A specification for developing standardized electronic data deliverable formats for
environmental analytical data. Under this specification, a common structure and dictionary
are required.
Data Requester(s)
Individual(s) or organization(s) directly responsible for requesting analytical services and
data from the analytical laboratory. For the purposes of this document, the data requester is
assumed to be the person or organization responsible for developing the DTD or Schema.
Examples of Data Requesters include Federal or State Agencies, private engineering or
environmental firms, etc.
DTD (Document Type Definition)
The DTD provides the set of rules for developing the structure and data elements for specific
XML EDD formats. These rules are established by the data requester and the SEDD
structure.
Schema
The Schema (similar to but more powerful than the DTD) would give the set of rules for
developing the structure and data elements for specific EDD formats along with the criteria
for specifying the attributes of the data reported. These rules are established by the data
requester and the SEDD structure.
Data Generator
Individual(s) or organization(s) directly responsible for generating data. For purposes of this
document, the data generator is assumed to be the person or organization responsible for
producing and transmitting the EDD or Schema. Examples of Data Generators include
analytical, radioanalytical, or field laboratories.
EDD (Electronic Data Deliverable)
An electronic file created by a data generator (usually the analytical laboratory) for
transmitting and reporting analytical data.
XML (Extensible Markup Language)
A standard devised by the World Wide Web Consortium (W3C) that provides a
common approach to representing information over the Web. XML is a language for
describing data. It is not owned by any one vendor and thus remains an open standard.
Because XML is text based, it can be processed on any platform. The SEDD Specification
uses XML as its transmission standard.
Element
An element consists of a start tag, content, and an end tag. An element may contain other
elements. The SEDD Data Element Dictionary (DED) contains the list of tags that
developers may use. The SEDD Structure specifies relationships between certain elements
from the DED.
DED (Data Element Dictionary)
The Data Element Dictionary contains the definitions of the defined data elements along with
where in the structure they can be used.
1.2 RELATIONSHIP OF A DTD OR SCHEMA TO THE SEDD SPECIFICATION
The SEDD Specification provides the requirements for developing DTDs and Schemas and the
resultant EDDs for reporting data to meet a data requester's (e.g., a Federal Agency's) needs. Final
DTDs and examples of the resultant EDDs for major environmental methods are presented in
separate files in Appendix C.
The SEDD Specification is flexible in that it can satisfy diverse requirements. The SEDD
Specification uses a data model based on analytical activities as a starting point for requesting
data deliverables. The SEDD Specification uses names for nodes and data elements to describe
diverse types of laboratory activities. To take full advantage of the standardization available from
the SEDD Specification, data requesters shall use these generalized names in their DTDs or
Schemas. A Data Element Dictionary (DED) listing these generalized names is included in
Appendix A. A DTD or Schema specifies which of the SEDD Specification nodes and data
elements are required.
If, after careful review of the SEDD structure and the SEDD DED, a data requester is convinced
that a particular critical requirement of their program is not addressed, specific data elements may
be added to the data requester's DTD or Schema to address those needs. The data requester must
be aware that implementation of specific data elements particular only to their Program reduces
the ability of multiple Programs to share their data. Data requesters must present their specific
data element needs to the SEDD Program Manager for possible future incorporation into the
SEDD DED that is included within the SEDD Specification (see Section 2.1.2).
A DTD or Schema rarely requires all of the information available in the SEDD Specification.
When necessary, a data requester can require additional information. However, all DTDs or
Schemas and the resultant EDDs created using the SEDD Specification will use the same
structure and DED.
This section, in conjunction with Sections 2, 3, and the SEDD DED, constitutes the Staged
Electronic Data Deliverable (SEDD) Specification for developing the Document Type Definition
(DTD) or Schema and the resultant EDD.
This SEDD Specification document is not a comprehensive specification for a specific DTD or
Schema and EDD format. Specific DTDs or Schemas and the resultant EDDs must still be fully
developed and defined by the data requester using the SEDD specification as a reference. In
order to derive the greatest benefits from the XML technology, data requesters must utilize both
the SEDD Specification structure and the SEDD Specification Data Element Dictionary.
1.3 OVERVIEW OF AN EDD FILE CREATED USING A DTD OR SCHEMA BASED ON
THE SEDD SPECIFICATION
A SEDD Specification-compatible EDD is a text file (with a .xml extension since XML is used as
the transmission standard) that reports data in the form of "elements". There are two kinds of
elements - data elements and nodes. Data elements contain data and have the following parts: a
data element name (start tag); a data element value; and a data element name (end tag). The data
element name is a descriptive label. The data element value is the data value associated with that
particular data element.
Data elements are contained within nodes. Nodes also consist of a start tag and an end tag, but
instead of containing a data value between the start and end tag, nodes contain data elements
between the tags. Figure 2 shows an example of data elements contained within a node. Nodes
are arranged in a hierarchy. The level of a node in the hierarchy partially determines the node's
meaning.
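FIGURE 2. SEDD SPECIFICATION ELEMENTS
A. Typical Data Element and Value
[Figure: the value 24.2 enclosed by a start tag and an end tag, with callouts for the Data
Element Name (Start Tag), the Data Element Value, and the Data Element Name (End Tag).]
B. Data Elements Arranged Within a Node
[Figure: four data elements with the values TNT, 118-96-7, 24.2, and ug/L, enclosed by a
node start tag and a node end tag, with callouts for the Node Element Name (Start Tag), A
Data Element Within A Node, and the Node Element Name (End Tag).]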
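The difference between a data element and a node can be sketched in Python with the standard-library xml.etree.ElementTree module. This is an illustration only. ReportedResult and ResultUnits are names used in this specification, while AnalyteName, CASRegistryNumber, and Result are hypothetical stand-ins; the actual names are defined in the SEDD DED (Appendix A).

```python
import xml.etree.ElementTree as ET

# ReportedResult and ResultUnits appear in this specification; the other tag
# names are hypothetical placeholders, not taken from the SEDD DED.
FRAGMENT = """
<ReportedResult>
  <AnalyteName>TNT</AnalyteName>
  <CASRegistryNumber>118-96-7</CASRegistryNumber>
  <Result>24.2</Result>
  <ResultUnits>ug/L</ResultUnits>
</ReportedResult>
"""

def describe(element):
    """Return ('node', [child tag names]) or ('data element', value)."""
    children = list(element)
    if children:
        return ("node", [child.tag for child in children])
    return ("data element", element.text)

node = ET.fromstring(FRAGMENT.strip())
kind, child_names = describe(node)
# Each child holds a value between its tags, so each is a data element.
values = {child.tag: describe(child)[1] for child in node}
```

Here the outer element is reported as a node because it contains other elements, and each inner element is reported as a data element because it contains only a value.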
1.4 THE SEDD SPECIFICATION HIERARCHY
The SEDD Specification hierarchy is based on a model of laboratory activities, the linkages
between these activities, and the data these activities produce. To take a typical laboratory
scenario, each sample analyzed by one method typically has several results that have to be
reported (e.g., a volatile analysis by SW-846 Method 8260 would have many analytes whose
concentrations are determined and reported). Information regarding the sample and method
(sample identification, method used, etc.) would be captured in the SamplePlusMethod node since
this information would be the same for all volatile analytes being reported for the sample. The
results of the analyses performed on the sample using this one method (e.g., the concentration of
dichloromethane being 50 ug/L) would be captured in the ReportedResult node. There would be
several ReportedResult nodes, one for each of the analytes whose results are being reported, with
each one of these nodes linked directly to the one SamplePlusMethod node (see Figure 3).
FIGURE 3. SIMPLIFIED SEDD SPECIFICATION HIERARCHY
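The one-to-many link between a SamplePlusMethod node and its ReportedResult nodes might be built as follows. This is a minimal sketch, not a SEDD-conformant EDD. The names AnalyteName and Result are hypothetical, placing ClientMethodID inside SamplePlusMethod is an assumption, and the benzene value is made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_sample(method_id, results):
    """Build one SamplePlusMethod node holding one ReportedResult per analyte."""
    sample = ET.Element("SamplePlusMethod")
    # Information shared by every analyte is stored once, on the parent node.
    # (Placing ClientMethodID here is an assumption made for this sketch.)
    ET.SubElement(sample, "ClientMethodID").text = method_id
    for analyte, value, units in results:
        reported = ET.SubElement(sample, "ReportedResult")
        # AnalyteName and Result are hypothetical data element names.
        ET.SubElement(reported, "AnalyteName").text = analyte
        ET.SubElement(reported, "Result").text = value
        ET.SubElement(reported, "ResultUnits").text = units
    return sample

sample = build_sample("8260", [
    ("dichloromethane", "50", "ug/L"),  # value taken from the text above
    ("benzene", "12", "ug/L"),          # made-up value for illustration
])
reported_nodes = sample.findall("ReportedResult")
```

The method identifier appears once, while the number of ReportedResult nodes grows with the number of analytes reported.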
To take full advantage of the standardization available from the SEDD Specification, data
requesters shall use these structures in their DTDs or Schemas. To assist data requesters, the
SEDD Specification structure can be implemented in the following three primary stages
depending upon the level of detail the data requester needs in the EDD:
Stage 1 contains the minimum number of nodes and data elements to transmit "results-only"
data. Only limited method QC data (or no QC data) would be reported in Stage 1. The Stage
1 structure is presented in Figure 4.
FIGURE 4. SEDD SPECIFICATION - STAGE 1
Header
SamplePlusMethod
Analysis
ReportedResult
Stage 2 builds on Stage 1 by adding method (Stage 2a) and instrument (Stage 2b) QC data.
The Stage 2a structure is presented in Figure 5 and the Stage 2b structure is presented in
Figure 6.
FIGURE 5. SEDD SPECIFICATION - STAGE 2a
FIGURE 6. SEDD SPECIFICATION - STAGE 2b
[Figure: recoverable node labels are AnalyteGroup, Peak, and PeakComparison.]
Stage 3 builds on Stage 2 by adding further measurement data to allow for the
independent recalculation of the reported results. The Stage 3 structure is presented in
Figure 7.
FIGURE 7. SEDD SPECIFICATION - STAGE 3
1.5 USE OF THE SEDD SPECIFICATION BY THE DATA REQUESTER
A. Common Environmental Analyses (GC, GC-MS, ICP, ICP-MS, CVAA, etc.)
For the most common forms of environmental analysis, program-neutral DTDs already exist and
shall be used to ensure development of a uniform format for the transmission and mutual
exchange of environmental analytical data. These DTDs, along with Valid Values for many data
elements, are provided in Appendix C.
Program-specific requirements would be addressed in the instructions to the data generators.
B. Other Types of Analysis (e.g., Pharmaceutical, Biotechnology, Agriculture, Food and
Beverage Testing, etc.)
The SEDD Specification is not restricted to environmental analysis; it can also be used to
transfer data for any type of chemical, radiological, or microbiological analysis. To create
the DTD or Schema, the data requester must assess the current and future data needs of their
program. Data requesters should evaluate which data elements (fields) they currently receive
from their laboratories, either in an electronic deliverable or on a hardcopy form. Data requesters
should also evaluate the level of assessment they need to apply to the data (e.g., check
completeness only, or confirm calculated values). Based on this evaluation, data requesters then
select the appropriate stage from the SEDD structure. This forms the basis for creating the node
elements in the DTD or Schema using the required linkages.
Based on the specific data items required by the program, the data requester then adds the
corresponding data elements from the SEDD DED to the appropriate node(s).
The DTD or Schema may also be used to define acceptable values for some of the elements. Data
requesters may wish a certain value to be constant (fixed), or to come from a limited set of
possible values. Developers should look under "Attributes" in an XML reference source for
further details. The use of Schemas allows more flexibility in this area.
In addition to the DTD or Schema, the data requester must also specify the appropriate formats
for each element. This would include details on identifying samples, methods, and projects, and
specifying the minimum precision for measurements and results and the maximum length of any
reported value. The following format considerations must be addressed:
Define the level of detail required (e.g., Stage 1, Stage 2a, Stage 2b, Stage 3) provided in
Section 1.
NOTE: It is strongly recommended that data requesters develop their data requirements for a
Stage 3 DTD or Schema first, then pare their requirements down to a lower stage to
ensure that the same DTD or Schema will work as data requester data needs grow.
Use the SEDD Specification (i.e., use the selected stage which must include the required data
elements, the linkages between the data elements, and the SEDD DED for defining all data
elements).
Define all valid values associated with appropriate data elements using the accompanying
Valid Value list. Some elements may be restricted to a single valid value while others could
have many valid values. For example the data element ResultUnits could have the following
valid values associated with it: ug/L, ug/kg, mg/L, mg/kg, etc. A data requester may decide
to only allow ug/L for Volatile Organics in a water matrix.
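A data requester's valid value check for ResultUnits could look roughly like the following Python sketch. The unit lists come from the example above. The function name and the small EDD fragment are illustrative assumptions, not part of the SEDD Specification.

```python
import xml.etree.ElementTree as ET

# Valid values from the example in the text, and a narrower restriction that a
# data requester might apply to Volatile Organics in a water matrix.
GENERAL_UNITS = {"ug/L", "ug/kg", "mg/L", "mg/kg"}
WATER_VOLATILES_UNITS = {"ug/L"}

def invalid_units(edd_root, allowed):
    """Return every ResultUnits value in the EDD that is not in `allowed`."""
    return [el.text for el in edd_root.iter("ResultUnits") if el.text not in allowed]

# Illustrative fragment only; a real EDD would carry many more data elements.
edd = ET.fromstring(
    "<SamplePlusMethod>"
    "<ReportedResult><ResultUnits>ug/L</ResultUnits></ReportedResult>"
    "<ReportedResult><ResultUnits>mg/kg</ResultUnits></ReportedResult>"
    "</SamplePlusMethod>"
)
```

Calling invalid_units(edd, GENERAL_UNITS) accepts both results, while invalid_units(edd, WATER_VOLATILES_UNITS) flags the mg/kg entry.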
To assist the data generator in creating the resultant EDD, the data requester will also need to:
Make the DTD or Schema and program specifications available to the data generator.
Work with the data generator to clarify which specific data items already being supplied
correspond to which specific SEDD data elements.
Sections 2 and 3 present the basic information to create an EDD based on the SEDD
specification. Each data requester and data generator must become familiar with the guidelines
and requirements given in these two sections. Once they become familiar with the guidelines,
specific DTDs or Schemas and special instructions can be created by the data requester. The data
generator would use this information to create the EDD.
Section 2 gives the basic rules for presentation of data in each data element, the hierarchy of the
elements within the DTD or Schema and the resultant EDD, and the overall EDD file which must
meet XML requirements.
Section 3 describes some of the key concepts underlying the SEDD Specification analytical
model.
IMPORTANT NOTE: Sections 2 and 3 are not meant to be tutorials in XML technology.
Only basic XML rules as applicable to the SEDD Specification are
summarized. Both the data requester and data generator must be
familiar with XML to correctly apply the SEDD Specification.
2.0 SEDD SPECIFICATION SYNTAX, REQUIRED DATA ELEMENTS, AND DATA
ELEMENT VALUE FORMATS
This section provides the structural and data representation rules (i.e., syntax) for the SEDD
Specification, along with the required data elements, and the format for each data element value.
This information is broken into the following three subsections:
Section 2.1 Syntax
Section 2.2 Required Data Elements
Section 2.3 Data Element Value Formats
Data can be transferred between a data generator and a data requester using XML technology
only if the specifications regarding the data format have been worked out between the two parties
prior to transmission of data (in the form of an XML file). The SEDD Specification provides the
basis of the format for transmission of analytical data by clearly defining: the overall data
structure; data elements; and relationships between the data elements.
NOTE: Most of the examples provided in Sections 2 and 3 pertain to Stages 2b and 3 of the
SEDD Specification. At present, most laboratories deliver EDDs equivalent to Stage
2a of the SEDD Specification (sample results plus method QC data - see Figure 5).
Pilot studies have demonstrated that most laboratories can implement EDDs based on
a Stage 2a SEDD Specification format within a few months. Laboratories can then
build on the Stage 2a EDDs to create and deliver Stage 2b (see Figure 6) or Stage 3
(see Figure 7) EDDs by simply adding nodes and data elements to the Stage
2a EDD (as data requester needs change).
2.1 SYNTAX
Syntax is defined as the rules for representation of data and structure. This section describes the
syntax for the following components of SEDD Specification-compatible EDDs, as well as the
overall file syntax, as follows:
Characters and Lines (see Section 2.1.1)
Elements, Names, and Values (see Section 2.1.2)
SEDD Specification Hierarchy (see Section 2.1.3)
XML File Syntax (see Section 2.1.4)
Figure 8 is a graphical representation of a SEDD Specification-compatible EDD for a common
organics method. This figure illustrates the major components discussed in the syntax section
and provides a useful frame of reference for visualizing the relationship(s) between components.
Figure 8 is not a complete example of the SEDD Specification syntax or data element list.
FIGURE 8. SEDD SPECIFICATION HIERARCHY AND DATA ELEMENTS FOR VOLATILE ORGANICS - STAGE 2a
[Figure: recoverable labels are the SamplePlusMethod node and the data elements
AnalyteType and AnalyteGroupID.]
2.1.1 Character and Line Syntax
Characters and Lines
An EDD is a string of characters in a series of lines.
The specific character set used would be specified in the opening XML declaration statement
so that the parser reading the EDD can decode its characters correctly.
NOTE: The encoding called UTF-8 (Unicode Transformation Format - 8-bit form) will be the
default used for SEDD. UTF-8 encodes the 7-bit ASCII characters (values 0 through
127) as single bytes identical to ASCII. Characters with values 128 through 255
match the ISO-8859-1 (Latin-1) character assignments but are stored in UTF-8 as
two-byte sequences. The use of other character encodings is possible.
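How these two character ranges are stored in UTF-8 can be checked with a few lines of Python. The micro sign (value 181) is used here only as a convenient example of a character above 127; it is not taken from the specification.

```python
# ASCII characters keep their single-byte ASCII value in UTF-8.
ascii_bytes = "A".encode("utf-8")

# The micro sign has the same character value (181, U+00B5) in Unicode and in
# ISO-8859-1, but UTF-8 stores it as two bytes rather than one.
micro_utf8 = "\u00b5".encode("utf-8")
micro_latin1 = "\u00b5".encode("latin-1")

# Decoding UTF-8 bytes with the wrong character set garbles the value, which is
# why the encoding named in the XML declaration must match the file.
misread = micro_utf8.decode("latin-1")
```

The practical consequence is that a file written as ISO-8859-1 but declared as UTF-8 (or the reverse) will not decode correctly once it contains any character above 127.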
There are six basic types of lines:
1. XML declaration line: This is the first line in an XML document. Consists of optional
leading spaces; a less than sign followed by a question mark followed by the letters xml;
the xml declaration statements; a question mark followed by the greater than sign (e.g.,
<?xml version="1.0"?>). The character set used by the EDD would also be specified in
this same line (e.g., <?xml version="1.0" encoding="UTF-8"?>).
2. Document type declaration lines: This is generally the second line in an XML
document. Consists of optional leading spaces; a less than sign followed by an
exclamation mark followed by the letters DOCTYPE; the DOCTYPE declaration
statement; the greater than sign (e.g., <!DOCTYPE ...>).
3. Blank lines: Contain no characters. Blank lines can occur anywhere in an EDD. They
can be used for visual formatting to aid browsers.
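4. Comment lines: Consist of optional leading spaces; a less than sign, the exclamation
mark, two dashes; followed by the comment; followed by two dashes and greater than
sign (e.g., <!-- comment -->). XML comment lines are distinct from actual comments
made as part of the official EDD itself. Within an EDD, the Comment data
element is used to convey information that may be important for the proper
interpretation of the data presented (example: <Comment>This is a
comment.</Comment>).
5. Data elements: Consist of optional leading spaces; a start data element tag name,
followed by the data element value, followed by an end data element tag name (e.g.,
<ResultUnits>value</ResultUnits>). A data element
usually delivers a single piece of information.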
6. Nodes: Consist of optional leading spaces; a less than sign followed by a node tag name
followed by a greater than sign (e.g., <SamplePlusMethod>). A closing node tag name must also
be used later in the EDD and consists of optional leading spaces; a less than sign,
followed by a forward slash, followed by the same node tag name, followed by the
greater than sign (e.g., </SamplePlusMethod>). A node contains other data elements.
Optional leading spaces in lines are allowed so EDD data can be indented to improve
readability.
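The six line types can be told apart with simple pattern matching, as in the Python sketch below. It is an illustration, not a replacement for an XML parser. It assumes each data element sits on a single line, and it treats whitespace-only lines as blank, which is slightly more lenient than the definition above.

```python
import re

# Ordered patterns; the first match wins. Leading spaces are optional.
LINE_PATTERNS = [
    ("xml declaration", re.compile(r"^\s*<\?xml\b.*\?>\s*$")),
    ("document type declaration", re.compile(r"^\s*<!DOCTYPE\b.*>\s*$")),
    ("comment", re.compile(r"^\s*<!--.*-->\s*$")),
    ("blank", re.compile(r"^\s*$")),
    # Start tag, value, and matching end tag on one line, or a null <Name/> tag.
    ("data element", re.compile(r"^\s*(<(\w+)>[^<]*</\2>|<\w+/>)\s*$")),
    # A lone opening or closing node tag.
    ("node", re.compile(r"^\s*</?\w+>\s*$")),
]

def classify_line(line):
    """Return the name of the first line type whose pattern matches."""
    for kind, pattern in LINE_PATTERNS:
        if pattern.match(line):
            return kind
    return "unrecognized"
```

For instance, classify_line("  <ResultUnits>ug/L</ResultUnits>") yields "data element", while classify_line("<ReportedResult>") yields "node".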
2.1.2 Element Syntax: Data Elements, Names, and Values
Data Elements
Definitions for the SEDD Specification data elements are provided in Appendix A.
A data element consists of a start data element tag name followed by the data element
value content followed by an end data element tag name (e.g., the value 24.2 enclosed by
its start and end tags). A data element usually delivers a single piece of information.
Data elements are sometimes called fields.
Spaces are not allowed unless they are an integral part of the data element value.
The SEDD Specification has no rules forbidding the use of any data elements and data
requesters cannot add any such rules. If data requesters receive more data elements than
originally requested, the data requester should ignore these additional data elements.
Data Element Names
A data element name is a string of alphanumeric (A-Z, a-z, 0-9) characters. A name identifies
the data delivered by a data element.
A name should be limited to a maximum of 30 characters.
Character case is significant in names. The first letter of all words used in data element
names must be capitalized. Example: AnalyzedDate would be acceptable while
Analyzeddate would not.
Data element names defined in SEDD are listed in Appendix A with a description of their
usage.
Implementation-defined names are strongly discouraged but allowed. They must begin with
the word "New". It is recommended that an identifier for the implementation (i.e., the
Agency responsible for the data) follow the word "New" (e.g., 'NewEPADataElement').
This will prevent conflicts across formats when implementation-specific names have been
added. Useful implementation-defined names will be incorporated in later versions of SEDD
under SEDD-assigned names in the SEDD DED. Data requesters are urged to carefully
check the SEDD DED to ensure that the needed data element is not already present in the
dictionary under another name.
NOTE: Please check with the SEDD Program Manager for any assistance regarding the
addition of new data elements.
Names should be spelled-out and each word should have its first letter capitalized (e.g.,
'ResultUnits') using the Upper Camel Case (UCC) convention. Excessive abbreviation
should be avoided.
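The naming rules above lend themselves to an automated check. The following Python sketch assumes a small stand-in set for the SEDD DED (the real list is in Appendix A), and the function name is hypothetical. Because a pattern cannot tell where one word ends and the next begins, the capitalization of inner words is caught only through the case-sensitive dictionary lookup.

```python
import re

MAX_NAME_LENGTH = 30
# Alphanumeric characters only, starting with a capital letter.
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9]*$")

def check_name(name, ded_names):
    """Return a list of problems found with a data element name."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("must be alphanumeric and start with a capital letter")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 30 characters")
    # Case is significant, so a miscapitalized DED name is treated as a new name.
    if name not in ded_names and not name.startswith("New"):
        problems.append("implementation-defined names must begin with 'New'")
    return problems

# A tiny stand-in for the SEDD DED, using names mentioned in this document.
DED_SAMPLE = {"AnalyzedDate", "ResultUnits", "ClientID", "MatrixID"}
```

With this stand-in dictionary, check_name("AnalyzedDate", DED_SAMPLE) and check_name("NewEPADataElement", DED_SAMPLE) report no problems, while check_name("Analyzeddate", DED_SAMPLE) is flagged because the miscapitalized name is not in the dictionary.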
Data Element Values
A data element value can contain any string of characters allowed in a line, as restricted by
rules defined by the data requester. A value is the data delivered by a data element.
NOTE: Part of the definition of a name is a possible restriction on the format of values
associated with it, such as numeric or date formats. Section 2.3 defines these
formats.
By SEDD rules, the length of the value is specified by the data requester.
The value begins immediately after the starting data element tag name. In particular, any
spaces after this starting tag and before the ending tag are part of the value.
A value can be null (contain no characters). Null values would be reported on a single line
using the following format: optional leading spaces; a less than sign, followed by the data
element tag name, followed by a forward slash, followed by the greater than sign (e.g.,
<Comment/>).
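Two of these rules, that spaces inside the tags belong to the value and that a null value carries no characters, can be observed with Python's standard-library parser. The helper name value_of is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Spaces between the start and end tags are preserved as part of the value.
padded = ET.fromstring("<ResultUnits> ug/L </ResultUnits>")

# A null value written as an empty tag carries no text at all.
null_value = ET.fromstring("<Comment/>")

def value_of(element):
    """Return the data element value, using '' for a null value."""
    return element.text if element.text is not None else ""
```

Here padded.text keeps both surrounding spaces, and null_value.text is None, which value_of converts to an empty string.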
Nodes
A node element consists of an opening node tag and a closing node tag which are located on
separate lines within the EDD. The opening node tag is a less than sign followed by a node
tag name followed by a greater than sign (e.g., <SamplePlusMethod>). A closing node tag name must
also be used later in the EDD and consists of a less than sign, followed by a forward slash,
followed by the same node tag name, followed by the greater than sign (e.g., </SamplePlusMethod>).
The first node encountered in the EDD is referred to as the root node. This node can only be
used once. All data delivered in the EDD must fall within the opening root node (e.g.,
<Header>) and the closing root node (e.g., </Header>).
Node Element Names
A node element name is a string of alphanumeric (A-Z, a-z, 0-9) characters.
A name should be limited to a maximum of 30 characters.
Character case is significant in names. The first letter of all words used in node element
names must be capitalized. Example: SamplePlusMethod would be acceptable while
SampleplusMethod would not.
The following are SEDD Specification-defined node element names and a description of the
intended use of the associated elements.
Node Element Name         Intended Use
Analysis                  Describes one complete sequence of events, from taking a
                          sample aliquot and preparation through measurement,
                          defined as part of one method.
AnalysisGroup             Links several analyses used to compute results for one
                          method.
Analyte                   Per-analyte data from one analysis or one group of analyses.
AnalyteComparison         Describes the effects of potentially interfering analytes on a
                          peak (e.g., ICP Interelement Correction factors).
AnalyteGroup              Links several analytes used to compute results for another
                          analyte.
Handling                  Describes any manipulation of the sample prior to taking an
                          aliquot (e.g., filtering, ashing, leaching).
Header                    Carries information for the reader/parser of an EDD data
                          stream.
InstrumentQC              Describes QC analyses that do not yield results
                          (concentrations). These data are usually about instrument or
                          process QC (e.g., initial and continuing calibration data).
Peak                      Reports an actual measurement made during one analysis of
                          one analyte.
PeakComparison            Describes cross-peak comparisons (e.g., abundance ratios
                          and inter-peak resolutions).
PeakReplicate             Reports data when multiple peak measurements are made
                          (e.g., multiple exposure readings).
PreparationPlusCleanup    Describes a preparation or cleanup as part of an analysis or
                          an independent method.
ReportedResult            Contains the actual results of a method (analyte
                          identification and computed final results).
SamplePlusMethod          Contains data about the characteristics of one sample
                          analyzed under the criteria of one method.
Implementation-defined nodes are not allowed.
Node elements should be spelled-out and each word should have its first letter capitalized
(e.g., 'ReportedResult') using the Upper Camel Case (UCC) convention. Excessive
abbreviation should be avoided.
Node Elements
A node definition begins at a node definition line and continues until the closing node
definition line is encountered.
The contents of a node are the node definition line that starts it, all data elements and other
nodes within it, and the node definition line that ends it. In particular, blank and comment
lines (<!-- -->, as opposed to Comment data elements) are not part of any node.
Most data element names can only be used in specific nodes (see Appendix A). However,
some data element names may appear in more than one node, possibly with slightly different
definitions.
A data element name may not appear more than once in any given node.
With the exception of data elements denoted as required by the SEDD specification (see
Section 2.2) and those specified as required by a SEDD-compatible EDD implementation, no
data element is required to appear in an EDD.
SEDD specification has no rules restricting the order of data elements in nodes and data
requesters cannot add any such rules.
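The rule that a data element name may appear only once in a given node can be checked mechanically. The Python sketch below assumes that any child whose tag is not one of the specification's node names is a data element. The function name and the sample fragment are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Node names defined by the specification; anything else is a data element.
NODE_NAMES = {
    "Analysis", "AnalysisGroup", "Analyte", "AnalyteComparison", "AnalyteGroup",
    "Handling", "Header", "InstrumentQC", "Peak", "PeakComparison",
    "PeakReplicate", "PreparationPlusCleanup", "ReportedResult", "SamplePlusMethod",
}

def duplicate_data_elements(root):
    """Yield (node tag, data element name) for names repeated within one node."""
    for node in root.iter():
        if node.tag not in NODE_NAMES:
            continue
        # Child nodes may repeat; only data elements are counted here.
        counts = Counter(child.tag for child in node if child.tag not in NODE_NAMES)
        for name, count in counts.items():
            if count > 1:
                yield node.tag, name

# Two ReportedResult nodes are fine; two ResultUnits in one node are not.
edd = ET.fromstring(
    "<SamplePlusMethod>"
    "<ReportedResult><ResultUnits>ug/L</ResultUnits></ReportedResult>"
    "<ReportedResult>"
    "<ResultUnits>ug/L</ResultUnits><ResultUnits>mg/L</ResultUnits>"
    "</ReportedResult>"
    "</SamplePlusMethod>"
)
problems = list(duplicate_data_elements(edd))
```

Because the order of data elements within a node is unrestricted, the check counts names and ignores their positions.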
2.1.3 SEDD Hierarchy, Header and Dependent Data Element Syntax
SEDD specification nodes are arranged in a hierarchy. Figure 7 defines this hierarchy. As
implied by the term hierarchy, nodes at any level can repeat as many times as needed.
Subject to the rules associated with the hierarchy, there are no restrictions on the number,
type, and order of nodes in an EDD and a data requester cannot add any. For example, an
empty EDD is valid, if not useful. Data requesters can choose to ignore nodes they do not
recognize or which are of no interest to them.
Parent nodes with identical content cannot be repeated. For example, a SamplePlusMethod
node could not be repeated before each individual ReportedResult node associated with it. It
is more efficient to only have one SamplePlusMethod node for all associated
ReportedResult nodes.
NOTE: Data requesters need to check an EDD for global consistency of all potentially
redundant data.
Header Nodes
A Header node is always the first node in an EDD. It provides the information (e.g., EDD
version and implementation identification) needed to identify and process the EDD reliably.
Each Header node can only refer to and report data based on a single DTD or Schema. Data
for multiple methods can be reported under a single Header node provided that the same DTD
or Schema can be used by each method.
An EDD (e.g., a disk file) may have multiple Header nodes as long as each header is unique.
Data processing must start fresh at each new Header node.
If two valid EDDs are concatenated (i.e., two data files with different Header nodes are
appended in one file), the result is one valid EDD.
Other Nodes
In an EDD, all other nodes must be preceded by at least one node at each higher level. Figure
7 defines the dependencies between these nodes. For example, the SamplePlusMethod node
is dependent on a Header node.
Any given node is said to be associated with the closest preceding node at each higher level
in an EDD.
The pattern of associated higher-level nodes must match the same pattern as in Figure 7. For
example, in order to have a valid ReportedResult node in an EDD, it must be preceded by a
SamplePlusMethod node and a Header node, as shown in Figure 7. There cannot be an
InstrumentQC node between the ReportedResult and SamplePlusMethod nodes. Otherwise,
the ReportedResult node would not relate to a sample. This concept is called nesting. All
nodes must be properly nested.
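The nesting rule can be sketched as a simple check over a flat sequence of node types. The Python below is an illustration only. The ALLOWED_PARENTS table is an assumption that lists only the parent and child relationships stated in the text of this document; Figure 7 remains the authoritative definition and contains more node types. The function name check_nesting is hypothetical, and node types missing from the table are ignored.

```python
# Partial parent table built only from relationships stated in the text.
# Assumption: Figure 7 defines the full hierarchy; this is not a copy of it.
ALLOWED_PARENTS = {
    "Header": set(),
    "SamplePlusMethod": {"Header"},
    "InstrumentQC": {"Header"},
    "ReportedResult": {"SamplePlusMethod"},
    "Analysis": {"SamplePlusMethod", "InstrumentQC"},
    "Analyte": {"Analysis"},
    "Peak": {"Analyte"},
}


def check_nesting(node_types):
    """Return (index, node_type) pairs for nodes that are not properly nested."""
    problems = []
    open_path = []  # closest preceding node at each higher level, outermost first
    for index, node_type in enumerate(node_types):
        if node_type not in ALLOWED_PARENTS:
            continue  # unrecognized nodes are ignored in this sketch
        if node_type == "Header":
            open_path = ["Header"]  # processing starts fresh at each Header
            continue
        # Walk up from the nearest preceding node until an allowed parent is found.
        path = list(open_path)
        while path and path[-1] not in ALLOWED_PARENTS[node_type]:
            path.pop()
        if path:
            open_path = path + [node_type]
        else:
            problems.append((index, node_type))
    return problems


valid = check_nesting(["Header", "SamplePlusMethod", "ReportedResult"])
invalid = check_nesting(
    ["Header", "SamplePlusMethod", "InstrumentQC", "ReportedResult"]
)
```

In the second call, the InstrumentQC node sits between the SamplePlusMethod node and the ReportedResult node, so the ReportedResult node is flagged as not relating to a sample.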
When more than one node of a higher level and identical type precedes another node, the
dependent node is related to the nearest higher order node as shown in the following
example:
Initial_Calibration
SV202
ICALO 1
RRF-010
SV204
ICALO 1
RRF-050
11/20/1997 14:50
GCMS 1
SV205
ICALO 1
RRF-080
11/20/1997 15:50
GCMS 1
SV206
ICALO 1
RRF-160
11/20/1997 16:50
GCMS 1
ICALO 1
3.1.3 Method
A method corresponds to a defined process for the identification and quantitation of selected
analytes. The analyte list for a method often corresponds to compounds or substances
measurable after one analysis on one analytical instrument.
A method should include specifications for the type, frequency, and performance criteria for
QC samples.
Details of a method can be client-specific. The following SEDD Specification data elements,
ClientID, ClientMethodID (for the instrumental analysis), and MatrixID, are used to identify,
not describe, a method. This allows the reader of an EDD to look up in their own database
whatever method characteristics are needed to correctly process the data. Some programs,
such as the Contract Laboratory Program (CLP), define the method to include all of the
sample processing steps. Other programs, such as SW-846, will define unique methods for
each of these steps.
The SEDD Specification defines four types of activities associated with applying a method to
a sample: characterization; handling; preparation; and instrumental analysis.
~ Characterization
Characterization applies to the sample as received by the data generator. This includes
recording or measuring characteristics such as weight, color, texture, temperature,
moisture, and pH. All such data is reported in the SamplePlusMethod node using various
data elements (e.g., pH, Clarity and Turbidity). Characterization might include a
screening process to determine which variant of a method to apply (e.g., determining the
level for Organics).
Here are two examples for reporting Characteristics.
Example 1 - Use of a Screening Method
In a complex case (e.g., a GC screen prior to a GC/MS analysis) the details of the screen
could be reported under a separate SamplePlusMethod node with a different
ClientMethodID. However, it is not common to report this level of detail.
Example 2 - Reporting of Percent Moisture (% Moisture)
In some cases, a characteristic such as Percent Moisture can be reported in one of two
possible ways. The first, and most common approach is to treat Percent Moisture as a
characteristic property of the sample itself and report it using the PercentMoisture data
element in the same SamplePlusMethod node as the sample. The second approach is to
treat Percent Moisture as a separate test entirely. For this situation this test would then be
reported in a separate SamplePlusMethod node with a separate Results node for Percent
Moisture.
~ Handling
Handling applies to any manipulation of the sample prior to taking an aliquot for
preparation/analysis. Examples include filtering, decanting, drying, grinding, ashing, and
leaching (TCLP in particular). It is common to further characterize the sample (e.g., by
weight) after each handling. All such data is reported in a Handling node under the
appropriate SamplePlusMethod node. Each Handling node is characterized by a
ClientMethodID or HandlingType data element.
Many methods have no handling stage and no Handling node is required in these cases.
Less commonly, more than one handling is done, so more than one Handling node is
required.
Example:
A sample is dried and the weight is recorded in one Handling node using one
HandlingType data element. In a subsequent step, the dried sample is ashed and the
weight after ashing is recorded in a second Handling node with a different HandlingType
data element.
~ Preparation
Preparation applies to all processing done to an aliquot prior to instrumental analysis.
The details of preparation might involve many steps (e.g., taking an aliquot, extraction,
and cleanup). Most methods have a primary processing step, such as chemical extraction
or separation, that is part of the analysis method. The preparative steps could also be
described in a separate method. The specific preparation details would normally be
captured in a separate PreparationPlusCleanup node, not in the Analysis node.
A few values, such as aliquot size, dilution, and yield (radiochemistry), are often
important to report for each preparation. These data can go in an Analysis or a
PreparationPlusCleanup node. For analyses that require minimal or no preparative
steps, no PreparationPlusCleanup nodes would be needed.
PreparationPlusCleanup nodes are used to report specific preparation steps, especially
when a separate method is used to describe this activity. These are similar to Handling
nodes in that there might be none, one, or several, depending on the method. The
difference is that a PreparationPlusCleanup node applies to one aliquot used in a
preparation, while a Handling node applies to one sample prior to taking an aliquot.
The ClientMethodID or CleanupType data element in the PreparationPlusCleanup node is
used to characterize the cleanup.
~ Instrumental Analysis
Instrumental analysis, more generally called the determinative step, is where
measurements are made for a list of analytes. Values such as instrument identification
and date analyzed are reported in the Analysis node. Analyte-specific values from this
analysis are reported in Analyte nodes under the Analysis node.
If the analytical technique involves measurements of multiple peaks per analyte (e.g.,
GC/MS mass spectra, multi-component GC analytes, ICP emission spectra), details
would normally be reported in Peak nodes under each Analyte.
The following example, derived from the SW-846 Semivolatile data for Method 8270,
illustrates how the various activities in one method are reported in XML format. This
example uses the following Nodes: SamplePlusMethod; Handling; Analysis; and
PreparationPlusCleanup. The Analyte node is not used. This example XML file starts
with a comment line. Please note that there are comment lines in XML format to clarify
information presented in the nodes or data elements following the comment lines.
65
Decanted
SV422
8270C
12/24/1997 14:38
1
Dry
3550B
30.0
g
1.00
mL
12/07/1997
3640A
Matrix_Spike
Spike
PreparationBatch
108-95-2
Phenol
Spike
75
ug/L
67.3
ug/L
71.1
3.1.5 Analysis
The SEDD Specification defines an analysis as one complete sequence of events starting with an
aliquot or prepared sample, perhaps involving preparation, and including an instrumental
analysis. This information would be captured in the Analysis node and related nodes (all nodes
located below the Analysis node in the hierarchy - see Figure 7). An analysis may be part of a
method applied to a sample or part of an instrumental QC process. Thus Analysis nodes are
present under both the SamplePlusMethod node and the InstrumentQC node.
3.1.6 Results
The final results of a method are always reported in ReportedResult nodes. Intermediate
results underlying the final results are reported in Analyte nodes; however, final results can
also be reported in Analyte nodes. These final results take into consideration any dilutions or
Percent Moisture calculations that would be needed.
NOTE: In the simple case of only one analysis per method, the same values could be
reported in both ReportedResult and Analyte nodes. The ReportedResult nodes are
required. The Analyte nodes are optional and might be used to report intermediate
and final measurements or final measurements from both a primary and
confirmation analysis.
The SEDD Specification distinguishes between the result of a method (e.g., reported
concentration of benzene of 250 ug/L in a groundwater sample by Method 8260) and the
result of an analysis (e.g., serial dilution for metals). Method results are reported in
ReportedResult nodes directly under the SamplePlusMethod node. Analysis results can be
reported in Analyte nodes under Analysis nodes or in Peak nodes under Analyte nodes.
Because InstrumentQC does not have a method-like result, no ReportedResult nodes are
used.
Certain analytes in certain methods are always measured on a per-analysis, not per-method,
basis (e.g., surrogates and internal standards). They should be reported in Analyte nodes, not
under ReportedResult nodes. For example: in the analysis of pesticides there are separate
surrogate recoveries computed for each column and surrogate analyte, which must be
reported in Analyte nodes.
Whenever more than one analysis is performed, either the data element LabAnalysisID or the
data element AnalysisGroupID must be used. The data element LabAnalysisID must be used
to link each reported result to the SINGLE underlying Analysis data used to produce that
result. The LabAnalysisID data element must be present in both the Analysis node and
ReportedResult node. If each reported result is calculated from MULTIPLE underlying
Analysis data, then the AnalysisGroupID data element must be used. See Section 3.2.2 for a
complete discussion of analysis groups.
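The two linking rules above can be sketched as a lookup. The Python below is an illustration under stated assumptions: nodes are represented as plain dictionaries keyed by data element name, the identifier values ("A1", "G1", and so on) are hypothetical, and the function name underlying_analyses is not defined by the SEDD Specification.

```python
def underlying_analyses(reported_result, analyses):
    """Return the Analysis records that a reported result is linked to.

    A LabAnalysisID on the result points to a single analysis; otherwise an
    AnalysisGroupID points to every analysis in the group.
    """
    lab_analysis_id = reported_result.get("LabAnalysisID")
    if lab_analysis_id is not None:
        return [a for a in analyses if a.get("LabAnalysisID") == lab_analysis_id]
    group_id = reported_result.get("AnalysisGroupID")
    if group_id is not None:
        return [a for a in analyses if a.get("AnalysisGroupID") == group_id]
    return []  # neither link is present, e.g. only one analysis was performed


# Hypothetical Analysis records; the identifier values are placeholders.
analyses = [
    {"LabAnalysisID": "A1", "AnalysisGroupID": "G1"},
    {"LabAnalysisID": "A2", "AnalysisGroupID": "G1"},
    {"LabAnalysisID": "A3"},
]
single = underlying_analyses({"LabAnalysisID": "A3"}, analyses)
grouped = underlying_analyses({"AnalysisGroupID": "G1"}, analyses)
```

Here the first result resolves to the single analysis "A3", while the second resolves to both analyses in group "G1".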
There are three ways to report sample reanalysis data:
1) If the laboratory is to report two sets of results, two SamplePlusMethod nodes should be
used.
2) If the laboratory is to pick one "best" result for each analyte, use one SamplePlusMethod
node. The LabAnalysisID data element would be used to link each result to the correct
underlying analysis.
3) If the data requester requires all potential results in addition to the selected result(s), use
Analyte nodes to report them under each Analysis node.
The following example, derived from the SW-846 Inorganics data, illustrates how both
final and intermediate results can be reported for one soil sample. This example XML
file starts with a comment line. Please note that there are comment lines in XML format
to clarify information presented in the nodes or data elements following the comment
lines.
7440-43-9
ICP-CD1-981103-49
<!--This is the final reported result in soil units for the soil sample.-->
0.76
mg/Kg
3.2 RELATIONSHIPS
This section provides a discussion of the analytical relationships within the SEDD Specification.
These relationships are used to link each reported sample result back to the underlying processes
that were used to generate or evaluate that result.
3.2.1 Batches
The SEDD Specification uses the concept of a batch as the primary mechanism for
associating QC samples with regular ones. Analytical data review requires this association to
assess the impact of QC sample performance on the quality of the regular sample results. For
example, when doing volatiles by GC/MS, it is common practice to tune the instrument,
calibrate it, and analyze a blank once every shift. In this case, the SEDD Specification uses
the concept of the Analysis batch to associate all the analyses done in one shift. All three QC
analyses (tune, calibration, and blank) are associated with all regular sample analyses in that
shift by having a common value for the AnalysisBatch data element that occurs in each of
their Analysis nodes.
The actual value used for the AnalysisBatch data element corresponding to one shift is not
specified by the SEDD Specification, only that the value must be the same for all analyses in
one shift and different for analyses in different shifts. The basis of the value for the
AnalysisBatch data element must be given in the instructions provided by the data requester.
The SEDD Specification uses the following eleven (11) data elements to define Batches:
AnalysisBatch
A group of analyses done on one instrument under the control of one continuing calibration
or continuing calibration verification. Calibration is used in a generic sense. The details of
what defines an AnalysisBatch depend on the method.
PreparationBatch
A group of aliquots prepared together for analysis by one method. 'Together' can imply
similarity of the time, place, and manner of preparation, with details depending on the
method. The notion of preparation is used in a generic sense for any activity prior to
instrumental analysis. Method blanks and/or Laboratory Control Samples are often used to
demonstrate that the laboratory's process is in control in each PreparationBatch.
HandlingBatch
A group of samples, not aliquots, handled together during the initial processing for analysis
by one method. An example of QC in a HandlingBatch is a TCLP apparatus blank.
CleanupBatch
A group of aliquots going through a cleanup step together as part of preparation for analysis
by one method. An example of QC in a CleanupBatch is a Gel Permeation Chromatography
(GPC) calibration.
RunBatch
A group of analyses done on one instrument under the control of one initial calibration.
Calibration is used in a generic sense. The details of what defines RunBatch depend on the
method. Typically, one RunBatch includes the analyses from one or more analysis batches.
MethodBatch
A group of samples, not aliquots, with similar matrices, analyzed by one method and
expected to have similar response to the method. Matrix spikes and duplicates are typical
types of QC associated with a MethodBatch.
LabReportingBatch
A group of samples reported as a unit (e.g., a CLP Sample Delivery Group). This batch is
often used to define the context for definition (uniqueness) of batch values in data generated
by the laboratory.
StorageBatch
A group of samples that are stored together. Volatile Organic Compound (VOC) refrigerator
blanks are examples of QC associated with a StorageBatch.
ShippingBatch
A group of samples shipped in one container, such as a crate, cooler, or ice chest. Trip and
temperature blanks are examples of QC associated with a ShippingBatch.
EquipmentBatch
A group of samples collected using the same equipment in a defined period of time. Rinsate
blanks are examples of QC associated with an EquipmentBatch.
SamplingBatch
A group of samples collected together. Field blanks are examples of QC associated with a
SamplingBatch.
The following example shows how a regular field sample is linked to the preparation, run, and
analysis batches when implemented according to SW-846 Method 8270 (Semivolatile) rules.
This example XML file starts with a comment line. Please note that there are comment lines in
XML format to clarify information presented in the nodes or data elements following the
comment lines.
Initial_Calibration
ICAL 1
RRF-10
first run batch
first analysis batch
Instrument_Performance_Check_Tune
Tune 2
first run batch
second analysis batch
Continuing_Calibration_Verification
CCV 1
first run batch
second analysis batch
S001
first run batch
second analysis batch
first preparation batch
Matrix_Spike
first run batch
second analysis batch
first preparation batch
Matrix_Spike_Duplicate
first run batch
second analysis batch
first preparation batch
Method_Blank
first run batch
second analysis batch
first preparation batch
As shown in the above example, the Client Sample ID "S001" is linked to three batches as
follows:
To the initial calibration and associated tune by the data element RunBatch containing the
value "first run batch".
To the continuing calibration verification and associated tune by the data element
AnalysisBatch containing the value "second analysis batch".
To the Method QC samples (Matrix Spike, Matrix Spike Duplicate, and Method Blank)
by the data element PreparationBatch containing the value "first preparation batch".
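The batch linkage described above amounts to matching a common value in a batch data element. The Python below is a sketch only. It represents each node as a dictionary, uses the batch values quoted in this section, and assumes the key names QCType and ClientSampleID as labels for the records; the function name linked_by is hypothetical.

```python
# Records using the batch values quoted in this section.
# Assumption: QCType and ClientSampleID are used here only as record labels.
RUN, ANALYSIS, PREP = "first run batch", "second analysis batch", "first preparation batch"
nodes = [
    {"QCType": "Initial_Calibration", "RunBatch": RUN,
     "AnalysisBatch": "first analysis batch"},
    {"QCType": "Instrument_Performance_Check_Tune", "RunBatch": RUN,
     "AnalysisBatch": ANALYSIS},
    {"QCType": "Continuing_Calibration_Verification", "RunBatch": RUN,
     "AnalysisBatch": ANALYSIS},
    {"ClientSampleID": "S001", "RunBatch": RUN, "AnalysisBatch": ANALYSIS,
     "PreparationBatch": PREP},
    {"QCType": "Matrix_Spike", "RunBatch": RUN, "AnalysisBatch": ANALYSIS,
     "PreparationBatch": PREP},
    {"QCType": "Matrix_Spike_Duplicate", "RunBatch": RUN, "AnalysisBatch": ANALYSIS,
     "PreparationBatch": PREP},
    {"QCType": "Method_Blank", "RunBatch": RUN, "AnalysisBatch": ANALYSIS,
     "PreparationBatch": PREP},
]


def linked_by(batch_element, sample, records):
    """Return the QCType of every other record sharing the sample's batch value."""
    value = sample.get(batch_element)
    if value is None:
        return []
    return [r["QCType"] for r in records
            if r is not sample and "QCType" in r and r.get(batch_element) == value]


sample = nodes[3]  # the regular field sample "S001"
prep_qc = linked_by("PreparationBatch", sample, nodes)
run_qc = linked_by("RunBatch", sample, nodes)
```

In this sketch, prep_qc contains the three Method QC samples, and run_qc additionally includes the initial calibration because it shares the RunBatch value.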
3.2.2 Analysis Groups
Some methods may require several analyses to calculate one result (e.g., Total Organic
Carbon (TOC), method of standard additions). Further, some methods may allow several
potential results to be calculated, with an average final result selected for reporting as the
"final result." An example using the Method of Standard Additions is given below.
The SEDD Specification uses AnalysisGroup nodes under the SamplePlusMethod node to
associate several analyses underlying one set of possible results.
AnalysisGroup nodes under the InstrumentQC node are also used to link several analyses
that are part of one instrument QC process (e.g., multipoint initial calibrations;
single-point initial calibrations can also be reported in this manner).
The AnalysisGroupID data element must be present in the following three nodes -
AnalysisGroup, Analysis, and ReportedResult.
The following example, derived from the SW-846 Inorganics data, illustrates a complex method
requiring AnalysisGroup nodes. Data is shown for only one analyte and many less important data
elements are omitted. In this example, the method of standard additions (MSA) was performed
on the field soil sample "BH-0945Q". This example XML file starts with a comment line. Please
note that there are comment lines in XML format to clarify information presented in the nodes or
data elements following the comment lines.
First Analysis Group
7440-28-0
Thallium
First Analysis Group
MSA
First Analysis Group
GFAA-TL1-981015-28
MSA-0
7440-28-0
7.6
ug/L
First Analysis Group
GFAA-TL1-981015-29
MSA-1
7440-28-0
3.5
13
ug/L
First Analysis Group
GFAA-TL1-981015-30
MSA-2
7440-28-0
7.0
17
ug/L
First Analysis Group
GFAA-TL1-981015-31
MSA-3
7440-28-0
10.5
21
ug/L
As shown in the above example, the Client Sample ID "BH-0945Q" has been analyzed using the
Method of Standard Additions to determine its final Thallium Result "1.58" with ResultUnits
"mg/kg". The final reported result is linked to the data element AnalysisGroupID with a value of
"First Analysis Group" instead of to a single analysis (which would have used the data element
LabAnalysisID). All the MSA Analyses are similarly linked to each other and the final result
through the same AnalysisGroupID data element having the value "First Analysis Group".
3.2.3 Analyte Groups
Some methods may require the reporting of an analyte that is not directly measured by that
method where the reported analyte is actually the combination of two or more analytes that
are directly measured by that method (e.g., Hardness, where the reported analyte Hardness is
actually the summed values of the Calcium and Magnesium measured analytes).
The SEDD Specification uses AnalyteGroup nodes under the Analysis node to associate two
or more analytes that are directly measured using the method(s) indicated to report an analyte
that typically cannot be directly measured.
The AnalyteGroupID data element must be present in the following three nodes -
AnalyteGroup, Analyte, and ReportedResult.
The following example, derived from SW-846 Inorganics data, illustrates a method requiring the
AnalyteGroup node. Data is shown for only two measured analytes and many less important data
elements are omitted. In this example, a single analysis was performed on the field water sample
"SW-001" and a final result was reported for 'Hardness'. This example XML file starts with a
comment line. Please note that there are comment lines in XML format to clarify information
presented in the nodes or data elements following the comment lines.
First Analyte Group
Calcium
7440-70-2
Magnesium
7439-95-4
6010C
IEC 1
P2_ICP_042694
7439-92-1
Lead
220.35
7429-90-5
Aluminum
0.000370
As shown in the above example, during the analysis of Lead by ICP-AES, Aluminum is an
interferent. Lead is identified as an analyte under the Analyte node using the data element
AnalyteName. The Peak used for measurement of Lead is identified in the Peak node by the data
element PeakID. Aluminum interferes with this peak and its interelement correction factor is
reported in the AnalyteComparison node in the data element CorrectionFactor.
Peak Comparison
Peak comparisons are used to compare measurements made at two or more different peaks.
Peak comparisons can describe cross-peak comparisons within the same analyte (e.g.,
abundance ratios for tunes) or between two analytes (e.g., calculating the relative response
factor (RRF) for initial and continuing calibrations or inter-peak resolutions in
chromatography).
Data from these peak comparisons are reported in the PeakComparison node (see Figures 6
and 7).
One common example for using PeakComparison nodes (given below) is for the reporting of
GC/MS tune data where only a single analyte is involved. The following example XML file
starts with a comment line. Please note that there are comment lines in XML format to clarify
information presented in the nodes or data elements following the comment lines.
8270C
Instrument_Performance_Check_Tune
Tune 1
SV417
5074-71-5
DFTPP
System_Monitoring_Compound
68
198
0.0
69
0.0
As shown in the above example, a DFTPP tune (for method SW-846 8270 semivolatile organics)
is being reported under the Analyte node using the data element AnalyteName. The peak (mass
number 68) to be evaluated is identified under the Peak node using the data element PeakID.
This peak (mass number 68) is compared to the following two peaks (mass numbers 198 and 69)
under separate PeakComparison nodes using the data element PeakID. The data element
PercentRatio is used to report the actual comparison (expressed as a percent ratio).
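The percent ratio reported in a PeakComparison node can be sketched as simple arithmetic. The Python below is an illustration only: it assumes the ratio is the abundance of the evaluated peak divided by the abundance of the comparison peak, expressed as a percent. The abundance values and the function name percent_ratio are hypothetical and are not taken from the SEDD Specification.

```python
def percent_ratio(abundance, reference_abundance):
    """Return the evaluated peak's abundance as a percent of the comparison peak's."""
    if reference_abundance == 0:
        raise ValueError("the comparison peak has zero abundance")
    return 100.0 * abundance / reference_abundance


# Hypothetical abundances for the evaluated peak (mass 68) and the
# two comparison peaks (masses 198 and 69).
abundance_68 = 0.0
abundance_198 = 48000.0
abundance_69 = 1250.0

ratio_to_198 = percent_ratio(abundance_68, abundance_198)
ratio_to_69 = percent_ratio(abundance_68, abundance_69)
```

With no measured abundance at mass 68, both ratios are 0.0, which matches the form of the values shown in the example above.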