EPA         United States Environmental Protection Agency
            Solid Waste and Emergency Response (5306)
                      EPA530-R-95-010
                       PB95-191235
                         April 1995
Guidelines for
Assessing the Quality
of Life-Cycle Inventory
Analysis

-------

-------
                             EPA Contract Number 68-D2-0065
                           RTI Project Number 94U-5810-49 FR
Guidelines  for Assessing the Quality
   of Life-Cycle  Inventory Analysis
                     March 1995
                      Prepared for

                      Lynda Wynn
                      Eugene Lee

                  Office of Solid Waste
            U.S. Environmental Protection Agency
                    401 M Street, SW
                 Washington, DC 20460
                      Prepared by

                     Jodi S. Bakst
                   Christopher J. Lacke
                     Keith A. Weitz
                     John L. Warren
           Environmental Economics and Management
               Center for Economics Research
                 Research Triangle Institute
              Research Triangle Park, NC 27709

-------

-------
                                    PREFACE

       Life-cycle assessment (LCA) results can vary depending on how the sponsoring
group defines the goals and scope of the LCA.  With an increasing number of
organizations using LCA for a wide variety of purposes, the need has arisen for objective,
scientifically based guidelines for conducting LCAs.

       EPA has responded to this need by supporting a multi-office LCA program that
helps develop such guidelines.  This multi-office program consists of representatives from
the Office of Research and Development (ORD), the Office of Air Quality Planning and
Standards (OAQPS), the Office of Solid Waste (OSW), and the Office of Pollution
Prevention and Toxics (OPPT).  The LCA program uses a consensus-building approach,
coordinating closely with the Society of Environmental Toxicology and Chemistry
(SETAC).  Through the organization of a series of workshops, SETAC has laid the
groundwork for the development of a technical framework for conducting LCAs.

       Two of the first in a series of EPA LCA methodological guidelines documents,
Life-Cycle Assessment: Inventory Guidelines and Principles and Life Cycle Design
Guidance Manual: Environmental  Requirements and the Product System, were published
in early 1993. Supplementary LCA documents including Life-Cycle Assessment:  Public
Sources of Data for the LCA Practitioner and Life-Cycle Impact Assessment: A
Conceptual Framework, Key Issues, and Summary of Existing Methods were completed in
September 1994.  Ongoing EPA LCA projects include life-cycle inventory case studies on
residential carpeting systems, shop towels in industrial laundries, and solvent alternatives;
streamlined LCA methodology development; and product re-design through LCAs.

       The purpose of this guidelines document is to provide practitioners with direction
and to facilitate the use of a consistent  approach for assessing and documenting LCI data
quality.  This document discusses key issues that make the assessment of LCI data quality
unique, outlines a possible framework for assessing and documenting LCI data quality,
and discusses specific techniques to support the LCI data quality assessment process.  The
relatively qualitative approach to data quality assessment outlined in this document is
representative of the current state of LCA practice and data quality assessment techniques.
As LCA and data quality assessment techniques evolve, it is expected that a greater level
of statistical analysis will be used.
                                         iii

-------
                             ACKNOWLEDGMENTS
       This document was originally prepared for the U.S. Environmental Protection
Agency (EPA) Office of Solid Waste, under EPA Contract No. 68-D1-0118, where
Eugene Lee and Lynda Wynn served as project officers. Work was completed under the
EPA Office of Air Quality Planning and Standards (OAQPS), under EPA Contract No. 68-
D2-0065, where Charles French served as project officer. The technical work was
originally conducted under Research Triangle Institute (RTI) Project No. 94U-5290-49 FR
and completed under Project No. 94U-5810-49 FR.  Carolyn Acree, Judy Parsons, and
Andrew Jessup provided editorial, word processing, and graphics support for this
document.

       Technical guidance, review, and comments were provided by Dr. Roy Whitmore,
Senior Statistician, RTI, Center for Research in Statistics. Additional guidance, reviews,
and comments were provided by Mary Ann Curran  (ORD/RREL), Tim Ream (OAQPS),
Eun-Sook Goidel (OPPT), and Bruce Vigon of Battelle.  Special thanks to S. Nasir Ali of
Arthur D. Little, Inc. and Terrie Boguski and Robert Hunt of Franklin Associates, Ltd.,
who participated in case study applications and technical review of these guidelines.

       Peer Reviewers included Paul Arbesman, Allied Signal; Derek Augood, Scientific
Certification Systems; Bob Berkebile, American Institute of Architects; Terrie Boguski,
Franklin Associates, Ltd.; Michael Brown, Patagonia; Frank Consoli, Scott Paper
Company; Gary Davis, University of Tennessee; Richard Denison, Environmental Defense
Fund; James Fava, Roy F. Weston, Inc.; Kate Gross, The Body Shop Inc.; Michael
Harrass, Amoco  Corporation; Frances Irwin, World Wildlife Fund; Greg Keoleian,
University of Michigan; John Kusz, Safety Kleen; Arthur Weissman, Green Seal  Inc.;
Beth Quay, The Coca-Cola Company; Athena Sarafides, NJ Department of Environmental
Protection; Jacinthe Sequin, Environment Canada; Karen Shapiro, Tellus  Institute; Donald
Walukas, Concurrent Technologies Corporation; and John Young, Hampshire Research
Institute. Additional reviewers included John Wilkens, DuPont; David Wheeler, The Body
Shop, Inc.; and Joel Ann  Todd, The Scientific Consulting Group, Inc.
                                         iv

-------
                                   CONTENTS
Chapter                                                                      Page

        Preface                                                               iii
        Acknowledgments                                                        iv
        Key Terms, Definitions, and Acronyms                                    x

1       Introduction                                                          1-1
        1.1    The Current State of LCA Data Quality Evaluation               1-3
        1.2    Purpose and Scope of These Guidelines                          1-4
        1.3    Areas Outside the Scope of These Guidelines                    1-5
               1.3.1  Peer Review                                             1-5
               1.3.2  LCA "Code of Practice"                                  1-6
               1.3.3  Life-Cycle Impact Assessment                            1-7
               1.3.4  Life-Cycle Improvement Assessment                       1-8
        1.4    Overview of Document                                           1-8

2       Key Data Quality Issues for Life-Cycle Assessment                     2-1
        2.1    Using a Variety of Data to Build an LCI                        2-2
               2.1.1  Data Sources                                            2-2
               2.1.2  Data Types                                              2-5
               2.1.3  Levels of Aggregated Data                               2-6
        2.2    Identifying "Key" LCI Data Sets                                2-7
        2.3    Aggregating Data in the LCI Data Development Process           2-8
        2.4    Uncertainty Propagation                                        2-9
        2.5    Using Nonparametric Samples in an LCI                         2-10
        2.6    Using Environmental Releases Data in an LCI                   2-10

3       Assessing the Quality of Life-Cycle Inventory Data                    3-1
        3.1    Scope and Boundary Definition                                  3-3
        3.2    LCI Data Collection Plan                                       3-7
               3.2.1  Defining Data Quality Goals (DQGs)                      3-8
               3.2.2  Identifying Data Sources and Types                     3-10
               3.2.3  Identifying Data Quality Indicators (DQIs)             3-10
               3.2.4  Developing a Data Questionnaire                        3-12
        3.3    Data Collection and Quality Assessment                        3-15
               3.3.1  Collecting Data                                        3-15
               3.3.2  Assessing Data Sources Against Relevant DQIs           3-16
               3.3.3  Assessing Data Quality                                 3-19
        3.4    Evaluating Model Sensitivity and Results                      3-22
               3.4.1  Performing Sensitivity and Uncertainty Analyses        3-23
               3.4.2  Determining Whether DQGs Have Been Met                 3-24
        3.5    Documenting and Referencing LCI Data Quality Results          3-25

4       Applying Sensitivity and Uncertainty Analysis Techniques
        to LCI Data                                                           4-1
        4.1    Sensitivity Analysis Techniques                                4-2
               4.1.1  Single System (One-Way) Sensitivity Analyses            4-3
               4.1.2  Tornado Diagrams                                        4-3
               4.1.3  Ratio Sensitivity Analysis                              4-7
        4.2    Uncertainty Analysis                                           4-9
               4.2.1  Sources of Uncertainty                                  4-9
               4.2.2  Qualitative Measures for Controlling Data Errors
                      and Uncertainty                                        4-12
               4.2.3  Quantitative Methods of Uncertainty Analysis           4-14

5       Compensating for Missing Data and Data Deficiencies                   5-1
        5.1    Missing Data and Data Deficiencies                             5-1
        5.2    Adjusting for Missing Data and Data Deficiencies               5-2
                                         vi

-------
               5.2.1  Imputation                                              5-3
               5.2.2  Weighting Methods                                       5-7

Appendix A:  Data Quality Indicators                                          A-1
             A.1    Acceptability                                             A-2
             A.2    Bias                                                      A-3
             A.3    Comparability                                             A-5
             A.4    Completeness                                              A-6
             A.5    Description of the Data Collection Method and Limitations A-6
             A.6    Precision                                                 A-8
             A.7    Level of Reference                                       A-11
             A.8    Representativeness                                       A-12

Appendix B:  Bibliography                                                     B-1
                                        vii

-------
                                    TABLES
Number                                                                   Page

3-1   Case Study Example:  LCI Scoping and Boundary Definition                3-5
3-2   Case Study Example:  DQGs, Data Sources, and DQIs                      3-9
3-3   Selected Data Quality Indicators Relevant to LCI                         3-12
3-4   Case Study Example:  LCI Data Quality Worksheet                        3-21
3-5   Example Proof Statements for the Bias DQI                              3-22
4-1   Hypothetical Sensitivity Analysis Worksheet                               4-4
4-2   Hypothetical Tornado Diagram Worksheet for Product X                    4-6
4-3   Energy Sensitivity Analysis on Two Production Systems                    4-8
4-4   Uncertainty  Propagation Formulas for Random and Systematic Errors        4-15
A-1   Sample Data to Calculate Bias                                           A-4
A-2   Sample Data to Calculate Precision                                      A-10
                                        viii

-------
                                    FIGURES
Number                                                                      Page

2-1   Example Data Sources, Data Types, and Levels of Data Aggregation      2-3
3-1   Life-Cycle Stages                                                     3-2
3-2   LCI Data Development Steps                                            3-4
3-3   Example of a Partial LCI Data Questionnaire                           3-13
4-1   Sensitivity of Air Releases for Product X                             4-5
A-1   Relationship Between Precision and Bias                               A-9
                                         ix

-------
                            KEY TERMS AND DEFINITIONS
Aggregated Data             Data that are represented as summary reported values.

Data Aggregation            The aggregation of data within and between system
                            components that occurs in the LCI data development
                            process.

Data Quality                The degree of confidence an analyst has in a data
                            source or a data value based on an evaluation of data
                            quality goals (DQGs), data quality indicators (DQIs),
                            and the role of the data in an overall LCI.

Data Quality Goals          Qualitative statements that define specifications for
(DQGs)                      the adequacy of data used in an LCI or for certain LCI
                            parameters.

Data Quality Indicators     Quantitative or qualitative terms defining data
(DQIs)                      characteristics that serve as benchmarks against which
                            data quality can be assessed to determine whether DQGs
                            have been met.

                            Data Quality Indicators Used in this Document:

     Acceptability          The degree to which the data source has been peer
                            reviewed, evaluated against an accepted standard, or
                            checked for errors through expert judgment.

     Bias                   The level of systematic error that causes the mean
                            values of a data set to be consistently (over repeated
                            samples) higher or lower than corresponding "true"
                            parameter values.  With respect to LCI data, "true"
                            parameter values may not be known.

     Comparability          The degree to which different methods, data sets, or
                            decisions agree or can be represented as similar or
                            equivalent.

     Completeness           The amount of data available for the analysis compared
                            with the amount of data needed or desired.

     Data Collection        The level of information describing the method of data
     Method and             collection, including any limitations associated with
     Limitations            the data collection method.

     Precision              The degree of spread or variability, expressed
                            numerically if possible, in a set of data values or
                            measurements compared to the mean of the data values.

     Referenced             The degree to which data values reference the original
                            data source.

     Representativeness     The degree to which the data represent what the
                            analyst is trying to describe or depict.
                                                x

-------
Data Type                   Data resulting from different data generation methods
                            (e.g., measured, modeled, sampled, nonmeasured, and
                            regulated data).

Goal Definition             The definition of the LCA purpose and objectives; the
and Scoping                 identification of the product, process, or activity of
                            interest; the identification of the intended end use of
                            study results; and the key assumptions employed.

Impact Assessment           The quantitative and/or qualitative classification,
                            characterization, and valuation of impacts to
                            ecosystems, human health, and natural resources, based
                            on the results of an LCI.

Improvement                 The identification and evaluation of opportunities to
Assessment                  achieve improvements in processes that result in
                            reduced environmental impacts, based on the results of
                            an LCI or impact assessment.

Imputation                  The procedure of replacing a missing value with a
                            value considered to be a reasonable proxy or
                            substitute.

Inventory Analysis          The identification and quantification of raw materials
                            and energy inputs, air emissions, water effluents,
                            solid waste, and other life-cycle inputs and outputs.

Key Data                    A data point or data set that contributes significantly
                            to the overall LCI results.

Life-Cycle Assessment       A holistic tool used to evaluate the resource
(LCA)                       consumption and environmental burdens associated with
                            a product, process, or activity through its entire
                            life cycle.  A complete LCA includes four components:
                            goal definition and scoping, inventory analysis,
                            impact assessment, and improvement assessment.

Life-Cycle Stages           The stages of any production process, including raw
                            materials acquisition; manufacturing (including
                            materials manufacture, product fabrication, and
                            filling/packaging/distribution);
                            use/reuse/maintenance; and recycle/waste management.

Measured Data               Data generated using analytical or physical
                            measurement procedures, including survey
                            questionnaires, sampling, or monitoring.
                                         xi

-------
Modeled Data                Data generated as output from models.

Nonmeasured Data            Data generated through the use of engineering
                            estimates, best professional judgment, educated
                            guesses, etc.

Nonparametric               Data that are typically chosen or developed from
Samples                     relatively small samples, that are not independent of
                            each other, and that often do not result in normal
                            bell-curve distributions.  Such data are often
                            difficult to analyze using standard statistical
                            techniques.

Nonstatistically Based      Any quantity or function that is not developed from
                            the data in a probability-based sample, but developed,
                            for example, from convenience or judgment-based
                            samples.

Proof Statements            Statements that provide general rules by which data
                            analysts can evaluate a specific data source.

Primary Data                Plant-specific, measured, modeled, or estimated data
                            for conducting an LCI that the practitioner can
                            directly access or for which the practitioner has
                            input into the data collection process.

Random Sample               A sample of n units selected from a well-defined
                            population in such a way that all possible samples of
                            size n have the same chance of being selected.  For
                            example, a sample selected using n independent,
                            equal-probability selections.

Secondary Data              Data that have not been collected specifically for the
                            purpose of conducting an LCI and for which the
                            practitioner has no input into the data collection
                            process.

Sensitivity Analysis        A systematic process for evaluating the effect of
                            variations of input parameters to a system on model
                            results.

Statistically Based         Any quantity or function that is developed from the
                            data in a probability-based sample.

Subsystem                   An individual process that is part of a defined
                            system.

System                      A collection of operations that perform a desired
                            function.

Uncertainty Analysis        The evaluation of uncertainty associated with
                            inferences.  Measures of uncertainty may be
                            qualitative (e.g., representativeness of the sample or
                            choice of a model) or quantitative (e.g., sampling
                            precision and bias).
                                         xii

-------
Variability                 The fluctuation in variables over time or space.
                            Variability can result from natural fluctuations or
                            from the sampling and experimental design (e.g.,
                            through the use of experimental blocking factors or
                            unequal probabilities of selection).
                                        xiii

-------
                                  ACRONYMS

CV          Coefficient of Variation
DOE         U.S. Department of Energy
DQG         Data Quality Goal
DQI         Data Quality Indicator
EIA         Energy Information Administration
EPA         U.S. Environmental Protection Agency
LCA         Life-Cycle Assessment
LCI         Life-Cycle Inventory
QA/QC       Quality Assurance/Quality Control
SETAC       Society of Environmental Toxicology and Chemistry
SIC         Standard Industrial Classification
TRI         Toxics Release Inventory
                                        xiv

-------
                                     CHAPTER 1
                                  INTRODUCTION

       The quality of life-cycle assessment (LCA) or any other decisionmaking tool is
only as good as the quality of the information upon which it is based.  Thus, assessing the
quality of data utilized in a life-cycle inventory (LCI) is essential if the overall LCA
results are to be properly interpreted and communicated.  While LCA practitioners usually
undertake some level of data quality assessment, the rigor with  which that evaluation is
applied—and the extent to which LCA reports discuss data quality—varies significantly.
Practitioners are encouraged to continue and advance LCA data quality assessment
practices.
                                   KEY POINTS

        •  Greater consistency in assessing data quality is needed.

        •  These guidelines provide LCA practitioners with systematic
           procedures for assessing and reporting LCI data quality.

        •  These guidelines focus on assessing the quality of "key"
           LCI data sources.

        •  These guidelines or similar practices should lead to
           increased confidence in the final LCI results.
       This document provides LCA practitioners with guidelines for assessing and
communicating LCI data quality in a systematic manner.  Although the focus of this
document is on the inventory analysis component of LCA, some of the data quality
issues and procedures will apply to LCA data as a whole.  A relatively qualitative
approach to LCI data quality assessment is taken in this document, representative of
the current state of LCA practice and existing data quality assessment techniques.
The consistent application of this or a similar framework should advance both the
assessment and documentation of data quality issues in LCAs.  As LCA and data
quality assessment techniques evolve, it is expected that a greater level of
statistical analysis will be used.

       LCA is a holistic concept and methodology to identify the environmental
consequences of a product, process, or activity throughout its entire life cycle and to
identify opportunities for achieving environmental improvements.  Life-cycle stages
addressed in LCA include raw  materials acquisition, manufacturing, use/reuse/
maintenance, and recycling/waste management.
                                          1-1

-------
       The LCA approach consists of four interrelated components:

          •  Goal definition and scoping:  the definition of the study purpose and
            objectives; the identification of the product, process, or activity of interest;
             the identification of the intended end use of study results; and the key
            assumptions employed.

          •  Inventory analysis:  the identification and quantification of raw materials
            and energy inputs, air emissions, water effluents,  solid waste, and other life-
            cycle inputs and outputs.

          •  Impact assessment: the qualitative or quantitative classification,
            characterization, and valuation of impacts to ecosystems, human health, and
            natural resources, based on the results of an inventory analysis.

          •  Improvement assessment:  the identification and evaluation of opportunities
            to achieve improvements in processes that result in reduced environmental
            impacts, based  on the results of an inventory analysis or impact assessment.

       A variety of organizations use the LCA methodology to serve  a number of
different purposes. Companies use LCA results internally to support management
decisions on production processes, product design, and product packaging.  Companies use
LCA results externally to support product environmental claims.  Government agencies
use LCA results to support both technical policy and  regulatory decisions.  Consumer and
environmental groups utilize LCA results to compare the environmental preferability of
alternative products or production processes.

       To date, most LCA practice involves practitioners compiling inventories and
drawing conclusions based on simple data assessment techniques (e.g., the lower the
amount of inputs or outputs, the lower the resulting environmental burden).  Few
practitioners have conducted impact assessments based on the results  of an inventory
analysis, largely due to the lack of development and consensus on specific impact
assessment  techniques.  Guidelines for conducting improvement assessment, although
widely practiced on an informal basis, also have not yet been  established. As existing
LCA tools and techniques are refined and new ones developed, practitioners are expected
to conduct more impact and improvement assessments.

       Completing an LCA typically requires the acquisition and synthesis of significant
amounts of data. Given LCA's data-intensive nature  and the importance of decisions  that
are made based on LCA results, data quality is a  critical concern. Important problems can

                                         1-2

-------
arise if LCA results are based on poor or inadequate data.  For example, if an internal
analysis is based on poor quality data, misleading results could lead to process or product
decisions that are unnecessarily costly.  Likewise, conclusions based on external analyses
that used poor quality data could mislead public perception about the environmental
profile of a product or process.  Due to such concerns, there is a growing consensus
within the LCA community that data quality assessment procedures need to be integrated
into LCAs and the results discussed more fully.

1.1    THE CURRENT STATE OF LCA DATA QUALITY EVALUATION

       The evaluation of data quality for LCA studies was once considered a
predominantly internal, informal exercise that relied heavily on expert judgment, with
limited reporting in the final study results. LCA data quality assessment is still a largely
judgmental process and will continue to be so for at least the near term. However,
statistical information on some data is currently available and should be used whenever
possible.

       Recently, an international survey was conducted (Vigon and Jensen, 1992) to
evaluate the current treatment of data quality and data base issues by LCA practitioners.
Thirty-three European and North American organizations involved in LCA practice
responded to questions addressing  scope and boundary setting; data collection procedures;
and data analysis, interpretation, and reporting.  Following are highlights of the
practitioner survey results.

       Current data quality practices typically include an informal system of cross-check
procedures. This analysis may involve a comparison through expert judgment of internal,
primary data with other published  or trade association data, or a comparison of estimated
or measured data with theoretical values.  Other approaches for checking data quality
include internal project team discussions and third-party review.  However, these
procedures are often not clearly defined and documented in the final LCA  results.

       Practitioners are aware of the need for greater clarity and consistency in assessing
and reporting data quality. Reflected in the  survey responses is an acknowledgment of the
lack of a systematic approach for determining the quality of the data being collected.
Specific areas identified in the survey as needing improvement include techniques for
sensitivity analysis, uncertainty analysis, and the quality of available data.  Improvements
                                         1-3

-------
in these and additional techniques should result in more tightly developed conclusions and
enhanced understanding of the impact of the data quality on LCA results.
       This document addresses several of the data quality assessment and reporting needs
identified in the LCA practitioner survey.  In particular, Chapter 3 presents a data quality
assessment framework which, if widely employed, will improve the consistency and
clarity with which LCA data quality is assessed and reported. This framework is designed
to complement the steps typically taken by practitioners during the conduct of an LCI.

1.2    PURPOSE AND SCOPE OF THESE GUIDELINES

       The purpose of these guidelines is to provide guidance for LCA practitioners and
others in assessing the quality of "key" LCI data sources (see Sections 2.4 and 3.4.2 for a
discussion  of key data sources). These guidelines address where and how data quality can
be considered when conducting an LCI and describe a general, systematic approach for
assessing LCI data quality based on data quality goals (DQGs), data quality indicators
(DQIs), and documentation of practitioner judgments.  The approach relies on a series of
data quality worksheets for compiling and reporting data quality information on key data
sources. These guidelines do not suggest conducting detailed assessment and
documentation of every data source used, nor do they attempt to lay out each step an
analyst should take to collect LCI data and assess its quality. The document also
discusses sensitivity analysis, uncertainty analysis, and data compensation techniques.
These guidelines have been tested through the conduct of two LCA practitioner data
quality assessment case studies and modified  accordingly.
       It is recommended that LCA practitioners incorporate data quality assessments into
 both internal and external LCAs. Internal LCAs must provide the stakeholders with high-
 quality information that is accurate as an engineering, marketing, and overall
                                         1-4

-------
decisionmaking tool. Practitioners may use external LCA results to educate the public
about products and processes, to support product claims, or to advocate a particular public
policy issue.  Given the public nature of the information, characterizing the accuracy of
external LCA results is an important concern.  If practitioners follow these guidelines or
similar practices when assessing and reporting data quality, users of LCA results should
have increased confidence in the information.  In addition to LCA practitioners, possible
users  of these guidelines include:

       •  analysts who are working to develop and improve LCA methodologies,
       •  analysts who need to assess the quality of data used in existing LCAs, and
       •  decisionmakers who use LCA results.
       Steps in the LCI data development process are complementary and iterative, and
thus assessing LCI data quality is an iterative process.  For example, if the collected data
does not meet the previously defined DQGs, additional data must be collected or DQGs
must be redefined. The interconnection between LCI data development steps is discussed
further in Chapter 3.

1.3    AREAS OUTSIDE THE SCOPE OF THESE GUIDELINES

       In addition to data quality concerns,  four additional areas are important to the
future development and use of LCA:  the development of an LCA peer-review process, a
code of LCA practice, and procedures for conducting LCA impact and improvement
assessments. It is beyond the scope of this  document to define or discuss any of these
issues in detail.  Here, they are addressed briefly in terms of their relationship to LCI data
quality concerns.

1.3.1 Peer Review

       The Society of Environmental Toxicology and Chemistry's (SETAC's) document,
A Technical Framework for Life-Cycle Assessment, indicates that peer review is a
necessary and important step for ensuring the technical and scientific credibility of LCAs

                                        1-5

-------
(SETAC, 1991). In response, the SETAC LCA Advisory Group collaborated with
business, consumer, and environmental groups and with academia to develop an interim
peer-review framework (SETAC, 1992).  The framework recommends, among other
things, the establishment of a peer-review panel early in any LCA that will be used in a
public forum to review the following:

       • the LCA study goals,  scope, and key assumptions;
       • data acquisition and compilation methodologies;
       • compiled data and associated quality measures; and
       • the communication of results in a draft final report (EPA, 1993b).

       These guidelines also recommend incorporating peer or expert review into the data
quality assessment process.  Peer review  is recommended from a data quality perspective
because it provides an additional set of checks and balances on the use of LCI data.  Peer
review should improve the conduct of LCIs, increase the understanding of the results, and
aid in further identifying and subsequently reducing  any environmental consequences of
products or processes (EPA, 1993b).  However, peer review of data sets and their quality
is expected to remain largely internal for at least the near term because of confidentiality
issues and the need to have a high level of expertise at the process or product level.

       Although EPA supports the use of peer reviews, this document does  not attempt to
define an appropriate peer-review process for LCI data quality assessment or LCAs in
general.  SETAC will continue to identify and develop LCA peer-review procedures.

1.3.2 LCA "Code of Practice"

       Inconsistencies and uncertainty in LCA results have resulted in the recognition of
the need for a standardized LCA methodology (Denison, 1992). These guidelines present
operating principles for the evaluation and reporting of LCI data quality.  They do not,
however, outline a code of practice for the entire LCA process; the development of such
principles is being undertaken by SETAC.

       This LCA "Code of Practice" is not a standard for conducting LCAs.  Instead, it
provides guidance on the process and on the methodological aspects of conducting an
LCA, emphasizing that
                                        1-6

-------
       •  LCA is a complex, multi-dimensional tool;
       •  LCA methodology has yet to be described in full (of the LCA components, only
          inventory analysis has been well documented);
       •  inventory analysis and subsequent improvement decisions are the state of the art
          in 1993; and
       •  new issues will arise as practitioners expand the application of LCA (SETAC,
          1993a).

       Although SETAC does not have regulatory powers or the authority of a standards
 organization, it hopes that this LCA "Code of Practice" will be recognized as  authoritative
 and will serve to enhance the use of LCA as a tool for environmental management
 (SETAC, 1993a).

 1.3.3 Life-Cycle Impact Assessment

       Data quality concerns associated with conducting an impact assessment extend
 beyond those of the inventory.  A framework for conducting an impact assessment,
 including the key issues related to the impact-assessment phase of an LCA and possible
 methods for conducting impact assessment, are addressed in EPA's "Life-Cycle Impact
 Assessment: A  Conceptual Framework, Key Issues, and Summary of Existing Methods"
 (EPA, 1994b),  and SETAC's A Conceptual Framework For Life-Cycle Impact Assessment
 (SETAC, 1993b).

       Impact  assessment is generally used to determine the environmental effects
attributable to inventory items.  Conducting an impact assessment typically involves a
 range of models and analytical methodologies that often require extremely complex data
 manipulations compared to those used in inventory analysis. Thus, the quality of impact
 assessment data and the results  produced by these models will be much more difficult to
 assess than those of the inventory component.

        In general, impact assessment focuses on a greater number of data values than
 inventory analysis because it focuses on the relative harm of inventory items, which is not
 solely a function of their magnitude. For example,  10 pounds of xylene released to a
river may not be considered as significant as 10 tons of carbon dioxide (CO2) released to
 the air based on absolute values. However, the environmental impact of xylene would be
 very important, probably more so than the larger amount of CO2.  Thus data values that
 may not be considered key to the inventory may be critical to the impact assessment, and

                                        1-7

-------
quality assessment is not as straightforward as focusing on a few key data values based on
degree of effect on final LCI results.

1.3.4  Life-Cycle Improvement Assessment

       Data quality concerns also exist when conducting the improvement assessment
phase of an LCA.  Although widely practiced on an informal basis, there is no consensus
on procedures for conducting a life-cycle improvement assessment. Vigon and Curran
(1993) have outlined one possible framework for conducting an improvement assessment,
but it has not yet been discussed in a public forum. Because it is in the improvement
assessment component of LCA where options for achieving environmental improvements
are evaluated, accurate and reliable data are critical. The lower the quality of data used in
improvement assessment, the higher the risk of making the wrong decision.

       One data quality concern relating to both impact and improvement assessment is
that of compounding uncertainty, where imperfect information developed in the inventory
is used as input data for imperfect impact and improvement assessment models.  These
and other issues will need to be researched and addressed as methods for conducting
impact assessment continue to develop.

1.4    OVERVIEW OF DOCUMENT

       The LCI data quality assessment guidelines  are outlined as follows:

       Chapter 2 identifies key issues that make data quality for LCI unique, reviews the
data sources and data types typically used when conducting an LCI, and discusses
potential problems with data quality.

       Chapter 3 provides  a detailed discussion of the steps involved in building the LCI
and discusses where and how data quality can be evaluated.  Chapter 3 also defines DQGs
and DQIs and examines their applicability to an evaluation of LCI data quality.

       Chapter 4 explains the concepts behind sensitivity and uncertainty analysis
methodologies and their applicability to LCI data.

       Chapter 5 discusses  some alternatives for handling missing or deficient data and
summarizes existing data compensation techniques.

                                       1-8

-------
                                    CHAPTER 2
        KEY DATA QUALITY ISSUES FOR LIFE-CYCLE ASSESSMENT

       "Data quality" can take on a variety of meanings.  For the purposes of these
guidelines, data quality is defined as the degree of confidence an analyst has in a data
source or data value based on an evaluation of data quality goals (DQGs), data quality
indicators (DQIs), and the role of the data in an overall LCA. (See Chapter 3 for a
detailed discussion of DQGs and DQIs.)
                                   KEY POINTS

        •  Data used in LCIs are typically a mixture of primary and
           secondary data.

        •  A subset of "key" data sets contributes more to the overall
           quality of the LCI than others.

        •  Uncertainty occurs and can be compounded at a number of
           stages in the LCI data development process.

        •  Most of the existing environmental data was not developed
           specifically for LCI use.

       Ideally, data of the highest possible quality for use in LCIs would consist of
measured values with ranges and standard errors, which would be developed using
methods documented and approved by an accepted standards organization, measured
and reported by individual material component (i.e., chemical species), and reported
on the basis of a specific time frame.  In practice, most of the data used in LCIs
does not fit this ideal, and in many cases may not need to meet these stringent
criteria.  Therefore, assessing and communicating the quality of LCI data is
critical.

       LCA practitioners and interested parties have recognized key issues that make data
quality assessment unique in the context of LCI:

       •  a variety of data are used to build an  LCI which may vary by source, type, and
          level of aggregation;
       •  some subset of "key" data sets may account for a larger amount of variability;
       •  aggregating data sets is inherent in the LCI data development process;
                                        2-1

-------
       • uncertainty propagation in the LCI data development process;
       • the use of nonparametric samples in an LCI; and
       • the use of existing data sources which have not been specifically developed for
         LCI use.

       This chapter discusses each of these issues as they relate to LCI data quality.  The
purpose of this discussion is to provide the reader with a general understanding of key
LCI data quality issues, as they provide the foundation for many of the LCI data quality
assessment steps and procedures discussed in Chapter 3.

2.1 USING A VARIETY OF DATA TO BUILD AN LCI

       LCI data sets can be composed of data that vary by the primary or secondary
nature of the data source, the type of data employed (e.g., measured, modeled, or
nonmeasured), and the level of data aggregation (e.g., individual plant or industry average
data).  It is important that practitioners recognize these characteristics among data sets and
consider how their use may affect the outcome of the LCI.  For example, secondary data
sources are generally less specific than actual measured or monitored plant-specific data
found in most primary data sources.  Similarly,  aggregated data may provide little
indication of the variability in a parameter of interest, such as facility-specific air
emissions.  Nonaggregate data, on the other hand, often can be used to calculate statistical
measures, such as the mean, median, standard deviation, and kurtosis, to  provide an
indication of the central tendency, variability, and skewness in the data.  Examples of data
sources, data types, and different levels of aggregation are shown in Figure 2-1 and
discussed in the following sections.

2.1.1   Data Sources

       Data sources have been defined fairly  consistently throughout the LCA literature,
and authors agree on the definition of a data source within the context of an LCA. A data
source is where LCA data are found and includes, for example, industry  reports;
government documents, reports,  and data bases; journals; and reference books (see Figure
2-1).
                                        2-2

-------
  Figure 2-1. Example Data Sources, Data Types, and Levels of Data Aggregation
                  Sources: SETAC, 1991; EPA, 1993b; EPA, 1994a.

       The EPA's Life-Cycle Assessment:  Inventory Guidelines and Principles (EPA,
1993b) identifies sources of government documents, including the U.S. Department of
Commerce's Census of Manufactures, the U.S. Bureau of Mines' Census of Mineral
Industries, the U.S. Department of Energy's Monthly Energy Review, and EPA's Toxics
Release Inventory (TRI) data base.  SETAC's report, A Technical Framework for Life-
Cycle Assessment (SETAC, 1991), also refers to data sources in general categorical terms
and provides specific examples of information sources within the different source
categories.

       EPA's report Life-Cycle Assessment:  Public Data Sources for the LCA
Practitioner (EPA, 1994a) is a reference document on public data sources potentially
useful in LCAs. This report identifies and describes major types of public data sources
potentially applicable to LCA and evaluates data bases for LCA applicability.  Other
sources of data discussed include bibliographic data bases, relevant documents and
                                        2-3

-------
directories of literature, electronic data base clearinghouses, foreign data bases, and
ongoing studies.

       Data used in LCIs are typically a mix of primary and secondary data.  Because
secondary data are typically collected for purposes other than LCA, additional data quality
issues are often applicable.  The following sections discuss data quality issues related to
the use of primary and secondary data in the LCI data development process.
Primary Data
       Primary data are plant-specific, measured, modeled, or  estimated data that the
LCA practitioner can directly access or for which the practitioner has input into the data
collection process.  In most cases, primary data are preferred for use in LCIs because they
are specific to the product or process being evaluated and are more amenable to assessing
data quality concerns.  Companies conducting or commissioning LCIs have usually
classified their primary data as proprietary. Under this circumstance, the data and the
associated collection methods are unavailable for review or simply not released to the
public.  In some cases, summary results are made available but they do not include the
actual plant-specific data. Consequently, verifying the quality of these data can be
difficult. If an LCI involves the use of proprietary information, the level of data quality
rests largely on the reputation of the company providing the data.

       Although plant-specific data are often more representative, they are not always
preferred to industry-based data.  Situations can arise where plant-specific data are less
representative than  aggregate industry figures. For example, a steel-can  manufacturer may
buy steel on the metals market but may not be able to specify how or where the steel was
manufactured. In this case, an LCI that used plant-specific manufacturing data may be
less representative than one using data that reflect the mix of plants  and processes that
constitute the steel  commodities market (Arthur D. Little, Inc., 1993a).

Secondary Data
       Secondary data include data that were not collected specifically for the conduct of
an LCI and for which the practitioner has no input into the data collection process.
Secondary data sources can be more difficult to evaluate from a data quality perspective.
The data source may not include an explanation  of the data collection methods and data
variability.  Secondary data may be specific to a product or process under  evaluation or
they may be aggregated values.  In  the latter case, the data may require  some form of
                                        2-4

-------
manipulation before values suitable for use in LCIs can be generated (see the discussion
in Section 2.1.3 on aggregated data).

2.1.2  Data Types

       A data type is defined by the method used to generate the data.  Data types
generally include measured data (statistically or nonstatistically based), modeled data, and
nonmeasured data (e.g., educated guesses or estimates).

Measured Data
       Measured data are monitored or sampled.  These data can be collected under
statistically  based or nonstatistically based protocols. In addition to the general concerns
expressed above regarding primary and secondary data, measured data present potential
data quality problems. Monitored or sampled data could be inaccurate if an inappropriate
sample design is used or if there is bias in the monitoring or sampling devices.  It is
important to have at least a description of the data collection methods or the statistical
protocol to  assess the validity of measured sources.

Modeled Data
       Practitioners may use models to generate data for LCIs, or they may rely on
secondary sources that use a model to generate the data. Models can be used to simulate
an industrial process or estimate emissions from a production process.  The potential
shortcomings associated with modeled data primarily pertain to how the model  is
constructed. To provide simulated data with a high confidence level, models should be
validated. Validation refers to the degree to which the model has been checked against a
standard or reference value to determine if it represents what it is supposed to represent.
If the model output equals or is reasonably close to the reference value, then the model is
likely to generate relatively accurate results. Validation techniques include using expert
opinion to determine whether the model presents an adequate representation of  the process
or using the model to reproduce an historical data set.  A model that is not validated and
verified may produce apparently useful results that may in fact have very little  relation to
the true values desired.

        It is often easier to verify and validate primary modeled data than secondary
modeled data because practitioners usually have access to their own models and can
assess the accuracy of the model and results.  Secondary data  sources may provide  neither
the full model nor information regarding the author's verification and validation activities.
                                         2-5

-------
Nonmeasured Data
       Nonmeasured data include estimates based on best professional judgment or other
educated guesses. Many nonmeasured data sources do not describe the methods used to
estimate the data or whether mathematical or other techniques were employed to
compensate for missing data and data deficiencies.  Assessing the quality of these data
requires reviewing the evidence and assumptions that went into deriving the value(s).
Without sufficient understanding of how and why the estimates were made, an analyst
may make biased inferences using the data. Unless experts who developed the data can
be consulted or there is adequate peer acceptance of the data source, the analyst may be
unable to determine the extent of potential bias.

2.1.3  Levels of Aggregated Data

       Aggregated data typically are presented as summary reported  values.  There are a
variety of levels at which data can be aggregated:

        •  individual measures (i.e., unaggregated);
        •  temporal averages (monthly, annual);
        •  spatial averages (facility, industry, county, state, national); and
        •  normalized (per unit value) (EPA, 1993b).

       Aggregated data are potentially useful when the LCI results are used for broad
applications.  Aggregated data are generally more representative of a  composite function,
such as an industry as a whole.  In addition, such data can be made publicly available, are
widely  usable, and are more general in nature (EPA, 1993b).

        The quality of aggregated data is often more difficult to assess than the quality of
nonaggregated data because aggregated data sets usually do not list the intermediate
values that were used in compiling the  data. The data aggregation process can eliminate
valuable information (e.g., variability in the data) needed to evaluate the quality of a data
set.  In addition, aggregated data may require some sort of manipulation (e.g.,
interpolation, extrapolation, or back-calculation) to generate a value suitable for use in
LCIs.  Any uncertainty associated with the aggregated data may be compounded by the
method used  to produce appropriate LCI data values.

        For example,  the DOE Monthly Energy Review contains a section with monthly
and yearly totals for residential and commercial, industrial, transportation, and utility
                                         2-6

-------
energy consumption. Within each class of end users, the data are broken down by the
type of fuel used to generate the energy.  To use these data in lieu of facility-specific
information to perform an LCI on a product (e.g., tires), an analyst would have to make
assumptions about the percentage of industrial energy consumed in tire production.  This
percentage may be estimated using expert opinion/knowledge or using a mathematical
model.  The level of uncertainty in the aggregated consumption values could be
compounded by the uncertainties associated with any assumptions used in modifying the
data.
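
       To make this compounding concrete, the following minimal sketch (written in
Python; the energy total, the 5 and 25 percent relative errors, and the tire-production
share are all invented for illustration) scales an aggregate industrial energy total by
an assumed share and combines the two relative errors in quadrature, which assumes the
errors are independent:

    import math

    # Hypothetical aggregate value and uncertainties -- illustration only.
    industrial_energy = 25_000.0   # aggregated industrial energy use (trillion Btu)
    rel_err_energy = 0.05          # 5% relative error in the published aggregate

    tire_share = 0.004             # assumed share consumed by tire production
    rel_err_share = 0.25           # 25% relative error in that assumption

    tire_energy = industrial_energy * tire_share
    # For a product of independent quantities, relative errors add in quadrature.
    rel_err_tire = math.sqrt(rel_err_energy**2 + rel_err_share**2)

    print(f"tire production energy: {tire_energy:.0f} "
          f"+/- {rel_err_tire:.0%} relative error")

In this hypothetical case the derived LCI value inherits nearly all of its roughly 25
percent uncertainty from the assumed share rather than from the published total,
illustrating how the assumptions used to modify aggregated data can dominate the
uncertainty of the result.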

       Nonaggregated data may or may not provide further information to assess data
quality. Nonaggregated data can be used to calculate such statistical measures as the
mean, median, standard deviation, and kurtosis—all of which may provide an indication of
the central tendency, variability, and skewness in the data set.  On the other hand,
nonaggregated data may be composed of inputs  from different plants within the same
company.  Differences  in the data could be due to the way the parameter was measured
(e.g., direct measurements,  statistical sampling, engineering estimates, modeling); the
different operating  characteristics of the plants (e.g., fuel and raw material mixes or
regulatory requirements); or the period of measurement (e.g., episodic, continuous, random
sampling, or daily/monthly/annual). In these instances, nonaggregated data may provide
no additional useful information unless the other sources of variability also are discussed.
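
       For illustration, the short sketch below (Python; the plant-level emission
values are invented) computes the summary statistics named above from a small set of
nonaggregated measurements:

    import statistics

    # Hypothetical plant-level air emissions (kg per unit of output).
    emissions = [2.1, 2.4, 2.2, 3.8, 2.3, 2.5, 2.2, 2.6]

    mean = statistics.mean(emissions)
    median = statistics.median(emissions)
    stdev = statistics.stdev(emissions)   # sample standard deviation

    # Excess kurtosis: positive values flag heavy tails or outliers.
    n = len(emissions)
    m2 = sum((x - mean) ** 2 for x in emissions) / n
    m4 = sum((x - mean) ** 4 for x in emissions) / n
    kurtosis = m4 / m2 ** 2 - 3.0

    print(f"mean={mean:.2f} median={median:.2f} "
          f"stdev={stdev:.2f} excess kurtosis={kurtosis:.2f}")

Here a mean noticeably above the median, together with positive excess kurtosis, would
flag the outlying 3.8 value; a single aggregated summary figure would hide this
information.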

2.2   IDENTIFYING "KEY" LCI DATA SETS

       The complexity of the system being studied in the LCI can greatly affect data
quality assessment.  Even relatively simple systems may require collection, review,
comparison, assessment, and validation of hundreds or thousands of data points (Franklin
Associates, Ltd.,  1993).  While it is ideal to have high quality data for all data sources,
time and resource constraints require that data quality assessment efforts across
multiple large data sets be allocated proportionately, with emphasis on the most
important or "key" data sets.  Key data
sets are defined as those that contribute significantly to final results and thus dictate the
 quality of the overall LCI. It is recommended that practitioners focus data quality
 assessment efforts on key data sets.  However, it should be noted that key data for any
 particular analysis or conclusion may not be key for other analyses.
                                         2-7

-------
       Identifying which data are key to the LCI thus becomes a critical aspect of data
quality assessment. A number of qualitative and quantitative techniques are currently
being used to help practitioners pinpoint these key data sets.  For example, in the field of
environmental labeling, "screening" matrices are often used to provide a more qualitative
indication of the most significant life-cycle stages or environmental effects.

       Formal and informal sensitivity analysis techniques may also be used to help
identify key data sets. Formal techniques provide a more quantitative approach and
require generating mathematical models to evaluate the relative importance of individual
or multiple system parameters.  Informal sensitivity analysis provides a more  qualitative
approach and relies on the analyst's empirical knowledge of the data set to determine
which parameters are likely to contribute the most to  LCI results. The results of an
informal sensitivity analysis may be validated by formal sensitivity analysis techniques
(Franklin Associates, Ltd., 1993).  Specific sensitivity analysis techniques are described in
further detail in Chapter 4.
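
       As a minimal sketch of the formal approach (Python; the model, parameter names,
baselines, and ranges below are hypothetical), a one-way sensitivity analysis varies
each input over its plausible range while holding the others at baseline and ranks the
inputs by the swing they produce in the result:

    # One-way sensitivity sketch for a toy LCI model; all figures hypothetical.
    def total_air_releases(params):
        # Toy model: releases scale with energy use plus process emissions.
        return (params["energy_use"] * params["fuel_emission_factor"]
                + params["process_emissions"])

    baseline = {"energy_use": 120.0, "fuel_emission_factor": 0.8,
                "process_emissions": 40.0}
    ranges = {"energy_use": (100.0, 150.0),
              "fuel_emission_factor": (0.6, 1.0),
              "process_emissions": (35.0, 45.0)}

    base_result = total_air_releases(baseline)
    swings = []
    for name, (low, high) in ranges.items():
        results = []
        for value in (low, high):
            trial = dict(baseline)
            trial[name] = value            # vary one parameter at a time
            results.append(total_air_releases(trial))
        swings.append((max(results) - min(results), name))

    # Largest swing first: these parameters point to the "key" data sets.
    for swing, name in sorted(swings, reverse=True):
        print(f"{name:>22}: swing = {swing:.1f} "
              f"(baseline result {base_result:.1f})")

The inputs producing the largest swings identify candidate key data sets on which to
focus data quality assessment efforts.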

2.3    AGGREGATING DATA IN THE LCI DATA DEVELOPMENT PROCESS

       As used here, the term "data aggregation" refers to the aggregation of data within
and between system components that occur in the LCI data development process.  In
short, data aggregation is a common technique in the  LCI data development process that,
when practiced correctly, can be a useful tool for presenting and managing LCI data.
Typical primary and secondary LCI data vary widely  on a variety of characteristics,
including:

       • the time frame over which the data were  collected;
       • the geographic scale represented by the data (e.g., facility-level observations
         versus national averages);
       • product or process variations (i.e.,  multiple  processes may be used to
         manufacture the same product, each with its own distinct quantity and mix of
         inputs and outputs);
       • data value size (e.g., small versus large measurements);
       • significant figures of data values (e.g., a rough estimate versus a number that
         has four significant digits);
       • data collection process (e.g., actual measurement versus expert judgment); and
       • data manipulations (e.g., normalizing per unit of output).
       It is unlikely that any two LCI data points would have exactly the same
characteristics. Therefore, the issue of data aggregation focuses primarily on the
aggregation of disparate data points that typically occurs in the LCI data development
process.  Such aggregation of "apples and oranges" has the potential to misrepresent the
system of interest, and may contribute to the propagation of uncertainty.

       An issue associated with data aggregation is how the inclusion (or exclusion) of
large versus small data values may potentially affect LCI results.  In general, large data
values contribute proportionately more to estimates of means and totals (typically final
LCI results) and to the quality of the final results than  do smaller data values.  Different
boundary settings may include or exclude potentially large data values. Thus, the
dominance of large numbers is a key consideration when aggregating many numbers of
different magnitudes and quality. Data quality efforts should be emphasized for the larger
data values in the data set to reduce or compensate for uncertainty in the final results.  However,
small data values also can be important to data quality if they are measured or estimated
poorly.  The issue of large versus small data values is addressed in greater detail in
Chapter 3.
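
       The dominance of large values can be screened with a simple contribution ranking,
as in the following sketch; the process names, values, and the 25 percent flagging
threshold are assumptions for illustration.

# Sketch: ranking data values by their contribution to an aggregated total,
# so data quality effort can be focused on the dominant values.
inputs = {"process_1": 950.0, "process_2": 38.0, "process_3": 7.5,
          "process_4": 4.5}  # e.g., kg of an emission per functional unit

total = sum(inputs.values())
for name, value in sorted(inputs.items(), key=lambda kv: kv[1], reverse=True):
    share = value / total
    flag = "  <-- focus data quality here" if share > 0.25 else ""
    print(f"{name}: {value:7.1f} ({share:5.1%} of total){flag}")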

2.4    UNCERTAINTY PROPAGATION

       Uncertainty in data usually occurs and can be compounded at a number of stages
in the LCI data development process.  This can make it difficult for practitioners to
draw accurate conclusions or make decisions based on LCI results (Vigon and Jensen, 1992).
Potential sources of uncertainty include:

       •  the data source itself,
       •  any assumptions and/or calculations used to  manipulate LCI data, and
       •  aggregating very different data sources or values as described above.

       Uncertainty analysis techniques may be used to identify which errors in input data
are compounded as the LCI model is utilized.  Determining the uncertainty or variability
in data may also be accomplished through more qualitative means, which can help LCA
practitioners ensure that their results are accurate and representative. Some of these
qualitative activities, which are addressed more specifically in Chapter 4, include:

       •  careful planning and execution of the LCI process leading up to final  results,
       •  formal and informal reviews of the LCI data development process,
       • focusing data quality efforts on values that significantly influence LCI results,
         and
       • ensuring that sample sizes for key data values are as  large as possible, thus
         increasing the reliability of the estimates (Arthur D. Little, 1993b).
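
       As a complement to these qualitative activities, the compounding of input errors
can be illustrated with a simple Monte Carlo sketch; the toy model, distributions, and
values below are assumptions, not a prescribed method.

# Sketch: Monte Carlo propagation of input uncertainty through a toy LCI
# calculation.
import random
import statistics

random.seed(1)

def one_draw():
    # Each input is drawn from an assumed distribution around its estimate.
    electricity = random.gauss(mu=50.0, sigma=5.0)    # kWh per unit
    emission_factor = random.uniform(0.4, 0.6)        # kg CO2 per kWh
    transport = random.gauss(mu=3.0, sigma=1.0)       # kg CO2 per unit
    return electricity * emission_factor + transport  # kg CO2 per unit

results = sorted(one_draw() for _ in range(10_000))

print("mean:", round(statistics.mean(results), 1), "kg CO2/unit")
print("90% interval:", round(results[500], 1), "-", round(results[9500], 1))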

2.5    USING NONPARAMETRIC SAMPLES IN AN LCI

       In many cases, a portion of the primary data that make up an LCI are
nonparametric samples. Nonparametric samples refer to data that are chosen or developed
from a relatively small sample size and where the data are not independent of each other.
Nonparametric samples do not come from a probability sample  and thus are often difficult
to analyze using standard statistical techniques (Franklin Associates, Ltd., 1993). In the
context of LCI, nonparametric samples would include data generated through engineering
estimates, expert guess, and similar approaches where the shape of the "true" distribution
of a parameter is not known.

       Although methods for  statistical analysis of nonparametric samples have been
developed in other fields (e.g., limnology), little has been done  in the area of LCA to
develop statistical techniques with which to analyze nonparametric LCI data. Until such
techniques and procedures are established, practitioner judgment will remain a substantial
part of choosing data sources  and assessing data quality in cases where only
nonparametric samples are available.  When using such practitioner judgment, whether in
the context of nonparametric samples or otherwise, it is critical that all key steps and
assumptions are checked, documented, and reviewed.
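
       One such technique developed in other fields is bootstrap resampling; the
following sketch applies it to a hypothetical set of engineering estimates to gauge the
spread of the sample mean without assuming a distribution shape.  It is offered as an
illustration of this class of techniques, not as an established LCI procedure.

# Sketch: bootstrap resampling applied to a small, hypothetical sample.
import random
import statistics

random.seed(7)
sample = [2.1, 2.4, 1.9, 3.0, 2.2]  # hypothetical estimates, kg/unit

boot_means = []
for _ in range(5000):
    # Resample the data with replacement and record each resample's mean.
    resample = [random.choice(sample) for _ in sample]
    boot_means.append(statistics.mean(resample))

boot_means.sort()
print("sample mean:", round(statistics.mean(sample), 2))
print("bootstrap 90% interval:",
      round(boot_means[250], 2), "-", round(boot_means[4750], 2))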

2.6    USING EXISTING ENVIRONMENTAL DATA IN AN LCI

       Many of the existing sources for environmental data, whether public or private,
have been developed for purposes (e.g., regulatory compliance) other than that of
conducting an LCI (see EPA, 1994a). Although these sources may be readily accessible
for use in an LCI, the quality of the data must be considered.  Even when the quality of
the data is considered to be good for the original purposes, the use of the data in an LCI
may need to be reconsidered.  For example, TRI data are typically based on
engineering estimates as prescribed by law; however, a greater level of certainty may be
required for an LCI, especially if the LCI is to be used in an external application. There is
also a broader issue of specificity regarding the use of TRI data. That is, TRI data are
typically generated for generic waste streams and are not necessarily representative of the
specific waste stream data needed for an LCI that is often linked to a specific product or
process.  Practitioners should carefully evaluate existing environmental data to ensure that
they accurately represent what is desired in the LCI.

       In addition, many regulatory data bases report data on only a range of facilities for
a subset of total environmental releases. For example, EPA's AIRS database covers point
source emissions, primarily of Clean Air Act criteria pollutants, from sources emitting
more than a threshold of 100 tons per year (5 tons per year for lead and 1,000 tons per
year for CO) (EPA, 1994a).  The AIRS database also may report either actual or estimated
emissions values, without stating which values are being reported. The lack of distinction
between actual and estimated emissions is a common situation with environmental data
sources that contain aggregated information.

       While many existing data sources for environmental releases provide a readily
available source of LCI data, it is recommended that practitioners exercise caution in
using existing environmental data that have been developed for purposes other than
conducting an LCI.  Any limitations associated with such environmental release data
should be clearly identified and communicated in the final LCI results.
                                    CHAPTER 3
        ASSESSING THE QUALITY OF LIFE-CYCLE INVENTORY DATA

       The LCI is the initial data generation and collection component of an LCA.  As
such, the quality of data used in the LCI directly affects the results of the overall LCA.
This chapter outlines a framework for assessing the quality of LCI data that is designed to
complement the LCI data development steps typically taken by practitioners. Although
this framework focuses  exclusively on the LCI component of LCA, some of the data
quality assessment issues and techniques are likely to apply to LCA data as a whole.

       The relatively qualitative approach to LCI data quality assessment outlined in this
framework reflects the current state of LCA practice and existing data quality assessment
techniques.  The consistent application of this or a similar framework should advance
both the assessment and documentation of data quality issues in LCAs.  As LCA and
data quality assessment techniques evolve, it is expected that a greater level of statistical
analysis will be used.

       The LCI data quality assessment framework presented in this chapter was tested in
case studies by two recognized LCA practitioners. Using existing LCI data, the
practitioners performed data quality assessments on selected data sources using this
framework and documented their results.  Where appropriate, examples from these
practitioner case studies have been used to illustrate the application of the framework.

       SETAC (1991) and EPA (1993b) have prepared key LCA documents that outline
procedures for conducting an LCI.  As indicated in EPA's Life-Cycle Assessment:
Inventory Guidelines and Principles, LCIs should contain data for the following life-cycle
stages:

       •  raw materials acquisition;
       •  manufacturing (which includes materials manufacture, product fabrication, and
          filling/packaging/distribution);
       •  use/reuse/maintenance; and
       •  recycling/waste management (EPA, 1993b).

Each of the stages listed above receives inputs such as energy and raw materials and
produces outputs such as waste streams (air, water, solid waste), recyclables, co-products,
and final or usable products (see Figure 3-1).

            [Figure 3-1. Life-Cycle Stages.  Source: SETAC, 1991; EPA, 1993b.
            The figure shows inputs (energy, raw materials) flowing into the
            four life-cycle stages (raw materials acquisition, manufacturing,
            use/reuse/maintenance, and recycle/waste management) within the
            system boundary, and outputs (atmospheric emissions, waterborne
            wastes, solid wastes, co-products, and final products) leaving it.]

       Practitioners undertake an involved process to build LCI data for each input and
output associated with the system of interest.  Figure 3-2 outlines the typical steps
practitioners take in the LCI data development process and identifies points in the process
where data quality activities should be incorporated.  Steps involved in the LCI data
development process include the following:

       •  defining study scope and boundaries,
       •  developing an LCI data collection plan,
       •  undertaking data collection and quality assessment,
       •  evaluating model sensitivity and results, and
       •  documenting and referencing study results.

       Figure 3-2 also highlights the iterative nature of the LCI data development process.
Although the process is depicted as moving in a linear direction, data  limitations and other
factors typically cause practitioners to continuously reconsider the scope and objectives of
the LCI.  For example,  in moving from scoping and boundary definition toward
documenting and referencing LCI results, a practitioner might recognize that a significant
component of the life cycle was inadvertently excluded.  In this case,  the practitioner
should re-evaluate and possibly refine the scope of the LCI,  which would likely require
collecting and assessing additional data.  The following  sections discuss the steps in the
LCI data development process and their relationship to LCI data quality assessment.

3.1    SCOPE AND BOUNDARY DEFINITION

       Defining the scope of the overall LCI is the first step in the LCI data development
process.  During this step, the overall purpose and objectives of the LCI are defined.  This
includes identifying the product, process, or activity under evaluation  (e.g., process change
or product comparison), the level of analysis undertaken (e.g., simple inventory
calculations based on a "less is better" criterion or an impact assessment), and the end use
 of the final LCI results. With respect to the end  use, practitioners should identify whether
the final LCI results will be used in an internal or external manner, and whether the LCI
 will be absolute or comparative (i.e., evaluating one product or process versus multiple
 products or processes).  Table 3-1 shows a case study example for  LCI scoping and
 boundary definition.
    [Figure 3-2. LCI Data Development Steps.  The figure shows five steps,
    each subject to review by a peer review panel:

      Scoping and Boundary Definition
        • Define study objectives
        • Identify LCI end uses
        • Develop input/output chart

      LCI Data Collection Plan
        • Define data quality goals
        • Identify data categories and sources
        • Identify data quality indicators
        • Develop data questionnaire

      Data Collection and Review
        • Collect data
        • Assess data sources against DQIs
        • Determine quality of key data sources

      Evaluate Model Sensitivity and Results
        • Perform sensitivity/uncertainty analysis
        • Determine key data values and uncertainty
        • Determine whether DQGs were met
        • Collect additional data, if necessary

      Document and Reference LCI Results
        • Reference key data sources
        • Summarize data quality information
        • Document key assumptions and calculations
        • Document uncertainty in LCI results]


       Practitioners should also consider data quality as they define the scope of an LCI,
including the type of data needed for the analysis and the level of data quality necessary
to support the intended purpose of the LCI as well as the overall LCA.  In addition to
defining the objectives of the LCI, practitioners need to determine where data quality will
be stressed; this might be thought of as developing initial, broad data quality goals
(DQGs).  For example, if an LCA is concerned with initiating a manufacturing process
change, the quality of LCI data may be less significant for raw materials or transportation
input data (Arthur D. Little, 1993a).

TABLE 3-1.  CASE STUDY EXAMPLE:  LCI SCOPING AND BOUNDARY DEFINITION

 Study Goal:        To assess the environmental performance of the process for
                    manufacturing product X and to identify opportunities for
                    environmental improvements.

 Study Objectives:  •  Develop a working data base of baseline environmental
                       data for the manufacture of product X.
                    •  Evaluate the working data base for the most critical
                       pollutant types.
                    •  Develop a final data base containing pollutant types
                       considered to be most critical.
                    •  Identify areas where critical pollutant types may be
                       reduced.

 Study Scope:       A confidential industry data source will be used to
                    develop baseline environmental data for one of the
                    client's major product lines.  The resulting data base is
                    to be used on an internal basis only.

                    Primary data will be collected from nine separate
                    facilities.  Study boundaries include raw material and
                    energy inputs, water consumption, air emissions, water
                    effluent, and solid waste generation associated with
                    product manufacturing at these facilities.

                    One of the foreseeable limitations in developing this data
                    base will be that all nine facilities are multiproduct
                    manufacturers.  As a result, facility engineers may be
                    required to use their best judgment to allocate aggregate
                    waste streams to product X.

 Source:  Modified from Arthur D. Little, 1993b.

       Practitioners should ask the following questions when defining the scope of the
LCI:

       •  Will the data be used for LCI only, or will the data be used to conduct an
          impact assessment?
       •  What type of decision will be made using the data (e.g., modifying a product or
          an industrial process, imposing a regulation, or comparing one product to
          another)?
       •  How will altering the scope of the study influence data quality needs?
       •  How will altering data quality affect overall LCI results?
       •  Can data commensurate with the scope of the study be acquired given the
          available resources?

       Note that if the answer to the last question above is "no," there is a need to
reevaluate the study scope and/or the amount of resources  allocated to the study.  After
clearly defining the scope and the general level of data quality, appropriate system
boundaries must be delineated. System boundaries are typically summarized in an
input/output flow chart or matrix, which identifies all inputs and outputs of the overall
system and its subsystem components.
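
       One way such a flow chart or matrix can be organized is sketched below; the
subsystem names, flow categories, and values are hypothetical.

# Sketch: system boundaries represented as an input/output structure for
# each subsystem, from which boundary-wide totals can be aggregated.
system = {
    "raw_materials_acquisition": {
        "inputs":  {"energy_MJ": 12.0, "water_gal": 3.0},
        "outputs": {"air_emissions_kg": 0.4, "solid_waste_kg": 1.1},
    },
    "manufacturing": {
        "inputs":  {"energy_MJ": 55.0, "water_gal": 20.0},
        "outputs": {"air_emissions_kg": 2.3, "solid_waste_kg": 4.0},
    },
}

# Totals across the system boundary fall out of a simple aggregation.
totals = {}
for component in system.values():
    for flow, amount in {**component["inputs"], **component["outputs"]}.items():
        totals[flow] = totals.get(flow, 0.0) + amount

for flow, amount in totals.items():
    print(f"{flow}: {amount}")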

       The input/output parameters needed for an LCI have been thoroughly discussed in
the SETAC (1991) and EPA (1993b) documents. These documents state generally that all
inputs (e.g., energy, raw materials, and water) and outputs  (e.g., air emissions, water
effluent, solid and hazardous wastes, and co-products) of the system under evaluation
should be identified clearly.  System boundaries are typically chosen to accurately portray
the system being studied and to control the uncertainty and cost of the LCA; there is no
single "correct" way to define them.

       When delineating system boundaries it is important, especially for external
applications, to indicate clearly which life-cycle components are excluded by the system
boundaries and how including these components may  affect the LCI and the interpretation
of the overall LCA results.  Practitioners should also consider how different boundary
settings can affect the outcome of the LCI and the  overall  LCA.  For example, a
comparative  study of two existing LCAs for corrugated cardboard production by Ekvall
(1992) found significantly different results, even though the two facilities were located in
the same country and produced the same product using similar processes. It was generally
concluded from this study that differences in system boundaries, surrounding conditions
(e.g., recycling rates, available resources), and data used in the analyses led to  differences
in the overall results (Ekvall, 1992).

       Practitioners should also consider which LCI data sets would be included or
excluded if boundaries were drawn differently (i.e., the data sets that will dictate the
quality of overall LCI results). It is critical that practitioners clearly  define the system
boundaries and explain how those boundaries were determined, as well as any limitations
to the study  they may present. Clearly defined boundaries not only help place LCI results
into proper context but are essential for decisionmakers and other interpreters of the results.
       Once the scoping and boundary issues have been addressed, SETAC recommends
that practitioners submit this information to a previously established peer-review panel
(SETAC, 1992). Depending on the intended use of the study, a peer-review panel can be
made up of members of the organization sponsoring the LCA or persons outside the
process.  In examining study scope, boundaries, and assumptions,  the peer-review panel
should consider:

       •  the purpose for conducting the LCA;
       •  boundary settings;
       •  the choice of functional unit and range of alternatives; and
       •  any assumptions used in determining study boundaries and functional units.

Later reviews will also consider:

       •  anticipated data availability;
       •  proposed data aggregation procedures;
       •  anticipated data variability and sample sizes; and
       •  the consistency of goals and objectives, boundaries, assumptions, and data
          availability (SETAC, 1993a).

Review efforts are intended to help practitioners determine whether the identified scope,
boundary, and data categories are reasonable.  Such efforts help enhance the quality
and credibility of the LCI data used and the overall LCI results.

 3.2   LCI DATA COLLECTION PLAN

        After the scope and boundaries of an LCI have been defined, practitioners can
 outline an LCI data collection plan. Because data collection is a significant part  of an
 LCI budget, the data collection plan is typically prepared before the project is funded
 (Franklin Associates, Ltd., 1993).  Key elements of a data collection plan include the
 following:
       •  defining data quality goals,
       •  identifying data sources and categories,
       •  identifying data quality indicators, and
       •  developing a data questionnaire.

In developing a data collection plan, the DQGs should be aimed at ensuring accuracy and
representativeness of the final results, and data questionnaires should facilitate the
collection of appropriate data (Arthur D. Little, 1993b).

3.2.1  Defining Data  Quality Goals (DQGs)

       The first step in developing an LCI data collection plan is defining DQGs to
indicate where data quality is a high priority and the level of effort necessary to obtain
data of a desired  quality.  DQGs are qualitative statements that define specifications for
the adequacy of data used in an LCI or for certain LCA parameters.  DQGs identify
which data categories are important to the analysis and the features that are desired in
data. For example, if an LCI is conducted to determine the impacts from energy usage
and solid waste generation for a particular industrial process, the practitioner may desire
precise energy and solid waste data, but may be willing to accept more uncertainty in the
other life-cycle parameters.  Sometimes a practitioner may plan to use existing data for
less critical aspects of  the study and collect new or site-specific data for the more critical
aspects (Franklin Associates, Ltd., 1993).  This type of requirement can be summarized in
DQGs.
DQGs provide a framework for balancing available time and resources against the quality
of the data required to make a decision or statement of overall environmental or human
health impact (EPA, 1986a).  DQGs are closely linked to overall study goals and serve
two primary purposes:
       •  to aid practitioners in structuring an approach to data acquisition based on what
          data quality will be stressed in the analysis; and
       •  to serve as data quality performance criteria.

       At a basic level, DQGs should identify the level of data quality sought for
individual data sources used in the LCI.  DQGs may also identify the level of data quality
sought for groups of LCI parameters (e.g., major inputs, outputs, and life-cycle stages) or
for the overall LCI.  Because there presently are no standard requirements for developing
DQGs, practitioners  will determine both the number and nature of DQGs necessary to
meet the goals and scope of the LCI.  Table 3-2 shows an example of a DQG and other
inputs to a data collection plan.

   TABLE 3-2.  CASE STUDY EXAMPLE:  DQGS, DATA SOURCES, AND DQIS

 DQG:            Facility-specific data are required for raw material and
                 energy inputs, water consumption, air emissions, water
                 effluent, and solid waste generation.

 Data Source:    Confidential industry data source.

 Data Type:      Data used will be primary measured, modeled, and nonmeasured
                 (i.e., estimated through best engineering judgments) data.

 Relevant DQIs:  Acceptability, bias, completeness, comparability,
                 representativeness.

 Source:  Modified from Arthur D. Little, 1993b.

Other examples of hypothetical DQGs include the following:

       •  Approximate data values are adequate for the energy data category.
       •  Air emission data should be representative of similar facilities in the U.S.
       •  Primary data for the recycling/reuse data category should be used, rather than
          national averages.

 The goals and scope of the LCI and the accompanying DQGs can impose a structure that
 enables practitioners to assess the quality of key data prior to final use in an LCI.  DQGs
 also provide a clear indication of where in the LCI data quality was made a priority. It
should be a regular practice to report in detail the objectives of the LCI and the related
DQGs.

3.2.2  Identifying Data Sources and Types

       Once DQGs have been identified, practitioners should have an understanding of the
type and quality of data needed for the LCI.  As discussed in Chapter 2, LCI data sets can
comprise data that vary by the primary or secondary nature of the data source,
the type of data employed (e.g.,  measured, modeled, or nonmeasured), and the  level of
data aggregation (e.g., individual plant or industry average data).  It is important that
practitioners recognize these characteristics among data sets and consider how they may
affect the LCI results.  For example, secondary data sources may be less specific than
actual measured or monitored plant-specific data (i.e., primary  data).  Although primary
data are typically more accurate for a specific process, they are not always preferred to
secondary data. The manufacture of basic commodities that are used as  inputs to many
production systems, for instance, may be better represented by  secondary data that reflect
the mix of plants and processes that constitute the commodities market, than by data from
one or a few plants.
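
       A production-weighted average of the kind such secondary data may embody can
be sketched as follows; the market shares and energy values are hypothetical.

# Sketch: a production-weighted average across plants, the kind of market
# mix that secondary data for a commodity may already reflect.
plants = [
    # (share of commodity market output, energy use in MJ per kg of product)
    (0.50, 18.0),
    (0.30, 22.5),
    (0.20, 30.0),
]

weighted_energy = sum(share * energy for share, energy in plants)
print(f"market-mix energy use: {weighted_energy:.1f} MJ/kg")
# Data from any single plant (18.0, 22.5, or 30.0) would misstate the mix.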

       Similarly, aggregated data may provide little indication  of the variability in specific
parameters, such as facility-level air emissions. However, industry aggregated  data may
be preferable when the LCI results are to be used for broad application across the industry
(EPA,  1993b).  Nonaggregated data, on  the other hand, can be  used to calculate statistical
measures, such as the mean, median, standard deviation, and kurtosis, to indicate the
central tendency, variability, and skewness in the data.  For examples of data sources, data
types,  and different levels of aggregated data, refer back to Figure 2-1.

3.2.3  Identifying Data Quality Indicators (DQIs)

       The quality of LCI data can be assessed by evaluating the data using defined
DQIs.  DQIs are quantitative or qualitative terms defining data characteristics that serve  as
benchmarks against which data quality can be assessed to determine whether DQGs have
been met. In addition to  determining whether DQGs have been met, evaluating data
sources against DQIs may also help practitioners to establish levels of confidence for
particular data sources.
       Table 3-3 defines selected DQIs applicable for assessing LCI data quality.
Appendix A provides a more complete description of these DQIs and discusses their
application to LCI data.  Although a large number of potential DQIs have been identified
(see SETAC, 1994) and may be applied in LCIs, the list of DQIs in Table 3-3 has been
condensed to eliminate any overlap between the DQIs.  For example, a longer DQI list
might contain both:  1) quality assurance/quality control (QA/QC)—the degree to which
data were subject to quality assurance and quality control procedures; and 2) verification/
validation—the  degree to which data have been checked for errors or evaluated against
an accepted standard.  The distinction between these potential DQIs can be unclear
because they are both indicators of data acceptability.

       In some cases, DQIs with similar meanings have been combined.  For example,
QA/QC and validation/verification were combined to strengthen the acceptability DQI. In
other cases, some possible DQIs (e.g., data aggregation, accessibility, model
documentation, model limitations, and statistical measures) were not included because
alone they reveal little about the quality or utility of the data. While these DQIs can be
used, in practice they may contribute information to a more encompassing DQI such as
acceptability or representativeness.

        The DQIs in Table 3-3 are expected to be the most practical DQIs for purposes of
LCI data quality assessment.  They are not, however, the only acceptable DQIs.  Other
DQIs may  be used  as long as practitioners take care to explain why and how they were
 applied.  The intent is for practitioners to use and describe the DQIs that are most
 appropriate and applicable to the specific data source being evaluated.
  TABLE 3-3.  SELECTED DATA QUALITY INDICATORS RELEVANT TO LCI

 DQI                     Definition

 Acceptability           The degree to which the data source has been peer
                         reviewed, evaluated against an accepted standard, or
                         checked for errors through expert judgment.

 Bias                    The level of systematic error that causes the mean
                         values of a data set to be consistently (over
                         repeated samples) higher or lower than the
                         corresponding "true" parameter values.  With respect
                         to LCI data, "true" parameter values may not be known.

 Comparability           The degree to which different methods, data sets, or
                         decisions agree or can be represented as similar or
                         equivalent.

 Completeness            The amount of data available for the analysis
                         compared with the amount of data needed or desired.

 Data Collection Method  The level of information describing the method of
 and Limitations         data collection, including any limitations associated
                         with the data collection method.

 Precision               The degree of spread or variability, expressed
                         numerically if possible, in a set of data values or
                         measurements compared to the mean of the data values.

 Referenced              The degree to which data values reference the
                         original data source.

 Representativeness      The degree to which the data represent what the
                         analyst is trying to describe or depict.
3.2.4  Developing a Data Questionnaire

       The data questionnaire is the final component of an LCI data collection plan.
Practitioners usually develop a data questionnaire to obtain primary data. Developing a
data questionnaire is closely tied to, and iterates back to, the defined DQGs.  For example,
the more information collected on the system of interest through the questionnaire, the
more information available for determining whether the DQGs have been met.  If it is
realized that the data will not be adequate to meet DQGs,  the practitioner may need to
redefine the DQGs. Figure 3-3 illustrates the types of information that an LCI data
questionnaire might include.

   [Figure 3-3. Example of a Partial LCI Data Questionnaire.  Source:
   Franklin Associates, Ltd., 1993.  For the manufacture of a stated quantity
   of product, the questionnaire requests: raw materials; net water use
   (intake and output); process energy (purchased and self-generated
   electricity; fuels such as natural gas, coal, residual and distillate oil,
   and wood; and steam); transportation energy (average shipping distance to
   customer by truck, rail, ship, or air); solid wastes (e.g., wastewater
   sludge, packaging materials, off-specification product, trim or scrap, and
   process wastes, noting any materials recovered for recycling); controlled
   atmospheric emissions (e.g., particulates, nitrogen oxides, hydrocarbons,
   sulfur oxides, carbon monoxide, lead, mercury); controlled waterborne
   wastes (e.g., BOD, COD, suspended and dissolved solids, phenol, metal
   ions, ammonia); and any co-products, recovered heat, or steam generated by
   the process.

   NOTE: A complete data questionnaire would include information on chemical
   characteristics (e.g., toxicity, persistence, bioaccumulation) and
   possibly environmental information (e.g., location of release, ambient
   pollution levels).]

       Data questionnaires should be designed based on the expected respondent (e.g.,
corporate manager, facility engineer, or research scientist) and should solicit information
to enable the validity and representativeness of the data to be assessed.  When primary
data are available, practitioners want to obtain data that are representative of the identified
DQGs, information on applicable confidence measures (e.g., error limits and variances),
and a description of the data collection methods (e.g., sampling, direct measurement, or
engineering estimates).  Where primary data are unavailable or not needed, practitioners
will need to identify whether appropriate secondary data sources exist. This will require a
search of the public literature,  data bases, and reports.

       Preparation of a data questionnaire is a multidisciplinary task that may best be
accomplished through a formal or informal workgroup.  Using the study scope as a
baseline, the  workgroup can perform the following tasks:

       • outline specific descriptions of data requirements,
       • review the draft data questionnaire, and
       • modify the data questionnaire to address variations among facilities.

The resulting data questionnaire, whether prepared by a workgroup or not, should provide
a clear understanding of what  data are needed and how they are to be used in the LCI.

       As part of the process of preparing the data questionnaire, the workgroup will also
want to establish the manner, and level of detail, in which data collection procedures from
all key sources (e.g., plant managers, process engineers, industry experts)  will be
documented. Such information on data collection procedures should include any
assumptions employed and/or limitations of the data collection procedures.

       Once the data collection plan is completed, it is recommended that the plan be
reviewed by  a peer-review panel. The purpose of this review is to ensure that the
components of the data collection plan (DQGs, data sources and  categories, and the data
questionnaire) are consistent with the scope and objectives of the LCI.  This review not
only helps to validate the data collection plan but also may help to identify flaws in the
plan so it may be modified accordingly.
3.3    DATA COLLECTION AND QUALITY ASSESSMENT

       Data collection and quality assessment is a critical phase of the LCI data
development process that includes the following:

       •  collecting data,
       •  assessing data sources against relevant DQIs, and
       •  determining data quality.

3.3.1  Collecting Data

       LCIs typically contain a mixture of primary and secondary data.  Collecting
primary data often necessitates the development and use of a data questionnaire, as
discussed in the previous section.  Although data questionnaires are typically not needed
for collecting secondary data, the secondary data values may need to be converted into
units that are consistent with the overall LCI.  The entire LCI data collection effort may
benefit from using a checklist and accompanying data worksheets.  (For examples of an
inventory checklist and data worksheet, refer to EPA, 1993b.)  A checklist helps to guide
data collection and validation and may be tailored to any given product or process.  The
data worksheet has a dual purpose:

       •  as a tool for the analyst to coordinate and assimilate data, and
       •  for use in requesting data from others.

The LCI worksheet is often designed on a module, or subsystem, basis. These subsystems
comprise the basis for aggregating data from individual processes to the life-cycle stages
 (e.g., raw materials acquisition, manufacturing, etc.), and, ultimately, from the  life-cycle
 stages to the overall system level.
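
       The roll-up from modules to stages to the system level can be sketched as follows;
the module names and values are invented for illustration.

# Sketch: rolling subsystem (module) data up to life-cycle stages and then
# to the system level, mirroring the worksheet structure.
modules = [
    # (life-cycle stage, subsystem, solid waste in kg per functional unit)
    ("raw_materials_acquisition", "mining",      1.2),
    ("raw_materials_acquisition", "refining",    0.8),
    ("manufacturing",             "fabrication", 2.5),
    ("manufacturing",             "packaging",   0.6),
]

stage_totals = {}
for stage, _subsystem, waste in modules:
    stage_totals[stage] = stage_totals.get(stage, 0.0) + waste

for stage, total in stage_totals.items():
    print(f"{stage}: {total:.1f} kg")
print("system total:", round(sum(stage_totals.values()), 1), "kg")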

        Primary data collected from multiple facilities can be used to assess the validity
 and representativeness of the data values with greater confidence than data from a single
 source.  Another benefit of collecting data from multiple facilities is the added
 understanding of the variability of particular processes (Arthur D. Little, 1993b).  For
 example, the consumption of energy typically varies to some extent even when the energy
 values are  normalized to a unit value (e.g.,  Btu per unit output).  In addition, variations in
 data among multiple facilities for similar processes can be even more pronounced due to
differences in equipment, technologies, or operating practices.  Because the level of
resources needed to collect data from multiple facilities is necessarily greater, practitioners
will need to consider whether the added confidence in the data and the added
understanding of process variability will affect LCI results significantly enough to merit
these additional resources.

       When primary data are obtained through using a data questionnaire, the data
should be carefully reviewed and analysts should address any questions they may have
with one or more responses (Arthur D. Little, 1993b). When using data from  secondary
sources, it is important to determine how accurately the data represent the specific process
under evaluation.  For example, process input and output values  are dependent on a
number of different factors such as mix and quantity of inputs, use of pollution-control
devices, and process technology.  It is unlikely that any two processes will contain exactly
the same combination of factors. Thus, secondary data must be  carefully evaluated to
ensure that they are adequately representative. The DQGs may provide a useful tool for
informally screening potential data sources, including secondary  data.

3.3.2  Assessing Data Sources Against Relevant DQIs

       An important factor in identifying LCI data quality is determining which DQIs are
most applicable to the analysis.  The selection of DQIs is influenced by the following:

       • the defined DQGs;
       • the type of data (i.e., whether the data are primary or secondary and measured,
         modeled, or nonmeasured);
       • manipulations made to the data (i.e., whether the data  are extrapolated,
         interpolated, or back-calculated); and
       • the type of data quality assessment being conducted (i.e., whether data quality is
         being assessed during the conduct  of the LCI or by  an analyst reviewing an
         existing LCI).

Each factor that influences DQI selection is discussed below.

DQGs
       DQGs are qualitative statements that define specifications for the adequacy of data
used in an LCI or for certain LCA parameters.   DQIs are quantitative or qualitative terms
defining data characteristics that serve as benchmarks against which data quality can be
assessed to determine whether DQGs have been met.  The link between the defined DQGs
and selection of DQIs is key.  DQGs by definition will help practitioners determine which
DQIs are most relevant for LCI data quality assessment.  For example, a DQG that
requires that the LCI data be typical of manufacturing facilities nationwide will
necessarily heighten the significance of the representativeness and comparability DQIs.
       DQGs may also, implicitly or explicitly, help practitioners determine the type of
data that is to be used in the LCI (e.g., primary, secondary, measured, modeled,
nonmeasured). Such data specifications will in turn determine the relevance of DQIs.  For
example, if the DQGs specify that primary measured data be used, then the precision and
bias DQIs may be critical in gauging whether the primary data provide the desired level
of accuracy.

Data Type
       Although data are available from a variety of sources, data can be classified
broadly as either primary or secondary. As discussed in Chapter 2, primary data are
different from secondary data in that an analyst has direct access or direct input into the
data collection process.  Primary data may be measured pursuant to statistical or
nonstatistical protocols, estimated through best professional or engineering judgments, or
generated  from models (see Figure 2-1).  These data may be assessed  against DQIs such
as precision and bias (if statistical parameters are provided), representativeness, and
completeness. Data generated through models can be evaluated in part based on whether
the model was verified and validated.

       Secondary data may be more difficult to evaluate from a data quality perspective
 because they come in multiple forms and may lack an explanation of the data collection
 methods and data variability.  Like primary data, secondary data can be measured,
 modeled,  or estimated. An example of estimated secondary data is EPA's TRI data base,
 which contains chemical release data. In addition  to being monitored, sampled, or
modeled, the TRI data may also be based on professional or engineering estimates
(referred to in this document as nonmeasured data).

       Traditional DQIs—including measures of precision, bias, and completeness—are
quantitative measures for evaluating primary data (EPA, 1993b; Johnson and Ford, 1986).
While these DQIs may be applicable to secondary data, it is more likely that additional
indicators will need to be employed because secondary data typically are not generated
under statistical guidelines.  Qualitative indicators  (e.g., completeness, comparability) and
understanding of the data collection procedures may offer better insight on the limitations
of the data.

       Another important factor that influences the selection of DQIs when evaluating
secondary data sources is the availability of information sufficient for an analyst to assess
the quality of the source. For example, a data source could contain perfect data, but if
there is no way to discern this, the data may not be usable for an LCI intended to influence
public policy, where documented use of high quality data would be required.  When evaluating
secondary data sources, it therefore may be useful to apply DQIs such  as the level of
reference or description of the data collection method to see if adequate information is
provided about how the data were generated. Once it is determined that adequate
information is supplied about the data to assess its quality, the practitioner can proceed
with evaluating the actual data using other quantitative DQIs.

       For these and other primary or secondary nonmeasured data, which cannot be
assessed with more quantitative DQIs, the acceptability DQI may be most
important—indicating whether the data value is widely accepted by peers or colleagues in
the field.
Data Manipulations
       Data quality is influenced not only by the quality of primary and secondary data
sources used, but also by any assumptions or manipulations used to make the data better
suited for use in an LCI.  Using poor assumptions or manipulations can result in the
generation and use of inappropriate data values, even if the quality of a data source itself
is found to be high.   Therefore, it is imperative to assess not only the quality of the data
source but also the assumptions and calculations used to generate the LCI data values.

       For example,  suppose an analyst has Department of Energy (DOE) energy
consumption data for a particular industry.  To determine the appropriate value for an LCI
(i.e., for a facility or group of facilities), the analyst may need to back-calculate (a process
to disaggregate data) from the aggregate DOE data. To do this, the analyst will develop
assumptions and calculations to generate numbers suitable for use in the LCI.  Under this
scenario, DQIs (e.g., representativeness) may be necessary to assess not only the quality
of the data source but also the assumptions and calculations used by the analyst to
generate the LCI data values.

Type of Data Quality Assessment
       The type of data quality assessment being conducted also influences the selection
of DQIs.  There are two basic types  of LCI data quality assessments: those conducted by
the  practitioner while developing  an  LCI and those performed by analysts reviewing
existing and/or published LCIs. For a data quality assessment conducted by the
practitioner, the choice of DQIs to use will be a function of the goals and scope of the
LCI as well as stakeholder needs.  When reviewing the quality of data in an existing or
published LCI, additional DQIs may be needed to indicate, for example, whether the data
 have been referenced and if the data are accessible and reproducible.

3.3.3  Assessing Data Quality

        After a practitioner has selected the relevant DQIs for a particular data source, the
 next step is to evaluate the quality of the data source against the DQIs.  Assessing the
quality of LCI data is important because it provides users with a frame of reference with
which to properly interpret LCI results.  LCI results based on poor quality data may not
be as significant as results based on higher quality data.  Also, while higher quality data
would likely be needed to support external applications of LCI results, practitioners
using LCI results for internal applications should also consider how different levels of
data quality affect the value of those results.  DQIs are used in
the LCI data quality assessment process to determine whether DQGs have been met.  As
discussed in the previous section, the importance of individual DQIs is a function of the
DQGs.  Again, keep in mind that these guidelines do not suggest conducting a detailed
assessment and documentation of every data source used, but rather promote the
assessment of key data sources.  The final LCI report should include any documentation
of data quality issues and assessment results.  In some cases, it may be more appropriate
to provide a summary of data quality results in the text of the final report and place more
detailed data quality information in an appendix.

       Two aspects of assessing  data quality are very important: (1) that each key
data source be evaluated in a consistent and clearly defined manner, and (2) that the
resulting data quality information be clearly communicated.  Table 3-4 contains an
example of an LCI data quality worksheet that provides a framework for evaluating and
documenting the quality of key LCI data sources. The worksheet can be used to identify
the relevance of each DQI to the data source (identified by high/medium/low) and the
quality of the data with respect to each DQI (identified by high/medium/low).  For
example, as shown in Table 3-4, referencing the data source is considered to be of low
relevance but receives a high data quality rating because the document was thoroughly
referenced.  On the other hand, representativeness of the data source is considered to  be of
high relevance and received a high rating because the data were collected from numerous
facilities.  In this  example,  it is more important to have a highly rated representative data
source than to have a highly rated referenced data source.

       It is recommended that each DQI rating be accompanied by explanatory comments
that adequately convey the  rationale for both the relevance and quality ratings for the
DQI. Once the evaluation has been completed for each relevant DQI, an overall data
quality assessment of good, fair,  or poor may be assigned to the data source based on the
relative importance of the DQIs and their respective ratings.
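
       One possible way to combine the relevance and quality ratings into an overall
assessment is sketched below; the weighting and cutoff rules are assumptions, not a
prescribed scoring method, and the ratings echo the Table 3-4 example.

# Sketch: combining per-DQI relevance and quality ratings into an overall
# good/fair/poor call for one data source.  The scoring rule is an
# illustrative assumption only.
WEIGHT = {"low": 1, "medium": 2, "high": 3}

ratings = {
    # DQI: (relevance, data quality rating)
    "acceptability":      ("high", "high"),
    "bias":               ("high", "high"),
    "precision":          ("medium", "medium"),
    "referenced":         ("low", "high"),
    "representativeness": ("high", "high"),
}

# Relevance-weighted average of the quality scores.
num = sum(WEIGHT[rel] * WEIGHT[qual] for rel, qual in ratings.values())
den = sum(WEIGHT[rel] for rel, _qual in ratings.values())
score = num / den

overall = "good" if score >= 2.5 else "fair" if score >= 1.75 else "poor"
print(f"weighted quality score: {score:.2f} -> overall assessment: {overall}")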

       In addition to clearly explaining the specific relevance and quality rating for each
DQI, practitioners may want to take the additional step of documenting the general criteria
used for assigning high, medium, or low quality ratings for each DQI.  These "proof
statements" would provide  general rules for each DQI by which analysts could evaluate a
variety of data sources (Arthur D. Little, 1993b).  Proof statements not only increase  the
clarity of data quality evaluation but may help increase consistency  among practitioners in
how ratings are made. An example of proof statements for rating the degree of bias in
data is shown in Table 3-5.

   TABLE 3-4.  CASE STUDY EXAMPLE:  LCI DATA QUALITY WORKSHEET

 Data Source:         Development Document for Effluent Limitations
                      Guidelines, New Source Performance Standards, and
                      Pretreatment Standards for the Pulp, Paper, and
                      Paperboard and the Builders' Paper and Board Mills Point
                      Source Categories.  U.S. EPA, Office of Water, 1982.

 Data Quality Goals:  •  Effluent data for each compound will be typical for
                         most US paperboard manufacturers.
                      •  Data must capture long-term trends in effluent
                         releases.

 Data Assessed:       Primary water effluents considered for this data source
                      include: BOD5, TSS, pentachlorophenol (PCP),
                      trichlorophenol (TCP), and zinc.

 DQI (Relevance of DQI(1); Data Quality Rating(2)) and Comments

 Acceptability (High; High)
   (A) It is important for the data to have undergone review by an
       independent party.
   (B) The data received extensive review, including being subject to an
       extensive QA/QC methodology.

 Bias (High; High)
   (A) Bias in aggregated data can result from over-reliance on data from a
       new technology, a specific region of the country, or a specific
       processing method.
   (B) Data were collected from over 600 mills covering many technologies,
       processes, and geographic areas.  The data were segregated into
       specific subcategories to avoid overlapping process technologies.
       Long-term sampling programs were employed which mitigate problems from
       cyclic variations (e.g., seasonal, business cycle).

 Comparability (High; High)
   (A) It is important for the data to be comparable to long-term measures of
       industry effluent.
   (B) Several test sites were selected for long-term analysis to provide a
       comparable measure.  Data values compared well to these standards.

 Completeness (Medium; High)
   (A) Representativeness is deemed more important than completeness in this
       analysis.
   (B) There is considerable data in this data source on the target primary
       water effluents from 600 mills.

 Data Collection Method and Limitations (High; Medium)
   (A) Given the broad universe of mills this data source captures, it is
       important for collection methods and limitations to be well documented.
   (B) An extensive description of the data collection methods used is
       provided in the text.  Specific data quality limitations for each mill
       are not identified.

 Precision (Medium; Medium)
   (A) Exact values are not needed; reasonable approximations are adequate
       for the study.
   (B) Few statistical measures are provided to assess the precision of the
       data.  While single values are provided for each mill, from which the
       variation across mills can be assessed, there is no consideration of
       the variation in the data collected.

 Referenced (Low; High)
   (A) Thorough referencing of the data source is not critical given the
       wealth of effluent data collected from the 600 mills.
   (B) The document is thoroughly referenced.

 Representativeness (High; High)
   (A) It is important for the effluent data in this report to reflect
       typical releases.
   (B) Although the data are dated 1982, they are the most recent
       comprehensive collection (from 600 mills) and are deemed to be
       representative of the industry.

 (1) High, medium, or low.  (2) High, medium, or low.
 Source:  Modified from Arthur D. Little, 1993b.
       TABLE 3-5.  EXAMPLE PROOF STATEMENTS FOR THE BIAS DQI

 Data quality ratings for the Bias DQI:

 Low:     There is no clear indication that potential biasing effects have
          been considered.

 Medium:  Potential biasing effects are considered qualitatively, but only
          limited efforts are made to correct for these biases.

 High:    The data collection tool is structured to avoid any potential bias.
          Where bias is identified, additional data are collected or the
          results are adjusted to mitigate the impact of the bias.

 Source:  Arthur D. Little, 1993b.


3.4    EVALUATING MODEL SENSITIVITY AND RESULTS

       As part of the process of collecting and evaluating LCI data, and building a
computational model, practitioners will determine which data values are key to the
analysis.  DQGs will provide overall guidance on where data quality should be
emphasized.  In addition, the following approaches can be used to determine whether a
specific data source is  key to the analysis:

       • Sensitivity analysis, and to a lesser extent, uncertainty analysis, can be used to
         identify the parameters most important to LCI results (as discussed in Chapter
         4).

       • Practitioners can use their knowledge of the data source and its
         relevance/importance to  the overall analysis to determine which data sources are
         likely to have the greatest effect on results.

       In identifying key parameters, the objective is to determine which parameters are
critical (i.e., contribute significantly) to final results and then estimate the level of error in
the data that would significantly alter the final results.  Finding that small changes in an
individual parameter have significant impacts on LCI outputs would indicate that the
parameter is important to the analysis and that high-quality data are more critical for that
parameter.  For example, if an LCI comparing two alternative processes shows that

                                        3-22

-------
processes A and B produce 10 and 11 tons of solid waste, respectively, it may be difficult
to discern whether process A truly differs from process B.  Sensitivity and uncertainty
analysis techniques, discussed in Chapter 4, may help to determine how changing the
values of these two numbers would affect LCI results.  If a "reasonable" change in any
parameter significantly changes final LCI results, the practitioner should consider

       •  collecting additional data to get a more precise estimate of the parameter,
       •  compensating for  missing or deficient data, or
       •  caveating final LCI results (e.g.,  provide ranges or a distribution of values).

       Determining to which model parameters LCI results are most sensitive does not
indicate whether data quality is adequate, but it does provide increased understanding
of the significance of those parameters. This can help identify where additional resources
should be directed to improve LCI data quality or where some degree of qualification is
needed in study results.

3.4.1  Performing Sensitivity and Uncertainty Analyses

       LCI data are generally collected on an iterative basis. After evaluating a data
source against selected DQIs, practitioners  may determine that collecting and reviewing
additional data sources is necessary. Sensitivity analysis, and to a lesser extent,
uncertainty analysis, can be used to indicate which  parameters are most important to the
analysis and where data quality resources should be directed or redirected.

        Sensitivity analysis is a process for evaluating the effect of variations in a system's
input parameters on model results. Sensitivity analysis can be performed on typical
LCI data to determine the importance  of individual LCI data values or combined values
(e.g., total CO2 emissions for the manufacturing stage) on overall LCI results.

        Uncertainty analysis is a process for evaluating uncertainty associated with
 inferences, and may be  applicable to analyzing uncertainty in LCI data and  results.
 Uncertainty analysis may also help to  identify the propagation of uncertainty throughout
 the LCI model, that is, how the uncertainty associated with key data values grows as
 the LCI model carries them through successive calculations before arriving at a final set of
 LCI numbers.
                                         3-23

-------
       Statistical analysis of uncertainty can be difficult in an LCI, where the multiple
data points needed to perform statistical analyses typically are not available. Thus, it is
generally not feasible to calculate actual measures of error or uncertainty for LCI data.
However, several alternative methods of expressing uncertainty in data or final results
may be useful, including the following:

       •  presenting a discussion of possible sources of uncertainty,
       •  identifying estimates of uncertainty associated with specific data or results,
       •  calculating and presenting numbers with appropriate significant figures, and
       •  presenting data and results as ranges rather than absolute numbers (Arthur D.
          Little, 1993b).

       Chapter 4 provides a more detailed discussion of sensitivity and uncertainty
analysis techniques and their application to LCI data.

3.4.2  Determining Whether DQGs Have Been Met

       Once key data sources have been identified and assessed against applicable DQIs
and data values have been evaluated for their impact on LCI results, the practitioner can
determine if the data source, and corresponding data values from that source, meet the
defined DQGs.  If the obtained data quality results meet the DQGs, then the practitioner
proceeds with the study and documents the data  quality results.  If the DQGs are not met,
the practitioner has a number of possible options, which reflect the iterative nature of
setting and achieving DQGs:

       •  collect additional, better quality data that satisfy the DQGs,
       •  redefine the DQGs,
       •  revisit and possibly redefine the goals  and scope of the LCI,
       •  proceed with the LCI and clearly communicate any limitations that should be
          placed on the results,
       •  abandon the LCI, or
       •  apply data compensation  methods to address problems with the data  (see
          Chapter 5 for a discussion of data compensation methods).

In all cases, whether or not  DQGs are met, practitioners should also identify and clearly
communicate any limitations that should be placed on the LCI results.
                                        3-24

-------
3.5    DOCUMENTING AND REFERENCING LCI DATA QUALITY RESULTS

       A final LCA report should include a reference for each data source used and
explicit information on the quality of the key data sources as well as important
assumptions used to generate values. It is important for users to understand how relevant
key data sources and assumptions are to the overall LCI and the corresponding quality of
those data sources.  For example, a practitioner may do a detailed data quality analysis of
ten key data sources where the evaluation indicates that half of the data sources have
"good" data quality and half have "fair" data quality. If the half with good data quality
were more important to LCI results than the half with fair data quality, overall LCI data
quality could be considered good.  Documentation of this information will enhance the
clarity and transparency of the final LCI and LCA results.
       In the final LCI report, practitioners should provide a worksheet, or other means of
 clearly conveying data quality, for each of the key LCI data sources.  In cases where a
 large number of data sources were evaluated, practitioners may want to summarize data
 quality information by using summary graphics relating the quality of key data sources.
 While the use of summary graphics alone is not likely to provide a sufficient level of detail,
 it may serve as a useful supplement to more detailed data quality information.  Broad
 summaries of data quality information should not be provided in lieu of the more detailed
 information in the worksheets.  Ultimately, internal and external users should be able to
 conduct their own quality assessment of key data sources and assumptions based on the
 information provided in the final report.

       To the extent possible, the sensitivity and uncertainty associated with each
 important LCI parameter (or class of parameters) used  in the analysis should also be
 reported. In addition to identifying data quality, the report should identify the degree of
 variation in each parameter.  This information is important to understanding the total
 variability in the final LCI results.
                                         3-25

-------
       Practitioners conducting external studies should also distribute their LCI results for
peer review.  If an LCI is peer reviewed, the results of this evaluation should be included
in the final LCI.  Obtaining general agreement from peer reviewers or colleagues about
the quality of the LCI data and the assumptions and calculations used will lend significant
credibility to the analysis.
                                         3-26

-------
                                   CHAPTER 4
   APPLYING SENSITIVITY AND UNCERTAINTY ANALYSIS TECHNIQUES
                                  TO LCI DATA

       At the beginning of an LCI, sensitivity and, to a lesser extent, uncertainty analysis
techniques may provide a useful screening tool to approximate variability among available
data sources before the data are actually collected. During LCI data quality assessment,
sensitivity and uncertainty analysis techniques may be employed to pinpoint data values,
or sets of values, that contribute significantly to the LCI totals, thus providing
practitioners with information to help determine where to emphasize quality assessment
and improvement efforts.  Practitioners will likely want to consult a statistician in the
appropriate application of these techniques.

       Sensitivity analysis can be
performed on all types of LCIs: internal,
external, comparative, and
noncomparative.  Sensitivity analysis is
typically used to determine the effect of
changes in a model's input parameters on
the model's results (Morgan and Henrion,
1990; Clemen, 1991).  Three methods for
conducting sensitivity analysis are
discussed in this chapter, each requiring
the generation of a mathematical model to
determine the relative importance of
system input parameters.  LCA
practitioners can use these methods  to
determine the importance  of individual
LCI input parameters (e.g., raw materials, energy, and water inputs; and various waste
stream outputs) or combined parameters on overall LCI results.

       Uncertainty analysis is used  to measure the degree to which uncertainty in
individual parameters contributes  to the total uncertainty in model results (Morgan and
Henrion, 1990; Vesely and Rasmuson,  1984; and Finkel, 1990). Uncertainty analysis
involves developing probability distributions or ranges and applying different
mathematical methods to  determine possible combinations of parameters with a high level
                                        4-1

-------
of uncertainty.  In general, as the scope of an LCI broadens, the level of uncertainty in
LCI data and the overall LCI results increases.

       LCI input parameters found to be both highly sensitive and uncertain are prime
candidates to carefully assess, and if necessary, improve data quality. It should be noted
that when results are sensitive to a particular input parameter and the values of that input
parameter are highly variable, data quality is not necessarily a problem. It does, however,
indicate a need to estimate the distribution of output values and incorporate this range into
LCI results.

4.1    SENSITIVITY ANALYSIS TECHNIQUES

       Sensitivity analysis indicates how final results are affected by changes in individual
or combinations of input parameters.  In the context of LCI, the application of sensitivity
analysis may be useful in the following situations:

       • the analyst does not have a high degree of confidence in an important data
         source,
       • the production system being assessed is highly variable, or
       • data for a particular element are missing or deficient (Arthur D. Little, 1993b).

       Sensitivity analysis is typically performed in a series of steps. Individual
parameters are varied one at a time while all other parameters are  held constant. The
impact on model results caused by varying an individual parameter reveals the
importance, or sensitivity, of that parameter to model results (Clemen, 1991; Morgan and
Henrion, 1990). Analysts typically use the following techniques to depict model
sensitivity: one-way analyses of individual parameters  to model results, "tornado graphs"
for one-way analyses, and simple ratios that depict the  sensitivity between parameters in
the same model or in different models (Clemen, 1991; Franklin Associates, Ltd., 1993;
and North, 1990). These techniques highlight the relative importance of model parameters
and may help identify places where additional data collection or compensation efforts are
needed to improve the quality of LCI results (Watson and Buede, 1987). Each of these
techniques is described briefly below.

       When interpreting the results of a sensitivity analysis it is important  to consider the
number of data values upon which model parameters are based, since any given parameter
value could represent an individual data point or the average of many data points.  In

                                       4-2

-------
general, averages of many data points produce results with lower variability and higher
reliability than individual data values (Franklin Associates, Ltd., 1993). Thus, when
interpreting sensitivity analysis results, analysts should refer back to the original data
source and worksheets from which the data value was derived.

4.1.1  Single System (One-Way) Sensitivity Analyses

       Single system, or one-way, sensitivity analysis is useful for evaluating the
importance of individual parameters to model results.  The sensitivity of individual
parameters relative to the system total is determined by calculating the amount individual
parameters would need to change in order for model results to be altered by a given
percentage.  For example, a 10 percent change in the system total may be chosen as a
general threshold above which changes in model results are considered to be "significant."
Practitioners may want to consult a statistician and/or process engineer to help determine
an appropriate threshold level for a specific system.

       Table 4-1 summarizes a hypothetical one-way sensitivity analysis on energy use
and solid waste generation of a generic pulp mill product. The sensitivity of individual
parameters is calculated as the amount each parameter needs to change (while all other
parameters are held constant) in order for final results to be changed by a given
percentage.  In this example, to change total energy consumption by 10 percent or 5.7
million Btu,  raw materials energy would need to change by 380 percent (5.7/1.5 x 100).
To change total solid waste generation by 10% or 164 lbs, raw materials solid waste
generation would need to change by 13,644 percent (164/1.2 x 100). The smaller the
percentage change of an individual parameter needed to change the total by a given
percentage, the more sensitive the total is to that parameter.  Table 4-1 shows that, for
energy consumption, product manufacture is the most sensitive parameter and consumer
use/disposal  the least sensitive parameter.  For solid waste generation, consumer
use/disposal  is the most sensitive parameter and transportation the least sensitive
parameter.
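
       To make the calculation concrete, the following short Python sketch (illustrative
only; it simply reproduces the Table 4-1 energy arithmetic) computes the percentage by
which each parameter must change to move the system total by 10 percent:

    # One-way (single-system) sensitivity: the percent change in each
    # parameter needed to shift the system total by a chosen threshold.
    # Values are the hypothetical energy data (million Btu) from Table 4-1.

    energy = {
        "Raw Materials": 1.5,
        "Pulp Mill": 18.0,
        "Product Manufacture": 28.0,
        "Packaging, etc.": 6.6,
        "Transportation": 2.5,
        "Consumer Use/Disposal": 0.1,
    }

    threshold = 0.10                      # a 10% change in the total
    total = sum(energy.values())          # 56.7 million Btu
    delta = threshold * total             # 5.67 million Btu

    for name, value in energy.items():
        pct_change = delta / value * 100  # e.g., 5.67/1.5*100 = 378%
        print(f"{name:25s}{pct_change:10,.0f}%")

    # The smaller the printed percentage, the more sensitive the total
    # is to that parameter (product manufacture, at about 20%, is the
    # most sensitive here).
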

4.1.2   Tornado Diagrams

        Tornado diagrams are another method to illustrate model sensitivities. Tornado
diagrams, typically created as bar graphs, indicate the degree to which model outputs vary
due to predicted changes  in individual parameters (Clemen, 1991).  Instead of determining
the change in individual parameters needed to change model results by a given percentage

                                         4-3

-------
     TABLE 4-1. HYPOTHETICAL SENSITIVITY ANALYSIS WORKSHEET
         Calculation of the percentage by which parameter values must change
                         for the total to be changed by 10%

                                  Energy (mil Btu)
                                       Value      10% Sensitivity (%)
  Raw Materials                          1.5             378
  Pulp Mill                             18.0              32
  Product Manufacture                   28.0              20
  Packaging, etc.                        6.6              86
  Transportation                         2.5             227
  Consumer Use/Disposal                  0.1           5,670
  Total                                 56.7

  Example: To change the system energy results by 10% (5.7 mil Btu),
            product manufacture energy consumption must change by 20%
            (5.6 mil Btu).

                                  Solid Waste (lbs)
                                       Value      10% Sensitivity (%)
  Raw Materials                          1.2          13,644
  Pulp Mill                            442.0              37
  Product Manufacture                  155.0             106
  Packaging, etc.                       84.0             195
  Transportation                         0.1         163,730
  Consumer Use/Disposal                955.0              17
  Total                              1,637.0

  Example: To change the system solid waste results by 10% (164 lbs),
            consumer use/disposal generation of solid waste must change
            by 17% (162 lbs).

 Source: Modified from Franklin Associates, Ltd., 1993.
(as described in the previous section), tornado diagrams illustrate the change in model
results for given levels of change in individual parameters.
                                      4-4

-------
       In the tornado diagram shown in Figure 4-1, the top bar, which is always the
longest, represents the most sensitive parameter (i.e., the parameter that has the greatest
relative impact on model results), and the bottom bar, which is always the shortest,
represents the least sensitive parameter (i.e., the parameter that has the smallest relative
impact on model results). The bars on the diagram are created by developing a model of
the system or problem under evaluation.  Then high and low values are selected for each
parameter while all other parameters are held constant.  The lengths of the bars represent
the extent to which air releases change relative to given changes in LCI parameters.  In
this example, a 1- to 10-percent change in virgin materials has the greatest effect on total
air releases and is thus considered the most sensitive parameter.
[Figure 4-1 is a tornado diagram (a horizontal bar chart) of air releases (lb), plotted on
a scale of 0 to 30,000 lb, for the following parameters, listed from the longest bar to the
shortest: Virgin Material, Energy Consumption, Recycled Materials, Water Use,
Transportation, Packaging, Water Releases, Hazardous Waste Releases, and Solid Waste
Releases.]

                 Figure 4-1.  Sensitivity of Air Releases for Product X

       Table 4-2 presents a hypothetical data worksheet for constructing a tornado
diagram of air releases associated with given input parameters for product X.  To keep the
analysis simple, low and high input parameter values were set at 1 and 10 percent of the
individual parameter values. In practice, process engineers and industry  experts may need
to be consulted to help determine appropriate low and high input parameter values.
                                         4-5

-------
 TABLE 4-2. HYPOTHETICAL TORNADO DIAGRAM WORKSHEET FOR
              PRODUCT X

                              Range of Change in      Range of Change in Air
 Model Parameters             Parameter Values*       Releases (lbs)
 Virgin Material                   1 - 10%              1,000 - 30,000
 Energy Consumption                1 - 10%              1,000 - 24,000
 Recycled Materials                1 - 10%              1,000 - 19,000
 Water Use                         1 - 10%              7,500 - 24,000
 Transportation                    1 - 10%              5,500 - 18,500
 Packaging                         1 - 10%              5,000 - 14,000
 Water Releases                    1 - 10%              8,500 - 12,500
 Hazardous Waste Releases          1 - 10%              9,000 - 11,000
 Solid Waste Releases              1 - 10%              9,750 - 10,075

 Example: A 1- to 10-percent change in water usage will result in a 7,500- to
           24,000-lb change in air releases.

 * Note that the range of change of 1 - 10% was chosen for purposes of example only.
   In practice, a range of reasonable values would be more appropriate.
Table 4-2 shows how 1- to 10-percent changes in input parameter values affect air
releases. The larger the range of change in air releases, the greater the effect the input
parameter will have on the final LCI results.
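
       The following Python sketch (illustrative only; the ranges are the hypothetical
product X values from Table 4-2) shows how such a worksheet translates into a tornado
ordering, with the widest output range placed at the top of the diagram:

    # Tornado-diagram ordering: rank parameters by the width of the air
    # release range produced by a 1- to 10-percent change (Table 4-2 data).

    ranges = {
        "Virgin Material": (1_000, 30_000),
        "Energy Consumption": (1_000, 24_000),
        "Recycled Materials": (1_000, 19_000),
        "Water Use": (7_500, 24_000),
        "Transportation": (5_500, 18_500),
        "Packaging": (5_000, 14_000),
        "Water Releases": (8_500, 12_500),
        "Hazardous Waste Releases": (9_000, 11_000),
        "Solid Waste Releases": (9_750, 10_075),
    }

    # Widest bar first: the most sensitive parameter sits at the top.
    ordered = sorted(ranges.items(),
                     key=lambda kv: kv[1][1] - kv[1][0],
                     reverse=True)
    for name, (low, high) in ordered:
        print(f"{name:26s}{low:>8,} -{high:>8,} lbs  (width {high - low:,})")
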

       Although the tornado graph is a relatively simple sensitivity analysis technique, it
requires developing a mathematical model to evaluate the relationship between model
inputs and outputs.  Practitioners analyze  numerous parameters and information sources
when building an LCI. The pertinent information typically is entered into a computer and
a model of the major input/output processes is developed.   Practitioners may want to
consult a statistician for assistance in

       •  designing the mathematical model,
       •  identifying important model input parameters, and
       •  analyzing and interpreting sensitivity analysis results.
                                        4-6

-------
       The tornado diagram technique is most applicable to an evaluation of the
sensitivity in single systems.  If a comparative LCA is conducted, this approach can be
applied to each model independently, and the results of each tornado diagram can be
compared to determine the most sensitive input parameters in each system.

4.1.3  Ratio Sensitivity Analysis

       Sensitivity in model parameters can also be expressed as a ratio (North, 1990;
Morgan and Henrion, 1990).  This method is a comparative analysis and is applicable
only to comparative LCAs.  Rather than varying individual parameters one at a time to
determine the effect on model results, two systems or processes can be compared
according to an identified criterion (e.g., total emissions, where the process with the
lowest emissions is ranked higher). A ratio is calculated to determine the percentage a
parameter would need to change to reverse the rankings.

       Table 4-3 provides an example of energy consumption data for two different
production and delivery processes. Energy usage is broken down by the different
components of the process:  packaging processes, distribution of the product to the retail
sector, and final disposition of the product.  Based on a "least emissions" criterion, the
two processes are ranked by total energy usage. System 1 uses approximately 16.8
million Btu, and System 2 uses approximately 13.2 million Btu. Thus, System 2 is ranked
higher (less energy consumption) than System 1. The sensitivity is expressed as the ratio
of the difference in total energy consumption (16.8 - 13.2 = 3.60 million Btu) over the
amount of energy  consumed by each process component.  For example, for the
"distribution to retail" component of System 1, the percent error is 3.60/1.03 = 3.49 or
approximately 350 percent.  This indicates that energy consumption in the retail
component of System 1  would have to  change or be in error by  approximately
 350 percent to reverse the rankings of the two systems. Given that the data are unlikely
 to be in error to this order of magnitude, the "distribution to retail" component of System
 1 would not be considered important (or sensitive) to the overall results.  However, if a 5-
 or 10-percent error in the data could reverse the rankings of the two systems, this would
 indicate where data quality resources may best be focused.
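
       The ratio calculation itself is straightforward to script. The Python sketch below
(illustrative only; the totals and the single component value are taken from Table 4-3)
computes the percent error needed to reverse the ranking of the two systems:

    # Ratio sensitivity for a comparative LCI: the percent error in one
    # component needed to reverse the ranking of two systems on total
    # energy consumption (values from Table 4-3).

    total_system_1 = 16.783   # million Btu
    total_system_2 = 13.186   # million Btu
    gap = total_system_1 - total_system_2   # about 3.60 million Btu

    def percent_error_to_reverse(component_value):
        """Percent by which a component must be in error to close the gap."""
        return gap / component_value * 100

    # "Distribution to retail" in System 1 consumes 1.03 million Btu:
    print(f"{percent_error_to_reverse(1.03):.0f}%")   # approximately 350%
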
                                         4-7

-------
 TABLE 4-3.  ENERGY SENSITIVITY ANALYSIS ON TWO PRODUCTION
              SYSTEMS

                                     Energy              Percent Error Needed
                                   Consumption           to Reverse Total
 System Components                 (million Btu)         Ranking
 System #1
   Component 1                         4.200                    86
   Component 2                         6.300                    57
   Component 3                         0.555                   649
   Component 4                         0.921                   391
   Component 5                         1.300                   277
   Subtotal                           13.276
   Primary Packaging                   1.200                   300
   Secondary Packaging
     Corrugated                        0.200                 1,800
     Stretch Wrap                      0.050                 7,200
     Subtotal                          0.250
   Filling and Packaging               0.985                   365
   Distribution to Retail              1.030                   350
   Disposition                         0.042                 8,571
   Total                              16.783                    21
 System #2
   Component 1                         3.600                   100
   Component 2                         3.200                   113
   Component 3                         0.640                   563
   Component 4                         1.350                   267
   Subtotal                            8.790
   Primary Packaging                   2.300                   156
   Secondary Packaging
     Corrugated                        0.260                 1,385
     Stretch Wrap                      0.032                11,250
     Subtotal                          0.292
   Filling and Packaging               0.752                   479
   Distribution to Retail              0.996                   361
   Disposition                         0.056                 6,429
   Total                              13.186                    27

Source: Franklin Associates, 1992.
                                 4-8

-------
       Although straightforward, the ratio sensitivity analysis approach has some
limitations.  First, it is applicable only to comparative LCIs.  If a single product or
process were under evaluation, a second system would be needed for comparison.
However, it is not clear whether an existing LCI would be suitable for use in this
situation.  Also, the percent-error calculation evaluates process components separately
when in actuality each component  is part of a complex industrial system.  Therefore,
calculating a simple ratio or percentage of error may fail to show the interrelationships
between the many components of an industrial process.

4.2    UNCERTAINTY ANALYSIS

       Uncertainty analysis is used to determine how various sources of uncertainty in
parameter inputs—such as  incomplete information, variability, or the structure and
simplifying assumptions used in the creation of a model—influence the uncertainty in
model results.  The basic principle of uncertainty analysis is to assign probabilities to
uncertain parameters and then use  statistical or mathematical techniques to determine the
uncertainty in model outputs. In general, probabilities are assigned based on past
empirical data or expert or subjective judgment (Keeney and Raiffa, 1976).

        Uncertainty analysis may be used to better understand the importance of a data
source or model and for determining whether to acquire additional information (Morgan
and Henrion, 1990).  Uncertainty analysis techniques may also be used to determine
where data quality resources could best be focused and to determine the total uncertainty
in the final LCI results. As a cautionary note, it should be recognized that the use of
expert or subjective judgment in uncertainty analysis can create a "garbage in, garbage
out" problem.  That is, very elegant uncertainty distributions may be developed, but they
may consist entirely of pseudo-information of unknown accuracy.

 4.2.1  Sources of Uncertainty

        The process of  determining sources of uncertainty involves asking a carefully
 stated question for which the answer is uncertain and itemizing all the sources of
 uncertainty that could contribute to the answer (Finkel, 1990). Sources of uncertainty that
 are useful to consider when collecting and analyzing data include (1) random error in
 measurement and sampling methods, (2) systematic error in measurement and sampling
 methods, (3) natural variability, and (4) approximation in modeling.
                                         4-9

-------
Random Error
       The most commonly studied and best understood cause of uncertainty is random
error in direct measurements of a quantity.  Random error results from imperfections in
the measuring instrument and observational technique (Morgan and Henrion, 1990).
Taking repeated measurements can help reduce the effects of uncertainty caused by
random error.

To estimate random error, an analyst must have the following:

       •  primary or secondary statistically sampled data,
       •  a complete description of the data collection method, and
       •  nonaggregated data.

With all three of the prerequisites, the process of describing uncertainty is relatively
straightforward because standard statistical measures (e.g., standard deviation, confidence
intervals) can be used. Probability sampling is necessary to quantify sampling error.
Replicated measurements are needed to quantify measurement error. Although such an
approach may not be amenable to nonparametric samples, it does provide a powerful tool
for describing uncertainty associated with statistically sampled data.

Systematic Error
       Unlike random error, systematic error results from an inherent flaw or bias in the
data collection or measurement process (Finkel, 1990).  Systematic error may be caused,
for example, by improper calibration of measurement instruments or by the continuous
misreading of a measurement instrument due to such factors as poor training or the angle
at which the gauge is read.  In a survey, systematic error occurs when questions are
phrased so that an unintended response is consistently obtained. Or, the sampling frame
may systematically exclude members of one segment of the population.

       Systematic error can be reduced by careful design of the data collection plan.
Some  degree of systematic error will still exist, however, no matter how carefully the plan
is designed.  For systematic error that cannot be eliminated, and is large enough to be of
concern, subjective probability distributions can be assigned to the possible effects of the
systematic error. For example, the use of blocks and spiked samples can allow the bias to
be estimated and statistically compensated for.
                                        4-10

-------
       In assigning a subjective probability distribution to a series of possible effects, an
analyst may use expert opinion to assign numerical probabilities to the occurrence of an
event,  which represents the frequency with which the event would occur in repeated trials
(Kennedy, 1991).  Expert opinion is the likely method for developing these subjective
probability distributions.  Once the distribution has been developed, methods for
propagating and analyzing the effects of uncertainty can be employed.

Natural Variability
       Variability refers to the natural fluctuation in variables over time or space.
Examples include the salinity  of sea water, phosphate concentrations in waste water, and
the heights of 20-year-old Americans.  Assessing the uncertainty caused by natural
variability of a quantity can be simple provided that enough data are available on the
quantity to form a frequency distribution. Unlike a probability distribution, which
provides a probability of obtaining a specific value for a variable, a frequency distribution
expresses the number of times a variable takes on a specific value.  Thus it shows which
values have occurred more often. A relative frequency distribution expresses
the percentage of time a variable has taken a specific value.
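
       A relative frequency distribution is simple to construct when nonaggregated data
are available. The short Python sketch below (the concentration values are invented for
illustration) tallies the share of observations taking each value:

    from collections import Counter

    # Relative frequency distribution: the share of observations taking
    # each value (invented phosphate concentrations, mg/L).
    observations = [0.4, 0.5, 0.5, 0.6, 0.5, 0.4, 0.7, 0.5]
    counts = Counter(observations)
    for value in sorted(counts):
        share = counts[value] / len(observations)
        print(f"{value} mg/L: {share:.0%}")
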

       If LCI data are aggregated, it may be impossible to construct a frequency
distribution.  However, if a frequency distribution is  available  and is considered to be
appropriate for assessing uncertainty, the uncertainty about a quantity can be represented
by a probability distribution with the same parameters  as the frequency distribution.
Analysts may also assign ranges with defined boundaries to each uncertain variable.
Uncertainty analysis methods  (described in Section 4.2.3) can  then be used to analyze the
effects of parameter uncertainty.

Approximation in Modeling
       Uncertainty resulting from approximation may be assessed in the industrial process
model used to generate final results or in a primary or secondary modeled data source
used as an input to the LCI.  Uncertainty from approximation occurs when models are
used or created.  Designing a model that truly represents the process or system of concern
is virtually impossible.  Therefore, certain approximations and assumptions must be made.
                                         4-11

-------
       Specific sources of uncertainty in a model include (1) model structure, (2)
abnormal conditions, (3) excluded variables, and (4) surrogate variables (Finkel, 1990).  A
lack of independence between input variables is an additional source of uncertainty.
Inappropriate model structure can lead to results that do not represent the process being
modeled. Abnormal conditions may prevent generalizing a model of the system of
interest.  Model verification and validation can help pinpoint areas where uncertainties
may be reduced by denoting why the model structure is inappropriate or by describing any
abnormal conditions that may exist. Excluding variables can increase the uncertainty
associated with a model because one or more variables that have a strong influence on the
outcome may be inadvertently left out or dropped to increase computational efficiency.
For LCI data, sensitivity analysis or expert opinion could be used to determine which
variable(s) might be considered for exclusion from the model.  Using surrogate variables,
or proxies (discussed in Chapter 5), can also add uncertainty because they do not measure
the exact quantity of interest.

4.2.2  Qualitative Measures for Controlling Data Errors and Uncertainty

       Determining data error may be done with the statistical  techniques just discussed
as well as through more qualitative means.  The following qualitative, proactive measures
should help practitioners reduce errors and uncertainty in LCI results:

       •  careful planning and execution of the LCI data development process,
       •  formal and informal reviews of the LCI data development process,
       •  focusing data quality efforts on data values that have the greatest influence on
          LCI results, and
       •  ensuring that sample sizes for key data values are as large as  possible, thus
          increasing the reliability of the estimates and decreasing variability (Arthur D.
          Little, 1993b).

       In the planning and execution of the LCI data development process, each step in
the process  must be performed carefully to maintain consistency with the study goals and
scope, to minimize errors, and to ensure that final results  represent the system of interest.
For example, DQGs should be defined in a manner that fosters the use of representative
data, the study scope and boundary settings should include all relevant life-cycle inputs
and outputs of the system of interest, key assumptions should be substantiated and clearly
documented, the data collection questionnaires  should be designed to obtain the
appropriate data values, and so on.

                                         4-12

-------
       To help ensure that final results are representative, formal and informal review of
the data and the data development process can be conducted. Formal review can include
development of data quality worksheets and submission of the data collection plan and
final worksheets to a peer-review panel.  In addition to formal review, informal reviews
by analysts, peers, industry experts, and consultants may take place in each of the data
development steps and may help to minimize errors that could undermine results. For
example, a reviewer may discover that in the development of process input/output charts,
an important aspect of the process was inadvertently excluded.  Thus, even if high quality
data are used, the final results may not be representative—these types of omissions  may
be avoided through a carefully planned LCI review process.

       Focusing data quality efforts on data values that have the greatest influence  on LCI
results refers back to the issues of large versus small data values (see Section 2.3) and
sensitivity (see Sections 3.4 and 4.1). Large data values generally contribute
proportionately more to  final LCI results  and the quality of those results than do smaller
data values.  Therefore efforts  to reduce errors and uncertainty in larger data values can
provide a proportionately greater improvement on overall data quality. Data values to
which a specific LCI parameter is highly sensitive, and that contribute significantly to the
value of that parameter, are also good candidates for focusing data quality efforts. By
improving the quality of highly sensitive data values, the quality of overall parameter
values may also be improved.

       Regarding sample size and variability, practitioners should consider two important
factors:

       •  Outliers in the data set, both large and small, have the greatest influence on the
          total variance and uncertainty of the data.
        •  A smaller total number of data values means less reliability in results (Franklin
          Associates, Ltd., 1993).

 These two  factors highlight the importance of achieving maximum sample sizes for key
 data inputs and carefully evaluating outliers to ensure that they are legitimate values.
                                          4-13

-------
4.2.3  Quantitative Methods of Uncertainty Analysis

       Uncertainty analysis involves determining how the uncertainties involved with
input parameters affect the uncertainty of model results. There are a variety of methods
for analyzing uncertainty.  This section describes three possible methods for analyzing
uncertainty: uncertainty propagation, the Gaussian approximation method, and
Monte Carlo simulation.

Uncertainty Propagation
       Any measured value has associated with it some degree of uncertainty for which
bounds must be determined in order for this uncertainty to be meaningful (EPA, 1978).
When such a measured value is used in a calculation, its associated uncertainty results in
an uncertainty in the computed value. Thus, one is then faced with determining
uncertainty bounds on the computed value based on uncertainty associated with input
values.
       The manner in which uncertainties propagate in a calculation depends on the
functional relationship between the input variables and the type of error (random or
systematic) involved (EPA, 1978).  Uncertainty propagation formulas for random and
systematic errors are listed in Table 4-4 for four basic arithmetic operations.  The
formulas provide confidence intervals for functions of two independent variables.  From
these confidence intervals, bounds may be placed on the degree of uncertainty associated
with a final calculated value that has been propagated from the two independent input
variables. In the context of LCI, these functions may be useful for identifying the level of
uncertainty associated with calculated data values and/or the final results of the LCI
computation model.  For a more complete discussion of uncertainty propagation and the
application of uncertainty propagation formulas, refer to EPA (1978).

The Gaussian Approximation Method
       The Gaussian method takes into account both  sensitivity and uncertainty. Like
sensitivity analysis, low and high values representing plausible ranges or probability
distributions are selected for individual parameters while all other variables are held
constant to determine the relative uncertainty associated with each variable and with
overall results. Once a mathematical model for the system of interest has been developed,
rates of change in the model results are calculated based on unit changes in a model
parameter, holding all other variables constant.  The model results are then evaluated
against theoretical values to provide initial "best guess" values for model parameters.

                                        4-14

-------
TABLE 4-4.  UNCERTAINTY PROPAGATION FORMULAS FOR RANDOM AND
             SYSTEMATIC ERRORS

 Random Error
 Operation                 Uncertainty Propagation Formula
 Addition, x1 + x2         (A + B) +/- sqrt(a^2 + b^2)
 Subtraction, x1 - x2      (A - B) +/- sqrt(a^2 + b^2)
 Multiplication, x1x2      AB +/- sqrt(B^2 a^2 + A^2 b^2)
 Division, x1/x2           (A/B) +/- sqrt(a^2/B^2 + A^2 b^2/B^4)

 NOTE: A +/- a and B +/- b are confidence intervals for x1 and x2.  The formulas give
        confidence intervals for the various mathematical operations performed with x1 and
        x2.  The formulas are valid only when A and B are statistically independent.

 Systematic Error
 Operation        Lower Bound                            Upper Bound
 Addition         A + B - (a + b)                        A + B + (a + b)
 Subtraction      A - B - (a + b)                        A - B + (a + b)
 Multiplication   AB + sgn(AB)ab - (a|B| + b|A|)         AB + sgn(AB)ab + (a|B| + b|A|)
 Division         A/B - (a|B| + b|A|) /                  A/B + (a|B| + b|A|) /
                    (B^2 + sgn(AB)b|B|)                    (B^2 - sgn(AB)b|B|)

 NOTE: A +/- a and B +/- b are error bounds for x1 and x2.  The formulas give upper and
        lower bounds for the four basic mathematical operations performed with x1 and x2.
        The formulas are valid only when x1 and x2 are functionally independent variables.

 NOTE: These equations involve strong assumptions regarding the shape of the underlying
 distributions, as well as their independence. Refer to EPA (1978) for a complete discussion
 of these issues.

 Source: EPA, 1978.
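
       For the random-error case, the Table 4-4 formulas translate directly into code. The
Python sketch below is a minimal rendering under the same assumptions stated in the
table (independent inputs and symmetric confidence intervals); it is not a substitute for
the full treatment in EPA (1978):

    import math

    # Random-error propagation for two independently measured values,
    # x1 = A +/- a and x2 = B +/- b, following the Table 4-4 formulas.

    def add(A, a, B, b):
        return A + B, math.sqrt(a**2 + b**2)

    def subtract(A, a, B, b):
        return A - B, math.sqrt(a**2 + b**2)

    def multiply(A, a, B, b):
        return A * B, math.sqrt(B**2 * a**2 + A**2 * b**2)

    def divide(A, a, B, b):
        return A / B, math.sqrt(a**2 / B**2 + A**2 * b**2 / B**4)

    # Example: adding two emission values, 10 +/- 1 and 20 +/- 2 tons:
    value, bound = add(10, 1, 20, 2)
    print(f"{value} +/- {bound:.2f}")   # 30 +/- 2.24
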
                                       4-15

-------
       Following the evaluation of the model results, each rate of change, evaluated at the
best guess values, is squared and multiplied by the variance of the corresponding input
parameter.  If the individual parameters are independent of each other, the quantities
calculated above may be summed to produce an estimate of the variance of the model
results.  If the individual parameters are not independent of each other, the covariance of
each pair of explanatory variables must also be incorporated into the calculation.

       The Gaussian method allows for the assignment of ranges or probability
distributions for uncertain model inputs to assess the composite uncertainty in model
outputs.  This method can also reveal which sensitive variables are also uncertain.  If
parameters identified as most important (or  sensitive) to LCI results are also highly
uncertain, practitioners should focus data quality resources on these parameters. "When a
decision must be made about whether to expend resources to acquire additional
information, in general, the greater the uncertainty, the greater the expected value of
additional information"  (Morgan and Henrion, 1990).

       The formulas for these calculations and a detailed discussion of the Gaussian
method are beyond the scope of this document.  The reader  should consult a statistician or
refer to Morgan and Henrion (1990) for a more complete discussion of the Gaussian
method.
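
       The following Python sketch illustrates the core of the method under simplifying
assumptions (a hypothetical two-parameter model, independent inputs, and invented
best-guess values and variances): the rate of change for each parameter is evaluated at
the best-guess inputs, squared, multiplied by the input variance, and summed:

    import math

    # First-order (Gaussian) approximation of output variance for
    # independent inputs: Var(Y) ~ sum_i (dY/dX_i)^2 * Var(X_i),
    # with the rates of change evaluated at the best-guess inputs.

    def model(x):
        # Hypothetical LCI relationship (illustrative only).
        return x[0] * x[1]

    best_guess = [5.0, 1_000.0]    # assumed best-guess parameter values
    variances = [0.25, 10_000.0]   # assumed input variances

    def rate_of_change(f, x, i, h=1e-6):
        """Numerical rate of change of f with respect to x[i]."""
        hi = list(x); hi[i] += h
        lo = list(x); lo[i] -= h
        return (f(hi) - f(lo)) / (2 * h)

    var_y = sum(rate_of_change(model, best_guess, i) ** 2 * variances[i]
                for i in range(len(best_guess)))
    print(f"approximate output standard deviation: {math.sqrt(var_y):,.0f}")
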

Monte Carlo Simulation
       Monte Carlo simulation involves  simulating a system by randomly selecting values
for each explanatory variable in a model (as described in the Gaussian-approximation
section), then plugging the values into a functional equation and producing a simulated
output value.  Monte Carlo simulation requires the following steps:

       • model the data generating process,
       • employ the generating process and an estimator  to create several estimates, and
       • use  these simulated data to estimate the population distribution properties,
         assuming that the model is at least approximately  correct (Kennedy, 1991).

       For a more complete description of these steps refer to Kennedy (1991). Through
the use of computers, this process is repeated hundreds or thousands of times to generate
a simulated distribution for the output variable (Morgan and Henrion, 1990).
                                        4-16

-------
       The random selection of the explanatory variables is typically based on frequency
distributions or Bayesian methods. The simulated values are usually assumed to be
independent of each other because this allows for easier statistical analysis.
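
       The following Python sketch illustrates the basic loop under simplifying
assumptions (a hypothetical two-parameter LCI relationship and invented normal input
distributions); a real application would use distributions justified by the data:

    import random
    import statistics

    # Monte Carlo simulation: repeatedly draw uncertain inputs, run them
    # through the model, and examine the distribution of outputs.

    def lci_model(energy_per_unit, units):
        # Hypothetical relationship (illustrative only); 0.8 is an
        # assumed emission factor.
        return energy_per_unit * units * 0.8

    random.seed(42)                 # fixed seed so the run is repeatable
    outputs = []
    for _ in range(10_000):
        energy = random.gauss(mu=5.0, sigma=0.5)      # assumed distribution
        units = random.gauss(mu=1_000.0, sigma=100.0)
        outputs.append(lci_model(energy, units))

    print(f"mean  = {statistics.mean(outputs):,.0f}")
    print(f"stdev = {statistics.stdev(outputs):,.0f}")
    print(f"range = {min(outputs):,.0f} to {max(outputs):,.0f}")
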
                                         4-17

-------

-------
                                    CHAPTER 5
       COMPENSATING FOR MISSING DATA AND DATA DEFICIENCIES

       This chapter addresses missing data (actual missing values) and data deficiencies,
such as weaknesses in measurement techniques, and describes mathematical techniques
used to compensate for missing data and data deficiencies encountered during the LCI
data development process.
[Sidebar: It is critical that practitioners clearly communicate the effect of data
compensation methods on LCI results.]
       While the methods described are
suitable for addressing many problems
with LCI data, they vary in complexity
and may require consultation with experts
or the use of advanced statistical
techniques.  The methods described in this
chapter have not been tested in LCI
applications.  It is recommended that
practitioners consult experts from the
applicable methodological field before
applying these techniques.
5.1    MISSING DATA AND DATA DEFICIENCIES

       Practitioners must frequently compensate for missing or deficient data to complete
the LCI (Arthur D. Little, 1993b).  Three general types of missing data are:

        •  data that may not be available for a given industry, facility, product, or process;
        •  data that may be available for a product but not for all the components of the
          process used in making that product; and
        •  data derived from a survey that may be missing due to nonresponse by facilities
          receiving the survey (unit nonresponse) or nonresponse to specific items on a
          questionnaire (item nonresponse) (Lepkowski, Landis, and Stehouwer, 1987).

        Data can be missing for other reasons. For example, a data set with values on air
 emissions may have a previously determined threshold level below which values are
 marked with a code instead of the actual value.  Other values may be missing on the basis
 of a second variable—that is, if information for a variable is not available, values for
                                         5-1

-------
other variables that are dependent on the initial variable may also be indeterminable
(Azen, Guilder, and Hill, 1989).

       Unlike missing data, data deficiencies are associated with the methods involved in
collecting, synthesizing, analyzing, and describing data. Some causes of data deficiencies
are faulty sampling design, data collection methods, aggregation techniques, description of
variables in data sets, and measurement methods (Plewa et al., 1988). For example, if
data were collected on electricity usage for all firms within an industry, but the reported
data were rounded up to the nearest 100 kWh before any analysis, measures such as the
sum, sample mean, and  sample standard deviation would be biased.

5.2    ADJUSTING FOR MISSING DATA AND DATA DEFICIENCIES

       As discussed in section 3.4.2, once key data sources have been identified and
assessed against applicable DQIs and data values have been evaluated for their impact on
LCI results, the practitioner can determine if the data source, and corresponding data
values from that source, meet the defined DQGs.  If the obtained data quality results meet
the DQGs, then the practitioner proceeds with the study and documents the data quality
results. If DQGs are not met (i.e., data are missing or deficient) practitioners have several
options:

       •  collect additional, better quality data that satisfy the DQGs,
       •  redefine the DQGs,
       •  revisit and possibly redefine the goals and scope of the LCI,
       •  proceed with the LCI and clearly communicate any limitations that should be
          placed on the  results,
       •  abandon the LCI, or
       •  apply data compensation methods to address problems with the data.

       A number of different factors contribute to whether the practitioner should
undertake compensation efforts. Such factors include:

       •  importance of the data relative to other missing or deficient data points;
       •  ease or feasibility of data compensation (this is not discussed); and
       •  resource constraints (time, money, and technical knowledge).
                                        5-2

-------
[Sidebar: It is critical that practitioners clearly communicate the effect of these
techniques on LCI results.]

5.2.1  Imputation Methods
-------
       The imputation methods discussed below may be applicable to LCI data.
However, clearly denoting any imputed data and explaining the uncertainty involved with
the use of these methods is very important.  While these methods "compensate" for data
problems, they do not enhance data quality or replace missing data with more applicable
(or detailed) facility data.  In many cases, it will be necessary and appropriate to consult
with a statistician on the application of these methods and the uncertainty that they
introduce to the LCI process.

Logical Proxies
       One way to fill in missing data is to find a "reasonable" substitute for the missing
information. Logically developed proxies are one viable substitute for missing data,
especially when there are no data on a given component of the process or product of
interest. For example, if no data exist on water  releases for a specific facility  within an
industry, certain facilities in the industry for which the data do exist could be evaluated
for use as a proxy for the facility of concern.  Similarly, if data were missing for an entire
industry, an analyst could use data for the industry determined to be most similar to that
industry as a proxy.  Expert opinion is sometimes required to determine the validity of the
proxy.

Deductive Imputation
       Deductive imputation involves examining patterns in the data to determine and
draw conclusions about missing data values.  Suppose an analyst is examining a raw
material used in a specific product.  In making the product, a coproduct and waste
releases are created.  Portions of the raw material are transformed into the product, the
coproduct, and the waste releases. Now suppose that the analyst has a data set for  an
industry that used 500,000 kg of the raw material, 400,000 kg of which became part of
the product  and 37,000 kg of which went into solid waste.  Using deductive
imputation—taking the total amount of inputs and subtracting all the known amounts of
outputs—the analyst could deduce that 63,000 kg of the material went into the  coproduct.

       In the above example the analyst would need to consider whether a mass-balance
equation would suitably represent the process. In order  to use this method, some
distinguishable pattern in the data or the process must exist, such that there is a high
                                        5-4

-------
likelihood that the calculated value is equal or very close to the correct value (Kalton and
Kasprzyk, 1982).
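
       The mass-balance arithmetic in the example reduces to a single subtraction, as the
short Python sketch below shows (values are those given in the text):

    # Deductive imputation by mass balance (values from the example, kg).
    raw_material_in = 500_000
    known_outputs = {"product": 400_000, "solid waste": 37_000}

    coproduct = raw_material_in - sum(known_outputs.values())
    print(coproduct)   # 63,000 kg deduced to have gone into the coproduct
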

Mean Imputation
       The mean imputation overall method involves taking the mean of the known data
values and substituting this mean for each of the missing data values (Kalton and
Kasprzyk, 1982).  For example, if a data set contained 500 reported values and 100
missing values for the amount of a chemical released into a waste  stream, an analyst
could calculate the average of the 500 known values and substitute this value for the 100
missing values.  Analyses could then be performed using all 600 data points.

       This method should be used with caution given that the variability in the unknown
missing values is not taken into  account (Rubin, 1987).  Because the sample mean is used
for the missing data points, the sample variance and standard deviation will most likely be
significantly understated.  The results also may be biased to the extent that missing values
consistently differ from the reported values.

       Similar items in a sample can be grouped into imputation classes, determined by
evaluating the subjects and by expert opinion.  They can also be determined by
similarities in other variables (Cox, 1981; Kalton and Kasprzyk,  1982). For each
imputation class, the mean of the respondents' values is  calculated.  Each calculated mean
is used to replace the missing data in its own class. As in the mean imputation overall
method,  using the sample mean as a replacement may result in biased statistics in the
analysis.

       Referring back to the chemical example, an analyst may determine from an
examination of the data or expert opinion that the amount of the chemical released  varies
significantly from one industry to another. Industry classifications can be the foundation
for the imputation classes. The mean imputation procedure could then be used within
each class and the classes analyzed separately.
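
       Both variants can be sketched in a few lines of Python; the data values and class
labels below are invented for illustration:

    import statistics

    # Mean imputation, overall: replace each missing value (None) with
    # the mean of the reported values.
    reported = [2.1, 3.4, None, 2.8, None, 3.0]
    overall_mean = statistics.mean(v for v in reported if v is not None)
    filled = [v if v is not None else overall_mean for v in reported]

    # Mean imputation within classes: a separate mean per imputation
    # class (the class labels here are invented industry codes).
    data = [("A", 2.1), ("A", None), ("B", 9.5), ("B", 8.7), ("B", None)]
    class_means = {
        cls: statistics.mean(v for c, v in data if c == cls and v is not None)
        for cls in {c for c, _ in data}
    }
    filled_by_class = [(c, v if v is not None else class_means[c])
                       for c, v in data]
    print(filled)
    print(filled_by_class)
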

Random Imputation
       In contrast to the mean imputation overall method, the random imputation method
 accounts for some of the variability in the unknown data values (Rubin, 1987). Each
 missing  value is replaced with a data value selected at random from the respondent data.
 Using the chemical release example mentioned in the previous section, each of the 500
 known values could be assigned a probability of 1/500  (e.g., by  using a random number

                                         5-5

-------
generator, or random digit table).  One value is then chosen as the 501st data value.  The
value should not be removed from possible further selection (i.e., values are drawn with
replacement), because removal complicates the analysis. The procedure is continued until
all 100 missing values have been imputed.

       Random imputation within classes combines the random imputation overall
methodology with the formation of imputation classes.  After forming imputation classes,
the analyst replaces missing values with values selected at random from the respondents'
values within the class being imputed  (Kalton and Kasprzyk, 1982).  As in the case of
comparing the random and mean imputation overall methodologies, random imputation
within classes accounts for some of the increase in variability that is not accounted for by
the mean methodology.
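
       A minimal Python sketch of the overall method follows (the reported values are
invented; note that donor values are drawn with replacement):

    import random

    # Random imputation, overall: replace each missing value with a value
    # drawn at random, with replacement, from the reported values.
    random.seed(1)
    reported = [12.0, 15.5, 9.8, 14.2, 11.1]
    n_missing = 3
    imputed = [random.choice(reported) for _ in range(n_missing)]
    print(imputed)   # each donor value is drawn with probability 1/5
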

Hot-Deck Imputation
       Hot-deck imputation is a variation of cold-deck imputation, which must first be
understood to describe the hot-deck imputation procedure.  Cold-deck imputation uses
values from a prior distribution to replace missing values in the data set of concern.  This
requires a data set similar to the one for which values are being imputed.  The imputed
values can be selected using randomization or systematic methods.  In addition, the data  ,
are often divided into imputation classes (Chapman,  1976).  Cold-deck procedures are no
longer common due to the criticism that current data were not being used for imputation
procedures (Chapman, 1976; Cox,  1981). Therefore, cold-deck procedures are not
recommended for application to LCI data.

       Unlike cold-deck methods, hot-deck imputation uses only the data set for which
the missing values are being imputed.  As in the cold-deck procedure, imputation classes
need to be formed. An initial value (or cold-deck) based on current data or expert opinion
is derived for the variable of concern.  The records are then analyzed sequentially. If the
first data value is present, it replaces the cold-deck value. If not, the cold-deck value is
used to replace the missing value.  This procedure is repeated until all of the missing data
values have been filled in.

       Hot-deck procedures could be used in conjunction with published data that have
missing values.  For example, if an analyst was using a database to determine energy
usage for transportation in a given industry, and the values for the 6th, 27th, and 65th
facilities in a specific imputation class were missing, these values would be replaced by
the 5th, 26th, and 64th values  in that class, respectively. However, using expert opinion
and logical proxies may produce more accurate values for the missing data.
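
       A minimal Python sketch of the sequential procedure described above follows (the
records and cold-deck value are invented for illustration):

    # Sequential hot-deck imputation within a single imputation class.
    # None marks a missing value; the cold-deck value seeds the process.

    cold_deck_value = 100.0   # initial value from prior data or expert opinion
    records = [120.0, None, 95.0, None, None, 110.0]

    deck = cold_deck_value
    filled = []
    for value in records:
        if value is not None:
            deck = value          # a reported value replaces the deck value
            filled.append(value)
        else:
            filled.append(deck)   # a missing value takes the deck value
    print(filled)                 # [120.0, 120.0, 95.0, 95.0, 95.0, 110.0]
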

                                        5-6

-------
Regression Imputation
       Regression imputation includes predicted imputation and random imputation.  In
the predicted method, a regression model is developed in which the variable with missing
values is predicted from other independent variables whose values have been reported.
The model is fit using the subjects/units for which all values have been reported (Kalton
and Kasprzyk, 1982).

       The predicted regression imputation method can be useful if the relationship
between the variables is not easily determined by inspection or expert opinion.  For
instance, suppose changes in the amounts of any of ten inputs change the amount of a
certain release.  One must determine not only how a change in one of the inputs affects
the release amount but also how all possible simultaneous changes of two or more inputs
affect it.  A statistician is required to develop an appropriate regression
model.

       The random regression imputation method is an extension of the predicted
regression imputation; the entire predicted method is used but each imputed value has a
residual (random) term added to it to account for the difference between values predicted
by the regression model and values that are measured  or observed.

       The random term, or residual, is the difference between the observed and predicted
value of the dependent variable for specific values of the independent variables. Residuals
provide insight into the predictive error of the model (i.e., the difference between values
predicted by the model and observed values).  Because a regression model is used to
predict values for missing data, a residual cannot be calculated. Rather, a residual is
chosen based on deductive or more complex statistical methods.  Refer to Kalton and
Kasprzyk (1982) for further information on random regression imputation.
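
       A minimal sketch of random regression imputation is given below, under the
simplifying assumption of a single predictor fit by ordinary least squares.  The variable
names and the input/release figures are hypothetical; as noted above, a real application
may involve many inputs and should be developed with a statistician.

    import random

    rng = random.Random(0)

    def fit_line(xs, ys):
        # Ordinary least squares for one predictor:  y = a + b*x.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        return my - b * mx, b

    def random_regression_impute(x_all, y_all):
        # Fit the model on complete pairs, then impute each missing y as the
        # predicted value plus a residual drawn from the observed residuals.
        pairs = [(x, y) for x, y in zip(x_all, y_all) if y is not None]
        a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
        residuals = [y - (a + b * x) for x, y in pairs]
        return [y if y is not None else a + b * x + rng.choice(residuals)
                for x, y in zip(x_all, y_all)]

    inputs = [10.0, 12.0, 15.0, 18.0, 20.0]   # amounts of one input
    releases = [1.1, 1.3, None, 1.9, 2.1]     # release amounts; one missing
    print(random_regression_impute(inputs, releases))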

5.2.2  Weighting Methods

        One method commonly used to compensate for unit nonresponse in surveys is the
development  of weights to inflate the responses of units that do respond.  The various
weighting methods correspond to the statistical sampling methods used in choosing the
survey participants.  Weighting methods can also be used to compensate for item
nonresponse in analyses of individual parameters by computing an analysis weight for
each item that is affected by nonresponse.
       An example of a weighting method could involve surveying 20 companies on
energy usage.  If only 8 companies respond, the analyst examines other information about
the companies to see which responding companies are most similar to each nonresponding
company. If one responding company is most similar to 2 nonresponding companies
while another is most similar to 3 nonrespondents, these responding companies would be
assigned weights of 3 and 4, respectively.  A weight is calculated by adding 1 (for
the responding company itself) to the number of nonresponding companies deemed most
similar to that responding company.  Each responding company that was not considered to
be most similar to one or more of the nonresponding companies would receive a weight
of 1.  The sum of the responding companies' weights should then equal 20 (Rubin, 1987).
Because of the complex statistical methods involved, the estimation procedures associated
with this or any other weighting method are beyond the scope of this document.  The
interested reader is referred to Little and Rubin (1987).
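
       The weight assignment itself (though not the subsequent estimation procedures) can
be sketched as follows.  The company identifiers and the similarity assignments are
hypothetical.

    from collections import Counter

    def assign_weights(respondents, most_similar):
        # most_similar maps each nonresponding company to the responding
        # company judged most similar to it.  A respondent's weight is 1 (for
        # itself) plus the number of nonrespondents assigned to it.
        counts = Counter(most_similar.values())
        return {r: 1 + counts.get(r, 0) for r in respondents}

    responders = ["A", "B", "C", "D", "E", "F", "G", "H"]   # 8 of 20 responded
    similar = {f"N{i}": s for i, s in enumerate(
        ["A", "A", "B", "B", "B", "C", "D", "E", "F", "G", "H", "H"])}
    weights = assign_weights(responders, similar)
    assert sum(weights.values()) == 20   # weights sum to the original sample size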

       The use of weighting methods by an LCI analyst would be rare, as very little of
the primary data are gathered using large-scale surveys of companies or facilities.
However, if a secondary data source such as a database contains data obtained through
statistical sampling, determining the validity of any weighting methods used
in assembling the database may be important.  This requires consulting a statistician and
obtaining a description of both the sampling and weighting methods.
APPENDIX A

DATA QUALITY INDICATORS
A.1    ACCEPTABILITY

       Acceptability refers to the degree to which the data source has been peer reviewed,
evaluated against a standard, or checked for errors through expert judgment.  Acceptability
also refers to the extent to which the model has been checked to determine if it represents
what it is supposed to represent (e.g., emissions from a facility's high-density
polyethylene production line).  The acceptability DQI encompasses the concepts and
procedures of both verification/validation and quality assurance/quality control (QA/QC).

       Data that are peer reviewed and found to be of adequate quality are generally
considered to be acceptable. Peer review of a primary data source generally requires  the
consensus of process engineers and others knowledgeable in the production system of
interest.  Peer review of a secondary data source may show that the source was not
officially validated or verified although it is regarded by colleagues in the field as a fairly
accurate data source. For example, although the TRI data base has not been validated or
verified, it is an accepted source of air, water, and waste emissions data for those
industries subject to the reporting requirements.  For an LCI, using the TRI data base may
be more acceptable than using a data base that has not at least received similar
acceptance.  Peer review can also be applied to assumptions and calculations used to
interpolate, extrapolate, or back-calculate data suitable for an LCI.

       Acceptability is also expressed by the degree to which a data source has been
evaluated against an accepted method or standard, and/or checked for errors. This means
that an "acceptable" model should be both verified and validated.  A model can be
validated by consulting an expert familiar with the process or product being modeled.
Model verification involves comparing the model output with actual facility data or a
reference value.  If the model output is similar to the known value(s) then the model is
likely to generate relatively accurate results (Lewis and Orav, 1989).  If a model is
validated but not verified, or verified but not validated, the  accuracy of the results could
be questioned.
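
       Where facility or reference values are available, the quantitative comparison
described above can be sketched in a few lines.  The function name and the 10 percent
relative tolerance are hypothetical choices made for illustration.

    def verify_model(model_outputs, reference_values, rel_tol=0.10):
        # True if every model output is within rel_tol of the corresponding
        # known facility or reference value.
        return all(abs(m - r) <= rel_tol * abs(r)
                   for m, r in zip(model_outputs, reference_values))

    print(verify_model([48.0, 51.0], [50.0, 50.0]))   # True: within 10 percent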

       Validation/verification techniques are most easily applied to primary data generated
through the use of a model. The applicability of these techniques is less clear  for model-
based secondary data. Unless the model and appropriate documentation can be obtained,
only qualitative verification/validation techniques can be employed.  Expert judgment can
determine whether the model results are representative of the industry or facility under
evaluation and of the natural variability in the system being modeled.  Similarly, the model
can be compared against known facility input/output values.

       In cases where the data or model can be subjected to standard methods of
statistical analysis (e.g., primary data collected through sampling or generated pursuant to
a statistical protocol), QA/QC is the preferred analytical technique under this DQI. EPA
has developed QA/QC methodology requiring that certain objectives or goals be set, that
DQIs (such as precision, bias, and representativeness) be used to evaluate the objectives,
and that the data be collected pursuant to sound statistical procedures.

A.2    BIAS
       Bias refers to  the level of systematic error that causes the values of a data set to be
consistently higher or lower than the corresponding "true" parameter values. These "true"
parameter values serve as baseline, or reference, values against which LCI data can be
compared.  Although the discussion of bias in this section assumes that these reference
values exist for LCI data, in many cases, they are  not known.

       Bias can be created by, among other things, a weakness in the data collection
methodology  (e.g., improper sampling, uncalibrated measurement equipment, or
consistently rounding up values).  As indicated in Table A-1, bias can be measured by
taking the difference between the average measured value (X) and the reference value of a
standard material (T) (EPA, 1984; EPA, 1991).  In the Table A-1 example, the bias is -0.5.
This means that the average measured value is 0.5 units lower than the reference value (T).
An alternative index of bias is the percent bias, which indicates the percent difference of
the average measured value from the reference value.  In the Table A-1 example, the
percent bias is -1 percent (EPA, 1984).

        The bias  DQI can also be applied to data generated through modeling.  For
 example, in modeling air emissions, model bias can be evaluated if there is a reference
 air-emission level or concentration against which the model results can be compared.
 Similarly, the bias DQI can be applied to statistically based data provided in the published
 literature if a reference value is given or known by the analyst.  The bias DQI may also
 be useful for analyzing whether measurement devices  used to collect primary LCI data
 exhibit systematic error, as compared against an existing reference value (i.e., the
 measured values are consistently higher or lower than the reference values).  Measurement
 of bias in primary data necessarily involves the use of reference values.  Thus, the bias
DQI becomes of limited value for evaluating primary data in cases where reference values
are unavailable.

              TABLE A-1.  SAMPLE DATA TO CALCULATE BIAS

       Measurement        Xi         Xi - X        (Xi - X)²
            1             48          -1.5             2.25
            2             55           5.5            30.25
            3             50           0.5             0.25
            4             45          -4.5            20.25

       Average X    = (X1 + X2 + X3 + X4) ÷ n
                    = (48 + 55 + 50 + 45) ÷ 4
                    = 49.5

       Bias         = X - T      (T = reference sample value)
                    = 49.5 - 50
                    = -0.5

       Percent Bias = ((X - T) ÷ T) × 100
                    = (-0.5 ÷ 50) × 100
                    = -1 percent

Source:  EPA, 1984.
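
       A minimal sketch of the Table A-1 calculations, assuming a reference value (T) is
available for the standard material:

    def bias_indicators(measurements, reference):
        mean = sum(measurements) / len(measurements)
        b = mean - reference            # bias = X - T
        pct = b / reference * 100       # percent bias
        return b, pct

    b, pct = bias_indicators([48, 55, 50, 45], reference=50)
    print(b, pct)   # -0.5 -1.0, matching the Table A-1 example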

       The bias DQI is more difficult to apply to modeled or nonmeasured secondary
data.  If the data are modeled, bias can be evaluated in one of two ways: the model can
be assessed against a standard value to determine if the results are consistently higher or
lower than the standard, or expert judgment can be used to make the determination.
However, reference materials are not likely to be available to assess the bias in secondary
data.

The bias DQI also can be used to evaluate assumptions and calculations used to
manipulate LCI data. Again, either expert judgment can be used to assess whether an
assumption is biased, or the output of a calculation can be assessed against a reference
value (if one exists).
       It should be noted that measures of bias themselves do not reveal much about the
significance of the bias.  It is left up to the data analyst to determine the importance of
bias measures.  For example, a bias measure of -0.5 indicates the difference between the
average measured value and the reference value. However, this measure does not provide
any insight into how significant a -0.5 (or 1 percent) bias is.  There is no a priori rule of
thumb to help guide interpretation of bias.  Practitioners should use expert judgment or
consult a statistician to help determine the significance of bias measures.

A.3    COMPARABILITY

       Comparability is the degree to which different methods, data sets, or decisions
agree or can be represented as similar or equivalent (EPA, 1991c).  Practitioners should
discuss how comparability criteria are satisfied so that variations between the data sources
are clear to users of the material.

       Primary sampled data should be comparable with other measured data for similar
samples.  More  specifically, samples should be collected in a comparable manner from
comparable media and by comparable methods. This is achieved by using standard data
collection and analytical techniques (EPA,  1991; EPA, 1990a).  If the primary data are
judgment samples, however, bias and comparability become closely related, diminishing
the value of comparability as  a DQI (Franklin Associates, Ltd., 1993).

        A more complicated matter is the use of comparability for secondary data sources,
where, for a particular LCI parameter,  an analyst can combine more than one data source.
In this situation, it is preferred that the data sources be comparable, but this may be
difficult to discern or achieve.  For example, the TRI data base contains air, water, and
waste release data for industries in SIC codes 20 through 39 that manufacture, process, or
otherwise use greater than 25,000 pounds of any of 300 listed chemicals.  Due to the reporting
threshold, the TRI data base may not contain certain facility emissions needed in an LCI.
If this is the case, an analyst may need to combine the TRI data with other data sources to
adequately reflect industry air, water, and waste emissions.  To be comparable, the
additional data sources would need to be collected and reported in a similar manner.  In
this example, the additional data would preferably be facility-specific emissions data
 (versus industry, national, or other aggregate data).
A.4    COMPLETENESS

       Completeness is routinely expressed as the percentage of obtained data to needed
data (Mickler and Medlarz, 1987; EPA, 1990b; EPA, 1991).  The level of desired
completeness is largely determined by the goal of the study or data collection effort.

       The completeness DQI is applicable to LCI data that are statistically based,
modeled, or nonmeasured. The completeness DQI can be used to evaluate whether the
information or measurement procedures provide all the data values necessary. In other
words, this DQI could be used to identify whether there are missing data or data
deficiencies. For example, if an LCI requires  air emissions data for a sample of, say, 10
facilities in  an industry but the information source includes data for only eight sampled
facilities, the data source would be considered 80 percent complete.
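
       The calculation is simple enough to sketch directly; the figures repeat the example
above.

    def completeness(obtained, needed):
        # Completeness as the percentage of obtained data values to needed values.
        return 100.0 * obtained / needed

    print(completeness(8, 10))   # 80.0 percent complete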

       The completeness DQI can also be used to evaluate any assumptions and
calculations used to manipulate secondary data.  Secondary data sources may only provide
aggregated data, such as total energy consumption by fuel type (e.g., coal, nuclear, or
other), which must be interpolated, extrapolated, or back-calculated to develop a facility-
specific number suitable for an LCI. Although a data source could be determined to be
complete, the requisite assumptions employed by the analyst may be incomplete.  Thus,
when interpolating, extrapolating, or back-calculating data, considering the completeness
of assumptions is important. Incomplete or deficient assumptions will lead to inaccurate
calculations used to derive LCI data values.

A.5    DESCRIPTION OF THE DATA COLLECTION METHOD AND
        LIMITATIONS

        This DQI applies to both primary and secondary data sources that are statistically
 based or nonmeasured.  In both cases, a description of the data collection method provides
 additional information to help an analyst assess data quality using the other DQIs.

        With regard to statistically based data, data quality can be more easily assessed if
 detail is provided about the sampling procedure used to obtain the data.  This sort of
 information includes:

       •  a description of the sampling plan,
       •  a discussion of how the sample was developed,
       •  a discussion of how the sample size was determined,
       •  an identification of exactly what was measured, and
       •  an identification of the instruments used to collect the samples.

       If the data were generated through the use of a survey, the data source can include
the list of questions.  Reviewing the questionnaire can help the analyst determine whether
the data reflect the questions asked and whether the data are suitable for the LCI.
Without this kind of information, determining the accuracy or overall quality of the data
source is difficult, if not impossible.

       With respect to a nonmeasured data source, it is equally important from a data
quality perspective that the source contain a discussion of why and how the data were
collected.  For example, the source can indicate whether the data were gathered to fulfill a
legal reporting requirement, generated by using proxies, or generated by using expert
opinion.  If the data were generated due to a legal reporting requirement, the applicable
regulations could be assessed to determine the extent of coverage.  Depending on the
goals and scope of the LCI, the data may be totally or partially suitable for the analysis.
Determining that the data are, among other things, nonrepresentative or that they contain
an undercoverage bias can reduce the usefulness of the data base.  For data generated with
proxies, knowing the basis for the proxy or the level of difference in the comparison
provides an indication of whether the estimates are biased. Similarly, if the data were
based on expert opinion, knowing the assumptions and calculations used to generate the
data is helpful.

       A discussion of data limitations provides an indication of the variability (or
precision) in the data set.  With respect to statistical data, the data source can identify the
precision of the measurement instruments and the completeness of the data set (the
number of samples collected versus the number needed).  If a survey is employed, the
data source can indicate, among other things, the coverage of the sampling source and the
level or degree of nonresponse.  With a statistical sample or a survey, the source can
show whether there were missing values or data deficiencies and which of these
problems were compensated for.

       With regard to nonmeasured data, the data source also can show the sources of
variation in the data set.  This can include a discussion of data deficiencies (e.g.,
companies that were required to report but reported the data incorrectly), limitations in
any proxies used, or the lack of procedures to ensure that data were properly input to the
data base or spreadsheets.

A.6    PRECISION

       Precision refers to the closeness, expressed numerically if possible, of estimates to
one another, or the variability in a set of values or measurements compared to their mean.
As shown in Figure A-1, precision and bias are closely related.  Precise data can be
heavily biased, and unbiased data can be largely imprecise.  The goal is to obtain data that
are both precise and unbiased (i.e., accurate).

       The sample standard deviation and the coefficient of variation (CV) (i.e., the
standard deviation divided by the mean) are indicators of precision.  Their calculation is
shown in Table A-2.  To increase confidence in the generality of results, the sample
should have as little variability as possible (Franklin Associates, Ltd., 1993).  Thus, the
lower the standard deviation and CV, the higher the precision (EPA, 1984; EPA, 1990b;
EPA, 1991).

       Precision can be used to evaluate measurement methods used to collect primary
environmental data.  For any measurement method, sources of variation include sample
collection, handling, shipping, storage, preparation, and analysis.  Generally, precision is
measured by comparing individual repeated measurements against the sample mean (X) of
measurements on the same unit, or a reference value if available, and calculating the
coefficient of variation (CV) by dividing the standard deviation in the sample data by the
sample mean or reference value.  The higher the CV, the higher the dispersion of sample
measures relative to the sample mean and, generally, the lower the quality of the
measurement tool.  Based on the sample data provided in Table A-2, the standard
deviation is 4.2 and the CV is approximately 0.085 (EPA, 1984).  Comparing these values
against the mean provides an indication of the degree of variability in the data.  Once
again, it is recommended that data analysts use expert judgment or consult a statistician to
determine the significance of CV measures.
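
       A minimal sketch of these two indicators, using the Table A-2 measurements:

    import statistics

    def precision_indicators(measurements):
        mean = statistics.mean(measurements)
        s = statistics.stdev(measurements)   # sample standard deviation (n - 1)
        return s, s / mean                   # (standard deviation, CV)

    s, cv = precision_indicators([48, 55, 50, 45])
    print(round(s, 1), round(cv, 3))   # 4.2 0.085, matching Table A-2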

       [Figure A-1.  Relationship Between Precision and Bias.  Four panels illustrate the
four combinations:  (a) unbiased and precise, (b) biased and precise, (c) unbiased and
imprecise, and (d) biased and imprecise.]

           TABLE A-2.  SAMPLE DATA TO CALCULATE PRECISION

       Measurement        Xi         Xi - X        (Xi - X)²
            1             48          -1.5             2.25
            2             55           5.5            30.25
            3             50           0.5             0.25
            4             45          -4.5            20.25

       Average X    = (X1 + X2 + X3 + X4) ÷ n
                    = (48 + 55 + 50 + 45) ÷ 4
                    = 49.5

       Variance S²  = Σ(Xi - X)² ÷ (n - 1)
                    = (2.25 + 30.25 + 0.25 + 20.25) ÷ 3
                    = 17.67

       Standard deviation S = 4.2

       CV = S ÷ X = 4.2 ÷ 49.5 ≈ 0.085

Source:  EPA, 1984.

       The application of the precision DQI is less clear for secondary data that are
modeled or nonmeasured.  With respect to modeled data, the precision DQI may possibly
be applied if three pieces of information are provided:
       •  the full data set,
       •  the model documentation, and
       •  a reference value.
If this information is unavailable, model precision may possibly be determined based on
expert opinion.  An expert or group of experts can be consulted to determine the degree of
variability in the data and whether the variation is acceptable.

       The application of the precision DQI to nonmeasured data such as the TRI data
base or a DOE total energy consumption value is even less clear than for modeled data.
A possible application of the precision DQI to these data would be evaluating the data in
a time series and determining if the year-to-year variation is reasonable.  For example, the
TRI data base can be evaluated to obtain lead releases from several years for a particular
company or group of companies. Expert opinion can be used to determine whether the
year-to-year change is reasonable. An expert can determine that the change  in releases is
disproportionate based on known production processes and emission controls. This
approach can provide an indication of the "precision" of the estimates reported by
industry.
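
       A hypothetical sketch of such a time-series check is given below.  The function
name, the release figures, and the 50 percent threshold are inventions for illustration; in
practice an expert would judge what change is disproportionate based on known production
processes and emission controls.

    def flag_disproportionate_changes(releases_by_year, max_pct_change=50.0):
        # releases_by_year: (year, amount) pairs in chronological order; returns
        # the year pairs whose change exceeds max_pct_change percent.
        flags = []
        for (y0, a0), (y1, a1) in zip(releases_by_year, releases_by_year[1:]):
            change = abs(a1 - a0) / a0 * 100
            if change > max_pct_change:
                flags.append((y0, y1, round(change, 1)))
        return flags

    lead = [(1988, 1200.0), (1989, 1150.0), (1990, 400.0), (1991, 420.0)]
    print(flag_disproportionate_changes(lead))   # [(1989, 1990, 65.2)]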

       For secondary data that require manipulation in order to develop a value(s)  suitable
for an LCI, the precision DQI can be used to assess the adequacy of the calculation.  The
results of a calculation can be measured against the mean, standard deviation, and CV
for the data.  As indicated above, this information would reveal the variability in the data.
If this information is unavailable, a more likely approach would be to assess the precision
of the calculation based on expert opinion.  Clearly, if expert opinion is used, an
expert other than the one who developed the calculation should review it.

A.7   LEVEL OF REFERENCE

       To adequately review the quality of a data source used in an LCI, an analyst must
have a complete reference for the source.  Practitioners should clearly indicate what data
sources were used, who  developed the sources, and when they were published.  An
adequate reference enables an analyst to go directly to the source to gain a better
understanding of its quality and how it was used in the LCI.

       For primary data sources, references provide information on where the data were
obtained.  For example, references on a primary data source on facility air emissions
might include information on where, when, and how the data were obtained (e.g., actual
measures, process back-calculations, engineering estimates).

       Referencing secondary data sources presents a challenge.  Although it is relatively
easy to reference an immediate source, it is not always easy to determine the origin of the
data in the secondary source or how it was obtained. For example, secondary energy data
can be obtained from the Energy Information Administration (EIA). While this may be a
reputable source, the practitioner has only limited insight into where and how those data
 were generated, and thus limited confidence in whether those data accurately represent the
production system of interest.  Practitioners should attempt to obtain and document as
much information as possible on secondary data sources.

A.8    REPRESENTATIVENESS

       Representativeness refers to the degree to which the data describe what an analyst
is attempting to describe (EPA, 1991; and Mickler and Medlarz, 1987).  This DQI can be
used in a variety of ways.  From a traditional QA/QC perspective, the representativeness
DQI is used to determine if the sampling program design accurately depicts the system of
interest.  This DQI is best met by ensuring that sampling locations are selected properly
and the appropriate number of samples is collected (Barth et al., 1989).

       With respect to LCI data, the representativeness DQI  can be used to determine the
degree to which primary data represent or depict the parameter of interest. For example,
if primary  air emissions data are collected, the representativeness DQI can be used to
determine the degree to which the data represent what is being measured, such as peak or
average air emissions.

       The representativeness DQI may be more difficult to  apply to secondary data. For
example, if an LCI required hazardous waste release data for a specific waste stream, the
TRI data base—a nonmeasured data source—could be searched to determine if it
contained the data for the industry of interest.  If the TRI data base contained the
hazardous  waste release data, the data source would on the surface appear to be
representative. However, superficial appearance  is not enough to ensure
representativeness.  Typical TRI data are for generic waste streams, but these data are not
necessarily representative of the specific waste stream data needed for an LCI that is often
linked to a specific product or process.  Therefore, practitioners should carefully evaluate
secondary data sources to ensure that they accurately represent what is desired in the LCI.

       Understanding the distinction between the representativeness and completeness
DQIs is important. Continuing with the TRI example,  if the data base contains the type
of hazardous  waste releases generated by the selected industry, the data source would be
considered representative.  However, it is possible that the data base would not contain
information for all of the facilities studied. In this case, the data source is representative
but not complete.
       Applying the representativeness DQI to assumptions and calculations is possible
and is just as important as, or even more important than, applying the completeness DQI.
For example, an LCI for antifreeze might have a complete set of emissions data for
antifreeze made with ethylene glycol.  However, these emissions data cannot be applied to
antifreeze made with propylene glycol.  Constructing an LCI for propylene glycol
antifreeze using the emissions data for ethylene glycol would be misrepresentative.

       In addition,  data derived through a back-calculation may be representative of what
the analyst wants to obtain. However, if an analyst's assumptions are incomplete, no
matter how  representative the data are, they still may not reflect the actual value needed
in an LCI.  Take the example given under the completeness DQI for an extrapolated
energy value.  If an analyst assumes that a facility uses coal-based electricity when it in
fact uses nuclear-based electricity, the derived data can represent the perceived need of
the LCI:  coal-based electricity data values.  However, because the analyst's assumption
regarding the fuel source was incorrect, the data would not reflect the actual value
needed in the LCI.
APPENDIX B

BIBLIOGRAPHY
-------
                           RESEARCH METHODOLOGY

       A number of data bases were searched extensively for literature on data quality.
These data bases are:

       American Statistics Index (ASI):  1/1/74 to 3/1/92

       Economic Literature Index:  1/1/69 to 3/31/92

       Embase (Excerpta Medica):  1974 to 1992

       Energy Science and Technology:  1974 to 1992

       Enviroline:  1/1/70  to 5/31/92

       Environmental Bibliography: 1/1/74 to 3/31/92

       Health Planning and Administration:  1/1/75 to 7/1/92

       Mathsci:  1/1/40 to 6/30/92

       Medline:  1/1/85 to 7/1/92

       National Technical  Information Service (NTIS):  1964 to 1992

       Pollution Abstracts: 1/1/70 to 5/31/92

       Toxline:  1/1/65 to  5/31/92

       Water Resources Abstracts:  1/1/68 to 5/1/92

       Different combinations of the following key words were used in performing the
literature search:

       assessment, available data, data deficiency/deficiencies, data gap(s), data
       indicator(s), data quality, environmental impact assessment, extrapolation, full fuel
       cycle, hazard ranking, hot decking, imputation, missing data, quality assurance,
       quality control, risk assessment, sensitivity analysis, uncertainty/uncertainty
       analysis.

       This bibliography includes references not used in this report but of use to anyone
interested in data quality assessment methodologies.
                                    BIBLIOGRAPHY

Albridge, Kim M., Jim Standish, and James F. Fries.  1988.  "Hierarchical Time-Oriented
     Approaches to Missing Data Inference." Computers and Biomedical Research
     21:349-366.

Alexander, Greg R., Mark E. Tompkins, and Donald A. Cornely.  1990.  "Gestational Age
     Reporting and Preterm Delivery."  Public Health Reports  105(3):267-275.

Anderson, Carl L.  1987. "The Production Process:  Inputs and Wastes." Journal of
     Environmental Economics and Management  14:1-12.

Anderson, D. W.  1983.  "Development of Product-Specific Output Projections Using an
     Input-Output Model."  Empirical Economics 8:1-8.

Arthur D. Little, Inc.  1993a.  Technical Review of the Draft Guidance Document "Assessing
     Data Quality for Life Cycle Assessment."  Prepared  for the U.S. Environmental
     Protection Agency, Office of Solid Waste, Washington, D.C.

Arthur D. Little, Inc.  1993b,  "Case Studies and Document Review: Assessing Data Quality
     for Life Cycle Assessment."  Prepared for the Research Triangle Institute, Center for
     Economics Research, Research Triangle Park, NC.

Azen, Stanley P., Michael Van Guilder, and Mary Ann Hill.  1989.  "Estimation of
     Parameters and Missing Values Under A Regression Model With Non-Normally
     Distributed and Non-Randomly Incomplete Data." Statistics In Medicine 8:217-228.

Bailey, Paul E. 1990-91. "Life-Cycle Costing and Pollution Prevention." Pollution
     Prevention Review Winter:27-39.

Barth, D.S. et al.  1989.  Soil Sampling Quality Assurance Users' Guide,  Second Edition.
     U.S. Environmental Protection Agency report 600/8-89/046, PB89-189864.  Las Vegas,
     Nevada: University of Nevada.

Berube, M., and S. Bisson.  1991.  Lifecycle Studies:  A Literature Review and Critical
     Analysis.  Ministère de l'Environnement du Québec.

Bogen, Kenneth T., and Robert C. Spear.  1990.  "Uncertainty and Variability in
     Environmental Risk Assessment: A Framework for Analysis." In New Risks, L. A.
     Cox, Jr., and P.F. Ricci, eds., pp. 389-401.  New York:  Plenum Press.

Boustead, I., and  G. F. Hancock. Handbook of Industrial Energy Analysis.  New York: John
     Wiley and Sons.

Brodzinsky, Richard, and Hanwant B. Singh. 1983.  Volatile Organic Chemicals in the
     Atmosphere: An Assessment of Available Data.  Research Triangle Park, NC:  EPA
     Office of Research and Development, Environmental Sciences Research Laboratory.

Bunn, Derek W.  1984. Applied Decision Analysis. New York:  McGraw-Hill Book
     Company.

Bunnett, Joseph F.  1985.  "Errors in Publications." Accounts of Chemical Research 18(10).

Byrd, Daniel M., and Elizabeth T. Barfield.  1989.  "Uncertainty in the Estimation of
     Benzene Risks: Application of an Uncertainty Taxonomy to Risk Assessments Based on
     an Epidemiology Study of Rubber Hydrochloride Workers." Environmental Health
     Perspective 82:283-287.

Calabrese, Edward J. 1982.  "The Role of Exposure Data in Standard Setting."  Toxic
     Substances Journal  4:12-22.

Center for Policy Alternatives.  1990. "Update on Diapers." Revised.  September.

Chapman, David W.  1976.  "A Survey of Nonresponse Imputation Procedures."  Prepared for
     the Proceedings of the Section of Social Statistics.  Westat, Inc.

Checkoway, Harvey, David A. Savitz, and Nicholas J. Heyer.  1991.  "Assessing the Effects
     of Nondifferential Misclassification of Exposures in Occupational Studies."  Appl.
     Occup. Environ. Hyg.  6(6):528-533.

Clemen, R.T.  1991. Making Hard Decisions: An Introduction to Decision Analysis.
     Boston:  PWS-Kent Publishing Company.

Commission on Geosciences, Environment, and Resources.  1990.  Tracking Toxic Substances
     at Industrial Facilities: Engineering Mass Balance Versus Materials Accounting.
     Washington: National Academy Press.

Conn, Judith M., Kung-Jong Lui, and Daniel L. McGee. 1989.  "A Model-Based Approach
     to the Imputation of Missing Data:  Home Injury Incidences."  Statistics In Medicine
     8:263-266.

Cook, Thomas D.  1990.  "The Generalization of Causal Connections:  Multiple Theories in
     Search of Clear Practice." In Research Methodology:  Strengthening of Non-
     Experimental Data,  Lee Sechrest, Edward Perrin, and John Bunker, eds., pp.  9-18.
     Washington, DC:  U.S. Department of Health and Human Services, Agency for Health
     Care Policy and Research.

Cooper, Lee G., Jan De Leeuw, and Aram G. Sogomonian.  1991. "An Imputation Method
     for Dealing with Missing Data in Regression."  Applied Stochastic Models and Data
     Analysis 7:213-235.
Cordray, David S.  1990.  "Strengthening Causal Interpretations of Nonexperimental Data:
     The Role of Meta-Analysis." In Research Methodology:  Strengthening of Non-
     Experimental  Data, Lee Sechrest, Edward Perrin, and John Bunker, eds., pp.  151-172.
     Washington, DC: U.S. Department of Health and Human Services, Agency for Health
     Care Policy and Research.

Council for Solid Waste Solutions.  1990.  Resource and Environmental Profile Analysis of
     Polyethylene and Unbleached Paper Grocery Sacks.  Final report. Prepared by Franklin
     Associates, Ltd. June.

Council for Solid Waste Solutions.  1991.  Resource and Environmental Profile Analysis of
     High-Density  Polyethylene and Bleached Paperboard Gable Milk Containers.  Final
     report. Prepared by Franklin Associates, Ltd.

Council of State Governments, Lexington, Kentucky, and U.S. Environmental Protection
     Agency.  1982. CSG/Tellus Packaging Study, Volume I.  Prepared by Tellus Institute.

Cox, Brenda G.  1981.  "Imputation Procedures to Replace Missing Responses to Data
     Items."  Prepared in the Annual  Meetings  of North Central Sociological Association.
     Cleveland, Ohio. April.

Cox, David C., and Paul Baybutt. 1981. "Methods for Uncertainty Analysis: A Comparative
     Survey." Society for Risk Analysis  1(4):251-258.

"Diaper Decisions."  Consumer Reports.  August 1991. pp. 551-556.

Denison, Richard.  1992.  Toward a Code of Ethical Conduct for LCAs.  Washington DC:
     Environmental Defense Fund.

Eddy, David M. 1990.  "The Role of Meta-Analysis." In Research Methodology:
     Strengthening of Non-Experimental Data, Lee  Sechrest, Edward Perrin, and John
     Bunker, eds., pp.  173-176. Washington, DC:  U.S.  Department  of Health and Human
     Services, Agency for Health Care Policy and Research.

Ekvall, Thomas.  1992.  "Life-cycle Analyses of Corrugated Cardboard: A comparative
     analysis of two existing studies." Unpublished draft report.

Ellis, Hugh.  1992. "Acid Rain:  Operations Research Joins the Battle to Control Serious
     Ecological Threat."  OR/MS Today June:26-28.

 Energy Information Administration.  1992a.  Annual Energy Outlook with Projections to
     2010. Washington, DC:  U.S. Government Printing Office.

 Energy Information Administration.  1992b.  Assumptions for the Annual Energy Outlook
      1992. Washington, DC:  U.S. Government Printing Office.
Energy Information Administration.  1992c.  Monthly Energy Review. Washington, DC: U.S.
     Government Printing Office.

Energy Information Administration.  1992d.  Supplement to the Annual Energy Outlook 1992.
     Washington, DC: U.S. Government Printing Office.

Energy Information Administration.  1991a.  An Assessment of the Quality of Selected EIA
     Data Series. Washington, DC: U.S. Government Printing Office.

Energy Information Administration.  1991b.  Manufacturing Energy Consumption Survey:
     Consumption of Energy 1988. Washington, DC: U.S. Government Printing Office.

Energy Information Administration.  1991c.  Manufacturing Fuel-Switching Capability 1988.
     Washington, DC: U.S. Government Printing Office.

Energy Information Administration.  1991d.  New Releases.  Washington, DC:  U.S.
     Government Printing Office.

Energy Information Administration.  1990.  Energy Consumption and Conservation Potential:
     Supporting Analysis for the National Energy Strategy. Washington, DC:  U.S.
     Government Printing Office.

Energy Information Administration.  1989.  An Assessment of the Quality of Selected EIA
     Data Series.  Washington, DC:  U.S. Government Printing Office.

Energy Information Administration.  1987.  An Assessment of the Quality of Selected EIA
     Data Series. Washington, DC: U.S. Government Printing Office.

Esmen, Nurtan A.  1991.  "Analysis of Strategies for Reconstructing Exposures." Appl.
     Occup. Environ. Hyg.  6(6):488-494.

Finkel, Adam M.  1990.  Confronting Uncertainty in Risk Management: A Guide for
     Decision-Makers. Washington, DC:  Center For Risk Management.

Finn, John T. 1976.  "Measures  of Ecosystem Structure  and Function Derived from Analysis
     of Flows."  Journal of Theoretical Biology 56:363-380.

Ford, Judith S.  1991.  AEERL Quality Assurance Procedures Manual for Contractors and
     Financial Assistance Recipients.  Draft. Washington, DC: EPA Office of Research and
     Development.

Forsund, Finn R.  "Input-Output Models, National Economic Models, and the Environment."
     Handbook of Natural Resource and Energy Economics.  Vol. 1, A.V. Kneese and J. L.
     Sweeney, eds., pp. 325-339.
Franklin Associates, Ltd. 1993.  "Data Quality Case Studies." Prepared for Research
     Triangle Institute.  Prairie Village, Kansas.

Franklin Associates, Ltd. 1992.  "Example Information on Sensitivity Analysis for Two
     Production Systems."  Prairie Village, Kansas.

Franklin Associates, Ltd. 1991.  Resource and Environmental Analysis of High-Density
     Polyethylene and Bleached Paperboard Gable Milk Containers. Prairie Village, Kansas.

Franklin Associates, Ltd. 1989.  Comparative Energy and Environmental Impacts for Soft
     Drink Delivery Systems. Prairie Village, Kansas.

French, Simon.  1989. Readings in Decision Analysis. New  York, NY:  Chapman and Hall.

Funtowicz, S.O., and J.R. Ravetz.  1990.  Uncertainty and Quality in Science for Policy. The
     Netherlands: Kluwer Academic Publishers.

Funtowicz, S.O., and J.R. Ravetz.  1987.  "The Arithmetic of Scientific Uncertainty." Physics
     Bulletin 38:412-414.

Funtowicz, S.O., and J.R. Ravetz.  1986.  "Policy-Related Research:  A Notational Scheme
     for the  Expression of Quantitative Technical Information."  Journal of Operational
     Research Society  37(3):243-247.

Funtowicz, S.O., and  J.R. Ravetz.  1984.  "Uncertainties and  Ignorance in Policy Analysis."
     Risk Analysis  4(3):219-220.

Funtowicz, S.O., S.M. Macgill, and J.R. Ravetz.  1989.  "The Management of Uncertainties in
     Radiological Data."  J. Radiol. Prot.  9(4):257-261.

Funtowicz, S.O., S.M. Macgill, and J. R. Ravetz.  1988.  "Mapping Uncertainties of
     Radiological Hazards."  Atom.  Nov: 15-16.

Galloway, James N., J.  David Thornton, Stephen A. Norton, Herbert L. Volchok, and Ronald
     N. McLean. 1982. "Trace Metals in Atmospheric Deposition: A Review and
     Assessment."  Atmospheric Environment 16(7): 1677-1700.

Hamalainen, Raimo P.  1992. "Politics & Policy:  Decision Analysis Makes Its Way Into
      Environmental Policy in Finland."  OR/MS Today June:40-43.

 Henrion, Max, and Baruch Fischoff. 1986.  "Assessing Uncertainty in Physical Constants."
     American  Journal of Physics  54(9):791-798.

Hocking, Martin B.  1991.  "Paper vs. Polystyrene:  A Complex Choice."  Science
     251:504-505.
Hope, C.W., and S. Owens. 1986.  "Research Policy and Review 10:  Frameworks for
     Studying Energy and the Environment." Environment and Planning  18:851-864.

Hui, Siu L., and James O. Berger.  1983. "Empirical Bayes Estimation of Rates in
     Longitudinal Studies." Journal of the American Statistical Association  78(384):753-
     760.

Hunt, Robert G., and William E. Franklin.  1975.  "Resource and Environmental Profile
     Analysis . . . of Beer Containers." Chemtech August:474-481.

Iman, Ronald L. 1990.  "Methods Use in Probabilistic Risk Assessment for Uncertainty and
     Sensitivity Analysis."  In New Risks, L.A. Cox Jr. and P.F. Ricci, eds.  New York:
     Plenum Press.

Iman, Ronald L.  1987.  "A Matrix-Based Approach to Uncertainty and Sensitivity Analysis
     for Fault Trees."  Risk Analysis  7(1):21-33.

Iman, Ronald L., and Jon C. Helton.  1988. "An Investigation of Uncertainty and Sensitivity
     Analysis Techniques for Computer Models." Risk Analysis 8(1):71-90.

James, David.  1985.  "Environmental Economics, Industrial Process Models, and Regional-
     Residuals Management Models." Handbook of Natural Resource and Energy
     Economics. Vol. 1, A. V. Kneese and J. L. Sweeney, eds., pp. 271-319.

Johnson, Gary L., and Judith S. Ford.  1986. "Development of Data Quality Indicators for
     Toxic Air Pollution Measurements."  In Proceedings of the 1986 EPA/APCA Symposium
     on Measurement of Toxic Air Pollutants. Research Triangle Park, NC:  EPA Air and
     Energy Engineering Research Laboratory.

Kalton, Graham, and Daniel Kasprzyk.  1982. "Imputing for Missing Survey Responses."
     Prepared for the Proceedings of the Section on Survey Research Methods.

Keeney, Ralph L., and Howard Raiffa.  1976.  Decisions with Multiple Objectives:
     Preferences and Value Trade Offs. New York:  John Wiley & Sons, Inc.

Kennedy, Peter.  1991.  A Guide to Econometrics.  Second Edition.  Cambridge,
     Massachusetts:  The MIT Press.

Kirk, Roger E. 1990. Statistics:  An Introduction.  Third Edition.  Fort Worth:  Holt,
     Rinehart, and Winston.

Kirkpatrick, Neil.  1991.  "Total Approach to Preserving the Environment."  Packaging Week
     7(25): 19.

Kollig, Heinz P. 1987.  "Criteria for Evaluating the Reliability of Literature Data on
     Environmental Process Constants."  Toxicological and Environmental Chemistry
     17:287-311.

Kollig, Heinz P., and Brenda E. Kitchens.  1990.  "Problems Associated with Published
     Environmental Fate Data."  Toxicological and Environmental Chemistry  28:95-103.

Kollig, Heinz P., U.S. Environmental Protection Agency.  1992. Personal communication
     with Chris Lacke, Research Triangle Institute.


Lepkowski, James M., J. Richard Landis, and Sharon A. Stehouwer.  1987.  "Strategies for
     the Analysis of Imputed Data From a Sample Survey:  The National Medical Care
     Utilization  and Expenditure Survey."  Medical Care  25(8):705-716.

Lewis, P.A.W., and E.J. Orav.  1989.  Simulation Methodology for Statisticians, Operations
     Analysts, and Engineers:  Volume I. The Wadsworth and Brooks/Cole Statistics/
     Probability Series.

Little, Roderick J. A., and Donald B. Rubin.  1987.  Statistical Analysis with Missing Data.
     New York:  John Wiley and Sons, Inc.

Lübkert, Barbara, Yrjo Virtanen, Manfred Muhlberger, Jyrki Ingman, Bruno Vallance, and
     Sebastian Alber. 1991.  "Life-Cycle Analysis Idea:  An International Database for
     Ecoprofile  Analysis, A Tool For Decision Makers."  Working Paper.  Laxenburg,
     Austria:  International Institute for  Applied Systems Analysis.

Macgill, S. M., and S. O.  Funtowicz.  1988.   "The 'Pedigree' of Radiation Estimates: An
     Exploratory Analysis in the Context of Exposures of Young People in Seascale as a
     Result of Sellafield Discharges." J. Radiol Prot.  8(2):77-86.

Mann, Charles.  1990.  "Meta-Analysis in the Breach."  Science  249:476-480.

Marinelli, Janet.  1990.  "Packaging."  Garbage.  May-June.

Marland, Greg, and Ralph M. Rotty. 1984.  "Carbon Dioxide Emissions from Fossil Fuels:
     A Procedure for Estimation and Results for 1950-1982."  Tellus  36B:232-261.

Massey, James T., and Keith L. Hoffman.  1989.  "Monitoring Data Quality Through
     Comparisons Between Data Systems."  Statistics In Medicine  8:367-377.

McNamee, P., and J. Celona.  1987.  Decision Analysis for the Professional with Supertree.
     Redwood City: The Scientific Press.

 Mekel, O.C.L.,  and G. Huppes. 1990.  Environmental Effects of Different Package Systems
     for Fresh Milk. Printed in The Netherlands.
Mendenhall, William, and Robert J. Beaver.  1991. Introduction to Probability and Statistics.
     8th Edition. Boston:  P.W.S. Kent.

Mickler, Robert A., and Susan A. Medlarz. 1987.  "The Role of Quality Assurance in
     National Acid Rain Research in the United States." Environmental Technology Letters
     8:459-466.

Mill, Theodore, and Barbara T. Walton.  1987.  "How Reliable Are Data-Base Data?"
     Environmental Toxicology and Chemistry  6:161-162.

Moore, David S., and George P. McCabe.  1989. Introduction to the Practice of Statistics.
     New York:  W.H. Freeman and Co.

Morgan, M. Granger, and Max Henrion.  1990.  Uncertainty: A Guide to Dealing with
     Uncertainty in Quantitative Risk and Policy Analysis.  New York: Cambridge
     University Press.

Morris, Peter A.  1974. "Decision Analysis Expert Use." Management Science
     20(9):1233-1241.

National Academy of Science.  1988. Final Report on Quality Assurance to the
     Environmental Protection Agency.  Washington,  DC: National Academy Press.

Neptune, Dean, Eugene P. Brantly, Michael J. Messner, and Daniel I. Michael.  1990.
     "Quantitative Decision Making in Superfund:  A Data Quality Objectives Case Study."
     HMC May/June: 19-27.

North, D. Warner.  1990.  "Decision Analysis in Environmental Risk Management:
     Applications to Acid Deposition and Air Toxics."  In New Risks, L.A.  Cox Jr. and P.F.
     Ricci, eds.  New York:  Plenum Press.

Pacific Environmental Services, Inc.  1989. Emission Factor Documentation for AP-42:
     Section 5.19 Synthetic Fibers. Prepared for the  U.S. Environmental Protection Agency,
     Office of Air Quality Planning and Standards, Research Triangle Park, NC.

Plewa, Michael J.,  Roger A. Minear, Diane Ades-Mclnerney, and Elizabeth Wagner.  1988.
     Refining the Degree of Hazard Ranking Methodology for Illinois Industrial Waste
     Streams. Printed by the Authority of the State of Illinois 88/150.

Rogers, Everett M. 1987.  "Methodology for Meta-Research."  In Organizational
     Communication:  Abstracts, Analysis, and  Overview, Volume 10. Greenbaum, Howard
     H., Susan A.  Hellweg, and Joseph W. Walter, eds, pp. 13-33.  Beverly Hills, CA: Sage
     Publications.
Rothgeb, T. Michael, Charles A. Pittinger, and Celeste C. Kuta.  1991.  "Environmental
     Quality of Consumer Products."  TAPPI  28(3):64-69.

Rubin, Donald B.  1990. "Discussion."  Paper presented at the 1990 U.S. Bureau of the
     Census Annual Research Conference Proceedings, Arlington, VA, March 18-21.

Rubin, Donald B.  1987. Multiple Imputation for Nonresponse in Surveys.  New York: John
     Wiley and Sons, Inc.

Rubin, Donald B., and Nathaniel Schenker.  1991. "Multiple Imputation in Health-Care
     Databases: An Overview and Some Applications."  Statistics in Medicine  10:585-598.

SAS Institute Inc.  1988.  SAS® Procedures Guide, Release 6.03 Edition.  Cary, NC:  SAS
     Institute Inc., 441 pp.

Schaffhauser, Anthony J., and Bruce Tonn.  1991.  "Signaling Uncertainty and Quality of Cell
     Entries." Draft paper.  Oak Ridge, TN:  Oak Ridge National Laboratory.

Scientific Certification Systems, Inc.  1992. Life Cycle Inventory and the Environmental
     Report Card.  Oakland, CA: Scientific Certification Systems, Inc.

Sechrest, Lee, and Maureen Hannah.  1990. "The Critical Importance of Nonexperimental
     Data." In Research Methodology:  Strengthening of Non-Experimental Data, Lee
     Sechrest, Edward Perrin, and John Bunker, eds., pp.  1-7. Washington, DC:  U.S.
     Department of Health and Human Services, Agency for Health Care Policy and
     Research.

Society of Environmental Toxicology and Chemistry.  1994.  Life-Cycle Assessment: A
     Conceptual Framework for Data Quality.  Pensacola, Florida:  Society of Environmental
     Toxicology and Chemistry, and SETAC Foundation for Environmental Education, Inc.

Society of Environmental Toxicology and Chemistry.  1993a.  Guidelines for Life Cycle
     Assessment:  A 'Code  of Practice'.  Draft paper.  Pensacola, Florida:  Society of
     Environmental Toxicology and Chemistry,  and SETAC Foundation for Environmental
     Education, Inc.

Society of Environmental Toxicology and Chemistry.  1993b.  A Conceptual Framework for
     Life-Cycle Impact Assessment. Pensacola, Florida:  Society of Environmental
     Toxicology and Chemistry, and SETAC Foundation for Environmental Education, Inc.

Society of Environmental Toxicology and Chemistry.  1992.  Guidelines for Conduct of LCA
     Peer Review.   Working paper.  Pensacola, Florida: Society of Environmental
     Toxicology and Chemistry, and SETAC Foundation for Environmental Education, Inc.

Society of Environmental Toxicology and Chemistry, and SETAC Foundation for
     Environmental Education, Inc. 1991. A Technical Framework for Life-Cycle
     Assessments.  Pensacola, FL:  Society of Environmental Toxicology and Chemistry, and
     SETAC Foundation for Environmental Education, Inc.

Spetzler, Carl S., and Carl-Axel Stael von Holstein.  1975. "Probability Encoding in Decision
     Analysis." Management Science 22(3):340-358.

Stephanou, Harry E.  1987.  "Perspectives on Imperfect Information Processing."  IEEE
     Transactions on Systems, Man, and Cybernetics  17(5):780-798.

Stern, R. M.  1989. "Cancer Incidence Among Welders Due to Combined Exposures to ELF
     and Welding Fumes: A Study in Data Pooling." In Risk Assessment in Setting National
     Priorities, James J. Bonin and Donald E. Stevenson, eds., pp. 517-527.  New York:
     Plenum Press.

Talcott, Frederick W.  1992.  "Environmental Agenda: The Time Is Ripe for an Analytical
     Approach to Policy Problems." OR/MS Today June: 18-24.

Taylor, Jeremy M. G., Alvaro Munoz, Sue M. Bass, Alfred J. Saah, Joan S. Chmiel, and
     Lawrence A. Kingsley.  1990.  "Estimating the Distribution of Times From HIV
     Seroconversion to AIDS Using Multiple Imputation."  Statistics In Medicine  9:505-514.

Thompson, Michael, and Michael Warburton.  1985.  "Decision Making Under Contradictory
     Certainties: How To Save The Himalayas When You Can't Find Out What's Wrong
     with  Them."  Journal of Applied Systems Analysis  12:3-34.

Tierney, William M., and Clement J. McDonald.  1991.  "Practice Databases and Their Uses
     in Clinical Research."  Statistics in Medicine 10:541-557.

Tonn, Bruce, and Richard Goeltz.  1990. "An Experiment in Combining Estimates of
     Uncertainty." In New Risks, L. A. Cox, Jr. and P.F. Ricci, eds., pp. 403-412.  New
     York: Plenum Press.

U.S. Department of Commerce.  1985.  "Methodology for Characterization of Uncertainty in
     Exposure Assessments." PB85-240455.  Prepared by Roy W. Whitmore, Research
     Triangle Institute, Research Triangle Park, NC.

U.S. Department of Health and Human Services.  1985.  Environmental Health:  A Plan for
     Collecting and Coordinating Statistical and Epidemiologic Data. Washington, DC:
     U.S.  Government Printing Office.

U.S. Department of the Interior.  1992. Mineral Commodity Summaries  1992. Washington,
     DC:  U.S. Government Printing Office.

U.S. Department of the Interior.  1989. Minerals Yearbook.  Volume I. Washington, DC:
     U.S. Government Printing Office.

U.S. Department of the Interior.  1985. Mineral Facts and Problems.  Washington, DC:
     U.S. Government Printing Office.

U.S. Environmental Protection Agency.  1994a.  Life Cycle Assessment: Public Data Sources
     for the LCA Practitioner.  EPA, Office of Solid Waste, Washington, DC.  Prepared by
     Battelle Memorial Institute.

U.S. Environmental Protection Agency.  1994b.  Life Cycle Impact Assessment:  A
     Conceptual Framework, Key Issues, and Summary of Existing Methods.  EPA, Office of
     Air Quality Planning and Standards, Research Triangle Park, NC.  Prepared by Research
     Triangle Institute.

U.S. Environmental Protection Agency.  1993a.  Life Cycle Design Guidance Manual.  Final
     Report, EPA/600/R-92/226.  Office of Research and Development.  Prepared by National
     Pollution Prevention Center, University of Michigan.

U.S. Environmental Protection Agency.  1993b.  Life-Cycle Assessment:  Inventory
     Guidelines and Principles.  Final Report, EPA/600/R-92/245.  Office of Research and
     Development.  Prepared by Battelle, and Franklin Associates, Ltd.

U.S. Environmental Protection Agency.  1991. "AEERL Quality Assurance Procedures
     Manual for Contracts and Financial Aid Recipients." Draft Document.  Washington,
     DC: EPA Office of Research and Development.

U.S. Environmental Protection Agency.  1990a.  Revised Hazard Ranking System: Qs and
     As.  Washington, DC:  EPA Office of Emergency and Remedial Response.

U.S. Environmental Protection Agency.  1990b.  Guidance for Data Usability in Risk
     Assessment. Interim Final Report. Washington, DC.

U.S. Environmental Protection Agency.  1990c.  Toxic Release Inventory: TRIS  Overview.
     Washington, DC:  EPA Office of Toxic Substances.

U.S. Environmental Protection Agency.  1989a.  Report on Minimum Criteria to Assure Data
     Quality.  EPA/530-SW-90-021. Washington, DC:  EPA Office of Solid Waste.

U.S. Environmental Protection Agency. 1989b.  Risk Assessment Guidance for Superfund,
     Volume I, Human Health Evaluation Manual (Part A).  EPA/540/1-89/002.  Washington,
     DC.

U.S. Environmental Protection Agency. 1987. Municipal Waste Combustion Study:
     Emission Data Base for Municipal Waste Combustors. Final report. EPA/530/SW-
     87/012B. Prepared by P. Schindler, Midwest Research Institute.
U.S. Environmental Protection Agency.  1986a.  Development of Data Quality Objectives:
     Description of Stages I and II.  Washington, DC: EPA Quality Assurance Management
     Staff. [Obtain by calling (202) 260-5763]

U.S. Environmental Protection Agency.  1986b.  Proceedings of the 1986 EPA/APCA
     Symposium on Measurement of Toxic Air Pollutants. Raleigh, NC: EPA Environmental
     Monitoring Systems Laboratory and the Air Pollution Control Association.

U.S. Environmental Protection Agency.  1984.  Calculation of Precision, Bias, and Method
     Detection Limit for Chemical and Physical Measurements. Chapter 5. Quality
     Assurance Management and Special Studies Staff.

U.S. Environmental Protection Agency.  1978.  Source Assessment:  Analysis of
     Uncertainty—Principles and Applications.  EPA-600/2-78-004u.  Prepared by R. W.
     Serth, T. W. Hughes, R. E. Opferkuch, and E. C. Eimutis, Monsanto Research
     Corporation.  Dayton, OH: EPA Office of Research and Development.

Vesely, W. E., and  D. M. Rasmuson. 1984.  "Uncertainties in Nuclear Probabilistic Risk
     Analyses." Risk Analysis 4(4):313-322.

Vigon, Bruce and Mary Ann Curran.  1993.  "Life-cycle  Improvements Analysis: Procedure
     Development and Demonstration."  Paper presented at the 1993 IEEE International
     Symposium on Electronics and the Environment, Arlington, Virginia, May 10-12, 1993.
     Institute of Electrical and Electronics Engineers, Inc., Piscataway, NJ.

Vigon, Bruce and Allan A. Jensen.  1992.  "Lifecycle Assessment Data Quality and Databases
     Practitioner Survey."  Discussion paper presented at the Society of Environmental
     Toxicology and Chemistry Lifecycle Assessment Data Quality Workshop, Wintergreen,
     Virginia, October 4-9, 1992.

Watson, S.R., and D.M. Buede.  1987.  Decision Synthesis:  The Principles and Practice of
     Decision Analysis.  Cambridge University Press.

Wei, Greg C. G., and Martin A. Tanner. 1991.  "Applications of Multiple Imputation to the
     Analysis of Censored Regression Data." Biometrics 47:1297-1309.

Wilkinson, G. N.   1958. "Estimation of Missing Values  for the Analysis of Incomplete
     Data." Biometrics 14:257-286.

Wolf, Katy, Azita Yazdani, and Pamela Yates.  1991. Waste Minimization: Small Quantity
     Generators at Los Angeles International Airport. Prepared by UCLA Engineering
     Research Center for Hazardous Substances Control  for Alternative Technology Division,
     California Department of Toxic Substances Control, U.S. Environmental Protection
     Agency.
Yin, Robert K., and Karen A. Heald.  1975. "Using the Case Survey Method to Analyze
     Policy Studies."  Administrative Science Quarterly.  20:371-381.

Youngren, Susan Hunter, and Robert G. Tardiff.  1991. "Reducing Uncertainty in Risk
     Assessment: Exposure Assessment."  In Risk Analysis, C. Zervos ed., pp. 575-588.
     New York: Plenum Press.