EPA/100/K-09/003 | March 2009
                                      www.epa.gov/crem
United States
Environmental Protection
Agency
             Guidance on the Development,
             Evaluation, and Application of
             Environmental  Models
Office of the Science Advisor
Council for Regulatory Environmental Modeling

-------
                                             EPA/100/K-09/003
                                                 March 2009
                                       Office of the Science Advisor
Guidance on the Development, Evaluation, and
      Application of Environmental Models
             Council for Regulatory Environmental Modeling
                U.S. Environmental Protection Agency
                    Washington, DC 20460

-------
Preface

This Guidance on the Development, Evaluation, and Application of Environmental Models was prepared in
response to a request by the U.S. Environmental Protection Agency  (EPA) Administrator that EPA's
Council for  Regulatory  Environmental  Modeling (CREM)  help  continue  to  strengthen the  Agency's
development, evaluation, and use of models (http://www.epa.gov/osp/crem/library/whitman.PDF).

A draft version of this document (http://cfpub.epa.gov/crem/crem_sab.cfm) was reviewed by an
independent panel of experts established by EPA's Science Advisory Board and revised by CREM in
response to the panel's comments.

This final document is available in printed and electronic form.  The electronic version provides direct links
to the references identified in the document.
Disclaimer

This document provides guidance to those who develop, evaluate, and apply environmental models. It
does not impose legally binding requirements; depending on the circumstances, it may not apply to a
particular situation. The U.S. Environmental Protection Agency (EPA) retains the discretion to adopt, on a
case-by-case basis, approaches that differ from this guidance.

-------
Authors, Contributors, and Reviewers
This  document was developed under the leadership of EPA's Council  for Regulatory Environmental
Modeling. A number of people representing EPA's core, program, and regional offices helped write and
review it.

PRINCIPAL AUTHORS:

Council for Regulatory Environmental Modeling Staff:
Noha Gaber, Gary Foley, Pasky Pascual, Neil Stiber, Elsie Sunderland

EPA Region 10:
Ben Cope

Office of Environmental Information:
Annett Mold  (deceased)

Office of Solid Waste and Emergency Response:
Zubair Saleem

CONTRIBUTORS AND INTERNAL REVIEWERS:

EPA Core Offices:
Office of Research and Development:
Justin Babendreier, Thomas Barnwell (retired), Ed  Bender, Lawrence Burns (retired), Gary Foley, Kathryn
Gallagher, Kenneth Galluppi, Gerry Laniak, Haluk Ozkaynak,  Kenneth  Schere, Subhas  Sikdar,  Eric
Weber, Joe Williams

Office of Environmental Information:
Ming Chang, Reggie Cheatham, Evangeline Cummings, Linda Kirkland, Nancy Wentworth

Office of General Counsel:
James Nelson  (retired), Barbara Pace, Quoc Nguyen, Manisha Patel, Carol Ann Sicilano

Science Advisory Board:
Jack Kooyoomjian

EPA Program Offices:
Office of Air and Radiation:
Tyler Fox, John Irwin (retired), Joe Tikvart, Richard (Chet) Wayland, Jason  West

Office of Prevention, Pesticides and Toxic Substances:
Lynn Delpire,  Alan  Dixon, Wen-Hsiung  Lee, David  Miller, Vince Nabholz,  Steve  Nako,  Neil Patel,
Randolph Perfetti (retired), Scott Prothero, Donald  Rodier

-------
Office of Solid Waste and Emergency Response:
Peter Grevatt, Lee Hofmann, Stephen Kroner (retired), Larry Zaragoza

Office of Water:
Jim Carleton, Sharon E. Hayes, Marjorie Wellman, Denise Keehner, Lauren Wisniewski, Lisa McGuire,
Mike Messner, James F. Pendergast

EPA Regional Offices:
Region 1:
Brian Hennessey, Michael Kenyon

Region 2:
Kevin Bricke, Rosella O'Connor, Richard Winfield

Region 3:
Alan Cimorelli

Region 4:
Nancy Bethune, Brenda Johnson, Tim Wool

Region 5:
Bertram Frey, Arthur Lubin, Randy Robinson, Stephen Roy, Mary White

Region 6:
James Yarborough

Region 7:
Bret Anderson

Region 10:
David Frank (retired), John Yearsley (retired)

-------
Preface                                                                                   ii

Disclaimer                                                                                ii

Authors, Contributors, and Reviewers                                                       iii

Executive Summary                                                                       vii

1.              INTRODUCTION
1.1             Purpose and Scope of This Document                                          1
1.2             Intended Audience                                                           2
1.3             Organizational Framework                                                    2
1.4             Appropriate Implementation of This Document                                    3

2.              MODELING FOR ENVIRONMENTAL DECISION SUPPORT
2.1             Why Are Models Important?                                                   4
2.2             The Modeling Life-Cycle                                                      5

3.              MODEL DEVELOPMENT
3.1             Introduction                                                                 8
3.2             Problem Specification and Conceptual Model Development                        9
3.2.1           Define the Objectives                                                         9
3.2.2           Determine the Type and Scope of Model Needed                                 9
3.2.3           Determine Data Criteria                                                       9
3.2.4           Determine the Model's Domain of Applicability                                   10
3.2.5           Discuss Programmatic Constraints                                            10
3.2.6           Develop the Conceptual Model                                                10
3.3             Model Framework Selection and Development                                   11
3.3.1           Model Complexity                                                           12
3.3.2           Model Coding and Verification                                                14
3.4             Application Tool Development                                                15
3.4.1           Input Data                                                                 16
3.4.2           Model Calibration                                                           17

4.              MODEL EVALUATION
4.1             Introduction                                                                19
4.2             Best Practices for Model Evaluation                                            21
4.2.1           Scientific Peer Review                                                       23
4.2.2           Quality Assurance Project Planning and Data Quality Assessment                  25
4.2.3           Corroboration, Sensitivity Analysis, and Uncertainty Analysis                      26
4.2.3.1         Types of Uncertainty                                                         26
4.2.3.2         Model Corroboration                                                         29
4.2.3.3         Sensitivity and Uncertainty Analysis                                            31
4.3             Evaluating Proprietary Models                                                31
4.4             Learning From Prior Experiences — Retrospective Analyses of Models             32
4.5             Documenting the Model Evaluation                                            33
4.6             Deciding Whether to Accept the Model for Use in Decision Making                  34

5.              MODEL APPLICATION
5.1             Introduction                                                                35
5.2             Transparency                                                               37
5.2.1           Documentation                                                              37
5.2.2           Effective Communication                                                     38
5.3             Application of Multiple Models                                                39
5.4             Model Post-Audit                                                            39

-------
APPENDICES
              Appendix A:  Glossary of Frequently Used Terms                            41
              Appendix B:  Categories of Environmental Regulatory Models                 49
              Appendix C:  Supplementary Material on Quality Assurance Planning and Protocols      56
              Appendix D:  Best Practices for Model Evaluation                            60

Literature Cited                                                                       77

-------

Executive Summary

In pursuing its mission to protect human health and to safeguard the natural environment, the U.S.
Environmental Protection Agency often relies on environmental models. In this guidance,  a model is
defined as a "simplification of reality that is constructed to gain insights into select attributes of a particular
physical, biological, economic, or social system."

This guidance provides recommendations for the effective development, evaluation, and use of models in
environmental  decision  making   once  an  environmental  issue  has  been   identified.    These
recommendations are drawn  from Agency white papers,  EPA Science Advisory Board reports, the
National Research Council's Models in Environmental Regulatory Decision Making, and peer-reviewed
literature. For organizational simplicity, the  recommendations are categorized into three sections: model
development, model evaluation, and model application.

Model development can be viewed as a process with three main steps: (a) specify the environmental
problem (or set of issues) the  model is intended to address and  develop the conceptual  model, (b)
evaluate or develop the model framework (develop the mathematical  model), and (c)  parameterize the
model to develop the application tool.

Model evaluation is the  process for generating information over the life cycle of the project that helps
determine whether a model and its analytical results are  of sufficient quality to serve as the basis for a
decision.   Model quality  is an attribute that is meaningful  only  within the  context of  a specific model
application.   In  simple  terms,  model evaluation provides  information to  help answer  the following
questions: (a) How have the principles of sound science been addressed during model development? (b)
How is the choice of model supported by the quantity and quality  of available data? (c) How closely does
the model approximate the real system of interest? (d) How well does the model perform the specified
task while meeting the objectives set by quality assurance  project  planning?

Model application (i.e., model-based decision making) is strengthened when the science underlying the
model is transparent.  The elements of transparency emphasized  in this guidance are (a) comprehensive
documentation of all aspects of a modeling project (suggested as a  list of elements relevant to any
modeling project) and (b) effective communication  between modelers, analysts, and decision  makers.
This approach  ensures that  there is a clear rationale  for using  a  model  for a specific regulatory
application.

This guidance recommends best practices to help determine when a model, despite its uncertainties, can
be appropriately used to inform a decision. Specifically, it recommends that model developers and users:
(a) subject their model to credible, objective peer review;  (b) assess the quality of the data they use; (c)
corroborate their model by evaluating the degree to which it corresponds to the system being modeled;
and (d) perform sensitivity and uncertainty analyses.  Sensitivity analysis evaluates the effect of changes
in input values or assumptions on a model's results.  Uncertainty  analysis investigates the effects of lack
of knowledge and other potential sources of error in the model  (e.g., the "uncertainty" associated with
model parameter values). When conducted in combination,  sensitivity and uncertainty analysis allow
model users to  be more informed about the confidence that can  be placed in model results.  A model's
quality to support a decision becomes better known when information is available to assess these factors.

-------
1.     Introduction
1.1     Purpose and Scope of This Document

The U.S. Environmental Protection Agency (EPA) uses a wide range of models to inform decisions that
support its mission of protecting human health and safeguarding the natural environment — air, water,
and land — upon which life depends.  These models include atmospheric and indoor air models, ground
water and surface  water  models, multimedia models, chemical equilibrium  models,  exposure models,
toxicokinetic models, risk assessment models, and economic models. These models range from simple to
complex and may employ  a combination of scientific, economic, socio-economic, or other types of data.

As stated in the National  Research  Council (NRC) report Models in Environmental Regulatory Decision
Making, models are critical to regulatory decision  making because the spatial  and temporal scales linking
environmental controls and environmental quality generally do not allow for an observational approach to
understand the relationship  between economic activity and environmental quality (NRC 2007). Models
have a long history of helping to explain  scientific phenomena  and  predict  outcomes and  behavior in
settings where empirical observations are limited or not available.

This guidance uses the NRC report's definition of a model:

       A simplification of reality that is constructed to gain insights into select attributes of
       a particular physical, biological, economic, or social system.

In particular, this guidance focuses on the subset of all models termed "computational models" by the
NRC. These are models that use measurable variables, numerical inputs, and mathematical relationships
to  produce quantitative outputs. (Note that all  terms underlined  in  this document  are  defined in the
Glossary, Appendix A).

As models become increasingly significant in decision making, it is important that the model development
and  evaluation  processes conform to  protocols or standards  that help ensure the  utility, scientific
soundness, and defensibility of the  models and their outputs for decision making.  It  is also increasingly
important to  plan and manage the  process of using models to inform decision making (Manno et al.
2008).  This guidance document aims to facilitate  a widespread understanding of the processes for model
development,  evaluation,  and application and thereby promote their appropriate application  to support
informed decision making.   Recognizing  the diversity of modeling applications throughout the Agency,
the  principles and  practices described in the guidance apply generally to  all  models used  to inform
Agency decisions, regardless  of domain,  mode,  conceptual basis,  form, or rigor level (i.e., varying from
screening-level applications to complex analyses) (EPA 2001).  The principles presented in this guidance
are also applicable to models not used for regulatory purposes, as experience has shown that models
developed for research and development have often found useful applications in environmental
management.

This guidance presents recommendations  drawn  from Agency  white papers on environmental modeling,
EPA Science Advisory Board (SAB)  reports,  NRC's  Models in Environmental Regulatory Decision
Making, and the peer-reviewed  literature.  It provides an overview of best  practices for ensuring and
evaluating the quality of environmental models.

-------
These  practices complement the  systematic QA planning process for  modeling  projects outlined in
existing guidance (EPA 2002b).  These QA processes produce documentation supporting the quality of
the model development and application process (Appendix C, Box C1:  Background on  EPA Quality
System). For example, QA plans should contain performance criteria ("specifications") for a model in the
context of its intended use, and these criteria should be developed at the onset of each project.  During
the model evaluation process, these criteria are subjected to a series of tests of model quality  ("checks").
Documentation of these specifications and the evaluation results provides a record of how well a model
meets its intended use and the basis for a decision on model acceptability.

The primary purpose of this guidance is to provide specific advice on how to best perform these "checks"
during  model development, evaluation, and  application.  Following the best practices emphasized in this
document, together with well-documented QA project plans, will help ensure that results of modeling
projects and the decisions informed by them heed the principles of the Agency's  Information Quality
Guidelines (EPA 2002a).

1.2    Intended Audience

This document is intended  for a wide range of audiences,  including  model  developers, computer
programmers, model  users,  policy makers  who work with models, and affected stakeholders.  Model
users include those who generate model output (i.e., who set  up,  parameterize, and run  models)  and
managers who use model outputs.

1.3    Organizational Framework

The main body of this document provides an overview of principles  of good modeling for all users.  The
appendices present technical information and examples that may be more appropriate for specific user
groups. For organizational simplicity, the main body of this guidance has separate chapters on the three
key topics: model development, model evaluation, and model application. However, it is important to note
that these three topics are not strictly sequential. For example, the process of evaluating a model and its
input data to ensure their quality  should be undertaken  and documented during  all stages of model
development and application.

Chapter 1 serves as a general introduction and outlines the scope of this guidance. Chapter 2 discusses
the role of models in environmental decision making. Figure 1 at the  end of Chapter 2 shows the steps in
the model development and application process and the  role that models  play  in the public policy
process.  Chapters 3 and 4 provide guidance on model development (including problem specification)
and  model evaluation, respectively.   Finally,  Chapter 5 recommends  practices  for most  effectively
incorporating information from environmental models into the Agency's policy or regulatory decisions.

Several appendices present more detailed technical information and examples that complement the
chapters.  Appendix A provides definitions for all underlined terms in this guidance, and Appendix B
summarizes the categories of models that are integral to environmental regulation. Appendix C presents
additional background information on the QA program and other relevant topics.  Appendix D presents
an overview of best practices that may be used to evaluate models, including  more detailed information
on the peer review process for models and specific technical guidance on tools for model evaluation.

-------
1.4    Appropriate Implementation of This Document

The principles and  practices described in this guidance are designed to apply generally to all types of
models; however, EPA program and regional offices may modify the recommendations,  as appropriate
and necessary to the specific modeling project and application.  Each  EPA office is responsible for
implementing the best practices described in a manner appropriate to meet its needs.

As indicated by the use of non-mandatory  language such as "may," "should," and "can," this  document
provides recommendations and suggestions and does  not create legal rights or impose  legally binding
requirements on EPA or the public.

The Council for Regulatory Environmental Modeling has also developed the Models Knowledge Base —
a Web-based inventory of information on models used in EPA — as a companion product to complement
this document.  This inventory provides convenient access to standardized documentation  on the models'
development, scientific basis, user requirements, evaluation studies, and application examples.

-------
2.     Modeling for Environmental Decision Support

2.1     Why Are Models Important?

This guidance defines a model as "a simplification of reality that is constructed to gain insights into
select attributes of a particular physical,  biological,  economic, or social system."   A model
developer sets boundary conditions and determines which aspects of the system are to be  modeled,
which processes are important, how these  processes may be represented  mathematically, and what
computational methods to use in implementing the mathematics.  Thus, models are based on simplifying
assumptions and cannot completely replicate the complexity inherent in environmental systems. Despite
these limitations, models  are essential  for a variety of purposes in the environmental field. These
purposes tend to fall  into two categories:

•   To diagnose (i.e., assess what happened) and examine causes and precursor conditions (i.e., why it
    happened) of events that have taken place.
•   To forecast outcomes and future events (i.e., what will happen).

Whether applied to current conditions or envisioned future circumstances, models play an important role
in environmental management.  They are an important tool to analyze environmental and human health
questions  and characterize  systems that are  too complex to  be addressed solely through  empirical
means.

Models can be classified in various ways (see Appendix B) — for example,  based on their conceptual
basis and mathematical solution, the purpose for which they were developed and are applied, the domain
or discipline to which they apply, and the  level of resolution and complexity at which they operate. Three
categories of regulatory models have been identified based on their purpose or application (CREM 2001):

•   Policy analysis. The results of policy analysis models affect national policy decisions. These models
    are used to set policy for large, multi-year programs or concepts — for example, national policy on
    acid rain and phosphorus reduction in the Great Lakes.
•   National regulatory decision making.  These models inform national  regulatory decision making
    after  overall  policy has  been  established.  Examples include the use of a  model to  assist  in
    determining  federal regulation  of a  specific  pesticide or  to aid in establishing national effluent
    limitations.
•   Implementation applications.  These models are used in situations where policies and regulations
    have already been made. Their development and use may be driven by court-ordered schedules and
    the need for local action.

Environmental models are one source of information for Agency decision makers who need to consider
many competing objectives.  A number  of EPA programs make  decisions  based on information from
environmental modeling applications. Within the Agency:

•   Models are  used to simulate many different processes,  including natural (chemical, physical, and
    biological) systems, economic phenomena,  and decision processes.
•   Many types  of models are employed, including economic, behavioral,  physical, engineering design,
    health, ecological, and fate/transport models.

-------
•   The geographic scale of the problems addressed by a model can vary from national scale to  an
    individual site. Examples of different scales include:
    •   National air quality models used in decisions about emission requirements.
    •   Watershed-scale water quality models used in decisions about permit limits for point sources.
    •   Site-scale human  health risk  models  used  in  decisions  about  hazardous  waste  cleanup
       measures.

Box 1:  Examples of EPA Web Sites Containing Model Descriptions for Individual Programs

National Environmental Research Laboratory Models: http://www.epa.gov/nerl/topics/models.html
Atmospheric Sciences Modeling Division: http://www.epa.gov/asmdnerl/index.html
Office of Water's Water Quality Modeling: http://www.epa.gov/waterscience/wqm
Center for Subsurface Modeling Support: http://www.epa.gov/ada/csmos.html
National Center for Computational Toxicology: http://www.epa.gov/ncct
Support Center for Regulatory Atmospheric Modeling: http://www.epa.gov/scram001/aqmindex.htm

Models also  have useful  applications  outside the regulatory  context.   For example, because models
include explicit mathematical  statements about system  mechanics, they serve  as research tools for
exploring  new  scientific issues  and screening  tools for simplifying and/or refining  existing scientific
paradigms or software (SAB 1993a, 1989). Models can also help users study the behavior of ecological
systems, design field studies, interpret data, and generalize results.

2.2     The Modeling Life-Cycle

The process  of developing and applying a model to  address a specific decision making need generally
follows the iterative progression described in Box 2 and depicted in Figure 1.  Models are used to address
real or perceived  environmental problems.  Therefore, a modeling process (i.e., model  development,
evaluation, and application described in chapters 3,  4, and 5,  respectively) is initiated after the Agency
has identified an environmental problem and determined that model results could provide useful input for
an Agency decision.

Problem identification will be most successful if it involves all parties who would be involved in model
development and use (i.e., model developers, intended users, and decision makers).  At a  minimum, the
Agency should develop a relatively simple, plain English problem identification statement.

-------
Box 2: Basic Steps in the Process of Modeling for Environmental Decision Making
(modified from Box 3-1, NRC Report on Models in Environmental Regulatory Decision Making)

Problem identification and specification: to determine the right decision-relevant questions and
establish modeling objectives
•   Definition of model purpose: Goal; Decisions to be supported; Predictions to be made
•   Specification of modeling context: Scale (spatial and temporal); Application domain; User community;
    Required inputs; Desired output; Evaluation criteria

Model development: to develop the conceptual model that reflects the underlying science of the
processes being modeled, and develop the mathematical representation of that science and encode
these mathematical expressions in a computer program
•   Conceptual model formulation: Assumptions (dynamic, static, stochastic, deterministic); State
    variables represented; Level of process detail necessary; Scientific foundations
•   Computational model development: Algorithms; Mathematical/computational methods; Inputs;
    Hardware platforms and software infrastructure; User interface; Calibration/parameter determination;
    Documentation

Model evaluation: to test that the model expressions have been encoded correctly into the computer
program and test the model outputs by comparing them with empirical data
•   Model testing and revision: Theoretical corroboration; Model components verification; Corroboration
    (independent data); Sensitivity analysis; Uncertainty analysis; Robustness determination; Comparison
    to evaluation criteria set during formulation

Model application: running the model and analyzing its outputs to inform a decision
•   Model use: Analysis of scenarios; Predictions evaluation; Regulations assessment; Policy analysis
    and evaluation; Model post-auditing

-------
[Figure 1. Steps in the model development and application process and the role of models in the public
policy process, including regulatory agency decisions, implementation, administrative (OMB) and
regulatory interpretation of model results and uncertainties, and stakeholder involvement.]
-------
3.     Model Development

Summary of Recommendations for Model Development

•   Communication between model developers and model users is crucial during model development.
•   Each element of the conceptual  model should be clearly described (in words, functional expressions,
    diagrams,  and graphs,  as necessary), and  the  science  behind  each element should be clearly
    documented.
•   When possible, simple competing conceptual models/hypotheses should be tested.
•   Sensitivity analysis should  be used early and often.
•   The optimal level of model complexity should  be determined by making appropriate tradeoffs among
    competing  objectives.
•   Where possible, model parameters should be characterized using direct measurements of sample
    populations.
•   All input data should meet data quality acceptance criteria in the QA project plan for modeling.

3.1     Introduction

Model  development  begins after problem identification —  i.e., after the Agency has  identified an
environmental  problem it needs to address and has determined that models may provide useful input for
the  Agency decision making needed to address the problem (see Section  2.2). In  this  guidance, model
development comprises the steps involved in  (1)  confirming whether a model is, in fact, a  useful tool to
address the  problem; what type of model would be most useful; and whether an existing model can be
used for this  purpose; as well as (2) developing an appropriate model if one does not already exist. Model
development sets the stage for model evaluation (covered in Chapter 4), an ongoing process in which the
Agency evaluates the appropriateness of the  existing or new model to help address the environmental
problem.

Model  development can  be  viewed  as a process with three main steps:  (a) specify the environmental
problem (or  set  of issues) the model is intended to address and develop the conceptual model, (b)
evaluate or develop the model framework (develop the mathematical  model), and (c) parameterize the
model  to develop the application tool. Sections 3.2, 3.3, and 3.4 of this chapter, respectively,  describe
the  various aspects and considerations involved in implementing each of these steps.

As  described below,  model development is a collaborative effort involving model  developers,  intended
users, and decision makers (the "project team"). The perspective and skills of each group are important to
develop a model that will provide  an  appropriate, credible, and defensible basis for addressing the
environmental issue of concern.

A "graded approach" should be used throughout the model development process. This involves  repeated
examination of the scope, rigor, and complexity of the modeling analysis in light of the intended use of the
results, the degree of confidence needed in the results, and Agency resource constraints.

-------
3.2    Problem Specification and Conceptual Model Development

Problem  specification,  culminating in development of the conceptual model,  involves an  iterative,
collaborative effort among model developers, intended users, and decision  makers (the project team) to
specify all aspects of the problem that  will  inform subsequent  selection  or development of a model
framework. Communication between model developers and model users is crucial to clearly establish the
objectives of the  modeling  process; ambiguity at this stage  can undermine the chances  for success
(Manno et al. 2008).

During problem specification, the project team defines the regulatory or research objectives, the type and
scope of model best suited to meet those objectives, the data criteria, the model's domain of applicability,
and any programmatic constraints. These considerations provide the basis for developing a conceptual
model, which depicts or describes the most important behaviors of the system, object, or process relevant
to the problem of  interest. Problem specification and the resulting conceptual model define the modeling
needs sufficiently  that the project team can  then  determine whether an existing model can  be used to
meet those needs or whether a new model should  be developed.

       3.2.1    Define the Objectives

The first step in  problem specification is to  define the regulatory or research objectives (i.e., what
questions  the  model needs to answer).   To  do  so, the team should develop  a written statement of
modeling  objectives that includes the state  variables  of  concern, the stressors  driving  those state
variables,  appropriate temporal  and spatial  scales, and  the degree of model accuracy and precision
needed.

       3.2.2   Determine the Type and Scope of Model Needed

Many different types of models  are available, including  empirical vs. mechanistic, static vs. dynamic,
simulation vs.  optimization,  deterministic vs. stochastic,  and  lumped vs. distributed. The project team
should discuss  and compare alternatives with respect to their ability to meet the objectives in order to
determine the most appropriate type of model for addressing the problem.

The scope (i.e.,  spatial, temporal and process  detail)  of models  that can be used for  a particular
application can  range from very simple to very complex depending on the problem specification and data
availability, among other factors.  When different types of  models  may be appropriate for solving different
problems,  a  graded approach should be used to select or develop models that will provide the  scope,
rigor, and  complexity appropriate to the intended  use of and confidence needed in the  results.  Section
3.3.1 provides more information on considerations regarding model complexity.

       3.2.3   Determine Data Criteria

This step  includes developing data quality objectives (DQOs) and specifying the acceptable range of
uncertainty.  DQOs (EPA 2000a) provide  specifications  for model quality and associated checks (see
Appendix  C,  Box C1:  Background on EPA Quality System).  Well-defined  DQOs guide the design of
monitoring plans  and the model  development process (e.g.,  calibration  and verification). The DQOs
provide guidance  on how to  state data  needs when limiting  decision errors (false positives or false

-------
negatives) relative to a given decision.1 The DQOs should include a statement about the acceptable level
of total uncertainty that will still enable model results to be used for the intended purpose (Appendix C,
Box C2: Configuration Tests Specified in the QA Program). Uncertainty describes the lack of knowledge
about models, parameters, constants, data, and beliefs. Defining the ranges of acceptable uncertainty —
either qualitatively  or quantitatively — helps project  planners  generate  "specifications"  for quality
assurance planning and partially determines the appropriate boundary conditions and complexity for the
model being developed.
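
To make the idea of a quantitative "check" concrete, the short sketch below (in Python) tests whether
paired model predictions and observations meet a hypothetical acceptance criterion; the 30 percent
threshold, the use of the median relative error, and the data values are invented for illustration and are
not Agency specifications.

    import numpy as np

    def meets_dqo(predicted, observed, max_relative_error=0.30):
        # Hypothetical DQO check: the median absolute relative error of paired
        # predictions and observations must not exceed the acceptance threshold
        # agreed on during QA project planning.
        predicted = np.asarray(predicted, dtype=float)
        observed = np.asarray(observed, dtype=float)
        relative_error = np.abs(predicted - observed) / np.abs(observed)
        return np.median(relative_error) <= max_relative_error

    # Paired model predictions and field observations (illustrative values only)
    print(meets_dqo([2.1, 3.8, 5.2, 7.9], [2.0, 4.0, 5.0, 9.0]))  # prints True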

       3.2.4   Determine the Model's Domain of Applicability

To select an appropriate model, the project team must understand the model's domain of applicability —
i.e., the set of conditions under which use of the model is scientifically defensible and the relevant
characteristics of the system to be  modeled.  This involves identifying the environmental domain to be
modeled and then specifying the processes and conditions within that domain, including the transport and
transformation processes relevant to the policy/management/research objectives, the important time and
space scales inherent in transport and transformation processes within that domain in comparison to the
time and space scales of the problem objectives, and any peculiar conditions of the domain that will affect
model selection or new model construction.

       3.2.5   Discuss Programmatic Constraints

At this stage,  the project  team also needs  to  consider any factors that could constrain the modeling
process. This discussion should include considerations of time and budget, available data or resources to
acquire more data, legal and institutional factors, computer resource constraints, and the experience and
expertise of the modeling staff.

       3.2.6   Develop the Conceptual Model

A conceptual model depicts or describes the most important behaviors of the system, object, or process
relevant to the problem of interest. In developing the conceptual  model, the model developer may
consider literature, fieldwork, applicable anecdotal evidence, and relevant historical modeling projects.
The developer should clearly describe (in words, functional expressions, diagrams, and/or graphs) each
element of the conceptual model and should document the science behind each element (e.g., laboratory
experiments, mechanistic evidence,  empirical data supporting the hypothesis,  peer-reviewed literature) in
mathematical form, when possible. To the  extent feasible, the modeler should also provide information
on assumptions, scale, feedback mechanisms, and static/dynamic behaviors.  When relevant, the
strengths and weaknesses of each constituent hypothesis should be described.
1 False rejection decision errors (false positives) occur when the null hypothesis (or baseline condition) is incorrectly
rejected based on the sample data. The decision is made assuming the alternate condition or hypothesis to be true
when in reality it is false.  False acceptance decision errors (false negatives) occur when the null hypothesis (or
baseline  condition) cannot be rejected based on the available sample data.  The decision is made assuming the
baseline condition is true when in reality it is false.

-------
3.3    Model Framework Selection or Development

Once the team has specified the problem and type of model needed to address the problem, the next
step is to identify or develop a model framework that meets those specifications. A model framework is a
formal mathematical specification  of the concepts,  procedures, and  behaviors underlying the system,
object, or process relevant to the problem of interest, usually translated into computer software.

For mechanistic modeling of common environmental problems, one or more suitable  model frameworks
may exist.   Many  existing model frameworks  in the public domain  can  be used  in  environmental
assessments.  Several institutions, including EPA, develop and maintain these model frameworks on an
ongoing basis. Ideally, more than one model framework will meet the project needs, and the project team
can select the best model for the specified problem.  Questions to consider when evaluating existing
model frameworks are described below.

Sometimes no  model frameworks are  appropriate to the task, and  EPA  will  develop  a new  model
framework or modify an  existing framework to include the additional capabilities needed to address the
project needs.

Some assessments require linking multiple model frameworks, such that the output from one model is
used  as  input data to another  model.   For example, air quality modeling often  links meteorological,
emissions, and  air chemistry/transport models. When  employing linked models, the project team should
evaluate each  component model, as well as the full system  of integrated models,  at each stage of the
model development and evaluation process.

In all cases, the documentation for the selected model should clearly state why and how the  model can
and will be used.

As potential model frameworks are identified or developed for addressing the problem,  the project team
will need to consider several issues, including:

•   Does  sound  science (including peer-reviewed  theory  and equations) support the  underlying
    hypothesis?
•   Is the model's complexity appropriate for the problem at hand?
•   Do the quality and quantity of data support the choice of model?
•   Does the model structure reflect all the relevant inputs described in the conceptual model?
•   Has the model code been developed? If so, has it been verified?

It is recommended that the evaluation process apply the principles of scientific hypothesis testing (Platt
1964) using an iterative  approach (Hilborn and Mangel 1997). If the team is evaluating multiple  model
frameworks, it may  be useful to statistically compare  the  performance of these competing models with
observational, field, or laboratory data (Chapter 4).

Box 3: Example of Model  Selection Considerations: Arsenic in Drinking Water
(from Box 5-3 of NRC's Models in Environmental Regulatory Decision Making)
A major challenge for regulatory model applications is deciding which model to use to inform the decision making process. In
this example, several models were available to  estimate the cancer incidence  associated  with different levels of
arsenic in drinking water. These models differed according to how age and exposure were incorporated (Morales et

-------
al. 2000).  All the models assumed that the number of cancers observed in a specific age group of a particular village
followed a Poisson model with parameters depending on the age and village exposure level. Linear, log, polynomial,
and spline models for age and exposure were considered.

These various  models differed substantially in their fitted values, especially in the critical low-dose area, which is so
important for establishing the benchmark dose  (BMD) that is  used to set a reference dose (RfD). The fitted-dose
response model was also  strongly affected by whether Taiwanese population data were  included  as  a baseline
comparison  group.  Depending  on  the  particular modeling  assumptions used, the estimates of the BMD  and
associated lower limit (BMDL) varied by over an order of magnitude.

Several strategies are available for choosing among  multiple models. One strategy is to pick the "best" model — for
example, use one of the popular statistical  goodness of fit measures, such  as the Akieke (sic) information criterion
(AIC) or the Bayesian information criterion (BIC). These approaches correspond to picking the model that maximizes
log-likelihood,  subject to a penalty function reflecting  the number of model parameters, thus  effectively forcing a
trade-off between improving model fit by adding additional model parameters versus having a parsimonious
description. In  the case of the arsenic risk assessment, however,  the noisiness of the data meant that many of the
models explored by Morales et  al. (2000)  were relatively similar in terms of statistical goodness-of-fit criteria.  In a
follow-up paper, Morales et al. (2006) argued that it was important to address and account for the model uncertainty,
because  ignoring it would  underestimate the true variability of the estimated  model fit and,  in turn, overestimate
confidence in the resulting BMD and lead to "risky decisions" (Volinsky et al. 1997).

Morales et al.  suggested using  Bayesian model averaging (BMA) as a tool to avoid picking one particular model.
BMA combines over a class of suitable models. In practice, estimates based on a BMA approach tend to approximate
a weighted average of estimates based on individual  models,  with the weights reflecting  how well each individual
model fits the  observed  data. More precisely, these weights  can  be interpreted as the probability that a particular
model is the true model, given the observed data. Figure 2 shows the results  of applying a BMA procedure to the
arsenic data:

•    Figure 2(a) plots individual fitted models, with the width of each plotted line reflecting the weights.
•    Figure 2(b) shows the estimated overall dose-response curve (solid line) fitted via BMA. The shaded area shows
     the upper and lower limits (2.5% and 97.5% percentiles) based on the BMA procedure. The dotted lines show
     upper and lower limits based on the best fitting models.

Figure 2(b) effectively illustrates the inadequacy of standard statistical confidence intervals in characterizing
uncertainty in settings where there is substantial model uncertainty. The BMA limits coincide closely with the
individual curves at the upper level of the dose-response curve, where all the individual models tend to give similar
results.
Figure 2. (a) Individual dose-response models, and (b) overall dose-response model fitted using the Bayesian model
averaging approach. Source: Morales et al. 2000.	


        3.3.1    Model Complexity


During the problem specification stage, the project team will have considered the degree of complexity
desired for the model (see Section  3.2.2). As described  below, model complexity influences uncertainty.
Models tend to become more uncertain as they become either increasingly simple or increasingly complex. Thus complexity

-------
is  an  important parameter  to  consider  when  choosing  among  competing  model  frameworks or
determining the suitability of the existing model framework to the problem of concern.  For the reasons
described below, the optimal choice generally is a model that is no more complicated than necessary to
inform  the regulatory decision. For the same  reasons, model complexity is an essential parameter to
consider when developing a new model framework.

Uncertainty exists when knowledge about specific factors, parameters (inputs), or models is incomplete.
Models have two fundamental types of uncertainty:

•   Model framework uncertainty, which is a function of the soundness of the model's underlying scientific
    foundations.
•   Data uncertainty, which arises from measurement errors, analytical imprecision, and  limited sample
    size during collection and treatment of the data used to characterize the model parameters.

These two types of uncertainty have a reciprocal relationship, with one increasing as the other decreases.
Thus, as illustrated in Figure 3, an optimal level of complexity (the "point of minimum uncertainty") exists
for every model.
[Figure 3 shows model framework uncertainty decreasing, and data uncertainty increasing, as model
complexity grows; their sum, the total uncertainty, reaches a minimum at the "point of minimum
uncertainty."]

Figure 3.      Relationship between model framework uncertainty and data uncertainty, and their
               combined effect on total model uncertainty (adapted from Hanna 1988).
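
The trade-off in Figure 3 can be illustrated numerically, as in the short sketch below (in Python); both
uncertainty curves are invented and serve only to show that their sum has an interior minimum.

    import numpy as np

    # Invented curves: framework uncertainty falls, and data uncertainty grows,
    # as more parameters are added to the model.
    complexity = np.arange(1, 21)            # number of model parameters
    framework_uncertainty = 10.0 / complexity
    data_uncertainty = 0.5 * complexity
    total_uncertainty = framework_uncertainty + data_uncertainty

    best = complexity[np.argmin(total_uncertainty)]
    print(f"Total uncertainty is smallest at about {best} parameters")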

For example, air quality modelers must sometimes compromise when choosing among the physical
processes that will be treated explicitly in the model.  If the objective is to estimate the pattern of pollutant
concentration values near  one (or several)  source(s), then chemistry is typically of little importance
because the distances between the pollutant source and receptor are  generally too short for chemical

-------
formation and destruction to greatly affect pollutant concentrations.  However, in such situations, other
factors tend to have a significant effect and  must be properly accounted for  in the model. These may
include building wakes, initial characterization of source release conditions and size, rates of diffusion of
pollutants  released as they  are  transported downwind, and land  use effects  on plume transport.
Conversely, when the objective is to estimate pollutant concentrations further from the source, chemistry
becomes more  important because  there is more time for chemical reactions to take place, and initial
source release effects become less important because the pollutants become well-mixed as they travel
through the atmosphere. To date, attempts to model both near-field dispersion effects and chemistry have
been inefficient and slow on desktop computers.

Because of these competing objectives, parsimony (economy or simplicity of assumptions) is desirable in
a model. As Figure 3 illustrates, as  models become more complex to treat more physical processes, their
performance tends to  degrade because they require  more  input variables,  leading to greater data
uncertainty. Because different models contain different types and ranges of uncertainty, it can be useful to
conduct sensitivity  analysis early in the model development phase to identify the relative importance of
model parameters.  Sensitivity analysis is the process of determining how  changes in the model input
values or assumptions (including  boundaries and  model functional form) affect the model outputs
(Morgan and Henrion 1990).
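
As a simple illustration of this process, the sketch below (in Python) perturbs each input of an invented
two-parameter model by 10 percent, one at a time, and reports the resulting change in the output; the
model and the perturbation size are assumptions chosen only to show the mechanics of a local
sensitivity screen.

    def simple_model(decay_rate, load):
        # Hypothetical steady-state concentration for a well-mixed system.
        return load / decay_rate

    baseline = {"decay_rate": 0.2, "load": 50.0}
    base_output = simple_model(**baseline)

    # One-at-a-time sensitivity: perturb each input by +/-10 percent and report
    # the relative change in the model output.
    for name in baseline:
        for factor in (0.9, 1.1):
            perturbed = dict(baseline, **{name: baseline[name] * factor})
            change = (simple_model(**perturbed) - base_output) / base_output
            print(f"{name:10s} x{factor:.1f} -> output change {change:+.1%}")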

Model  complexity  can be constrained by eliminating parameters when  sensitivity  analyses (Chapter
4/Appendix D) show that they do not significantly affect the outputs and when there is no  process-based
rationale for including them. However, a variable of little significance in one application of a model may be
more important in a different application.  In past reviews of Agency models, the SAB has supported the
general guiding principle  of simplifying complex models, where possible, for  the sake of transparency
(SAB  1988), but has emphasized that care should be taken not to eliminate important parameters from
process-based models simply because data  are  unavailable  or difficult to obtain  (SAB 1989).  In  any
case,  the quality and resolution  of available data  will ultimately constrain the type  of model that can be
applied. Hence, it is important to identify the existing data and/or field collection efforts that are
needed to adequately parameterize the model framework and support the application of a model.  The
NRC Committee on Models in the Regulatory Decision Process recommended that models used  in the
regulatory process should be no more complicated than is necessary to inform the regulatory decision and
that it is often preferable to omit capabilities that do not substantially improve  model  performance (NRC
2007).

       3.3.2   Model Coding and Verification

Model coding translates the mathematical equations  that constitute the model framework into functioning
computer code.  Code verification ascertains that the computer code has no inherent numerical problems
with  obtaining a solution. Code verification  tests whether the code performs according to its design
specifications.   It should  include an examination  of the numerical technique in the computer code  for
consistency with the conceptual model and governing equations (Beck et al. 1994).  Independent testing of
the code once it is fully developed can be  useful as an additional check of integrity and quality.

Several early steps  can  help minimize  later programming  errors and  facilitate  the code  verification
process. For example:

-------
-   Using "comment" lines to describe the purpose of each component within  the  code during
    development makes future revisions and improvements by different modelers and programmers more
    efficient.
-   Using a flow chart when the conceptual model is developed and before coding begins helps
    show the overall structure of the  model program.   This provides  a simplified description of the
    calculations that will be performed in each step of the model.

Breaking the program/model into component parts or modules is also useful for careful consideration
of model behavior in  an encapsulated way.  This allows the modeler to test the behavior of each sub-
component separately,  expediting  testing and  increasing  confidence in  the program.  A module is an
independent piece of software that forms part of one or more larger programs.  Breaking large models
into discrete modules facilitates  testing and debugging  (locating/correcting errors) compared to large
programs. The approach also makes it easier to  re-use relevant modules in future modeling projects, or
to update, add, or remove sections of the model without altering the overall program structure.
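
The sketch below (in Python) illustrates these practices on a deliberately small example: a module-like
set of commented functions plus one verification check that compares a coded numerical algorithm
against its known analytical solution. The first-order decay problem and the tolerance are invented and
are not Agency requirements.

    import math

    def analytical_decay(c0, k, t):
        # Known analytical solution of dc/dt = -k * c: c(t) = c0 * exp(-k * t).
        return c0 * math.exp(-k * t)

    def euler_decay(c0, k, t, steps=10000):
        # Coded numerical algorithm (explicit Euler) whose implementation is
        # the object of the verification check below.
        dt = t / steps
        c = c0
        for _ in range(steps):
            c -= k * c * dt
        return c

    def test_euler_matches_analytical():
        # Verification check: the coded algorithm reproduces the analytical
        # solution within a pre-agreed tolerance.
        expected = analytical_decay(100.0, 0.3, 5.0)
        computed = euler_decay(100.0, 0.3, 5.0)
        assert abs(computed - expected) / expected < 1.0e-3

    test_euler_matches_analytical()
    print("code verification check passed")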

Use of generic algorithms for common tasks  can  often save time and resources, allowing efforts to
focus on developing and improving the original aspects of a new model. An algorithm is a  precise rule (or
set of rules) for solving some problem.  Commonly used algorithms are often published as "recipes" with
publicly available code (e.g., Press 1992).  Developers should review existing  Agency models and code
to minimize duplication of effort. The CREM Models Knowledge Base, which will contain a Web-
accessible inventory of models, will provide a resource that model developers can use for this purpose.

Software engineering has evolved rapidly in recent years and continues to advance rapidly with changes
in technology and user platforms.  For  example,  some of the general recommendations  for developing
computer code  given above do not apply to models that are developed using object-oriented platforms.
Model systems built on object-oriented platforms use a collection of cooperating "objects." These objects are
treated as instances of a class within a class hierarchy, where a class is a set of objects that share  a
common structure and behavior.   The  structure of a  class is determined by the class variables, which
represent the state of an object of that class; the behavior  is given by the set of methods associated with
the class (Booch 1994). When models are developed with object-oriented platforms, the user should print
out the actual mathematical relationships the platform generates and review  them as part of the code
validation process.

Many references  on  programming style and  conventions  provide specific, technical suggestions for
developing and testing computer code (e.g., The Elements of Programming Style [Kernighan and
Plauger 1988]). In addition, the Guidance for Quality Assurance Project Plans for Modeling (EPA
2002b) suggests  a number  of practices during code  verification to "check" how well  it  follows the
"specifications" laid out during QA planning (Appendix C, Box C2: Configuration Tests Specified in the QA
Program).

3.4    Application Tool Development

Once a model framework has been selected or developed, the modeler populates the framework with the
specific system characteristics needed  to address the problem,  including geographic boundaries of the
model domain, boundary conditions, pollution source inputs, and model parameters.  In this manner, the
generic  computational capabilities of the  model framework are converted into an application tool to

-------
assess a specific problem occurring at a specific location. Model parameters are terms in the model that
are fixed during a model run or simulation but can be changed in different runs to conduct a sensitivity
analysis, to perform an uncertainty analysis when probabilistic distributions are assigned to the model
parameters, or to achieve calibration (defined below) goals. Parameters can be quantities estimated
from sample data that characterize statistical populations or they can  be constants such as the speed of
light and gravitational force.  Other activities at this stage of model development include creating a user
guide for the  model,  assembling datasets  for model input  parameters, and determining hardware
requirements.

        3.4.1   Input Data

As mentioned above, the accuracy, variability,  and precision of input data used in the model is a major
source of uncertainty:

•   Accuracy refers to the closeness of a measured or computed value to its "true" value (the value
    obtained with perfect information). Due to the natural heterogeneity and random variability
    (stochasticity) of many environmental systems, this "true" value exists as a distribution rather than a
    discrete value.
•   Variability refers to  differences attributable to true heterogeneity or diversity in model parameters.
    Because of variability, the "true" value of model parameters is often a function of the degree  of spatial
    and temporal aggregation.
•   Precision  refers  to the  quality of  being reproducible in outcome or performance. With models and
    other forms of quantitative information, precision  often refers  to the number of decimal places to
    which a number is computed. This is a measure of the "preciseness" or "exactness" of the model.
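
The following minimal numerical sketch (the sensor readings and the assumed "true" value are invented for
illustration) separates accuracy, which compares measurements against the "true" value, from precision,
which describes their reproducibility:

    import statistics

    true_value = 10.0                    # assumed "true" concentration
    readings = [10.4, 10.6, 10.5, 10.5]  # repeated measurements: precise but biased

    mean_error = statistics.mean(readings) - true_value  # accuracy: closeness to the "true" value
    spread = statistics.stdev(readings)                   # precision: reproducibility of the readings

    print(f"mean error (accuracy): {mean_error:+.2f}")
    print(f"standard deviation (precision): {spread:.2f}")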

Modelers should  always select  the  most appropriate data — as defined by QA  protocols  for field
sampling, data collection, and  analysis (EPA 2002c,  2002d,  2000b) — for use in  modeling analyses.
Whenever possible, all parameters should be directly measured in the system of interest.


Box 4: Comprehensive Everglades Restoration Plan: An Example of the Interdependence of Models and Measurements

(from NRC's Models in Environmental Regulatory Decision Making)
The restoration  of the Florida Everglades  is the largest ecosystem restoration ever planned in terms of geographical
extent and number of individual components.  The NRC  Committee  on Restoration of the Greater Everglades
Ecosystem, which was charged with providing scientific advice on this effort, describes the  role that  modeling and
measurements should play in implementing an adaptive approach to restoration  (NRC 2003).  Under the committee's
vision, monitoring of hydrological  and ecological performance measures should be integrated  with mechanistic
modeling and experimentation to better understand how the Everglades function and how the system will respond to
management practices and external stresses.  Because  individual components of the restoration  plan will  be
staggered in time, the  early components can provide scientific feedback to guide and refine  implementation of later
components of the plan.


The NRC Committee on Models in the Regulatory Decision Process recommends that: "...using adaptive
strategies to coordinate data collection and modeling should be a priority for decision makers and those
responsible for regulatory model development and  application. The interdependence of measurements
and modeling needs to be fully considered as early as the conceptual model development phase."

       3.4.2   Model Calibration

Some models are "calibrated" to set parameters.  Appendix C provides guidance on model calibration as
a QA project plan element (see Box C3:  Quality Assurance Planning Suggestions for Model Calibration
Activities).  In this guidance, calibration is defined as the process of adjusting model parameters within
physically defensible ranges until the resulting predictions give the best possible fit to the observed data
(EPA 1994b).   In some disciplines, calibration is also referred to as  parameter estimation (Beck et al.
1994).
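
A minimal sketch of this definition, assuming a hypothetical first-order decay model, invented observations,
and illustrative parameter bounds, is a bounded least-squares fit in which the adjusted parameters are
constrained to physically defensible ranges:

    import numpy as np
    from scipy.optimize import least_squares

    def model(params, t):
        # Hypothetical model: initial concentration c0 and decay rate k are the calibrated parameters.
        c0, k = params
        return c0 * np.exp(-k * t)

    t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0])      # observation times (days)
    c_obs = np.array([98.0, 71.0, 49.0, 26.0, 7.5])  # observed concentrations

    def residuals(params):
        # Difference between model predictions and observations; minimized during calibration.
        return model(params, t_obs) - c_obs

    # Physically defensible ranges: 50-150 for c0 and 0.05-1.0 per day for k.
    fit = least_squares(residuals, x0=[80.0, 0.2],
                        bounds=([50.0, 0.05], [150.0, 1.0]))
    print("calibrated c0 and k:", fit.x)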

Most process-oriented environmental models are under-determined; that is, they contain more uncertain
parameters than state variables  that can be used  to perform a calibration. Sensitivity analysis  can be
used to identify key processes influencing the state variables. Sometimes the rate constant for a key
process can be measured directly — for example, measuring the  rate of photosynthesis (a process) in a
lake in addition to the phytoplankton biomass (a state variable). Direct measurement of rate parameters
can reduce model uncertainty.

When a calibration database has been developed  and improved over time, the initial adjustments and
estimates may need periodic recalibration.  When data for quantifying one or more parameter values are
limited, calibration  exercises can be used  to  find  solutions that result in the "best fit" of the  model.
However,  these solutions will not provide meaningful  information unless they are based  on measured
physically defensible ranges. Therefore, this type of calibration should be undertaken with caution.

Because of these concerns, the use of calibration to improve model performance varies among EPA
offices and regions.  For a  particular model, the  appropriateness  of calibration may be a function of the
modeling activities undertaken.  For example, the Office of Water's  standard practice is to calibrate well-
established model frameworks such as CE-QUAL-W2 (a model for predicting temperature fluctuations in
rivers) to a specific system (e.g., the Snake River). This calibration generates a site-specific tool (e.g., the
"Snake River Temperature" model).  In contrast, the Office of Air and  Radiation (OAR) more commonly
uses model frameworks and models that do not need site-specific adjustments.  For example,  certain
types of air models (e.g., gaussian plume) are parameterized for a range of meteorological conditions,
and thus  do not need to be "recalibrated" for different geographic locations  (assuming the range of
conditions is appropriate for the  model).  OAR also  seeks to avoid  artificial improvements in model
performance by adjusting model inputs outside the ranges supported by the empirical databases. These
practices prompted OAR to  issue the following statement on model calibration in their Guideline on Air
Quality Models  (EPA 2003b):

       Calibration  of models is not common practice and is subject to much  error and
       misunderstanding.  There have been attempts by some to compare model estimates and
       measurements on an event-by-event basis and then calibrate a model with results of that
       comparison.   This approach is severely limited by uncertainties in both source and
       meteorological data and therefore it is difficult to precisely estimate the concentration at
        an exact location for a specific increment of time. Such uncertainties make calibration of
        models of questionable benefit. Therefore, model calibration is unacceptable.

In general,  however, models  benefit from  thoughtful  adaptation  that  will enable  them  to respond
adequately to the specifics of each regulatory problem to which they are applied.
Summary of Recommendations for Model Evaluation

•   Model evaluation is the process used to determine whether a model, despite its uncertainties, can be
    appropriately used to inform a decision.
•   Model evaluation addresses the soundness of the science underlying a model, the quality and
    quantity of available data, the degree of correspondence with observed conditions, and the
    appropriateness of a model for a given application.
•   Recommended components of the evaluation process include: (a) credible, objective peer review; (b)
    QA project planning and data quality assessment; (c) qualitative and/or quantitative model
    corroboration; and (d) sensitivity and uncertainty analyses.
•   Quality is an attribute of models that is meaningful only within the context of a specific model
    application.  Determining whether a model serves its intended purpose involves in-depth discussions
    between model developers and the users responsible for applying the model to a particular
    problem.
•   Information gathered during model evaluation allows the decision maker to be better positioned to
    formulate decisions  and policies that take into account all relevant issues and concerns.


4.1     Introduction
       Models  will always  be  constrained by computational  limitations,  assumptions and knowledge
       gaps.  They can  best be viewed as tools to help inform decisions rather than as machines  to
       generate truth  or make decisions.  Scientific advances will never make  it possible to  build a
       perfect model that accounts for every aspect of reality or to prove that a given model is correct in
       all  aspects for a particular regulatory application. These  characteristics...suggest that model
       evaluation be viewed as an integral and ongoing part of the life cycle of a model, from problem
       formulation and model conceptualization to the development and application of a computational
       tool.
            —  NRC Committee on Models in the Regulatory Decision Process (NRC 2007)

The natural complexity of environmental systems makes it difficult to mathematically describe all relevant
processes,  including all the intrinsic mechanisms  that govern their behavior. Thus, policy makers often
rely on models  as tools  to approximate reality when making decisions that affect environmental systems.
The challenge facing model developers and users is determining when a model, despite its uncertainties,
can be appropriately  used to inform a  decision.  Model evaluation  is the  process used to make this
determination.  In this guidance, model evaluation is defined as  the process used to generate information
to determine whether a  model and its analytical results are  of a  quality sufficient to serve as the basis for
a decision.  Model  evaluation is conducted over the life cycle of the project, from development through
application.
Box 5: Model Evaluation Versus Validation Versus Verification
Model evaluation should not be confused with model validation. Different disciplines assign different meanings to
these terms and they are  often confused.  For  example,  Suter (1993) found  that among models used  for risk
assessments, misconception often arises in the form of the question "Is the model valid?" and statements such as
"No model should be used unless it has been validated." Suter further points out that "validated" in this context means
(a)  proven to correspond exactly to reality or (b) demonstrated through experimental tests to make  consistently
accurate predictions.
Because every model contains simplifications,  predictions derived from a model can never be  completely accurate
and a model can never correspond exactly to reality.  In addition,  "validated models" (e.g., those that have  been
shown to correspond to field data) do not necessarily generate accurate predictions of reality for multiple applications
(Beck 2002a). Thus, some researchers assert that no model is ever truly "validated"; models can only be invalidated
for a specific application (Oreskes et al. 1994).  Accordingly, this guidance focuses on process and techniques for
model evaluation rather than model validation or invalidation.
"Verification" is another term commonly applied to the evaluation process. However, in this guidance and elsewhere,
model verification typically refers to model code verification as defined in the model development section. For
example, the NRC Committee on Models in the  Regulatory Decision Process (NRC 2007) provides the following
definition:
        Verification refers to activities that are designed to confirm that the mathematical framework
       embodied in the module is correct and  that the computer code for a module is operating according
       to its intended design so that the results obtained compare favorably with those obtained using
       known analytical solutions or numerical solutions from simulators based on similar or identical
       mathematical frameworks.
In simple terms, model evaluation provides information to help answer four main questions (Beck 2002b):

1.  How have the principles of sound science been addressed during model development?
2.  How is the choice of model supported by the quantity and quality of available data?
3.  How closely does the model approximate the real system of interest?
4.  How does the model perform  the  specified task while meeting the  objectives set by QA  project
    planning?

These four factors address  two aspects  of model  quality. The first factor focuses  on the  intrinsic
mechanisms and generic properties of a model,  regardless of the particular task to which it is applied. In
contrast, the latter three factors are evaluated in the context of the use of a model within a specific set of
conditions.  Hence, it follows that model quality is an attribute that is meaningful only within the context of
a specific model application.  Whether a model is of sufficient quality to support a decision becomes known when information is
available to assess these factors.

The NRC committee recommends that evaluation of a regulatory model continue throughout the life of a
model and that an evaluation plan could:

•   Describe the model and its intended uses.
•   Describe the relationship of the model to data,  including the data for both inputs and corroboration.
•   Describe how such data and other sources of information will be used to assess the ability of the
    model to meet its intended task.
•   Describe all the elements  of the evaluation plan by using an outline or diagram that shows how the
    elements relate  to the model's life cycle.
•   Describe the factors or events that might trigger the need for major model revisions or the
    circumstances that might prompt users to seek an alternative model.  These can be fairly broad and
    qualitative.
•   Identify the responsibilities, accountabilities, and resources needed to ensure implementation of the
    evaluation plan.

As stated above, the goal of model evaluation is to ensure model quality. At EPA, quality is defined by the
Information  Quality Guidelines (IQGs)  (EPA 2002a).   The  IQGs  apply to  all information that EPA
disseminates, including models,  information from  models, and input data  (see Appendix C, Box C4:
Definition of Quality).  According to the  IQGs, quality has three major components: integrity,  utility, and
objectivity.  This chapter focuses  on  addressing  the four questions listed above by evaluating the third
component,  objectivity —  specifically,  how to  ensure the objectivity  of information  from models by
considering their accuracy, bias, and reliability.

•   Accuracy, as described in Section 2.4, is the closeness of a measured or computed value to its "true"
    value, where the "true" value is obtained with perfect information.
•   Bias describes any systematic deviation between a measured (i.e., observed) or computed value and
    its "true" value.  Bias is affected by faulty instrument calibration and other measurement errors,
    systematic errors during data collection, and sampling errors such as incomplete spatial
    randomization during the design of sampling programs.
•   Reliability is the confidence that (potential) users have in a  model and its outputs such that they are
    willing to use the model and accept its results  (Sargent 2000). Specifically, reliability is a function of
    the model's performance record and  its conformance to best available,  practicable science.

This chapter describes principles, tools, and considerations for  model evaluation throughout all stages of
development and application.  Section 4.2 presents a variety of  qualitative and quantitative best practices
for evaluating models. Section 4.3 discusses special considerations for evaluating proprietary models.
Section  4.4 explains why retrospective analysis of models, conducted after a model has been applied,
can be important to  improve individual models and regulatory policies and to systematically enhance the
overall modeling field.  Finally,  Section 4.5 describes how the evaluation process culminates in a decision
whether to apply the model to  decision making. Section 4.6 reviews the key recommendations from this
chapter.

4.2     Best Practices for Model Evaluation

The four questions listed above address the soundness of the science underlying a model, the quality and
quantity  of available data,  the  degree  of correspondence  with  observed  conditions,  and  the
appropriateness  of  a  model for a given application.  This guidance describes several  "tools"  or  best
practices to  address these questions: peer review of models; QA project planning,  including data quality
assessment; model corroboration (qualitative and/or quantitative evaluation of a model's accuracy and
predictive  capabilities); and sensitivity and uncertainty  analysis. These  tools and practices include both
qualitative and quantitative techniques:
•   Qualitative assessments: Some of the uncertainty in model predictions may arise from sources
    whose uncertainty cannot be quantified. Examples are uncertainties about the theory underlying the
    model and the manner in which that theory is mathematically expressed to represent the environmental
    components being modeled. Subjective evaluation of experts may be needed to
    determine appropriate values for model parameters and inputs that cannot be directly observed or
    measured (e.g., air emissions estimates).  Qualitative assessments are needed for these sources of
    uncertainty. These assessments may involve expert elicitation regarding the system's behavior and
    comparison with model forecasts.
•   Quantitative assessments:  The uncertainty in some sources — such as some model parameters and
    some input data — can be estimated through quantitative assessments involving statistical
    uncertainty and sensitivity analyses.  These types of analyses can also  be used to quantitatively
    describe how model estimates of current conditions may be expected to differ from comparable field
    observations.  However, since model predictions are not directly observed, special care is needed
    when quantitatively comparing model predictions with field data.

As discussed previously, model evaluation is an iterative process.  Hence, these tools and techniques
may be effectively applied throughout model development, testing, and  application and should not  be
interpreted as sequential steps for model evaluation.

Model evaluation should always be conducted using a graded approach that is adequate and appropriate
to the decision  at hand (EPA 2001, 2002b).   This approach  recognizes that model evaluation can  be
tailored to the circumstances of the problem at hand and that programmatic requirements are varied.
For example, a screening  model (a type of model designed to provide a "conservative" or risk-averse
answer) that is used  for risk management should undergo rigorous evaluation to avoid false negatives,
while still not  imposing  unreasonable data-generation burdens  (false  positives) on the  regulated
community.  Ideally, decision makers and modeling staff work together at the onset of new projects to
identify the appropriate degree of model evaluation (see Section 3.1).

External circumstances can affect the rigor required in model evaluation.  For example, when the likely
result of modeling will  be costly control strategies and associated controversy, more detailed  model
evaluation may be necessary.  In these cases, many aspects of the modeling may come under close
scrutiny, and the modeler must document the findings  of the model evaluation process  and be prepared
to answer questions that will arise  about the model.  A deeper level of model evaluation may also  be
appropriate when modeling unique or extreme situations that have not been previously encountered.

Finally, as noted earlier, some assessments require the use of multiple, linked models.  This linkage has
implications for assessing uncertainty and applying the system of models. Each component model as well
as the full system of integrated models must be evaluated.

Sections 4.2.1 and 4.2.2,  on  peer review of models and quality assurance protocols for input data,
respectively, are drawn  from existing guidance. Section  4.2.3, on model corroboration activities and the
use  of sensitivity and uncertainty analysis, provides  new guidance  for  model  evaluation (along  with
Appendix D).
Box 6: Examples of Life Cycle Model Evaluation

The value in evaluating a model from the conceptual stage through the use stage is illustrated in a multi-year project
conducted by the Organization for Economic Cooperation and Development (OECD). The project sought to develop a
screening model that could be used to assess the persistence and long-range transport potential of chemicals. To
ensure its effectiveness, the screening model needed to be a consensus model that  had been  evaluated against a
broad set of available models and data.
This project began  at a 2001 workshop to set model performance and evaluation  goals that would  provide the
foundation for subsequent model selection and development (OECD 2002). OECD then established an expert group
in 2002. This group  began its work by developing and publishing a guidance document on using multimedia models
to  estimate  environmental persistence and long-range transport. From 2003 to 2004, the  group compared and
assessed the performance of nine available multimedia fate and transport models (Fenner et  al. 2005; Klasmeier et
al. 2006). The group then  developed a parsimonious consensus  model  representing the minimum set  of  key
components identified in the  model comparison. They convened three international workshops to disseminate this
consensus model  and provide an ongoing model evaluation forum (Scheringer et al. 2006).
In this example, more than half the total effort was invested in the conceptual and model formulation stages, and
much of the effort focused on performance evaluation. The group recognized that each model's life cycle is different,
but noted that  attention should be  given to developing consensus-based approaches in  the  model concept and
formulation stages. Conducting concurrent evaluations at these stages in this setting resulted in a high degree of buy-
in from the various modeling groups.

        4.2.1   Scientific  Peer Review

Peer review provides the main mechanism  for independent evaluation  and review of environmental
models used by the Agency.  Peer review offers an independent, expert review of the model evaluation
described in Section 4.1; its purpose is therefore two-fold:

•   To evaluate  whether the  assumptions, methods, and conclusions derived  from environmental models
    are based on sound scientific principles.
•   To check the scientific appropriateness of a model for informing a specific regulatory decision.  (The
    latter objective is particularly important for secondary applications of existing models.)

Information from peer reviews is also  helpful for choosing among multiple competing models for a specific
regulatory application.  Finally,  peer  review is useful to identify the  limitations of existing models. Peer
review  is not a  mechanism  to  comment on  the regulatory decisions or policies that are informed by
models (EPA 2000c).

Peer review charge questions and  corresponding records for peer reviewers to answer those questions
should  be incorporated into the  quality assurance project plan, developed during assessment planning
(see  Section  4.2.2,  below).  For example, peer reviews may focus  on  whether a  model  meets  the
objectives or specifications that were set as  part of the quality assurance plan (see  EPA 2002b) (see
Section 3.1).
All models that inform significant2 regulatory decisions are candidates for peer review (EPA 2000c, 1993)
for several reasons:

•   Model results will be used as a basis for major regulatory or policy/guidance decision making.
•   These decisions likely involve significant investment of Agency resources.
•   These decisions may have inter-Agency or cross-agency implications/applicability.

Existing guidance recommends that a new model should be scientifically peer-reviewed prior to its first
application;  for subsequent applications, the program manager should consider the scientific/technical
complexity and/or the novelty of the particular circumstances to determine whether additional peer review
is needed (EPA 1993). To conserve resources, peer review of "similar" applications should be avoided.

Models used  for secondary applications  (existing EPA  models or proprietary  models) will generally
undergo a different type of evaluation than those developed with a specific regulatory information need in
mind.  Specifically, these  reviews may deal more with  uncertainty about the appropriate application of a
model to a specific set of conditions than with the science underlying the model framework.  For example,
a project team decides to assess a water quality problem using WASP, a well-established water quality
model framework.  The project team determines that  peer review of the model  framework  itself is not
necessary, and the  team  instead conducts a peer review on their specific application of the WASP
framework.

The following  aspects of a model should be peer-reviewed to establish  scientific credibility (SAB 1993a,
EPA 1993):

•   Appropriateness of input data.
•   Appropriateness of boundary condition specifications.
•   Documentation of inputs and assumptions.
•   Applicability and appropriateness of selected parameter values.
•   Documentation  and  justification  for  adjusting  model  inputs  to improve  model  performance
    (calibration).
•   Model application with respect to the range of its validity.
•   Supporting empirical data that strengthen  or contradict the conclusions that are based on model
    results.

To  be most effective and maximize its value, external peer review should begin as early in the model
development  phase as  possible (EPA 2000b).   Because peer review involves  significant time and
resources, these allocations must  be incorporated  into components of the project planning and any
related contracts.  Peer review in the early stages of model development can help evaluate the
conceptual basis of models and potentially save time by redirecting misguided initiatives, identifying
alternative approaches, or providing strong technical support for a potentially controversial position (SAB
1993a, EPA 1993).  Peer review in the later stages of model development is useful as an independent
external review of model code (i.e., model verification).  External peer review of the applicability of a
model to a particular set of conditions should be considered well in advance of any decision making, as it
helps avoid inappropriate applications of a model for specific regulatory purposes (EPA 1993).

2 Executive Order 12866 (58 FR 51735) requires federal agencies to determine whether a regulatory action is
"significant" and therefore, subject to the requirements of the Executive Order, including review by the Office of
Management and Budget.  The Order defines "significant regulatory action" as one "that is likely to result in a rule
that may: (1) Have an annual effect on the economy of $100 million or more or adversely affect in a material way
the economy, a sector of the economy, productivity, competition, jobs, the environment, public health or safety, or
State, local, or tribal governments or communities; (2) Create a serious inconsistency or otherwise interfere with an
action taken or planned by another agency; (3) Materially alter the budgetary impacts of entitlements, grants, user
fees, or loan programs or the rights and obligations of recipients thereof; or (4) Raise novel legal or policy issues
arising out of legal mandates, the President's priorities, or the principles set forth in [the] Order."  Section 2(f).

The  peer  review logistics are left to  the discretion of the  managers responsible for applying  the model
results to decision  making.   Mechanisms for accomplishing external peer review include  (but are not
limited to):

•   Using an ad hoc panel of scientists.3
•   Using an established external peer review mechanism such as the SAB.
•   Holding a technical workshop.4

Several sources provide guidance for determining the qualifications and number of reviewers needed  for
a given modeling  project (SAB  1993a;  EPA 2000c,  1993, 1994a).  Key  aspects  are  summarized in
Appendix D of this guidance.

        4.2.2   Quality Assurance Project Planning and Data Quality Assessment

Like  peer  review, data quality assessment addresses whether a model has been developed  according to
the principles of sound science.  While some  variability  in  data is unavoidable  (see Section 4.2.3.1),
adhering to the tenets of data quality assessment described in other Agency guidance5 (Appendix D, Box
D2: Quality Assurance Planning and Data Acceptance Criteria) helps minimize data uncertainty.

Well-executed QA project planning  also helps ensure that a model performs the specified task, which
addresses the fourth model  evaluation  question posed in  Section 4.1.  As  discussed above,  evaluating
the degree to which a modeling project has met  QA objectives  is  often a function of the external  peer
review process.  The Guidance for Quality Assurance Project Plans for Modeling (EPA 2002b) provides
general information about how to document quality assurance planning for modeling (e.g., specifications
or assessment criteria development, assessments of various stages of the modeling process; reports to
management as feedback for corrective action; and finally the process for acceptance, rejection, or
qualification of the output for use) to conform with EPA policy and acquisition regulations.  Data quality
assessments are a key component of the QA plan for models.

3 The formation and use of an ad hoc panel of peer reviewers may be subject to the Federal Advisory Committee Act
(FACA).  Compliance with FACA's requirements is summarized in Chapter Two of the Peer Review Handbook,
"Planning a Peer Review" (EPA 2000c).  Guidance on compliance with FACA may be sought from the Office of
Cooperative Environmental Management.  Legal questions regarding FACA may be addressed to the Cross-Cutting
Issues Law Office in the Office of General Counsel.
4 Note that a technical workshop held for peer review purposes is not subject to FACA if the reviewers provide
individual opinions.  [Note that there is no "one time meeting" exemption from FACA.  The courts have held that
even a single meeting can be subject to FACA.]  An attempt to obtain group advice, whether it be consensus or
majority-minority views, likely would trigger FACA requirements.
5 Other guidance that can help ensure the quality of data used in modeling projects includes:
    •   Guidance for the Data Quality Objectives Process, a systematic planning process for environmental data
        collection (EPA 2000a).
    •   Guidance on Choosing a Sampling Design for Environmental Data Collection, on applying statistical
        sampling designs to environmental applications (EPA 2002c).
    •   Guidance for Data Quality Assessment: Practical Methods for Data Analysis, to evaluate the extent to
        which data can be used for a specific purpose (EPA 2000b).

Both the quality and quantity (representativeness) of supporting data used to parameterize and (when
available) corroborate models should be assessed during all relevant stages of a modeling project. Such
assessments are needed to evaluate whether the available data are sufficient to support the choice of the
model to be applied (question 2, Section 4.1), and to ensure that the data are sufficiently representative of
the true  system  being modeled to  provide meaningful comparison  to observational  data  (question  3,
Section 4.1).

       4.2.3  Corroboration, Sensitivity Analysis, and Uncertainty Analysis

The question "How closely does the  model approximate the real system of interest?" is unlikely to have a
simple answer.  In general, answering this question is not simply a matter of comparing  model results and
empirical data. As noted in Section 3.1, when developing and using an environmental model, modelers
and decision makers should consider what degree of uncertainty is acceptable within the  context of a
specific model application.  To do this,  they will need  to  understand  the uncertainties  underlying the
model. This section discusses three approaches to gaining this understanding:

•   Model  corroboration (Section 4.2.3.2), which includes all quantitative and qualitative  methods for
    evaluating the degree to which a model corresponds to reality.
•   Sensitivity analysis (Section 4.2.3.3), which involves studying how changes in a model's input values
    or assumptions affect its output or response.
•   Uncertainty analysis (Section 4.2.3.3), which investigates how a model might be affected by  the lack
    of knowledge about a certain population or the real value of model parameters.

Where practical, the  recommended analyses should  be conducted and their results reported in the
documentation supporting the  model.  Section 4.2.3.1 describes  and defines the  various types  of
uncertainty, and  associated concepts, inherent  in the modeling process that  model corroboration and
sensitivity and uncertainty analysis can help assess.

4.2.3.1 Types of Uncertainty

Uncertainties are  inherent in all aspects of the  modeling  process.  Identifying  those  uncertainties that
significantly influence model outcomes  (either qualitatively or quantitatively) and communicating their
importance is key to successfully integrating information from models into the decision making process.
As defined  in Chapter 3, uncertainty is the term used in this guidance to describe incomplete knowledge
about specific factors, parameters (inputs), or models.  For organizational simplicity,  uncertainties that
affect model quality are categorized in this guidance as:

•   Model  framework uncertainty, resulting from incomplete knowledge about factors that control the
    behavior of the system being modeled; limitations in spatial or temporal resolution; and simplifications
    of the system.
•   Model  input  uncertainty,  resulting  from  data  measurement  errors,  inconsistencies  between
    measured  values and those used  by the model  (e.g.,  in their level of aggregation/averaging),  and
    parameter value uncertainty.
•   Model niche uncertainty,  resulting  from the use of a model outside the system for which it was
    originally developed and/or developing  a larger  model from several existing models with different
    spatial or temporal scales.


Box 7: Example of Model Input Uncertainty
The NRC's Models in Environmental Regulatory Decision Making provides a detailed example, summarized below, of
the effect of model input uncertainty on policy decisions.
The formation of ozone in the lower atmosphere (troposphere) is an exceedingly complex chemical process that
involves the interaction of oxides  of nitrogen (NOX),  volatile  organic compounds (VOCs), sunlight,  and  dynamic
atmospheric processes.  The basic chemistry of ozone  formation was known  in the early 1960s  (Leighton  1961).
Reduction of ozone concentrations  generally requires controlling either or both NOX and VOC emissions.  Due to the
nonlinearity of atmospheric chemistry, selection of the  emission-control strategy traditionally relied on air quality
models.
One of the first  attempts to include the complexity of atmospheric ozone chemistry in the decision making process
was a simple  observation-based model, the so-called Appendix J curve (36 Fed. Reg. 8166 [1971]). The curve was
used to indicate the percentage  VOC emission  reduction  required to attain  the ozone standard in an urban area
based on  peak  concentration of photochemical  oxidants observed in that area.  Reliable NOX data were virtually
nonexistent at the time; Appendix J was based on data from measurements of ozone and VOC concentrations from
six U.S. cities.  The Appendix J  curve was based on the  hypothesis that reducing VOC emissions was the most
effective emission-control path, and this conceptual model helped  define legislative mandates enacted by Congress
that emphasized controlling these emissions.
The choice in  the  1970s to concentrate on VOC  controls was supported by early results from models. Though new
results in  the  1980s showed higher-than-expected biogenic VOC emissions,  EPA continued  to  emphasize VOC
controls,  in part because the  schedule that Congress  and  EPA set  for  attaining the ozone ambient air quality
standards was not conducive to reflecting on the basic elements of the science (Dennis 2002).
VOC reductions from the early 1970s to the early 1990s had little effect on ozone concentrations.  Regional ozone
models developed in the 1980s and 1990s suggested that controlling NOX emissions was necessary in addition to, or
instead of, controlling  VOCs to reduce ozone concentrations (NRC 1991).  The shift in the 1990s toward regulatory
activities focusing on  NOX controls was partly due to the realization that historical estimates of emissions and  the
effectiveness  of various control strategies in  reducing emissions were  not accurate.   In  other words,  ozone
concentrations had not been reduced as much as hoped over the  past three decades, in part because emissions of
some  pollutants  were much higher than originally  estimated.
Regulations may go forward before science and models are perfected because of the desire to mitigate the potential
harm from environmental hazards.  In the case of ozone modeling, the model inputs (emissions inventories in this
case)  are  often  more  important than the model science  (description of atmospheric transport and  chemistry in this
case)  and require as careful  an  evaluation as the evaluation of the  model.  These factors point to the  potential
synergistic role that measurements  play in model  development and application.

In reality, all three categories are interrelated. Uncertainty in the underlying model  structure or model
framework uncertainty is the result of incomplete scientific  data or  lack of knowledge about the factors
that control the behavior of the system being modeled.  Model framework uncertainty can  also be the
result of simplifications needed to translate the conceptual model into mathematical terms as described in
Section 3.3.  In the scientific literature, this type of uncertainty is also referred to as structural error (Beck
1987), conceptual errors (Konikow and Bredehoeft  1992), uncertainties in the conceptual model (Usunoff
et al. 1992), or model error/uncertainty (EPA 1997; Luis and McLaughlin 1992). Structural error relates to
the mathematical construction of the algorithms that make up a model, while the conceptual model refers
to the science underlying a  model's  governing  equations.   The  terms "model  error" and  "model
uncertainty" are both generally synonymous with model framework uncertainty.

Many models  are developed  iteratively to  update  their underlying science and resolve  existing  model
framework uncertainty as new information  becomes available.  Models with  long  lives  may undergo
important changes from  version to version.  The  MOBILE model for estimating atmospheric  vehicle
emissions, the CMAQ (Community Multi-scale Air Quality) model, and the QUAL2 water quality  models
are examples of models that have had multiple versions and major scientific modifications and extensions
over more than two decades of their existence (Scheffe and Morris 1993; Barnwell et al. 2004; EPA 1999c, as
cited in NRC 2007).

When an appropriate model framework has been developed, the model itself may still be highly uncertain
if the input data or database used to construct the application tool is not of sufficient quality.  The quality
of empirical  data used  for both model  parameterization and corroboration tests is affected  by both
uncertainty and variability.  This guidance uses the term "data uncertainty" to refer to the uncertainty
caused by measurement errors, analytical  imprecision,  and limited sample sizes during  data collection
and treatment.

In contrast to data uncertainty, variability results from the  inherent randomness  of certain parameters,
which  in  turn results from the  heterogeneity  and  diversity in environmental processes.  Examples of
variability  include fluctuations  in ecological  conditions, differences  in habitat, and  genetic variances
among populations (EPA 1997). Variability in  model parameters  is largely dependent on the extent to
which input data have been aggregated (both spatially and temporally).  Data uncertainty is sometimes
referred to as  reducible uncertainty because  it  can  be  minimized with further study (EPA 1997).
Variability, however, is referred to as irreducible uncertainty because it can be better characterized and
represented but not reduced with further study (EPA 1997).

A model's application niche  is the set of conditions under which  use of the  model  is scientifically
defensible (EPA 1994b). Application niche uncertainty is therefore a function of the appropriateness of a
model  for use  under a specific set of conditions. Application niche uncertainty is particularly important
when (a) choosing among existing models for  an application that lies outside the system for which the
models were originally developed and/or (b) developing a larger model from several existing models with
different spatial or temporal scales (Levins 1992).

The SAB's review of MMSOILS (Multimedia Contaminant Fate, Transport and Exposure Model) provides
a good example of application niche uncertainty. The SAB questioned the adequacy of using a screening-
level model to  characterize situations where there is substantial subsurface heterogeneity or where non-
aqueous  phase contaminants  are present (conditions differ from default values) (SAB 1993b). The SAB
considered the MMSOILS model acceptable within its original application niche, but unsuitable for more
heterogeneous conditions.
4.2.3.2  Model Corroboration

        The interdependence of models and measurements is complex and iterative for several reasons.
        Measurements help to provide the conceptual basis of a model and inform model development,
        including parameter estimation.  Measurements are also a critical tool for corroborating model
        results.  Once developed, models can derive priorities for measurements that ultimately get used
        in modifying existing models or in developing new ones.  Measurement and model activities are
        often  conducted in  isolation... Although environmental data systems serve a range of purposes,
        including  compliance  assessment,  monitoring of trends in indicators, and  basic  research
        performance,  the importance of models in  the regulatory process requires measurements and
        models to  be better integrated.  Adaptive strategies that rely  on iterations of measurements and
        modeling,  such  as  those discussed in the  2003  NRC report titled Adaptive Monitoring and
        Assessment for the Comprehensive Everglades  Restoration Plan, provide  examples  of how
        improved coordination might be achieved.
                      — NRC Committee on Models in the Regulatory Decision Process (NRC 2007)

Model corroboration includes  all quantitative and qualitative methods for evaluating the degree to which a
model corresponds to reality.  The rigor of these methods varies depending on the type and purpose of
the model application.  Quantitative model corroboration uses statistics to estimate how closely the model
results match measurements made in the real  system.  Qualitative corroboration activities may include
expert elicitation to obtain  beliefs about a system's behavior in a data-poor situation.  These corroboration
activities may move model forecasts toward consensus.
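
As a hedged sketch of the quantitative side of corroboration (the paired observed and predicted values below
are invented), summary statistics such as mean bias, root-mean-square error, and the Nash-Sutcliffe
efficiency can be computed directly from matched observations and model results:

    import numpy as np

    observed = np.array([12.1, 15.4, 9.8, 20.3, 17.6])    # field measurements
    predicted = np.array([11.5, 16.0, 10.9, 18.8, 18.1])  # model results at the same times and locations

    bias = np.mean(predicted - observed)                  # systematic over- or under-prediction
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))  # overall error magnitude
    nse = 1.0 - np.sum((predicted - observed) ** 2) / np.sum((observed - observed.mean()) ** 2)

    print(f"mean bias: {bias:+.2f}   RMSE: {rmse:.2f}   Nash-Sutcliffe efficiency: {nse:.2f}")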

For  newly developed  model frameworks  or untested mathematical  processes,  formal corroboration
procedures may  be appropriate.   Formal corroboration may involve formulation of hypothesis  tests for
model acceptance, tests  on  datasets independent of the calibration  dataset, and quantitative testing
criteria.   In many cases,  collecting independent datasets for formal model corroboration is extremely
costly or otherwise unfeasible. In such circumstances, model evaluation may be appropriately conducted
using a  combination of other evaluation tools discussed in this section.

Robustness is the capacity of a model to perform equally well across the full range of environmental
conditions for which it was designed (Reckhow 1994; Borsuk et al. 2002). The degree of similarity among
datasets available for calibration  and corroboration  provides insight  into  a model's robustness.   For
example, if the dataset used to corroborate a model is identical or statistically similar to the dataset used
to calibrate the model, then the corroboration exercise has provided  neither an independent measure of
the model's performance nor insight into the model's  robustness.  Conversely, when  corroboration data
are significantly different from calibration data, the corroboration exercise  provides a measure of both
model performance and robustness.
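
One hedged way to check how similar the calibration and corroboration datasets are, and therefore how much
the corroboration exercise reveals about robustness, is a two-sample Kolmogorov-Smirnov test on a key driver
variable; the synthetic streamflow data below are illustrative only:

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(seed=1)
    calibration_flows = rng.lognormal(mean=3.0, sigma=0.4, size=200)    # driver data used for calibration
    corroboration_flows = rng.lognormal(mean=3.5, sigma=0.6, size=150)  # independent corroboration data

    result = ks_2samp(calibration_flows, corroboration_flows)
    if result.pvalue < 0.05:
        print(f"Datasets differ (KS p = {result.pvalue:.3f}): corroboration also probes robustness.")
    else:
        print(f"Datasets are statistically similar (KS p = {result.pvalue:.3f}): limited insight into robustness.")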

Quantitative model corroboration methods are recommended for choosing among multiple models that
are available for the same application.  In such cases, models may  be ranked on  the basis of their
statistical performance in comparison to the observational data (e.g.,  EPA 1992). EPA's Office of Air and
Radiation evaluates models in this  manner. When a single model is found to perform better than  others in
a given  category, OAR recommends it in the Guidelines on  Air Quality Models as a preferred model for
application in that category (EPA 2003a). If models perform similarly, then the preferred model is selected
based  on other  factors, such  as past  use,  public  familiarity, cost  or resource requirements,  and
availability.
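
Building on the corroboration statistics sketched earlier, ranking candidate models on a common performance
statistic reduces to a simple sort; the model names and RMSE values here are hypothetical:

    # Hypothetical RMSE values from comparing each candidate model to the same observations.
    performance = {"Model A": 4.2, "Model B": 3.1, "Model C": 3.2}

    ranked = sorted(performance.items(), key=lambda item: item[1])  # lower RMSE is better
    preferred, best_rmse = ranked[0]

    print("ranking:", ranked)
    print(f"preferred model: {preferred} (RMSE = {best_rmse})")
    # If the top scores are effectively tied, selection would fall back on the other
    # factors noted above (past use, public familiarity, cost, availability).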

Box 8:  Example: Comparing Results from Models of Varying Complexity
(From Box 5-4 in NRC's Models in Environmental Regulatory Decision Making)
The Clean Air Mercury Rule6 requires industry to reduce mercury emissions from coal-fired power plants. A potential
benefit is the  reduced human exposure and related health impacts from methylmercury that may result from reduced
concentrations of this toxin in fish.  Many challenges and uncertainties affect assessment of this benefit. In  its
assessment of the benefits and costs of this rule, EPA used multiple models to examine how changes in atmospheric
deposition would affect mercury concentrations in fish, and applied the models to assess some of the uncertainties
associated with the model results (EPA  2005).
EPA based its national-scale benefits assessment on results from the mercury maps (MMaps) model.  This model
assumes a linear, steady-state  relationship between atmospheric deposition of mercury and mercury concentrations
in fish,  and thus assumes that a 50% reduction in mercury deposition rates results  in a 50% decrease in fish mercury
concentrations.  In addition, MMaps assumes instantaneous adjustment of aquatic systems and their ecosystems to
changes in deposition — that is, no time lag in the conversion of mercury to methylmercury and its bioaccumulation in
fish. MMaps also does not deal with  sources of mercury other than those from atmospheric deposition. Despite those
limitations, the  Agency  concluded  that  no other  available  model  was capable  of performing a national-scale
assessment.
To further investigate fish mercury concentrations  and to assess the effects of MMaps' assumptions, EPA  applied
more detailed models, including the spreadsheet-based ecological risk assessment for the fate of mercury (SERAFM)
model, to five well-characterized ecosystems.  Unlike the steady-state MMaps model, SERAFM is a dynamic model
which calculates the temporal  response of mercury concentrations in fish tissues to changes in mercury loading.  It
includes multiple land-use types for representing watershed  loadings  of mercury through  soil  erosion  and runoff.
SERAFM partitions mercury among multiple compartments and phases, including aqueous phase, abiotic particles
(for example,  silts), and biotic  particles (for example, phytoplankton). Comparisons of SERAFM's predictions with
observed fish mercury concentrations for  a single fish species in four ecosystems showed that the model under-
predicted mean concentrations for one water body, over-predicted mean concentrations for a second water body, and
accurately predicted  mean concentrations for the  other two.  The  error  bars  for the  observed fish mercury
concentrations in these four ecosystems were  large, making it difficult to assess the models' accuracy. Modeling the
four ecosystems also showed  how  the assumed physical and chemical characteristics of the specific ecosystem
affected absolute fish  mercury concentrations and  the length of time  before fish  mercury concentrations reached
steady state.
Although EPA concluded that  the best available science supports the  assumption of a  linear relationship between
atmospheric deposition and fish mercury concentrations for broad-scale use, the more detailed ecosystem modeling
demonstrated that individual ecosystems were highly sensitive to uncertainties in model parameters.  The Agency
also noted that many of the model uncertainties  could not be quantified. Although the case studies covered the bulk
of the  key environmental characteristics,  EPA found that extrapolating the individual ecosystem case studies to
account for the variability in ecosystems  across the country indicated that those case studies might not  represent
extreme conditions that could influence how atmospheric mercury deposition affected fish mercury concentrations in
a water body.
This example illustrates the usefulness of investigating a variety of models at varying levels of complexity.  A
hierarchical modeling approach, such as that used in the mercury analysis, can provide justification for simplified
model assumptions or potentially provide evidence for a consistent bias that would negate the assumption that a
simple model is appropriate for broad-scale application.

6 On February 8, 2008, the U.S. Court of Appeals for the District of Columbia Circuit vacated the Clean Air
Mercury Rule.  The DC Circuit's vacatur of this rule was unrelated to the modeling conducted in support of the rule.

4.2.3.3  Sensitivity and Uncertainty Analysis

Sensitivity  analysis is  the study of how a model's response can be apportioned to changes in model
inputs  (Saltelli et al. 2000a).  Sensitivity analysis is recommended as the principal evaluation tool for
characterizing the most and least important sources of uncertainty in environmental models.

Uncertainty analysis investigates the lack of knowledge about a certain population  or the real value of
model  parameters.  Uncertainty can sometimes be reduced through further study  and by collecting
additional data.  EPA guidance (e.g., EPA 1997) distinguishes uncertainty analysis from methods used to
account for variability  in input data  and model parameters.  As mentioned  earlier, variability in model
parameters and  input data can be better characterized through further study but is usually not reducible
(EPA 1997).

Although sensitivity  and uncertainty analysis  are closely related, sensitivity is  algorithm-specific with
respect to  model "variables" and  uncertainty is parameter-specific.  Sensitivity  analysis assesses the
"sensitivity" of the model to specific parameters and  uncertainty analysis  assesses the "uncertainty"
associated with  parameter values. Both types of analyses are important to understand the degree of
confidence a user can place in the model results. Recommended techniques for conducting uncertainty
and sensitivity analysis are discussed in Appendix D.
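
As a minimal sketch, assuming a hypothetical two-parameter decay model and illustrative input distributions
(not drawn from Appendix D), a single Monte Carlo loop can support both analyses: the spread of the outputs
characterizes uncertainty, and correlating each sampled input with the output gives a crude sensitivity
screening:

    import numpy as np

    rng = np.random.default_rng(seed=42)
    n = 5000

    # Assumed input distributions representing parameter uncertainty.
    decay_rate = rng.normal(loc=0.35, scale=0.05, size=n)     # per day
    initial_conc = rng.normal(loc=100.0, scale=10.0, size=n)  # mg/L

    # Illustrative model output: concentration after 5 days.
    output = initial_conc * np.exp(-decay_rate * 5.0)

    # Uncertainty analysis: summarize the distribution of model outputs.
    p05, p50, p95 = np.percentile(output, [5, 50, 95])
    print(f"output 5th/50th/95th percentiles: {p05:.1f} / {p50:.1f} / {p95:.1f}")

    # Crude sensitivity screening: correlation of each sampled input with the output.
    for name, samples in [("decay_rate", decay_rate), ("initial_conc", initial_conc)]:
        r = np.corrcoef(samples, output)[0, 1]
        print(f"correlation of {name} with output: {r:+.2f}")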

The NRC committee pointed out that uncertainty analysis for regulatory environmental modeling involves
not only analyzing uncertainty, but also communicating the uncertainties to policy makers. To facilitate
communication of model  uncertainty, the committee recommends using hybrid approaches in which
unknown quantities are treated probabilistically and explored in scenario-assessment  mode by decision
makers through a range of plausible values.  The committee further acknowledges (NRC 2007) that:
       Effective  uncertainty communication requires a high level of interaction with the relevant decision
       makers to ensure that they  have  the necessary information about the nature and  sources of
       uncertainty and their consequences.  Thus, performing uncertainty analysis for environmental
       regulatory activities requires extensive discussion between analysts and decision makers.

4.3     Evaluating Proprietary Models

This guidance defines proprietary models as  those computer models for which the source code is not
universally shared. To  promote the transparency with which decisions are made, EPA prefers  using non-
proprietary models when available. However, the Agency acknowledges there will be times when the use
of proprietary models provides the most reliable and best-accepted characterization of a system.
When a proprietary model is used, its use should be accompanied by comprehensive, publicly available
documentation. This documentation should describe:

    •   The conceptual model and the theoretical basis (as described in Section 3.3.1) for the model.
    •   The techniques and procedures used to verify that the proprietary model is free from numerical
       problems or "bugs"  and that it truly represents the conceptual  model (as  described in  Section
       3.3.3).
    •   The process used  to  evaluate the model  (as described in  Section  4.2)  and the basis for
       concluding that the model and its analytical results are of a quality sufficient to serve as the basis
        for a decision (as described in Section 4.1).
    •   To the extent practicable, access to input and output data such that third parties can replicate the
       model results.

4.4    Learning From Prior Experiences — Retrospective Analyses of Models

The NRC Committee on Models in the Regulatory Decision Process emphasized that the final  issue in
managing the model  evaluation  process is the learning that comes  from  examining prior modeling
experiences.  Retrospective analysis of models is important for improving individual models and regulatory
policies and for systematically enhancing the overall modeling field.  The committee pointed out that retrospective
analyses can  be considered  from various perspectives:

•   They can investigate the systematic  strengths  and weaknesses  that are characteristic of broad
    classes of models, such as models of ground water flow, surface water, air pollution, and health
    risk assessment.  For example, a researcher estimated that in 20 to 30 percent of ground
    water modeling  efforts,  surprising occurrences  indicated that the conceptual model underlying the
    computer model was invalid (Bredehoeft 2003, 2005, in NRC 2007).

•   They can study the processes (for example, approaches to model development and evaluation) that
    lead to successful model applications.

•   They can examine models that have been in use for years to determine how well they work.  Ongoing
    evaluation of the model against data,  especially data taken under novel  conditions, offers the best
    chance to identify and correct conceptual errors. This type of analysis is referred to as a model "post-
    audit" (see Section 5.5)

The results of retrospective  evaluations of individual models and model classes can be used to identify
priorities for improving models.
Box 9: Example of a Retrospective Model Analysis at EPA
(From Box 4-6 in NRC's Models in Environmental Regulatory Decision Making)
EPA's Model Evaluation and Applications Research Branch has been performing a retrospective analysis of the
CMAQ model's ability to simulate the change in a pollutant associated with a known change in emissions (A. Gilliland,
EPA, personal commun., May 19,  2006 and March 5,  2007). This study, which EPA terms a "dynamic evaluation"
study, focuses on a rule issued by EPA in 1998 that required 22 states and the District of Columbia to submit State
Implementation Plans providing NOX emission reductions to mitigate ozone transport in the  eastern United States.
This rule, known as the NOX SIP Call, requires emission reductions from the utility sector and large industrial boilers
in the eastern and midwestern United States by 2004. Since these sources are equipped with continuous emission
monitoring systems, the NOX SIP call represents a special opportunity to directly measure the emission changes and
incorporate them into model simulations with reasonable confidence.
Air quality model simulations were developed for the summers of 2002 and 2004 using the  CMAQ model, and the
resulting ozone predictions were compared to observed ozone concentrations. Two series of CMAQ simulations  were
developed to test  two  different chemical mechanisms in  CMAQ.  This allowed  an  evaluation of the uncertainty
associated with the model's representation of chemistry. Since the model's prediction of the relative change in
pollutant concentrations provides input for regulatory decision making, this type of dynamic evaluation is particularly
relevant to how the model is used.

4.5     Documenting the Model Evaluation

In  its Models in Environmental Regulatory Decision Making report, the  NRC summarizes  the  key
elements of a model  evaluation (NRC 2007). This list provides a useful framework for documenting the
results of model evaluation as the various  elements are  conducted during model  development  and
application:

•   Scientific basis.  The scientific theories that form the basis for models.
•   Computational infrastructure.  The mathematical algorithms and approaches used in executing the
    model computations.
•   Assumptions and limitations.  The detailing of important  assumptions used  in  developing  or
    applying  a computational  model, as  well as the resulting  limitations that will affect the model's
    applicability.
•   Peer review.  The documented critical review of a model or its application conducted by qualified
    individuals who are independent of those who  performed the work,  but who collectively have  at least
    equivalent technical  expertise to those who  performed  the  original work.  Peer review attempts  to
    ensure that  the  model is technically adequate, competently performed,  properly documented,  and
    satisfies  established  quality   requirements  through  the   review  of  assumptions,  calculations,
    extrapolations, alternate interpretations, methodology, acceptance criteria, and/or conclusions pertaining
    to a model or its application (modified from EPA 2006).
•   Quality assurance and quality control  (QA/QC).   A  system  of management activities involving
    planning, implementation, documentation, assessment, reporting, and improvement to ensure that a
    model and its components are of the type needed and  expected for its task and that they meet  all
    required performance standards.
•   Data availability and quality. The availability and quality of monitoring and laboratory data that can
    be used for both developing model input parameters and  assessing model results.
•   Test cases.  Basic model runs where an analytical solution is available or an empirical solution is
    known with a high degree of confidence to ensure that algorithms and computational processes are
    implemented correctly (a minimal sketch of such a check follows this list).
•   Corroboration of model  results with  observations.  Comparison of model  results with  data
    collected in the field or laboratory to assess the model's accuracy and improve its performance.
•   Benchmarking against other models. Comparison of model results with other similar models.
•   Sensitivity and uncertainty analysis. Investigation of the parameters or processes that drive model
    results, as well as the effects of lack of knowledge and other potential sources of error in the model.
•   Model resolution capabilities. The level of disaggregation of processes  and results in the model
    compared to the resolution needs from the problem statement or model application.  The resolution
    includes the level of spatial, temporal, demographic, or other types of disaggregation.
•   Transparency. The need for individuals and groups outside modeling activities to comprehend either
    the processes followed  in evaluation or the essential workings of the model and its outputs.
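
As a minimal illustration of the "test cases" element above, the sketch below (in Python) checks a simple
finite-difference implementation of first-order decay against its known analytical solution; the model,
parameter values, and tolerance are hypothetical and are not taken from an EPA model framework.

    import math

    # Hypothetical test case: an explicit Euler solver for dC/dt = -k*C is
    # compared against the analytical solution C(t) = C0*exp(-k*t).

    def euler_decay(c0, k, t_end, dt):
        """Advance the decay equation with a fixed time step (explicit Euler)."""
        c = c0
        for _ in range(int(round(t_end / dt))):
            c += dt * (-k * c)
        return c

    c0, k, t_end = 10.0, 0.5, 4.0
    numerical = euler_decay(c0, k, t_end, dt=0.001)
    analytical = c0 * math.exp(-k * t_end)

    # The test passes if the numerical result is within 0.1% of the analytical one.
    assert abs(numerical - analytical) / analytical < 1e-3, "test case failed"
    print(f"numerical = {numerical:.4f}, analytical = {analytical:.4f} (test passed)")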

4.6    Deciding Whether to Accept the Model for Use in Decision Making

The model development and evaluation process culminates in a decision to accept  (or not accept) the
model for use in decision making.  This decision is made by the program manager charged with making
regulatory decisions, in consultation with the model developers and project team.  It should be informed
by good communication of the key findings of the model evaluation process, including the critical issue of
uncertainty. The project team  should  gain model  acceptance before applying the  model to  decision
making to avoid confusion and potential re-work.
5.1     Introduction

Once a model has  been accepted for use by decision makers, it  is applied to  the  problem that was
identified in the first stages of the modeling process.  Model application commonly involves a shift from
the hindcasting (testing the  model against past observed conditions)  used in the model development and
evaluation  phases to forecasting (predicting a future change) in the application  phase.  This may involve a
collaborative effort between modelers and program staff to devise management scenarios that represent
different regulatory alternatives.   Some model applications may entail trial-and-error model simulations,
where model inputs are changed iteratively until a desired environmental condition is achieved.
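
A minimal sketch of such a trial-and-error run is shown below (in Python); the linear response function,
target concentration, and starting loading are illustrative assumptions, not an EPA model or dataset.

    # Trial-and-error application sketch: a hypothetical loading input is
    # reduced stepwise and the model re-run until the simulated concentration
    # meets a target condition. All numbers are illustrative.

    def simulated_concentration(loading_kg_per_day):
        """Placeholder steady-state model: concentration assumed proportional to loading."""
        return 0.004 * loading_kg_per_day  # mg/L per (kg/day), assumed

    target_mg_per_l = 1.0
    loading = 500.0  # starting loading, kg/day (assumed)

    while simulated_concentration(loading) > target_mg_per_l:
        loading *= 0.9  # reduce the loading 10 percent and re-run the model

    print(f"A loading of {loading:.1f} kg/day meets the {target_mg_per_l} mg/L target")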

Using a model in  a  proposed decision requires that the model application be  transparently incorporated
into the public process. This is  accomplished by providing written documentation of the model's relevant
characteristics  in a style and format accessible to the interested public, and by sharing specific model files
and data with external parties, such as technical consultants and university scientists, upon request. This
chapter presents best practices and  other recommendations for integrating the results of environmental
models  into  Agency decisions.   Section  5.2  describes how to achieve  and document a transparent
modeling process, Section 5.3 reviews situations when  use of multiple  models may be appropriate, and
Section 5.4 discusses the use of  post-audits to determine whether the  actual system response concurs
with that predicted by the  model.
Box 10:   Examples of Major  EPA  Documents That Incorporate  a Substantial Amount of Computational
Modeling Activities
(From Table 2-2 in NRC's Models in Environmental Regulatory Decision Making)

Air Quality
Criteria Documents and Staff Paper for Establishing NAAQS
Summarize and assess exposures and health impacts for the criteria air pollutants (ozone, particulate matter, carbon
monoxide,  lead, nitrogen dioxide, and sulfur dioxide). Criteria documents include results from exposure and health
modeling  studies, focusing  on describing exposure-response relationships.  For example,  the particulate matter
criteria document placed emphasis on epidemiological models of morbidity and mortality (EPA 2004c). The  Staff
Paper takes this scientific foundation a step further by identifying the crucial health information and using exposure
modeling to characterize risks that serve as the basis for the staff recommendation of the  standards to the  EPA
Administrator.  For example, models of the number of children exercising outdoors during those parts of the day when
ozone is elevated had a major influence on  decisions about the 8-hour ozone national ambient air quality standard
(EPA 1996).
State Implementation Plan (SIP) Amendments
A detailed  description of the scientific  methods and emissions reduction programs a state will use to carry out its
responsibilities under the CAA for complying with NAAQS. A SIP typically relies on results from activity, emissions,
and air quality modeling. Model-generated emissions inventories serve as input to regional air  quality models and are
used to test alternative emission-reduction schemes to see whether they will result in air quality standards being met
(e.g., ADEC 2001; TCEQ 2004).  Regional-scale modeling has become part of developing state implementation plans
for the new 8-hour ozone and fine particulate matter standards.  States, local governments, and their consultants do
this analysis.
Regulatory Impact Assessments (RIAs) for Air Quality Rules
RIAs for air quality regulations document the costs and benefits of major emission control regulations.  Recent RIAs
have included emissions, air quality, exposure, and health and economic impacts modeling results (e.g., EPA 2004b).

Water Regulations
Total Maximum Daily Load (TMDL) Determinations
For each impaired water body, a TMDL identifies (a) the water quality standard that is not being attained and the
pollutant causing the impairment, (b) the total loading of the pollutant that the water may receive and still meet the
water quality standard, and (c) the allocation of that total loading among the point and nonpoint sources of the pollutant
discharging to the water.  Establishment of TMDLs may utilize water quality and/or nutrient loading models.  States
establish most TMDLs, and therefore states and their consultants can be expected to do the majority of this modeling,
with EPA occasionally doing the modeling for particularly contentious TMDLs (EPA 2002b; George 2004; Shoemaker
2004; Wool 2004).
Leaking Underground Storage Tank Program
Assesses  the potential  risks associated  with leaking  underground  gasoline storage  tanks.  At an  initial screening
level,  it may assess one-dimensional  transport of a conservative contaminant using an  analytical model (Weaver
2004).
Development of Maximum Contaminant Levels for Drinking Water
Assess  drinking  water standards for public water supply  systems.   Such assessments  can include  exposure,
epidemiology, and dose-response modeling (EPA 2002c; NRC 2001b, 2005b).

Pesticides and Toxic Substances Program
Pre-manufacturing Notice Decisions
Assess risks associated with new manufactured chemicals entering the market.  Most chemicals are screened initially
as to their environmental and human health risks using structure-activity relationship models.
Pesticide Reassessments
Requires that all existing pesticides undergo a reassessment  based on cumulative  (from multiple pesticides) and
aggregate (exposure from multiple pathways) health risk. This includes the use  of pesticide exposure models.

Solid and Hazardous Wastes Regulations
Superfund Site Decision Documents
Includes the remedial investigation, feasibility study, proposed  plan, and record-of-decision documents that address
the  characteristics and  cleanup of Superfund sites.  For many hazardous waste sites, a primary modeling task is
using groundwater modeling to assess movement of toxic substances through the  substrate (Burden 2004).  The
remedial investigation for a  mining megasite might include water quality, environmental chemistry, human health risk,
and ecological  risk assessment modeling (NRC 2005a).

Human Health Risk Assessment
Benchmark Dose (BMD) Technical Guidance Document
EPA relies on both laboratory animal and epidemiological studies to assess the noncancer effects of chronic
exposure to pollutants (that is, the reference dose [RfD] and the inhalation reference concentration [RfC]).  These
data are modeled to estimate the human dose-response.  EPA recommends the use of BMD modeling, which
essentially fits mathematical models to the experimental data so as to use as much of the available data as possible (EPA 2000).
Ecological Risk Assessment

The ecological risk assessment guidelines provide general principles and give examples to show how ecological risk
assessment can be applied to a wide range of systems, stressors, and biological, spatial, and temporal scales.  They
describe the  strengths and limitations of alternative approaches and  emphasize processes and approaches  for
analyzing data rather than specifying data collection techniques, methods or models (EPA 1998).
5.2    Transparency

The objective of transparency is to enable communication between modelers, decision makers, and the
public. Model transparency is achieved when the modeling processes are documented with clarity and
completeness  at an appropriate  level of detail.   When models are transparent, they can  be  used
reasonably and effectively in a regulatory decision.

       5.2.1   Documentation

Documentation enables decision makers and other model users to understand the process by which a
model was developed and used. During model development and use, many choices must be made and
options selected that may  bias the model results.  Documenting this  process and its limitations and
uncertainties is essential to increase the utility and acceptability of the model outcomes.  Modelers and
project teams should document all relevant information about  the  model  to  the extent practicable,
particularly when a controversial decision is involved.  In legal proceedings, the quality and thoroughness
of the model's written documentation and the Agency's responses to peer review and public comments
on the model can affect the  outcome of the legal challenge.

The documentation should  include  a clear explanation of the model's relationship to the scenario of the
particular application.  This  explanation should describe the  limitations of the  available information when
applied to other scenarios.   Disclosure about the state of science used in  a  model and future plans to
update the  model  can help establish  a record of reasoned, evidence-based application to inform
decisions.  For example, EPA successfully defended a challenge to a model  used in its TMDL program
when it explained that it was basing its decision on the best available  scientific information and that it
intended to refine its model  as better information surfaced.7

When courts review EPA modeling decisions, they generally give some deference to EPA's technical
expertise, unless it is without substantial basis in fact.  As discussed in Section 4.2.3 regarding
corroboration,  deviations from empirical observations are to  be expected.  In  substantive legal  disputes,
the courts generally examine the record supporting EPA's decisions for justification as to why the model
was reasonable.8 The record should contain not only model development, evaluation, and application but
also  the  Agency's  responses  to comments on  the model raised during  peer review and the public
process.   The organization of this guidance document offers a general outline for model documentation.
Box 11 provides a more detailed outline. These elements are adapted  from  EPA Region 10's standard
practices for modeling projects.
7 Natural Resources Defense Council v. Muszynski, 268 F.3d 91 (2d Cir. 2001).
8 American Iron and Steel Inst. v. EPA, 115 F.3d 979 (D.C. Cir. 1997).
Box 11: Recommended Elements for Model Documentation

1. Management Objectives
•   Scope of problem
•   Technical objectives that result from management objectives
•   Level of analysis needed
•   Level of confidence needed

2. Conceptual Model
•   System boundaries (spatial and temporal domain)
•   Important time and length scales
•   Key processes
•   System characteristics
•   Source description
•   Available data sources (quality and quantity)
•   Data gaps
•   Data collection programs (quality and quantity)
•   Mathematical model
•   Important assumptions

3. Choice of Technical Approach
•   Rationale for approach in context of management objectives and conceptual model
•   Reliability and acceptability of approach
•   Important assumptions

4. Parameter Estimation
•   Data used for parameter estimation
•   Rationale for estimates in the absence of data
•   Reliability of parameter estimates

5. Uncertainty/Error
•   Error/uncertainty in inputs,  initial conditions, and boundary conditions
•   Error/uncertainty in pollutant loadings
•   Error/uncertainty in specification of environment
•   Structural errors in methodology (e.g., effects of aggregation or simplification)

6. Results
•   Tables of all parameter values used for analysis
•   Tables or graphs of all results used in support of management objectives or conclusions
•   Accuracy of results

7. Conclusions of analysis in relationship to management objectives

8. Recommendations for additional analysis, if necessary

Note:  The QA project plan for models (EPA 2002b) includes a documentation and records component that also
describes the types of records and the level of detail of documentation to be kept, depending on the scope and
magnitude of the project.


        5.2.2   Effective Communication


The modeling process should effectively communicate  uncertainty to anyone interested in  the  model
results.   All  technical  information  should be documented  in  a  manner that  decision makers  and
stakeholders can readily interpret and understand.  Recommendations for improving clarity, adapted from
the Risk Characterization Handbook (EPA 2000d), include the following:


•   Be as brief as possible while still providing all necessary details.
•   Use plain language that modelers, policy makers, and the informed lay person can understand.
•   Avoid jargon and excessively technical language.  Define specialized terms upon first use.
•   Provide the model equations.
•   Use clear and appropriate methods to efficiently display mathematical relationships.
•   Describe quantitative outputs clearly.
•   Use understandable tables and graphics to present technical data (see Morgan and Henrion, 1990,
    for suggestions).

The conclusions and  other key points of the  modeling project should  be  clearly communicated.  The
challenge is to  characterize these essentials for decision makers, while also providing them with more
detailed information about the  modeling  process and its  limitations.  Decision makers should have
sufficient insight into the  model framework  and its underlying assumptions to  be able to apply model
results appropriately.  This is consistent with QA planning practices that assert  that all technical reports
must discuss the data quality and any limitations with respect to their intended use (EPA 2000e).

5.3    Application of Multiple Models

As mentioned in earlier chapters, multiple models sometimes apply to a certain decision making need; for
example, several air quality models, each with  its own strengths and weaknesses, might be applied for
regulatory purposes.  In other situations, stakeholders may use alternative models (developed by industry
and academic researchers) to produce alternative risk assessments (e.g., CARES pesticide exposure
model developed by industry).  One approach  to address this issue is to use multiple  models of varying
complexities to simulate the same phenomena (NRC 2007). This may provide insight  into how sensitive
the results are to different modeling  choices and how much trust to put in the results from any one model.
Experience has shown that running  multiple models can increase confidence in the model results (Manno
et al. 2008) (see Box  8 in Chapter  4 for an  example).  However, resource  limitations  or regulatory time
constraints may limit the capacity to  fully evaluate all possible models.
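
A purely hypothetical sketch (in Python) of how a simpler screening-level estimate and a somewhat more
detailed estimate of the same quantity might be run side by side and compared; neither function
represents an actual EPA model framework, and all values are illustrative assumptions.

    # Two illustrative "models" of differing complexity applied to the same
    # question, with the spread in their answers reported as a rough indication
    # of how sensitive the result is to the modeling choice.

    def screening_model(loading):
        """Simple conservative dilution estimate (assumed)."""
        return loading / 50.0

    def refined_model(loading, decay_rate=0.3, flow=40.0):
        """Slightly more detailed estimate that also accounts for decay (assumed)."""
        return loading / (flow * (1.0 + decay_rate))

    loading = 120.0  # kg/day, assumed
    estimates = {"screening": screening_model(loading), "refined": refined_model(loading)}

    for name, value in estimates.items():
        print(f"{name:>9} model estimate: {value:.2f} mg/L")
    print(f"Spread across models: {max(estimates.values()) - min(estimates.values()):.2f} mg/L")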

5.4    Model Post-Audit

Due to time constraints, scarcity of resources, and/or lack of scientific understanding,
technical decisions are often based on incomplete information and imperfect models.  Further, even if
model developers strive to use  the best science available, scientific knowledge and understanding are
continually advancing. Given this reality, decision makers should use model results in the context of an
iterative,  ever-improving process of continuous model  refinement to demonstrate the accountability  of
model-based  decisions. This process includes conducting model post-audits to assess and improve a
model and its ability to provide valuable predictions for management decisions.  Whereas corroboration
(discussed in Section  4.2.3.2) demonstrates the degree to which a  model corresponds to  past system
behavior, a model post-audit  assesses its ability to model  future conditions (Anderson and Woessner
1992).

A  model  post-audit  involves  monitoring  the modeled  system,  after implementing a  remedial  or
management action, to determine whether the actual system response concurs with that predicted by the
model.  Post-auditing  of all models is not feasible due to resource constraints, but targeted audits  of
commonly used models may provide valuable information for improving model frameworks and/or model
parameter estimates.  In  its review of the TMDL program, the NRC recommended that EPA implement
this approach by selectively  targeting  "some  post-implementation  TMDL compliance  monitoring for
verification data collection to assess model  prediction error" (NRC 2001).  The  post-audit should also
evaluate  how effectively the model development and use process engaged decision  makers and other
stakeholders (Manno et al. 2008).
Appendix A: Glossary of Frequently Used Terms

Accuracy:  The closeness of a measured or computed value to its "true" value, where the "true" value is
obtained with  perfect  information.   Due  to the natural  heterogeneity and stochasticity  of  many
environmental systems, this "true" value exists as a distribution  rather than a discrete value.  In  these
cases, the "true" value will be a function of spatial and temporal aggregation.

Algorithm: A precise rule (or set of rules) for solving some problem.

Analytical model:  A model that can be solved mathematically in terms of analytical functions.  For
example, some models that are based on relatively simple differential equations can be solved
analytically by combinations of polynomial, exponential, trigonometric, or other familiar functions.
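
For instance, a model based on first-order decay has a closed-form solution (an illustrative textbook
example, not drawn from a specific EPA application):

    \frac{dC}{dt} = -kC \qquad\Longrightarrow\qquad C(t) = C_0\,e^{-kt}

where C_0 is the initial concentration and k is the first-order decay rate.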

Applicability and utility: One of EPA's five assessment factors (see definition) that describes the extent
to which the information is relevant for the Agency's intended  use (EPA 2003b).

Application niche: The set of conditions under which the use of a model is scientifically defensible. The
identification of application niche is a key step during model development.  Peer review should include an
evaluation of application  niche.   An explicit statement of application  niche helps decision makers
understand the limitations of the scientific basis of the model (EPA 1993).

Application niche uncertainty:  Uncertainty as  to the appropriateness of a model for use under a
specific set of conditions (see "application niche").

Assessment factors: Considerations recommended by EPA for evaluating the quality and relevance of
scientific and technical information.  The five assessment factors are soundness, applicability and  utility,
clarity and completeness, uncertainty and variability, and evaluation and review (EPA 2003b).

Bias: Systematic deviation between a measured (i.e., observed) or computed value and its "true" value.
Bias is affected by faulty instrument calibration and other measurement errors, systematic errors during
data collection, and sampling errors such as incomplete spatial randomization during the design of
sampling programs.

Boundaries:  The spatial and temporal conditions and practical constraints under which environmental
data are collected.   Boundaries specify the  area or  volume (spatial  boundary)  and the  time  period
(temporal boundary) to which a model application will apply (EPA 2000a).

Boundary conditions:   Sets  of values  for state  variables  and their rates along problem domain
boundaries, sufficient to determine the state of the system within the problem domain.

Calibration: The process of adjusting model parameters within physically defensible ranges until the
resulting predictions give the best possible fit  to the observed data (EPA 1994b).  In some  disciplines,
calibration is also referred to as  "parameter estimation" (Beck et al. 1994).
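
A minimal sketch (in Python) of this idea, assuming a first-order decay model, synthetic "observations,"
and an illustrative physically defensible range for the single decay-rate parameter:

    import numpy as np

    # Calibration sketch: search a defensible range for the decay rate that
    # gives the best least-squares fit to observed data. Data are synthetic.

    t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0])    # days
    c_obs = np.array([10.0, 7.9, 6.3, 4.1, 1.6])   # mg/L (synthetic observations)

    def model(t, k, c0=10.0):
        """Illustrative first-order decay model."""
        return c0 * np.exp(-k * t)

    k_candidates = np.linspace(0.05, 1.0, 200)      # defensible range, 1/day (assumed)
    sse = [np.sum((model(t_obs, k) - c_obs) ** 2) for k in k_candidates]
    k_best = k_candidates[int(np.argmin(sse))]

    print(f"Calibrated decay rate: {k_best:.3f} per day")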

Checks: Specific tests in a quality assurance plan that are  used to evaluate whether the specifications
(performance criteria) for the project developed at its onset have been met.
Clarity and completeness:  One of EPA's five assessment factors (see definition) that describes the
degree of clarity  and completeness  with which the data, assumptions, methods,  quality assurance,
sponsoring organizations, and  analyses  employed to generate the  information  are  documented  (EPA
2003b).

Class (see "object-oriented platform"): A set of objects that share a common structure and behavior.
The structure of a  class is determined by the class variables, which represent the state of an object of that
class; the behavior is given by the set of methods associated with the class (Booch 1994).

Code: Instructions, written in the syntax of a computer language, that provide the computer with a logical
process.  "Code" can also refer to a computer program or subset. The term "code" describes the fact that
computer  languages use a  different vocabulary and syntax than algorithms that  may be written  in
standard language.

Code verification:  Examination  of the  algorithms and numerical technique  in the computer code  to
ascertain that  they  truly represent the  conceptual model and that there are  no inherent numerical
problems with obtaining a solution (Beck et al. 1994).

Complexity:  The opposite of simplicity. Complex systems tend to  have a large  number of variables,
multiple parts, and mathematical equations of a higher order, and to be more difficult to  solve.  Used  to
describe computer models, "complexity" generally refers to the level of difficulty in solving mathematically
posed problems as measured by the  time, number of steps or arithmetic operations, or memory space
required (called time complexity, computational complexity, and space complexity,  respectively).

Computational models: Models  that use measurable variables, numerical  inputs, and mathematical
relationships to produce quantitative outputs.

Conceptual basis:  An underlying scientific foundation of model algorithms or governing  equations.  The
conceptual basis for a model is either empirical (based on statistical relationships between observations)
or mechanistic (process-based) or a combination.  See definitions for "empirical model" and "mechanistic
model."

Conceptual model:  A hypothesis regarding the important factors that govern  the behavior of an object
or process of interest.  This can be an interpretation or working description  of  the characteristics and
dynamics of a physical system (EPA 1994b).

Confounding error:  An error induced by unrecognized effects from variables that are not included in the
model. The unrecognized, uncharacterized nature of these errors makes them more difficult to  describe
and account for in  statistical analysis of uncertainty (Small and Fishbeck 1999).

Constant:  A fixed value (e.g., the speed of light, the gravitational force) representing known  physical,
biological, or ecological activities.
Corroboration (model): Quantitative and qualitative methods for evaluating the degree to which a model
corresponds to reality.  In some disciplines, this process has been referred to as validation.  In general,
the term "corroboration" is preferred because it implies a claim of usefulness and not truth.

Data  uncertainty:   Uncertainty  (see  definition)  that is caused  by  measurement errors,  analytical
imprecision, and limited  sample sizes during the collection and treatment of data.  Data uncertainty, in
contrast to variability (see definition), is the  component of total uncertainty that is  "reducible" through
further study.

Debugging:  The identification and removal of bugs from  computer code.  Bugs are errors in  computer
code that range from typos to misuse of concepts and equations.

Deterministic model:   A model that provides a  solution for the state variables rather than  a set of
probabilistic outcomes.  Because this type of model does not explicitly  simulate the effects of data
uncertainty or variability, changes in model outputs are solely due to changes in model components or in
the boundary conditions or initial conditions.

Domain (spatial and temporal):  The  spatial and temporal domains of a model cover the extent and
resolution  with respect to space and time for which the model has been  developed and over which  it
should be  evaluated.

Domain boundaries (spatial and temporal):  The limits of space and time that bound a model's domain
and are specified within the boundary conditions (see "boundary conditions").

Dynamic model:  A model providing the time-varying behavior of the state variables.

Empirical  model:   A  model whose structure  is determined  by the observed  relationship among
experimental data (Suter 1993). These models can be used to develop relationships that are  useful for
forecasting and describing trends in behavior, but they are not necessarily mechanistically relevant.

Environmental data:  Information collected directly from measurements, produced from models, and
compiled from other sources such as databases and literature (EPA 2002a).

Evaluation and review: One of EPA's five assessment factors (see definition) that describes the extent
of independent verification, validation, and peer review of the information or of the procedures, measures,
methods, or models (EPA 2003b).

Expert elicitation: A systematic process for quantifying, typically in probabilistic terms, expert judgments
about uncertain quantities.  Expert elicitation can be used to characterize uncertainty and fill data  gaps
where traditional scientific research is not feasible or data are not yet available. Typically, the necessary
quantities  are obtained through structured interviews and/or questionnaires.  Procedural  steps can be
used to minimize the effects of heuristics and bias in expert judgments.

Extrapolation:  Extrapolation  is a  process that uses assumptions about fundamental causes underlying
the observed phenomena in order to project beyond the range of the data.  In general, extrapolation is not
considered a reliable process for prediction; however, there are situations where it may be necessary and
useful.

False negative:  Also known as a false acceptance decision error, a false negative occurs when the null
hypothesis or baseline condition cannot be rejected based on the available sample data. The decision is
made assuming the baseline condition is true when in reality it is false (EPA 2000a).

False positive:  Also known as  a false rejection decision  error, a false positive occurs when the null
hypothesis or baseline condition is incorrectly rejected based on the sample data. The decision is made
assuming the alternate condition or hypothesis to be true when in reality it is  false (EPA 2000a).

Forcing/driving  variable:  An external or exogenous (from  outside the model framework) factor that
influences the state variables calculated within the model.  Such variables include, for example, climatic
or environmental conditions (temperature, wind flow, oceanic circulation, etc.).

Forms (models): Models can be represented and solved in different forms,  including analytic, stochastic,
and simulation.

Function:  A mathematical relationship between variables.

Graded approach: The process of basing the level of application of managerial controls to an item or
work on the intended use of results and degree of confidence needed in the results (EPA 2002b).

Integrity:   One of three  main components of quality in EPA's Information Quality Guidelines.  "Integrity"
refers to the  protection  of information  from  unauthorized  access or  revision to ensure that it  is not
compromised through corruption or falsification (EPA 2002a).

Intrinsic variation: The variability (see definition) or inherent randomness in the real-world processes.

Loading:  The rate of release of a constituent of interest to a particular receiving medium.

Measurement error:  An error in the  observed data caused  by human  or instrumental error during
collection.  Such errors  can be independent or random.  When  a persistent bias or  mis-calibration is
present in the measurement device, measurement errors may be correlated among observations (Small
and Fishbeck 1999). In some disciplines, measurement error may be referred to as observation error.

Mechanistic  model:  A model  whose structure explicitly represents  an understanding of  physical,
chemical,  and/or biological  processes.   Mechanistic models  quantitatively  describe the relationship
between some phenomenon and underlying first principles of cause.  Hence, in theory, they are useful for
inferring solutions outside the domain in which the initial data were collected and used to parameterize
the mechanisms.

Mode (of a model): The manner in which a model operates.  Models can  be designed to represent
phenomena in different modes. Prognostic (or predictive) models are designed to forecast outcomes and
future events, while diagnostic models work "backwards" to assess causes and precursor conditions.
Model:  A simplification of reality that is constructed to gain insights into select attributes of a physical,
biological, economic, or social system.   A formal representation of the behavior of system processes,
often in mathematical or statistical terms. The basis can also be physical or conceptual (NRC 2007).

Model coding:   The  process of translating the mathematical  equations  that constitute the model
framework into a functioning computer program.

Model evaluation:  The process  used to generate  information to determine whether a model and its
results are of a quality sufficient to serve as the basis for a regulatory decision.

Model framework:  The system of governing equations,  parameterization, and data structures that make
up the mathematical model. The model  framework is a formal mathematical specification of the concepts
and procedures of the conceptual  model consisting  of generalized algorithms (computer code/software)
for different site- or problem-specific simulations (EPA 1994b).

Model framework uncertainty:  The uncertainty in the underlying science and algorithms of a model.
Model framework uncertainty  is the  result of incomplete scientific data or lack of knowledge about the
factors that control the behavior of the system being modeled. Model framework uncertainty can also be
the result of simplifications necessary to  translate the conceptual model  into mathematical terms.

Module:  An independent or  self-contained component of a  model, which is used in combination with
other components and forms part of one or more larger programs.

Noise: Inherent variability that the model does not characterize (see definition for variability).

Objectivity:  One of three main  components of quality  in  EPA's Information Quality  Guidelines.   It
includes  whether disseminated information  is being presented  in an accurate, clear, complete and
unbiased manner. In addition,  objectivity involves a focus on ascertaining accurate, reliable, and unbiased
information (EPA 2002a).

Object-oriented platform: A type of user interface that models systems using a collection of cooperating
"objects." These objects are treated as instances of a class within a class hierarchy.

Parameters: Terms in the model  that are fixed during a model run or  simulation  but can be changed in
different runs as a method for conducting sensitivity analysis or to achieve calibration goals.

Parameter uncertainty:  Uncertainty (see definition) related to parameter values.

Parametric variation:  The situation in which the value of a parameter is not a constant but includes natural
variability.  Consequently, the parameter should be described as a distribution (Shelly et al. 2000).

Perfect information:  The state of information in which there is no uncertainty.  The current and
future values for all  parameters are known  with certainty.   The state of perfect information includes
knowledge about the values of parameters with natural variability.
Precision: The quality of being reproducible in amount or performance. With models and other forms of
quantitative information, "precision" refers specifically to the number of decimal places to which a number
is computed as a measure of the "preciseness" or "exactness" with which a number is computed.

Probability density function:  Mathematical, graphical, or tabular expression of the relative likelihoods
with which an unknown or variable quantity may take various values.  The sum (or integral) of all
likelihoods equals 1 for discrete (continuous) random variables (Cullen and Frey 1999).  These
distributions arise from the fundamental properties of the quantities we are attempting to represent. For
example, quantities formed from adding many uncertain parameters tend to be normally distributed, and
quantities formed from multiplying uncertain quantities tend to be lognormal (Morgan and Henrion 1990).
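
A small sampling sketch (in Python) can illustrate these tendencies; the uniform inputs and sample size
below are arbitrary and chosen only for demonstration:

    import numpy as np

    # Sums of many uncertain quantities tend toward a normal distribution,
    # while products tend toward a lognormal distribution (i.e., their logs
    # are roughly normal). Inputs are arbitrary uniform variables.

    rng = np.random.default_rng(1)
    samples = rng.uniform(0.5, 1.5, size=(10_000, 12))

    sums = samples.sum(axis=1)       # approximately normal
    products = samples.prod(axis=1)  # approximately lognormal

    print(f"sums:          mean = {sums.mean():.2f}, std = {sums.std():.2f}")
    print(f"log(products): mean = {np.log(products).mean():.2f}, std = {np.log(products).std():.2f}")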

Program  (computer):  A set of instructions, written in the syntax of a computer language, that provide
the computer with a step-by-step logical process.  Computer programs are also referred to as code.

Qualitative assessment:  Some of the uncertainty  in model predictions may arise from sources whose
uncertainty cannot be quantified.  Examples are uncertainties about the theory underlying the model, the
manner in which that  theory is mathematically expressed to represent the environmental components,
and  the theory being  modeled.  The  subjective evaluations of experts may be needed  to determine
appropriate values for model parameters and inputs that cannot be directly observed or measured (e.g.,
air emissions estimates). Qualitative corroboration activities may involve the elicitation of expert judgment
on the true behavior of the system and agreement with model-forecasted behavior.

Quality:  A broad term that includes notions of integrity, utility, and objectivity (EPA 2002a).

Quantitative assessment:  The uncertainty in some  sources — such as some model parameters and
some input data — can be estimated  through quantitative assessments involving statistical uncertainty
and sensitivity analyses. In addition, comparisons can be made for the special purpose of quantitatively
describing the differences to be expected between model estimates of current  conditions and comparable
field  observations.

Reducible uncertainty:  Uncertainty in  models that can be minimized or even eliminated with further
study and additional data (EPA  1997). See "data uncertainty."


Reliability: The confidence that (potential) users have in a model and in the information derived from the
model such that they are willing  to use the  model and the  derived information  (Sargent 2000).
Specifically, reliability  is a function of the performance record  of a model and its conformance to best
available, practicable science.

Response surface: A theoretical multi-dimensional "surface" that describes the response  of a model to
changes  in its parameter values. A response surface is also known as a sensitivity surface.
Robustness:  The capacity of a model to perform well across the full range of environmental conditions
for which it was designed.

Screening model:   A  type  of  model designed to  provide a  "conservative"  or  risk-averse  answer.
Screening models can be used with limited information and are conservative, and in some cases they can
be used in lieu of refined models, even when time or resources are not limited.

Sensitivity:    The degree  to which the model  outputs are  affected  by changes  in selected  input
parameters (Beck et al. 1994).

Sensitivity analysis:  The computation of the effect of changes in input values or assumptions (including
boundaries and model functional form) on the outputs (Morgan and Henrion 1990); the  study  of how
uncertainty in a model output can be systematically  apportioned to different sources of uncertainty in the
model input (Saltelli et al. 2000a). By investigating  the "relative sensitivity" of model parameters, a user
can become knowledgeable of the relative importance of parameters in the model.
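
A minimal one-at-a-time sketch (in Python) of relative sensitivity, assuming a toy box-model of
concentration and illustrative nominal inputs (none of which come from this guidance):

    # Perturb each input by +10% in turn and report the relative change in the
    # output; inputs with larger relative effects are the more sensitive ones.

    def model(emission_rate, wind_speed, mixing_height):
        """Toy box-model concentration: emissions diluted by ventilation (assumed)."""
        return emission_rate / (wind_speed * mixing_height)

    nominal = {"emission_rate": 100.0, "wind_speed": 3.0, "mixing_height": 800.0}
    base = model(**nominal)

    for name in nominal:
        perturbed = dict(nominal, **{name: nominal[name] * 1.10})
        rel_change = (model(**perturbed) - base) / base
        print(f"{name:>14}: {rel_change:+.1%} output change for a +10% input change")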

Simulation model: A model that represents the development of a solution by incremental  steps through
the model domain. Simulations are often used to obtain solutions for models that are too complex to be
solved analytically.  For most situations, where a differential equation is being approximated, the
simulation model will use a finite time step (or spatial step) to "simulate" changes in state variables over
time (or space).
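
A minimal sketch (in Python) of such time-stepping, assuming an illustrative constant loading, first-order
loss rate, and step size:

    # Explicit finite-time-step (Euler) simulation of dC/dt = L - k*C.

    k = 0.2        # first-order loss rate, 1/day (assumed)
    loading = 5.0  # constant loading, mg/L per day (assumed)
    dt = 0.1       # time step, days
    c = 0.0        # initial concentration, mg/L

    for _ in range(int(30 / dt)):        # simulate 30 days
        c = c + dt * (loading - k * c)   # advance the state variable one step

    print(f"Concentration after 30 days: {c:.2f} mg/L (analytical steady state = {loading / k:.2f})")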

Soundness:  One of EPA's five assessment factors  (see definition) that describes the extent to which the
scientific and technical procedures, measures, methods, or models employed to generate the information
are reasonable for and consistent with the intended application (EPA 2003b).

Specifications: Acceptance criteria set at the onset of a quality assurance plan  that help to determine if
the intended objectives  of the project have been met.  Specifications are evaluated using a series of
associated checks (see definition).

State variables:  The dependent variables calculated within a model, which are often also the
performance indicators of the model and which change over the course of the simulation.

Statistical model: A model built using observations within a probabilistic framework. Statistical  models
include  simple linear or multivariate  regression models obtained  by  fitting observational  data to  a
mathematical function.

Steady-state model: A model providing the long-term or time-averaged behavior of the state variables.

Stochasticity:  Fluctuations  in  ecological processes that are due to  natural  variability and inherent
randomness.

Stochastic model:  A model that includes variability (see definition) in model parameters. This variability
is a  function of changing environmental conditions, spatial and temporal aggregation within  the model
framework, and random variability. The solution obtained by the model or output is therefore a function of
model components and random variability.
Transparency:  The clarity and  completeness with which data, assumptions, and methods of analysis
are documented.  Experimental  replication is possible when information about  modeling  processes is
properly and adequately communicated (EPA 2002a).

Uncertainty: The term  used in this document to describe lack of knowledge about models, parameters,
constants, data,  and beliefs.  There are many sources of uncertainty, including the science underlying a
model, uncertainty in  model  parameters and  input data, observation error,  and code  uncertainty.
Additional study  and collecting  more information  allows  error that  stems from  uncertainty  to be
minimized/reduced (or eliminated).  In contrast, variability (see definition) is  irreducible but can be better
characterized or  represented with further study (EPA 2002b, Shelly et al. 2000).

Uncertainty analysis:   Investigation of the effects of lack of knowledge  or potential errors on the model
(e.g., the "uncertainty" associated with parameter values).  When combined with sensitivity analysis (see
definition), uncertainty analysis allows a model user to be more informed about the confidence that can
be placed in model results.
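
A minimal Monte Carlo sketch (in Python) of propagating assumed input and parameter distributions
through a toy model; the distributions and model form are illustrative assumptions, not EPA defaults:

    import numpy as np

    # Sample uncertain inputs/parameters, run the model for each sample, and
    # summarize the resulting spread in the output.

    rng = np.random.default_rng(0)
    n = 10_000

    emission_rate = rng.normal(100.0, 15.0, n)         # uncertain input (assumed)
    decay_rate = rng.lognormal(np.log(0.2), 0.3, n)    # uncertain parameter (assumed)

    concentration = emission_rate / decay_rate          # toy steady-state model

    low, median, high = np.percentile(concentration, [5, 50, 95])
    print(f"Output 5th-95th percentile range: {low:.0f} to {high:.0f} (median {median:.0f})")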

Uncertainty and variability:  One of EPA's five assessment factors (see definition) that describes the
extent to which the variability and  uncertainty (quantitative and  qualitative) in the information or in the
procedures,  measures, methods,  or models are evaluated and characterized  (EPA 2003b).

Utility: One of three main components of quality in EPA's Information Quality Guidelines. "Utility" refers
to the usefulness of the information to the intended users (EPA 2002a).

Variable: A measured or estimated quantity that describes an object or can be observed in a system and
that is subject to  change.

Variability:  Observed differences attributable to true heterogeneity or diversity. Variability is the result of
natural random processes and is usually not reducible by further measurement or study (although it can
be better characterized)  (EPA 1997).

Verification (code):  Examination  of the algorithms and numerical technique in  the computer code to
ascertain that they truly represent the  conceptual  model and that there are  no inherent numerical
problems with obtaining a solution (Beck et al. 1994).
Appendix B: Categories  of Environmental Regulatory Models
This section is taken from Appendix C of the NRC report Models in Environmental Regulatory Decision
Making.

Models can be categorized according to their fit  into a continuum of processes that translate  human
activities and natural systems interactions into human health and environmental impacts. The categories
of models that are integral to environmental regulation include human activity models, natural systems
models,  emissions  models,  fate  and  transport  models,  exposure   models,  human   health  and
environmental response models, economic impact models,  and noneconomic impact models. Examples
of models in each of these categories are  discussed below.

HUMAN ACTIVITY MODELS
Anthropogenic emissions to the environment are  inherently linked to human activities. Activity models
simulate the  human  activities  and behaviors that result in pollutants. In the environmental regulatory
modeling arena, examples of modeled activities are the following:
•   Demographic information, such as the magnitude, distribution, and dynamics of human populations,
    ranging from national growth projections to local travel activity patterns on the order of hours.
•   Economic activity, such as the  macroeconomic estimates of national  economic production and
    income, final demands for aggregate  industrial sectors, prices, international trade, interest rates, and
    financial flows.
•   Human  consumption  of resources,  such  as  gasoline or feed,  may be translated into pollutant
    releases, such as nitrogen  oxides or nutrients. Human food consumption is also used to estimate
    exposure to  pollutants  such as  pesticides. Resource consumption in dollar terms may be  used to
    assess economic impacts.
•   Distribution and characteristics of land use are used to assess habitat, impacts on the hydrogeologic
    cycle and runoff, and biogenic pollutant releases.
Model: TRANSCAD, TRANSPLAN, MINUTP
Type: Travel demand forecasting models
Use: Develops estimation of motor vehicle miles traveled for use in estimating vehicle emissions. Can be
combined with geographic information systems (GIS) for providing spatial and temporal distribution of
motor vehicle activity.
Additional Information: http://www.caliper.com/tcvou.ritm

Model: DRI
Type: Forecasts national economic indicators
Use: Model can forecast over 1,200 economic concepts including aggregate supply, demand, prices,
incomes, international trade, interest rates, etc. The eight sectors of the model are: domestic spending,
domestic income, tax sector, prices, financial, international trade, expectations, and aggregate supply.
Additional Information: EIA 1993

Model: E-GAS
Type: National and regional economic activity model
Use: Emissions growth factors for various sectors for estimating volatile organic compounds, nitrogen
oxides, and carbon monoxide emissions.
Additional Information: Young et al. 1994

Model: YIELD
Type: Crop-growth yield model
Use: Predicts temporal and spatial crop yield.
Additional Information: Hayes et al. 1982
NATURAL SYSTEMS PROCESS AND EMISSIONS MODELS
Natural systems process  and  emissions models simulate the dynamics of ecosystems that directly or
indirectly give rise to fluxes of nutrients and other environmental emissions.
Model: Marine Biological Laboratory General Ecosystem Model (MBL-GEM)
Type: Plot-scale nutrient cycling of carbon and nitrogen
Use: Simulates plot-level photosynthesis and nitrogen uptake by plants, allocation of carbon and nitrogen
to foliage, stems, and fine roots, respiration in these tissues, turnover of biomass through litter fall, and
decomposition of litter and soil organic matter.
Additional Information: http://ecosystems.mbl.edu/Research/Models/gem/welcome.html

Model: BEIS
Type: Natural emissions of volatile organic compounds
Use: Simulates nitric oxide emissions from soils and volatile organic compound emissions from
vegetation. Input to grid models for NAAQS attainment (CAA).
Additional Information: http://www.epa.gov/asmdnerl/biogen.html

Model: Natural Emissions Model
Type: Natural emissions of methane and nitrous oxide
Use: Models methane and nitrous oxide emissions from the terrestrial biosphere to the atmosphere.
Additional Information: http://web.mit.edu/globalchange/www/tem.html#nem
EMISSIONS MODELS
These  models  estimate the rate or the amount of pollutant  emissions  to  water bodies and the
atmosphere. The outputs of emission models are used to generate inventories of pollutant releases that
can then serve as an input to fate and transport models.
Model: PLOAD
Type: Releases to water bodies
Use: GIS bulk loading model providing annual pollutant loads to waterbodies. Conducts simplified
analyses of sediment issues, including a bank erosion hazard index.
Additional Information: http://www.epa.gov/ost/basins

Model: SPARROW
Type: Releases to water bodies
Use: Relates nutrient sources and watershed characteristics to total nitrogen. Predicts contaminant flux,
concentration, and yield in streams. Provides empirical estimates (including uncertainties) of the fate of
contaminants in streams.
Additional Information: http://water.usgs.gov/nawqa/sparrow

Model: MOBILE, MOVES, NONROAD
Type: Releases to air
Use: Factors and activities for anthropogenic emissions from mobile sources. Estimates current and
future emissions (hydrocarbons, carbon monoxide, nitrogen oxides, particulate matter, hazardous air
pollutants, and carbon dioxide) from highway motor vehicles. Model used to evaluate mobile source
control strategies, control strategies for state implementation plans, and for developing environmental
impact statements, in addition to other research.
Additional Information: http://www.epa.gov/otaq/m6.htm; http://www.epa.gov/otaq/nonrdmdl.htm;
EPA 2004, EPA 2005a, Glover and Cumberworth 2003
FATE AND TRANSPORT MODELS
Fate and transport models calculate the movement of pollutants in the environment. A large number of
EPA models fall into this category.  They are further categorized into the transport media they represent:
subsurface, air, and  surface water. In each medium, there are a range of models with respect to their
complexity, where the level of complexity is a function of the following:
•   The number of physical and chemical processes considered.
•   The mathematical representation of those processes and their numerical solution.
•   The spatial and temporal scales over which the processes are modeled.

Even though some fate and transport models can be statistical models, the majority are mechanistic (also
referred to as process-based models). Such models simulate individual components in the system and
the mathematical relationships among the components. Fate and transport model output has traditionally
been  deterministic,  although recent focus on uncertainty and variability has led to  some  probabilistic
models.
Subsurface Models
Subsurface transport is governed by the heterogeneous nature of the ground, the degree of saturation of
the subsurface, as well as the chemical and physical properties of the pollutants of interest. Such models
are used to assess the extent of toxic substance spills. They can also assess the fate of contaminants in
sediments. The array of subsurface models is tailored to particular application  objectives, for example,
assessing the fate of contaminants leaking from underground gasoline storage tanks or leaching from
landfills.  Models  are  used  extensively for  site-specific  risk assessments; for example, to determine
pollutant concentrations in  drinking-water sources. The majority of models simulate liquid pollutants;
however, some simulate gas transport in the subsurface.
Model: MODFLOW
Type: 3D finite difference for ground water transport
Use: Risk assessments (RBCA), Superfund remediation (CERCLA). Modular three-dimensional model
that simulates ground water flow. Model can be used to support groundwater management activities.
Additional Information: http://water.usgs.gov/nrp/gwsoftware/modflow2000/modflow2000.html;
Prudic et al. 2004, Wilson and Naff 2004

Model: PRZM
Type: Hydrogeological
Use: Pesticide leaching into the soil and root zone of plants (FIFRA). Estimates pesticide and nitrogen
fate in the crop root zone and can simulate soil temperature, volatilization and vapor phase transport in
soil, irrigation, and microbial transformation.
Additional Information: http://www.epa.gov/ceampubl/products.htm; EPA 2005b

Model: BIOPLUME
Type: Two-dimensional finite difference and Method of Characteristics (MOC) model
Use: Simulates organic contaminants in groundwater due to natural processes of dispersion, advection,
sorption, and biodegradation. Simulates aerobic and anaerobic biodegradation reactions.
Additional Information: http://www.epa.gov/ada/csmos/models.html; EPA 1998
Surface Water Quality Models
Surface water quality models are often related to, or are variations of, hydrological models. The latter are
designed to predict flows in water bodies and runoff from precipitation, both of which govern the transport
of aqueous  contaminants. Of particular  interest in  some water  quality models is  the mixing  of
contaminants as a function of time and space, for example, following a point-source discharge into a river.
Other features  of  these  models  are  the  biological,  chemical,  and physical  removal mechanisms  of
contaminants, such  as degradation,  oxidation, and  deposition,  as  well  as the distribution of the
contaminants between the aqueous phase and organisms.
Model: HSPF
Type: Combined watershed hydrology and water quality
Use: Total maximum daily load (TMDL) determinations (CWA). Watershed model simulating nonpoint
pollutant load and runoff, fate and transport processes in streams.
Additional Information: http://www.epa.gov/ceampubl/swater/hspf/

Model: WASP
Type: Compartment modeling for aquatic systems
Use: Supports management decisions by predicting water quality responses to pollutants in aquatic
systems. Multicompartment model that examines both the water column and underlying benthos.
Additional Information: http://www.epa.gov/athens/wwqtsc/html/wasp.html; Brown 1986, Brown and
Barnwell 1987

Model: QUAL2E
Type: Steady-state and quasi-dynamic water quality model
Use: Stream water quality model used as a planning tool for developing TMDLs. The model can simulate
nutrient cycles, benthic and carbonaceous demand, and algal production, among other parameters.
Additional Information: http://www3.bae.ncsu.edu/Regional-Bulletins/Modeling-Bulletin/qual2e.html;
Brown 1986, Brown and Barnwell 1987
Air Quality Models
The  fate of gaseous and solid  particle pollutants in the atmosphere  is a  function  of meteorology,
temperature, relative humidity, other pollutants, and sunlight intensity, among  other things. Models that
simulate  concentrations in air have one  of three general designs:  plume models, grid models, and
receptor models. Plume models are used widely for permitting under requirements to assess the impacts
of large new or modified emissions sources on air quality or to assess air toxics (HAPs)  concentrations
close to  sources. Plume models focus on  atmosphere dynamics. Grid models  are used primarily to
assess concentrations of secondary criteria pollutants (e.g., ozone) in  regional airsheds to develop plans
(SIPs) and rules with the objective of attaining ambient air quality standards (NAAQS). Both atmospheric
dynamics and chemistry are important components of 3-D grid models. In contrast to mechanistic plume
and  grid  models,  receptor models are statistical; they determine  the statistical contribution of various
sources to  pollutant concentrations at  a given location based on  the relative amounts of pollutants at
source and receptor. Most air quality models are deterministic.
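
To make the distinction among model designs concrete, the sketch below evaluates the general steady-state Gaussian plume equation that mechanistic plume models are built around. It is a simplified illustration only: the dispersion coefficients are placeholder values supplied by the caller, not the stability-dependent curves used by any EPA regulatory model.

    import math

    def gaussian_plume(q_g_s, u_m_s, sigma_y, sigma_z, y_m, z_m, h_m):
        """Steady-state Gaussian plume concentration (g/m^3) with ground reflection.
        q_g_s: emission rate; u_m_s: wind speed; sigma_y, sigma_z: dispersion
        coefficients (m) at the receptor's downwind distance; y_m: crosswind offset;
        z_m: receptor height; h_m: effective stack height."""
        lateral = math.exp(-y_m ** 2 / (2.0 * sigma_y ** 2))
        vertical = (math.exp(-(z_m - h_m) ** 2 / (2.0 * sigma_z ** 2)) +
                    math.exp(-(z_m + h_m) ** 2 / (2.0 * sigma_z ** 2)))
        return q_g_s / (2.0 * math.pi * u_m_s * sigma_y * sigma_z) * lateral * vertical

    # Ground-level, centerline example with assumed sigma_y = 80 m and sigma_z = 40 m
    print(gaussian_plume(q_g_s=100.0, u_m_s=4.0, sigma_y=80.0, sigma_z=40.0,
                         y_m=0.0, z_m=0.0, h_m=50.0))
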
Model
CMAQ
UAM
REMSAD
ICSC
CALPUFF
CMB
Type
3-D Grid
3-D Grid
3-D Grid
Plume
Receptor
Use
SIP development, NAAQS setting (CAA). The model
provides estimates of ozone, particulates, toxics,
and acid deposition and simulates chemical and
physical properties related to atmospheric trace gas
transformations and distributions. Model has three
components: a meteorological system, an
emissions model for estimating anthropogenic and
natural emissions, and a chemistry-transport
modeling system.
Model calculates concentrations of inert and
chemically reactive pollutants and is used to
evaluate air quality, particularly related to ambient
ozone concentrations.
Using simulation of physical and chemical processes
in the atmosphere that impact pollutant
concentrations, model calculates concentration of
inert and chemically reactive pollutants.
PSD permitting; toxics exposure (CAA, TSCA).
Non-steady-state air quality dispersion model that
simulates long range transport of pollutants.
Relative contributions of sources. Receptor model
used for air resource management purposes.
Additional Information
http://www.epa.gov/
asmdnerl/CMAQ/
index.html
Byun and Ching 1999
Systems Applications
International, Inc., 1999
http://www.remsad.com
ICF Consulting 2005

http://www.epa.gov/scram001/receptor_cmb.htm
Coulter 2004
EXPOSURE MODELS
The primary objective of exposure models is to estimate the dose of a pollutant to which humans or animals are exposed via inhalation, ingestion, and/or dermal uptake. These models bridge the gap between
concentrations of pollutants in  the environment and the doses humans receive based on their activity.
Pharmacokinetic models take this one step further and estimate dose to tissues in the body.  Since
exposure is inherently tied to behavior, exposure models may also simulate activity, for example a model
that estimates dietary consumption of pollutants. In addition to the Lifeline model described below, other
examples of models that estimate dietary exposure to pesticides include  Calendex and CARES.  These
models can be either deterministic or probabilistic, but are well-suited for probabilistic methods due to the
variability of activity within a population.
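
Because these models lend themselves to probabilistic methods, a Monte Carlo treatment of a simple dietary dose equation is sketched below. Every distribution in the example is an assumption made for illustration; none is taken from Lifeline, Calendex, CARES, or any other model named here.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10000  # simulated individuals

    # Assumed, illustrative input distributions
    residue_mg_per_kg = rng.lognormal(mean=np.log(0.05), sigma=0.6, size=n)  # residue in food
    intake_kg_per_day = rng.lognormal(mean=np.log(0.3), sigma=0.4, size=n)   # daily consumption
    body_weight_kg = np.clip(rng.normal(loc=70.0, scale=12.0, size=n), 30.0, None)

    dose_mg_per_kg_day = residue_mg_per_kg * intake_kg_per_day / body_weight_kg
    print("median dose:", np.median(dose_mg_per_kg_day))
    print("95th percentile dose:", np.percentile(dose_mg_per_kg_day, 95))
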
Model: Lifeline
  Type: Dietary, water, and dermal exposure to a single chemical
  Use: Aggregate dose of pesticide via multiple pathways.
  Additional Information: http://www.thelifelinegroup.org; Lifeline Group, Inc. 2006

Model: IEUBK
  Type: Multipathway, single chemical
  Use: Dose of lead to children's blood via multiple pathways. Estimates exposure from lead in media (air, water, soil, dust, diet, paint, and other sources) using pharmacokinetic models to predict blood lead levels in children 6 months to 7 years old. The model can be used as a tool for the determination of site-specific cleanup levels.
  Additional Information: http://www.epa.gov/superfund/programs/lead/products.htm; EPA 1994

Model: Air Pollutants Exposure Model (APEX)
  Type: Inhalation exposure model
  Use: Simulates an individual's exposure to an air pollutant and their movement through space and time in indoor or outdoor environments. Provides dose estimates and summary exposure information for each individual.
  Additional Information: http://www.epa.gov/ttn/fera/human_apex.html; Richmond et al. 2001
HUMAN HEALTH AND ENVIRONMENT RESPONSE MODELS
Human Health Effects Models
Health effects models provide a statistical relationship between a dose of a chemical and an adverse
human health effect. Because these models are statistical in nature, models in this category are
almost exclusively empirical. They can be further classified as toxicological and  epidemiological. The
former refer to  models  derived from observations in  controlled experiments, usually with nonhuman
subjects. The latter refer to models derived from observations over large populations. Health models use
statistical methods and assumptions that ultimately imply cause and effect. Included in this category
are models that extrapolate information from non-human subject experiments. Also,  physiologically based
pharmacokinetic models can help predict the toxicity of contaminants to humans through mathematical modeling
of absorption, distribution, storage,  metabolism,  and excretion of toxicants.   The output from health
models  is almost  always a dose,  such as a safe level (for example, reference dose [RfD]), a cancer
potency index (CPI), or an expected health end point (for example, lethal dose for 50% of the population
(LD50) or number of asthma cases). Software applications also exist that facilitate the use of these statistical methods.
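
As an illustration of the dose-response fitting these tools support, the sketch below fits a simple Hill-type curve to hypothetical toxicological data and solves for the dose associated with a 10% extra risk over background. The data, model form, and parameter bounds are all assumptions for the example; this is not the Agency's benchmark dose software.

    import numpy as np
    from scipy.optimize import curve_fit, brentq

    # Hypothetical dose-response data: fraction of subjects responding at each dose
    dose = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
    response = np.array([0.02, 0.05, 0.10, 0.22, 0.40, 0.70])

    def hill(d, background, k, n):
        """Simple Hill-type dose-response curve (illustrative only)."""
        return background + (1.0 - background) * d ** n / (d ** n + k ** n)

    params, _ = curve_fit(hill, dose, response, p0=[0.02, 10.0, 1.5],
                          bounds=([0.0, 0.1, 0.5], [0.5, 100.0, 5.0]))

    background = params[0]
    target = background + 0.10 * (1.0 - background)  # 10% extra risk over background
    bmd10 = brentq(lambda d: hill(d, *params) - target, 1e-9, 50.0)
    print("fitted parameters:", params, "BMD10:", bmd10)
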
Model: Benchmark dose model
  Type: Software tool for applying a variety of statistical models to analyze dose-response data
  Use: To estimate risk of pollutant exposure. Models fit to dose-response data to determine a benchmark dose that is associated with a particular benchmark response.
  Additional Information: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=20167; EPA 2000

Model: Linear cancer model
  Type: Statistical analysis method
  Use: To estimate the risk posed by carcinogenic pollutants.
Ecological Effects Models
Ecological effects models, like human  health effects models, define  relationships between  a level of
pollutant exposure and a particular ecological indicator. Many ecological effects models simulate aquatic
environments, and ecological indicators are related directly to environmental concentrations.  Examples
of ecological  effects indicators that have been modeled are: algae blooms, BOD, fish populations, crop
yields, coast line erosion, lake acidity, and soil salinity.
Model: AQUATOX
  Type: Integrated fate and effects of pollutants in an aquatic environment
  Use: Ecosystem model that predicts the environmental fate of chemicals in aquatic ecosystems, as well as direct and indirect effects on the resident organisms. Potential applications to management decisions include water quality criteria and standards, TMDLs, and ecological risk assessments of aquatic systems.
  Additional Information: http://www.epa.gov/waterscience/models/aquatox/; Hawkins 2005, Rashleigh 2007

Model: BASS
  Type: Simulates fish populations exposed to pollutants (mechanistic)
  Use: Models dynamic chemical bioconcentration of organic pollutants and metals in fish. Estimates are being used for ecological risks to fish in addition to realistic dietary exposures to humans and wildlife.
  Additional Information: http://www.epa.gov/athens/research/modeling/bass.html

Model: SERAFM
  Type: Steady-state modeling system used to predict mercury concentrations in wildlife
  Use: Predicts total mercury concentrations in fish and speciated mercury concentrations in water and sediments.
  Additional Information: http://www.epa.gov/ceampubl/swater/serafm/index.htm; Knightes 2005

Model: PATCH
  Type: Movement of vertebrates in their habitat
  Use: Provides population estimates of territorial terrestrial vertebrate species over time, in addition to survival and fecundity rates, and orientation of breeding sites. Can be used to determine ecological effects of regulation.
  Additional Information: http://www.epa.gov/wed/pages/models/patch/patchmain.htm; Lawler et al. 2006
ECONOMIC IMPACT MODELS
This category includes a  broad group of models  that are used  in many different aspects of EPA's
activities including: rulemaking (regulatory  impact assessments), priority setting,  enforcement,  and
retrospective analyses. Models that produce a dollar value as output belong in this category. Models can
be divided into cost models, which may include or exclude behavior responses, and benefit models. The
former incorporate economic theory on how markets (supply, demand, and  pricing) will respond as a
result of an action.  Economic models are  traditionally deterministic,  though there is a trend toward
greater use of uncertainty methods in cost-benefit analysis.
Model: ABEL
  Type: Microeconomic
  Use: Assesses a single firm's ability to pay compliance costs or fees. Estimates claims from defendants that they cannot afford to pay for compliance, clean-up, or civil penalties using information from tax return data and cash-flow analysis. Used for settlement negotiations.
  Additional Information: http://iaspub.epa.gov/edr/edr_proc_qry.navigate?P LIST OPTION CD=CSDIS&P REG AUTH IDENTIFIER=1&P DATA IDENTIFIER=90389&P VERSIONS

Model: Nonroad Diesel Economic Impact Model (NDEIM)
  Type: Macroeconomic model for the impact of the nonroad diesel emissions standards rule
  Use: Multimarket model to analyze how producers and consumers are expected to respond to compliance costs associated with the rule. Estimates and stratifies emissions for nonroad equipment. Model can be used to inform State Implementation Plans and regulatory analyses.
  Additional Information: http://www.epa.gov/ttn/atw/nsps/cinsps/ci_nsps_eia_reportfinalforproposal.pdf

Model: BenMAP
  Type: Noneconomic and economic benefits from air quality
  Use: Model that estimates the health benefits associated with air quality changes by estimating changes in incidences of a wide range of health outcomes and then placing an economic value on these reduced incidences.
  Additional Information: http://www.epa.gov/ttnecasl/benmodels.html
NONECONOMIC IMPACT MODELS
Noneconomic impact models  evaluate the effects of contaminants on  a  variety of noneconomic
parameters, such as on crop yields and buildings. Note that other noneconomic impacts, such as impacts
on human health or ecosystems, are derived from the human health  and ecological  effects models
discussed previously.
Model: TDM (Travel Demand Management)
  Type: Model used to evaluate travel demand management strategies
  Use: Evaluates travel demand management strategies to determine vehicle-trip reduction effects. Model used to support transit policies including HOV lanes, carpooling, telecommuting, and pricing and travel subsidies.
  Additional Information: http://www.fhwa.dot.gov/environment/cmaqeat/descriptions_tdm_evaluation_model.htm

Model: CERES-Wheat
  Type: Crop-growth yield model
  Use: Simulates effects of planting density, weather, water, soil, and nitrogen on crop growth, development, and yield. Predicts management strategies that impact crop yield.
  Additional Information: http://nowlin.css.msu.edu/wheat_book/

Model: PHREEQE-A
  Type: Models effects of acidification on stone
  Use: Simulates the effects of acidic solutions on carbonate stone.
  Additional Information: Parkhurst et al. 1990
Appendix C: Supplementary Material  on Quality Assurance
Planning and Protocols	

This section consists of a series of text boxes meant to supplement concepts and references made in the
main body of the document.  They are not meant as a comprehensive discussion  on QA practices, and
each box  should be considered as a discrete unit.   Individually, the text boxes provide  additional
background material for  specific sections of the main  document.  The  complete QA  manuals for each
subject area discussed in this guidance and referred to below should be consulted for more complete
information on QA planning and protocols.

Box C1: Background on EPA Quality System
The EPA Quality System defined in EPA Order 5360.1 A2, "Policy and Program  Requirements for the
Mandatory Agency-Wide Quality System" (EPA 2000e), covers environmental data produced from models
as well as "any measurement or information that describes environmental processes, location, or conditions;
ecological  or health effects and consequences; or the performance of environmental technology."  For
EPA, environmental data  include information  collected directly from  measurements, produced from
models, and compiled from other sources such as databases and literature.

The EPA Quality System is  based on an American National Standard, ANSI  1994.   Consistent with
minimum specifications of this standard,  §6.a.(7) of EPA Order 5360.1 A2 states that EPA organizations
will develop a Quality System that includes  "approved"  Quality  Assurance (QA)  Project  Plans, or
equivalent documents defined by the Quality Management Plan, for all applicable projects and tasks
involving environmental data  with review and approval having  been made by the EPA QA Manager (or
authorized representative defined in the Quality Management Plan).  The approval of the QA Project Plan
containing the specifications for the product(s), and the checks against those specifications (assessments) during implementation, is an important management control that provides a record helping to avoid fiduciary "waste and abuse" (Federal Managers' Financial Integrity Act of 1982,⁹ with annual declarations including conformance to the EPA Quality System). The assessments (including peer review) support the product
acceptance for  models  and their outputs  and approval  for use such as  supporting environmental
management decisions by  answering questions, characterizing environmental processes  or conditions,
and direct decision support such as economic analyses (process planned in  Group D in the Guidance for
QA Project Plans for Modeling).  EPA's policies for QA Project  Plans are provided in Chapter 5 of EPA's
Manual 5360 A1 (EPA 2000e), the EPA Quality Manual for Environmental Programs (EPA 2000f) for in-
house modeling, and Requirements for Quality Assurance Project Plans (QA/G5-M)  (EPA 2002b)  for
modeling done through  extramural agreements (e.g.,  contracts 48  CFR  46, grants and cooperative
agreements 40 CFR 30,  31, and 35). QA requirements must be negotiated and written into Interagency
Agreements if the project is funded by EPA; if funds are received by EPA,  EPA Manual 5360 A1 (EPA
2000e) applies.

EPA Order 5360.1  A2 also  states that EPA organizations' Quality  Systems must include "use of a
systematic planning approach to develop acceptance or performance criteria for all work covered" and
"assessment of existing data, when used to support Agency decisions or other secondary purposes, to
verify that they are of sufficient quantity and adequate quality for their intended use."
⁹ Federal Managers' Financial Integrity Act of 1982, P.L. 97-255 (H.R. 1526), September 8, 1982.
Box C2: Configuration Tests Specified in the QA Program
During code verification, the final set of computer code is scrutinized to ensure that the equations are
programmed correctly and that sources of error, such as rounding, are minimal. This process is likely to
be more extensive for new computer code.  For existing code, the criteria used for previous verification, if
known, can be described or cited.  Any additional  criteria specific to the  modeling  project can  be
specified,  along  with  how  the  criteria  were  established.   Possible  departures from the criteria  are
discussed, along with how the departures can affect the modeling process.

Software code development  inspections: An  independent person or group other than the author(s)
examines software requirements, software  design, or code to detect faults, programming errors, violations
of development standards, or other problems. All errors found are recorded at the time of inspection, with
later verification that all errors found have been successfully corrected.

Software code performance testing: Software used to compute model predictions is tested to assess
its performance relative to specific response times, computer processing usage, run time, convergence to
solutions, stability of the  solution algorithms, absence of terminal failures, and  other quantitative aspects
of computer operation.

Testing of model modules:  Checks  ensure that the computer code for each module is  computing
outputs accurately and within any specific  time constraints.  (Modules are different segments or portions
of the model linked together to obtain the final model prediction.)

Model framework testing:  The full model framework is tested as the ultimate level of integration testing
to verify that all project-specific requirements have been implemented as intended.

Integration testing:  The  computational  and transfer  interfaces between  modules need to allow  an
accurate transfer of information from one module to the next, and ensure that uncertainties in one module
are not lost or changed when that information is transferred to the next module.  These tests detect
unanticipated interactions between modules and  track down their cause(s).  (Integration tests should  be
designed and applied hierarchically by increasing,  as testing proceeds, the number of modules tested and
the subsystem complexity.)

Regression testing:  All testing performed on the original version  of the module or linked modules is
repeated to detect new "bugs" introduced by changes made in the code to correct a model.

Stress testing (of  complex models):  This ensures that the  maximum load  (e.g.,  real-time  data
acquisition and control systems) does not  exceed limits. The stress test should attempt to simulate the
maximum input, output, and computational load expected during peak usage.  The  load can be defined
quantitatively using criteria such as the frequency of inputs and outputs or the number of computations or
disk accesses per unit of time.

Acceptance testing: Certain contractually required testing may be needed before the client accepts the new model or model application. Specific procedures and the criteria for passing the acceptance test are listed before the testing is conducted. A stress test and a thorough evaluation of the user interface are recommended parts of the acceptance test.
Beta  testing  of the pre-release hardware/software:   Persons outside the project group use the
software as they would in normal  operation and  record any anomalies they encounter  or  answer
questions provided in a testing protocol by the regulatory program.  The users report these observations
to the regulatory program or specified developers, who address them before release of the final version.

Reasonableness checks:  These checks involve items like order-of-magnitude, unit, and other checks to
ensure that the numbers are in the range of what is expected.

Note: This section is adapted from (EPA 2002b).
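
As one hedged illustration of how the regression testing described in Box C2 might be automated, the sketch below re-runs a module and compares its outputs against a stored, previously accepted baseline within a tolerance. The module, file name, and tolerance are hypothetical placeholders, not part of any EPA QA specification.

    import json
    import numpy as np

    def run_module(inputs):
        """Placeholder for the model module under test (hypothetical)."""
        return [2.0 * x for x in inputs]

    def regression_test(baseline_file, inputs, rel_tol=1e-6):
        """Re-run the module and check agreement with previously accepted outputs."""
        with open(baseline_file) as f:
            baseline = json.load(f)
        current = run_module(inputs)
        return bool(np.allclose(current, baseline, rtol=rel_tol))

    # Typical use after a code change:
    # if not regression_test("baseline_outputs.json", test_inputs):
    #     raise AssertionError("Regression detected: outputs differ from the accepted baseline")
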

Box C3: Quality Assurance Planning Suggestions for Model Calibration Activities

Information related to objectives and  acceptance criteria for calibration activities that generally appear at
the beginning of this QA Project Plan element includes the following:

Objectives of model calibration: This  includes expected accomplishments of the calibration and how
the predictive quality of the model  might be  improved  as  a result of  implementing  the  calibration
procedures.

Acceptance criteria: The specific limits,  standards, goodness-of-fit,  or other criteria on  which a model
will be judged as being properly calibrated (e.g., the percentage difference between reference data values
from the field or laboratory and predicted results from the model).  This includes a mention of the types of
data and other information that will  be  necessary to acquire  in order to  determine that the model is
properly calibrated (e.g., field data, laboratory  data, predictions from other accepted models).  In addition
to addressing these questions when establishing acceptance criteria, the QA Project Plan can document
the likely consequences (e.g., incorrect decision making) of selecting data that do not satisfy one or more
of these areas (e.g.,  are non-representative, are inaccurate), as well as procedures in place to minimize
the likelihood of selecting such data.

Justifying the calibration approach and acceptance criteria:   Each time a model is  calibrated,  it is
potentially altered. Therefore, it is important  that the different calibrations, the approaches taken (e.g.,
qualitative versus quantitative), and their acceptance  criteria are properly justified. This justification can
refer to the overall quality of the standards being used  as  a reference or to the quality of the input data
(e.g., whether data are sufficient for statistical tests to achieve desired levels of accuracy).
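
For concreteness, the sketch below shows one way an acceptance criterion of the kind described above (a maximum mean percentage difference between reference data and model predictions) might be checked. The reference values, predictions, and 15% criterion are invented for the example.

    import numpy as np

    def mean_percent_difference(predicted, observed):
        """Mean absolute percentage difference between predictions and reference data."""
        predicted = np.asarray(predicted, dtype=float)
        observed = np.asarray(observed, dtype=float)
        return float(np.mean(np.abs(predicted - observed) / np.abs(observed)) * 100.0)

    observed = [4.1, 5.0, 6.3, 7.8]   # reference (field or laboratory) values
    predicted = [4.4, 4.7, 6.9, 7.5]  # model predictions at the same points

    criterion_pct = 15.0  # acceptance criterion stated in the QA Project Plan (assumed)
    diff = mean_percent_difference(predicted, observed)
    print("mean % difference:", round(diff, 1), "properly calibrated:", diff <= criterion_pct)
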
Box C4: Definition of Quality

In this context, quality encompasses the notions of integrity, utility, and objectivity. Integrity refers to the protection of information from
unauthorized access or revision to ensure that it is not compromised through corruption or falsification. In
the  context of environmental  models, integrity  is  often  most relevant to protection of  code from
unauthorized or inappropriate manipulation (see Box 2).  Utility refers to the usefulness  of the  information
to the  intended users.  The utility of modeling projects is aided by the implementation of a  systematic
planning approach  that includes the development of acceptance or performance  criteria  (see  Box 1).
Objectivity  involves two distinct elements, presentation and substance.   Objectivity  includes whether
disseminated information is being presented in an accurate, clear, complete and unbiased manner. It also
involves a focus on  ascertaining accurate, reliable, and unbiased information.

EPA's  five general assessment factors  (EPA 2003b) for evaluating the quality and relevance  of scientific
and  technical information supporting Agency  actions are: soundness, applicability and utility, clarity and
completeness, uncertainty and variability, and evaluation and review.  Soundness refers to the extent to
which  a model is appropriate for its intended application and is a  reasonable representation of reality.
Applicability and utility  describe the extent to which the information is relevant and appropriate for the
Agency's intended  use.  Clarity and completeness  refer to documentation of the data,  assumptions,
methods,  quality  controls,  and analysis employed to generate the  model outputs.   Uncertainty and
variability highlight the extent to which limitations in knowledge and information and natural randomness
in input data and  models are evaluated and characterized. Evaluation  and review evaluate the extent of
independent application, replication, evaluation, validation, and  peer review of the  information or of the
procedures, measures, methods, or models employed to generate the information.
Appendix D:  Best Practices for Model Evaluation	

D.1    Introduction

This appendix presents a practical guide to the best practices for model evaluation (please see Section
4.1  for descriptions of these practices). These best practices are:

•   Scientific peer review (Section 4.1.1)
•   Quality assurance project planning (Section 4.1.2)
•   Corroboration (Section 4.1.3)
•   Sensitivity analysis (Section 4.1.3)
•   Uncertainty  analysis (Section 4.1.3)

 The objective of model evaluation is to determine whether a model is of sufficient quality to inform a
 regulatory decision.  For each of these best practices, this appendix provides a conceptual overview for
 model evaluation  and  introduces  a suite of "tools" that  can  be used in partial fulfillment of the best
 practice.  The  appropriate use of these tools is discussed and citations  to primary references are
 provided. Users are encouraged to obtain more complete information about tools of interest, including
 their theoretical basis, details of their computational methods,  and the availability of software.

 Figure D.1.1 provides an overview of the  steps in the  modeling  process  that are discussed in this
 guidance.  Items in bold in the figure, including peer review, model corroboration, uncertainty analysis,
 and sensitivity analysis, are discussed in this section on model evaluation.
[Figure D.1.1 (flow diagram): Beginning with the environmental system, model development proceeds from a conceptual model to a mechanistic model, code verification, and a parameterized model*; model evaluation (including corroboration** of model results against one or more datasets) links model development to model application. Peer review is an ongoing process that should be considered at all steps in the modeling process.]

Figure D.1.1. The modeling process.
* In some disciplines parameterization may include, or be referred to as, calibration.
** Qualitative and/or quantitative corroboration should be performed when necessary.
D.2    Scientific Peer Review

EPA policy states that major science-based and technical products related to Agency decisions should
normally be peer-reviewed.  Agency managers determine and are accountable for the decision whether to
employ peer review in particular instances and, if so, its character, scope, and timing.  EPA has published
guidance for program managers responsible for implementing the peer review process for models (Beck
et al. 1994). This guidance discusses peer review mechanisms, the relationship of external peer review to
the process of environmental regulatory model development and application, documentation of the peer
review process, and  specific elements of what could be covered in an external peer review  of model
development and application.

The general process for external peer review of models is as follows (Beck et al. 1994, Press 1992):

•   Step 0: The program manager within the originating office (AA-ship or Region) identifies elements of
    the regulatory process that would benefit from the  use of environmental models. A review/solicitation
    of currently available  models and related research should  be conducted.  If it is concluded that the
    development of a new model is necessary, a research/development work plan is prepared.
•   Step Ob  (optional): The program manager may consider internal and/or external peer review of the
    research/development concepts to determine whether they are of sufficient merit and whether the
    model is likely to achieve the stated purpose.
•   Step  1: The originating office develops a new or revised model or evaluates the  possible novel
    application of a model developed for a different purpose.
•   Step 1b  (optional): The program manager may consider internal and/or external peer review of the
    technical or theoretical  basis prior to final development, revision, or application at this stage.  For
    model development, this review should evaluate the stated application niche.
•   Step 2: Initial Agency-wide  (internal) peer review/consultation of model development and/or  proposed
    application  may  be  undertaken by the developing  originating office.    Model  design,  default
    parameters, etc., and/or intended application are revised  (if necessary) based on consideration of
    internal peer review comments.
•   Step 3: The originating office considers external peer review. Model design, default parameters, etc.,
    and/or intended application are revised (if necessary) based on consideration of external peer review
    comments.
•   Step 4:  Final Agency-wide evaluation/consultation may  be implemented by  the originating office.
    This step should consist of consideration of external peer review comments and documentation of the
    Agency's response to scientific/technical issues.

(Note: Steps 2 and 4 are relevant when there is  either an  internal Agency standing or an  ad  hoc peer
review committee or process).
Box D1:  Elements of External Peer Review for Environmental Regulatory Models (Box 2-4 from NRC's Models
in Environmental Regulatory Decision Making)
Model Purpose/Objectives
•   What is the regulatory context in which the model will be used and what broad scientific question is the model
    intended to answer?
•   What is the model's application niche?
•   What are the model's strengths and weaknesses?
Major Defining  and Limiting Considerations
•   Which processes are characterized by the model?
•   What are the important temporal and spatial scales?
•   What is the level of aggregation?
Theoretical Basis for the Model — formulating the basis for problem solution
•   What algorithms are used within the model and how were they derived?
•   What is the method of solution?
•   What are the shortcomings of the modeling approach?
Parameter Estimation
•   What methods and data were used for parameter estimation?
•   What methods were used to estimate parameters for which there were  no data?
•   What are the boundary conditions and are  they appropriate?
Data Quality/Quantity
Questions related to model design include:
•   What data  were utilized in the design of the model?
•   How can the adequacy of the data be defined taking into account the regulatory objectives of the model?
Questions related to model application include:
•   To what extent are these data available and what are the key data gaps?
•   Do additional data need to be collected and for what purpose?
Key Assumptions
•   What are the key assumptions?
•   What is the basis for each key assumption  and what is the range of possible alternatives?
•   How sensitive is the model toward modifying  key assumptions?
Model Performance Measures
•   What criteria have been used to assess model performance?
•   Did the data bases used in the  performance evaluation provide an adequate test of the model?
•   How does  the model perform relative to other models in this application niche?
Model Documentation and Users Guide
•   Does the documentation cover model applicability and limitations, data input, and interpretation of results?
Retrospective
•   Does the model satisfy its intended scientific  and regulatory objectives?
•   How robust are the model predictions?
•   How well does the model output quantify the  overall uncertainty?

Source: EPA 1994b.
D.3    Quality Assurance Project Planning
Box D2: Quality Assurance Planning and Data Acceptance Criteria
The QA Project Plan needs to address four issues  regarding information  on how non-direct
measurements are acquired and used  on the project (EPA 2002d):

•   The need and intended use of each type of data or information to be acquired.
•   How the data will be identified or acquired, and expected sources of these data.
•   The method of determining the underlying quality of the data.
•   The criteria established for  determining whether the level of quality for a given set of data is
    acceptable for use on the project.

Acceptance criteria for individual data values generally address issues such as the following:

Representativeness:  Were the  data  collected  from a  population sufficiently similar to the
population  of interest and the model-specified population boundaries?  Were the sampling and
analytical  methods  used  to  generate  the  collected data  acceptable to this  project?  How will
potentially  confounding  effects in the data  (e.g., season,  time of day, location,  and scale
incompatibilities) be addressed so that these effects do not unduly impact the model output?

Bias: Would any characteristics of the dataset directly impact the model output (e.g., unduly high
or low process rates)? For  example, has bias in analysis results been documented?  Is there
sufficient information  to  estimate and  correct bias?   If using data to develop probabilistic
distributions, are there adequate data in the upper and lower extremes of the tails to allow  for
unbiased probabilistic estimates?

Precision: How is the spread in the results estimated?  Is the estimate of variability sufficiently
small to meet the uncertainty objectives of the modeling project as stated in Element A7 (Quality
Objectives  and Criteria for  Model Inputs/Outputs) (e.g., adequate to provide a frequency of
distribution)?

Qualifiers:  Have the data  been evaluated in a manner that permits logical  decisions on the
data's applicability to the current project? Is the system of qualifying or flagging data adequately
documented to allow data from different sources to be used on the same project (e.g., distinguish
actual measurements from estimated values, note differences in detection limits)?

Summarization:  Is the  data summarization process clear and sufficiently consistent with the
goals of this project (e.g., distinguish averages or statistically transformed values from unaltered
measurement values)? Ideally, processing and transformation equations will be made available
so  that their underlying assumptions can  be evaluated  against the  objectives of the  current
project.	
D.4    Corroboration

In this guidance, "corroboration" is defined as all quantitative and qualitative methods for evaluating the
degree to which a  model corresponds to reality.  In practical terms, it is the  process of "confronting
models with data" (Hilborn and Mangel 1997).  In some disciplines, this process  has been referred to as
validation.  In general, the term "corroboration" is preferred because it implies a  claim of usefulness and
not truth.

Corroboration is used to understand how consistent the model is with data.  However, uncertainty and
variability affect how accurately both models and data represent reality because both  models and data
(observations) are approximations of some system. Thus, to conduct corroboration meaningfully (i.e., as
a tool to  assess how well a model represents the system being modeled), this process should  begin by
characterizing the uncertainty and variability in the corroboration data. As discussed in Section 4.1.3.1,
variability  stems from the natural randomness  or stochasticity of natural systems and can  be better
captured or characterized in a model but not reduced.  In contrast,  uncertainty can  be minimized  with
improvements in model structure (framework), improved  measurement and analytical  techniques,  and
more comprehensive data for the system being studied.  Hence, even a "perfect" model (that contains no
measurement error  and predicts the  correct ensemble average)  may deviate  from observed  field
measurements at a given time.

Depending on the type (qualitative and/or quantitative) and availability of data, corroboration can involve
hypothesis testing and/or estimates of the likelihood of different model  outcomes.

D.4.1   Qualitative Corroboration
Qualitative model corroboration involves expert judgment and tests  of intuitive  behavior.  This type  of
corroboration uses "knowledge"  of the behavior of the system  in question, but  is not formalized  or
statistics-based.  Expert knowledge can establish model reliability through consensus and consistency.
For example, an expert panel consisting  of model developers and stakeholders could  be convened  to
determine whether there is  agreement that the methods and outputs of a model are consistent  with
processes, standards,  and results used in  other models.  Expert judgment can also  establish model
credibility  by determining  if model-predicted behavior of a  system  agrees  with  best-available
understanding of internal processes and functions.

D.4.2   Quantitative Methods
When data are available, model corroboration may involve comparing model predictions to independent
empirical observations  to investigate  how well a model's  description of the world fits the observational
data. This  involves  using  both  statistical measures for  goodness of fit and numerical procedures  to
facilitate these calculations. This can be done graphically or by calculating various statistical measures of
fit of a model's results to data.

Recall that  a model's  application niche is  the  set  of conditions under which  the use of a model  is
scientifically defensible (Section 5.2.3);  it is the domain  of a model's intended applicability. If the model
being evaluated purports to estimate an  average value across the entire system, then one method to  deal
with  corroboration data is to stratify model  results  and observed data into  "regimes,"  subsets of  data
within which system processes  operate similarly. Corroboration  is then performed  by comparing the
average of model estimates and observed data within each regime (ASTM 2000).

D.4.2.1  Graphical Methods
Graphical  methods  can  be used  to  compare the distribution of  model outputs  to  independent
observations. The degree to which these two distributions overlap, and their respective shapes, provide
an indication of model performance with respect to the data.  Alternately, the  differences  between
observed  and predicted data pairs can be plotted and  the  resulting probability density function (PDF)
used to indicate precision and bias. Graphical methods for model corroboration can be used to indicate
bias, skewness, and kurtosis of model results.  Skewness indicates the relative precision  of model results,
while bias is a reflection of accuracy. Kurtosis refers to the amplitude of the PDF.
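
The sketch below illustrates these graphical and moment-based checks on synthetic data: paired differences between model output and observations are examined for bias, spread, skewness, and kurtosis, and their histogram approximates the PDF discussed above. The data are fabricated solely for the example.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    observed = rng.normal(10.0, 2.0, size=200)             # synthetic observations
    predicted = observed + rng.normal(0.5, 1.0, size=200)  # synthetic model output (+0.5 bias)

    residuals = predicted - observed
    print("bias (mean residual):", residuals.mean())
    print("spread (std of residuals):", residuals.std(ddof=1))
    print("skewness:", stats.skew(residuals))
    print("kurtosis:", stats.kurtosis(residuals))

    # A density histogram of the residuals approximates the PDF, e.g. with matplotlib:
    # import matplotlib.pyplot as plt; plt.hist(residuals, bins=30, density=True); plt.show()
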

D.4.2.2  Deviance Measures
Methods for calculating model bias:
Mean error calculates the average deviation between models and data (e = model − data) by dividing the
sum of errors (Σe) by total number of data points compared (m):

                         Mean Error = (Σe) / m     (in original measurement units)

Similarly, mean % error provides a unit-less measure of model bias:

                         Mean Error (%) = [Σ(e/s) / m] × 100,

where s is the sample or observational data in original units.

Methods for calculating bias and precision:
Mean square error (MSE):

                         MSE = (Σe²) / m

(Large deviations in any single data pair (model − data) can dominate this metric.)

Mean absolute error:

                         Mean Absolute Error = (Σ|e|) / m
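
A minimal sketch of these deviance measures, computed directly from paired model and observed values, is given below; the example numbers are arbitrary.

    import numpy as np

    def deviance_measures(model, data):
        """Mean error, mean % error, MSE, and mean absolute error for paired values."""
        model = np.asarray(model, dtype=float)
        data = np.asarray(data, dtype=float)
        e = model - data
        return {
            "mean_error": e.mean(),                        # (sum of e) / m
            "mean_percent_error": (e / data).mean() * 100.0,
            "mse": (e ** 2).mean(),
            "mean_abs_error": np.abs(e).mean(),
        }

    print(deviance_measures(model=[2.1, 3.8, 5.2], data=[2.0, 4.0, 5.0]))
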

D.4.2.3 Statistical Tests
A more formal hypothesis testing procedure can also be used for model corroboration.  In such cases, a
test is performed to determine if the model outputs are statistically significantly different from the empirical
data.  Important considerations in these tests are the probability of making type I and type  II errors and
the shape of the data distributions, as most of these metrics assume the data are distributed  normally.
The test-statistic used should also be  based on the number of data-pairs  (observed and  predicted)
available.
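
One hedged sketch of such a test is shown below: the paired differences are first checked for normality, and a paired t-test or its nonparametric alternative is applied accordingly. The data pairs and the 0.05 significance level are assumptions for the example.

    import numpy as np
    from scipy import stats

    observed = np.array([3.1, 4.0, 5.2, 6.8, 7.7, 9.1])
    predicted = np.array([3.4, 3.8, 5.6, 6.4, 8.1, 9.6])

    differences = predicted - observed
    _, normality_p = stats.shapiro(differences)  # check the normality assumption

    if normality_p > 0.05:
        _, p_value = stats.ttest_rel(predicted, observed)  # paired t-test
    else:
        _, p_value = stats.wilcoxon(predicted, observed)   # nonparametric alternative

    print("p-value:", p_value, "significantly different at alpha = 0.05:", p_value < 0.05)
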

There are a number of comprehensive texts that may help analysts  determine the appropriate statistical
and numerical procedures for conducting model corroboration. These include:

•   Efron, B., and R. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall.
•   Gelman, A.J.B., H.S. Carlin, and  D.B. Rubin. 1995. Bayesian Data Analysis. New York: Chapman
    and Hall.
•   McCullagh, P., and J.A. Nelder. 1989. Generalized Linear Models. New York: Chapman and Hall.
•   Press,  W.H.,  B.P.  Flannery,  S.A. Teukolsky, and W.T. Vetterling.  1986.  Numerical  Recipes.
    Cambridge, UK: Cambridge University Press.
•   Snedecor, G.W.,  and W.G.  Cochran.  1989. Statistical Methods. Eighth Ed. Iowa State University
    Press.

D.4.3   Evaluating Multiple Models
       Models are metaphorical (albeit sometimes accurate) descriptions of nature, and
       there can  never be a "correct" model.  There may be a "best" model, which is
       more consistent with the data than any of its competitors, or  several models may
       be contenders because each is consistent in some way with the data and none
       clearly dominates the others.  It is the job of the ecological detective to determine
       the support that the data offer for each competing model or hypothesis.
       — Hilborn and Mangel 1997, Ecological Detective

In the simplest sense, a first cut of model performance is obtained by examining which model minimizes
the sum of squares (SSq) between observed and model-predicted data.
                                   SSq = Σ(pred - obs)²
The SSq is the sum of the squared differences between model-predicted values and observational values.
If data are used to fit models and estimate parameters, the fit will automatically improve with each higher-
order model — e.g., a simple linear model, y = a + bX, vs. a polynomial model, y = a + bX + cX².

It is therefore useful to apply a penalty for additional parameters to determine if the improvement in model
performance  (minimizing SSq  deviation) justifies an increase in  model complexity.   The question is
essentially whether the decrease in the sum of squares is statistically significant.
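
One common way to answer that question, sketched below on invented data, is an extra-sum-of-squares F-test comparing a simple linear fit against a quadratic fit; a small p-value indicates that the additional parameter is justified. The data and model orders are assumptions for the example.

    import numpy as np
    from scipy import stats

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    obs = np.array([1.1, 2.9, 5.2, 6.8, 9.3, 10.9, 13.2, 15.1])

    def ssq(order):
        """Sum of squared deviations for a polynomial fit of the given order."""
        pred = np.polyval(np.polyfit(x, obs, order), x)
        return float(np.sum((pred - obs) ** 2))

    ssq_linear, ssq_quadratic = ssq(1), ssq(2)
    df_extra = 1                  # extra parameter in the quadratic model
    df_resid = len(x) - 3         # residual degrees of freedom, quadratic model
    f_stat = ((ssq_linear - ssq_quadratic) / df_extra) / (ssq_quadratic / df_resid)
    p_value = 1.0 - stats.f.cdf(f_stat, df_extra, df_resid)
    print("F =", round(f_stat, 2), "p =", round(p_value, 3))
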

The SSq is best  applied when comparing  several models using a single dataset.   However, if several
datasets are  available the Normalized Mean Square  Error (NMSE) is typically a better statistic, as it is
normalized to the product  of the means  of the observed and predicted  values (see discussion and
references, Section D.4.4.4).

D.4.4   An Example Protocol for Selecting a Set of Best Performing Models

During the development phase  of an air quality dispersion model and in subsequent upgrades, model
performance  is constantly evaluated.  These evaluations generally compare simulation results  using
simple methods that do not account for the  fact that models only predict a portion of the variability seen in
the observations.  To fill a part of this void,  the U.S. Environmental Protection Agency (EPA) developed a
standard that has been adopted by the  ASTM International,  designation D  6589-00 for  Statistical
Evaluation  of Atmospheric  Dispersion  Model Performance  (ASTM 2000).  The  following discussion
summarizes some of the issues discussed in D 6589.

D.4.4.1  Define Evaluation Objectives
Performing a statistical model  evaluation  involves defining those  evaluation  objectives (features or
characteristics) within the pattern of observed and modeled concentration values that are of interest to
compare.   As yet,  no  one feature  or characteristic has been found that can be  defined  within a
concentration pattern that will  fully test a  model's performance.  For instance, the maximum surface
concentration may appear unbiased  through a compensation  of errors in estimating  the lateral  extent of
the dispersing material  and in  estimating  the vertical extent of the dispersing material.   Adding into
consideration that other biases may exist (e.g.,  in treatment of the chemical  and  removal processes
during transport, in estimating buoyant plume rise, in accounting for wind direction changes with height, in
accounting for penetration of material into layers above the current mixing  depth, in  systematic variation
in all of these biases as a function of atmospheric stability), one can appreciate that there are many ways
that a model can falsely give the appearance of good performance.

In  principle, modeling diffusion involves characterizing the size and shape of the volume into which the
material is dispersing as well as the distribution of the material within this volume.  Volumes have three
dimensions, so a model evaluation will be more complete if it tests the model's ability to characterize
diffusion along more than one of these dimensions.

D.4.4.2  Define Evaluation Procedures
Having  selected  evaluation objectives for comparison, the  next  step is  to establish an evaluation
procedure (or series of procedures), which defines how each evaluation objective will be derived from the
available information.   Development of statistical model evaluation procedures  begins with technical
definitions of the  terminology used in the goal statement.  In the following  discussion, we use a plume
dispersion model example,  but the  thought process is valid as well for  regional  photochemical grid
models.

Suppose the  evaluation goal is to test models' ability to replicate the average centerline concentration as
a function of  transport downwind and as a  function of atmospheric stability.  Several questions must be
answered to  achieve this  goal: What is  an "average  centerline  concentration"?  What  is  "transport
downwind"? How  will "stability" be defined?

What questions  arise  in defining the  average  centerline concentration?  Given  a sampling  arc of
concentration values, it  is necessary to decide whether the centerline concentration is the maximum value
seen anywhere along the arc or that seen near the center of mass of the observed lateral concentration
distribution.  If one chooses the latter concept, one needs a definition of how "near" the center of mass
one has to be, to be representative of a centerline concentration value. One might decide to select all
values within a specific range (nearness to the center of mass).  In such a case, either a definition or a
procedure will be needed to define how this specific range will be determined. A decision will have to be
made on the treatment of observed zero (and near measurement threshold)  concentrations.  To discard
such values is to say that low concentrations cannot occur near a plume's  center of mass, which is a
dubious assumption. One might test to see if conclusions reached regarding the "best performing model"
are sensitive to the decision made on the treatment of near-zero concentrations.

What questions arise  in defining  "transport downwind"?  During  near-calm  wind  conditions,  when
transport may have favored  more than one direction over the sampling period, "downwind" is not well
described by one direction.   If plume models are being tested, one might exclude near-calm conditions,
since plume models are not meant to provide meaningful results during such conditions.  If puff models or
grid models  are being tested, one might sort the near-calm cases into a special regime for analysis.

What questions arise in defining "stability"? For surface releases, surface-layer Monin-Obukhov length, L,
has been found to adequately define stability effects; for elevated releases, Zi/L, where Zi is the mixing
depth, has been found to be a useful parameter for describing stability effects. Each model likely has its
own meteorological processor, and it is likely that different processors will have different values for L and Zi
for each of the evaluation cases. There is no one best way to deal with this problem. One solution might
be to sort the data into  regimes using  each of the models' input values, and see if the conclusions
reached as to best performing model are affected.

What questions arise if one  is grouping data  together?  If one is  grouping data together for which the
emission rates are different, one might choose to  resolve this difference by normalizing the concentration
values by dividing  by the respective emission rates.  To divide  by the emission rate, either one has a
constant emission  rate over the entire  release or the downwind transport is sufficiently obvious that one
can compute an emission rate, based on travel time, that is appropriate for each downwind distance.

Characterizing the plume transport  direction is highly uncertain, even with  meteorological data collected
specific for the purpose.  Thus, we expect that the simulated position of the plume will not overlap the
observed  position  of the plume. One  must decide how to compare a feature (or characteristic) in a
concentration pattern, when uncertainties in transport direction are large. Will the observed and  modeled
patterns be shifted, and if so, in what manner?

This  discussion is  not meant to  be exhaustive,  but to be illustrative of how the thought process might
evolve.  When terms are defined, other questions arise that — when resolved — eventually produce an
analysis that will compute the evaluation objective from the available data.  There likely is more than one
answer to the questions that develop. This may cause different people to develop different objectives and
procedures for the same goal.  If the same set of models is chosen as the best-performing,  regardless of
which path is chosen, one can likely be assured that the conclusions reached are robust.

D.4.4.3 Define Trends in Modeling Bias
In this discussion, references to  observed and modeled  values refer  to the  observed and  model
evaluation objectives (e.g., regime averages). A plot of the observed and modeled values as a function of
one of the model input parameters is  a direct means for detecting model bias.  Such comparison has
been recommended and employed  in a variety of investigations,  e.g.,  Fox  (1981), Weil et al. (1992),
Hanna (1993). In some cases the comparison is the ratio formed by dividing the modeled value by the
observed  value, plotted as a function of one or more of the model input parameters.  If the data have
been stratified into regimes, one can also display the standard error estimates on the respective  modeled
and observed regime averages.  If the  respective averages are encompassed by the error bars (typically
plus  and  minus two times  the  standard error estimates), one  can assume the differences  are not
significant. As Hanna (1993) describes, this is a "seductive" inference. Procedures to provide a robust
assessment of the significance of the differences are defined in ASTM D 6589 (ASTM 2000).
D.4.4.4 Summary of Performance
As an example of overall summary of performance, we will discuss a procedure constructed using the
scheme introduced by Cox and Tikvart (1990) as a template.  The design for statistically summarizing
model performance over several regimes is envisioned as a five-step procedure (a minimal computational sketch of steps 1 through 4 follows the list):

1.   Form a replicate sample using concurrent sampling of the observed and modeled values for each
    regime.  Concurrent sampling  associates results  from all models with each observed value, so that
    selection of an observed value  automatically selects the corresponding estimates by all models.
2.   Compute the average of observed and modeled values for each regime.
3.   Compute the normalized mean square error, NMSE, using the computed regime averages, and store
    the value of the NMSE computed for this pass of the bootstrap sampling.
4.   Repeat steps 1 through 3 for all bootstrap sampling passes (typically of order 500).
5.   Implement the  procedure described in ASTM  D 6589 (ASTM 2000) to detect which model has the
    lowest computed NMSE value  (call this the "base" model) and which models have NMSE values that
    are significantly different from the "base" model.
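
The sketch below assumes paired observed and modeled values already sorted into regimes; here NMSE is taken as the mean squared difference of the regime averages normalized by the product of their overall means, and the significance test of step 5 (ASTM D 6589) is not reproduced.

    import numpy as np

    def regime_nmse(obs_by_regime, mod_by_regime):
        """NMSE computed from regime averages of observed and modeled values."""
        obs_avg = np.array([np.mean(o) for o in obs_by_regime])
        mod_avg = np.array([np.mean(m) for m in mod_by_regime])
        return float(np.mean((obs_avg - mod_avg) ** 2) / (obs_avg.mean() * mod_avg.mean()))

    def bootstrap_nmse(obs_by_regime, mod_by_regime, n_boot=500, seed=0):
        """Steps 1-4: concurrent resampling within each regime, one NMSE per pass."""
        rng = np.random.default_rng(seed)
        results = []
        for _ in range(n_boot):
            obs_samp, mod_samp = [], []
            for o, m in zip(obs_by_regime, mod_by_regime):
                idx = rng.integers(0, len(o), size=len(o))  # same indices for obs and model
                obs_samp.append(np.asarray(o)[idx])
                mod_samp.append(np.asarray(m)[idx])
            results.append(regime_nmse(obs_samp, mod_samp))
        return np.array(results)

    # Illustrative data: two regimes of paired observed and modeled concentrations
    obs = [[10.2, 11.5, 9.8, 12.0], [4.1, 3.8, 4.6, 5.0]]
    mod = [[9.5, 12.1, 10.4, 11.2], [3.5, 4.2, 4.9, 4.4]]
    print(bootstrap_nmse(obs, mod).mean())
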

In the Cox and Tikvart  (1990) analysis, the data were sorted  into regimes (defined in terms of Pasquill
stability  category and  low/high wind speed classes),  and bootstrap sampling was  used  to develop
standard error estimates  on the  comparisons.   The  performance measure was the robust highest
concentration (computed from the raw observed cumulative frequency distribution), a comparison of the highest concentration values (maxima), which most models do not contain the physics to simulate.
This procedure can be improved if intensive field data  are used and  the performance measure is the
NMSE computed from the modeled and observed regime averages of centerline concentration values as
a function of stability along each downwind arc, where each regime is a particular distance downwind for
a defined stability range.

The data demands are  much greater for using regime averages than for using individual concentrations.
Procedures that analyze groups (regimes) of data require data from intensive tracer field studies, with dense
receptor networks and many experiments, whereas Cox and Tikvart (1990) devised their analysis to
make use of very sparse receptor networks having one or more years of sampling results. With dense
receptor networks,  attempts can  be made to compare average modeled and "observed" centerline
concentration values, but  only a few of these experiments have sufficient data to allow stratification of the
data into regimes for analysis.  With sparse receptor networks, there are more data for analysis, but there
is insufficient information  to define the observed maxima relative to the dispersing plume's center of
mass.   Thus,  there is  uncertainty as to  whether or not the observed  maxima are  representative of
centerline concentration values. It is not obvious that the average of the n (say 25) observed maximum
hourly concentration values (for a  particular distance downwind and narrowly defined stability range) is
the  ensemble average centerline concentration the model is predicting. In fact, one might anticipate that
the  average of the n maximum concentration values is  likely to be higher than the ensemble average of
the  centerline  concentration. Thus the testing procedure outlined by Cox and Tikvart (1990) may favor
selection of poorly formed models  that routinely  underestimate  the  lateral  diffusion  (and thereby
overestimate the plume centerline concentration). This, in turn, may bias such models' ability to
characterize concentration patterns for longer averaging times.

It is therefore concluded that once  a set  of "best-performing  models" has  been  selected from  an
evaluation using intensive field data that tests a model's ability to predict the average characteristics to be
seen in  the observed concentration  patterns, evaluations using sparse networks are seen as  useful
extensions to further explore the performance of well-formulated models for other environs and purposes.

D.5    Sensitivity Analysis
This section provides a broad  overview of uncertainty and sensitivity  analyses  and  introduces various
methods used  to conduct  the latter. A table at the end of this section summarizes these methods' primary
features and citations to additional resources for computational detail.
D.5.1   Introducing Sensitivity Analyses and Uncertainty Analysis

A model approximates reality in the face of scientific uncertainties. Section 4.1.3.1 identifies and defines
various sources of model uncertainty.   External peer  reviewers  of  EPA models have  consistently
recommended that EPA  communicate this  uncertainty through uncertainty  analysis  and sensitivity
analysis, two related  disciplines.  Uncertainty analysis investigates the  effects  of lack of knowledge  or
potential errors  of model inputs (e.g.,  the "uncertainty"  associated  with parameter  values); when
combined with sensitivity analysis, it allows a model user to be more informed about the confidence that
can be placed in model results.  Sensitivity analysis measures the effect of changes in  input values  or
assumptions  (including  boundaries and model functional form)  on the outputs (Morgan and Henrion
1990); it is the study of how uncertainty in a  model output can be systematically apportioned to different
sources of uncertainty in the model input (Beck et al. 1994).  By  investigating the "relative sensitivity" of
model parameters,  a  user can  become knowledgeable of the relative importance of parameters in the
model.

Consider a model  represented  as a  function f, with inputs  x1  and x2, and with output y, such that y =
f(x1, x2). Figure D.5.1 schematically depicts how uncertainty analysis and sensitivity analysis would be
conducted  for this  model. Uncertainty analysis would be conducted by determining how  y  responds  to
variation in inputs x1 and x2, the graphic depiction of which is referred to as the model's response surface.
Sensitivity  analysis would be  conducted by  apportioning the respective contributions  of x1 and x2  to
changes in y. The schematic should not be  construed to imply that uncertainty analysis  and sensitivity
analysis are sequential events.  Rather, they are generally conducted by trial and error, with each type of
analysis informing the other. Indeed, in practice, the distinction between these two related disciplines may
be irrelevant. For purposes of clarity, the  remainder of this appendix will refer exclusively to sensitivity
analysis.
[Figure D.5.1 is a schematic in which inputs x1 and x2 feed a model run, y = f(x1, x2), producing
outputs; uncertainty analysis examines the outputs, and sensitivity analysis apportions them to the
inputs.]

Figure D.5.1. Uncertainty and sensitivity analyses. Uncertainty analysis investigates the effects of
lack of knowledge of, or potential errors in, model inputs. Sensitivity analysis evaluates the
respective contributions of inputs x1 and x2 to output y.

D.5.2   Sensitivity Analysis and Computational Complexity
Choosing  the appropriate uncertainty analysis/sensitivity analysis method is often a matter of trading off
between the amount of information one wants from the analyses and the computational difficulties of the
analyses.  These computational difficulties are often inversely related to the number of assumptions one is
willing or able to make about the shape of a model's response surface.

Consider once again a model represented as a function f, with inputs x1 and x2 and with output y,
such that y = f(x1, x2). Sensitivity measures how output changes with respect to an input. This is a
straightforward enough procedure with differential analysis if the analyst:

•   Can assume that the model's response surface is a hyperplane, as in Figure D.5.2(1);
•   Can accept that the results apply only to specific points on the response surface and that these
    points are monotonic first order, as in Figure D.5.2(2);10 or
•   Is unconcerned about interactions among the input variables.

Otherwise, sensitivity analysis may be more appropriately conducted using more intensive computational
methods.

10 Related to this issue are the terms "local sensitivity analysis" and "global sensitivity analysis." The former refers
to sensitivity analysis conducted around a nominal point of the response surface, while the latter refers to sensitivity
analysis across the entire surface.

Figure D.5.2. It's hyperplane and simple. (1) A model response surface that is a hyperplane can
simplify sensitivity analysis computations. (2) The same computations can also be used for other
response surfaces, but only as approximations around a single locus.

This guidance suggests that, depending on the assumptions underlying the model, the analyst should
first use non-intensive sensitivity analysis techniques to identify the inputs to which the model
output is most sensitive, and then apply more intensive methods to this smaller subset of inputs. It may therefore be useful
to categorize the various sensitivity analysis techniques into methods  that (a)  can be quickly used  to
screen for  the more important input factors;  (b) are based  on differential analyses; (c)  are  based on
sampling; and (d) are based on variance methods.

D.5.3   Screening Tools

D.5.3.1 Tools That Require No Model Runs
Cullen and Frey (1999) suggest that, for simple and linear models, summary statistics measuring input
uncertainty can serve as preliminary screening tools without additional model runs, indicating
proportionate contributions to output uncertainty (a brief sketch of both statistics follows this list):

•   Coefficient of variation. The coefficient of variation is the standard deviation normalized to the mean
    (σ/μ), which reduces the possibility that inputs that take on large values are given undue
    importance.
•   Gaussian approximation. Another approach to apportioning input variance is Gaussian
    approximation. Using this method, the variance of a model's output is estimated as the sum of the
    variances of the inputs (for additive models) or the sum of the variances of the log-transformed inputs
    (for multiplicative models), each weighted by the square of any constant that multiplies the input
    in the model.
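
As an illustration only (not drawn from the guidance itself), the Python sketch below computes these
two screening statistics for a hypothetical additive model y = a1*x1 + a2*x2; the input names, means,
standard deviations, and multiplying constants are all assumed for the example.

    # Minimal sketch: screening statistics computed from input summary statistics
    # alone, for a hypothetical additive model y = a1*x1 + a2*x2.
    means  = {"x1": 10.0, "x2": 2.0}   # assumed input means
    stdevs = {"x1": 1.5,  "x2": 0.8}   # assumed input standard deviations
    coeffs = {"x1": 0.5,  "x2": 3.0}   # constants multiplying each input in the model

    # Coefficient of variation: standard deviation normalized to the mean (sigma/mu),
    # so inputs that take on large values are not given undue importance.
    cv = {name: stdevs[name] / means[name] for name in means}

    # Gaussian approximation for an additive model: output variance is estimated as
    # the sum of the input variances, each weighted by the square of its constant.
    contrib = {name: (coeffs[name] * stdevs[name]) ** 2 for name in means}
    total_var = sum(contrib.values())

    for name in means:
        print(f"{name}: CV = {cv[name]:.2f}, "
              f"approximate share of output variance = {contrib[name] / total_var:.1%}")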

D.5.3.2 Scatterplots
Cullen and Frey (1999) suggest that a high correlation between an input variable and an output
variable may indicate that much of the variation in the output depends on the variation in that input.
A simple, visual assessment of the influence of an input on the output is therefore possible using
scatterplots, with each plot posing a selected input against the output, as in Figure D.5.3.

       [Figure D.5.3 is a scatterplot of the output time (ms) against the input area (K pixels).]

       Figure D.5.3. Correlation as indication of input effect. The high correlation between the input
       variable area and the output variable time (holding all other variables fixed) is an indication of
       the possible effect of area's variation on the output.
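
As an illustration only (not drawn from the guidance), the Python sketch below builds such screening
scatterplots for a hypothetical model in which an output "time" is driven mainly by an input "area";
all names, ranges, and the toy model itself are assumed.

    # Minimal sketch: scatterplot screening of inputs against a toy model output.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    n = 200
    area  = rng.uniform(0, 400, n)     # hypothetical input (K pixels)
    other = rng.normal(5.0, 1.0, n)    # another hypothetical input

    # Toy stand-in for the real model: time grows with area, only weakly with 'other'.
    time_ms = 0.03 * area + 0.2 * other + rng.normal(0, 0.5, n)

    # One scatterplot per input; a strong pattern (high correlation) flags an input
    # whose variation may explain much of the variation in the output.
    for name, x in [("area", area), ("other", other)]:
        r = np.corrcoef(x, time_ms)[0, 1]
        plt.figure()
        plt.scatter(x, time_ms, s=8)
        plt.xlabel(name)
        plt.ylabel("time (ms)")
        plt.title(f"{name} vs. output, r = {r:.2f}")
        plt.savefig(f"scatter_{name}.png")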

D.5.3.3 Morris's OAT
The  key concept underlying one-at-a-time (OAT) sensitivity analyses is to choose a base case of input
values and to perturb each input variable by a given  percentage away from the base value while holding
all other input variables  constant. Most OAT sensitivity analysis  methods yield  local measures  of
sensitivity (see footnote 10) that depend on the choice of base case values. To avoid this bias, Saltelli et
al. (2000b)  recommend using  Morris's  OAT for screening purposes because  it  is  a global sensitivity
analysis method — it entails computing a number of local measures (randomly extracted across the input
space) and then taking their average.

Morris's OAT provides a measure of the importance of an  input factor in generating output variation, and
while it does not  quantify  interaction effects, it does provide an indication of the presence of interaction.
Figure D.5.4 presents the results that  one would expect to obtain from  applying Morris's OAT (Cossarini
et al. 2002). Computational methods for this technique are described  in Saltelli et al. 2000b.
 [Figure D.5.4 plots, for each input factor, its importance in generating output variation
 (horizontal axis) against a measure of interaction (vertical axis).]

 Figure D.5.4. An application of Morris's OAT. Cossarini et al. (2002) investigated the influence
 of various ecological factors on energy flow through a food web. Their sensitivity analysis
 indicated that maximum bacteria growth and bacteria mortality (μbac and Kmbac, respectively)
 have the largest (and opposite) effects on energy flow, as indicated by their values on the
 horizontal axis. These effects, as indicated by their values on the vertical axis, resulted from
 interactions with other factors.
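
As an illustration only (a simplified version of the elementary-effects idea, not the Cossarini et al.
analysis), the Python sketch below averages one-at-a-time measures taken at randomly chosen points of
the input space for a toy model; the model, input ranges, and settings are assumed. Packages such as
SALib provide full implementations of the Morris method.

    # Simplified sketch of Morris-style elementary effects for a toy model with
    # an interaction between x1 and x2. All inputs are scaled to [0, 1].
    import numpy as np

    rng = np.random.default_rng(0)

    def model(x):
        return x[0] ** 2 + 0.5 * x[1] + x[0] * x[1] + 0.1 * x[2]

    k, r, delta = 3, 50, 0.1      # number of inputs, random base points, step size

    effects = np.zeros((r, k))
    for i in range(r):
        base = rng.random(k) * (1.0 - delta)   # random base point, leaving room to step
        y0 = model(base)
        for j in range(k):
            stepped = base.copy()
            stepped[j] += delta
            effects[i, j] = (model(stepped) - y0) / delta   # local (elementary) effect

    # The mean absolute effect indicates importance; the standard deviation of the
    # effects indicates interactions and/or nonlinearity (cf. the axes of Figure D.5.4).
    for j in range(k):
        print(f"x{j + 1}: importance = {np.abs(effects[:, j]).mean():.3f}, "
              f"interaction/nonlinearity = {effects[:, j].std():.3f}")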

D.5.4  Methods Based on Differential Analysis
As noted previously, differential analyses may be used to analyze sensitivity if the analyst is willing either
to assume that the model response surface is hyperplanar or to accept that the sensitivity analysis results
are local and that they are  based on hyperplanar approximations tangent to the response surface at the
nominal scenario (Morgan and Henrion 1990; Saltelli et al. 2000b).

Differential analyses entail  four steps.  First, select base values and ranges for input factors. Second,
using these input base values, develop a Taylor series approximation to the output. Third,  estimate
uncertainty in output in terms of its expected value and variance using variance propagation techniques.
Finally, use the Taylor series approximations  to estimate the importance of individual input factors (Saltelli
et al. 2000b). Computational methods for this technique  are described in Morgan and Henrion 1990.
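
As an illustration only (not drawn from the cited references), the Python sketch below walks through
these four steps for a hypothetical two-input model, using central differences in place of an analytic
Taylor-series expansion and assuming independent inputs.

    # Minimal sketch: first-order (Taylor-series) sensitivity and variance propagation
    # around a nominal scenario for a hypothetical model.
    import numpy as np

    def model(x1, x2):
        return x1 * np.exp(0.2 * x2)   # hypothetical model

    # Step 1: base values and uncertainties (standard deviations) of the inputs.
    base = {"x1": 4.0, "x2": 1.0}
    sd   = {"x1": 0.5, "x2": 0.3}

    # Step 2: numerical partial derivatives at the base point (central differences).
    h = 1e-4
    d1 = (model(base["x1"] + h, base["x2"]) - model(base["x1"] - h, base["x2"])) / (2 * h)
    d2 = (model(base["x1"], base["x2"] + h) - model(base["x1"], base["x2"] - h)) / (2 * h)

    # Step 3: first-order variance propagation, assuming independent inputs.
    var_y = (d1 * sd["x1"]) ** 2 + (d2 * sd["x2"]) ** 2

    # Step 4: relative importance of each input at this nominal scenario.
    print("approximate output standard deviation:", np.sqrt(var_y))
    print("share of output variance from x1:", (d1 * sd["x1"]) ** 2 / var_y)
    print("share of output variance from x2:", (d2 * sd["x2"]) ** 2 / var_y)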

D.5.5  Methods Based on Sampling
One approach to estimating the impact of input uncertainties is to repeatedly run a model using randomly
sampled  values  from the input space. The most well-known method using this  approach is Monte Carlo
analysis. In a Monte Carlo simulation, a model is run repeatedly. With each  run, different input values are
drawn  randomly from the  probability distribution  functions of each  input, thereby generating  multiple
output  values (Morgan and  Henrion 1990; Cullen and Frey 1999). One can view a Monte Carlo simulation
as a process through which multiple scenarios generate multiple output values; although each execution
of the model run is deterministic, the set of output values may be represented as a cumulative distribution
function and summarized using statistical measures (Cullen and Frey 1999).
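
As an illustration only (the model and the input distributions are assumed), the Python sketch below
runs a simple Monte Carlo simulation and summarizes the resulting output distribution.

    # Minimal sketch: Monte Carlo simulation of a hypothetical two-input model.
    import numpy as np

    rng = np.random.default_rng(42)

    def model(x1, x2):
        return x1 * np.exp(0.2 * x2)   # each model run is deterministic

    n_runs = 10_000
    x1 = rng.lognormal(mean=1.0, sigma=0.25, size=n_runs)   # assumed input PDF
    x2 = rng.normal(loc=1.0, scale=0.3, size=n_runs)        # assumed input PDF

    y = model(x1, x2)   # one output value per randomly sampled scenario

    # Summarize the set of outputs as an empirical (cumulative) distribution.
    print("mean:", y.mean(), " standard deviation:", y.std())
    print("5th, 50th, 95th percentiles:", np.percentile(y, [5, 50, 95]))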

EPA proposes several principles of good practice for the conduct of Monte Carlo simulations (EPA
1997). They include the following:

•   Conduct preliminary sensitivity  analyses to identify significant model components and input variables
    that make important contributions to  model uncertainty.
•   When  deciding upon  a  probability distribution  function (PDF) for input  variables, consider  the
    following questions: Is  there any mechanistic basis for choosing a distributional family? Is the PDF
    likely to be  dictated  by physical, biological, or other properties and mechanisms? Is the variable
    discrete or continuous? What are the bounds of the variable? Is the PDF symmetric or skewed, and if
    skewed, in which direction?
•   Base the PDF on empirical, representative data.
•   If expert judgment is used as the basis for the PDF, document explicitly the reasoning underlying this
    opinion.
•   Discuss the  presence or absence of covariance among the input variables, which can significantly
    affect the output.

The preceding points summarize some of the main points raised in EPA's Guiding Principles for Monte
Carlo Analysis (EPA 1997), which should be consulted for more detailed guidance. Conducting Monte Carlo
analysis may be problematic for models containing a large number of input variables. Fortunately, there
are several approaches to dealing with this problem:

•   Brute force approach. One approach is to increase sheer computing power. For example, EPA's
    ORD is developing a Java-based tool that facilitates Monte Carlo analyses across a cluster of PCs by
    harnessing the computing power of multiple workstations to conduct multiple runs for a complex
    model (Babendreier and Castleton 2002).
•   Smaller, structured trials. The value of Monte Carlo lies not in the randomness of the sampling, but in
    achieving representative properties of the set of points in the input space. Therefore, rather than
    sampling at random from the entire input space, computations may rely on stratified sampling:
    dividing the input sample space into strata and sampling from within each stratum. A widely used
    method for stratified sampling is Latin hypercube sampling, comprehensively described in Cullen and
    Frey 1999 and sketched briefly after this list.
•   Response surface model surrogate. The analyst may also choose to conduct Monte Carlo not on the
    complex model directly, but rather on a response surface representation of it. The latter is a simplified
    representation  of the relationship between  a selected  number of  model outputs and a selected
    number of model inputs, with all other model  inputs held at fixed values (Morgan and Henrion 1990;
    Saltelli et al. 2000b).
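
As an illustration only (hand-rolled, with assumed input distributions), the Python sketch below
generates a Latin hypercube sample on the unit square and maps it to the inputs through their inverse
cumulative distribution functions.

    # Minimal sketch: Latin hypercube sampling for two inputs. Each input's range is
    # divided into n equal-probability strata; one value is drawn from each stratum,
    # and the strata are randomly paired across inputs.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n, k = 200, 2

    # Uniform [0, 1) Latin hypercube: exactly one point per stratum in every dimension.
    u = np.empty((n, k))
    for j in range(k):
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n

    # Map the unit-cube sample to the assumed input distributions via inverse CDFs.
    x1 = stats.lognorm.ppf(u[:, 0], s=0.25, scale=np.exp(1.0))
    x2 = stats.norm.ppf(u[:, 1], loc=1.0, scale=0.3)

    y = x1 * np.exp(0.2 * x2)   # same toy model as in the Monte Carlo sketch above
    print("mean:", y.mean(), " 95th percentile:", np.percentile(y, 95))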

D.5.6   Methods Based on Variance
Consider once again a model represented as a function f, with inputs x1 and x2 and with output y, such
that y = f(x1, x2). The input variables are affected by uncertainties and may take on any number of
possible values. Let X denote an input vector randomly chosen from among all possible values for x1 and
x2. The output y for a given X can also be seen as a realization of a random variable Y. Let E(Y|X)
denote the expectation of Y conditional on a fixed value of X. If the total variation in y is matched
by the variability in E(Y|X) as x1 is allowed to vary, this is an indication that variation in x1
significantly affects y.

The variance-based approaches to sensitivity analysis are based on estimating what fraction of the
total variation of y is attributable to variability in E(Y|X) as a subset of the input factors is
allowed to vary. Three methods for computing this estimate (the correlation ratio, Sobol, and Fourier
amplitude sensitivity test methods) are featured in Saltelli et al. 2000b.
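
As an illustration only (a crude correlation-ratio-style estimate for a toy model; see Saltelli et al.
2000b for the formal estimators), the Python sketch below approximates the fraction of Var(Y) explained
by Var(E(Y|x1)) by binning each input and comparing the variance of the conditional means with the
total output variance.

    # Minimal sketch: first-order, variance-based sensitivity estimated by binning.
    import numpy as np

    rng = np.random.default_rng(3)
    n, n_bins = 50_000, 50

    x1 = rng.uniform(-1, 1, n)
    x2 = rng.uniform(-1, 1, n)
    y = np.sin(3 * x1) + 0.3 * x2 ** 2 + 0.2 * x1 * x2   # toy model

    def first_order_index(x, y, n_bins):
        # Variance of the conditional means E(Y | x in bin), divided by Var(Y).
        order = np.argsort(x)
        bins = np.array_split(y[order], n_bins)
        cond_means = np.array([b.mean() for b in bins])
        return cond_means.var() / y.var()

    for name, x in [("x1", x1), ("x2", x2)]:
        print(f"{name}: approximate first-order index = {first_order_index(x, y, n_bins):.2f}")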

D.5.7   Which Method to Use?
A  panel of experts was recently assembled to review various sensitivity analysis methods. The panel
refrained from explicitly recommending a "best"  method and instead developed a list of attributes for
preferred sensitivity analysis methods. The panel recommended that methods should preferably be able
to  deal with  a model  regardless of assumptions about a  model's linearity and  additivity, consider
interaction effects among  input uncertainties, cope with differences in the scale and shape of input PDFs,
cope with differences in input  spatial and temporal dimensions, and evaluate the effect of an input while
all other inputs are allowed to vary as well (Frey 2002; Saltelli 2002). Of the various methods discussed
above, only those based on variance (Section D.5.6) are characterized by these attributes. When one or
more  of the criteria are not important, the other tools discussed in this section will provide a reasonable
sensitivity assessment.

As mentioned earlier, choosing the most appropriate sensitivity analysis method will often entail a trade-
off between computational complexity, model assumptions, and the amount of information needed from
the sensitivity analysis. As an aid to sensitivity analysis method selection, the table below summarizes the
features and caveats of the methods discussed above.
Method: Screening methods
    Features: May be conducted independent of model run.
    Caveats: Potential for significant error if model is non-linear.
    Reference: Cullen and Frey 1999, pp. 247-8.

Method: Morris's one-at-a-time
    Features: Global sensitivity analysis.
    Caveats: Indicates, but does not quantify, interactions.
    Reference: Saltelli et al. 2000b, p. 68.

Method: Differential analyses
    Features: Global sensitivity analysis for linear models; local sensitivity analysis for nonlinear models.
    Caveats: No treatment of interactions among inputs; assumes linearity, monotonicity, and continuity.
    References: Cullen and Frey 1999, pp. 186-94; Saltelli et al. 2000b, pp. 183-91.

Method: Monte Carlo analyses
    Features: Intuitive; no assumptions regarding the response surface.
    Caveats: Depending on the number of input variables, may be time-consuming to run, but methods to
    simplify are available; may rely on assumptions regarding input PDFs.
    References: Cullen and Frey 1999, pp. 196-237; Morgan and Henrion 1990, pp. 198-216.

Method: Variance-based methods
    Features: Robust and independent of model assumptions; addresses interactions.
    Caveats: May be computationally difficult.
    Reference: Saltelli et al. 2000b, pp. 167-97.

D.6    Uncertainty Analysis
D.6.1   Model Suitability
An  evaluation of model suitability  to  resolve application niche uncertainty  (Section 4.1.3.1) should
precede any evaluation of data uncertainty and  model performance.  The extent to which a  model is
suitable for a proposed application depends on:

    •   Mapping of model attributes to the problem statement
    •   The degree  of certainty needed  in model outputs
    •   The amount of reliable data available or resources available to collect additional data
    •   Quality of the state of knowledge on which the model is based
    •   Technical competence of those  undertaking simulation modeling

Appropriate data should be available before any attempt  is made to apply a model. A model that needs
detailed, precise input data should not be used when such data are unavailable.

D.6.2  Data Uncertainty
There are two statistical paradigms that can be adopted to summarize data. The first employs classical
statistics  and is useful for capturing  the most likely or "average" conditions  observed  in a given system.
This is known as the "frequentist"  approach  to summarizing model input data.  Frequentist statistics rely
on  measures of central tendency  (median,  mode,  mean values)  and represent uncertainty as the
deviation from these metrics. A frequentist or "deterministic" model produces a single set of solutions for
each model run. In contrast, the alternate statistical  paradigm employs a probabilistic framework, which
summarizes data according to their "likelihood" of occurrence. Input data are represented as
distributions rather than as single numerical values, and model outputs capture a range of possible values.

The classical view of probability defines the probability of an event occurring as the value to which the
long-run frequency of that event or quantity converges as the number of trials increases (Morgan and
Henrion 1990).  Classical statistics  relies on  measures  of central tendency  (mean,  median, mode)  to
define model parameters and their associated uncertainty (standard deviation, standard error, confidence
intervals).

In contrast to the classical view, a subjectivist or Bayesian view is that the probability of an event is the
current degree of belief that a person has that it will occur, given all of the relevant information currently
known to that person. This framework involves the use of probability distributions based on likelihood
functions to represent model input values and employs techniques like Bayesian updating and Monte
Carlo methods as statistical evaluation tools (Morgan and Henrion 1990).
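
As an illustration only (the prior and the monitoring data are hypothetical), the Python sketch below
shows a conjugate Bayesian update of a degree of belief about the probability that a standard is
exceeded, using a beta prior and binomial monitoring data.

    # Minimal sketch: Bayesian updating with a conjugate beta-binomial model.
    from scipy import stats

    # Prior degree of belief about the exceedance probability: Beta(2, 8), mean 0.2.
    a_prior, b_prior = 2.0, 8.0

    # New monitoring data: 9 exceedances observed in 30 samples.
    n_samples, n_exceed = 30, 9

    # Conjugacy: the posterior is again a beta distribution.
    a_post = a_prior + n_exceed
    b_post = b_prior + (n_samples - n_exceed)

    posterior = stats.beta(a_post, b_post)
    print("posterior mean exceedance probability:", posterior.mean())
    print("95% credible interval:", posterior.interval(0.95))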

Literature Cited in Main Text and Appendices A, C, and D:

Anderson, M., and W. Woessner. 1992.  The role of the postaudit in model validation. Advances in Water
Resources 15: 167-173.

ANSI (American National Standards Institute). 1994.  Specifications and Guidelines for Quality Systems
for Environmental Data Collection and Technology Programs. ANSI/ASQC E4-1994.

ASTM. 2000. Standard Guide for Statistical Evaluation of Atmospheric Dispersion Model Performance (D
6589). Available: http://www.astm.org.
Babendreier, J.E., and K.J. Castleton. 2002. Investigating uncertainty and sensitivity in integrated,
multimedia environmental models: tools for FRAMES-3MRA. In: Proceedings of the 1st Biennial Meeting of the
International Environmental Modeling and Software Society 2: 90-95. Lugano, Switzerland.
Barnwell, T.O., L.C. Brown, and R.C. Whittemore. 2004. Importance of field data in stream water quality
modeling using QUAL2E-UNCAS. J. Environ. Eng. 130(6): 643-647.

Beck, M.B. 1987. Water quality modeling: a review of the analysis of uncertainty. Water Resources
Research 23(8): 1393-1442.

Beck, B. 2002. Model evaluation  and performance. In: A.M. El-Shaarawi and W.W. Piegorsch, eds.
Encyclopedia of Environmetrics. Chichester: John Wiley & Sons.

Beck, M., L.A. Mulkey, and T.O. Barnwell. 1994. Model Validation for Exposure Assessments — DRAFT.
Athens, Georgia: United States Environmental Protection Agency.

Booch, G. 1994. Object-Oriented  Analysis and Design with Applications. 2nd ed. Redwood, California:
Benjamin/Cummings.

Borsuk, M.E., C.A. Stow, and K.H. Reckhow. 2002. Predicting the frequency of water quality standard
violations: a probabilistic approach for TMDL development. Environmental Science and Technology 36:
2109-2115.

Bredehoeft, J.D. 2003. From models to performance assessment: the conceptualization problem. Ground
Water 41(5): 571-577.

Bredehoeft, J.D. 2005. The conceptualization model problem — surprise. Hydrogeology Journal 13(1):
37-46.

Cossarini, G., C. Solidoro, and A. Crise. 2002. A model for the trophic food web of the Gulf of Trieste. In:
A.E. Rizzoli and A.J. Jakeman, eds. Integrated Assessment and Decision Support: Proceedings of the 1st
Biennial Meeting of the iEMSs 3: 485. Available: http://www.iemss.org/iemss2002/proceedinqs/pdf/
volume%20tre/285 cossarini.pdf.

Cox, W.M., and J.A. Tikvart. 1990. A statistical procedure for determining the best performing air quality
simulation model. Atmos. Environ. 24A(9):2387-2395.

CREM (Council on Regulatory Environmental Modeling). 2001.  Proposed Agency Strategy for the
Development of Guidance on Recommended Practices in Environmental Modeling. Draft. U.S.
Environmental Protection Agency.

Cullen, A.C., and H.C. Frey. 1999. Probabilistic Techniques in Exposure Assessment: A Handbook for
Dealing with Variability and Uncertainty in Models and Inputs, ed. 326. New York:  Plenum Press.

EPA (U.S. Environmental Protection Agency).  1992. Protocol for Determining the Best Performing Model.
EPA-454-R-92-025. Research Triangle Park, North Carolina: Office of Air Quality Planning and
Standards.

EPA (U.S. Environmental Protection Agency).  1993. Review of Draft Agency Guidance for Conducting
External Peer Review of Environmental Regulatory Modeling. EPA-SAB-EEC-LTR-93-008.

EPA (U.S. Environmental Protection Agency).  1994a. Peer Review and Peer Involvement at the U.S.
Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  1994b. Report of the Agency Task Force on Environmental
Regulatory Modeling: Guidance, Support Needs, Draft Criteria and Charter. EPA-500-R-94-001.
Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  1997.  Guiding Principles for Monte Carlo Analysis. EPA-
630-R-97-001. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  1999c. Description of the MOBILE Highway Vehicle
Emissions Factor Model. Office of Mobile Sources. Ann Arbor, Michigan: U.S. Environmental Protection
Agency.

EPA (U.S. Environmental Protection Agency).  2000a. Guidance for the Data Quality Objectives Process.
EPA QA/G-4. Office of Environmental Information.

EPA (U.S. Environmental Protection Agency).  2000b. Guidance for Data Quality Assessment. EPA QA/G-
9. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  2000c. Science Policy Council Handbook: Peer Review.
2nd ed.

EPA (U.S. Environmental Protection Agency).  2000d. Risk Characterization Handbook.  Science Policy
Council. EPA-100-B-00-002. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2000e. Policy and Program Requirements for the
Mandatory Agency-Wide Quality System. EPA Order, Classification Number 5360.1 A2.

EPA (U.S. Environmental Protection Agency). 2000f.  EPA Quality Manual for Environmental Programs.
5360A1.

EPA (U.S. Environmental Protection Agency). 2001. Proposed Agency Strategy for the Development of
Guidance on Recommended Practices in Environmental Modeling. Model Evaluation Action Team,
Council for Regulatory Environmental Modeling. Washington, D.C.: U.S. Environmental Protection
Agency.

EPA (U.S. Environmental Protection Agency). 2002a. Information Quality Guidelines. Office of
Environmental Information. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2002b. Quality Assurance Project Plans for Modeling. EPA
QA/G-5M. Office of Environmental Information.

EPA (U.S. Environmental Protection Agency). 2002c. Guidance on Choosing a Sampling Design for
Environmental Data Collection for Use in Developing a Quality Assurance Plan. EPA QA/G-5S.
Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2002d. Guidance on Environmental Data Verification and
Data Validation. EPA QA/G-8. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2003a. Revision to guideline on air quality models:
adoption of a preferred long range transport model and other revisions.  Federal Register 68 (72): 18440-
18482.

EPA (U.S. Environmental Protection Agency). 2003b. A Summary of General Assessment Factors for
Evaluating the Quality of Scientific and Technical Information. Science Policy Council. Washington, D.C.:
U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2006. Peer Review Handbook. 3rd ed. EPA-100-B-06-002.
Prepared for the U.S. Environmental Protection Agency by members of the Peer Review Advisory Group,
for EPA's Science Policy Council. Washington, D.C: U.S. Environmental Protection Agency. Available:
http://epa.gov/peerreview/pdfs/Peer%20Review%20HandbookMay06.pdf [accessed Nov. 10, 2006].

Fox, D.G. 1981.  Judging air quality model performance: a summary of the AMS workshop on dispersion
model performance. Bull. Amer. Meteor. Soc. 62: 599-609.

Frey, H.C. 2002.  Guest editorial:  introduction to special section on sensitivity analysis and summary of
NCSU/USDA Workshop on Sensitivity Analysis. Risk Analysis. 22: 539-546.

Hanna, S.R.  1988. Air quality model evaluation and uncertainty. Journal of the Air Pollution Control
Association 38: 406-442.

Hanna, S.R. 1993.  Uncertainties in air quality model predictions. Boundary-Layer Met. 62: 3-20.

Hillborn, R., and M. Mangel. 1997. The Ecological Detective: Confronting Models with Data. Princeton,
New Jersey: Princeton University Press.

Kernighan, B.W., and P.J. Plauger. 1988. The Elements of Programming Style. 2nd ed.

Konikow, L.F., and J.D. Bredehoeft. 1992. Ground water models cannot be validated. Advances in Water
Resources 15(1): 75-83.

Levin, S. 1992. The problem of pattern and scale in ecology. Ecology 73: 1943-1967.

Luis, S.J., and D.B. McLaughlin. 1992. A stochastic approach to model validation. Advances in Water
Resources 15(1): 75-83.

Manno, J., R. Smardon, J.V. DePinto, E.T. Cloyd, and S. Del Grando. 2008.  The Use of Models in Great
Lakes Decision Making: An Interdisciplinary Synthesis. Randolph G. Pack Environmental Institute,
College of Environmental Science and Forestry. Occasional Paper 16.

Morgan, G., and M. Henrion. 1990. The nature and sources of uncertainty. In: Uncertainty: A Guide to
Dealing With Uncertainty in Quantitative Risk and Policy Analysis. Cambridge, U.K.: Cambridge
University Press, pp. 47-72.

NRC (National Research Council). 2001. Assessing the TMDL Approach to Water Quality Management,
Committee to Assess the Scientific Basis of the Total Maximum Daily Approach to Water Pollution
Reduction. Water Science and Technology Board, Division of Earth and Life Studies. Washington, D.C.:
National Academies Press.

NRC (National Research Council). 2007. Models in Environmental Regulatory Decision Making.
Committee on Models in the Regulatory  Decision Process, Board on Environmental Studies and
Toxicology, Division on Earth and Life Studies. Washington, D.C.: National Academies Press.

Platt, J.R. 1964. Strong inference. Science 146: 347-352.

Press, W.H. 1992. Numerical Recipes: The Art of Scientific Computing. Cambridge, U.K.: Cambridge
University Press.

Reckhow, K.H. 1994.  Water quality simulation modeling and uncertainty analysis for risk assessment and
decision making. Ecological Modeling 72: 1-20.

SAB (Science Advisory Board). 1988. Review of the Underground Storage Tank (UST) Release
Simulation Model. SAB-EEC-88-029. Environmental Engineering Committee. Washington, D.C.: U.S.
Environmental Protection Agency.

SAB (Science Advisory Board). 1989. Resolution on the Use of Mathematical Models by EPA for
Regulatory Assessment and Decision-Making. EPA-SAB-EEC-89-012. Washington, D.C.: U.S.
Environmental Protection Agency.

SAB (Science Advisory Board). 1993a.  Review of Draft Agency Guidance for Conducting External Peer
Review of Environmental Regulatory Modeling. EPA-SAB-EEC-LTR-93-008. Washington, D.C.: U.S.
Environmental Protection Agency.

SAB (Science Advisory Board). 1993b. An SAB Report: Review of MMSoils Component of the Proposed
RIA for the RCRA Corrective Action Rule. EPA-SAB-EEC-94-002. Washington, D.C.: U.S. Environmental
Protection Agency.

Saltelli, A., S. Tarantola, and F. Campolongo. 2000a. Sensitivity analysis as an ingredient of modeling.
Statistical Science 15: 377-395.

Saltelli, A., K. Chan, and M. Scott, eds.  2000b. Sensitivity Analysis. New York: John Wiley and Sons.

Saltelli, A. 2002. Sensitivity analysis for importance assessment. Risk Analysis 22: 579-590.

Sargent, R.G. 2000.  Verification, validation and  accreditation of simulation models. In: J.A. Joines et al.,
eds.  Proceedings of the 2000 Winter Simulation  Conference.

Scheffe, R.D., and R.E. Morris. 1993. A review of the development and application of the Urban Airshed
model. Atmos. Environ. B-Urb. 27(1): 23-39.

Shelly, A., D. Ford, and B. Beck. 2000.  Quality Assurance of Environmental Models.  NRCSE Technical
Report Series.

Small, M.J., and P.S. Fishbeck. 1999. False precision in Bayesian updating with incomplete models.
Human and Ecological Risk Assessment 5(2): 291-304

Suter, G.W.I. 1993. Ecological Risk Assessment. Boca Raton: Lewis Publishers. 528.

Usunoff, E., J. Carrera, and S.F. Mousavi. 1992.  An approach to the design of experiments for
discriminating among alternative conceptual models. Advances in Water Resources 15(3): 199-214.

Weil, J.C., R.I. Sykes, and A. Venkatram. 1992.  Evaluating  air-quality models: review and outlook. J.
Appl. Meteor. 31: 1121-1145.