EPA/100/K-09/003 | March 2009
                                      www.epa.gov/crem
United States
Environmental Protection
Agency
             Guidance on the Development,
             Evaluation, and Application of
             Environmental  Models
Office of the Science Advisor
Council for Regulatory Environmental Modeling

-------
                                             EPA/100/K-09/003
                                                 March 2009
                                       Office of the Science Advisor
Guidance on the Development, Evaluation, and
      Application of Environmental Models
             Council for Regulatory Environmental Modeling
                U.S. Environmental Protection Agency
                    Washington, DC 20460

-------
Preface

This Guidance on the Development, Evaluation, and Application of Environmental Models was prepared in
response to a request by the U.S. Environmental Protection Agency  (EPA) Administrator that EPA's
Council for  Regulatory  Environmental  Modeling (CREM)  help  continue  to  strengthen the  Agency's
development, evaluation, and use of models (http://www.epa.gov/osp/crem/library/whitman.PDF).

A draft version of this document (http://cfpub.epa.gov/crem/crem_sab.cfm) was reviewed by an
independent panel of experts established by EPA's Science Advisory Board and revised by CREM in
response to the panel's comments.

This final document is available in printed and electronic form.  The electronic version provides direct links
to the references identified in the document.
Disclaimer

This document provides guidance to those who develop, evaluate, and apply environmental models. It
does not impose legally binding requirements; depending on the circumstances, it may not apply to a
particular situation. The U.S. Environmental Protection Agency (EPA) retains the discretion to adopt, on a
case-by-case basis, approaches that differ from this guidance.

-------
Authors, Contributors, and Reviewers
This  document was developed under the leadership of EPA's Council  for Regulatory Environmental
Modeling. A number of people representing EPA's core, program, and regional offices helped write and
review it.

PRINCIPAL AUTHORS:

Council for Regulatory Environmental Modeling Staff:
Noha Gaber, Gary Foley, Pasky Pascual, Neil Stiber, Elsie Sunderland

EPA Region 10:
Ben Cope

Office of Environmental Information:
Annett Mold  (deceased)

Office of Solid Waste and Emergency Response:
Zubair Saleem

CONTRIBUTORS AND INTERNAL REVIEWERS:

EPA Core Offices:
Office of Research and Development:
Justin Babendreier, Thomas Barnwell (retired), Ed  Bender, Lawrence Burns (retired), Gary Foley, Kathryn
Gallagher, Kenneth Galluppi, Gerry Laniak, Haluk Ozkaynak,  Kenneth  Schere, Subhas  Sikdar,  Eric
Weber, Joe Williams

Office of Environmental Information:
Ming Chang, Reggie Cheatham, Evangeline Cummings, Linda Kirkland, Nancy Wentworth

Office of General Counsel:
James Nelson  (retired), Barbara Pace, Quoc Nguyen, Manisha Patel, Carol Ann Sicilano

Science Advisory Board:
Jack Kooyoomjian

EPA Program Offices:
Office of Air and Radiation:
Tyler Fox, John Irwin (retired), Joe Tikvart, Richard (Chet) Wayland, Jason  West

Office of Prevention, Pesticides and Toxic Substances:
Lynn Delpire,  Alan  Dixon, Wen-Hsiung  Lee, David  Miller, Vince Nabholz,  Steve  Nako,  Neil Patel,
Randolph Perfetti (retired), Scott Prothero, Donald  Rodier

-------
Office of Solid Waste and Emergency Response:
Peter Grevatt, Lee Hofmann, Stephen Kroner (retired), Larry Zaragoza

Office of Water:
Jim Carleton, Sharon E. Hayes, Marjorie Wellman, Denise Keehner, Lauren Wisniewski, Lisa McGuire,
Mike Messner, James F. Pendergast

EPA Regional Offices:
Region 1:
Brian Hennessey, Michael Kenyon

Region 2:
Kevin Bricke, Rosella O'Connor, Richard Winfield

Region 3:
Alan Cimorelli

Region 4:
Nancy Bethune, Brenda Johnson, Tim Wool

Region 5:
Bertram Frey, Arthur Lubin, Randy Robinson, Stephen Roy, Mary White

Region 6:
James Yarborough

Region 7:
Bret Anderson

Region 10:
David Frank (retired), John Yearsley (retired)

-------
Preface                                                                                   ii

Disclaimer                                                                                ii

Authors, Contributors, and Reviewers                                                       iii

Executive Summary                                                                       vii

1.              INTRODUCTION
1.1             Purpose and Scope of This Document                                          1
1.2             Intended Audience                                                           2
1.3             Organizational Framework                                                    2
1.4             Appropriate Implementation of This Document                                    3

2.              MODELING FOR ENVIRONMENTAL DECISION SUPPORT
2.1             Why Are Models Important?                                                   4
2.2             The Modeling Life-Cycle                                                      5

3.              MODEL DEVELOPMENT
3.1             Introduction                                                                 8
3.2             Problem Specification and Conceptual Model Development                        9
3.2.1           Define the Objectives                                                         9
3.2.2           Determine the Type and Scope of Model Needed                                 9
3.2.3           Determine Data Criteria                                                       9
3.2.4           Determine the Model's Domain of Applicability                                   10
3.2.5           Discuss Programmatic Constraints                                            10
3.2.6           Develop the Conceptual Model                                                10
3.3             Model Framework Selection and Development                                   11
3.3.1           Model Complexity                                                           12
3.3.2           Model Coding and Verification                                                14
3.4             Application Tool Development                                                15
3.4.1           Input Data                                                                 16
3.4.2           Model Calibration                                                           17

4.              MODEL EVALUATION
4.1             Introduction                                                                19
4.2             Best Practices for Model Evaluation                                            21
4.2.1           Scientific Peer Review                                                       23
4.2.2           Quality Assurance Project Planning and Data Quality Assessment                  25
4.2.3           Corroboration, Sensitivity Analysis, and Uncertainty Analysis                      26
4.2.3.1         Types of Uncertainty                                                         26
4.2.3.2         Model Corroboration                                                         29
4.2.3.3         Sensitivity and Uncertainty Analysis                                            31
4.3             Evaluating Proprietary Models                                                31
4.4             Learning From Prior Experiences — Retrospective Analyses of Models             32
4.5             Documenting the Model Evaluation                                            33
4.6             Deciding Whether to Accept the Model for Use in Decision Making                  34

5.              MODEL APPLICATION
5.1             Introduction                                                                35
5.2             Transparency                                                               37
5.2.1           Documentation                                                              37
5.2.2           Effective Communication                                                     38
5.3             Application of Multiple Models                                                39
5.4             Model Post-Audit                                                            39

-------
APPENDICES
              Appendix A:  Glossary of Frequently Used Terms                            41
              Appendix B:  Categories of Environmental Regulatory Models                 49
              Appendix C:  Supplementary Material on Quality Assurance Planning and Protocols      56
              Appendix D:  Best Practices for Model Evaluation                            60

Literature Cited                                                                       77

-------

Executive Summary

In pursuing its mission to protect human health and to safeguard the natural environment, the U.S.
Environmental Protection Agency often relies on environmental models. In this guidance,  a model is
defined as a "simplification of reality that is constructed to gain insights into select attributes of a particular
physical, biological, economic, or social system."

This guidance provides recommendations for the effective development, evaluation, and use of models in
environmental  decision  making   once  an  environmental  issue  has  been   identified.    These
recommendations are drawn  from Agency white papers,  EPA Science Advisory Board reports, the
National Research Council's Models in Environmental Regulatory Decision Making, and peer-reviewed
literature. For organizational simplicity, the  recommendations are categorized into three sections: model
development, model evaluation, and model application.

Model development can be viewed as a process with three main steps: (a) specify the environmental
problem (or set of issues) the  model is intended to address and  develop the conceptual  model, (b)
evaluate or develop the model framework (develop the mathematical  model), and (c)  parameterize the
model to develop the application tool.

Model evaluation is the  process for generating information over the life cycle of the project that helps
determine whether a model and its analytical results are  of sufficient quality to serve as the basis for a
decision.   Model quality  is an attribute that is meaningful  only  within the  context of  a specific model
application.   In  simple  terms,  model evaluation provides  information to  help answer  the following
questions: (a) How have the principles of sound science been addressed during model development? (b)
How is the choice of model supported by the quantity and quality  of available data? (c) How closely does
the model approximate the real system of interest? (d) How well does the model perform the specified
task while meeting the objectives set by quality assurance  project  planning?

Model application (i.e., model-based decision making) is strengthened when the science underlying the
model is transparent.  The elements of transparency emphasized  in this guidance are (a) comprehensive
documentation of all aspects of a modeling project (suggested as a  list of elements relevant to any
modeling project) and (b) effective communication  between modelers, analysts, and decision  makers.
This approach  ensures that  there is a clear rationale  for using  a  model  for a specific regulatory
application.

This guidance recommends best practices to help determine when a model, despite its uncertainties, can
be appropriately used to inform a decision. Specifically, it recommends that model developers and users:
(a) subject their model to credible, objective peer review;  (b) assess the quality of the data they use; (c)
corroborate their model by evaluating the degree to which it corresponds to the system being modeled;
and (d) perform sensitivity and uncertainty analyses.  Sensitivity analysis evaluates the effect of changes
in input values or assumptions on a model's results.  Uncertainty  analysis investigates the effects of lack
of knowledge and other potential sources of error in the model  (e.g., the "uncertainty" associated with
model parameter values). When conducted in combination,  sensitivity and uncertainty analysis allow
model users to  be more informed about the confidence that can  be placed in model results.  A model's
quality to support a decision becomes better known when information is available to assess these factors.

-------
1.     Introduction
1.1     Purpose and Scope of This Document

The U.S. Environmental Protection Agency (EPA) uses a wide range of models to inform decisions that
support its mission of protecting human health and safeguarding the natural environment — air, water,
and land — upon which life depends.  These models include atmospheric and indoor air models, ground
water and surface  water  models, multimedia models, chemical equilibrium  models,  exposure models,
toxicokinetic models, risk assessment models, and economic models. These models range from simple to
complex and may employ  a combination of scientific, economic, socio-economic, or other types of data.

As stated in the National  Research  Council (NRC) report Models in Environmental Regulatory Decision
Making, models are critical to regulatory decision  making because the spatial  and temporal scales linking
environmental controls and environmental quality generally do not allow for an observational approach to
understand the relationship  between economic activity and environmental quality (NRC 2007). Models
have a long history of helping to explain  scientific phenomena  and  predict  outcomes and  behavior in
settings where empirical observations are limited or not available.

This guidance uses the NRC report's definition of a model:

       A simplification of reality that is constructed to gain insights into select attributes of
       a particular physical, biological, economic, or social system.

In particular, this guidance focuses on the subset of all models termed "computational models" by the
NRC. These are models that use measurable variables, numerical inputs, and mathematical relationships
to  produce quantitative outputs. (Note that all  terms underlined  in  this document  are  defined in the
Glossary, Appendix A).

As models become increasingly significant in decision making, it is important that the model development
and  evaluation  processes conform to  protocols or standards  that help ensure the  utility, scientific
soundness, and defensibility of the  models and their outputs for decision making.  It  is also increasingly
important to  plan and manage the  process of using models to inform decision making (Manno et al.
2008).  This guidance document aims to facilitate  a widespread understanding of the processes for model
development,  evaluation,  and application and thereby promote their appropriate application  to support
informed decision making.   Recognizing  the diversity of modeling applications throughout the Agency,
the  principles and  practices described in the guidance apply generally to  all  models used  to inform
Agency decisions, regardless  of domain,  mode,  conceptual basis,  form, or rigor level (i.e., varying from
screening-level applications to complex analyses) (EPA 2001).  The principles presented in this guidance
are also applicable to models not used for regulatory purposes, as experience has shown that models
developed for research and development have often found useful applications in environmental
management.

This guidance presents recommendations  drawn  from Agency  white papers on environmental modeling,
EPA Science Advisory Board (SAB)  reports,  NRC's  Models in Environmental Regulatory Decision
Making, and the peer-reviewed  literature.  It provides an overview of best  practices for ensuring and
evaluating the quality of environmental models.

-------
These  practices complement the  systematic QA planning process for  modeling  projects outlined in
existing guidance (EPA 2002b).  These QA processes produce documentation supporting the quality of
the model development and application process (Appendix C, Box C1:  Background on  EPA Quality
System). For example, QA plans should contain performance criteria ("specifications") for a model in the
context of its intended use, and these criteria should be developed at the onset of each project.  During
the model evaluation process, these criteria are subjected to a series of tests of model quality  ("checks").
Documentation of these specifications and the evaluation results provides a record of how well a model
meets its intended use and the basis for a decision on model acceptability.

The primary purpose of this guidance is to provide specific advice on how to best perform these "checks"
during  model development, evaluation, and  application.  Following the best practices emphasized in this
document, together with well-documented QA project plans, will help ensure that results of modeling
projects and the decisions informed by them heed the principles of the Agency's  Information Quality
Guidelines (EPA 2002a).

1.2    Intended Audience

This document is intended  for a wide range of audiences,  including  model  developers, computer
programmers, model  users,  policy makers  who work with models, and affected stakeholders.  Model
users include those who generate model output (i.e., who set  up,  parameterize, and run  models)  and
managers who use model outputs.

1.3    Organizational Framework

The main body of this document provides an overview of principles  of good modeling for all users.  The
appendices present technical information and examples that may be more appropriate for specific user
groups. For organizational simplicity, the main body of this guidance has separate chapters on the three
key topics: model development, model evaluation, and model application. However, it is important to note
that these three topics are not strictly sequential. For example, the process of evaluating a model and its
input data to ensure their quality  should be undertaken  and documented during  all stages of model
development and application.

Chapter 1 serves as a general introduction and outlines the scope of this guidance. Chapter 2 discusses
the role of models in environmental decision making. Figure 1 at the  end of Chapter 2 shows the steps in
the model development and application process and the  role that models  play  in the public policy
process.  Chapters 3 and 4 provide guidance on model development (including problem specification)
and  model evaluation, respectively.   Finally,  Chapter 5 recommends  practices  for most  effectively
incorporating information from environmental models into the Agency's policy or regulatory decisions.

Several appendices present more detailed technical information and examples that complement the
chapters.  Appendix A provides definitions for all underlined terms in this guidance, and Appendix B
summarizes the categories of models that are integral to environmental regulation. Appendix C presents
additional background information on the QA program and other relevant topics.  Appendix D presents
an overview of best practices that may be used to evaluate models, including  more detailed information
on the peer review process for models and specific technical guidance on tools for model evaluation.

-------
1.4    Appropriate Implementation of This Document

The principles and  practices described in this guidance are designed to apply generally to all types of
models; however, EPA program and regional offices may modify the recommendations,  as appropriate
and necessary to the specific modeling project and application.  Each  EPA office is responsible for
implementing the best practices described in a manner appropriate to meet its needs.

As indicated by the use of non-mandatory  language such as "may," "should," and "can," this  document
provides recommendations and suggestions and does  not create legal rights or impose  legally binding
requirements on EPA or the public.

The Council for Regulatory Environmental Modeling has also developed the Models Knowledge Base —
a Web-based inventory of information on models used in EPA — as a companion product to complement
this document.  This inventory provides convenient access to standardized documentation  on the models'
development, scientific basis, user requirements, evaluation studies, and application examples.

-------
2.     Modeling for Environmental Decision Support

2.1     Why Are Models Important?

This guidance defines a model as "a simplification of reality that is constructed to gain insights into
select attributes of a particular physical,  biological,  economic, or social system."   A model
developer sets boundary conditions and determines which aspects of the system are to be  modeled,
which processes are important, how these  processes may be represented  mathematically, and what
computational methods to use in implementing the mathematics.  Thus, models are based on simplifying
assumptions and cannot completely replicate the complexity inherent in environmental systems. Despite
these limitations, models  are essential  for a variety of purposes in the environmental field. These
purposes tend to fall  into two categories:

•   To diagnose (i.e., assess what happened) and examine causes and precursor conditions (i.e., why it
    happened) of events that have taken place.
•   To forecast outcomes and future events (i.e., what will happen).

Whether applied to current conditions or envisioned future circumstances, models play an important role
in environmental management.  They are an important tool to analyze environmental and human health
questions  and characterize  systems that are  too complex to  be addressed solely through  empirical
means.

Models can be classified in various ways (see Appendix B) — for example,  based on their conceptual
basis and mathematical solution, the purpose for which they were developed and are applied, the domain
or discipline to which they apply, and the  level of resolution and complexity at which they operate. Three
categories of regulatory models have been identified based on their purpose or application (CREM 2001):

•   Policy analysis. The results of policy analysis models affect national policy decisions. These models
    are used to set policy for large, multi-year programs or concepts — for example, national policy on
    acid rain and phosphorus reduction in the Great Lakes.
•   National regulatory decision making.  These models inform national  regulatory decision making
    after  overall  policy has  been  established.  Examples include the use of a  model to  assist  in
    determining  federal regulation  of a  specific  pesticide or  to aid in establishing national effluent
    limitations.
•   Implementation applications.  These models are used in situations where policies and regulations
    have already been made. Their development and use may be driven by court-ordered schedules and
    the need for local action.

Environmental models are one source of information for Agency decision makers who need to consider
many competing objectives.  A number  of EPA programs make  decisions  based on information from
environmental modeling applications. Within the Agency:

•   Models are  used to simulate many different processes,  including natural (chemical, physical, and
    biological) systems, economic phenomena,  and decision processes.
•   Many types  of models are employed, including economic, behavioral,  physical, engineering design,
    health, ecological, and fate/transport models.

-------
•   The geographic scale of the problems addressed by a model can vary from national scale to  an
    individual site. Examples of different scales include:
    •   National air quality models used in decisions about emission requirements.
    •   Watershed-scale water quality models used in decisions about permit limits for point sources.
    •   Site-scale human  health risk  models  used  in  decisions  about  hazardous  waste  cleanup
       measures.

Box 1:  Examples of EPA Web Sites Containing Model Descriptions for Individual Programs

National Environmental Research Laboratory Models: http://www.epa.gov/nerl/topics/models.html
Atmospheric Sciences Modeling Division: http://www.epa.gov/asmdnerl/index.html
Office of Water's Water Quality Modeling: http://www.epa.gov/waterscience/wqm
Center for Subsurface Modeling Support: http://www.epa.gov/ada/csmos.html
National Center for Computational Toxicology: http://www.epa.gov/ncct
Support Center for Regulatory Atmospheric Modeling: http://www.epa.gov/scram001/aqmindex.htm

Models also  have useful  applications  outside the regulatory  context.   For example, because models
include explicit mathematical  statements about system  mechanics, they serve  as research tools for
exploring  new  scientific issues  and screening  tools for simplifying and/or refining  existing scientific
paradigms or software (SAB 1993a, 1989). Models can also help users study the behavior of ecological
systems, design field studies, interpret data, and generalize results.

2.2     The Modeling Life-Cycle

The process  of developing and applying a model to  address a specific decision making need generally
follows the iterative progression described in Box 2 and depicted in Figure 1.  Models are used to address
real or perceived  environmental problems.  Therefore, a modeling process (i.e., model  development,
evaluation, and application described in chapters 3,  4, and 5,  respectively) is initiated after the Agency
has identified an environmental problem and determined that model results could provide useful input for
an Agency decision.

Problem identification will be most successful if it involves all parties who would be involved in model
development and use (i.e., model developers, intended users, and decision makers).  At a  minimum, the
Agency should develop a relatively simple, plain English problem identification statement.

-------
Box 2: Basic Steps in the Process of Modeling for Environmental Decision Making
(modified from Box 3-1, NRC Report on Models in Environmental Regulatory Decision Making)

Problem identification and specification: to determine the right decision-relevant questions and
establish modeling objectives
•   Definition of model purpose: Goal; Decisions to be supported; Predictions to be made
•   Specification of modeling context: Scale (spatial and temporal); Application domain; User community;
    Required inputs; Desired output; Evaluation criteria

Model development: to develop the conceptual model that reflects the underlying science of the
processes being modeled, and develop the mathematical representation of that science and encode
these mathematical expressions in a computer program
•   Conceptual model formulation: Assumptions (dynamic, static, stochastic, deterministic); State
    variables represented; Level of process detail necessary; Scientific foundations
•   Computational model development: Algorithms; Mathematical/computational methods; Inputs;
    Hardware platforms and software infrastructure; User interface; Calibration/parameter determination;
    Documentation

Model evaluation: to test that the model expressions have been encoded correctly into the computer
program and test the model outputs by comparing them with empirical data
•   Model testing and revision: Theoretical corroboration; Model components verification; Corroboration
    (independent data); Sensitivity analysis; Uncertainty analysis; Robustness determination; Comparison
    to evaluation criteria set during formulation

Model application: running the model and analyzing its outputs to inform a decision
•   Model use: Analysis of scenarios; Predictions evaluation; Regulations assessment; Policy analysis
    and evaluation; Model post-auditing

-------
[Figure 1. Steps in the model development and application process and the role of models in the public
policy process, including regulatory agency decisions, implementation, administrative (OMB) and
regulatory interpretation of model results and uncertainties, and stakeholder involvement.]
-------
3.     Model Development

Summary of Recommendations for Model Development

•   Communication between model developers and model users is crucial during model development.
•   Each element of the conceptual  model should be clearly described (in words, functional expressions,
    diagrams,  and graphs,  as necessary), and  the  science  behind  each element should be clearly
    documented.
•   When possible, simple competing conceptual models/hypotheses should be tested.
•   Sensitivity analysis should  be used early and often.
•   The optimal level of model complexity should  be determined by making appropriate tradeoffs among
    competing  objectives.
•   Where possible, model parameters should be characterized using direct measurements of sample
    populations.
•   All input data should meet data quality acceptance criteria in the QA project plan for modeling.

3.1     Introduction

Model  development  begins after problem identification —  i.e., after the Agency has  identified an
environmental  problem it needs to address and has determined that models may provide useful input for
the  Agency decision making needed to address the problem (see Section  2.2). In  this  guidance, model
development comprises the steps involved in  (1)  confirming whether a model is, in fact, a  useful tool to
address the  problem; what type of model would be most useful; and whether an existing model can be
used for this  purpose; as well as (2) developing an appropriate model if one does not already exist. Model
development sets the stage for model evaluation (covered in Chapter 4), an ongoing process in which the
Agency evaluates the appropriateness of the  existing or new model to help address the environmental
problem.

Model  development can  be  viewed  as a process with three main steps:  (a) specify the environmental
problem (or  set  of issues) the model is intended to address and develop the conceptual model, (b)
evaluate or develop the model framework (develop the mathematical  model), and (c) parameterize the
model  to develop the application tool. Sections 3.2, 3.3, and 3.4 of this chapter, respectively,  describe
the  various aspects and considerations involved in implementing each of these steps.

As  described below,  model development is a collaborative effort involving model  developers,  intended
users, and decision makers (the "project team"). The perspective and skills of each group are important to
develop a model that will provide  an  appropriate, credible, and defensible basis for addressing the
environmental issue of concern.

A "graded approach" should be used throughout the model development process. This involves  repeated
examination of the scope, rigor, and complexity of the modeling analysis in light of the intended use of the
results, the degree of confidence needed in the results, and Agency resource constraints.

-------
3.2    Problem Specification and Conceptual Model Development

Problem  specification,  culminating in development of the conceptual model,  involves an  iterative,
collaborative effort among model developers, intended users, and decision  makers (the project team) to
specify all aspects of the problem that  will  inform subsequent  selection  or development of a model
framework. Communication between model developers and model users is crucial to clearly establish the
objectives of the  modeling  process; ambiguity at this stage  can undermine the chances  for success
(Manno et al. 2008).

During problem specification, the project team defines the regulatory or research objectives, the type and
scope of model best suited to meet those objectives, the data criteria, the model's domain of applicability,
and any programmatic constraints. These considerations provide the basis for developing a conceptual
model, which depicts or describes the most important behaviors of the system, object, or process relevant
to the problem of  interest. Problem specification and the resulting conceptual model define the modeling
needs sufficiently  that the project team can  then  determine whether an existing model can  be used to
meet those needs or whether a new model should  be developed.

       3.2.1    Define the Objectives

The first step in  problem specification is to  define the regulatory or research objectives (i.e., what
questions  the  model needs to answer).   To  do  so, the team should develop  a written statement of
modeling  objectives that includes the state  variables  of  concern, the stressors  driving  those state
variables,  appropriate temporal  and spatial  scales, and  the degree of model accuracy and precision
needed.

       3.2.2   Determine the Type and Scope of Model Needed

Many different types of models  are available, including  empirical vs. mechanistic, static vs. dynamic,
simulation vs.  optimization,  deterministic vs. stochastic,  and  lumped vs. distributed. The project team
should discuss  and compare alternatives with respect to their ability to meet the objectives in order to
determine the most appropriate type of model for addressing the problem.

The scope (i.e.,  spatial, temporal and process  detail)  of models  that can be used for  a particular
application can  range from very simple to very complex depending on the problem specification and data
availability, among other factors.  When different types of  models  may be appropriate for solving different
problems,  a  graded approach should be used to select or develop models that will provide the  scope,
rigor, and  complexity appropriate to the intended  use of and confidence needed in the  results.  Section
3.3.1 provides more information on considerations regarding model complexity.

       3.2.3   Determine Data Criteria

This step  includes developing data quality objectives (DQOs) and specifying the acceptable range of
uncertainty.  DQOs (EPA 2000a) provide  specifications  for model quality and associated checks (see
Appendix  C,  Box C1:  Background on EPA Quality System).  Well-defined  DQOs guide the design of
monitoring plans  and the model  development process (e.g.,  calibration  and verification). The DQOs
provide guidance  on how to  state data  needs when limiting  decision errors (false positives or false

-------
negatives) relative to a given decision.1 The DQOs should include a statement about the acceptable level
of total uncertainty that will still enable model results to be used for the intended purpose (Appendix C,
Box C2: Configuration Tests Specified in the QA Program). Uncertainty describes the lack of knowledge
about models, parameters, constants, data, and beliefs. Defining the ranges of acceptable uncertainty —
either qualitatively  or quantitatively — helps project  planners  generate  "specifications"  for quality
assurance planning and partially determines the appropriate boundary conditions and complexity for the
model being developed.
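
To make the idea of a quantitative "check" concrete, the short sketch below (in Python) tests whether
paired model predictions and observations meet a hypothetical acceptance criterion; the 30 percent
threshold, the use of the median relative error, and the data values are invented for illustration and are
not Agency specifications.

    import numpy as np

    def meets_dqo(predicted, observed, max_relative_error=0.30):
        # Hypothetical DQO check: the median absolute relative error of paired
        # predictions and observations must not exceed the acceptance threshold
        # agreed on during QA project planning.
        predicted = np.asarray(predicted, dtype=float)
        observed = np.asarray(observed, dtype=float)
        relative_error = np.abs(predicted - observed) / np.abs(observed)
        return np.median(relative_error) <= max_relative_error

    # Paired model predictions and field observations (illustrative values only)
    print(meets_dqo([2.1, 3.8, 5.2, 7.9], [2.0, 4.0, 5.0, 9.0]))  # prints True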

       3.2.4   Determine the Model's Domain of Applicability

To select an appropriate model, the project team must understand the model's domain of applicability —
i.e., the set of conditions under which use of the model is scientifically defensible and the relevant
characteristics of the system to be  modeled.  This involves identifying the environmental domain to be
modeled and then specifying the processes and conditions within that domain, including the transport and
transformation processes relevant to the policy/management/research objectives, the important time and
space scales inherent in transport and transformation processes within that domain in comparison to the
time and space scales of the problem objectives, and any peculiar conditions of the domain that will affect
model selection or new model construction.

       3.2.5   Discuss Programmatic Constraints

At this stage,  the project  team also needs  to  consider any factors that could constrain the modeling
process. This discussion should include considerations of time and budget, available data or resources to
acquire more data, legal and institutional factors, computer resource constraints, and the experience and
expertise of the modeling staff.

       3.2.6   Develop the Conceptual Model

A conceptual model depicts or describes the most important behaviors of the system, object, or process
relevant to the problem of interest. In developing the conceptual  model, the model developer may
consider literature, fieldwork, applicable anecdotal evidence, and relevant historical modeling projects.
The developer should clearly describe (in words, functional expressions, diagrams, and/or graphs) each
element of the conceptual model and should document the science behind each element (e.g., laboratory
experiments, mechanistic evidence,  empirical data supporting the hypothesis,  peer-reviewed literature) in
mathematical form, when possible. To the  extent feasible, the modeler should also provide information
on assumptions, scale, feedback mechanisms, and static/dynamic behaviors.  When relevant, the
strengths and weaknesses of each constituent hypothesis should be described.
1 False rejection decision errors (false positives) occur when the null hypothesis (or baseline condition) is incorrectly
rejected based on the sample data. The decision is made assuming the alternate condition or hypothesis to be true
when in reality it is false.  False acceptance decision errors (false negatives) occur when the null hypothesis (or
baseline  condition) cannot be rejected based on the available sample data.  The decision is made assuming the
baseline condition is true when in reality it is false.

-------
3.3    Model Framework Selection or Development

Once the team has specified the problem and type of model needed to address the problem, the next
step is to identify or develop a model framework that meets those specifications. A model framework is a
formal mathematical specification  of the concepts,  procedures, and  behaviors underlying the system,
object, or process relevant to the problem of interest, usually translated into computer software.

For mechanistic modeling of common environmental problems, one or more suitable  model frameworks
may exist.   Many  existing model frameworks  in the public domain  can  be used  in  environmental
assessments.  Several institutions, including EPA, develop and maintain these model frameworks on an
ongoing basis. Ideally, more than one model framework will meet the project needs, and the project team
can select the best model for the specified problem.  Questions to consider when evaluating existing
model frameworks are described below.

Sometimes no  model frameworks are  appropriate to the task, and  EPA  will  develop  a new  model
framework or modify an  existing framework to include the additional capabilities needed to address the
project needs.

Some assessments require linking multiple model frameworks, such that the output from one model is
used  as  input data to another  model.   For example, air quality modeling often  links meteorological,
emissions, and  air chemistry/transport models. When  employing linked models, the project team should
evaluate each  component model, as well as the full system  of integrated models,  at each stage of the
model development and evaluation process.

In all cases, the documentation for the selected model should clearly state why and how the  model can
and will be used.

As potential model frameworks are identified or developed for addressing the problem,  the project team
will need to consider several issues, including:

•   Does  sound  science (including peer-reviewed  theory  and equations) support the  underlying
    hypothesis?
•   Is the model's complexity appropriate for the problem at hand?
•   Do the quality and quantity of data support the choice of model?
•   Does the model structure reflect all the relevant inputs described in the conceptual model?
•   Has the model code been developed? If so, has it been verified?

It is recommended that the evaluation process apply the principles of scientific hypothesis testing (Platt
1964) using an iterative  approach (Hilborn and Mangel 1997). If the team is evaluating multiple  model
frameworks, it may  be useful to statistically compare  the  performance of these competing models with
observational, field, or laboratory data (Chapter 4).

Box 3: Example of Model  Selection Considerations: Arsenic in Drinking Water
(from Box 5-3 of NRC's Models in Environmental Regulatory Decision Making)
A major challenge for regulatory model applications is deciding which model to use to inform the decision making process. In
this example, several models were available to  estimate the cancer incidence  associated  with different levels of
arsenic in drinking water. These models differed according to how age and exposure were incorporated (Morales et

-------
al. 2000).  All the models assumed that the number of cancers observed in a specific age group of a particular village
followed a Poisson model with parameters depending on the age and village exposure level. Linear, log, polynomial,
and spline models for age and exposure were considered.

These various  models differed substantially in their fitted values, especially in the critical low-dose area, which is so
important for establishing the benchmark dose  (BMD) that is  used to set a reference dose (RfD). The fitted-dose
response model was also  strongly affected by whether Taiwanese population data were  included  as  a baseline
comparison  group.  Depending  on  the  particular modeling  assumptions used, the estimates of the BMD  and
associated lower limit (BMDL) varied by over an order of magnitude.

Several strategies are available for choosing among  multiple models. One strategy is to pick the "best" model — for
example, use one of the popular statistical  goodness of fit measures, such  as the Akieke (sic) information criterion
(AIC) or the Bayesian information criterion (BIC). These approaches correspond to picking the model that maximizes
log-likelihood,  subject to a penalty function reflecting  the number of model parameters, thus  effectively forcing a
trade-off between improving model fit by adding additional model parameters versus having a parsimonious
description. In  the case of the arsenic risk assessment, however,  the noisiness of the data meant that many of the
models explored by Morales et  al. (2000)  were relatively similar in terms of statistical goodness-of-fit criteria.  In a
follow-up paper, Morales et al. (2006) argued that it was important to address and account for the model uncertainty,
because  ignoring it would  underestimate the true variability of the estimated  model fit and,  in turn, overestimate
confidence in the resulting BMD and lead to "risky decisions" (Volinsky et al. 1997).

Morales et al.  suggested using  Bayesian model averaging (BMA) as a tool to avoid picking one particular model.
BMA combines over a class of suitable models. In practice, estimates based on a BMA approach tend to approximate
a weighted average of estimates based on individual  models,  with the weights reflecting  how well each individual
model fits the  observed  data. More precisely, these weights  can  be interpreted as the probability that a particular
model is the true model, given the observed data. Figure 2 shows the results  of applying a BMA procedure to the
arsenic data:

•    Figure 2(a) plots individual fitted models, with the width of each plotted line reflecting the weights.
•    Figure 2(b) shows the estimated overall dose-response curve (solid line) fitted via BMA. The shaded area shows
     the upper and lower limits (2.5% and 97.5% percentiles) based on the BMA procedure. The dotted lines show
     upper and lower limits based on the best fitting models.

Figure 2(b) effectively illustrates the inadequacy of standard statistical confidence intervals in characterizing
uncertainty in settings where there is substantial model uncertainty. The BMA limits coincide closely with the
individual curves at the upper level of the dose-response curve, where all the individual models tend to give similar
results.
Figure 2. (a) Individual dose-response models, and (b) overall dose-response model fitted using the Bayesian model
averaging approach. Source: Morales et al. 2000.	


        3.3.1    Model Complexity


During the problem specification stage, the project team will have considered the degree of complexity
desired for the model (see Section  3.2.2). As described  below, model complexity influences uncertainty.
Models tend to become more uncertain as they become either increasingly simple or increasingly complex. Thus complexity

-------
is  an  important parameter  to  consider  when  choosing  among  competing  model  frameworks or
determining the suitability of the existing model framework to the problem of concern.  For the reasons
described below, the optimal choice generally is a model that is no more complicated than necessary to
inform  the regulatory decision. For the same  reasons, model complexity is an essential parameter to
consider when developing a new model framework.

Uncertainty exists when knowledge about specific factors, parameters (inputs), or models is incomplete.
Models have two fundamental types of uncertainty:

•   Model framework uncertainty, which is a function of the soundness of the model's underlying scientific
    foundations.
•   Data uncertainty, which arises from measurement errors, analytical imprecision, and  limited sample
    size during collection and treatment of the data used to characterize the model parameters.

These two types of uncertainty have a reciprocal relationship, with one increasing as the other decreases.
Thus, as illustrated in Figure 3, an optimal level of complexity (the "point of minimum uncertainty") exists
for every model.
[Figure 3 shows model framework uncertainty decreasing, and data uncertainty increasing, as model
complexity grows; their sum, the total uncertainty, reaches a minimum at the "point of minimum
uncertainty."]

Figure 3.      Relationship between model framework uncertainty and data uncertainty, and their
               combined effect on total model uncertainty (adapted from Hanna 1988).
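
The trade-off in Figure 3 can be illustrated numerically, as in the short sketch below (in Python); both
uncertainty curves are invented and serve only to show that their sum has an interior minimum.

    import numpy as np

    # Invented curves: framework uncertainty falls, and data uncertainty grows,
    # as more parameters are added to the model.
    complexity = np.arange(1, 21)            # number of model parameters
    framework_uncertainty = 10.0 / complexity
    data_uncertainty = 0.5 * complexity
    total_uncertainty = framework_uncertainty + data_uncertainty

    best = complexity[np.argmin(total_uncertainty)]
    print(f"Total uncertainty is smallest at about {best} parameters")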

For example, air quality modelers must sometimes compromise when choosing among the physical
processes that will be treated explicitly in the model.  If the objective is to estimate the pattern of pollutant
concentration values near  one (or several)  source(s), then chemistry is typically of little importance
because the distances between the pollutant source and receptor are  generally too short for chemical

-------
formation and destruction to greatly affect pollutant concentrations.  However, in such situations, other
factors tend to have a significant effect and  must be properly accounted for  in the model. These may
include building wakes, initial characterization of source release conditions and size, rates of diffusion of
pollutants  released as they  are  transported downwind, and land  use effects  on plume transport.
Conversely, when the objective is to estimate pollutant concentrations further from the source, chemistry
becomes more  important because  there is more time for chemical reactions to take place, and initial
source release effects become less important because the pollutants become well-mixed as they travel
through the atmosphere. To date, attempts to model both near-field dispersion effects and chemistry have
been inefficient and slow on desktop computers.

Because of these competing objectives, parsimony (economy or simplicity of assumptions) is desirable in
a model. As Figure 3 illustrates, as  models become more complex to treat more physical processes, their
performance tends to  degrade because they require  more  input variables,  leading to greater data
uncertainty. Because different models contain different types and ranges of uncertainty, it can be useful to
conduct sensitivity  analysis early in the model development phase to identify the relative importance of
model parameters.  Sensitivity analysis is the process of determining how  changes in the model input
values or assumptions (including  boundaries and  model functional form) affect the model outputs
(Morgan and Henrion 1990).
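
As a simple illustration of this process, the sketch below (in Python) perturbs each input of an invented
two-parameter model by 10 percent, one at a time, and reports the resulting change in the output; the
model and the perturbation size are assumptions chosen only to show the mechanics of a local
sensitivity screen.

    def simple_model(decay_rate, load):
        # Hypothetical steady-state concentration for a well-mixed system.
        return load / decay_rate

    baseline = {"decay_rate": 0.2, "load": 50.0}
    base_output = simple_model(**baseline)

    # One-at-a-time sensitivity: perturb each input by +/-10 percent and report
    # the relative change in the model output.
    for name in baseline:
        for factor in (0.9, 1.1):
            perturbed = dict(baseline, **{name: baseline[name] * factor})
            change = (simple_model(**perturbed) - base_output) / base_output
            print(f"{name:10s} x{factor:.1f} -> output change {change:+.1%}")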

Model  complexity  can be constrained by eliminating parameters when  sensitivity  analyses (Chapter
4/Appendix D) show that they do not significantly affect the outputs and when there is no  process-based
rationale for including them. However, a variable of little significance in one application of a model may be
more important in a different application.  In past reviews of Agency models, the SAB has supported the
general guiding principle  of simplifying complex models, where possible, for  the sake of transparency
(SAB  1988), but has emphasized that care should be taken not to eliminate important parameters from
process-based models simply because data  are  unavailable  or difficult to obtain  (SAB 1989).  In  any
case,  the quality and resolution  of available data  will ultimately constrain the type  of model that can be
applied. Hence, it is important to identify the existing data and/or field collection efforts that are
needed to adequately parameterize the model framework and support the application of a model.  The
NRC Committee on Models in the Regulatory Decision Process recommended that models used  in the
regulatory process should be no more complicated than is necessary to inform the regulatory decision and
that it is often preferable to omit capabilities that do not substantially improve  model  performance (NRC
2007).

       3.3.2   Model Coding and Verification

Model coding translates the mathematical equations  that constitute the model framework into functioning
computer code.  Code verification ascertains that the computer code has no inherent numerical problems
with  obtaining a solution. Code verification  tests whether the code performs according to its design
specifications.   It should  include an examination  of the numerical technique in the computer code  for
consistency with the conceptual model and governing equations (Beck et al. 1994).  Independent testing of
the code once it is fully developed can be  useful as an additional check of integrity and quality.

Several early steps  can  help minimize  later programming  errors and  facilitate  the code  verification
process. For example:

-------
-   Using "comment" lines to describe the purpose of each component within  the  code during
    development makes future revisions and improvements by different modelers and programmers more
    efficient.
-   Using a flow chart when the conceptual model is developed and before coding begins helps
    show the overall structure of the  model program.   This provides  a simplified description of the
    calculations that will be performed in each step of the model.

Breaking the program/model into component parts or modules is also useful for careful consideration
of model behavior in  an encapsulated way.  This allows the modeler to test the behavior of each sub-
component separately,  expediting  testing and  increasing  confidence in  the program.  A module is an
independent piece of software that forms part of one or more larger programs.  Breaking large models
into discrete modules facilitates  testing and debugging  (locating/correcting errors) compared to large
programs. The approach also makes it easier to  re-use relevant modules in future modeling projects, or
to update, add, or remove sections of the model without altering the overall program structure.
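
The sketch below (in Python) illustrates these practices on a deliberately small example: a module-like
set of commented functions plus one verification check that compares a coded numerical algorithm
against its known analytical solution. The first-order decay problem and the tolerance are invented and
are not Agency requirements.

    import math

    def analytical_decay(c0, k, t):
        # Known analytical solution of dc/dt = -k * c: c(t) = c0 * exp(-k * t).
        return c0 * math.exp(-k * t)

    def euler_decay(c0, k, t, steps=10000):
        # Coded numerical algorithm (explicit Euler) whose implementation is
        # the object of the verification check below.
        dt = t / steps
        c = c0
        for _ in range(steps):
            c -= k * c * dt
        return c

    def test_euler_matches_analytical():
        # Verification check: the coded algorithm reproduces the analytical
        # solution within a pre-agreed tolerance.
        expected = analytical_decay(100.0, 0.3, 5.0)
        computed = euler_decay(100.0, 0.3, 5.0)
        assert abs(computed - expected) / expected < 1.0e-3

    test_euler_matches_analytical()
    print("code verification check passed")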

Use of generic algorithms for common tasks  can  often save time and resources, allowing efforts to
focus on developing and improving the original aspects of a new model. An algorithm is a  precise rule (or
set of rules) for solving some problem.  Commonly used algorithms are often published as "recipes" with
publicly available code (e.g., Press 1992).  Developers should review existing  Agency models and code
to minimize duplication of effort. The CREM Models Knowledge Base, which will contain a Web-
accessible inventory of models, will provide a resource that model developers can use for this purpose.

Software engineering has evolved rapidly in recent years and continues to advance rapidly with changes
in technology and user platforms.  For  example,  some of the general recommendations  for developing
computer code  given above do not apply to models that are developed using object-oriented platforms.
Model systems built on object-oriented platforms use a collection of cooperating "objects." These objects are
treated as instances of a class within a class hierarchy, where a class is a set of objects that share  a
common structure and behavior.   The  structure of a  class is determined by the class variables, which
represent the state of an object of that class; the behavior  is given by the set of methods associated with
the class (Booch 1994). When models are developed with object-oriented platforms, the user should print
out the actual mathematical relationships the platform generates and review  them as part of the code
validation process.

Many references  on  programming style and  conventions  provide specific, technical suggestions for
developing and testing computer code (e.g., The Elements of Programming Style [Kernighan and
Plauger 1988]). In addition, the Guidance for Quality Assurance Project Plans for Modeling (EPA
2002b) suggests  a number  of practices during code  verification to "check" how well  it  follows the
"specifications" laid out during QA planning (Appendix C, Box C2: Configuration Tests Specified in the QA
Program).

3.4    Application Tool Development

Once a model framework has been selected or developed, the modeler populates the framework with the
specific system characteristics needed  to address the problem,  including geographic boundaries of the
model domain, boundary conditions, pollution source inputs, and model parameters.  In this manner, the
generic  computational capabilities of the  model framework are converted into an application tool to

-------
assess a specific problem occurring at a specific location. Model parameters are terms in the model that
are fixed during a model run or simulation but can be changed in different runs to conduct a sensitivity
analysis, to perform an uncertainty analysis when probabilistic distributions are assigned to the model
parameters, or to achieve calibration (defined below) goals. Parameters can be quantities estimated
from sample data that characterize statistical populations or they can  be constants such as the speed of
light and gravitational force.  Other activities at this stage of model development include creating a user
guide for the  model,  assembling datasets  for model input  parameters, and determining hardware
requirements.

        3.4.1   Input Data

As mentioned above, the accuracy, variability,  and precision of input data used in the model is a major
source of uncertainty:

•   Accuracy refers to the closeness of a measured or computed value to its "true" value (the value
    obtained with perfect information). Due to the natural heterogeneity and random variability
    (stochasticity) of many environmental systems, this "true" value exists as a distribution rather than a
    discrete value.
•   Variability refers to  differences attributable to true heterogeneity or diversity in model parameters.
    Because of variability, the "true" value of model parameters is often a function of the degree  of spatial
    and temporal aggregation.
•   Precision  refers  to the  quality of  being reproducible in outcome or performance. With models and
    other forms of quantitative information, precision  often refers  to the number of decimal places to
    which a number is computed. This is a measure of the "preciseness" or "exactness" of the model.
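
The following minimal numerical sketch (the sensor readings and the assumed "true" value are invented for
illustration) separates accuracy, which compares measurements against the "true" value, from precision,
which describes their reproducibility:

    import statistics

    true_value = 10.0                    # assumed "true" concentration
    readings = [10.4, 10.6, 10.5, 10.5]  # repeated measurements: precise but biased

    mean_error = statistics.mean(readings) - true_value  # accuracy: closeness to the "true" value
    spread = statistics.stdev(readings)                   # precision: reproducibility of the readings

    print(f"mean error (accuracy): {mean_error:+.2f}")
    print(f"standard deviation (precision): {spread:.2f}")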

Modelers should  always select  the  most appropriate data — as defined by QA  protocols  for field
sampling, data collection, and  analysis (EPA 2002c,  2002d,  2000b) — for use in  modeling analyses.
Whenever possible, all parameters should be directly measured in the system of interest.


Box 4: Comprehensive Everglades Restoration Plan: An Example of the Interdependence of Models and Measurements

(from NRC's Models in Environmental Regulatory Decision Making)
The restoration  of the Florida Everglades  is the largest ecosystem restoration ever planned in terms of geographical
extent and number of individual components.  The NRC  Committee  on Restoration of the Greater Everglades
Ecosystem, which was charged with providing scientific advice on this effort, describes the  role that  modeling and
measurements should play in implementing an adaptive approach to restoration  (NRC 2003).  Under the committee's
vision, monitoring of hydrological  and ecological performance measures should be integrated  with mechanistic
modeling and experimentation to better understand how the Everglades function and how the system will respond to
management practices and external stresses.  Because  individual components of the restoration  plan will  be
staggered in time, the  early components can provide scientific feedback to guide and refine  implementation of later
components of the plan.


The NRC Committee on Models in the Regulatory Decision Process recommends that: "...using adaptive
strategies to coordinate data collection and modeling should be a priority for decision makers and those
responsible for regulatory model development and  application. The interdependence of measurements
and modeling needs to be fully considered as early as the conceptual model development phase."

       3.4.2   Model Calibration

Some models are "calibrated" to set parameters.  Appendix C provides guidance on model calibration as
a QA project plan element (see Box C3:  Quality Assurance Planning Suggestions for Model Calibration
Activities).  In this guidance, calibration is defined as the process of adjusting model parameters within
physically defensible ranges until the resulting predictions give the best possible fit to the observed data
(EPA 1994b).   In some disciplines, calibration is also referred to as  parameter estimation (Beck et al.
1994).
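
A minimal sketch of this definition, assuming a hypothetical first-order decay model, invented observations,
and illustrative parameter bounds, is a bounded least-squares fit in which the adjusted parameters are
constrained to physically defensible ranges:

    import numpy as np
    from scipy.optimize import least_squares

    def model(params, t):
        # Hypothetical model: initial concentration c0 and decay rate k are the calibrated parameters.
        c0, k = params
        return c0 * np.exp(-k * t)

    t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0])      # observation times (days)
    c_obs = np.array([98.0, 71.0, 49.0, 26.0, 7.5])  # observed concentrations

    def residuals(params):
        # Difference between model predictions and observations; minimized during calibration.
        return model(params, t_obs) - c_obs

    # Physically defensible ranges: 50-150 for c0 and 0.05-1.0 per day for k.
    fit = least_squares(residuals, x0=[80.0, 0.2],
                        bounds=([50.0, 0.05], [150.0, 1.0]))
    print("calibrated c0 and k:", fit.x)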

Most process-oriented environmental models are under-determined; that is, they contain more uncertain
parameters than state variables  that can be used  to perform a calibration. Sensitivity analysis  can be
used to identify key processes influencing the state variables. Sometimes the rate constant for a key
process can be measured directly — for example, measuring the  rate of photosynthesis (a process) in a
lake in addition to the phytoplankton biomass (a state variable). Direct measurement of rate parameters
can reduce model uncertainty.

When a calibration database has been developed  and improved over time, the initial adjustments and
estimates may need periodic recalibration.  When data for quantifying one or more parameter values are
limited, calibration  exercises can be used  to  find  solutions that result in the "best fit" of the  model.
However,  these solutions will not provide meaningful  information unless they are based  on measured
physically defensible ranges. Therefore, this type of calibration should be undertaken with caution.

Because of these concerns, the use of calibration to improve model performance varies among EPA
offices and regions.  For a  particular model, the  appropriateness  of calibration may be a function of the
modeling activities undertaken.  For example, the Office of Water's  standard practice is to calibrate well-
established model frameworks such as CE-QUAL-W2 (a model for predicting temperature fluctuations in
rivers) to a specific system (e.g., the Snake River). This calibration generates a site-specific tool (e.g., the
"Snake River Temperature" model).  In contrast, the Office of Air and  Radiation (OAR) more commonly
uses model frameworks and models that do not need site-specific adjustments.  For example,  certain
types of air models (e.g., gaussian plume) are parameterized for a range of meteorological conditions,
and thus  do not need to be "recalibrated" for different geographic locations  (assuming the range of
conditions is appropriate for the  model).  OAR also  seeks to avoid  artificial improvements in model
performance by adjusting model inputs outside the ranges supported by the empirical databases. These
practices prompted OAR to  issue the following statement on model calibration in their Guideline on Air
Quality Models  (EPA 2003b):

       Calibration  of models is not common practice and is subject to much  error and
       misunderstanding.  There have been attempts by some to compare model estimates and
       measurements on an event-by-event basis and then calibrate a model with results of that
       comparison.   This approach is severely limited by uncertainties in both source and
       meteorological data and therefore it is difficult to precisely estimate the concentration at
        an exact location for a specific increment of time. Such uncertainties make calibration of
        models of questionable benefit. Therefore, model calibration is unacceptable.

In general,  however, models  benefit from  thoughtful  adaptation  that  will enable  them  to respond
adequately to the specifics of each regulatory problem to which they are applied.
Summary of Recommendations for Model Evaluation

•   Model evaluation is the process used to determine whether a model, despite its uncertainties, can be
    appropriately used to inform a decision.
•   Model evaluation addresses the soundness of the science underlying a model, the quality and
    quantity of available data, the degree of correspondence with observed conditions, and the
    appropriateness of a model for a given application.
•   Recommended components of the evaluation process include: (a) credible, objective peer review; (b)
    QA project planning and data quality assessment; (c) qualitative and/or quantitative model
    corroboration; and (d) sensitivity and uncertainty analyses.
•   Quality is an attribute of models that is meaningful only within the context of a specific model
    application.  Determining whether a model serves its intended purpose involves in-depth discussions
    between model developers and the users responsible for applying the model to a particular
    problem.
•   Information gathered during model evaluation allows the decision maker to be better positioned to
    formulate decisions  and policies that take into account all relevant issues and concerns.


4.1     Introduction
       Models  will always  be  constrained by computational  limitations,  assumptions and knowledge
       gaps.  They can  best be viewed as tools to help inform decisions rather than as machines  to
       generate truth  or make decisions.  Scientific advances will never make  it possible to  build a
       perfect model that accounts for every aspect of reality or to prove that a given model is correct in
       all  aspects for a particular regulatory application. These  characteristics...suggest that model
       evaluation be viewed as an integral and ongoing part of the life cycle of a model, from problem
       formulation and model conceptualization to the development and application of a computational
       tool.
            —  NRC Committee on Models in the Regulatory Decision Process (NRC 2007)

The natural complexity of environmental systems makes it difficult to mathematically describe all relevant
processes,  including all the intrinsic mechanisms  that govern their behavior. Thus, policy makers often
rely on models  as tools  to approximate reality when making decisions that affect environmental systems.
The challenge facing model developers and users is determining when a model, despite its uncertainties,
can be appropriately  used to inform a  decision.  Model evaluation  is the  process used to make this
determination.  In this guidance, model evaluation is defined as  the process used to generate information
to determine whether a  model and its analytical results are  of a  quality sufficient to serve as the basis for
a decision.  Model  evaluation is conducted over the life cycle of the project, from development through
application.
Box 5: Model Evaluation Versus Validation Versus Verification
Model evaluation should not be confused with model validation. Different disciplines assign different meanings to
these terms and they are  often confused.  For  example,  Suter (1993) found  that among models used  for risk
assessments, misconception often arises in the form of the question "Is the model valid?" and statements such as
"No model should be used unless it has been validated." Suter further points out that "validated" in this context means
(a)  proven to correspond exactly to reality or (b) demonstrated through experimental tests to make  consistently
accurate predictions.
Because every model contains simplifications,  predictions derived from a model can never be  completely accurate
and a model can never correspond exactly to reality.  In addition,  "validated models" (e.g., those that have  been
shown to correspond to field data) do not necessarily generate accurate predictions of reality for multiple applications
(Beck 2002a). Thus, some researchers assert that no model is ever truly "validated"; models can only be invalidated
for a specific application (Oreskes et al. 1994).  Accordingly, this guidance focuses on process and techniques for
model evaluation rather than model validation or invalidation.
"Verification" is another term commonly applied to the evaluation process. However, in this guidance and elsewhere,
model verification typically refers to model code verification as defined in the model development section. For
example, the NRC Committee on Models in the  Regulatory Decision Process (NRC 2007) provides the following
definition:
        Verification refers to activities that are designed to confirm that the mathematical framework
       embodied in the module is correct and  that the computer code for a module is operating according
       to its intended design so that the results obtained compare favorably with those obtained using
       known analytical solutions or numerical solutions from simulators based on similar or identical
       mathematical frameworks.
In simple terms, model evaluation provides information to help answer four main questions (Beck 2002b):

1.  How have the principles of sound science been addressed during model development?
2.  How is the choice of model supported by the quantity and quality of available data?
3.  How closely does the model approximate the real system of interest?
4.  How does the model perform  the  specified task while meeting the  objectives set by QA  project
    planning?

These four factors address  two aspects  of model  quality. The first factor focuses  on the  intrinsic
mechanisms and generic properties of a model,  regardless of the particular task to which it is applied. In
contrast, the latter three factors are evaluated in the context of the use of a model within a specific set of
conditions.  Hence, it follows that model quality is an attribute that is meaningful only within the context of
a specific model application.  Whether a model is of sufficient quality to support a decision becomes known when information is
available to assess these factors.

The NRC committee recommends that evaluation of a regulatory model continue throughout the life of a
model and that an evaluation plan could:

•   Describe the model and its intended uses.
•   Describe the relationship of the model to data,  including the data for both inputs and corroboration.
•   Describe how such data and other sources of information will be used to assess the ability of the
    model to meet its intended task.
•   Describe all the elements  of the evaluation plan by using an outline or diagram that shows how the
    elements relate  to the model's life cycle.
•   Describe the factors or events that might trigger the need for major model revisions or the
    circumstances that might prompt users to seek an alternative model.  These can be fairly broad and
    qualitative.
•   Identify the responsibilities, accountabilities, and resources needed to ensure implementation of the
    evaluation plan.

As stated above, the goal of model evaluation is to ensure model quality. At EPA, quality is defined by the
Information  Quality Guidelines (IQGs)  (EPA 2002a).   The  IQGs  apply to  all information that EPA
disseminates, including models,  information from  models, and input data  (see Appendix C, Box C4:
Definition of Quality).  According to the  IQGs, quality has three major components: integrity,  utility, and
objectivity.  This chapter focuses  on  addressing  the four questions listed above by evaluating the third
component,  objectivity —  specifically,  how to  ensure the objectivity  of information  from models by
considering their accuracy, bias, and reliability.

•   Accuracy, as described in Section 2.4, is the closeness of a measured or computed value to its "true"
    value, where the "true" value is obtained with perfect information.
•   Bias describes any systematic deviation between a measured (i.e., observed) or computed value and
    its "true" value.  Bias is affected by faulty instrument calibration and other measurement errors,
    systematic errors during data collection, and sampling errors such as incomplete spatial
    randomization during the design of sampling programs.
•   Reliability is the confidence that (potential) users have in a  model and its outputs such that they are
    willing to use the model and accept its results  (Sargent 2000). Specifically, reliability is a function of
    the model's performance record and  its conformance to best available,  practicable science.

This chapter describes principles, tools, and considerations for  model evaluation throughout all stages of
development and application.  Section 4.2 presents a variety of  qualitative and quantitative best practices
for evaluating models. Section 4.3 discusses special considerations for evaluating proprietary models.
Section  4.4 explains why retrospective analysis of models, conducted after a model has been applied,
can be important to  improve individual models and regulatory policies and to systematically enhance the
overall modeling field.  Finally,  Section 4.5 describes how the evaluation process culminates in a decision
whether to apply the model to  decision making. Section 4.6 reviews the key recommendations from this
chapter.

4.2     Best Practices for Model Evaluation

The four questions listed above address the soundness of the science underlying a model, the quality and
quantity  of available data,  the  degree  of correspondence  with  observed  conditions,  and  the
appropriateness  of  a  model for a given application.  This guidance describes several  "tools"  or  best
practices to  address these questions: peer review of models; QA project planning,  including data quality
assessment; model corroboration (qualitative and/or quantitative evaluation of a model's accuracy and
predictive  capabilities); and sensitivity and uncertainty  analysis. These  tools and practices include both
qualitative and quantitative techniques:
•   Qualitative assessments: Some of the uncertainty in model predictions may arise from sources
    whose uncertainty cannot be quantified. Examples are uncertainties about the theory underlying the
    model and the manner in which that theory is mathematically expressed to represent the environmental
    components being modeled. Subjective evaluation of experts may be needed to
    determine appropriate values for model parameters and inputs that cannot be directly observed or
    measured (e.g., air emissions estimates).  Qualitative assessments are needed for these sources of
    uncertainty. These assessments may involve expert elicitation regarding the system's behavior and
    comparison with model forecasts.
•   Quantitative assessments:  The uncertainty in some sources — such as some model parameters and
    some input data — can be estimated through quantitative assessments involving statistical
    uncertainty and sensitivity analyses.  These types of analyses can also  be used to quantitatively
    describe how model estimates of current conditions may be expected to differ from comparable field
    observations.  However, since model predictions are not directly observed, special care is needed
    when quantitatively comparing model predictions with field data.

As discussed previously, model evaluation is an iterative process.  Hence, these tools and techniques
may be effectively applied throughout model development, testing, and  application and should not  be
interpreted as sequential steps for model evaluation.

Model evaluation should always be conducted using a graded approach that is adequate and appropriate
to the decision  at hand (EPA 2001, 2002b).   This approach  recognizes that model evaluation can  be
tailored to the circumstances of the problem at hand and that programmatic requirements are varied.
For example, a screening  model (a type of model designed to provide a "conservative" or risk-averse
answer) that is used  for risk management should undergo rigorous evaluation to avoid false negatives,
while still not  imposing  unreasonable data-generation burdens  (false  positives) on the  regulated
community.  Ideally, decision makers and modeling staff work together at the onset of new projects to
identify the appropriate degree of model evaluation (see Section 3.1).

External circumstances can affect the rigor required in model evaluation.  For example, when the likely
result of modeling will  be costly control strategies and associated controversy, more detailed  model
evaluation may be necessary.  In these cases, many aspects of the modeling may come under close
scrutiny, and the modeler must document the findings  of the model evaluation process  and be prepared
to answer questions that will arise  about the model.  A deeper level of model evaluation may also  be
appropriate when modeling unique or extreme situations that have not been previously encountered.

Finally, as noted earlier, some assessments require the use of multiple, linked models.  This linkage has
implications for assessing uncertainty and applying the system of models. Each component model as well
as the full system of integrated models must be evaluated.

Sections 4.2.1 and 4.2.2,  on  peer review of models and quality assurance protocols for input data,
respectively, are drawn  from existing guidance. Section  4.2.3, on model corroboration activities and the
use  of sensitivity and uncertainty analysis, provides  new guidance  for  model  evaluation (along  with
Appendix D).
Box 6: Examples of Life Cycle Model Evaluation

The value in evaluating a model from the conceptual stage through the use stage is illustrated in a multi-year project
conducted by the Organization for Economic Cooperation and Development (OECD). The project sought to develop a
screening model that could be used to assess the persistence and long-range transport potential of chemicals. To
ensure its effectiveness, the screening model needed to be a consensus model that  had been  evaluated against a
broad set of available models and data.
This project began  at a 2001 workshop to set model performance and evaluation  goals that would  provide the
foundation for subsequent model selection and development (OECD 2002). OECD then established an expert group
in 2002. This group  began its work by developing and publishing a guidance document on using multimedia models
to  estimate  environmental persistence and long-range transport. From 2003 to 2004, the  group compared and
assessed the performance of nine available multimedia fate and transport models (Fenner et  al. 2005; Klasmeier et
al. 2006). The group then  developed a parsimonious consensus  model  representing the minimum set  of  key
components identified in the  model comparison. They convened three international workshops to disseminate this
consensus model  and provide an ongoing model evaluation forum (Scheringer et al. 2006).
In this example, more than half the total effort was invested in the conceptual and model formulation stages, and
much of the effort focused on performance evaluation. The group recognized that each model's life cycle is different,
but noted that  attention should be  given to developing consensus-based approaches in  the  model concept and
formulation stages. Conducting concurrent evaluations at these stages in this setting resulted in a high degree of buy-
in from the various modeling groups.

        4.2.1   Scientific  Peer Review

Peer review provides the main mechanism  for independent evaluation  and review of environmental
models used by the Agency.  Peer review offers an independent, expert review of the model evaluation
described in Section 4.1; its purpose is therefore two-fold:

•   To evaluate  whether the  assumptions, methods, and conclusions derived  from environmental models
    are based on sound scientific principles.
•   To check the scientific appropriateness of a model for informing a specific regulatory decision.  (The
    latter objective is particularly important for secondary applications of existing models.)

Information from peer reviews is also  helpful for choosing among multiple competing models for a specific
regulatory application.  Finally,  peer  review is useful to identify the  limitations of existing models. Peer
review  is not a  mechanism  to  comment on  the regulatory decisions or policies that are informed by
models (EPA 2000c).

Peer review charge questions and  corresponding records for peer reviewers to answer those questions
should  be incorporated into the  quality assurance project plan, developed during assessment planning
(see  Section  4.2.2,  below).  For example, peer reviews may focus  on  whether a  model  meets  the
objectives or specifications that were set as  part of the quality assurance plan (see  EPA 2002b) (see
Section 3.1).
All models that inform significant2 regulatory decisions are candidates for peer review (EPA 2000c, 1993)
for several reasons:

•   Model results will be used as a basis for major regulatory or policy/guidance decision making.
•   These decisions likely involve significant investment of Agency resources.
•   These decisions may have inter-Agency or cross-agency implications/applicability.

Existing guidance recommends that a new model should be scientifically peer-reviewed prior to its first
application;  for subsequent applications, the program manager should consider the scientific/technical
complexity and/or the novelty of the particular circumstances to determine whether additional peer review
is needed (EPA 1993). To conserve resources, peer review of "similar" applications should be avoided.

Models used  for secondary applications  (existing EPA  models or proprietary  models) will generally
undergo a different type of evaluation than those developed with a specific regulatory information need in
mind.  Specifically, these  reviews may deal more with  uncertainty about the appropriate application of a
model to a specific set of conditions than with the science underlying the model framework.  For example,
a project team decides to assess a water quality problem using WASP, a well-established water quality
model framework.  The project team determines that  peer review of the model  framework  itself is not
necessary, and the  team  instead conducts a peer review on their specific application of the WASP
framework.

The following  aspects of a model should be peer-reviewed to establish  scientific credibility (SAB 1993a,
EPA 1993):

•   Appropriateness of input data.
•   Appropriateness of boundary condition specifications.
•   Documentation of inputs and assumptions.
•   Applicability and appropriateness of selected parameter values.
•   Documentation  and  justification  for  adjusting  model  inputs  to improve  model  performance
    (calibration).
•   Model application with respect to the range of its validity.
•   Supporting empirical data that strengthen  or contradict the conclusions that are based on model
    results.

To  be most effective and maximize its value, external peer review should begin as early in the model
development  phase as  possible (EPA 2000b).   Because peer review involves  significant time and
resources, these allocations must  be incorporated  into components of the project planning and any
related contracts.  Peer review in the early stages of model development can help evaluate the
conceptual basis of models and potentially save time by redirecting misguided initiatives, identifying
alternative approaches, or providing strong technical support for a potentially controversial position (SAB
1993a, EPA 1993).  Peer review in the later stages of model development is useful as an independent
external review of model code (i.e., model verification).  External peer review of the applicability of a
model to a particular set of conditions should be considered well in advance of any decision making, as it
helps avoid inappropriate applications of a model for specific regulatory purposes (EPA 1993).

2 Executive Order 12866 (58 FR 51735) requires federal agencies to determine whether a regulatory action is
"significant" and therefore, subject to the requirements of the Executive Order, including review by the Office of
Management and Budget.  The Order defines "significant regulatory action" as one "that is likely to result in a rule
that may: (1) Have an annual effect on the economy of $100 million or more or adversely affect in a material way
the economy, a sector of the economy, productivity, competition, jobs, the environment, public health or safety, or
State, local, or tribal governments or communities; (2) Create a serious inconsistency or otherwise interfere with an
action taken or planned by another agency; (3) Materially alter the budgetary impacts of entitlements, grants, user
fees, or loan programs or the rights and obligations of recipients thereof; or (4) Raise novel legal or policy issues
arising out of legal mandates, the President's priorities, or the principles set forth in [the] Order."  Section 2(f).

The  peer  review logistics are left to  the discretion of the  managers responsible for applying  the model
results to decision  making.   Mechanisms for accomplishing external peer review include  (but are not
limited to):

•   Using an ad hoc panel of scientists.3
•   Using an established external peer review mechanism such as the SAB.
•   Holding a technical workshop.4

Several sources provide guidance for determining the qualifications and number of reviewers needed  for
a given modeling  project (SAB  1993a;  EPA 2000c,  1993, 1994a).  Key  aspects  are  summarized in
Appendix D of this guidance.

        4.2.2   Quality Assurance Project Planning and Data Quality Assessment

Like  peer  review, data quality assessment addresses whether a model has been developed  according to
the principles of sound science.  While some  variability  in  data is unavoidable  (see Section 4.2.3.1),
adhering to the tenets of data quality assessment described in other Agency guidance5 (Appendix D, Box
D2: Quality Assurance Planning and Data Acceptance Criteria) helps minimize data uncertainty.

Well-executed QA project planning  also helps ensure that a model performs the specified task, which
addresses the fourth model  evaluation  question posed in  Section 4.1.  As  discussed above,  evaluating
the degree to which a modeling project has met  QA objectives  is  often a function of the external  peer
review process.  The Guidance for Quality Assurance Project Plans for Modeling (EPA 2002b) provides
general information about how to document quality assurance planning for modeling (e.g., specifications
or assessment criteria development, assessments of various stages of the modeling process; reports to
management as feedback for corrective action; and finally the process for acceptance, rejection, or
qualification of the output for use) to conform with EPA policy and acquisition regulations.  Data quality
assessments are a key component of the QA plan for models.

3 The formation and use of an ad hoc panel of peer reviewers may be subject to the Federal Advisory Committee Act
(FACA).  Compliance with FACA's requirements is summarized in Chapter Two of the Peer Review Handbook,
"Planning a Peer Review" (EPA 2000c).  Guidance on compliance with FACA may be sought from the Office of
Cooperative Environmental Management.  Legal questions regarding FACA may be addressed to the Cross-Cutting
Issues Law Office in the Office of General Counsel.
4 Note that a technical workshop held for peer review purposes is not subject to FACA if the reviewers provide
individual opinions.  [Note that there is no "one time meeting" exemption from FACA.  The courts have held that
even a single meeting can be subject to FACA.]  An attempt to obtain group advice, whether it be consensus or
majority-minority views, likely would trigger FACA requirements.
5 Other guidance that can help ensure the quality of data used in modeling projects includes:
    •   Guidance for the Data Quality Objectives Process, a systematic planning process for environmental data
        collection (EPA 2000a).
    •   Guidance on Choosing a Sampling Design for Environmental Data Collection, on applying statistical
        sampling designs to environmental applications (EPA 2002c).
    •   Guidance for Data Quality Assessment: Practical Methods for Data Analysis, to evaluate the extent to
        which data can be used for a specific purpose (EPA 2000b).

Both the quality and quantity (representativeness) of supporting data used to parameterize and (when
available) corroborate models should be assessed during all relevant stages of a modeling project. Such
assessments are needed to evaluate whether the available data are sufficient to support the choice of the
model to be applied (question 2, Section 4.1), and to ensure that the data are sufficiently representative of
the true  system  being modeled to  provide meaningful comparison  to observational  data  (question  3,
Section 4.1).

       4.2.3  Corroboration, Sensitivity Analysis, and Uncertainty Analysis

The question "How closely does the  model approximate the real system of interest?" is unlikely to have a
simple answer.  In general, answering this question is not simply a matter of comparing  model results and
empirical data. As noted in Section 3.1, when developing and using an environmental model, modelers
and decision makers should consider what degree of uncertainty is acceptable within the  context of a
specific model application.  To do this,  they will need  to  understand  the uncertainties  underlying the
model. This section discusses three approaches to gaining this understanding:

•   Model  corroboration (Section 4.2.3.2), which includes all quantitative and qualitative  methods for
    evaluating the degree to which a model corresponds to reality.
•   Sensitivity analysis (Section 4.2.3.3), which involves studying how changes in a model's input values
    or assumptions affect its output or response.
•   Uncertainty analysis (Section 4.2.3.3), which investigates how a model might be affected by  the lack
    of knowledge about a certain population or the real value of model parameters.

Where practical, the  recommended analyses should  be conducted and their results reported in the
documentation supporting the  model.  Section 4.2.3.1 describes  and defines the  various types  of
uncertainty, and  associated concepts, inherent  in the modeling process that  model corroboration and
sensitivity and uncertainty analysis can help assess.

4.2.3.1 Types of Uncertainty

Uncertainties are  inherent in all aspects of the  modeling  process.  Identifying  those  uncertainties that
significantly influence model outcomes  (either qualitatively or quantitatively) and communicating their
importance is key to successfully integrating information from models into the decision making process.
As defined  in Chapter 3, uncertainty is the term used in this guidance to describe incomplete knowledge
about specific factors, parameters (inputs), or models.  For organizational simplicity,  uncertainties that
affect model quality are categorized in this guidance as:

•   Model  framework uncertainty, resulting from incomplete knowledge about factors that control the
    behavior of the system being modeled; limitations in spatial or temporal resolution; and simplifications
    of the system.
•   Model  input  uncertainty,  resulting  from  data  measurement  errors,  inconsistencies  between
    measured  values and those used  by the model  (e.g.,  in their level of aggregation/averaging),  and
    parameter value uncertainty.
•   Model niche uncertainty,  resulting  from the use of a model outside the system for which it was
    originally developed and/or developing  a larger  model from several existing models with different
    spatial or temporal scales.


Box 7: Example of Model Input Uncertainty
The NRC's Models in Environmental Regulatory Decision Making provides a detailed example, summarized below, of
the effect of model input uncertainty on policy decisions.
The formation of ozone in the lower atmosphere (troposphere) is an exceedingly complex chemical process that
involves the interaction of oxides  of nitrogen (NOX),  volatile  organic compounds (VOCs), sunlight,  and  dynamic
atmospheric processes.  The basic chemistry of ozone  formation was known  in the early 1960s  (Leighton  1961).
Reduction of ozone concentrations  generally requires controlling either or both NOX and VOC emissions.  Due to the
nonlinearity of atmospheric chemistry, selection of the  emission-control strategy traditionally relied on air quality
models.
One of the first  attempts to include the complexity of atmospheric ozone chemistry in the decision making process
was a simple  observation-based model, the so-called Appendix J curve (36 Fed. Reg. 8166 [1971]). The curve was
used to indicate the percentage  VOC emission  reduction  required to attain  the ozone standard in an urban area
based on  peak  concentration of photochemical  oxidants observed in that area.  Reliable NOX data were virtually
nonexistent at the time; Appendix J was based on data from measurements of ozone and VOC concentrations from
six U.S. cities.  The Appendix J  curve was based on the  hypothesis that reducing VOC emissions was the most
effective emission-control path, and this conceptual model helped  define legislative mandates enacted by Congress
that emphasized controlling these emissions.
The choice in  the  1970s to concentrate on VOC  controls was supported by early results from models. Though new
results in  the  1980s showed higher-than-expected biogenic VOC emissions,  EPA continued  to  emphasize VOC
controls,  in part because the  schedule that Congress  and  EPA set  for  attaining the ozone ambient air quality
standards was not conducive to reflecting on the basic elements of the science (Dennis 2002).
VOC reductions from the early 1970s to the early 1990s had little effect on ozone concentrations.  Regional ozone
models developed in the 1980s and 1990s suggested that controlling NOX emissions was necessary in addition to, or
instead of, controlling  VOCs to reduce ozone concentrations (NRC 1991).  The shift in the 1990s toward regulatory
activities focusing on  NOX controls was partly due to the realization that historical estimates of emissions and  the
effectiveness  of various control strategies in  reducing emissions were  not accurate.   In  other words,  ozone
concentrations had not been reduced as much as hoped over the  past three decades, in part because emissions of
some  pollutants  were much higher than originally  estimated.
Regulations may go forward before science and models are perfected because of the desire to mitigate the potential
harm from environmental hazards.  In the case of ozone modeling, the model inputs (emissions inventories in this
case)  are  often  more  important than the model science  (description of atmospheric transport and  chemistry in this
case)  and require as careful  an  evaluation as the evaluation of the  model.  These factors point to the  potential
synergistic role that measurements  play in model  development and application.

In reality, all three categories are interrelated. Uncertainty in the underlying model  structure or model
framework uncertainty is the result of incomplete scientific  data or  lack of knowledge about the factors
that control the behavior of the system being modeled.  Model framework uncertainty can  also be the
result of simplifications needed to translate the conceptual model into mathematical terms as described in
Section 3.3.  In the scientific literature, this type of uncertainty is also referred to as structural error (Beck
1987), conceptual errors (Konikow and Bredehoeft  1992), uncertainties in the conceptual model (Usunoff
et al. 1992), or model error/uncertainty (EPA 1997; Luis and McLaughlin 1992). Structural error relates to
the mathematical construction of the algorithms that make up a model, while the conceptual model refers
to the science underlying a  model's  governing  equations.   The  terms "model  error" and  "model
uncertainty" are both generally synonymous with model framework uncertainty.

Many models  are developed  iteratively to  update  their underlying science and resolve  existing  model
framework uncertainty as new information  becomes available.  Models with  long  lives  may undergo
important changes from  version to version.  The  MOBILE model for estimating atmospheric  vehicle
emissions, the CMAQ (Community Multi-scale Air Quality) model, and the QUAL2 water quality  models
are examples of models that have had multiple versions and major scientific modifications and extensions
over more than two decades of their existence (Scheffe and Morris 1993; Barnwell et al. 2004; EPA 1999c, as
cited in NRC 2007).

When an appropriate model framework has been developed, the model itself may still be highly uncertain
if the input data or database used to construct the application tool is not of sufficient quality.  The quality
of empirical  data used  for both model  parameterization and corroboration tests is affected  by both
uncertainty and variability.  This guidance uses the term "data uncertainty" to refer to the uncertainty
caused by measurement errors, analytical  imprecision,  and limited sample sizes during  data collection
and treatment.

In contrast to data uncertainty, variability results from the  inherent randomness  of certain parameters,
which  in  turn results from the  heterogeneity  and  diversity in environmental processes.  Examples of
variability  include fluctuations  in ecological  conditions, differences  in habitat, and  genetic variances
among populations (EPA 1997). Variability in  model parameters  is largely dependent on the extent to
which input data have been aggregated (both spatially and temporally).  Data uncertainty is sometimes
referred to as  reducible uncertainty because  it  can  be  minimized with further study (EPA 1997).
Variability, however, is referred to as irreducible uncertainty because it can be better characterized and
represented but not reduced with further study (EPA 1997).

A model's application niche  is the set of conditions under which  use of the  model  is scientifically
defensible (EPA 1994b). Application niche uncertainty is therefore a function of the appropriateness of a
model  for use  under a specific set of conditions. Application niche uncertainty is particularly important
when (a) choosing among existing models for  an application that lies outside the system for which the
models were originally developed and/or (b) developing a larger model from several existing models with
different spatial or temporal scales (Levins 1992).

The SAB's review of MMSOILS (Multimedia Contaminant Fate, Transport and Exposure Model) provides
a good example of application niche uncertainty. The SAB questioned the adequacy of using a screening-
level model to  characterize situations where there is substantial subsurface heterogeneity or where non-
aqueous  phase contaminants  are present (conditions differ from default values) (SAB 1993b). The SAB
considered the MMSOILS model acceptable within its original application niche, but unsuitable for more
heterogeneous conditions.
4.2.3.2  Model Corroboration

        The interdependence of models and measurements is complex and iterative for several reasons.
        Measurements help to provide the conceptual basis of a model and inform model development,
        including parameter estimation.  Measurements are also a critical tool for corroborating model
        results.  Once developed, models can derive priorities for measurements that ultimately get used
        in modifying existing models or in developing new ones.  Measurement and model activities are
        often  conducted in  isolation... Although environmental data systems serve a range of purposes,
        including  compliance  assessment,  monitoring of trends in indicators, and  basic  research
        performance,  the importance of models in  the regulatory process requires measurements and
        models to  be better integrated.  Adaptive strategies that rely  on iterations of measurements and
        modeling,  such  as  those discussed in the  2003  NRC report titled Adaptive Monitoring and
        Assessment for the Comprehensive Everglades  Restoration Plan, provide  examples  of how
        improved coordination might be achieved.
                      — NRC Committee on Models in the Regulatory Decision Process (NRC 2007)

Model corroboration includes  all quantitative and qualitative methods for evaluating the degree to which a
model corresponds to reality.  The rigor of these methods varies depending on the type and purpose of
the model application.  Quantitative model corroboration uses statistics to estimate how closely the model
results match measurements made in the real  system.  Qualitative corroboration activities may include
expert elicitation to obtain  beliefs about a system's behavior in a data-poor situation.  These corroboration
activities may move model forecasts toward consensus.
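
As a hedged sketch of the quantitative side of corroboration (the paired observed and predicted values below
are invented), summary statistics such as mean bias, root-mean-square error, and the Nash-Sutcliffe
efficiency can be computed directly from matched observations and model results:

    import numpy as np

    observed = np.array([12.1, 15.4, 9.8, 20.3, 17.6])    # field measurements
    predicted = np.array([11.5, 16.0, 10.9, 18.8, 18.1])  # model results at the same times and locations

    bias = np.mean(predicted - observed)                  # systematic over- or under-prediction
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))  # overall error magnitude
    nse = 1.0 - np.sum((predicted - observed) ** 2) / np.sum((observed - observed.mean()) ** 2)

    print(f"mean bias: {bias:+.2f}   RMSE: {rmse:.2f}   Nash-Sutcliffe efficiency: {nse:.2f}")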

For  newly developed  model frameworks  or untested mathematical  processes,  formal corroboration
procedures may  be appropriate.   Formal corroboration may involve formulation of hypothesis  tests for
model acceptance, tests  on  datasets independent of the calibration  dataset, and quantitative testing
criteria.   In many cases,  collecting independent datasets for formal model corroboration is extremely
costly or otherwise unfeasible. In such circumstances, model evaluation may be appropriately conducted
using a  combination of other evaluation tools discussed in this section.

Robustness is the capacity of a model to perform equally well across the full range of environmental
conditions for which it was designed (Reckhow 1994; Borsuk et al. 2002). The degree of similarity among
datasets available for calibration  and corroboration  provides insight  into  a model's robustness.   For
example, if the dataset used to corroborate a model is identical or statistically similar to the dataset used
to calibrate the model, then the corroboration exercise has provided  neither an independent measure of
the model's performance nor insight into the model's  robustness.  Conversely, when  corroboration data
are significantly different from calibration data, the corroboration exercise  provides a measure of both
model performance and robustness.
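
One hedged way to check how similar the calibration and corroboration datasets are, and therefore how much
the corroboration exercise reveals about robustness, is a two-sample Kolmogorov-Smirnov test on a key driver
variable; the synthetic streamflow data below are illustrative only:

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(seed=1)
    calibration_flows = rng.lognormal(mean=3.0, sigma=0.4, size=200)    # driver data used for calibration
    corroboration_flows = rng.lognormal(mean=3.5, sigma=0.6, size=150)  # independent corroboration data

    result = ks_2samp(calibration_flows, corroboration_flows)
    if result.pvalue < 0.05:
        print(f"Datasets differ (KS p = {result.pvalue:.3f}): corroboration also probes robustness.")
    else:
        print(f"Datasets are statistically similar (KS p = {result.pvalue:.3f}): limited insight into robustness.")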

Quantitative model corroboration methods are recommended for choosing among multiple models that
are available for the same application.  In such cases, models may  be ranked on  the basis of their
statistical performance in comparison to the observational data (e.g.,  EPA 1992). EPA's Office of Air and
Radiation evaluates models in this  manner. When a single model is found to perform better than  others in
a given  category, OAR recommends it in the Guidelines on  Air Quality Models as a preferred model for
application in that category (EPA 2003a). If models perform similarly, then the preferred model is selected
based  on other  factors, such  as past  use,  public  familiarity, cost  or resource requirements,  and
availability.
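
Building on the corroboration statistics sketched earlier, ranking candidate models on a common performance
statistic reduces to a simple sort; the model names and RMSE values here are hypothetical:

    # Hypothetical RMSE values from comparing each candidate model to the same observations.
    performance = {"Model A": 4.2, "Model B": 3.1, "Model C": 3.2}

    ranked = sorted(performance.items(), key=lambda item: item[1])  # lower RMSE is better
    preferred, best_rmse = ranked[0]

    print("ranking:", ranked)
    print(f"preferred model: {preferred} (RMSE = {best_rmse})")
    # If the top scores are effectively tied, selection would fall back on the other
    # factors noted above (past use, public familiarity, cost, availability).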

Box 8:  Example: Comparing Results from Models of Varying Complexity
(From Box 5-4 in NRC's Models in Environmental Regulatory Decision Making)
The Clean Air Mercury Rule6 requires industry to reduce mercury emissions from coal-fired power plants. A potential
benefit is the  reduced human exposure and related health impacts from methylmercury that may result from reduced
concentrations of this toxin in fish.  Many challenges and uncertainties affect assessment of this benefit. In  its
assessment of the benefits and costs of this rule, EPA used multiple models to examine how changes in atmospheric
deposition would affect mercury concentrations in fish, and applied the models to assess some of the uncertainties
associated with the model results (EPA  2005).
EPA based its national-scale benefits assessment on results from the mercury maps (MMaps) model.  This model
assumes a linear, steady-state  relationship between atmospheric deposition of mercury and mercury concentrations
in fish,  and thus assumes that a 50% reduction in mercury deposition rates results  in a 50% decrease in fish mercury
concentrations.  In addition, MMaps assumes instantaneous adjustment of aquatic systems and their ecosystems to
changes in deposition — that is, no time lag in the conversion of mercury to methylmercury and its bioaccumulation in
fish. MMaps also does not deal with  sources of mercury other than those from atmospheric deposition. Despite those
limitations, the  Agency  concluded  that  no other  available  model  was capable  of performing a national-scale
assessment.
To further investigate fish mercury concentrations  and to assess the effects of MMaps' assumptions, EPA  applied
more detailed models, including the spreadsheet-based ecological risk assessment for the fate of mercury (SERAFM)
model, to five well-characterized ecosystems.  Unlike the steady-state MMaps model, SERAFM is a dynamic model
which calculates the temporal  response of mercury concentrations in fish tissues to changes in mercury loading.  It
includes multiple land-use types for representing watershed  loadings  of mercury through  soil  erosion  and runoff.
SERAFM partitions mercury among multiple compartments and phases, including aqueous phase, abiotic particles
(for example,  silts), and biotic  particles (for example, phytoplankton). Comparisons of SERAFM's predictions with
observed fish mercury concentrations for  a single fish species in four ecosystems showed that the model under-
predicted mean concentrations for one water body, over-predicted mean concentrations for a second water body, and
accurately predicted  mean concentrations for the  other two.  The  error  bars  for the  observed fish mercury
concentrations in these four ecosystems were  large, making it difficult to assess the models' accuracy. Modeling the
four ecosystems also showed  how  the assumed physical and chemical characteristics of the specific ecosystem
affected absolute fish  mercury concentrations and  the length of time  before fish  mercury concentrations reached
steady state.
Although EPA concluded that  the best available science supports the  assumption of a  linear relationship between
atmospheric deposition and fish mercury concentrations for broad-scale use, the more detailed ecosystem modeling
demonstrated that individual ecosystems were highly sensitive to uncertainties in model parameters.  The Agency
also noted that many of the model uncertainties  could not be quantified. Although the case studies covered the bulk
of the  key environmental characteristics,  EPA found that extrapolating the individual ecosystem case studies to
account for the variability in ecosystems  across the country indicated that those case studies might not  represent
extreme conditions that could influence how atmospheric mercury deposition affected fish mercury concentrations in
a water body.
This example illustrates the usefulness of investigating a variety of models at varying levels of complexity.  A
hierarchical modeling approach, such as that used in the mercury analysis, can provide justification for simplified
model assumptions or potentially provide evidence for a consistent bias that would negate the assumption that a
simple model is appropriate for broad-scale application.

6 On February 8, 2008, the U.S. Court of Appeals for the District of Columbia Circuit vacated the Clean Air
Mercury Rule.  The DC Circuit's vacatur of this rule was unrelated to the modeling conducted in support of the rule.

4.2.3.3  Sensitivity and Uncertainty Analysis

Sensitivity  analysis is  the study of how a model's response can be apportioned to changes in model
inputs  (Saltelli et al. 2000a).  Sensitivity analysis is recommended as the principal evaluation tool for
characterizing the most and least important sources of uncertainty in environmental models.

Uncertainty analysis investigates the lack of knowledge about a certain population  or the real value of
model  parameters.  Uncertainty can sometimes be reduced through further study  and by collecting
additional data.  EPA guidance (e.g., EPA 1997) distinguishes uncertainty analysis from methods used to
account for variability  in input data  and model parameters.  As mentioned  earlier, variability in model
parameters and  input data can be better characterized through further study but is usually not reducible
(EPA 1997).

Although sensitivity  and uncertainty analysis  are closely related, sensitivity is  algorithm-specific with
respect to  model "variables" and  uncertainty is parameter-specific.  Sensitivity  analysis assesses the
"sensitivity" of the model to specific parameters and  uncertainty analysis  assesses the "uncertainty"
associated with  parameter values. Both types of analyses are important to understand the degree of
confidence a user can place in the model results. Recommended techniques for conducting uncertainty
and sensitivity analysis are discussed in Appendix D.
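
As a minimal sketch, assuming a hypothetical two-parameter decay model and illustrative input distributions
(not drawn from Appendix D), a single Monte Carlo loop can support both analyses: the spread of the outputs
characterizes uncertainty, and correlating each sampled input with the output gives a crude sensitivity
screening:

    import numpy as np

    rng = np.random.default_rng(seed=42)
    n = 5000

    # Assumed input distributions representing parameter uncertainty.
    decay_rate = rng.normal(loc=0.35, scale=0.05, size=n)     # per day
    initial_conc = rng.normal(loc=100.0, scale=10.0, size=n)  # mg/L

    # Illustrative model output: concentration after 5 days.
    output = initial_conc * np.exp(-decay_rate * 5.0)

    # Uncertainty analysis: summarize the distribution of model outputs.
    p05, p50, p95 = np.percentile(output, [5, 50, 95])
    print(f"output 5th/50th/95th percentiles: {p05:.1f} / {p50:.1f} / {p95:.1f}")

    # Crude sensitivity screening: correlation of each sampled input with the output.
    for name, samples in [("decay_rate", decay_rate), ("initial_conc", initial_conc)]:
        r = np.corrcoef(samples, output)[0, 1]
        print(f"correlation of {name} with output: {r:+.2f}")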

The NRC committee pointed out that uncertainty analysis for regulatory environmental modeling involves
not only analyzing uncertainty, but also communicating the uncertainties to policy makers. To facilitate
communication of model  uncertainty, the committee recommends using hybrid approaches in which
unknown quantities are treated probabilistically and explored in scenario-assessment  mode by decision
makers through a range of plausible values.  The committee further acknowledges (NRC 2007) that:
       Effective  uncertainty communication requires a high level of interaction with the relevant decision
       makers to ensure that they  have  the necessary information about the nature and  sources of
       uncertainty and their consequences.  Thus, performing uncertainty analysis for environmental
       regulatory activities requires extensive discussion between analysts and decision makers.

4.3     Evaluating Proprietary Models

This guidance defines proprietary models as  those computer models for which the source code is not
universally shared. To  promote the transparency with which decisions are made, EPA prefers  using non-
proprietary models when available. However, the Agency acknowledges there will be times when the use
of proprietary models provides the most reliable and best-accepted characterization of a system.
When a proprietary model is used, its use should be accompanied by comprehensive, publicly available
documentation. This documentation should describe:

    •   The conceptual model and the theoretical basis (as described in Section 3.3.1) for the model.
    •   The techniques and procedures used to verify that the proprietary model is free from numerical
       problems or "bugs"  and that it truly represents the conceptual  model (as  described in  Section
       3.3.3).
    •   The process used  to  evaluate the model  (as described in  Section  4.2)  and the basis for
       concluding that the model and its analytical results are of a quality sufficient to serve as the basis
        for a decision (as described in Section 4.1).
    •   To the extent practicable, access to input and output data such that third parties can replicate the
       model results.

4.4    Learning From Prior Experiences — Retrospective Analyses of Models

The NRC Committee on Models in the Regulatory Decision Process emphasized that the final  issue in
managing the model  evaluation  process is the learning that comes  from  examining prior modeling
experiences.  Retrospective analysis of models is important for improving individual models and regulatory
policies and for systematically enhancing the overall modeling field.  The committee pointed out that retrospective
analyses can  be considered  from various perspectives:

•   They can investigate the systematic  strengths  and weaknesses  that are characteristic of broad
    classes of models, such as models of ground water flow, surface water, air pollution, and health
    risk assessment.  For example, a researcher estimated that in 20 to 30 percent of ground
    water modeling  efforts,  surprising occurrences  indicated that the conceptual model underlying the
    computer model was invalid (Bredehoeft 2003, 2005, in NRC 2007).

•   They can study the processes (for example, approaches to model development and evaluation) that
    lead to successful model applications.

•   They can examine models that have been in use for years to determine how well they work.  Ongoing
    evaluation of the model against data,  especially data taken under novel  conditions, offers the best
    chance to identify and correct conceptual errors. This type of analysis is referred to as a model "post-
    audit" (see Section 5.5)

The results of retrospective  evaluations of individual models and model classes can be used to identify
priorities for improving models.
Box 9: Example of a Retrospective Model Analysis at EPA
(From Box 4-6 in NRC's Models in Environmental Regulatory Decision Making)
EPA's Model Evaluation and Applications Research Branch has been performing a retrospective analysis of the
CMAQ model's ability to simulate the change in a pollutant associated with a known change in emissions (A. Gilliland,
EPA, personal commun., May 19,  2006 and March 5,  2007). This study, which EPA terms a "dynamic evaluation"
study, focuses on a rule issued by EPA in 1998 that required 22 states and the District of Columbia to submit State
Implementation Plans providing NOX emission reductions to mitigate ozone transport in the  eastern United States.
This rule, known as the NOX SIP Call, requires emission reductions from the utility sector and large industrial boilers
in the eastern and midwestern United States by 2004. Since these sources are equipped with continuous emission
monitoring systems, the NOX SIP call represents a special opportunity to directly measure the emission changes and
incorporate them into model simulations with reasonable confidence.
Air quality model simulations were developed for the summers of 2002 and 2004 using the  CMAQ model, and the
resulting ozone predictions were compared to observed ozone concentrations. Two series of CMAQ simulations  were
developed to test  two  different chemical mechanisms in  CMAQ.  This allowed  an  evaluation of the uncertainty
associated with the model's representation of chemistry. Since the model's prediction of the relative change in
pollutant concentrations provides input for regulatory decision making, this type of dynamic evaluation is particularly
relevant to how the model is used.

4.5     Documenting the Model Evaluation

In  its Models in Environmental Regulatory Decision Making report, the  NRC summarizes  the  key
elements of a model  evaluation (NRC 2007). This list provides a useful framework for documenting the
results of model evaluation as the various  elements are  conducted during model  development  and
application:

•   Scientific basis.  The scientific theories that form the basis for models.
•   Computational infrastructure.  The mathematical algorithms and approaches used in executing the
    model computations.
•   Assumptions and limitations.  The detailing of important  assumptions used  in  developing  or
    applying  a computational  model, as  well as the resulting  limitations that will affect the model's
    applicability.
•   Peer review.  The documented critical review of a model or its application conducted by qualified
    individuals who are independent of those who  performed the work,  but who collectively have  at least
    equivalent technical  expertise to those who  performed  the  original work.  Peer review attempts  to
    ensure that  the  model is technically adequate, competently performed,  properly documented,  and
    satisfies  established  quality   requirements  through  the   review  of  assumptions,  calculations,
    extrapolations, alternate interpretations, methodology, acceptance criteria, and/or conclusions pertaining
    to a model or its application (modified from EPA 2006).
•   Quality assurance and quality control  (QA/QC).   A  system  of management activities involving
    planning, implementation, documentation, assessment, reporting, and improvement to ensure that a
    model and its components are of the type needed and  expected for its task and that they meet  all
    required performance standards.
•   Data availability and quality. The availability and quality of monitoring and laboratory data that can
    be used for both developing model input parameters and  assessing model results.
•   Test cases.  Basic model runs where an analytical solution is available or an empirical solution is
    known with a high degree of confidence to ensure that algorithms and computational processes are
    implemented correctly (a minimal sketch of such a check follows this list).
•   Corroboration of model  results with  observations.  Comparison of model  results with  data
    collected in the field or laboratory to assess the model's accuracy and improve its performance.
•   Benchmarking against other models. Comparison of model results with other similar models.
•   Sensitivity and uncertainty analysis. Investigation of the parameters or processes that drive model
    results, as well as the effects of lack of knowledge and other potential sources of error in the model.
•   Model resolution capabilities. The level of disaggregation of processes  and results in the model
    compared to the resolution needs from the problem statement or model application.  The resolution
    includes the level of spatial, temporal, demographic, or other types of disaggregation.
•   Transparency. The need for individuals and groups outside modeling activities to comprehend either
    the processes followed  in evaluation or the essential workings of the model and its outputs.
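
As a minimal illustration of the "test cases" element above, the sketch below (in Python) checks a simple
finite-difference implementation of first-order decay against its known analytical solution; the model,
parameter values, and tolerance are hypothetical and are not taken from an EPA model framework.

    import math

    # Hypothetical test case: an explicit Euler solver for dC/dt = -k*C is
    # compared against the analytical solution C(t) = C0*exp(-k*t).

    def euler_decay(c0, k, t_end, dt):
        """Advance the decay equation with a fixed time step (explicit Euler)."""
        c = c0
        for _ in range(int(round(t_end / dt))):
            c += dt * (-k * c)
        return c

    c0, k, t_end = 10.0, 0.5, 4.0
    numerical = euler_decay(c0, k, t_end, dt=0.001)
    analytical = c0 * math.exp(-k * t_end)

    # The test passes if the numerical result is within 0.1% of the analytical one.
    assert abs(numerical - analytical) / analytical < 1e-3, "test case failed"
    print(f"numerical = {numerical:.4f}, analytical = {analytical:.4f} (test passed)")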

4.6    Deciding Whether to Accept the Model for Use in Decision Making

The model development and evaluation process culminates in a decision to accept  (or not accept) the
model for use in decision making.  This decision is made by the program manager charged with making
regulatory decisions, in consultation with the model developers and project team.  It should be informed
by good communication of the key findings of the model evaluation process, including the critical issue of
uncertainty. The project team  should  gain model  acceptance before applying the  model to  decision
making to avoid confusion and potential re-work.
5.1     Introduction

Once a model has  been accepted for use by decision makers, it  is applied to  the  problem that was
identified in the first stages of the modeling process.  Model application commonly involves a shift from
the hindcasting (testing the  model against past observed conditions)  used in the model development and
evaluation  phases to forecasting (predicting a future change) in the application  phase.  This may involve a
collaborative effort between modelers and program staff to devise management scenarios that represent
different regulatory alternatives.   Some model applications may entail trial-and-error model simulations,
where model inputs are changed iteratively until a desired environmental condition is achieved.
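
A minimal sketch of such a trial-and-error run is shown below (in Python); the linear response function,
target concentration, and starting loading are illustrative assumptions, not an EPA model or dataset.

    # Trial-and-error application sketch: a hypothetical loading input is
    # reduced stepwise and the model re-run until the simulated concentration
    # meets a target condition. All numbers are illustrative.

    def simulated_concentration(loading_kg_per_day):
        """Placeholder steady-state model: concentration assumed proportional to loading."""
        return 0.004 * loading_kg_per_day  # mg/L per (kg/day), assumed

    target_mg_per_l = 1.0
    loading = 500.0  # starting loading, kg/day (assumed)

    while simulated_concentration(loading) > target_mg_per_l:
        loading *= 0.9  # reduce the loading 10 percent and re-run the model

    print(f"A loading of {loading:.1f} kg/day meets the {target_mg_per_l} mg/L target")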

Using a model in  a  proposed decision requires that the model application be  transparently incorporated
into the public process. This is  accomplished by providing written documentation of the model's relevant
characteristics  in a style and format accessible to the interested public, and by sharing specific model files
and data with external parties, such as technical consultants and university scientists, upon request. This
chapter presents best practices and  other recommendations for integrating the results of environmental
models  into  Agency decisions.   Section  5.2  describes how to achieve  and document a transparent
modeling process, Section 5.3 reviews situations when  use of multiple  models may be appropriate, and
Section 5.4 discusses the use of  post-audits to determine whether the  actual system response concurs
with that predicted by the  model.
Box 10:   Examples of Major  EPA  Documents That Incorporate  a Substantial Amount of Computational
Modeling Activities
(From Table 2-2 in NRC's Models in Environmental Regulatory Decision Making)

Air Quality
Criteria Documents and Staff Paper for Establishing NAAQS
Summarize and assess exposures and health impacts for the criteria air pollutants (ozone, particulate matter, carbon
monoxide,  lead, nitrogen dioxide, and sulfur dioxide). Criteria documents include results from exposure and health
modeling  studies, focusing  on describing exposure-response relationships.  For example,  the particulate matter
criteria document placed emphasis on epidemiological models of morbidity and mortality (EPA 2004c). The  Staff
Paper takes this scientific foundation a step further by identifying the crucial health information and using exposure
modeling to characterize risks that serve as the basis for the staff recommendation of the  standards to the  EPA
Administrator.  For example, models of the number of children exercising outdoors during those parts of the day when
ozone is elevated had a major influence on  decisions about the 8-hour ozone national ambient air quality standard
(EPA 1996).
State Implementation Plan (SIP) Amendments
A detailed  description of the scientific  methods and emissions reduction programs a state will use to carry out its
responsibilities under the CAA for complying with NAAQS. A SIP typically relies on results from activity, emissions,
and air quality modeling. Model-generated emissions inventories serve as input to regional air  quality models and are
used to test alternative emission-reduction schemes to see whether they will result in air quality standards being met
(e.g., ADEC 2001; TCEQ 2004).  Regional-scale modeling has become part of developing state implementation plans
for the new 8-hour ozone and fine particulate matter standards.  States, local governments, and their consultants do
this analysis.
Regulatory Impact Assessments (RIAs) for Air Quality Rules
RIAs for air quality regulations document the costs and benefits of major emission control regulations.  Recent RIAs
have included emissions, air quality, exposure, and health and economic impacts modeling results (e.g., EPA 2004b).

Water Regulations
Total Maximum Daily Load (TMDL) Determinations
For each impaired water body, a TMDL identifies (a) the water quality standard that is not being attained and the
pollutant causing the impairment, (b) the total loading of the pollutant that the water may receive and still meet the
water quality standard, and (c) the allocation of that total loading among the point and nonpoint sources of the pollutant
discharging to the water.  Establishment of TMDLs may utilize water quality and/or nutrient loading models.  States
establish most TMDLs, and therefore states and their consultants can be expected to do the majority of this modeling,
with EPA occasionally doing the modeling for particularly contentious TMDLs (EPA 2002b; George 2004; Shoemaker
2004; Wool 2004).
Leaking Underground Storage Tank Program
Assesses  the potential  risks associated  with leaking  underground  gasoline storage  tanks.  At an  initial screening
level,  it may assess one-dimensional  transport of a conservative contaminant using an  analytical model (Weaver
2004).
Development of Maximum Contaminant Levels for Drinking Water
Assess  drinking  water standards for public water supply  systems.   Such assessments  can include  exposure,
epidemiology, and dose-response modeling (EPA 2002c; NRC 2001b, 2005b).

Pesticides and Toxic Substances Program
Pre-manufacturing Notice Decisions
Assess risks associated with new manufactured chemicals entering the market.  Most chemicals are screened initially
as to their environmental and human health risks using structure-activity relationship models.
Pesticide Reassessments
Requires that all existing pesticides undergo a reassessment  based on cumulative  (from multiple pesticides) and
aggregate (exposure from multiple pathways) health risk. This includes the use  of pesticide exposure models.

Solid and Hazardous Wastes Regulations
Superfund Site Decision Documents
Includes the remedial investigation, feasibility study, proposed  plan, and record-of-decision documents that address
the  characteristics and  cleanup of Superfund sites.  For many hazardous waste sites, a primary modeling task is
using groundwater modeling to assess movement of toxic substances through the  substrate (Burden 2004).  The
remedial investigation for a  mining megasite might include water quality, environmental chemistry, human health risk,
and ecological  risk assessment modeling (NRC 2005a).

Human Health Risk Assessment
Benchmark Dose (BMD) Technical Guidance Document
EPA relies on both laboratory animal and epidemiological studies to assess the noncancer effects of chronic
exposure to pollutants (that is, the reference dose [RfD] and the inhalation reference concentration [RfC]).  These
data are modeled to estimate the human dose-response.  EPA recommends the use of BMD modeling, which
essentially fits mathematical models to the experimental data so as to use as much of the available data as possible (EPA 2000).
Ecological Risk Assessment

The ecological risk assessment guidelines provide general principles and give examples to show how ecological risk
assessment can be applied to a wide range of systems, stressors, and biological, spatial, and temporal scales.  They
describe the  strengths and limitations of alternative approaches and  emphasize processes and approaches  for
analyzing data rather than specifying data collection techniques, methods or models (EPA 1998).
5.2    Transparency

The objective of transparency is to enable communication between modelers, decision makers, and the
public. Model transparency is achieved when the modeling processes are documented with clarity and
completeness  at an appropriate  level of detail.   When models are transparent, they can  be  used
reasonably and effectively in a regulatory decision.

       5.2.1   Documentation

Documentation enables decision makers and other model users to understand the process by which a
model was developed and used. During model development and use, many choices must be made and
options selected that may  bias the model results.  Documenting this  process and its limitations and
uncertainties is essential to increase the utility and acceptability of the model outcomes.  Modelers and
project teams should document all relevant information about  the  model  to  the extent practicable,
particularly when a controversial decision is involved.  In legal proceedings, the quality and thoroughness
of the model's written documentation and the Agency's responses to peer review and public comments
on the model can affect the  outcome of the legal challenge.

The documentation should  include  a clear explanation of the model's relationship to the scenario of the
particular application.  This  explanation should describe the  limitations of the  available information when
applied to other scenarios.   Disclosure about the state of science used in  a  model and future plans to
update the  model  can help establish  a record of reasoned, evidence-based application to inform
decisions.  For example, EPA successfully defended a challenge to a model  used in its TMDL program
when it explained that it was basing its decision on the best available  scientific information and that it
intended to refine its model  as better information surfaced.7

When courts review EPA modeling decisions, they generally give some deference to EPA's technical
expertise, unless it is without substantial basis in fact.  As discussed in Section 4.2.3 regarding
corroboration,  deviations from empirical observations are to  be expected.  In  substantive legal  disputes,
the courts generally examine the record supporting EPA's decisions for justification as to why the model
was reasonable.8 The record should contain not only model development, evaluation, and application but
also  the  Agency's  responses  to comments on  the model raised during  peer review and the public
process.   The organization of this guidance document offers a general outline for model documentation.
Box 11 provides a more detailed outline. These elements are adapted  from  EPA Region 10's standard
practices for modeling projects.
7 Natural Resources Defense Council v. Muszynski, 268 F.3d 91 (2d Cir. 2001).
8 American Iron and Steel Inst. v. EPA, 115 F.3d 979 (D.C. Cir. 1997).
Box 11: Recommended Elements for Model Documentation

1. Management Objectives
•   Scope of problem
•   Technical objectives that result from management objectives
•   Level of analysis needed
•   Level of confidence needed

2. Conceptual Model
•   System boundaries (spatial and temporal domain)
•   Important time and length scales
•   Key processes
•   System characteristics
•   Source description
•   Available data sources (quality and quantity)
•   Data gaps
•   Data collection programs (quality and quantity)
•   Mathematical model
•   Important assumptions

3. Choice of Technical Approach
•   Rationale for approach in context of management objectives and conceptual model
•   Reliability and acceptability of approach
•   Important assumptions

4. Parameter Estimation
•   Data used for parameter estimation
•   Rationale for estimates in the absence of data
•   Reliability of parameter estimates

5. Uncertainty/Error
•   Error/uncertainty in inputs,  initial conditions, and boundary conditions
•   Error/uncertainty in pollutant loadings
•   Error/uncertainty in specification of environment
•   Structural errors in methodology (e.g., effects of aggregation or simplification)

6. Results
•   Tables of all parameter values used for analysis
•   Tables or graphs of all results used in support of management objectives or conclusions
•   Accuracy of results

7. Conclusions of analysis in relationship to management objectives

8. Recommendations for additional analysis, if necessary

Note:  The QA project plan for models (EPA 2002b) includes a documentation and records component that also
describes the types of records and the level of detail of documentation to be kept, depending on the scope and
magnitude of the project.


        5.2.2   Effective Communication


The modeling process should effectively communicate  uncertainty to anyone interested in  the  model
results.   All  technical  information  should be documented  in  a  manner that  decision makers  and
stakeholders can readily interpret and understand.  Recommendations for improving clarity, adapted from
the Risk Characterization Handbook (EPA 2000d), include the following:


•   Be as brief as possible while still providing all necessary details.
•   Use plain language that modelers, policy makers, and the informed lay person can understand.
•   Avoid jargon and excessively technical language.  Define specialized terms upon first use.
•   Provide the model equations.
•   Use clear and appropriate methods to efficiently display mathematical relationships.
•   Describe quantitative outputs clearly.
•   Use understandable tables and graphics to present technical data (see Morgan and Henrion, 1990,
    for suggestions).

The conclusions and  other key points of the  modeling project should  be  clearly communicated.  The
challenge is to  characterize these essentials for decision makers, while also providing them with more
detailed information about the  modeling  process and its  limitations.  Decision makers should have
sufficient insight into the  model framework  and its underlying assumptions to  be able to apply model
results appropriately.  This is consistent with QA planning practices that assert  that all technical reports
must discuss the data quality and any limitations with respect to their intended use (EPA 2000e).

5.3    Application of Multiple Models

As mentioned in earlier chapters, multiple models sometimes apply to a certain decision making need; for
example, several air quality models, each with  its own strengths and weaknesses, might be applied for
regulatory purposes.  In other situations, stakeholders may use alternative models (developed by industry
and academic researchers) to produce alternative risk assessments (e.g., CARES pesticide exposure
model developed by industry).  One approach  to address this issue is to use multiple  models of varying
complexities to simulate the same phenomena (NRC 2007). This may provide insight  into how sensitive
the results are to different modeling  choices and how much trust to put in the results from any one model.
Experience has shown that running  multiple models can increase confidence in the model results (Manno
et al. 2008) (see Box  8 in Chapter  4 for an  example).  However, resource  limitations  or regulatory time
constraints may limit the capacity to  fully evaluate all possible models.
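
A purely hypothetical sketch (in Python) of how a simpler screening-level estimate and a somewhat more
detailed estimate of the same quantity might be run side by side and compared; neither function
represents an actual EPA model framework, and all values are illustrative assumptions.

    # Two illustrative "models" of differing complexity applied to the same
    # question, with the spread in their answers reported as a rough indication
    # of how sensitive the result is to the modeling choice.

    def screening_model(loading):
        """Simple conservative dilution estimate (assumed)."""
        return loading / 50.0

    def refined_model(loading, decay_rate=0.3, flow=40.0):
        """Slightly more detailed estimate that also accounts for decay (assumed)."""
        return loading / (flow * (1.0 + decay_rate))

    loading = 120.0  # kg/day, assumed
    estimates = {"screening": screening_model(loading), "refined": refined_model(loading)}

    for name, value in estimates.items():
        print(f"{name:>9} model estimate: {value:.2f} mg/L")
    print(f"Spread across models: {max(estimates.values()) - min(estimates.values()):.2f} mg/L")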

5.4    Model Post-Audit

Due to time constraints, scarcity of resources, and/or lack of scientific understanding,
technical decisions are often based on incomplete information and imperfect models.  Further, even if
model developers strive to use  the best science available, scientific knowledge and understanding are
continually advancing. Given this reality, decision makers should use model results in the context of an
iterative,  ever-improving process of continuous model  refinement to demonstrate the accountability  of
model-based  decisions. This process includes conducting model post-audits to assess and improve a
model and its ability to provide valuable predictions for management decisions.  Whereas corroboration
(discussed in Section  4.2.3.2) demonstrates the degree to which a  model corresponds to  past system
behavior, a model post-audit  assesses its ability to model  future conditions (Anderson and Woessner
1992).

A  model  post-audit  involves  monitoring  the modeled  system,  after implementing a  remedial  or
management action, to determine whether the actual system response concurs with that predicted by the
model.  Post-auditing  of all models is not feasible due to resource constraints, but targeted audits  of
commonly used models may provide valuable information for improving model frameworks and/or model
parameter estimates.  In  its review of the TMDL program, the NRC recommended that EPA implement
this approach by selectively  targeting  "some  post-implementation  TMDL compliance  monitoring for
verification data collection to assess model  prediction error" (NRC 2001).  The  post-audit should also
evaluate  how effectively the model development and use process engaged decision  makers and other
stakeholders (Manno et al. 2008).
Appendix A: Glossary of Frequently Used Terms

Accuracy:  The closeness of a measured or computed value to its "true" value, where the "true" value is
obtained with  perfect  information.   Due  to the natural  heterogeneity and stochasticity  of  many
environmental systems, this "true" value exists as a distribution  rather than a discrete value.  In  these
cases, the "true" value will be a function of spatial and temporal aggregation.

Algorithm: A precise rule (or set of rules) for solving some problem.

Analytical model:  A model that can be solved mathematically in terms of analytical functions.  For
example, some models that are based on relatively simple differential equations can be solved
analytically by combinations of polynomial, exponential, trigonometric, or other familiar functions.
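
For instance, a model based on first-order decay has a closed-form solution (an illustrative textbook
example, not drawn from a specific EPA application):

    \frac{dC}{dt} = -kC \qquad\Longrightarrow\qquad C(t) = C_0\,e^{-kt}

where C_0 is the initial concentration and k is the first-order decay rate.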

Applicability and utility: One of EPA's five assessment factors (see definition) that describes the extent
to which the information is relevant for the Agency's intended  use (EPA 2003b).

Application niche: The set of conditions under which the use of a model is scientifically defensible. The
identification of application niche is a key step during model development.  Peer review should include an
evaluation of application  niche.   An explicit statement of application  niche helps decision makers
understand the limitations of the scientific basis of the model (EPA 1993).

Application niche uncertainty:  Uncertainty as  to the appropriateness of a model for use under a
specific set of conditions (see "application niche").

Assessment factors: Considerations recommended by EPA for evaluating the quality and relevance of
scientific and technical information.  The five assessment factors are soundness, applicability and  utility,
clarity and completeness, uncertainty and variability, and evaluation and review (EPA 2003b).

Bias: Systematic deviation between a measured (i.e., observed) or computed value and its "true" value.
Bias is affected by faulty instrument calibration and other measurement errors, systematic errors during
data collection, and sampling errors such as incomplete spatial randomization during the design of
sampling programs.

Boundaries:  The spatial and temporal conditions and practical constraints under which environmental
data are collected.   Boundaries specify the  area or  volume (spatial  boundary)  and the  time  period
(temporal boundary) to which a model application will apply (EPA 2000a).

Boundary conditions:   Sets  of values  for state  variables  and their rates along problem domain
boundaries, sufficient to determine the state of the system within the problem domain.

Calibration: The process of adjusting model parameters within physically defensible ranges until the
resulting predictions give the best possible fit  to the observed data (EPA 1994b).  In some  disciplines,
calibration is also referred to as  "parameter estimation" (Beck et al. 1994).
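
A minimal sketch (in Python) of this idea, assuming a first-order decay model, synthetic "observations,"
and an illustrative physically defensible range for the single decay-rate parameter:

    import numpy as np

    # Calibration sketch: search a defensible range for the decay rate that
    # gives the best least-squares fit to observed data. Data are synthetic.

    t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0])    # days
    c_obs = np.array([10.0, 7.9, 6.3, 4.1, 1.6])   # mg/L (synthetic observations)

    def model(t, k, c0=10.0):
        """Illustrative first-order decay model."""
        return c0 * np.exp(-k * t)

    k_candidates = np.linspace(0.05, 1.0, 200)      # defensible range, 1/day (assumed)
    sse = [np.sum((model(t_obs, k) - c_obs) ** 2) for k in k_candidates]
    k_best = k_candidates[int(np.argmin(sse))]

    print(f"Calibrated decay rate: {k_best:.3f} per day")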

Checks: Specific tests in a quality assurance plan that are  used to evaluate whether the specifications
(performance criteria) for the project developed at its onset have been met.
Clarity and completeness:  One of EPA's five assessment factors (see definition) that describes the
degree of clarity  and completeness  with which the data, assumptions, methods,  quality assurance,
sponsoring organizations, and  analyses  employed to generate the  information  are  documented  (EPA
2003b).

Class (see "object-oriented platform"): A set of objects that share a common structure and behavior.
The structure of a  class is determined by the class variables, which represent the state of an object of that
class; the behavior is given by the set of methods associated with the class (Booch 1994).

Code: Instructions, written in the syntax of a computer language, that provide the computer with a logical
process.  "Code" can also refer to a computer program or subset. The term "code" describes the fact that
computer  languages use a  different vocabulary and syntax than algorithms that  may be written  in
standard language.

Code verification:  Examination  of the  algorithms and numerical technique  in the computer code  to
ascertain that  they  truly represent the  conceptual model and that there are  no inherent numerical
problems with obtaining a solution (Beck et al. 1994).

Complexity:  The opposite of simplicity. Complex systems tend to  have a large  number of variables,
multiple parts, and mathematical equations of a higher order, and to be more difficult to  solve.  Used  to
describe computer models, "complexity" generally refers to the level of difficulty in solving mathematically
posed problems as measured by the  time, number of steps or arithmetic operations, or memory space
required (called time complexity, computational complexity, and space complexity,  respectively).

Computational models: Models  that use measurable variables, numerical  inputs, and mathematical
relationships to produce quantitative outputs.

Conceptual basis:  An underlying scientific foundation of model algorithms or governing  equations.  The
conceptual basis for a model is either empirical (based on statistical relationships between observations)
or mechanistic (process-based) or a combination.  See definitions for "empirical model" and "mechanistic
model."

Conceptual model:  A hypothesis regarding the important factors that govern  the behavior of an object
or process of interest.  This can be an interpretation or working description  of  the characteristics and
dynamics of a physical system (EPA 1994b).

Confounding error:  An error induced by unrecognized effects from variables that are not included in the
model. The unrecognized, uncharacterized nature of these errors makes them more difficult to  describe
and account for in  statistical analysis of uncertainty (Small and Fishbeck 1999).

Constant:  A fixed value (e.g., the speed of light, the gravitational force) representing known  physical,
biological, or ecological activities.
Corroboration (model): Quantitative and qualitative methods for evaluating the degree to which a model
corresponds to reality.  In some disciplines, this process has been referred to as validation.  In general,
the term "corroboration" is preferred because it implies a claim of usefulness and not truth.

Data  uncertainty:   Uncertainty  (see  definition)  that is caused  by  measurement errors,  analytical
imprecision, and limited  sample sizes during the collection and treatment of data.  Data uncertainty, in
contrast to variability (see definition), is the  component of total uncertainty that is  "reducible" through
further study.

Debugging:  The identification and removal of bugs from  computer code.  Bugs are errors in  computer
code that range from typos to misuse of concepts and equations.

Deterministic model:   A model that provides a  solution for the state variables rather than  a set of
probabilistic outcomes.  Because this type of model does not explicitly  simulate the effects of data
uncertainty or variability, changes in model outputs are solely due to changes in model components or in
the boundary conditions or initial conditions.

Domain (spatial and temporal):  The  spatial and temporal domains of a model cover the extent and
resolution  with respect to space and time for which the model has been  developed and over which  it
should be  evaluated.

Domain boundaries (spatial and temporal):  The limits of space and time that bound a model's domain
and are specified within the boundary conditions (see "boundary conditions").

Dynamic model:  A model providing the time-varying behavior of the state variables.

Empirical  model:   A  model whose structure  is determined  by the observed  relationship among
experimental data (Suter 1993). These models can be used to develop relationships that are  useful for
forecasting and describing trends in behavior, but they are not necessarily mechanistically relevant.

Environmental data:  Information collected directly from measurements, produced from models, and
compiled from other sources such as databases and literature (EPA 2002a).

Evaluation and review: One of EPA's five assessment factors (see definition) that describes the extent
of independent verification, validation, and peer review of the information or of the procedures, measures,
methods, or models (EPA 2003b).

Expert elicitation: A systematic process for quantifying, typically in probabilistic terms, expert judgments
about uncertain quantities.  Expert elicitation can be used to characterize uncertainty and fill data  gaps
where traditional scientific research is not feasible or data are not yet available. Typically, the necessary
quantities  are obtained through structured interviews and/or questionnaires.  Procedural  steps can be
used to minimize the effects of heuristics and bias in expert judgments.

Extrapolation:  Extrapolation  is a  process that uses assumptions about fundamental causes underlying
the observed phenomena in order to project beyond the range of the data.  In general, extrapolation is not
considered a reliable process for prediction; however, there are situations where it may be necessary and
useful.

False negative:  Also known as a false acceptance decision error, a false negative occurs when the null
hypothesis or baseline condition cannot be rejected based on the available sample data. The decision is
made assuming the baseline condition is true when in reality it is false (EPA 2000a).

False positive:  Also known as  a false rejection decision  error, a false positive occurs when the null
hypothesis or baseline condition is incorrectly rejected based on the sample data. The decision is made
assuming the alternate condition or hypothesis to be true when in reality it is  false (EPA 2000a).

Forcing/driving  variable:  An external or exogenous (from  outside the model framework) factor that
influences the state variables calculated within the model.  Such variables include, for example, climatic
or environmental conditions (temperature, wind flow, oceanic circulation, etc.).

Forms (models): Models can be represented and solved in different forms,  including analytic, stochastic,
and simulation.

Function:  A mathematical relationship between variables.

Graded approach: The process of basing the level of application of managerial controls to an item or
work on the intended use of results and degree of confidence needed in the results (EPA 2002b).

Integrity:   One of three  main components of quality in EPA's Information Quality Guidelines.  "Integrity"
refers to the  protection  of information  from  unauthorized  access or  revision to ensure that it  is not
compromised through corruption or falsification (EPA 2002a).

Intrinsic variation: The variability (see definition) or inherent randomness in the real-world processes.

Loading:  The rate of release of a constituent of interest to a particular receiving medium.

Measurement error:  An error in the  observed data caused  by human  or instrumental error during
collection.  Such errors  can be independent or random.  When  a persistent bias or  mis-calibration is
present in the measurement device, measurement errors may be correlated among observations (Small
and Fishbeck 1999). In some disciplines, measurement error may be referred to as observation error.

Mechanistic  model:  A model  whose structure explicitly represents  an understanding of  physical,
chemical,  and/or biological  processes.   Mechanistic models  quantitatively  describe the relationship
between some phenomenon and underlying first principles of cause.  Hence, in theory, they are useful for
inferring solutions outside the domain in which the initial data were collected and used to parameterize
the mechanisms.

Mode (of a model): The manner in which a model operates.  Models can  be designed to represent
phenomena in different modes. Prognostic (or predictive) models are designed to forecast outcomes and
future events, while diagnostic models work "backwards" to assess causes and precursor conditions.
Model:  A simplification of reality that is constructed to gain insights into select attributes of a physical,
biological, economic, or social system.   A formal representation of the behavior of system processes,
often in mathematical or statistical terms. The basis can also be physical or conceptual (NRC 2007).

Model coding:   The  process of translating the mathematical  equations  that constitute the model
framework into a functioning computer program.

Model evaluation:  The process  used to generate  information to determine whether a model and its
results are of a quality sufficient to serve as the basis for a regulatory decision.

Model framework:  The system of governing equations,  parameterization, and data structures that make
up the mathematical model. The model  framework is a formal mathematical specification of the concepts
and procedures of the conceptual  model consisting  of generalized algorithms (computer code/software)
for different site- or problem-specific simulations (EPA 1994b).

Model framework uncertainty:  The uncertainty in the underlying science and algorithms of a model.
Model framework uncertainty  is the  result of incomplete scientific data or lack of knowledge about the
factors that control the behavior of the system being modeled. Model framework uncertainty can also be
the result of simplifications necessary to  translate the conceptual model  into mathematical terms.

Module:  An independent or  self-contained component of a  model, which is used in combination with
other components and forms part of one or more larger programs.

Noise: Inherent variability that the model does not characterize (see definition for variability).

Objectivity:  One of three main  components of quality  in  EPA's Information Quality  Guidelines.   It
includes  whether disseminated information  is being presented  in an accurate, clear, complete and
unbiased manner. In addition,  objectivity involves a focus on ascertaining accurate, reliable, and unbiased
information (EPA 2002a).

Object-oriented platform: A type of user interface that models systems using a collection of cooperating
"objects." These objects are treated as instances of a class within a class hierarchy.

Parameters: Terms in the model  that are fixed during a model run or  simulation  but can be changed in
different runs as a method for conducting sensitivity analysis or to achieve calibration goals.

Parameter uncertainty:  Uncertainty (see definition) related to parameter values.

Parametric variation:  The situation in which the value of a parameter is not a constant but includes natural
variability.  Consequently, the parameter should be described as a distribution (Shelly et al. 2000).

Perfect information:  The state of information in which there is no uncertainty.  The current and
future values for all  parameters are known  with certainty.   The state of perfect information includes
knowledge about the values of parameters with natural variability.
Precision: The quality of being reproducible in amount or performance. With models and other forms of
quantitative information, "precision" refers specifically to the number of decimal places to which a number
is computed as a measure of the "preciseness" or "exactness" with which a number is computed.

Probability density function:  Mathematical, graphical, or tabular expression of the relative likelihoods
with which an unknown or variable quantity may take various values.  The sum (or integral) of all
likelihoods equals 1 for discrete (continuous) random variables (Cullen and Frey 1999).  These
distributions arise from the fundamental properties of the quantities we are attempting to represent. For
example, quantities formed from adding many uncertain parameters tend to be normally distributed, and
quantities formed from multiplying uncertain quantities tend to be lognormal (Morgan and Henrion 1990).
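
A small sampling sketch (in Python) can illustrate these tendencies; the uniform inputs and sample size
below are arbitrary and chosen only for demonstration:

    import numpy as np

    # Sums of many uncertain quantities tend toward a normal distribution,
    # while products tend toward a lognormal distribution (i.e., their logs
    # are roughly normal). Inputs are arbitrary uniform variables.

    rng = np.random.default_rng(1)
    samples = rng.uniform(0.5, 1.5, size=(10_000, 12))

    sums = samples.sum(axis=1)       # approximately normal
    products = samples.prod(axis=1)  # approximately lognormal

    print(f"sums:          mean = {sums.mean():.2f}, std = {sums.std():.2f}")
    print(f"log(products): mean = {np.log(products).mean():.2f}, std = {np.log(products).std():.2f}")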

Program  (computer):  A set of instructions, written in the syntax of a computer language, that provide
the computer with a step-by-step logical process.  Computer programs are also referred to as code.

Qualitative assessment:  Some of the uncertainty  in model predictions may arise from sources whose
uncertainty cannot be quantified.  Examples are uncertainties about the theory underlying the model, the
manner in which that  theory is mathematically expressed to represent the environmental components,
and  the theory being  modeled.  The  subjective evaluations of experts may be needed  to determine
appropriate values for model parameters and inputs that cannot be directly observed or measured (e.g.,
air emissions estimates). Qualitative corroboration activities may involve the elicitation of expert judgment
on the true behavior of the system and agreement with model-forecasted behavior.

Quality:  A broad term that includes notions of integrity, utility, and objectivity (EPA 2002a).

Quantitative assessment:  The uncertainty in some  sources — such as some model parameters and
some input data — can be estimated  through quantitative assessments involving statistical uncertainty
and sensitivity analyses. In addition, comparisons can be made for the special purpose of quantitatively
describing the differences to be expected between model estimates of current  conditions and comparable
field  observations.

Reducible uncertainty:  Uncertainty in  models that can be minimized or even eliminated with further
study and additional data (EPA  1997). See "data uncertainty."


Reliability: The confidence that (potential) users have in a model and in the information derived from the
model such that they are willing  to use the  model and the  derived information  (Sargent 2000).
Specifically, reliability  is a function of the performance record  of a model and its conformance to best
available, practicable science.

Response surface: A theoretical multi-dimensional "surface" that describes the response  of a model to
changes  in its parameter values. A response surface is also known as a sensitivity surface.
Robustness:  The capacity of a model to perform well across the full range of environmental conditions
for which it was designed.

Screening model:   A  type  of  model designed to  provide a  "conservative"  or  risk-averse  answer.
Screening models can be used with limited information and are conservative, and in some cases they can
be used in lieu of refined models, even when time or resources are not limited.

Sensitivity:    The degree  to which the model  outputs are  affected  by changes  in selected  input
parameters (Beck et al. 1994).

Sensitivity analysis:  The computation of the effect of changes in input values or assumptions (including
boundaries and model functional form) on the outputs (Morgan and Henrion 1990); the  study  of how
uncertainty in a model output can be systematically  apportioned to different sources of uncertainty in the
model input (Saltelli et al. 2000a). By investigating  the "relative sensitivity" of model parameters, a user
can become knowledgeable of the relative importance of parameters in the model.
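
A minimal one-at-a-time sketch (in Python) of relative sensitivity, assuming a toy box-model of
concentration and illustrative nominal inputs (none of which come from this guidance):

    # Perturb each input by +10% in turn and report the relative change in the
    # output; inputs with larger relative effects are the more sensitive ones.

    def model(emission_rate, wind_speed, mixing_height):
        """Toy box-model concentration: emissions diluted by ventilation (assumed)."""
        return emission_rate / (wind_speed * mixing_height)

    nominal = {"emission_rate": 100.0, "wind_speed": 3.0, "mixing_height": 800.0}
    base = model(**nominal)

    for name in nominal:
        perturbed = dict(nominal, **{name: nominal[name] * 1.10})
        rel_change = (model(**perturbed) - base) / base
        print(f"{name:>14}: {rel_change:+.1%} output change for a +10% input change")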

Simulation model: A model that represents the development of a solution by incremental  steps through
the model domain. Simulations are often used to obtain solutions for models that are too complex to be
solved analytically.  For most situations, where a differential equation is being approximated, the
simulation model will use a finite time step (or spatial step) to "simulate" changes in state variables over
time (or space).
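
A minimal sketch (in Python) of such time-stepping, assuming an illustrative constant loading, first-order
loss rate, and step size:

    # Explicit finite-time-step (Euler) simulation of dC/dt = L - k*C.

    k = 0.2        # first-order loss rate, 1/day (assumed)
    loading = 5.0  # constant loading, mg/L per day (assumed)
    dt = 0.1       # time step, days
    c = 0.0        # initial concentration, mg/L

    for _ in range(int(30 / dt)):        # simulate 30 days
        c = c + dt * (loading - k * c)   # advance the state variable one step

    print(f"Concentration after 30 days: {c:.2f} mg/L (analytical steady state = {loading / k:.2f})")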

Soundness:  One of EPA's five assessment factors  (see definition) that describes the extent to which the
scientific and technical procedures, measures, methods, or models employed to generate the information
are reasonable for and consistent with the intended application (EPA 2003b).

Specifications: Acceptance criteria set at the onset of a quality assurance plan  that help to determine if
the intended objectives  of the project have been met.  Specifications are evaluated using a series of
associated checks (see definition).

State variables:  The dependent variables calculated within a model, which are often also the
performance indicators of the model and which change over the course of the simulation.

Statistical model: A model built using observations within a probabilistic framework. Statistical  models
include  simple linear or multivariate  regression models obtained  by  fitting observational  data to  a
mathematical function.

Steady-state model: A model providing the long-term or time-averaged behavior of the state variables.

Stochasticity:  Fluctuations  in  ecological processes that are due to  natural  variability and inherent
randomness.

Stochastic model:  A model that includes variability (see definition) in model parameters. This variability
is a  function of changing environmental conditions, spatial and temporal aggregation within  the model
framework, and random variability. The solution obtained by the model or output is therefore a function of
model components and random variability.
Transparency:  The clarity and  completeness with which data, assumptions, and methods of analysis
are documented.  Experimental  replication is possible when information about  modeling  processes is
properly and adequately communicated (EPA 2002a).

Uncertainty: The term  used in this document to describe lack of knowledge about models, parameters,
constants, data,  and beliefs.  There are many sources of uncertainty, including the science underlying a
model, uncertainty in  model  parameters and  input data, observation error,  and code  uncertainty.
Additional study  and collecting  more information  allows  error that  stems from  uncertainty  to be
minimized/reduced (or eliminated).  In contrast, variability (see definition) is  irreducible but can be better
characterized or  represented with further study (EPA 2002b, Shelly et al. 2000).

Uncertainty analysis:   Investigation of the effects of lack of knowledge  or potential errors on the model
(e.g., the "uncertainty" associated with parameter values).  When combined with sensitivity analysis (see
definition), uncertainty analysis allows a model user to be more informed about the confidence that can
be placed in model results.
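
A minimal Monte Carlo sketch (in Python) of propagating assumed input and parameter distributions
through a toy model; the distributions and model form are illustrative assumptions, not EPA defaults:

    import numpy as np

    # Sample uncertain inputs/parameters, run the model for each sample, and
    # summarize the resulting spread in the output.

    rng = np.random.default_rng(0)
    n = 10_000

    emission_rate = rng.normal(100.0, 15.0, n)         # uncertain input (assumed)
    decay_rate = rng.lognormal(np.log(0.2), 0.3, n)    # uncertain parameter (assumed)

    concentration = emission_rate / decay_rate          # toy steady-state model

    low, median, high = np.percentile(concentration, [5, 50, 95])
    print(f"Output 5th-95th percentile range: {low:.0f} to {high:.0f} (median {median:.0f})")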

Uncertainty and variability:  One of EPA's five assessment factors (see definition) that describes the
extent to which the variability and  uncertainty (quantitative and  qualitative) in the information or in the
procedures,  measures, methods,  or models are evaluated and characterized  (EPA 2003b).

Utility: One of three main components of quality in EPA's Information Quality Guidelines. "Utility" refers
to the usefulness of the information to the intended users (EPA 2002a).

Variable: A measured or estimated quantity that describes an object or can be observed in a system and
that is subject to  change.

Variability:  Observed differences attributable to true heterogeneity or diversity. Variability is the result of
natural random processes and is usually not reducible by further measurement or study (although it can
be better characterized)  (EPA 1997).

Verification (code):  Examination  of the algorithms and numerical technique in  the computer code to
ascertain that they truly represent the  conceptual  model and that there are  no inherent numerical
problems with obtaining a solution (Beck et al. 1994).
Appendix B: Categories  of Environmental Regulatory Models
This section is taken from Appendix C of the NRC report Models in Environmental Regulatory Decision
Making.

Models can be categorized according to their fit  into a continuum of processes that translate  human
activities and natural systems interactions into human health and environmental impacts. The categories
of models that are integral to environmental regulation include human activity models, natural systems
models,  emissions  models,  fate  and  transport  models,  exposure   models,  human   health  and
environmental response models, economic impact models,  and noneconomic impact models. Examples
of models in each of these categories are  discussed below.

HUMAN ACTIVITY MODELS
Anthropogenic emissions to the environment are  inherently linked to human activities. Activity models
simulate the  human  activities  and behaviors that result in pollutants. In the environmental regulatory
modeling arena, examples of modeled activities are the following:
•   Demographic information, such as the magnitude, distribution, and dynamics of human populations,
    ranging from national growth projections to local travel activity patterns on the order of hours.
•   Economic activity, such as the  macroeconomic estimates of national  economic production and
    income, final demands for aggregate  industrial sectors, prices, international trade, interest rates, and
    financial flows.
•   Human  consumption  of resources,  such  as  gasoline or feed,  may be translated into pollutant
    releases, such as nitrogen  oxides or nutrients. Human food consumption is also used to estimate
    exposure to  pollutants  such as  pesticides. Resource consumption in dollar terms may be  used to
    assess economic impacts.
•   Distribution and characteristics of land use are used to assess habitat, impacts on the hydrogeologic
    cycle and runoff, and biogenic pollutant releases.
Model: TRANSCAD, TRANSPLAN, MINUTP
Type: Travel demand forecasting models
Use: Develops estimation of motor vehicle miles traveled for use in estimating vehicle emissions. Can be
combined with geographic information systems (GIS) for providing spatial and temporal distribution of
motor vehicle activity.
Additional Information: http://www.caliper.com/tcvou.ritm

Model: DRI
Type: Forecasts national economic indicators
Use: Model can forecast over 1,200 economic concepts including aggregate supply, demand, prices,
incomes, international trade, interest rates, etc. The eight sectors of the model are: domestic spending,
domestic income, tax sector, prices, financial, international trade, expectations, and aggregate supply.
Additional Information: EIA 1993

Model: E-GAS
Type: National and regional economic activity model
Use: Emissions growth factors for various sectors for estimating volatile organic compounds, nitrogen
oxides, and carbon monoxide emissions.
Additional Information: Young et al. 1994

Model: YIELD
Type: Crop-growth yield model
Use: Predicts temporal and spatial crop yield.
Additional Information: Hayes et al. 1982
NATURAL SYSTEMS PROCESS AND EMISSIONS MODELS
Natural systems process  and  emissions models simulate the dynamics of ecosystems that directly or
indirectly give rise to fluxes of nutrients and other environmental emissions.
Model: Marine Biological Laboratory General Ecosystem Model (MBL-GEM)
Type: Plot-scale nutrient cycling of carbon and nitrogen
Use: Simulates plot-level photosynthesis and nitrogen uptake by plants, allocation of carbon and nitrogen
to foliage, stems, and fine roots, respiration in these tissues, turnover of biomass through litter fall, and
decomposition of litter and soil organic matter.
Additional Information: http://ecosystems.mbl.edu/Research/Models/gem/welcome.html

Model: BEIS
Type: Natural emissions of volatile organic compounds
Use: Simulates nitric oxide emissions from soils and volatile organic compound emissions from
vegetation. Input to grid models for NAAQS attainment (CAA).
Additional Information: http://www.epa.gov/asmdnerl/biogen.html

Model: Natural Emissions Model
Type: Natural emissions of methane and nitrous oxide
Use: Models methane and nitrous oxide emissions from the terrestrial biosphere to the atmosphere.
Additional Information: http://web.mit.edu/globalchange/www/tem.html#nem
EMISSIONS MODELS
These  models  estimate the rate or the amount of pollutant  emissions  to  water bodies and the
atmosphere. The outputs of emission models are used to generate inventories of pollutant releases that
can then serve as an input to fate and transport models.
Model: PLOAD
Type: Releases to water bodies
Use: GIS bulk loading model providing annual pollutant loads to waterbodies. Conducts simplified
analyses of sediment issues, including a bank erosion hazard index.
Additional Information: http://www.epa.gov/ost/basins

Model: SPARROW
Type: Releases to water bodies
Use: Relates nutrient sources and watershed characteristics to total nitrogen. Predicts contaminant flux,
concentration, and yield in streams. Provides empirical estimates (including uncertainties) of the fate of
contaminants in streams.
Additional Information: http://water.usgs.gov/nawqa/sparrow

Model: MOBILE, MOVES, NONROAD
Type: Releases to air
Use: Factors and activities for anthropogenic emissions from mobile sources. Estimates current and
future emissions (hydrocarbons, carbon monoxide, nitrogen oxides, particulate matter, hazardous air
pollutants, and carbon dioxide) from highway motor vehicles. Model used to evaluate mobile source
control strategies, control strategies for state implementation plans, and for developing environmental
impact statements, in addition to other research.
Additional Information: http://www.epa.gov/otaq/m6.htm; http://www.epa.gov/otaq/nonrdmdl.htm;
EPA 2004, EPA 2005a, Glover and Cumberworth 2003
FATE AND TRANSPORT MODELS
Fate and transport models calculate the movement of pollutants in the environment. A large number of
EPA models fall into this category.  They are further categorized into the transport media they represent:
subsurface, air, and  surface water. In each medium, there are a range of models with respect to their
complexity, where the level of complexity is a function of the following:
•   The number of physical and chemical processes considered.
•   The mathematical representation of those processes and their numerical solution.
•   The spatial and temporal scales over which the processes are modeled.

Even though some fate and transport models can be statistical models, the majority are mechanistic (also
referred to as process-based models). Such models simulate individual components in the system and
the mathematical relationships among the components. Fate and transport model output has traditionally
been  deterministic,  although recent focus on uncertainty and variability has led to  some  probabilistic
models.
Subsurface Models
Subsurface transport is governed by the heterogeneous nature of the ground, the degree of saturation of
the subsurface, as well as the chemical and physical properties of the pollutants of interest. Such models
are used to assess the extent of toxic substance spills. They can also assess the fate of contaminants in
sediments. The array of subsurface models is tailored to particular application  objectives, for example,
assessing the fate of contaminants leaking from underground gasoline storage tanks or leaching from
landfills.  Models  are  used  extensively for  site-specific  risk assessments; for example, to determine
pollutant concentrations in  drinking-water sources. The majority of models simulate liquid pollutants;
however, some simulate gas transport in the subsurface.
Model: MODFLOW
Type: 3D finite difference for ground water transport
Use: Risk assessments (RBCA), Superfund remediation (CERCLA). Modular three-dimensional model
that simulates ground water flow. Model can be used to support groundwater management activities.
Additional Information: http://water.usgs.gov/nrp/gwsoftware/modflow2000/modflow2000.html;
Prudic et al. 2004, Wilson and Naff 2004

Model: PRZM
Type: Hydrogeological
Use: Pesticide leaching into the soil and root zone of plants (FIFRA). Estimates pesticide and nitrogen
fate in the crop root zone and can simulate soil temperature, volatilization and vapor phase transport in
soil, irrigation, and microbial transformation.
Additional Information: http://www.epa.gov/ceampubl/products.htm; EPA 2005b

Model: BIOPLUME
Type: Two-dimensional finite difference and Method of Characteristics (MOC) model
Use: Simulates organic contaminants in groundwater due to natural processes of dispersion, advection,
sorption, and biodegradation. Simulates aerobic and anaerobic biodegradation reactions.
Additional Information: http://www.epa.gov/ada/csmos/models.html; EPA 1998
Surface Water Quality Models
Surface water quality models are often related to, or are variations of, hydrological models. The latter are
designed to predict flows in water bodies and runoff from precipitation, both of which govern the transport
of aqueous  contaminants. Of particular  interest in  some water  quality models is  the mixing  of
contaminants as a function of time and space, for example, following a point-source discharge into a river.
Other features  of  these  models  are  the  biological,  chemical,  and physical  removal mechanisms  of
contaminants, such  as degradation,  oxidation, and  deposition,  as  well  as the distribution of the
contaminants between the aqueous phase and organisms.
Model: HSPF
Type: Combined watershed hydrology and water quality
Use: Total maximum daily load (TMDL) determinations (CWA). Watershed model simulating nonpoint
pollutant load and runoff, fate and transport processes in streams.
Additional Information: http://www.epa.gov/ceampubl/swater/hspf/

Model: WASP
Type: Compartment modeling for aquatic systems
Use: Supports management decisions by predicting water quality responses to pollutants in aquatic
systems. Multicompartment model that examines both the water column and underlying benthos.
Additional Information: http://www.epa.gov/athens/wwqtsc/html/wasp.html; Brown 1986, Brown and
Barnwell 1987

Model: QUAL2E
Type: Steady-state and quasi-dynamic water quality model
Use: Stream water quality model used as a planning tool for developing TMDLs. The model can simulate
nutrient cycles, benthic and carbonaceous demand, and algal production, among other parameters.
Additional Information: http://www3.bae.ncsu.edu/Regional-Bulletins/Modeling-Bulletin/qual2e.html;
Brown 1986, Brown and Barnwell 1987
Air Quality Models
The  fate of gaseous and solid  particle pollutants in the atmosphere  is a  function  of meteorology,
temperature, relative humidity, other pollutants, and sunlight intensity, among  other things. Models that
simulate  concentrations in air have one  of three general designs:  plume models, grid models, and
receptor models. Plume models are used widely for permitting under requirements to assess the impacts
of large new or modified emissions sources on air quality or to assess air toxics (HAPs)  concentrations
close to  sources. Plume models focus on  atmosphere dynamics. Grid models  are used primarily to
assess concentrations of secondary criteria pollutants (e.g., ozone) in  regional airsheds to develop plans
(SIPs) and rules with the objective of attaining ambient air quality standards (NAAQS). Both atmospheric
dynamics and chemistry are important components of 3-D grid models. In contrast to mechanistic plume
and  grid  models,  receptor models are statistical; they determine  the statistical contribution of various
sources to  pollutant concentrations at  a given location based on  the relative amounts of pollutants at
source and receptor. Most air quality models are deterministic.
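
To make the distinction among model designs concrete, the sketch below evaluates the general steady-state Gaussian plume equation that mechanistic plume models are built around. It is a simplified illustration only: the dispersion coefficients are placeholder values supplied by the caller, not the stability-dependent curves used by any EPA regulatory model.

    import math

    def gaussian_plume(q_g_s, u_m_s, sigma_y, sigma_z, y_m, z_m, h_m):
        """Steady-state Gaussian plume concentration (g/m^3) with ground reflection.
        q_g_s: emission rate; u_m_s: wind speed; sigma_y, sigma_z: dispersion
        coefficients (m) at the receptor's downwind distance; y_m: crosswind offset;
        z_m: receptor height; h_m: effective stack height."""
        lateral = math.exp(-y_m ** 2 / (2.0 * sigma_y ** 2))
        vertical = (math.exp(-(z_m - h_m) ** 2 / (2.0 * sigma_z ** 2)) +
                    math.exp(-(z_m + h_m) ** 2 / (2.0 * sigma_z ** 2)))
        return q_g_s / (2.0 * math.pi * u_m_s * sigma_y * sigma_z) * lateral * vertical

    # Ground-level, centerline example with assumed sigma_y = 80 m and sigma_z = 40 m
    print(gaussian_plume(q_g_s=100.0, u_m_s=4.0, sigma_y=80.0, sigma_z=40.0,
                         y_m=0.0, z_m=0.0, h_m=50.0))
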
Model
CMAQ
UAM
REMSAD
ICSC
CALPUFF
CMB
Type
3-D Grid
3-D Grid
3-D Grid
Plume
Receptor
Use
SIP development, NAAQS setting (CAA). The model
provides estimates of ozone, particulates, toxics,
and acid deposition and simulates chemical and
physical properties related to atmospheric trace gas
transformations and distributions. Model has three
components: a meteorological system, an
emissions model for estimating anthropogenic and
natural emissions, and a chemistry-transport
modeling system.
Model calculates concentrations of inert and
chemically reactive pollutants and is used to
evaluate air quality, particularly related to ambient
ozone concentrations.
Using simulation of physical and chemical processes
in the atmosphere that impact pollutant
concentrations, model calculates concentration of
inert and chemically reactive pollutants.
PSD permitting; toxics exposure (CAA, TSCA).
Non-steady-state air quality dispersion model that
simulates long range transport of pollutants.
Relative contributions of sources. Receptor model
used for air resource management purposes.
Additional Information
http://www.epa.gov/
asmdnerl/CMAQ/
index.html
Byun and Ching 1999
Systems Applications
International, Inc., 1999
http://www.remsad.com
ICF Consulting 2005

http://www.epa.gov/scram001/receptor_cmb.htm
Coulter 2004
EXPOSURE MODELS
The primary objective of exposure models is to estimate the dose of a pollutant to which humans or animals are exposed via inhalation, ingestion, and/or dermal uptake. These models bridge the gap between
concentrations of pollutants in  the environment and the doses humans receive based on their activity.
Pharmacokinetic models take this one step further and estimate dose to tissues in the body.  Since
exposure is inherently tied to behavior, exposure models may also simulate activity, for example a model
that estimates dietary consumption of pollutants. In addition to the Lifeline model described below, other
examples of models that estimate dietary exposure to pesticides include  Calendex and CARES.  These
models can be either deterministic or probabilistic, but are well-suited for probabilistic methods due to the
variability of activity within a population.
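
Because these models lend themselves to probabilistic methods, a Monte Carlo treatment of a simple dietary dose equation is sketched below. Every distribution in the example is an assumption made for illustration; none is taken from Lifeline, Calendex, CARES, or any other model named here.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10000  # simulated individuals

    # Assumed, illustrative input distributions
    residue_mg_per_kg = rng.lognormal(mean=np.log(0.05), sigma=0.6, size=n)  # residue in food
    intake_kg_per_day = rng.lognormal(mean=np.log(0.3), sigma=0.4, size=n)   # daily consumption
    body_weight_kg = np.clip(rng.normal(loc=70.0, scale=12.0, size=n), 30.0, None)

    dose_mg_per_kg_day = residue_mg_per_kg * intake_kg_per_day / body_weight_kg
    print("median dose:", np.median(dose_mg_per_kg_day))
    print("95th percentile dose:", np.percentile(dose_mg_per_kg_day, 95))
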
Model: Lifeline
  Type: Dietary, water, and dermal exposure to a single chemical
  Use: Aggregate dose of pesticide via multiple pathways.
  Additional Information: http://www.thelifelinegroup.org; Lifeline Group, Inc. 2006

Model: IEUBK
  Type: Multipathway, single chemical
  Use: Dose of lead to children's blood via multiple pathways. Estimates exposure from lead in media (air, water, soil, dust, diet, paint, and other sources) using pharmacokinetic models to predict blood lead levels in children 6 months to 7 years old. The model can be used as a tool for the determination of site-specific cleanup levels.
  Additional Information: http://www.epa.gov/superfund/programs/lead/products.htm; EPA 1994

Model: Air Pollutants Exposure Model (APEX)
  Type: Inhalation exposure model
  Use: Simulates an individual's exposure to an air pollutant and their movement through space and time in indoor or outdoor environments. Provides dose estimates and summary exposure information for each individual.
  Additional Information: http://www.epa.gov/ttn/fera/human_apex.html; Richmond et al. 2001
HUMAN HEALTH AND ENVIRONMENT RESPONSE MODELS
Human Health Effects Models
Health effects models provide a statistical relationship between a dose of a chemical and an adverse
human health effect. Because these models are statistical in nature, models in this category are
almost exclusively empirical. They can be further classified as toxicological and  epidemiological. The
former refer to  models  derived from observations in  controlled experiments, usually with nonhuman
subjects. The latter refer to models derived from observations over large populations. Health models use
statistical methods and assumptions that ultimately imply cause and effect. Included in this category
are models that extrapolate information from non-human subject experiments. Also,  physiologically based
pharmacokinetic models can help predict the toxicity of contaminants to humans through mathematical modeling
of absorption, distribution, storage,  metabolism,  and excretion of toxicants.   The output from health
models  is almost  always a dose,  such as a safe level (for example, reference dose [RfD]), a cancer
potency index (CPI), or an expected health end point (for example, lethal dose for 50% of the population
(LD50) or number of asthma cases). Software applications also exist that facilitate the use of these statistical methods.
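
As an illustration of the dose-response fitting these tools support, the sketch below fits a simple Hill-type curve to hypothetical toxicological data and solves for the dose associated with a 10% extra risk over background. The data, model form, and parameter bounds are all assumptions for the example; this is not the Agency's benchmark dose software.

    import numpy as np
    from scipy.optimize import curve_fit, brentq

    # Hypothetical dose-response data: fraction of subjects responding at each dose
    dose = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
    response = np.array([0.02, 0.05, 0.10, 0.22, 0.40, 0.70])

    def hill(d, background, k, n):
        """Simple Hill-type dose-response curve (illustrative only)."""
        return background + (1.0 - background) * d ** n / (d ** n + k ** n)

    params, _ = curve_fit(hill, dose, response, p0=[0.02, 10.0, 1.5],
                          bounds=([0.0, 0.1, 0.5], [0.5, 100.0, 5.0]))

    background = params[0]
    target = background + 0.10 * (1.0 - background)  # 10% extra risk over background
    bmd10 = brentq(lambda d: hill(d, *params) - target, 1e-9, 50.0)
    print("fitted parameters:", params, "BMD10:", bmd10)
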
Model: Benchmark dose model
  Type: Software tool for applying a variety of statistical models to analyze dose-response data
  Use: To estimate risk of pollutant exposure. Models fit to dose-response data to determine a benchmark dose that is associated with a particular benchmark response.
  Additional Information: http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=20167; EPA 2000

Model: Linear cancer model
  Type: Statistical analysis method
  Use: To estimate the risk posed by carcinogenic pollutants.
Ecological Effects Models
Ecological effects models, like human  health effects models, define  relationships between  a level of
pollutant exposure and a particular ecological indicator. Many ecological effects models simulate aquatic
environments, and ecological indicators are related directly to environmental concentrations.  Examples
of ecological  effects indicators that have been modeled are: algae blooms, BOD, fish populations, crop
yields, coast line erosion, lake acidity, and soil salinity.
Model: AQUATOX
  Type: Integrated fate and effects of pollutants in an aquatic environment
  Use: Ecosystem model that predicts the environmental fate of chemicals in aquatic ecosystems, as well as direct and indirect effects on the resident organisms. Potential applications to management decisions include water quality criteria and standards, TMDLs, and ecological risk assessments of aquatic systems.
  Additional Information: http://www.epa.gov/waterscience/models/aquatox/; Hawkins 2005, Rashleigh 2007

Model: BASS
  Type: Simulates fish populations exposed to pollutants (mechanistic)
  Use: Models dynamic chemical bioconcentration of organic pollutants and metals in fish. Estimates are being used for ecological risks to fish in addition to realistic dietary exposures to humans and wildlife.
  Additional Information: http://www.epa.gov/athens/research/modeling/bass.html

Model: SERAFM
  Type: Steady-state modeling system used to predict mercury concentrations in wildlife
  Use: Predicts total mercury concentrations in fish and speciated mercury concentrations in water and sediments.
  Additional Information: http://www.epa.gov/ceampubl/swater/serafm/index.htm; Knightes 2005

Model: PATCH
  Type: Movement of vertebrates in their habitat
  Use: Provides population estimates of territorial terrestrial vertebrate species over time, in addition to survival and fecundity rates, and orientation of breeding sites. Can be used to determine ecological effects of regulation.
  Additional Information: http://www.epa.gov/wed/pages/models/patch/patchmain.htm; Lawler et al. 2006
ECONOMIC IMPACT MODELS
This category includes a  broad group of models  that are used  in many different aspects of EPA's
activities including: rulemaking (regulatory  impact assessments), priority setting,  enforcement,  and
retrospective analyses. Models that produce a dollar value as output belong in this category. Models can
be divided into cost models, which may include or exclude behavior responses, and benefit models. The
former incorporate economic theory on how markets (supply, demand, and  pricing) will respond as a
result of an action.  Economic models are  traditionally deterministic,  though there is a trend toward
greater use of uncertainty methods in cost-benefit analysis.
Model: ABEL
  Type: Microeconomic
  Use: Assesses a single firm's ability to pay compliance costs or fees. Estimates claims from defendants that they cannot afford to pay for compliance, clean-up, or civil penalties using information from tax return data and cash-flow analysis. Used for settlement negotiations.
  Additional Information: http://iaspub.epa.gov/edr/edr_proc_qry.navigate?P LIST OPTION CD=CSDIS&P REG AUTH IDENTIFIER=1&P DATA IDENTIFIER=90389&P VERSIONS

Model: Nonroad Diesel Economic Impact Model (NDEIM)
  Type: Macroeconomic model for the impact of the nonroad diesel emissions standards rule
  Use: Multimarket model to analyze how producers and consumers are expected to respond to compliance costs associated with the rule. Estimates and stratifies emissions for nonroad equipment. Model can be used to inform State Implementation Plans and regulatory analyses.
  Additional Information: http://www.epa.gov/ttn/atw/nsps/cinsps/ci_nsps_eia_reportfinalforproposal.pdf

Model: BenMAP
  Type: Noneconomic and economic benefits from air quality
  Use: Model that estimates the health benefits associated with air quality changes by estimating changes in incidences of a wide range of health outcomes and then placing an economic value on these reduced incidences.
  Additional Information: http://www.epa.gov/ttnecasl/benmodels.html
NONECONOMIC IMPACT MODELS
Noneconomic impact models  evaluate the effects of contaminants on  a  variety of noneconomic
parameters, such as on crop yields and buildings. Note that other noneconomic impacts, such as impacts
on human health or ecosystems, are derived from the human health  and ecological  effects models
discussed previously.
Model: TDM (Travel Demand Management)
  Type: Model used to evaluate travel demand management strategies
  Use: Evaluates travel demand management strategies to determine vehicle-trip reduction effects. Model used to support transit policies including HOV lanes, carpooling, telecommuting, and pricing and travel subsidies.
  Additional Information: http://www.fhwa.dot.gov/environment/cmaqeat/descriptions_tdm_evaluation_model.htm

Model: CERES-Wheat
  Type: Crop-growth yield model
  Use: Simulates effects of planting density, weather, water, soil, and nitrogen on crop growth, development, and yield. Predicts management strategies that impact crop yield.
  Additional Information: http://nowlin.css.msu.edu/wheat_book/

Model: PHREEQE-A
  Type: Models effects of acidification on stone
  Use: Simulates the effects of acidic solutions on carbonate stone.
  Additional Information: Parkhurst et al. 1990
Appendix C: Supplementary Material  on Quality Assurance
Planning and Protocols	

This section consists of a series of text boxes meant to supplement concepts and references made in the
main body of the document.  They are not meant as a comprehensive discussion  on QA practices, and
each box  should be considered as a discrete unit.   Individually, the text boxes provide  additional
background material for  specific sections of the main  document.  The  complete QA  manuals for each
subject area discussed in this guidance and referred to below should be consulted for more complete
information on QA planning and protocols.

Box C1: Background on EPA Quality System
The EPA Quality System defined in EPA Order 5360.1 A2, "Policy and Program  Requirements for the
Mandatory Agency-Wide Quality System" (EPA 2000e), covers environmental data produced from models
as well as "any measurement or information that describes environmental processes, location, or conditions;
ecological  or health effects and consequences; or the performance of environmental technology."  For
EPA, environmental data  include information  collected directly from  measurements, produced from
models, and compiled from other sources such as databases and literature.

The EPA Quality System is  based on an American National Standard, ANSI  1994.   Consistent with
minimum specifications of this standard,  §6.a.(7) of EPA Order 5360.1 A2 states that EPA organizations
will develop a Quality System that includes  "approved"  Quality  Assurance (QA)  Project  Plans, or
equivalent documents defined by the Quality Management Plan, for all applicable projects and tasks
involving environmental data  with review and approval having  been made by the EPA QA Manager (or
authorized representative defined in the Quality Management Plan).  The approval of the QA Project Plan
containing the specifications for the product(s), and the checks against those specifications (assessments) during implementation, is an important management control that provides a record helping to avoid fiduciary "waste and abuse" (Federal Managers' Financial Integrity Act of 1982,⁹ with annual declarations including conformance to the EPA Quality System). The assessments (including peer review) support the product
acceptance for  models  and their outputs  and approval  for use such as  supporting environmental
management decisions by  answering questions, characterizing environmental processes  or conditions,
and direct decision support such as economic analyses (process planned in  Group D in the Guidance for
QA Project Plans for Modeling).  EPA's policies for QA Project  Plans are provided in Chapter 5 of EPA's
Manual 5360 A1 (EPA 2000e), the EPA Quality Manual for Environmental Programs (EPA 2000f) for in-
house modeling, and Requirements for Quality Assurance Project Plans (QA/G5-M)  (EPA 2002b)  for
modeling done through  extramural agreements (e.g.,  contracts 48  CFR  46, grants and cooperative
agreements 40 CFR 30,  31, and 35). QA requirements must be negotiated and written into Interagency
Agreements if the project is funded by EPA; if funds are received by EPA,  EPA Manual 5360 A1 (EPA
2000e) applies.

EPA Order 5360.1  A2 also  states that EPA organizations' Quality  Systems must include "use of a
systematic planning approach to develop acceptance or performance criteria for all work covered" and
"assessment of existing data, when used to support Agency decisions or other secondary purposes, to
verify that they are of sufficient quantity and adequate quality for their intended use."
⁹ Federal Managers' Financial Integrity Act of 1982, P.L. 97-255 (H.R. 1526), September 8, 1982.
Box C2: Configuration Tests Specified in the QA Program
During code verification, the final set of computer code is scrutinized to ensure that the equations are
programmed correctly and that sources of error, such as rounding, are minimal. This process is likely to
be more extensive for new computer code.  For existing code, the criteria used for previous verification, if
known, can be described or cited.  Any additional  criteria specific to the  modeling  project can  be
specified,  along  with  how  the  criteria  were  established.   Possible  departures from the criteria  are
discussed, along with how the departures can affect the modeling process.

Software code development  inspections: An  independent person or group other than the author(s)
examines software requirements, software  design, or code to detect faults, programming errors, violations
of development standards, or other problems. All errors found are recorded at the time of inspection, with
later verification that all errors found have been successfully corrected.

Software code performance testing: Software used to compute model predictions is tested to assess
its performance relative to specific response times, computer processing usage, run time, convergence to
solutions, stability of the  solution algorithms, absence of terminal failures, and  other quantitative aspects
of computer operation.

Testing of model modules:  Checks  ensure that the computer code for each module is  computing
outputs accurately and within any specific  time constraints.  (Modules are different segments or portions
of the model linked together to obtain the final model prediction.)

Model framework testing:  The full model framework is tested as the ultimate level of integration testing
to verify that all project-specific requirements have been implemented as intended.

Integration testing:  The  computational  and transfer  interfaces between  modules need to allow  an
accurate transfer of information from one module to the next, and ensure that uncertainties in one module
are not lost or changed when that information is transferred to the next module.  These tests detect
unanticipated interactions between modules and  track down their cause(s).  (Integration tests should  be
designed and applied hierarchically by increasing,  as testing proceeds, the number of modules tested and
the subsystem complexity.)

Regression testing:  All testing performed on the original version  of the module or linked modules is
repeated to detect new "bugs" introduced by changes made in the code to correct a model.

Stress testing (of  complex models):  This ensures that the  maximum load  (e.g.,  real-time  data
acquisition and control systems) does not  exceed limits. The stress test should attempt to simulate the
maximum input, output, and computational load expected during peak usage.  The  load can be defined
quantitatively using criteria such as the frequency of inputs and outputs or the number of computations or
disk accesses per unit of time.

Acceptance testing: Certain contractually required testing may be needed before the client accepts the new model or model application. Specific procedures and the criteria for passing the acceptance test are listed before the testing is conducted. A stress test and a thorough evaluation of the user interface are recommended parts of the acceptance test.
Beta  testing  of the pre-release hardware/software:   Persons outside the project group use the
software as they would in normal  operation and  record any anomalies they encounter  or  answer
questions provided in a testing protocol by the regulatory program.  The users report these observations
to the regulatory program or specified developers, who address them before release of the final version.

Reasonableness checks:  These checks involve items like order-of-magnitude, unit, and other checks to
ensure that the numbers are in the range of what is expected.

Note: This section is adapted from (EPA 2002b).
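
As one hedged illustration of how the regression testing described in Box C2 might be automated, the sketch below re-runs a module and compares its outputs against a stored, previously accepted baseline within a tolerance. The module, file name, and tolerance are hypothetical placeholders, not part of any EPA QA specification.

    import json
    import numpy as np

    def run_module(inputs):
        """Placeholder for the model module under test (hypothetical)."""
        return [2.0 * x for x in inputs]

    def regression_test(baseline_file, inputs, rel_tol=1e-6):
        """Re-run the module and check agreement with previously accepted outputs."""
        with open(baseline_file) as f:
            baseline = json.load(f)
        current = run_module(inputs)
        return bool(np.allclose(current, baseline, rtol=rel_tol))

    # Typical use after a code change:
    # if not regression_test("baseline_outputs.json", test_inputs):
    #     raise AssertionError("Regression detected: outputs differ from the accepted baseline")
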

Box C3: Quality Assurance Planning Suggestions for Model Calibration Activities

Information related to objectives and  acceptance criteria for calibration activities that generally appear at
the beginning of this QA Project Plan element includes the following:

Objectives of model calibration: This  includes expected accomplishments of the calibration and how
the predictive quality of the model  might be  improved  as  a result of  implementing  the  calibration
procedures.

Acceptance criteria: The specific limits,  standards, goodness-of-fit,  or other criteria on  which a model
will be judged as being properly calibrated (e.g., the percentage difference between reference data values
from the field or laboratory and predicted results from the model).  This includes a mention of the types of
data and other information that will  be  necessary to acquire  in order to  determine that the model is
properly calibrated (e.g., field data, laboratory  data, predictions from other accepted models).  In addition
to addressing these questions when establishing acceptance criteria, the QA Project Plan can document
the likely consequences (e.g., incorrect decision making) of selecting data that do not satisfy one or more
of these areas (e.g.,  are non-representative, are inaccurate), as well as procedures in place to minimize
the likelihood of selecting such data.

Justifying the calibration approach and acceptance criteria:   Each time a model is  calibrated,  it is
potentially altered. Therefore, it is important  that the different calibrations, the approaches taken (e.g.,
qualitative versus quantitative), and their acceptance  criteria are properly justified. This justification can
refer to the overall quality of the standards being used  as  a reference or to the quality of the input data
(e.g., whether data are sufficient for statistical tests to achieve desired levels of accuracy).
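
For concreteness, the sketch below shows one way an acceptance criterion of the kind described above (a maximum mean percentage difference between reference data and model predictions) might be checked. The reference values, predictions, and 15% criterion are invented for the example.

    import numpy as np

    def mean_percent_difference(predicted, observed):
        """Mean absolute percentage difference between predictions and reference data."""
        predicted = np.asarray(predicted, dtype=float)
        observed = np.asarray(observed, dtype=float)
        return float(np.mean(np.abs(predicted - observed) / np.abs(observed)) * 100.0)

    observed = [4.1, 5.0, 6.3, 7.8]   # reference (field or laboratory) values
    predicted = [4.4, 4.7, 6.9, 7.5]  # model predictions at the same points

    criterion_pct = 15.0  # acceptance criterion stated in the QA Project Plan (assumed)
    diff = mean_percent_difference(predicted, observed)
    print("mean % difference:", round(diff, 1), "properly calibrated:", diff <= criterion_pct)
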
Box C4: Definition of Quality

In this context, quality encompasses the notions of integrity, utility, and objectivity. Integrity refers to the protection of information from
unauthorized access or revision to ensure that it is not compromised through corruption or falsification. In
the  context of environmental  models, integrity  is  often  most relevant to protection of  code from
unauthorized or inappropriate manipulation (see Box 2).  Utility refers to the usefulness  of the  information
to the  intended users.  The utility of modeling projects is aided by the implementation of a  systematic
planning approach  that includes the development of acceptance or performance  criteria  (see  Box 1).
Objectivity  involves two distinct elements, presentation and substance.   Objectivity  includes whether
disseminated information is being presented in an accurate, clear, complete and unbiased manner. It also
involves a focus on  ascertaining accurate, reliable, and unbiased information.

EPA's  five general assessment factors  (EPA 2003b) for evaluating the quality and relevance  of scientific
and  technical information supporting Agency  actions are: soundness, applicability and utility, clarity and
completeness, uncertainty and variability, and evaluation and review.  Soundness refers to the extent to
which  a model is appropriate for its intended application and is a  reasonable representation of reality.
Applicability and utility  describe the extent to which the information is relevant and appropriate for the
Agency's intended  use.  Clarity and completeness  refer to documentation of the data,  assumptions,
methods,  quality  controls,  and analysis employed to generate the  model outputs.   Uncertainty and
variability highlight the extent to which limitations in knowledge and information and natural randomness
in input data and  models are evaluated and characterized. Evaluation  and review evaluate the extent of
independent application, replication, evaluation, validation, and  peer review of the  information or of the
procedures, measures, methods, or models employed to generate the information.
Appendix D:  Best Practices for Model Evaluation	

D.1    Introduction

This appendix presents a practical guide to the best practices for model evaluation (please see Section
4.1  for descriptions of these practices). These best practices are:

•   Scientific peer review (Section 4.1.1)
•   Quality assurance project planning (Section 4.1.2)
•   Corroboration (Section 4.1.3)
•   Sensitivity analysis (Section 4.1.3)
•   Uncertainty  analysis (Section 4.1.3)

 The objective of model evaluation is to determine whether a model is of sufficient quality to inform a
 regulatory decision.  For each of these best practices, this appendix provides a conceptual overview for
 model evaluation  and  introduces  a suite of "tools" that  can  be used in partial fulfillment of the best
 practice.  The  appropriate use of these tools is discussed and citations  to primary references are
 provided. Users are encouraged to obtain more complete information about tools of interest, including
 their theoretical basis, details of their computational methods,  and the availability of software.

 Figure D.1.1 provides an overview of the  steps in the  modeling  process  that are discussed in this
 guidance.  Items in bold in the figure, including peer review, model corroboration, uncertainty analysis,
 and sensitivity analysis, are discussed in this section on model evaluation.
[Figure D.1.1 (flow diagram): Beginning with the environmental system, model development proceeds from a conceptual model to a mechanistic model, code verification, and a parameterized model*; model evaluation (including corroboration** of model results against one or more datasets) links model development to model application. Peer review is an ongoing process that should be considered at all steps in the modeling process.]

Figure D.1.1. The modeling process.
* In some disciplines parameterization may include, or be referred to as, calibration.
** Qualitative and/or quantitative corroboration should be performed when necessary.
D.2    Scientific Peer Review

EPA policy states that major science-based and technical products related to Agency decisions should
normally be peer-reviewed.  Agency managers determine and are accountable for the decision whether to
employ peer review in particular instances and, if so, its character, scope, and timing.  EPA has published
guidance for program managers responsible for implementing the peer review process for models (Beck
et al. 1994). This guidance discusses peer review mechanisms, the relationship of external peer review to
the process of environmental regulatory model development and application, documentation of the peer
review process, and  specific elements of what could be covered in an external peer review  of model
development and application.

The general process for external peer review of models is as follows (Beck et al. 1994, Press 1992):

•   Step 0: The program manager within the originating office (AA-ship or Region) identifies elements of
    the regulatory process that would benefit from the  use of environmental models. A review/solicitation
    of currently available  models and related research should  be conducted.  If it is concluded that the
    development of a new model is necessary, a research/development work plan is prepared.
•   Step Ob  (optional): The program manager may consider internal and/or external peer review of the
    research/development concepts to determine whether they are of sufficient merit and whether the
    model is likely to achieve the stated purpose.
•   Step  1: The originating office develops a new or revised model or evaluates the  possible novel
    application of a model developed for a different purpose.
•   Step 1b  (optional): The program manager may consider internal and/or external peer review of the
    technical or theoretical  basis prior to final development, revision, or application at this stage.  For
    model development, this review should evaluate the stated application niche.
•   Step 2: Initial Agency-wide  (internal) peer review/consultation of model development and/or  proposed
    application  may  be  undertaken by the developing  originating office.    Model  design,  default
    parameters, etc., and/or intended application are revised  (if necessary) based on consideration of
    internal peer review comments.
•   Step 3: The originating office considers external peer review. Model design, default parameters, etc.,
    and/or intended application are revised (if necessary) based on consideration of external peer review
    comments.
•   Step 4:  Final Agency-wide evaluation/consultation may  be implemented by  the originating office.
    This step should consist of consideration of external peer review comments and documentation of the
    Agency's response to scientific/technical issues.

(Note: Steps 2 and 4 are relevant when there is  either an  internal Agency standing or an  ad  hoc peer
review committee or process).
Box D1:  Elements of External Peer Review for Environmental Regulatory Models (Box 2-4 from NRC's Models
in Environmental Regulatory Decision Making)
Model Purpose/Objectives
•   What is the regulatory context in which the model will be used and what broad scientific question is the model
    intended to answer?
•   What is the model's application niche?
•   What are the model's strengths and weaknesses?
Major Defining  and Limiting Considerations
•   Which processes are characterized by the model?
•   What are the important temporal and spatial scales?
•   What is the level of aggregation?
Theoretical Basis for the Model — formulating the basis for problem solution
•   What algorithms are used within the model and how were they derived?
•   What is the method of solution?
•   What are the shortcomings of the modeling approach?
Parameter Estimation
•   What methods and data were used for parameter estimation?
•   What methods were used to estimate parameters for which there were  no data?
•   What are the boundary conditions and are  they appropriate?
Data Quality/Quantity
Questions related to model design include:
•   What data  were utilized in the design of the model?
•   How can the adequacy of the data be defined taking into account the regulatory objectives of the model?
Questions related to model application include:
•   To what extent are these data available and what are the key data gaps?
•   Do additional data need to be collected and for what purpose?
Key Assumptions
•   What are the key assumptions?
•   What is the basis for each key assumption  and what is the range of possible alternatives?
•   How sensitive is the model toward modifying  key assumptions?
Model Performance Measures
•   What criteria have been used to assess model performance?
•   Did the data bases used in the  performance evaluation provide an adequate test of the model?
•   How does  the model perform relative to other models in this application niche?
Model Documentation and Users Guide
•   Does the documentation cover model applicability and limitations, data input, and interpretation of results?
Retrospective
•   Does the model satisfy its intended scientific  and regulatory objectives?
•   How robust are the model predictions?
•   How well does the model output quantify the  overall uncertainty?

Source: EPA 1994b.
D.3    Quality Assurance Project Planning
Box D2: Quality Assurance Planning and Data Acceptance Criteria
The QA Project Plan needs to address four issues  regarding information  on how non-direct
measurements are acquired and used  on the project (EPA 2002d):

•   The need and intended use of each type of data or information to be acquired.
•   How the data will be identified or acquired, and expected sources of these data.
•   The method of determining the underlying quality of the data.
•   The criteria established for  determining whether the level of quality for a given set of data is
    acceptable for use on the project.

Acceptance criteria for individual data values generally address issues such as the following:

Representativeness:  Were the  data  collected  from a  population sufficiently similar to the
population  of interest and the model-specified population boundaries?  Were the sampling and
analytical  methods  used  to  generate  the  collected data  acceptable to this  project?  How will
potentially  confounding  effects in the data  (e.g., season,  time of day, location,  and scale
incompatibilities) be addressed so that these effects do not unduly impact the model output?

Bias: Would any characteristics of the dataset directly impact the model output (e.g., unduly high
or low process rates)? For  example, has bias in analysis results been documented?  Is there
sufficient information  to  estimate and  correct bias?   If using data to develop probabilistic
distributions, are there adequate data in the upper and lower extremes of the tails to allow  for
unbiased probabilistic estimates?

Precision: How is the spread in the results estimated?  Is the estimate of variability sufficiently
small to meet the uncertainty objectives of the modeling project as stated in Element A7 (Quality
Objectives  and Criteria for  Model Inputs/Outputs) (e.g., adequate to provide a frequency of
distribution)?

Qualifiers:  Have the data  been evaluated in a manner that permits logical  decisions on the
data's applicability to the current project? Is the system of qualifying or flagging data adequately
documented to allow data from different sources to be used on the same project (e.g., distinguish
actual measurements from estimated values, note differences in detection limits)?

Summarization:  Is the  data summarization process clear and sufficiently consistent with the
goals of this project (e.g., distinguish averages or statistically transformed values from unaltered
measurement values)? Ideally, processing and transformation equations will be made available
so  that their underlying assumptions can  be evaluated  against the  objectives of the  current
project.	
D.4    Corroboration

In this guidance, "corroboration" is defined as all quantitative and qualitative methods for evaluating the
degree to which a  model corresponds to reality.  In practical terms, it is the  process of "confronting
models with data" (Hilborn and Mangel 1997).  In some disciplines, this process  has been referred to as
validation.  In general, the term "corroboration" is preferred because it implies a  claim of usefulness and
not truth.

Corroboration is used to understand how consistent the model is with data.  However, uncertainty and
variability affect how accurately both models and data represent reality because both  models and data
(observations) are approximations of some system. Thus, to conduct corroboration meaningfully (i.e., as
a tool to  assess how well a model represents the system being modeled), this process should  begin by
characterizing the uncertainty and variability in the corroboration data. As discussed in Section 4.1.3.1,
variability  stems from the natural randomness  or stochasticity of natural systems and can  be better
captured or characterized in a model but not reduced.  In contrast,  uncertainty can  be minimized  with
improvements in model structure (framework), improved  measurement and analytical  techniques,  and
more comprehensive data for the system being studied.  Hence, even a "perfect" model (that contains no
measurement error  and predicts the  correct ensemble average)  may deviate  from observed  field
measurements at a given time.

Depending on the type (qualitative and/or quantitative) and availability of data, corroboration can involve
hypothesis testing and/or estimates of the likelihood of different model  outcomes.

D.4.1   Qualitative Corroboration
Qualitative model corroboration involves expert judgment and tests  of intuitive  behavior.  This type  of
corroboration uses "knowledge"  of the behavior of the system  in question, but  is not formalized  or
statistics-based.  Expert knowledge can establish model reliability through consensus and consistency.
For example, an expert panel consisting  of model developers and stakeholders could  be convened  to
determine whether there is  agreement that the methods and outputs of a model are consistent  with
processes, standards,  and results used in  other models.  Expert judgment can also  establish model
credibility  by determining  if model-predicted behavior of a  system  agrees  with  best-available
understanding of internal processes and functions.

D.4.2   Quantitative Methods
When data are available, model corroboration may involve comparing model predictions to independent
empirical observations  to investigate  how well a model's  description of the world fits the observational
data. This  involves  using  both  statistical measures for  goodness of fit and numerical procedures  to
facilitate these calculations. This can be done graphically or by calculating various statistical measures of
fit of a model's results to data.

Recall that  a model's  application niche is  the  set  of conditions under which  the use of a model  is
scientifically defensible (Section 5.2.3);  it is the domain  of a model's intended applicability. If the model
being evaluated purports to estimate an  average value across the entire system, then one method to  deal
with  corroboration data is to stratify model  results  and observed data into  "regimes,"  subsets of  data
within which system processes  operate similarly. Corroboration  is then performed  by comparing the
average of model estimates and observed data within each regime (ASTM 2000).

D.4.2.1  Graphical Methods
Graphical  methods  can  be used  to  compare the distribution of  model outputs  to  independent
observations. The degree to which these two distributions overlap, and their respective shapes, provide
an indication of model performance with respect to the data.  Alternately, the  differences  between
observed  and predicted data pairs can be plotted and  the  resulting probability density function (PDF)
used to indicate precision and bias. Graphical methods for model corroboration can be used to indicate
bias, skewness, and kurtosis of model results.  Skewness indicates the relative precision  of model results,
while bias is a reflection of accuracy. Kurtosis refers to the amplitude of the PDF.
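
The sketch below illustrates these graphical and moment-based checks on synthetic data: paired differences between model output and observations are examined for bias, spread, skewness, and kurtosis, and their histogram approximates the PDF discussed above. The data are fabricated solely for the example.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    observed = rng.normal(10.0, 2.0, size=200)             # synthetic observations
    predicted = observed + rng.normal(0.5, 1.0, size=200)  # synthetic model output (+0.5 bias)

    residuals = predicted - observed
    print("bias (mean residual):", residuals.mean())
    print("spread (std of residuals):", residuals.std(ddof=1))
    print("skewness:", stats.skew(residuals))
    print("kurtosis:", stats.kurtosis(residuals))

    # A density histogram of the residuals approximates the PDF, e.g. with matplotlib:
    # import matplotlib.pyplot as plt; plt.hist(residuals, bins=30, density=True); plt.show()
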

D.4.2.2  Deviance Measures
Methods for calculating model bias:
Mean error calculates the average deviation between models and data (e = model − data) by dividing the
sum of errors (Σe) by total number of data points compared (m):

                         Mean Error = (Σe) / m     (in original measurement units)

Similarly, mean % error provides a unit-less measure of model bias:

                         Mean Error (%) = [Σ(e/s) / m] × 100,

where s is the sample or observational data in original units.

Methods for calculating bias and precision:
Mean square error (MSE):

                         MSE = (Σe²) / m

(Large deviations in any single data pair (model − data) can dominate this metric.)

Mean absolute error:

                         Mean Absolute Error = (Σ|e|) / m
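
A minimal sketch of these deviance measures, computed directly from paired model and observed values, is given below; the example numbers are arbitrary.

    import numpy as np

    def deviance_measures(model, data):
        """Mean error, mean % error, MSE, and mean absolute error for paired values."""
        model = np.asarray(model, dtype=float)
        data = np.asarray(data, dtype=float)
        e = model - data
        return {
            "mean_error": e.mean(),                        # (sum of e) / m
            "mean_percent_error": (e / data).mean() * 100.0,
            "mse": (e ** 2).mean(),
            "mean_abs_error": np.abs(e).mean(),
        }

    print(deviance_measures(model=[2.1, 3.8, 5.2], data=[2.0, 4.0, 5.0]))
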

D.4.2.3 Statistical Tests
A more formal hypothesis testing procedure can also be used for model corroboration.  In such cases, a
test is performed to determine if the model outputs are statistically significantly different from the empirical
data.  Important considerations in these tests are the probability of making type I and type  II errors and
the shape of the data distributions, as most of these metrics assume the data are distributed  normally.
The test-statistic used should also be  based on the number of data-pairs  (observed and  predicted)
available.
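
One hedged sketch of such a test is shown below: the paired differences are first checked for normality, and a paired t-test or its nonparametric alternative is applied accordingly. The data pairs and the 0.05 significance level are assumptions for the example.

    import numpy as np
    from scipy import stats

    observed = np.array([3.1, 4.0, 5.2, 6.8, 7.7, 9.1])
    predicted = np.array([3.4, 3.8, 5.6, 6.4, 8.1, 9.6])

    differences = predicted - observed
    _, normality_p = stats.shapiro(differences)  # check the normality assumption

    if normality_p > 0.05:
        _, p_value = stats.ttest_rel(predicted, observed)  # paired t-test
    else:
        _, p_value = stats.wilcoxon(predicted, observed)   # nonparametric alternative

    print("p-value:", p_value, "significantly different at alpha = 0.05:", p_value < 0.05)
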

There are a number of comprehensive texts that may help analysts  determine the appropriate statistical
and numerical procedures for conducting model corroboration. These include:

•   Efron, B., and R. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman and Hall.
•   Gelman, A.J.B., H.S. Carlin, and  D.B. Rubin. 1995. Bayesian Data Analysis. New York: Chapman
    and Hall.
•   McCullagh, P., and J.A. Nelder. 1989. Generalized Linear Models. New York: Chapman and Hall.
•   Press,  W.H.,  B.P.  Flannery,  S.A. Teukolsky, and W.T. Vetterling.  1986.  Numerical  Recipes.
    Cambridge, UK: Cambridge University Press.
•   Snedecor, G.W.,  and W.G.  Cochran.  1989. Statistical Methods. Eighth Ed. Iowa State University
    Press.

D.4.3   Evaluating Multiple Models
       Models are metaphorical (albeit sometimes accurate) descriptions of nature, and
       there can  never be a "correct" model.  There may be a "best" model, which is
       more consistent with the data than any of its competitors, or  several models may
       be contenders because each is consistent in some way with the data and none
       clearly dominates the others.  It is the job of the ecological detective to determine
       the support that the data offer for each competing model or hypothesis.
       — Hilborn and Mangel 1997, Ecological Detective

In the simplest sense, a first cut of model performance is obtained by examining which model minimizes
the sum of squares (SSq) between observed and model-predicted data.
                                   SSq = Σ(pred - obs)²
The SSq is the sum of the squared differences between model-predicted values and observational values.
If data are used to fit models and estimate parameters, the fit will automatically improve with each higher-
order model — e.g., a simple linear model, y = a + bX, vs. a polynomial model, y = a + bX + cX².

It is therefore useful to apply a penalty for additional parameters to determine if the improvement in model
performance  (minimizing SSq  deviation) justifies an increase in  model complexity.   The question is
essentially whether the decrease in the sum of squares is statistically significant.
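
One common way to answer that question, sketched below on invented data, is an extra-sum-of-squares F-test comparing a simple linear fit against a quadratic fit; a small p-value indicates that the additional parameter is justified. The data and model orders are assumptions for the example.

    import numpy as np
    from scipy import stats

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    obs = np.array([1.1, 2.9, 5.2, 6.8, 9.3, 10.9, 13.2, 15.1])

    def ssq(order):
        """Sum of squared deviations for a polynomial fit of the given order."""
        pred = np.polyval(np.polyfit(x, obs, order), x)
        return float(np.sum((pred - obs) ** 2))

    ssq_linear, ssq_quadratic = ssq(1), ssq(2)
    df_extra = 1                  # extra parameter in the quadratic model
    df_resid = len(x) - 3         # residual degrees of freedom, quadratic model
    f_stat = ((ssq_linear - ssq_quadratic) / df_extra) / (ssq_quadratic / df_resid)
    p_value = 1.0 - stats.f.cdf(f_stat, df_extra, df_resid)
    print("F =", round(f_stat, 2), "p =", round(p_value, 3))
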

The SSq is best  applied when comparing  several models using a single dataset.   However, if several
datasets are  available the Normalized Mean Square  Error (NMSE) is typically a better statistic, as it is
normalized to the product  of the means  of the observed and predicted  values (see discussion and
references, Section D.4.4.4).

D.4.4   An Example Protocol for Selecting a Set of Best Performing Models

During the development phase  of an air quality dispersion model and in subsequent upgrades, model
performance  is constantly evaluated.  These evaluations generally compare simulation results  using
simple methods that do not account for the  fact that models only predict a portion of the variability seen in
the observations.  To fill a part of this void,  the U.S. Environmental Protection Agency (EPA) developed a
standard that has been adopted by the  ASTM International,  designation D  6589-00 for  Statistical
Evaluation  of Atmospheric  Dispersion  Model Performance  (ASTM 2000).  The  following discussion
summarizes some of the issues discussed in D 6589.

D.4.4.1  Define Evaluation Objectives
Performing a statistical model  evaluation  involves defining those  evaluation  objectives (features or
characteristics) within the pattern of observed and modeled concentration values that are of interest to
compare.   As yet,  no  one feature  or characteristic has been found that can be  defined  within a
concentration pattern that will  fully test a  model's performance.  For instance, the maximum surface
concentration may appear unbiased  through a compensation  of errors in estimating  the lateral  extent of
the dispersing material  and in  estimating  the vertical extent of the dispersing material.   Adding into
consideration that other biases may exist (e.g.,  in treatment of the chemical  and  removal processes
during transport, in estimating buoyant plume rise, in accounting for wind direction changes with height, in
accounting for penetration of material into layers above the current mixing  depth, in  systematic variation
in all of these biases as a function of atmospheric stability), one can appreciate that there are many ways
that a model can falsely give the appearance of good performance.

In  principle, modeling diffusion involves characterizing the size and shape of the volume into which the
material is dispersing as well as the distribution of the material within this volume.  Volumes have three
dimensions, so a model evaluation will be more complete if it tests the model's ability to characterize
diffusion along more than one of these dimensions.

D.4.4.2  Define Evaluation Procedures
Having  selected  evaluation objectives for comparison, the  next  step is  to establish an evaluation
procedure (or series of procedures), which defines how each evaluation objective will be derived from the
available information.   Development of statistical model evaluation procedures  begins with technical
definitions of the  terminology used in the goal statement.  In the following  discussion, we use a plume
dispersion model example,  but the  thought process is valid as well for  regional  photochemical grid
models.

Suppose the  evaluation goal is to test models' ability to replicate the average centerline concentration as
a function of  transport downwind and as a  function of atmospheric stability.  Several questions must be
answered to  achieve this  goal: What is  an "average  centerline  concentration"?  What  is  "transport
downwind"? How  will "stability" be defined?

What questions  arise  in defining the  average  centerline concentration?  Given  a sampling  arc of
concentration values, it  is necessary to decide whether the centerline concentration is the maximum value
seen anywhere along the arc or that seen near the center of mass of the observed lateral concentration
distribution.  If one chooses the latter concept, one needs a definition of how "near" the center of mass
one has to be, to be representative of a centerline concentration value. One might decide to select all
values within a specific range (nearness to the center of mass).  In such a case, either a definition or a
procedure will be needed to define how this specific range will be determined. A decision will have to be
made on the treatment of observed zero (and near measurement threshold)  concentrations.  To discard
such values is to say that low concentrations cannot occur near a plume's  center of mass, which is a
dubious assumption. One might test to see if conclusions reached regarding the "best performing model"
are sensitive to the decision made on the treatment of near-zero concentrations.

What questions arise  in defining  "transport downwind"?  During  near-calm  wind  conditions,  when
transport may have favored  more than one direction over the sampling period, "downwind" is not well
described by one direction.   If plume models are being tested, one might exclude near-calm conditions,
since plume models are not meant to provide meaningful results during such conditions.  If puff models or
grid models  are being tested, one might sort the near-calm cases into a special regime for analysis.

What questions arise in defining "stability"? For surface releases, surface-layer Monin-Obukhov length, L,
has been found to adequately define stability effects; for elevated releases, Zi/L, where Zi is the mixing
depth, has been found to be a useful parameter for describing stability effects. Each model likely has its
own meteorological processor, and it is likely that different processors will have different values for L and Zi
for each of the evaluation cases. There is no one best way to deal with this problem. One solution might
be to sort the data into  regimes using  each of the models' input values, and see if the conclusions
reached as to best performing model are affected.

What questions arise if one  is grouping data  together?  If one is  grouping data together for which the
emission rates are different, one might choose to  resolve this difference by normalizing the concentration
values by dividing  by the respective emission rates.  To divide  by the emission rate, either one has a
constant emission  rate over the entire  release or the downwind transport is sufficiently obvious that one
can compute an emission rate, based on travel time, that is appropriate for each downwind distance.

Characterizing the plume transport  direction is highly uncertain, even with  meteorological data collected
specific for the purpose.  Thus, we expect that the simulated position of the plume will not overlap the
observed  position  of the plume. One  must decide how to compare a feature (or characteristic) in a
concentration pattern, when uncertainties in transport direction are large. Will the observed and  modeled
patterns be shifted, and if so, in what manner?

This  discussion is  not meant to  be exhaustive,  but to be illustrative of how the thought process might
evolve.  When terms are defined, other questions arise that — when resolved — eventually produce an
analysis that will compute the evaluation objective from the available data.  There likely is more than one
answer to the questions that develop. This may cause different people to develop different objectives and
procedures for the same goal.  If the same set of models is chosen as the best-performing,  regardless of
which path is chosen, one can likely be assured that the conclusions reached are robust.

D.4.4.3 Define Trends in Modeling Bias
In this discussion, references to  observed and modeled  values refer  to the  observed and  model
evaluation objectives (e.g., regime averages). A plot of the observed and modeled values as a function of
one of the model input parameters is  a direct means for detecting model bias.  Such comparison has
been recommended and employed  in a variety of investigations,  e.g.,  Fox  (1981), Weil et al. (1992),
Hanna (1993). In some cases the comparison is the ratio formed by dividing the modeled value by the
observed  value, plotted as a function of one or more of the model input parameters.  If the data have
been stratified into regimes, one can also display the standard error estimates on the respective  modeled
and observed regime averages.  If the  respective averages are encompassed by the error bars (typically
plus  and  minus two times  the  standard error estimates), one  can assume the differences  are not
significant. As Hanna (1993) describes, this is a "seductive" inference. Procedures to provide a robust
assessment of the significance of the differences are defined in ASTM D 6589 (ASTM 2000).
D.4.4.4 Summary of Performance
As an example of overall summary of performance, we will discuss a procedure constructed using the
scheme introduced by Cox and Tikvart (1990) as a template.  The design for statistically summarizing
model performance over several regimes is envisioned as a five-step procedure (a minimal computational sketch of steps 1 through 4 follows the list):

1.   Form a replicate sample using concurrent sampling of the observed and modeled values for each
    regime.  Concurrent sampling  associates results  from all models with each observed value, so that
    selection of an observed value  automatically selects the corresponding estimates by all models.
2.   Compute the average of observed and modeled values for each regime.
3.   Compute the normalized mean square error, NMSE, using the computed regime averages, and store
    the value of the NMSE computed for this pass of the bootstrap sampling.
4.   Repeat steps 1 through 3 for all bootstrap sampling passes (typically of order 500).
5.   Implement the  procedure described in ASTM  D 6589 (ASTM 2000) to detect which model has the
    lowest computed NMSE value  (call this the "base" model) and which models have NMSE values that
    are significantly different from the "base" model.
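
The sketch below assumes paired observed and modeled values already sorted into regimes; here NMSE is taken as the mean squared difference of the regime averages normalized by the product of their overall means, and the significance test of step 5 (ASTM D 6589) is not reproduced.

    import numpy as np

    def regime_nmse(obs_by_regime, mod_by_regime):
        """NMSE computed from regime averages of observed and modeled values."""
        obs_avg = np.array([np.mean(o) for o in obs_by_regime])
        mod_avg = np.array([np.mean(m) for m in mod_by_regime])
        return float(np.mean((obs_avg - mod_avg) ** 2) / (obs_avg.mean() * mod_avg.mean()))

    def bootstrap_nmse(obs_by_regime, mod_by_regime, n_boot=500, seed=0):
        """Steps 1-4: concurrent resampling within each regime, one NMSE per pass."""
        rng = np.random.default_rng(seed)
        results = []
        for _ in range(n_boot):
            obs_samp, mod_samp = [], []
            for o, m in zip(obs_by_regime, mod_by_regime):
                idx = rng.integers(0, len(o), size=len(o))  # same indices for obs and model
                obs_samp.append(np.asarray(o)[idx])
                mod_samp.append(np.asarray(m)[idx])
            results.append(regime_nmse(obs_samp, mod_samp))
        return np.array(results)

    # Illustrative data: two regimes of paired observed and modeled concentrations
    obs = [[10.2, 11.5, 9.8, 12.0], [4.1, 3.8, 4.6, 5.0]]
    mod = [[9.5, 12.1, 10.4, 11.2], [3.5, 4.2, 4.9, 4.4]]
    print(bootstrap_nmse(obs, mod).mean())
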

In the Cox and Tikvart  (1990) analysis, the data were sorted  into regimes (defined in terms of Pasquill
stability  category and  low/high wind speed classes),  and bootstrap sampling was  used  to develop
standard error estimates  on the  comparisons.   The  performance measure was the robust highest
concentration (computed from the raw observed cumulative frequency distribution), a comparison of the highest concentration values (maxima), which most models do not contain the physics to simulate.
This procedure can be improved if intensive field data  are used and  the performance measure is the
NMSE computed from the modeled and observed regime averages of centerline concentration values as
a function of stability along each downwind arc, where each regime is a particular distance downwind for
a defined stability range.

The data demands are  much greater for using regime averages than for using individual concentrations.
Procedures that analyze groups (regimes) of data require data from intensive tracer field studies, with dense
receptor networks and many experiments, whereas Cox and Tikvart (1990) devised their analysis to
make use of very sparse receptor networks having one or more years of sampling results. With dense
receptor networks,  attempts can  be made to compare average modeled and "observed" centerline
concentration values, but  only a few of these experiments have sufficient data to allow stratification of the
data into regimes for analysis.  With sparse receptor networks, there are more data for analysis, but there
is insufficient information  to define the observed maxima relative to the dispersing plume's center of
mass.   Thus,  there is  uncertainty as to  whether or not the observed  maxima are  representative of
centerline concentration values. It is not obvious that the average of the n (say 25) observed maximum
hourly concentration values (for a  particular distance downwind and narrowly defined stability range) is
the  ensemble average centerline concentration the model is predicting. In fact, one might anticipate that
the  average of the n maximum concentration values is  likely to be higher than the ensemble average of
the  centerline  concentration. Thus the testing procedure outlined by Cox and Tikvart (1990) may favor
selection of poorly formed models  that routinely  underestimate  the  lateral  diffusion  (and thereby
overestimate the plume centerline concentration). This, in turn, may bias such models' ability to
characterize concentration patterns for longer averaging times.

It is therefore concluded that once  a set  of "best-performing  models" has  been  selected from  an
evaluation using intensive field data that tests a model's ability to predict the average characteristics to be
seen in  the observed concentration  patterns, evaluations using sparse networks are seen as  useful
extensions to further explore the performance of well-formulated models for other environs and purposes.

D.5    Sensitivity Analysis
This section provides a broad  overview of uncertainty and sensitivity  analyses  and  introduces various
methods used  to conduct  the latter. A table at the end of this section summarizes these methods' primary
features and citations to additional resources for computational detail.
D.5.1   Introducing Sensitivity Analyses and Uncertainty Analysis

A model approximates reality in the face of scientific uncertainties. Section 4.1.3.1 identifies and defines
various sources of model uncertainty.   External peer  reviewers  of  EPA models have  consistently
recommended that EPA  communicate this  uncertainty through uncertainty  analysis  and sensitivity
analysis, two related  disciplines.  Uncertainty analysis investigates the  effects  of lack of knowledge  or
potential errors  of model inputs (e.g.,  the "uncertainty"  associated  with parameter  values); when
combined with sensitivity analysis, it allows a model user to be more informed about the confidence that
can be placed in model results.  Sensitivity analysis measures the effect of changes in  input values  or
assumptions  (including  boundaries and model functional form)  on the outputs (Morgan and Henrion
1990); it is the study of how uncertainty in a  model output can be systematically apportioned to different
sources of uncertainty in the model input (Beck et al. 1994).  By  investigating the "relative sensitivity" of
model parameters,  a  user can  become knowledgeable of the relative importance of parameters in the
model.

Consider a model  represented  as a  function f, with inputs  x1  and x2, and with output y, such that y =
f(x1, x2). Figure D.5.1 schematically depicts how uncertainty analysis and sensitivity analysis would be
conducted  for this  model. Uncertainty analysis would be conducted by determining how  y  responds  to
variation in inputs x1 and x2, the graphic depiction of which is referred to as the model's response surface.
Sensitivity  analysis would be  conducted by  apportioning the respective contributions  of x1 and x2  to
changes in y. The schematic should not be  construed to imply that uncertainty analysis  and sensitivity
analysis are sequential events.  Rather, they are generally conducted by trial and error, with each type of
analysis informing the other. Indeed, in practice, the distinction between these two related disciplines may
be irrelevant. For purposes of clarity, the  remainder of this appendix will refer exclusively to sensitivity
analysis.
[Figure D.5.1 is a schematic in which inputs x1 and x2 feed a model run, y = f(x1, x2), producing
outputs; uncertainty analysis examines the outputs, and sensitivity analysis apportions them to the
inputs.]

Figure D.5.1. Uncertainty and sensitivity analyses. Uncertainty analysis investigates the effects of
lack of knowledge of, or potential errors in, model inputs. Sensitivity analysis evaluates the
respective contributions of inputs x1 and x2 to output y.

D.5.2   Sensitivity Analysis and Computational Complexity
Choosing  the appropriate uncertainty analysis/sensitivity analysis method is often a matter of trading off
between the amount of information one wants from the analyses and the computational difficulties of the
analyses.  These computational difficulties are often inversely related to the number of assumptions one is
willing or able to make about the shape of a model's response surface.

Consider once again a model represented as a function f, with inputs x1 and x2 and with output y,
such that y = f(x1, x2). Sensitivity measures how output changes with respect to an input. This is a
straightforward enough procedure with differential analysis if the analyst:

•   Can assume that the model's response surface is a hyperplane, as in Figure D.5.2(1);
•   Can accept that the results apply only to specific points on the response surface and that these
    points are monotonic first order, as in Figure D.5.2(2);10 or
•   Is unconcerned about interactions among the input variables.

Otherwise, sensitivity analysis may be more appropriately conducted using more intensive computational
methods.

10 Related to this issue are the terms "local sensitivity analysis" and "global sensitivity analysis." The former refers
to sensitivity analysis conducted around a nominal point of the response surface, while the latter refers to sensitivity
analysis across the entire surface.

Figure D.5.2. It's hyperplane and simple. (1) A model response surface that is a hyperplane can
simplify sensitivity analysis computations. (2) The same computations can also be used for other
response surfaces, but only as approximations around a single locus.

This guidance suggests that, depending on the assumptions underlying the model, the analyst should
first use non-intensive sensitivity analysis techniques to identify the inputs to which the model
output is most sensitive, and then apply more intensive methods to this smaller subset of inputs. It may therefore be useful
to categorize the various sensitivity analysis techniques into methods  that (a)  can be quickly used  to
screen for  the more important input factors;  (b) are based  on differential analyses; (c)  are  based on
sampling; and (d) are based on variance methods.

D.5.3   Screening Tools

D.5.3.1 Tools That Require No Model Runs
Cullen and Frey (1999) suggest that, for simple and linear models, summary statistics measuring input
uncertainty can serve as preliminary screening tools without additional model runs, indicating
proportionate contributions to output uncertainty (a brief sketch of both statistics follows this list):

•   Coefficient of variation. The coefficient of variation is the standard deviation normalized to the mean
    (σ/μ), which reduces the possibility that inputs that take on large values are given undue
    importance.
•   Gaussian approximation. Another approach to apportioning input variance is Gaussian
    approximation. Using this method, the variance of a model's output is estimated as the sum of the
    variances of the inputs (for additive models) or the sum of the variances of the log-transformed inputs
    (for multiplicative models), each weighted by the square of any constant that multiplies the input
    in the model.
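
As an illustration only (not drawn from the guidance itself), the Python sketch below computes these
two screening statistics for a hypothetical additive model y = a1*x1 + a2*x2; the input names, means,
standard deviations, and multiplying constants are all assumed for the example.

    # Minimal sketch: screening statistics computed from input summary statistics
    # alone, for a hypothetical additive model y = a1*x1 + a2*x2.
    means  = {"x1": 10.0, "x2": 2.0}   # assumed input means
    stdevs = {"x1": 1.5,  "x2": 0.8}   # assumed input standard deviations
    coeffs = {"x1": 0.5,  "x2": 3.0}   # constants multiplying each input in the model

    # Coefficient of variation: standard deviation normalized to the mean (sigma/mu),
    # so inputs that take on large values are not given undue importance.
    cv = {name: stdevs[name] / means[name] for name in means}

    # Gaussian approximation for an additive model: output variance is estimated as
    # the sum of the input variances, each weighted by the square of its constant.
    contrib = {name: (coeffs[name] * stdevs[name]) ** 2 for name in means}
    total_var = sum(contrib.values())

    for name in means:
        print(f"{name}: CV = {cv[name]:.2f}, "
              f"approximate share of output variance = {contrib[name] / total_var:.1%}")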

D.5.3.2 Scatterplots
Cullen and Frey (1999) suggest that a high correlation between an input variable and an output
variable may indicate that much of the variation in the output depends on the variation in that input.
A simple, visual assessment of the influence of an input on the output is therefore possible using
scatterplots, with each plot posing a selected input against the output, as in Figure D.5.3.

       [Figure D.5.3 is a scatterplot of the output time (ms) against the input area (K pixels).]

       Figure D.5.3. Correlation as indication of input effect. The high correlation between the input
       variable area and the output variable time (holding all other variables fixed) is an indication of
       the possible effect of area's variation on the output.
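
As an illustration only (not drawn from the guidance), the Python sketch below builds such screening
scatterplots for a hypothetical model in which an output "time" is driven mainly by an input "area";
all names, ranges, and the toy model itself are assumed.

    # Minimal sketch: scatterplot screening of inputs against a toy model output.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    n = 200
    area  = rng.uniform(0, 400, n)     # hypothetical input (K pixels)
    other = rng.normal(5.0, 1.0, n)    # another hypothetical input

    # Toy stand-in for the real model: time grows with area, only weakly with 'other'.
    time_ms = 0.03 * area + 0.2 * other + rng.normal(0, 0.5, n)

    # One scatterplot per input; a strong pattern (high correlation) flags an input
    # whose variation may explain much of the variation in the output.
    for name, x in [("area", area), ("other", other)]:
        r = np.corrcoef(x, time_ms)[0, 1]
        plt.figure()
        plt.scatter(x, time_ms, s=8)
        plt.xlabel(name)
        plt.ylabel("time (ms)")
        plt.title(f"{name} vs. output, r = {r:.2f}")
        plt.savefig(f"scatter_{name}.png")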

D.5.3.3 Morris's OAT
The  key concept underlying one-at-a-time (OAT) sensitivity analyses is to choose a base case of input
values and to perturb each input variable by a given  percentage away from the base value while holding
all other input variables  constant. Most OAT sensitivity analysis  methods yield  local measures  of
sensitivity (see footnote 10) that depend on the choice of base case values. To avoid this bias, Saltelli et
al. (2000b)  recommend using  Morris's  OAT for screening purposes because  it  is  a global sensitivity
analysis method — it entails computing a number of local measures (randomly extracted across the input
space) and then taking their average.

Morris's OAT provides a measure of the importance of an  input factor in generating output variation, and
while it does not  quantify  interaction effects, it does provide an indication of the presence of interaction.
Figure D.5.4 presents the results that  one would expect to obtain from  applying Morris's OAT (Cossarini
et al. 2002). Computational methods for this technique are described  in Saltelli et al. 2000b.
 [Figure D.5.4 plots, for each input factor, its importance in generating output variation
 (horizontal axis) against a measure of interaction (vertical axis).]

 Figure D.5.4. An application of Morris's OAT. Cossarini et al. (2002) investigated the influence
 of various ecological factors on energy flow through a food web. Their sensitivity analysis
 indicated that maximum bacteria growth and bacteria mortality (μbac and Kmbac, respectively)
 have the largest (and opposite) effects on energy flow, as indicated by their values on the
 horizontal axis. These effects, as indicated by their values on the vertical axis, resulted from
 interactions with other factors.
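
As an illustration only (a simplified version of the elementary-effects idea, not the Cossarini et al.
analysis), the Python sketch below averages one-at-a-time measures taken at randomly chosen points of
the input space for a toy model; the model, input ranges, and settings are assumed. Packages such as
SALib provide full implementations of the Morris method.

    # Simplified sketch of Morris-style elementary effects for a toy model with
    # an interaction between x1 and x2. All inputs are scaled to [0, 1].
    import numpy as np

    rng = np.random.default_rng(0)

    def model(x):
        return x[0] ** 2 + 0.5 * x[1] + x[0] * x[1] + 0.1 * x[2]

    k, r, delta = 3, 50, 0.1      # number of inputs, random base points, step size

    effects = np.zeros((r, k))
    for i in range(r):
        base = rng.random(k) * (1.0 - delta)   # random base point, leaving room to step
        y0 = model(base)
        for j in range(k):
            stepped = base.copy()
            stepped[j] += delta
            effects[i, j] = (model(stepped) - y0) / delta   # local (elementary) effect

    # The mean absolute effect indicates importance; the standard deviation of the
    # effects indicates interactions and/or nonlinearity (cf. the axes of Figure D.5.4).
    for j in range(k):
        print(f"x{j + 1}: importance = {np.abs(effects[:, j]).mean():.3f}, "
              f"interaction/nonlinearity = {effects[:, j].std():.3f}")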

D.5.4  Methods Based on Differential Analysis
As noted previously, differential analyses may be used to analyze sensitivity if the analyst is willing either
to assume that the model response surface is hyperplanar or to accept that the sensitivity analysis results
are local and that they are  based on hyperplanar approximations tangent to the response surface at the
nominal scenario (Morgan and Henrion 1990; Saltelli et al. 2000b).

Differential analyses entail  four steps.  First, select base values and ranges for input factors. Second,
using these input base values, develop a Taylor series approximation to the output. Third,  estimate
uncertainty in output in terms of its expected value and variance using variance propagation techniques.
Finally, use the Taylor series approximations  to estimate the importance of individual input factors (Saltelli
et al. 2000b). Computational methods for this technique  are described in Morgan and Henrion 1990.
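
As an illustration only (not drawn from the cited references), the Python sketch below walks through
these four steps for a hypothetical two-input model, using central differences in place of an analytic
Taylor-series expansion and assuming independent inputs.

    # Minimal sketch: first-order (Taylor-series) sensitivity and variance propagation
    # around a nominal scenario for a hypothetical model.
    import numpy as np

    def model(x1, x2):
        return x1 * np.exp(0.2 * x2)   # hypothetical model

    # Step 1: base values and uncertainties (standard deviations) of the inputs.
    base = {"x1": 4.0, "x2": 1.0}
    sd   = {"x1": 0.5, "x2": 0.3}

    # Step 2: numerical partial derivatives at the base point (central differences).
    h = 1e-4
    d1 = (model(base["x1"] + h, base["x2"]) - model(base["x1"] - h, base["x2"])) / (2 * h)
    d2 = (model(base["x1"], base["x2"] + h) - model(base["x1"], base["x2"] - h)) / (2 * h)

    # Step 3: first-order variance propagation, assuming independent inputs.
    var_y = (d1 * sd["x1"]) ** 2 + (d2 * sd["x2"]) ** 2

    # Step 4: relative importance of each input at this nominal scenario.
    print("approximate output standard deviation:", np.sqrt(var_y))
    print("share of output variance from x1:", (d1 * sd["x1"]) ** 2 / var_y)
    print("share of output variance from x2:", (d2 * sd["x2"]) ** 2 / var_y)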

D.5.5  Methods Based on Sampling
One approach to estimating the impact of input uncertainties is to repeatedly run a model using randomly
sampled  values  from the input space. The most well-known method using this  approach is Monte Carlo
analysis. In a Monte Carlo simulation, a model is run repeatedly. With each  run, different input values are
drawn  randomly from the  probability distribution  functions of each  input, thereby generating  multiple
output  values (Morgan and  Henrion 1990; Cullen and Frey 1999). One can view a Monte Carlo simulation
as a process through which multiple scenarios generate multiple output values; although each execution
of the model run is deterministic, the set of output values may be represented as a cumulative distribution
function and summarized using statistical measures (Cullen and Frey 1999).
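
As an illustration only (the model and the input distributions are assumed), the Python sketch below
runs a simple Monte Carlo simulation and summarizes the resulting output distribution.

    # Minimal sketch: Monte Carlo simulation of a hypothetical two-input model.
    import numpy as np

    rng = np.random.default_rng(42)

    def model(x1, x2):
        return x1 * np.exp(0.2 * x2)   # each model run is deterministic

    n_runs = 10_000
    x1 = rng.lognormal(mean=1.0, sigma=0.25, size=n_runs)   # assumed input PDF
    x2 = rng.normal(loc=1.0, scale=0.3, size=n_runs)        # assumed input PDF

    y = model(x1, x2)   # one output value per randomly sampled scenario

    # Summarize the set of outputs as an empirical (cumulative) distribution.
    print("mean:", y.mean(), " standard deviation:", y.std())
    print("5th, 50th, 95th percentiles:", np.percentile(y, [5, 50, 95]))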

EPA proposes several principles of good practice for the conduct of Monte Carlo simulations (EPA
1997). They include the following:

•   Conduct preliminary sensitivity  analyses to identify significant model components and input variables
    that make important contributions to  model uncertainty.
•   When  deciding upon  a  probability distribution  function (PDF) for input  variables, consider  the
    following questions: Is  there any mechanistic basis for choosing a distributional family? Is the PDF
    likely to be  dictated  by physical, biological, or other properties and mechanisms? Is the variable
    discrete or continuous? What are the bounds of the variable? Is the PDF symmetric or skewed, and if
    skewed, in which direction?
•   Base the PDF on empirical, representative data.
•   If expert judgment is used as the basis for the PDF, document explicitly the reasoning underlying this
    opinion.
•   Discuss the  presence or absence of covariance among the input variables, which can significantly
    affect the output.

The preceding points summarize some of the main points raised in EPA's Guiding Principles for Monte
Carlo Analysis (EPA 1997), which should be consulted for more detailed guidance. Conducting Monte Carlo
analysis may be problematic for models containing a large number of input variables. Fortunately, there
are several approaches to dealing with this problem:

•   Brute force approach. One approach is to increase sheer computing power. For example, EPA's
    ORD is developing a Java-based tool that facilitates Monte Carlo analyses across a cluster of PCs by
    harnessing the computing power of multiple workstations to conduct multiple runs for a complex
    model (Babendreier and Castleton 2002).
•   Smaller, structured trials. The value of Monte Carlo lies not in the randomness of the sampling, but in
    achieving representative properties of the set of points in the input space. Therefore, rather than
    sampling at random from the entire input space, computations may rely on stratified sampling:
    dividing the input sample space into strata and sampling from within each stratum. A widely used
    method for stratified sampling is Latin hypercube sampling, comprehensively described in Cullen and
    Frey 1999 and sketched briefly after this list.
•   Response surface model surrogate. The analyst may also choose to conduct Monte Carlo not on the
    complex model directly, but rather on a response surface representation of it. The latter is a simplified
    representation  of the relationship between  a selected  number of  model outputs and a selected
    number of model inputs, with all other model  inputs held at fixed values (Morgan and Henrion 1990;
    Saltelli et al. 2000b).
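
As an illustration only (hand-rolled, with assumed input distributions), the Python sketch below
generates a Latin hypercube sample on the unit square and maps it to the inputs through their inverse
cumulative distribution functions.

    # Minimal sketch: Latin hypercube sampling for two inputs. Each input's range is
    # divided into n equal-probability strata; one value is drawn from each stratum,
    # and the strata are randomly paired across inputs.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n, k = 200, 2

    # Uniform [0, 1) Latin hypercube: exactly one point per stratum in every dimension.
    u = np.empty((n, k))
    for j in range(k):
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n

    # Map the unit-cube sample to the assumed input distributions via inverse CDFs.
    x1 = stats.lognorm.ppf(u[:, 0], s=0.25, scale=np.exp(1.0))
    x2 = stats.norm.ppf(u[:, 1], loc=1.0, scale=0.3)

    y = x1 * np.exp(0.2 * x2)   # same toy model as in the Monte Carlo sketch above
    print("mean:", y.mean(), " 95th percentile:", np.percentile(y, 95))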

D.5.6   Methods Based on Variance
Consider once again a model represented as a function f, with inputs x1 and x2 and with output y, such
that y = f(x1, x2). The input variables are affected by uncertainties and may take on any number of
possible values. Let X denote an input vector randomly chosen from among all possible values for x1 and
x2. The output y for a given X can also be seen as a realization of a random variable Y. Let E(Y|X)
denote the expectation of Y conditional on a fixed value of X. If the total variation in y is matched
by the variability in E(Y|X) as x1 is allowed to vary, this is an indication that variation in x1
significantly affects y.

The variance-based approaches to sensitivity analysis are based on estimating what fraction of the
total variation of y is attributable to variability in E(Y|X) as a subset of the input factors is
allowed to vary. Three methods for computing this estimate (the correlation ratio, Sobol, and Fourier
amplitude sensitivity test methods) are featured in Saltelli et al. 2000b.
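
As an illustration only (a crude correlation-ratio-style estimate for a toy model; see Saltelli et al.
2000b for the formal estimators), the Python sketch below approximates the fraction of Var(Y) explained
by Var(E(Y|x1)) by binning each input and comparing the variance of the conditional means with the
total output variance.

    # Minimal sketch: first-order, variance-based sensitivity estimated by binning.
    import numpy as np

    rng = np.random.default_rng(3)
    n, n_bins = 50_000, 50

    x1 = rng.uniform(-1, 1, n)
    x2 = rng.uniform(-1, 1, n)
    y = np.sin(3 * x1) + 0.3 * x2 ** 2 + 0.2 * x1 * x2   # toy model

    def first_order_index(x, y, n_bins):
        # Variance of the conditional means E(Y | x in bin), divided by Var(Y).
        order = np.argsort(x)
        bins = np.array_split(y[order], n_bins)
        cond_means = np.array([b.mean() for b in bins])
        return cond_means.var() / y.var()

    for name, x in [("x1", x1), ("x2", x2)]:
        print(f"{name}: approximate first-order index = {first_order_index(x, y, n_bins):.2f}")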

D.5.7   Which Method to Use?
A  panel of experts was recently assembled to review various sensitivity analysis methods. The panel
refrained from explicitly recommending a "best"  method and instead developed a list of attributes for
preferred sensitivity analysis methods. The panel recommended that methods should preferably be able
to  deal with  a model  regardless of assumptions about a  model's linearity and  additivity, consider
interaction effects among  input uncertainties, cope with differences in the scale and shape of input PDFs,
cope with differences in input  spatial and temporal dimensions, and evaluate the effect of an input while
all other inputs are allowed to vary as well (Frey 2002; Saltelli 2002). Of the various methods discussed
above, only those based on variance (Section D.5.6) are characterized by these attributes. When one or
more  of the criteria are not important, the other tools discussed in this section will provide a reasonable
sensitivity assessment.

As mentioned earlier, choosing the most appropriate sensitivity analysis method will often entail a trade-
off between computational complexity, model assumptions, and the amount of information needed from
the sensitivity analysis. As an aid to sensitivity analysis method selection, the table below summarizes the
features and caveats of the methods discussed above.
Method: Screening methods
    Features: May be conducted independent of model run.
    Caveats: Potential for significant error if model is non-linear.
    Reference: Cullen and Frey 1999, pp. 247-8.

Method: Morris's one-at-a-time
    Features: Global sensitivity analysis.
    Caveats: Indicates, but does not quantify, interactions.
    Reference: Saltelli et al. 2000b, p. 68.

Method: Differential analyses
    Features: Global sensitivity analysis for linear models; local sensitivity analysis for nonlinear models.
    Caveats: No treatment of interactions among inputs; assumes linearity, monotonicity, and continuity.
    References: Cullen and Frey 1999, pp. 186-94; Saltelli et al. 2000b, pp. 183-91.

Method: Monte Carlo analyses
    Features: Intuitive; no assumptions regarding the response surface.
    Caveats: Depending on the number of input variables, may be time-consuming to run, but methods to
    simplify are available; may rely on assumptions regarding input PDFs.
    References: Cullen and Frey 1999, pp. 196-237; Morgan and Henrion 1990, pp. 198-216.

Method: Variance-based methods
    Features: Robust and independent of model assumptions; addresses interactions.
    Caveats: May be computationally difficult.
    Reference: Saltelli et al. 2000b, pp. 167-97.

D.6    Uncertainty Analysis
D.6.1   Model Suitability
An  evaluation of model suitability  to  resolve application niche uncertainty  (Section 4.1.3.1) should
precede any evaluation of data uncertainty and  model performance.  The extent to which a  model is
suitable for a proposed application depends on:

    •   Mapping of model attributes to the problem statement
    •   The degree  of certainty needed  in model outputs
    •   The amount of reliable data available or resources available to collect additional data
    •   Quality of the state of knowledge on which the model is based
    •   Technical competence of those  undertaking simulation modeling

Appropriate data should be available before any attempt  is made to apply a model. A model that needs
detailed, precise input data should not be used when such data are unavailable.

D.6.2  Data Uncertainty
There are two statistical paradigms that can be adopted to summarize data. The first employs classical
statistics  and is useful for capturing  the most likely or "average" conditions  observed  in a given system.
This is known as the "frequentist"  approach  to summarizing model input data.  Frequentist statistics rely
on  measures of central tendency  (median,  mode,  mean values)  and represent uncertainty as the
deviation from these metrics. A frequentist or "deterministic" model produces a single set of solutions for
each model run. In contrast, the alternate statistical  paradigm employs a probabilistic framework, which
summarizes data according to their "likelihood" of occurrence. Input data are represented as
distributions rather than as single numerical values, and model outputs capture a range of possible values.

The classical view of probability defines the probability of an event occurring as the value to which the
long-run frequency of that event or quantity converges as the number of trials increases (Morgan and
Henrion 1990).  Classical statistics  relies on  measures  of central tendency  (mean,  median, mode)  to
define model parameters and their associated uncertainty (standard deviation, standard error, confidence
intervals).

In contrast to the classical view, a subjectivist or Bayesian view is that the probability of an event is the
current degree of belief that a person has that it will occur, given all of the relevant information currently
known to that person. This framework involves the use of probability distributions based on likelihood
functions to represent model input values and employs techniques like Bayesian updating and Monte
Carlo methods as statistical evaluation tools (Morgan and Henrion 1990).
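
As an illustration only (the prior and the monitoring data are hypothetical), the Python sketch below
shows a conjugate Bayesian update of a degree of belief about the probability that a standard is
exceeded, using a beta prior and binomial monitoring data.

    # Minimal sketch: Bayesian updating with a conjugate beta-binomial model.
    from scipy import stats

    # Prior degree of belief about the exceedance probability: Beta(2, 8), mean 0.2.
    a_prior, b_prior = 2.0, 8.0

    # New monitoring data: 9 exceedances observed in 30 samples.
    n_samples, n_exceed = 30, 9

    # Conjugacy: the posterior is again a beta distribution.
    a_post = a_prior + n_exceed
    b_post = b_prior + (n_samples - n_exceed)

    posterior = stats.beta(a_post, b_post)
    print("posterior mean exceedance probability:", posterior.mean())
    print("95% credible interval:", posterior.interval(0.95))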

Literature Cited in Main Text and Appendices A, C, and D:

Anderson, M., and W. Woessner. 1992.  The role of the postaudit in model validation. Advances in Water
Resources 15: 167-173.

ANSI (American National Standards Institute). 1994.  Specifications and Guidelines for Quality Systems
for Environmental Data Collection and Technology Programs. ANSI/ASQC E4-1994.

ASTM. 2000. Standard Guide for Statistical Evaluation of Atmospheric Dispersion Model Performance (D
6589). Available: http://www.astm.org.
Babendreier, J.E., and K.J. Castleton. 2002. Investigating uncertainty and sensitivity in integrated,
multimedia environmental models: tools for FRAMES-3MRA. In: Proceedings of the 1st Biennial Meeting of the
International Environmental Modeling and Software Society 2: 90-95. Lugano, Switzerland.
Barnwell, T.O., L.C. Brown, and R.C. Whittemore. 2004. Importance of field data in stream water quality
modeling using QUAL2E-UNCAS. J. Environ. Eng. 130(6): 643-647.

Beck, M.B. 1987. Water quality modeling: a review of the analysis of uncertainty. Water Resources
Research 23(8): 1393-1442.

Beck, B. 2002. Model evaluation  and performance. In: A.M. El-Shaarawi and W.W. Piegorsch, eds.
Encyclopedia of Environmetrics. Chichester: John Wiley & Sons.

Beck, M., L.A. Mulkey, and T.O. Barnwell. 1994. Model Validation for Exposure Assessments — DRAFT.
Athens, Georgia: United States Environmental Protection Agency.

Booch, G. 1994. Object-Oriented  Analysis and Design with Applications. 2nd ed. Redwood, California:
Benjamin/Cummings.

Borsuk, M.E., C.A. Stow, and K.H. Reckhow. 2002. Predicting the frequency of water quality standard
violations: a probabilistic approach for TMDL development. Environmental Science and Technology 36:
2109-2115.

Bredehoeft, J.D. 2003. From models to performance assessment: the conceptualization problem. Ground
Water 41(5): 571-577.

Bredehoeft, J.D. 2005. The conceptualization model problem — surprise. Hydrogeology Journal 13(1):
37-46.

Cossarini, G., C. Solidoro, and A. Crise. 2002. A model for the trophic food web of the Gulf of Trieste. In:
A.E. Rizzoli and A.J. Jakeman, eds. Integrated Assessment and Decision Support: Proceedings of the 1st
Biennial Meeting of the iEMSs 3: 485. Available: http://www.iemss.org/iemss2002/proceedinqs/pdf/
volume%20tre/285 cossarini.pdf.

Cox, W.M., and J.A. Tikvart. 1990. A statistical procedure for determining the best performing air quality
simulation model. Atmos. Environ. 24A(9):2387-2395.

CREM (Council on Regulatory Environmental Modeling). 2001.  Proposed Agency Strategy for the
Development of Guidance on Recommended Practices in Environmental Modeling. Draft. U.S.
Environmental Protection Agency.

Cullen, A.C., and H.C. Frey. 1999. Probabilistic Techniques in Exposure Assessment: A Handbook for
Dealing with Variability and Uncertainty in Models and Inputs, ed. 326. New York:  Plenum Press.

EPA (U.S. Environmental Protection Agency).  1992. Protocol for Determining the Best Performing Model.
EPA-454-R-92-025. Research Triangle Park, North Carolina: Office of Air Quality Planning and
Standards.

EPA (U.S. Environmental Protection Agency).  1993. Review of Draft Agency Guidance for Conducting
External Peer Review of Environmental Regulatory Modeling. EPA-SAB-EEC-LTR-93-008.

EPA (U.S. Environmental Protection Agency).  1994a. Peer Review and Peer Involvement at the U.S.
Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  1994b. Report of the Agency Task Force on Environmental
Regulatory Modeling: Guidance, Support Needs, Draft Criteria and Charter. EPA-500-R-94-001.
Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  1997.  Guiding Principles for Monte Carlo Analysis. EPA-
630-R-97-001. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  1999c. Description of the MOBILE Highway Vehicle
Emissions Factor Model. Office of Mobile Sources. Ann Arbor, Michigan: U.S. Environmental Protection
Agency.

EPA (U.S. Environmental Protection Agency).  2000a. Guidance for the Data Quality Objectives Process.
EPA QA/G-4. Office of Environmental Information.

EPA (U.S. Environmental Protection Agency).  2000b. Guidance for Data Quality Assessment. EPA QA/G-
9. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency).  2000c. Science Policy Council Handbook: Peer Review.
2nd ed.

EPA (U.S. Environmental Protection Agency).  2000d. Risk Characterization Handbook.  Science Policy
Council. EPA-100-B-00-002. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2000e. Policy and Program Requirements for the
Mandatory Agency-Wide Quality System. EPA Order, Classification Number 5360.1 A2.

EPA (U.S. Environmental Protection Agency). 2000f.  EPA Quality Manual for Environmental Programs.
5360A1.

EPA (U.S. Environmental Protection Agency). 2001. Proposed Agency Strategy for the Development of
Guidance on Recommended Practices in Environmental Modeling. Model Evaluation Action Team,
Council for Regulatory Environmental Modeling. Washington, D.C.: U.S. Environmental Protection
Agency.

EPA (U.S. Environmental Protection Agency). 2002a. Information Quality Guidelines. Office of
Environmental Information. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2002b. Quality Assurance Project Plans for Modeling. EPA
QA/G-5M. Office of Environmental Information.

EPA (U.S. Environmental Protection Agency). 2002c. Guidance on Choosing a Sampling Design for
Environmental Data Collection for Use in Developing a Quality Assurance Plan. EPA QA/G-5S.
Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2002d. Guidance on Environmental Data Verification and
Data Validation. EPA QA/G-8. Washington, D.C.: U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2003a. Revision to guideline on air quality models:
adoption of a preferred long range transport model and other revisions.  Federal Register 68 (72): 18440-
18482.

EPA (U.S. Environmental Protection Agency). 2003b. A Summary of General Assessment Factors for
Evaluating the Quality of Scientific and Technical Information. Science Policy Council. Washington, D.C.:
U.S. Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 2006. Peer Review Handbook. 3rd ed. EPA-100-B-06-002.
Prepared for the U.S. Environmental Protection Agency by members of the Peer Review Advisory Group,
for EPA's Science Policy Council. Washington, D.C: U.S. Environmental Protection Agency. Available:
http://epa.gov/peerreview/pdfs/Peer%20Review%20HandbookMay06.pdf [accessed Nov. 10, 2006].

Fox, D.G. 1981.  Judging air quality model performance: a summary of the AMS workshop on dispersion
model performance. Bull. Amer. Meteor. Soc. 62: 599-609.

Frey, H.C. 2002.  Guest editorial:  introduction to special section on sensitivity analysis and summary of
NCSU/USDA Workshop on Sensitivity Analysis. Risk Analysis. 22: 539-546.

Hanna, S.R.  1988. Air quality model evaluation and uncertainty. Journal of the Air Pollution Control
Association 38: 406-442.

Hanna, S.R. 1993.  Uncertainties in air quality model predictions. Boundary-Layer Met. 62: 3-20.

Hillborn, R., and M. Mangel. 1997. The Ecological Detective: Confronting Models with Data. Princeton,
New Jersey: Princeton University Press.

Kernighan, B.W., and P.J. Plauger. 1988. The Elements of Programming Style. 2nd ed.

Konikow, L.F., and J.D. Bredehoeft. 1992. Ground water models cannot be validated. Advances in Water
Resources 15(1): 75-83.

Levin, S. 1992. The problem of pattern and scale in ecology. Ecology 73: 1943-1967.

Luis, S.J., and D.B. McLaughlin. 1992. A stochastic approach to model validation. Advances in Water
Resources 15(1): 75-83.

Manno, J., R. Smardon, J.V. DePinto, E.T. Cloyd, and S. Del Grando. 2008.  The Use of Models in Great
Lakes Decision Making: An Interdisciplinary Synthesis. Randolph G. Pack Environmental Institute,
College of Environmental Science and Forestry. Occasional Paper 16.

Morgan, G., and M. Henrion. 1990. The nature and sources of uncertainty. In: Uncertainty: A Guide to
Dealing With Uncertainty in Quantitative Risk and Policy Analysis. Cambridge, U.K.: Cambridge
University Press, pp. 47-72.

NRC (National Research Council). 2001. Assessing the TMDL Approach to Water Quality Management,
Committee to Assess the Scientific Basis of the Total Maximum Daily Approach to Water Pollution
Reduction. Water Science and Technology Board, Division of Earth and Life Studies. Washington, D.C.:
National Academies Press.

NRC (National Research Council). 2007. Models in Environmental Regulatory Decision Making.
Committee on Models in the Regulatory  Decision Process, Board on Environmental Studies and
Toxicology, Division on Earth and Life Studies. Washington, D.C.: National Academies Press.

Platt, J.R. 1964. Strong inference. Science 146: 347-352.

Press, W.H. 1992. Numerical Recipes: The Art of Scientific Computing. Cambridge, U.K.: Cambridge
University Press.

Reckhow, K.H. 1994.  Water quality simulation modeling and uncertainty analysis for risk assessment and
decision making. Ecological Modeling 72: 1-20.

SAB (Science Advisory Board). 1988. Review of the Underground Storage Tank (UST) Release
Simulation Model. SAB-EEC-88-029. Environmental Engineering Committee. Washington, D.C.: U.S.
Environmental Protection Agency.

SAB (Science Advisory Board). 1989. Resolution on the Use of Mathematical Models by EPA for
Regulatory Assessment and Decision-Making. EPA-SAB-EEC-89-012. Washington, D.C.: U.S.
Environmental Protection Agency.

SAB (Science Advisory Board). 1993a.  Review of Draft Agency Guidance for Conducting External Peer
Review of Environmental Regulatory Modeling. EPA-SAB-EEC-LTR-93-008. Washington, D.C.: U.S.
Environmental Protection Agency.

SAB (Science Advisory Board). 1993b. An SAB Report: Review of MMSoils Component of the Proposed
RIA for the RCRA Corrective Action Rule. EPA-SAB-EEC-94-002. Washington, D.C.: U.S. Environmental
Protection Agency.

Saltelli, A., S. Tarantola, and F. Campolongo. 2000a. Sensitivity analysis as an ingredient of modeling.
Statistical Science 15: 377-395.

Saltelli, A., K. Chan, and M. Scott, eds.  2000b. Sensitivity Analysis. New York: John Wiley and Sons.

Saltelli, A. 2002. Sensitivity analysis for importance assessment. Risk Analysis 22: 579-590.

Sargent, R.G. 2000.  Verification, validation and  accreditation of simulation models. In: J.A. Joines et al.,
eds.  Proceedings of the 2000 Winter Simulation  Conference.

Scheffe, R.D., and R.E. Morris. 1993. A review of the development and application of the Urban Airshed
model. Atmos. Environ. B-Urb. 27(1): 23-39.

Shelly, A., D. Ford, and B. Beck. 2000.  Quality Assurance of Environmental Models.  NRCSE Technical
Report Series.

Small, M.J., and P.S. Fishbeck. 1999. False precision in Bayesian updating with incomplete models.
Human and Ecological Risk Assessment 5(2): 291-304

Suter, G.W.I. 1993. Ecological Risk Assessment. Boca Raton: Lewis Publishers. 528.

Usunoff, E., J. Carrera, and S.F. Mousavi. 1992.  An approach to the design of experiments for
discriminating among alternative conceptual models. Advances in Water Resources 15(3): 199-214.

Weil, J.C., R.I. Sykes, and A. Venkatram. 1992.  Evaluating  air-quality models: review and outlook. J.
Appl. Meteor. 31: 1121-1145.