EPA/600/A-97/098
1.1 ADVANCED TECHNIQUES FOR EVALUATING EULERIAN AIR QUALITY MODELS:
BACKGROUND AND METHODOLOGY
J.R. Arnold*, Robin L. Dennis†, and Gail S. Tonnesen
Atmospheric Sciences Modeling Division
National Oceanic and Atmospheric Administration
Research Triangle Park, NC
*Corresponding author address: J. R. Arnold, PO Box 23193, Seattle, WA 98102-0493; email: jra@unc.edu
†On assignment to NERL, U.S. Environmental Protection Agency.
Model Evaluation Goals and Objectives
The goal of model evaluation within the framework
of applications is to determine an air quality model
(AQM)'s degree of acceptability and usefulness for a
specified application or task. That goal is met by satisfying three objectives:
•	indicate the validity of the model's scientific
formulations
•	evaluate the realism of the model's simulations
•	characterize the credibility of the model's realism
relative to its intended application.
Satisfying these objectives, however, presents two
fundamental problems for providing a meaningful
interpretation of model behavior to guide decision-making:
•	how to tell whether the model is producing
seemingly appropriate results from incorrect
formulations or input data; i.e., how model
performance can appear right for the wrong
reasons
•	how to establish the model's acceptability when
the evaluation tests appear inadequate for
making a determination of acceptability.
Most applications of the model concern predicting
pollutant concentrations for future atmospheric conditions.
The degree of a model's acceptability, therefore, will
depend on the validity of the model's science and the
appropriateness of its implementation in the model's
component processes. Testing only a model's state
variable resultants against limited sets of historical data
provides only weak, inductive justification of the model's
acceptability since such tests are not directed at a model's
scientific formulation. Thus, the component processes that
comprise a model's scientific formulation should be the
focus of an evaluation.
Current Efforts Are Inadequate
Evaluation efforts to date have been largely
inadequate for addressing the fundamental problems and
meeting the objectives of AQM evaluation. The
inadequacy comes from the practice of basing the
evaluation of three-dimensional Eulerian process models
on tests that use simple statistical measures on residuals
(Concentration_predicted - Concentration_observed) for O3 and, less often, for NOx. However, since these state variable resultants are estimates derived from a highly nonlinear, buffered system, simple comparison statistics cannot be a sound basis for model evaluation because they provide no information for understanding the interaction of the model processes responsible for producing the state variable predictions.
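As a concrete illustration of these residual-based measures, the following sketch computes normalized bias and gross error from paired predicted and observed values; the function and the hourly [O3] values are hypothetical and are not drawn from any particular model application.

```python
# Minimal sketch of the residual-based comparison statistics discussed above,
# assuming paired arrays of predicted and observed hourly [O3]; the variable
# names and input data are illustrative, not taken from any specific AQM.
import numpy as np

def residual_statistics(predicted, observed):
    """Return normalized bias and normalized gross error (as fractions)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    residuals = predicted - observed          # Concentration_predicted - Concentration_observed
    normalized_bias = np.mean(residuals / observed)
    normalized_gross_error = np.mean(np.abs(residuals) / observed)
    return normalized_bias, normalized_gross_error

# Example with hypothetical hourly [O3] in ppb:
o3_pred = [62.0, 75.0, 90.0, 110.0]
o3_obs  = [58.0, 80.0, 85.0, 120.0]
bias, gross_error = residual_statistics(o3_pred, o3_obs)
print(f"normalized bias = {bias:+.1%}, gross error = {gross_error:.1%}")
```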
The current recommended statistical measures for
AQM evaluation (USEPA 1991) date from 1981 (Fox
1981) and were adapted from long-established regression
statistics for empirically-fitted models where model
parameters can be tested against observational data. But
that regression paradigm is inappropriate as the sole test
for complex Eulerian process models where the underlying
science in the model must be evaluated to ensure
credibility and confidence in the model's simulations of
atmospheric photochemistry and fluid flow. And since
there are no strong ambient signals of change, there can
be no direct tests of any outcome variable which would be
meaningfully measured with resultant comparisons.
Moreover, because the atmosphere is a system of
strongly nonlinear processes, it would not be possible to
interpret such a signal unambiguously if one were present.
For these reasons, successful model evaluation for
judging a model's potential behavior in future applications
requires tests that are diagnostic of a model's processes
and the science represented in them.
The need for diagnostic testing of AQMs has
previously been recognized: Fox's report from an early
model evaluation workshop (Fox 1984) noted explicitly that
improvement in model performance is tied to
understanding the scientific basis for model behavior in
well-defined physical situations, a point repeated in
Seinfeld (1988). More recently, Tesche, Reynolds, and Roth have described the need for "stressful" diagnostic testing in a better, more comprehensive model evaluation methodology (Tesche et al. 1992; Reynolds et al. 1994).
However, incorporating process-oriented diagnostic tests
in actual model evaluations has been slow in coming.
Previous evaluations have focused on failure analysis of
module components, or sensitivity testing of the full model
by examining resultant concentrations, rather than on true
model process diagnostics.
Implications of Inadequate Evaluation
Inadequate evaluations allow the possibility of finding
a model acceptable for an application when in fact it is
not. The lack of diagnostic tests for a model and its
component modules in most current evaluations has
allowed models to be judged acceptable for several
applications when large and significant errors remained.
For example, using the USEPA recommended performance statistics of ±5 to 15% bias, ±30 to 35% gross error, and ±15 to 20% unpaired peak prediction accuracy in [O3], urban applications of UAM-IV have performed acceptably even when errors in the meteorological model produced wind speeds of zero in all grid layers above the surface (Tesche and McNally 1995). In other cases (described in Tesche et al. 1992), data in VOC inventories used to set up a model for applications evaluations were later shown to be underestimated by a factor of 2 or more;
yet at the time, model applications were found to be
acceptable using the recommended performance
statistics.
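For illustration only, a simple screening of such aggregate scores against the recommended ranges quoted above might look like the following sketch; the specific cutoffs used here are the upper ends of those ranges and are an assumption made for the example.

```python
# Illustrative screening of summary statistics against the USEPA-recommended
# ranges quoted in the text (bias within roughly +/-5 to 15%, gross error within
# 30 to 35%, unpaired peak accuracy within +/-15 to 20%); the cutoffs below are
# the upper ends of those ranges, chosen as assumptions for this sketch.
def within_recommended_ranges(bias, gross_error, peak_accuracy,
                              bias_max=0.15, error_max=0.35, peak_max=0.20):
    """Return True if all three aggregate scores fall inside the cutoffs."""
    return (abs(bias) <= bias_max
            and gross_error <= error_max
            and abs(peak_accuracy) <= peak_max)

# Hypothetical scores that would pass the screening despite possible
# compensating errors in the model setup:
print(within_recommended_ranges(bias=0.08, gross_error=0.28, peak_accuracy=-0.12))
```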
Furthermore, because model evaluation for air quality
applications is carried out to assist in control strategy
selection, an inadequate model evaluation for these
applications leaves the potential for undocumented bias in
estimating the effect of control strategies. Given an
undocumented bias in a model, policy-makers could select
an incorrect level of future reductions, or even the wrong
type of control (NOx or VOC). This point has been
demonstrated in two recent studies using two models to
simulate the New York domain for July 1988. A series of
sensitivity/uncertainty tests performed with UAM-IV (Sistla et al. 1996) and with RADM (Li et al. 1998) has shown that emissions and meteorology uncertainties in the model setup affect the final predicted O3 concentration ([O3]) to the extent that preferences for control strategies can shift. The
high risk of selecting the wrong control strategy has costly
economic and social disbenefits, which will be increasingly
important for the proposed new multi-pollutant standards
and combined control strategies. Analyses of [O3] time-
series plots and residual statistics cannot reveal such
potential biases in the use of a model; thus, undiscovered
biases present a substantial negative implication for any
evaluation procedure which uses residual statistics and
other outcome measures to the exclusion of diagnostic
tests.
Revised Model Evaluation Methodology
These implications of inadequate model evaluations
motivate development of a revised evaluation
methodology. Importantly, the examples described above
are failures of the model evaluation more than of the
model. Better evaluation procedures including enhanced
diagnostic testing might have detected the flaws in the
model and the compensating errors in its setup for these
applications. The methodology proposed here is described with a matrix of evaluation tests grouped in several testing categories, and it incorporates both new techniques and redefined techniques from existing model evaluations. A brief outline of
the methodology's key elements is given here; fuller
descriptions and a diagram of candidate tests and
interpretations for an example model evaluation appear in
the oral presentation.
This methodology proposes two top-level evaluation
types: Integrated/Diagnostic, and Applications.
Integrated/Diagnostic evaluation is an assessment of the
model's fitness for use in its intended application or task
based on a comparison of processes and output from the
model judged against relevant observations, observational
models, and other numerical model results. An
Applications evaluation then typically follows an
Integrated/Diagnostic evaluation and characterizes use of
the model in a specific application study for the purpose
of ensuring against degradation of model performance
through unintended or unjustified changes in the model or
its setup.
The focus of this paper is Integrated/Diagnostic evaluation, since it precedes and supports an Applications evaluation, and since an Applications evaluation makes use of elements of the Integrated/Diagnostic evaluation, but for the purpose of testing the model setup for the particular application of the study. While most of the tests
described below are specific to the chemistry of AQMs,
some suggestions for tests of other model components
are also included.
An Integrated/Diagnostic evaluation consists of a
series of test results and interpretations designed using
elements of these four categories:
(1)	Component and Composition assessment
(2)	Resultants comparisons
(3)	Diagnostic testing
(4)	Model Systems analysis.
Category (1) Component and Composition
assessment examines the structure and scientific basis of
model processes, and the instantiation of those processes
in the model's equations and numerical routines.
Assessment of the model structure includes describing
assumptions behind formulations of the model's
components, and testing whether different formulations
produce different results; i.e., providing a sensitivity
analysis on model structure.
Specific modules such as the chemical mechanism
and the deposition algorithms are assessed independently
using direct comparisons in cross-model testing like that
performed for the RADM and ADOM chemistry modules
(Dennis et al. 1990), and directly with Mechanistic tests of
the modules using specially-collected data - e.g., smog
chamber runs on specific compounds, or special
laboratory and field data - to test specific
parameterizations in the model for well-defined conditions.
There is substantial experience using Mechanistic tests on
chemical mechanisms of AQMs: a hierarchy of tests was
developed and has generally been followed (Whitten 1983;
Atkinson et al. 1987). In addition, several species ratios
have been proposed to help evaluators judge whether the
chemistry is correctly predicting the processing and
product formation (Dennis et al. 1990). While ratios of
species are not always less-sensitive to nonchemistry
effects such as transport or dispersion, they can indicate
areas of concern. Thus, where the chemical mechanism
correctly predicts the NO:NOz ratio, for example, but the full model does not, this may indicate a problem with transport.
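A minimal sketch of this kind of cross-check follows; the ratio values, the 20% agreement tolerance, and the function name are all hypothetical and serve only to show the comparison logic.

```python
# Hedged sketch of the cross-check described above: compare a diagnostic
# species ratio from a mechanism-only (box model) run, the full 3-D model,
# and observations.  All values and the tolerance are illustrative.
def ratio_agrees(model_ratio, observed_ratio, tolerance=0.20):
    """True if the modeled ratio is within the assumed tolerance of observations."""
    return abs(model_ratio - observed_ratio) / observed_ratio <= tolerance

mechanism_ratio = 0.42   # mechanism-only box-model run (hypothetical)
full_model_ratio = 0.61  # full 3-D model (hypothetical)
observed_ratio = 0.45    # ambient observation (hypothetical)

if ratio_agrees(mechanism_ratio, observed_ratio) and not ratio_agrees(full_model_ratio, observed_ratio):
    print("Mechanism matches observations but the full model does not:")
    print("suspect a non-chemistry process such as transport or dispersion.")
```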
Mechanistic testing of other modules is less advanced
than it is for chemistry, but an extensive series of
Mechanistic tests for AQM meteorological modules has
recently been proposed (Tesche and McNally 1996).
Category (2) Resultants comparisons describe how
well the model predicts for key state variables using
Outcome Variable matching. Outcome Variable matching
is the direct comparison of model-predicted state variable
concentrations against ambient observations of the same
state variables. These comparisons are made with
statistical measures, qualitative and quantitative pattern
analysis, and correlation statistics for geographical sites or
time series data like the ones recommended by Tesche et al. (1990, 1992) to supplement the standard USEPA measures of normalized bias and gross error, and average unpaired peak accuracy (USEPA 1991). While O3 is a key
variable for AQM evaluation because it is of central
regulatory concern, it is not the only variable of interest.
Consequently, Resultants comparisons should be made
for precursor compounds such as NOx and VOCs, for
conserved species such as CO and NOy, and for other
products such as HNO3 and total particulate NO3-, H2O2
and total peroxides, and PAN (peroxyacetic nitric
anhydride), MPAN (peroxy-methacrylic nitric anhydride),
and PPN (peroxypropionic nitric anhydride) where data are
available.
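The sketch below illustrates how such Resultants comparisons might be extended to several species at once, assuming paired predicted and observed series are available for each; the species values and the particular statistics chosen are illustrative only.

```python
# Sketch of extending Resultants comparisons beyond O3, assuming paired
# predicted/observed time series for each species; names and values are
# illustrative, not from any particular model application.
import numpy as np

def resultants_summary(paired_series):
    """paired_series: dict mapping species -> (predicted, observed) sequences."""
    summary = {}
    for species, (pred, obs) in paired_series.items():
        pred, obs = np.asarray(pred, float), np.asarray(obs, float)
        summary[species] = {
            "normalized_bias": float(np.mean((pred - obs) / obs)),
            "gross_error": float(np.mean(np.abs(pred - obs) / obs)),
            "correlation": float(np.corrcoef(pred, obs)[0, 1]),
        }
    return summary

example = {
    "O3":  ([60, 75, 90], [55, 80, 95]),
    "NOy": ([12, 18, 25], [10, 20, 24]),
    "CO":  ([310, 420, 500], [300, 450, 480]),
}
for species, stats in resultants_summary(example).items():
    print(species, stats)
```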
Category (3) Diagnostic testing assesses a model's
predictive performance and reveals why the model
responds in the way that it does by providing candidate
explanations in terms of processes and pathways for the
performance summarized in Resultants comparisons.
Diagnostic testing is in situ testing of a model's processes
and can be performed either internally with one model, or
across several models, or in comparisons using specially-
collected aerometric data which emphasize atmospheric
processes (see for example Parrish et al. 1993; Trainer et al. 1993).
Two types of Diagnostic tests are proposed: Process Diagnostics and Response Surface Diagnostics.
Process Diagnostics are specific tests of key reaction
pathways and interactions between components in an
AQM. They assess a model's ability to represent actual
atmospheric interactions by examining pathways and
processes in the model, often using special measurements
designed to indicate the activity of such processes.
For the chemistry module, for example, Process Diagnostics would test the following (a short ratio-based sketch follows this list):
•	radical initiation pathways using comparisons of O3, HCHO, HONO, peroxides, and spectral irradiance
•	radical termination pathways using comparisons of HNO3, PAN, organic nitrates and peroxides
•	the balance between radical initiation and termination
•	competition between radical termination pathways by comparing production of HNO3 and other nitrates to H2O2 and other peroxides
•	calculations of radical propagation efficiency and OH chain length using NO, NO2, VOC, and RO2
•	estimates of OH chain length using ratios of [O3] to radical termination products
•	estimates of O3 production efficiency using ratios of [O3] to NOx termination products; i.e., NOz
•	speciation of NOz to compare competition between NOx termination pathways
•	airmass aging pathways using relative fractions of NOx vs. NOy, and PAN and other nitrates vs. NOy.
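The sketch below illustrates two of the ratio-based diagnostics in the list, the simple [O3]/NOz estimate of O3 production efficiency and the [O3]-to-radical-sink estimate of OH chain length; the concentrations, units, and function names are hypothetical placeholders.

```python
# Minimal sketch of two of the ratio-based Process Diagnostics listed above,
# assuming concentrations in consistent units (e.g., ppb); the inputs are
# hypothetical and the formulas are the simple ratio estimators described in
# the text, not a specific published algorithm.
def ozone_production_efficiency(o3, nox_termination_products):
    """Cumulative O3 production efficiency approximated as [O3] / [NOz]."""
    return o3 / nox_termination_products

def oh_chain_length_estimate(o3, radical_termination_products):
    """Average OH chain length approximated as [O3] / [radical sinks]."""
    return o3 / radical_termination_products

o3 = 85.0             # ppb, hypothetical
noz = 9.5             # ppb NOz (NOy - NOx), hypothetical
radical_sinks = 12.0  # ppb HNO3 + PAN + peroxides, hypothetical
print("O3 production efficiency:", ozone_production_efficiency(o3, noz))
print("OH chain length estimate:", oh_chain_length_estimate(o3, radical_sinks))
```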
Some Process Diagnostic tests will involve measures
that reflect both local photochemistry and photochemistry
over the history of an air parcel. For example, O3 production rates and OH chain length might be estimated for instantaneous, local photochemistry using measured RO2, NO, NO2, and VOC concentrations, while the cumulative O3 production and average OH chain length over an air parcel's history can be approximated using [O3] and the ratio of [O3] to radical termination products.
However, local photochemistry and the history of an
airmass should be carefully distinguished, and it will be
necessary to develop diagnostic tests to evaluate model
representations of each.
Furthermore, ratios of species can be calculated in
the model for comparisons at the surface and aloft to
provide additional diagnostic information on a model's
treatment of intermediate and product species and
radicals. For example, chemical dynamics measures, calculated using partitioned NOx and NOy species for aloft and surface concentrations, could provide useful information about airmass aging pathways, since the species aloft would be more susceptible to transport and dispersion effects, and the chemistry can proceed further to completion there. Also, RO2-RO2 reactions should be tested in the surface layer of urban areas, where the reaction has little significance, and again aloft at low [NOx], where the reaction is significant for [O3] and [H2O2].
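A hedged sketch of such a surface-versus-aloft comparison follows, using the airmass-aging fractions NOx/NOy and (PAN + other nitrates)/NOy from the Process Diagnostics list; the layer concentrations are invented for illustration.

```python
# Sketch of the surface-versus-aloft comparison suggested above, using
# airmass-aging fractions; the layer values are hypothetical and the
# comparison is illustrative only.
def aging_fractions(nox, pan_plus_nitrates, noy):
    """Return simple airmass-aging indicators for one layer (units: ppb)."""
    return {"NOx/NOy": nox / noy, "(PAN+nitrates)/NOy": pan_plus_nitrates / noy}

surface = aging_fractions(nox=14.0, pan_plus_nitrates=3.0, noy=20.0)
aloft   = aging_fractions(nox=4.0,  pan_plus_nitrates=4.5, noy=10.0)
print("surface:", surface)
print("aloft:  ", aloft)   # a lower NOx/NOy fraction aloft suggests more processed air
```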
Additional Process Diagnostic tests would use CO
data with other ratios of species having varying lifetimes
to derive additional model-estimated chemical budgets and
processing rates. These comparisons would serve as an
aid to interpreting results from the Mechanistic testing of
the chemical mechanism, helping to separate the influence
of chemistry from other model processes.
The second type of Diagnostic tests, Response
Surface Diagnostics, test a model's ability to track
systemic modulation and generally will involve the use of
indicator species and ratios of species thought to correlate
consistently with VOC-sensitive and NOx-sensitive O3 production in the model. Several indicators of O3 production sensitivity and airmass aging are currently under development (Sillman 1995; Sillman et al. 1997; Tonnesen and Dennis 1998a, 1998b), and they appear promising as diagnostic tests for model evaluations. For example, the indicators [O3]/[HNO3] and [O3]/[NOz] show strong correlations to [O3] sensitivity, and correctly predict conditions that are either strongly NOx- or strongly VOC-limited (Tonnesen and Dennis 1998b).
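As an illustration of how such an indicator might be applied in an evaluation, the sketch below classifies the modeled regime from [O3]/[HNO3]; the threshold values are hypothetical placeholders and are not the transition values reported in the cited studies.

```python
# Illustrative use of an indicator ratio such as [O3]/[HNO3] to flag the
# modeled O3 production regime; the transition thresholds below are assumed
# placeholders, not values from Sillman or Tonnesen and Dennis.
def classify_regime(o3, hno3, nox_limited_above=15.0, voc_limited_below=7.0):
    """Classify O3 production sensitivity from the [O3]/[HNO3] indicator."""
    ratio = o3 / hno3
    if ratio >= nox_limited_above:
        return "strongly NOx-limited"
    if ratio <= voc_limited_below:
        return "strongly VOC-limited"
    return "transitional / indeterminate"

print(classify_regime(o3=90.0, hno3=4.0))   # ratio 22.5 -> NOx-limited (with these assumed cutoffs)
print(classify_regime(o3=90.0, hno3=15.0))  # ratio 6.0  -> VOC-limited (with these assumed cutoffs)
```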

Category (4) Model Systems analysis assesses the
full model as a system, and is intended to provide insight
about the model's behavior over a wide range of
simulations. These tests will involve both model-to-model
comparisons and internal comparisons of one model's
structure and response characteristics.
Three types of Model Systems analyses are proposed: sensitivity analysis and uncertainty analysis, including principal components analysis and supported by additional response surface testing and process analysis for these model-only tests; and, where necessary, a bounding analysis. These types are briefly described below.
Sensitivity analysis, testing effects of particular
parameterizations in a model or analyzing the response of
a model to changes in input variables, is a common
evaluation element for model development purposes, and
should be more effectively included in Integrated/
Diagnostic evaluations as well. The chemical mechanism,
for example, has frequently been tested with sensitivity analyses using the change in predicted [O3] as the endpoint for variations in VOC speciation (Harley et al. 1993), and for photolysis rates and reaction yields in the mechanism (Gao et al. 1995, 1996).
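A minimal sketch of such a one-at-a-time sensitivity test follows; run_model() is a placeholder standing in for a full AQM simulation, and its assumed response to a perturbed photolysis scaling is invented purely for illustration.

```python
# Sketch of a one-at-a-time sensitivity test using the change in predicted
# [O3] as the endpoint; run_model() is a stand-in for a full AQM run and its
# response is a hypothetical assumption, not a real model result.
def run_model(photolysis_scale=1.0):
    """Placeholder: return a peak [O3] (ppb) that responds to the perturbed input."""
    return 95.0 + 30.0 * (photolysis_scale - 1.0)   # assumed, illustrative response

def first_order_sensitivity(parameter_delta=0.10):
    """Normalized sensitivity coefficient: (dO3/O3) / (dp/p)."""
    base = run_model(1.0)
    perturbed = run_model(1.0 + parameter_delta)
    return ((perturbed - base) / base) / parameter_delta

print("normalized sensitivity of peak [O3]:", round(first_order_sensitivity(), 2))
```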
Uncertainty analysis, too, has frequently been used in
research-level evaluations of a model. One example
(described above) is the work of Li et al. (1998) using RADM for analyzing the effect of meteorology and emissions uncertainties on O3 production and control
strategy selection. Hanna et al. (1998) have recently
extended the use of uncertainty analysis by addressing
multivariable input uncertainties using Monte Carlo
techniques with the full model. These are early results,
though, and have not been worked up with quantitative
likelihoods or fully evaluated against existing single-
variable uncertainty studies, but they are promising as
potential indicators of the uncertainty range of Model
Systems functioning.
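The sketch below outlines a Monte Carlo treatment of two uncertain inputs in that spirit; run_model() again stands in for a full AQM simulation, and the assumed lognormal uncertainty ranges are illustrative only.

```python
# Sketch of Monte Carlo sampling of multivariable input uncertainty in the
# spirit of Hanna et al. (1998); run_model() is a placeholder for a full AQM
# run, and the uncertainty ranges below are assumptions for illustration.
import random

def run_model(emissions_scale, wind_speed_scale):
    """Placeholder returning a peak [O3] (ppb) responding to the two scaled inputs."""
    return 95.0 * (emissions_scale ** 0.5) / (wind_speed_scale ** 0.3)

def monte_carlo_peak_o3(n_samples=1000, seed=0):
    """Return an approximate 5th-95th percentile range of predicted peak [O3]."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        emis = rng.lognormvariate(0.0, 0.3)   # assumed emissions uncertainty
        wind = rng.lognormvariate(0.0, 0.2)   # assumed wind-speed uncertainty
        results.append(run_model(emis, wind))
    results.sort()
    return results[int(0.05 * n_samples)], results[int(0.95 * n_samples)]

low, high = monte_carlo_peak_o3()
print(f"90% range of predicted peak [O3]: {low:.1f} to {high:.1f} ppb")
```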
A bounding analysis is required when evaluation
results are inconclusive as to whether the model is
acceptable for the evaluated task, yet estimates of the
model's likely performance are still required (see Dennis
et al. 1990). Bounding analysis can provide targeted
interpretations of the effects of bias and error in the
science of the model for future predictions. Hence, it is a
means for examining process representations and
potential compensating errors in the model when
Response Surface Diagnostic results are not available.
Summary
Tests and interpretations developed using the revised
methodology for Integrated/Diagnostic evaluations
proposed here will help provide the "more explicit, less intuitive approaches to model acceptability" called for by the Eulerian Model Evaluation Team (Dennis et al. 1990) by formalizing explicit approaches for:
•	specifying which tests are required
•	establishing standard interpretations of test
results
•	using new measures of model behavior to
supplement the aggregate scores of bias, gross
error, and average accuracy currently used.
The proposed evaluation methodology shows several
advantages over current practice, in that the revised
methodology:
•	can make fuller use of all available data from
observations and predictions
•	will provide for more directed sensitivity analysis
of important model processes
•	can direct future model development research to
the most important errors in model structure and
implementation to help ensure continued model
improvement.
Tests for the Integrated/Diagnostic evaluations of AQMs will require special observational data that are not now routinely available. It is hoped that the description in
this paper of how those data might be used in a more
advanced evaluation of a model indicates the importance
of such data, and demonstrates the need for continued
research and development on accurate and reliable
measurement techniques. Active cooperation between
model evaluators and measurement developers is crucial
to the success of diagnostic evaluation, and for the
continued improvement of model performance.
References
Atkinson, R., H. E. Jeffries, G. Z. Whitten, and F. L.
Lurmann, 1987: Proceedings of the Workshop on
Evaluation/Documentation of Chemical Mechanisms.
USEPA, Research Triangle Park, NC.
Dennis, R. L., W. R. Barchet, T. L. Clark, S. K. Seilkop,
and P. M. Roth, 1990: Evaluation of Regional Acidic
Deposition Models (Part I), And Selected Applications
of RADM (Part II). National Acid Precipitation
Assessment Program, Washington, DC.
Fox, D. G., 1981: Judging air quality model performance.
Bulletin of the American Meteorological Society, 62,
599-609.
	, 1984: Uncertainty in air quality modeling: A summary
of the AMS workshop (September 1982, Woods Hole,
MA) on quantifying and communicating model
uncertainty. Bulletin of the American Meteorological
Society, 65, 27-36.
Gao, D., W. R. Stockwell, and J. B. Milford, 1995: First-
order sensitivity and uncertainty analysis for a
regional-scale gas-phase chemical mechanism.
Journal of Geophysical Research, 100, 23153-66.
	, 1996: Global uncertainty analysis of a regional-scale
gas-phase chemical mechanism. Journal of
Geophysical Research, 101, 9107-19.
Hanna, S. R., J. C. Chang, and M. E. Fernau, 1998: Monte
Carlo estimates of uncertainties in predictions by a
photochemical grid model (UAM-IV) due to
uncertainties in input variables. Atmospheric
Environment [submitted February 1997].
Harley, R. A., A. G. Russell, and G. R. Cass, 1993:
Mathematical modeling of the concentrations of
volatile organic compounds: Model performance using a lumped chemical mechanism. Environmental Science and Technology, 27, 1638-49.
Li, V., R. L. Dennis, G. S. Tonnesen, and J. E. Pleim,
1998: Regional ozone concentrations and production
efficiency as affected by meteorological parameters in
the Regional Acid Deposition Modeling system.
Preprints from the 10th Joint Conference on the
Applications of Air Pollution Meteorology with the
AWMA (Phoenix, AZ, January 1998), AMS.
Parrish, D. D., M. P. Buhr, M. Trainer, R. B. Norton, J. P.
Shimshock, F. C. Fehsenfeld, A. G. Anlauf, J. W.
Bottenheim, Y. Z. Tang, H. A. Wiebe, J. M. Roberts,
R. L. Tanner, L. Newman, V. C. Bowersox, K. J. Olszyna, E. M. Bailey, M. O. Rodgers, T. Wang, H.
Berresheim, U. K. Roychowdhury, K. Demerjian,
1993: The total reactive oxidized nitrogen levels and
the partitioning between the individual species at six
rural sites in Eastern North America. Journal of
Geophysical Research, 98, 2927-39.
Reynolds, S. D., P. M. Roth, and T. W. Tesche, 1994: A
Process for the Stressful Evaluation of Photochemical
Model Performance. Western States Petroleum
Association, Glendale, CA.
Seinfeld, J. H., 1988: Ozone air quality models: A critical
review. Journal of the Air Pollution Control Association, 38, 616-45.
Sillman, S., 1995: The use of NOy, H2O2, and HNO3 as indicators for O3-NOx-ROG sensitivity in urban locations. Journal of Geophysical Research, 100,
14175-88.
Sillman, S., D. Y. He, M. R. Pippin, P. H. Daum, J. H.
Lee, L. I. Kleinman, and J. Weinstein-Lloyd, 1997:
Model correlations for ozone, reactive nitrogen, and
peroxides for Nashville in comparisons with measurements: Implications for O3-NOx-hydrocarbon chemistry. Journal of Geophysical Research
[submitted July 1997].
Sistla, G., N. Zhou, W. Hao, J. Ku, S. T. Rao, R. Bornstein, F. Freedman, and P. Thunis, 1996: Effects
of uncertainties in meteorological inputs on urban
airshed model predictions and ozone control
strategies. Atmospheric Environment, 30, 2011-55.
Tesche, T. W., F. L. Lurmann, P. M. Roth, P.
Georgopoulos, J. H. Seinfeld, and G. R. Cass, 1990:
Improvement of Procedures for Evaluating
Photochemical Models. California Air Resources
Board, Sacramento, CA.
Tesche, T. W., and D. E. McNally, 1995: Assessment of
UAM-IV Model Performance for Three St. Louis
Ozone Episodes. Alpine Geophysics, LLC.,
Covington, KY.
Tesche, T. W., and D. E. McNally, 1996: Evaluation of the
MM5 Model for Three 1995 Regional Ozone Episodes
over the Northeast United States. Alpine Geophysics,
LLC., Covington, KY.
Tesche, T. W., P. M. Roth, S. D. Reynolds, and F. W.
Lurmann, 1992: Scientific Assessment of the Urban
Airshed Model (UAM-IV). Alpine Geophysics, Crested
Butte, CO.
Tonnesen, G. S., and R. L. Dennis, 1998a: Analysis of
radical propagation efficiency to assess ozone
sensitivity to hydrocarbons and NOx. Part 1: Local
indicators of instantaneous odd oxygen production
sensitivity. Journal of Geophysical Research
[submitted July 1997].
	, 1998b: Analysis of radical propagation efficiency to
assess ozone sensitivity to hydrocarbons and NOx.
Part 2: Long lived species as indicators of ozone
concentration sensitivity. Journal of Geophysical
Research [submitted August 1997].
Trainer, M., D. D. Parrish, M. P. Buhr, R. B. Norton, F. C.
Fehsenfeld, K. G. Anlauf, J. W. Bottenheim, Y. Z.
Tang, H. A. Wiebe, J. M. Roberts, R. L. Tanner, L.
Newman, V. C. Bowersox, J. F. Meagher, K. J. Olszyna, M. O. Rodgers, T. Wang, H. Berresheim, K.
L. Demerjian, and U. K. Roychowdhury, 1993:
Correlation of ozone with NOy in photochemically
aged air. Journal of Geophysical Research, 98, 2917-
25.
USEPA, 1991: Guideline for Regulatory Application of the
Urban Airshed Model. USEPA OAQPS, Research
Triangle Park, NC.
Whitten, G. Z., 1983: The chemistry of smog formation: A
review of current knowledge. Environment
International, 9, 447-63.
Acknowledgements: J. R. Arnold's support is provided by
the NOAA / EPA Postdoctoral Program administered by
the University Corporation for Atmospheric Research. Gail
S. Tonnesen is a National Research Council Postdoctoral
Fellow.
This paper has been reviewed in accordance with the U.S.
Environmental Protection Agency's peer review policies
and approved for presentation and publication. Mention
of trade names or commercial products does not
constitute endorsement or recommendation for use.
