United States Environmental Protection Agency
Risk Reduction Engineering Laboratory
Cincinnati OH 45268
Research and Development
EPA/600/S8-89/102 June 1990
EPA Project Summary
MOUSE (Modular Oriented Uncertainty SystEm): A Computerized Uncertainty Analysis System
Albert J. Klee
Environmental engineering
calculations involving uncertainties,
either in the model itself or in the
data, are far beyond the capabilities
of conventional analysis for any but
the simplest of models. There exist a
number of general-purpose computer
simulation languages, using Monte
Carlo methods, that are capable of
such analysis, but these languages
are difficult to learn and to implement
quickly.
MOUSE (an acronym for Modular
Oriented Uncertainty SystEm) deals
with the problem of uncertainties in
models that consist of one or more
algebraic equations. It was especially
designed for use by those with little
or no knowledge of computer
languages, programming, or
simulation. It is designed to run
on almost any personal computer, is
easy and fast to learn, and has all of
the features needed for substantive
uncertainty analysis (built-in
probability distributions, plotting and
graphing capabilities, sensitivity
analysis, interest functions for cost
analyses, etc.). Moreover, a series of
unique companion utility programs
write much of the necessary
computer code for the user, help in
analyzing sample data to determine
the probability distributions that best
fit those data, check each program
for errors in syntax, and assist in
finding logical errors in the model
that is subject to uncertainty.
Some typical examples of the use
of MOUSE within the U.S. Environmental
Protection Agency include:
studying the migration of pollution
plumes in streams, establishing
regulations for hazardous wastes in
landfills, and estimating pollution
control costs.
This Project Summary was
developed by EPA's Risk Reduction
Engineering Laboratory, Cincinnati,
OH, to announce key findings of the
research project that is fully
documented in a separate report of
the same title (see Project Report
ordering information at back).
Introduction
If we define a model as a physical or
symbolic representation of reality, we find
among the set of all models one called
the mathematical model, one particular
type of which consists of a series of one
or more algebraic equations.
Mathematical models of this type are
extremely important for they are found
almost everywhere, including the
environmental sciences, economics,
engineering, and science in general. The
use of an equation is understood by
almost everyone; in a somewhat
"inelegant" sense, numbers "go into" the
equation and an answer is obtained. For
example, consider the following very
simple equation,
Y = AB    [1]
where Y might stand for an equipment
purchase cost, given that A is the unit
price and B is the number of units
required. Alternatively, equation 1 might
represent an engineering calculation
where Y is the cross-sectional area of a
heating duct, given that A is its height
and B is its width. In any event, to
"solve" equation 1, one has to know the
values of A and B. If A is equal to 2, for
example, and B is equal to 15, then Y =
2 x 15 = 30. Often, however, we are not
sure of the values of A or of B. A might
be 3 and B might be 20; in such event Y
would be equal to 60, not 30. The greater
our uncertainty about the input variables
A and B, clearly the greater our
uncertainty about the output variable, Y.
The most often encountered
approaches to uncertainty in
mathematical models are: (1) the best
value approach, (2) the conservative
approach, and (3) sensitivity analysis.
The first two are single-value approaches.
The "best" in "best value" is not
precisely defined; generally it refers to
some measure of central tendency such
as an average or a mode. In the
equipment purchase problem, we might
suppose that the values of A of 2 and of
B of 15 are average values. Presumably,
the answer of 30 is also some sort of
average value.
In the conservative approach, the input
values selected are not the average or
most likely ones but rather those that
produce conservative results with regard
to the consequences of over- or
underestimating. For example, in the
conservative approach for the duct
example, the values of A and B selected
to go into equation 1 would be greater
than their average values of 2 and 15
since overestimation is probably better
than underestimation in this case. If
"best" values were used, there would be
too great a chance that the duct area
would be underestimated.
Traditional sensitivity analysis usually
starts with a best value estimate, followed
by a perturbation or change in one of the
input variables (holding all other input
variables at their previous values). The
perturbation can be either an increase or
a decrease in the value of the variable
and hence can be either of a
"conservative" nature or a "liberal" one.
The perturbation is generally within the
known or believed uncertainty range of
the variable. The process is repeated for
as many variables and for as great a
change as is desired. For example, to
examine the effects of a modest change
in A in equation 1, we might increase the
value of A by 10% over its "best"
estimate value of 2. A 10% increase in A
(to 2.2) results in an estimated Y-value of
33. If a value of A of 2.2 is "reasonably
likely" to occur, then the traditional
sensitivity analysis suggests that a value
of Y of 33 is also "reasonably likely" to
occur.
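The one-at-a-time perturbation described above can be sketched in a few lines. This is an illustration only, written in Python rather than in MOUSE; the function names `model` and `one_at_a_time` are hypothetical and do not come from the report.

```python
def model(a, b):
    """Equation 1: Y = A * B."""
    return a * b


def one_at_a_time(model, best, rel_change=0.10):
    """Perturb each input by rel_change while holding the others at their best values."""
    base = model(**best)
    results = {}
    for name, value in best.items():
        perturbed_inputs = dict(best)            # copy, so other inputs stay at "best"
        perturbed_inputs[name] = value * (1.0 + rel_change)
        results[name] = model(**perturbed_inputs)
    return base, results


# "Best" values from the text: A = 2, B = 15.
base, perturbed = one_at_a_time(model, {"a": 2.0, "b": 15.0})
print("best-value Y:", base)
for name, y in perturbed.items():
    print(f"{name} increased by 10%: Y = {y:.2f}")
```

Raising A to 2.2 gives a Y of about 33, as in the text. The sketch says nothing about how likely that value is.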
Problems with Traditional Approaches
Clearly, the best value approach does
not address the problem of uncertainty at
all. Furthermore, "best" input values do
not necessarily have high probabilities of
occurring. Another difficulty arises when
the algebraic model contains non-linear
elements such as multiplications or
divisions and the variables are correlated.
If variables A and B of equation 1 were
correlated, for example, the average of Y
would not be equal to the average of A
times the average of B. In point of fact, if
A and B were positively correlated, then
the average of Y would be greater than
the product of the averages; conversely,
if A and B were negatively correlated, the
average of Y would be less than the
product of the averages. The
conservative approach also has its
deficiencies. First, in a complex
calculation involving many equations and
many input variables (some of which may
be correlated), it may not be obvious
what values of the input variables
constitute "conservative" ones with
respect to the output. Second, because
conservative input values generally are
those with a low probability of occurring,
the estimates obtained by using such
values perforce will not have a high
probability of occurring. The worst thing,
however, about both of these approaches
is that the point estimates involved do not
utilize all of the information that is usually
available. One usually has at least some
idea of the uncertainties in the data at
hand.
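The effect of correlation on the average of a product, noted earlier in this section, follows from the identity E[AB] = E[A]E[B] + cov(A, B). The sketch below checks it by exact arithmetic on two small joint distributions. Both distributions are invented for illustration (they are not data from the report), and the helper `means` is a hypothetical name.

```python
from fractions import Fraction as F


def means(joint):
    """Return (E[A], E[B], E[AB]) for a joint distribution {(a, b): probability}."""
    e_a = sum(p * a for (a, b), p in joint.items())
    e_b = sum(p * b for (a, b), p in joint.items())
    e_ab = sum(p * a * b for (a, b), p in joint.items())
    return e_a, e_b, e_ab


# Assumed joint distributions with identical marginals (A is 1 or 3, B is 10 or 20,
# each 50-50) but opposite signs of correlation.
positive = {(1, 10): F(4, 10), (1, 20): F(1, 10), (3, 10): F(1, 10), (3, 20): F(4, 10)}
negative = {(1, 10): F(1, 10), (1, 20): F(4, 10), (3, 10): F(4, 10), (3, 20): F(1, 10)}

for label, joint in (("positive", positive), ("negative", negative)):
    e_a, e_b, e_ab = means(joint)
    print(f"{label} correlation: E[A]E[B] = {e_a * e_b}, E[AB] = {e_ab}")
```

With these assumed numbers the product of the averages is 30 in both cases, while the average of the product is 33 under positive correlation and 27 under negative correlation.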
Sensitivity analysis, being largely an
amalgamation of elements of both the
best value and conservative approaches,
suffers the defects of both methods. An
arbitrary change in the value of an input
variable, even though the change falls
within the expected range of the variable,
tells us little about the likelihood of
occurrence of the new estimate obtained.
In other words, if we know little about the
likelihood of such a change occurring, it
follows that we know little about the
likelihood of the calculated output
occurring. Furthermore, in traditional
sensitivity analysis, all other variables are
held at their previous values, the so-
called "all other things being equal" view
of the world. The problem is that "all
other things" are seldom equal. In
actuality, the change we introduce in a
variable under traditional sensitivity
analysis may well be either mitigated or
intensified by what is happening to
other variables. In short, traditional
sensitivity analysis does not show the
combined net effect of changes in all
variables or the likelihood of various
changes occurring together. Viewed in
this manner, traditional sensitivity
analysis can be misleading.
Alternative Solutions to the Uncertainty Problem
The first alternative, illustrated in Figure
1, is Direct or Complete Enumeration.
The model of equation 1 is employed,
and we assume the uncertainties of A
and B as given in the two probability
distributions for these variables shown
under "Basic Data" in the figure.
In other words, we suppose that there is
a 25% chance that A is equal to 1, 50%
that it is equal to 2, and 25% that it is
equal to 3. For B, there is a 50-50 chance
that it is equal to either 10 or 20. (For this
simple example, we assume no
correlation between the two input
variables.) In complete enumeration we
list all of the possible combinations of the
input variables and then calculate the
probabilities of these combinations
occurring. In this example there are 3
choices for A and 2 for B, resulting in 6
possible outcomes for Y. The
probabilities of these combinations are
shown under "Outcomes" in the figure.
Since some of the combinations are
duplications, the table of combinations of
A and B may be simplified to the 5
entries shown under "Ordered Outcomes"
in the figure. The average value of Y is shown
to be 30. At the bottom of Figure 1 is a
graph of the frequency or probability
distribution of Y. Note that the most likely
value is not the average (or "best") value
but rather values to either side of it.
Furthermore, one of the extreme values
(Y = 60) is just as probable as the
average value (each has a probability of .125).
As can be seen, the complete
enumeration method tells us everything
about the distribution of Y, including its
mean, standard deviation, minimum,
maximum, and the probability of
occurrence of any value of Y.
Furthermore, it is an exact method.
Unfortunately, if the number of outcomes
is large (or if the probability distributions
are continuous), the method is
computationally impractical.
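Complete enumeration for the example of Figure 1 can be sketched as follows. The input distributions are those given in the text; the code itself is an illustration in Python, not part of MOUSE, and `enumerate_product` is a hypothetical name. Exact fractions are used so that the probabilities sum to exactly 1.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Input distributions from Figure 1 (A and B assumed uncorrelated).
p_a = {1: F(1, 4), 2: F(1, 2), 3: F(1, 4)}
p_b = {10: F(1, 2), 20: F(1, 2)}


def enumerate_product(p_a, p_b):
    """List every (A, B) combination and merge duplicate values of Y = A * B."""
    dist = defaultdict(F)                      # Fraction() starts each entry at 0
    for (a, pa), (b, pb) in product(p_a.items(), p_b.items()):
        dist[a * b] += pa * pb                 # independence: p(A, B) = p(A) p(B)
    return dict(sorted(dist.items()))


dist_y = enumerate_product(p_a, p_b)
mean_y = sum(y * p for y, p in dist_y.items())
var_y = sum(p * (y - mean_y) ** 2 for y, p in dist_y.items())

for y, p in dist_y.items():
    print(f"Y = {y:2d}  p = {float(p):.3f}")
print("mean =", mean_y, " variance =", var_y)
```

The six combinations collapse to five distinct values of Y, with a mean of 30. The loop over `product(...)` is also where the method breaks down: the number of combinations grows multiplicatively with every added input variable.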
Model: Y = A x B

Basic Data
  A    p(A)        B    p(B)
  1    .25         10   .50
  2    .50         20   .50
  3    .25

Outcomes
  A    B     AB    p(AB)
  1    10    10    .125
  2    10    20    .250
  3    10    30    .125
  1    20    20    .125
  2    20    40    .250
  3    20    60    .125
             Total = 1.000

Ordered Outcomes
  AB    p(AB)
  10    .125
  20    .375
  30    .125
  40    .250
  60    .125
  Total = 1.000

[Graph: Probability Distribution of Y; f(Y) from .00 to .40 plotted against Y from 0 to 60]

Figure 1. Direct (complete enumeration).

The second alternative is the
Probability Calculus method. This
method, as the name implies, requires
some knowledge of the calculus of
probabilities (sometimes known in
engineering as the "propagation of
error"). Using the model of equation 1 as
before, the method is illustrated in Figure
2. The error formula is given at the top of
the figure and involves three terms and
knowledge of the variances of A and B.
The latter are calculated, as is shown in
the figure, from the probability
distributions of A and B given previously
in Figure 1. The error formula shows the
variance of Y to be 225. The Probability
Calculus method produces no more than
the mean and the variance of the output
(i.e., Y) distribution. That the variance
alone is not sufficient to determine the
nature of uncertainty, however, is clear
from Figure 1.
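The calculation of Figure 2 can be reproduced with a short sketch. It assumes, as the example does, that A and B are uncorrelated, in which case var(AB) = mean(A)^2 var(B) + mean(B)^2 var(A) + var(A) var(B). The function names are hypothetical and the code is an illustration in Python, not MOUSE code.

```python
def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var


def product_mean_var(dist_a, dist_b):
    """Mean and variance of Y = A * B for independent A and B (three-term formula)."""
    mean_a, var_a = mean_var(dist_a)
    mean_b, var_b = mean_var(dist_b)
    mean_y = mean_a * mean_b
    var_y = mean_a ** 2 * var_b + mean_b ** 2 * var_a + var_a * var_b
    return mean_y, var_y


# Distributions of A and B from Figure 1.
mean_y, var_y = product_mean_var({1: 0.25, 2: 0.50, 3: 0.25}, {10: 0.5, 20: 0.5})
print("mean of Y =", mean_y)                   # 30.0
print("variance of Y =", var_y)                # 225.0
print("standard deviation of Y =", var_y ** 0.5)  # 15.0
```

Two numbers come out, and nothing more: the sketch cannot say that Y = 20 is three times as probable as Y = 30.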
The third approach to the problem of
uncertainty is a form of Monte Carlo
simulation known as Model Sampling.
The idea of Model Sampling is relatively
simple:
1. A value for each of the input
variables is drawn at random
from their respective probability
distributions, and the model is
computed using this particular
set of values.
2. The above process is repeated
many times. Since the results
vary with each iteration, the
outputs themselves (i.e., the Y's)
are gathered in the form of a
probability distribution. Thus the
uncertainties of the model's
inputs are transferred to the
output that can then be studied
and subsequently used in
decision processes.
The procedure is shown schematically
in Figure 3. The output (and the
accuracy) of the Monte Carlo simulation
method becomes almost equal to that of
complete enumeration as the number of
iterations becomes large. Unlike Direct
Enumeration, however, large and/or
complex problems are tractable and
continuous uncertainty distributions are
easily handled. The Monte Carlo
simulation method forms the basis for
MOUSE, the computerized uncertainty
analysis system that is the subject of this
summary.
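The two steps of Model Sampling can be sketched as follows for the example of equation 1. This is a minimal illustration in Python using the discrete distributions of Figure 1; it is not MOUSE code, and the names `sample` and `model_sampling`, the seed, and the iteration count are all assumptions made for the example.

```python
import random
from collections import Counter


def sample(dist, rng):
    """Draw one value at random from a discrete distribution {value: probability}."""
    values = list(dist)
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


def model_sampling(model, inputs, iterations, seed=0):
    """Step 1: draw every input and compute the model. Step 2: repeat and collect Y."""
    rng = random.Random(seed)                  # fixed seed so runs are repeatable
    outputs = []
    for _ in range(iterations):
        draw = {name: sample(dist, rng) for name, dist in inputs.items()}
        outputs.append(model(**draw))
    return outputs


inputs = {"a": {1: 0.25, 2: 0.50, 3: 0.25}, "b": {10: 0.5, 20: 0.5}}
ys = model_sampling(lambda a, b: a * b, inputs, iterations=20000)

mean_y = sum(ys) / len(ys)
freq = {y: n / len(ys) for y, n in sorted(Counter(ys).items())}
print(f"estimated mean of Y = {mean_y:.2f}")
for y, f in freq.items():
    print(f"Y = {y:2d}  estimated p = {f:.3f}")
```

With enough iterations the estimated frequencies approach the exact probabilities of Figure 1 (.125, .375, .125, .250, .125). Replacing the discrete `sample` function with a draw from a continuous distribution requires no other change to the loop.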
MOUSE, A Computerized Uncertainty Analysis System
To conduct Monte Carlo simulation on
an environmental, economic, engineering,
or scientific model, you must
communicate the model and any input
information to the computer and instruct
it as to what is required with regard to the
nature and the output of the analysis.
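Model: Y = A x B

Error Formula: $\operatorname{var}(AB) = \operatorname{var}(Y) = \bar{A}^2 \operatorname{var}(B) + \bar{B}^2 \operatorname{var}(A) + \operatorname{var}(A)\operatorname{var}(B)$

Let $\bar{A} = 2$ and $\bar{B} = 15$

  A     p(A)    $(A - \bar{A})^2$    $p(A)(A - \bar{A})^2$
  1     .25     1                    .25
  2     .50     0                    .00
  3     .25     1                    .25
                                     var(A) = .50

  B     p(B)    $(B - \bar{B})^2$    $p(B)(B - \bar{B})^2$
  10    .50     25                   12.5
  20    .50     25                   12.5
                                     var(B) = 25.0

Therefore,
$\operatorname{var}(Y) = 2^2(25.0) + 15^2(0.50) + (0.50)(25.0) = 225$

Figure 2. Probability calculus.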
Since models are specific to the problem
at hand, the model must be coded in
some sort of high-level (i.e., English-like)
computer language. For environmental
engineering uncertainty analysis, the
most commonly-available high-level
languages for personal computers are
Compiled Basic, FORTRAN, and Pascal.
With regard to coding a particular
environmental model itself, there is little
difference among them (e.g., the
equation y = ab is written as Y = A*B in all
three), although there are significant
differences with regard to the sometimes
arcane but necessary programming
matters that are part of (a) the
specification burdens associated with
these languages (e.g., declaring and
defining variables, arrays, and common
storage), and (b) input/output. Skipping
the arguments over the competing virtues
of these languages, we note simply that
FORTRAN was selected as the basis for
MOUSE primarily because it is the most
standardized of the three options. With
any compiled language, the source text is
generally created with an editor, saved to
disk, and then processed by a compiler
and linker before it is run. Thus, the basic
requirements for MOUSE are simply a
text editor and a FORTRAN
compiler/linker.
Anyone knowing a few arithmetic
symbols can write even the most
complex of models in any of these
languages. The overhead burden (as
exemplified by FORTRAN statements
such as DIMENSION and COMMON) and
the input/output procedures (as
exemplified by the FORTRAN statements
such as OPEN and FORMAT) are quite
another thing, however. MOUSE avoids
the former problem by providing a utility
program (called TRAP, for TRAnsformation
Preprocessor) that actually
writes the required specifications; it
solves the latter problem by doing all of
the output and most of the input itself.
The output power of MOUSE is shown in
Figure 4, where a typical MOUSE
histogram is presented.
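The kind of tabulation shown in Figure 4 (summary statistics, equal-width classes with an overflow class, percent and cumulative columns, and a character plot) can be sketched as below. This is not MOUSE's own output routine. The function `histogram_rows` is a hypothetical name, the lognormal sample is stand-in data chosen only to give a skewed distribution, and the class limits simply mirror those printed in Figure 4.

```python
import random
import statistics


def histogram_rows(samples, lower, width, n_bins):
    """Bin samples into n_bins equal-width classes plus one overflow class.

    Returns rows of (lower limit or "OVERFLOW", count, percent,
    cumulative percent, cumulative complement).
    """
    counts = [0] * (n_bins + 1)                # final slot collects overflow
    for x in samples:
        k = int((x - lower) // width)
        counts[min(max(k, 0), n_bins)] += 1    # values below `lower` go in the first class
    total = len(samples)
    rows, running = [], 0
    for k, count in enumerate(counts):
        running += count
        label = lower + k * width if k < n_bins else "OVERFLOW"
        cumulative = 100.0 * running / total
        rows.append((label, count, 100.0 * count / total, cumulative, 100.0 - cumulative))
    return rows


rng = random.Random(42)
# Stand-in data only: a lognormal sample, not the model behind Figure 4.
samples = [rng.lognormvariate(9.8, 0.55) for _ in range(5000)]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"MEAN = {mean:.2f}  STANDARD DEVIATION = {stdev:.2f}  "
      f"COEFFICIENT OF VARIATION, % = {100 * stdev / mean:.2f}")

rows = histogram_rows(samples, lower=4200.0, width=1900.0, n_bins=33)
for label, count, pct, cum, comp in rows:
    bar = "*" * round(pct * 4)                 # 4 characters per percent of entries
    print(f"{label!s:>10} {count:6d} {pct:7.2f} {cum:7.2f} {comp:7.2f}  {bar}")
```

Computing the cumulative column from the running count, rather than by summing rounded percentages, guarantees that the last row reads exactly 100.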
Other than the model itself, the
overhead burden, and the input/output
procedures, a computer program must
also contain instructions from the user as
to what is required, e.g., what output
variables are to be gathered and placed
into histograms, what probability
distributions (and their parameters) are to
be used for the uncertain variables, what
sensitivity analyses are to be performed,
etc. In MOUSE, these are communicated
via single lines inserted into the program
that "call" MOUSE functions and
subroutines. These calls must be written
precisely, since computers are
notoriously unhelpful when it comes to
second guessing exactly what you want.
MOUSE, however, assists you in three
ways. Firstly, while you are writing your
program (using a word processor of your
choice), there is an on-line
(memory-resident) program available at all times
that immediately shows you the format of,
and the arguments needed for, each call.
Secondly, with a little prompting for the
arguments, TRAP will write these calls for
you. Thirdly, there is another utility
program, called CHECKER, that will check
your program for errors, line by line.
[Flowchart: Model Y = A x B. Start with i = 1; draw a random sample of A and a random sample of B; compute Y = A x B; record Y(i); repeat until finished.]

From collection of Y's obtain:
1. Mean
2. Standard deviation
3. Coefficient of variation
4. Minimum
5. Maximum
6. Graph of frequency distribution
7. Graph of cumulative frequency distribution

Figure 3. Monte Carlo simulation.
******************************************
*     DISTRIBUTION FOR QUANTITY TEST     *
******************************************

NUMBER OF ITERATIONS             =     5000
MEAN                             =    20361.21000
MINIMUM                          =     4219.64600
MAXIMUM                          =    82200.32000
STANDARD DEVIATION               =    12154.16000
COEFFICIENT OF VARIATION, %      =       59.69275

LOWER          NUMBER OF    PERCENT    CUMULATIVE    CUMULATIVE
LIMIT          ENTRIES      ENTRIES    % ENTRIES     COMPLEMENT
 4200.0000       60.          1.20        1.20         98.80
 6100.0000      245.          4.90        6.10         93.90
 8000.0000      443.          8.86       14.96         85.04
 9900.0000      585.         11.70       26.66         73.34
11800.0000      486.          9.72       36.38         63.62
13700.0000      467.          9.34       45.72         54.28
15600.0000      381.          7.62       53.34         46.66
17500.0000      374.          7.48       60.82         39.18
19400.0000      248.          4.96       65.78         34.22
21300.0000      239.          4.78       70.56         29.44
23200.0000      194.          3.88       74.44         25.56
25100.0000      158.          3.16       77.60         22.40
27000.0000      127.          2.54       80.14         19.86
28900.0000      121.          2.42       82.56         17.44
30800.0000      107.          2.14       84.70         15.30
32700.0000       94.          1.88       86.58         13.42
34600.0000       95.          1.90       88.48         11.52
36500.0000       69.          1.38       89.86         10.14
38400.0000       75.          1.50       91.36          8.64
40300.0000       70.          1.40       92.76          7.24
42200.0000       56.          1.12       93.88          6.12
44100.0000       56.          1.12       95.00          5.00
46000.0000       55.          1.10       96.10          3.90
47900.0000       35.           .70       96.80          3.20
49800.0000       26.           .52       97.32          2.68
51700.0000       24.           .48       97.80          2.20
53600.0000       28.           .56       98.36          1.64
55500.0000       17.           .34       98.70          1.30
57400.0000       11.           .22       98.92          1.08
59300.0000       15.           .30       99.22           .78
61200.0000        5.           .10       99.32           .68
63100.0000        6.           .12       99.44           .56
65000.0000        9.           .18       99.62           .38
OVERFLOW         19.           .38      100.00           .00

[Character plot, one row per class: the frequency distribution drawn as a row of asterisks, with the cumulative distribution marked by Q]

CUMULATIVE     CUMULATIVE     VALUE OF
% ENTRIES      COMPLEMENT     TEST
   5.0            95.0         5673.4690
  10.0            90.0         6936.3430
  25.0            75.0         9630.4280
  50.0            50.0        14767.1900
  75.0            25.0        23536.7100
  90.0            10.0        36677.3300
  95.0             5.0        44100.0000
  99.0             1.0        57906.6700

Figure 4. Typical example of statistics, histogram and graphs produced by MOUSE.
When it finds an error, it will tell where it
is and what is wrong.
It is very possible to write a computer
program that contains no syntactical
errors whatsoever but is rife with logical
errors. One way to detect logical errors is
to examine the results of intermediate
calculations for reasonableness. Utilizing
a device known as a "Trace Line,"
MOUSE will print out the value of any
variable at 1, 20, 50, and 100 iterations of
the Monte Carlo method. A utility
program, called TRACER, will
automatically insert these trace lines into
your program and, when you are finished,
remove them as well.
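The idea behind a trace line can be sketched as follows. This is only an analogy in Python: the function `trace`, its output format, and the loop around it are hypothetical and do not reproduce the statements that TRACER inserts into a MOUSE program. Only the iteration numbers (1, 20, 50, and 100) are taken from the text.

```python
import random

TRACE_ITERATIONS = {1, 20, 50, 100}            # iterations named in the text


def trace(iteration, **values):
    """Print and return the named intermediate values at the traced iterations only."""
    if iteration not in TRACE_ITERATIONS:
        return None
    shown = ", ".join(f"{name}={value}" for name, value in values.items())
    line = f"iteration {iteration}: {shown}"
    print(line)
    return line


rng = random.Random(1)
lines = []
for i in range(1, 101):
    a = rng.choice([1, 2, 2, 3])               # A as in Figure 1: 25%, 50%, 25%
    b = rng.choice([10, 20])                   # B as in Figure 1: 50-50
    y = a * b
    line = trace(i, a=a, b=b, y=y)             # the "trace line" for this model
    if line is not None:
        lines.append(line)
```

Four lines are printed, one per traced iteration. Each lets the user confirm by hand that the intermediate values and the result are reasonable before trusting a long run.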
It is not always clear what probability
distributions should be used for the
uncertain inputs of an environmental
engineering model, and the fitting of
probability distributions to sample data is
a statistical skill not possessed by all. A
MOUSE utility program known as IMP
(Interactive Modeler for Probabilities) will
not only fit a classical probability
distribution to sample data but also fit an
empirical distribution hand-drawn on
graph paper and analyze a data set for
auto- and bivariate correlations.
For environmental engineering models
involving algebraic equations, MOUSE is
superior to either general purpose
programming or simulation languages. It
is concise, powerful, convenient, and
easy to use. MOUSE can solve
uncertainty problems of the algebraic
kind faster and more easily than can other
languages. With MOUSE, your attention
is on problem-solving, rather than on the
details of coding a program to compute a
solution. Further, MOUSE programs are
easier to understand, explain to others,
and modify than are programs written in general purpose
programming or simulation languages.
The EPA author, Albert J. Klee (also the EPA Project Officer, see below), is with
the Risk Reduction Engineering Laboratory, Cincinnati, OH 45268.
The complete report consists of paper copy and diskette, entitled "MOUSE
(Modular Oriented Uncertainty SystEm): A Computerized Uncertainty Analysis
System:"
Paper Copy (Order No. PB 90-172 560/AS; Cost: $31.00, subject to change)
Diskette (Order No. PB 90-501370/AS; Cost: $80.00, subject to change)
(Cost of diskette includes paper copy.)
The above items will be available only from:
National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
Risk Reduction Engineering Laboratory
U.S. Environmental Protection Agency
Cincinnati, OH 45268