MOUSE (Modular Oriented Uncertainty SystEm): A Computerized Uncertainty Analysis System

United States Environmental Protection Agency
Risk Reduction Engineering Laboratory
Cincinnati OH 45268
Research and Development
EPA/600/S8-89/102  June 1990

Project Summary

Albert J. Klee

  Environmental engineering calculations involving uncertainties, either in the model itself or in the data, are far beyond the capabilities of conventional analysis for any but the simplest of models. There exist a number of general-purpose computer simulation languages, using Monte Carlo methods, that are capable of such analysis, but these languages are difficult to learn and to implement quickly.
  MOUSE (an acronym for Modular Oriented Uncertainty SystEm) deals with the problem of uncertainties in models that consist of one or more algebraic equations. It was especially designed for use by those with little or no knowledge of computer languages, programming, or simulation. It is designed to run on almost any personal computer, is easy and fast to learn, and has all of the features needed for substantive uncertainty analysis (built-in probability distributions, plotting and graphing capabilities, sensitivity analysis, interest functions for cost analyses, etc.). Moreover, a series of unique companion utility programs write much of the necessary computer code for the user, help in analyzing sample data to determine the probability distributions that best fit those data, check each program for errors in syntax, and assist in finding logical errors in the model that is subject to uncertainty.
  Some typical examples of the use of MOUSE within the U.S. Environmental Protection Agency include: studying the migration of pollution plumes in streams, establishing regulations for hazardous wastes in landfills, and estimating pollution control costs.
  This Project Summary was developed by EPA's Risk Reduction Engineering Laboratory, Cincinnati, OH, to announce key findings of the research project that is fully documented in a separate report of the same title (see Project Report ordering information at back).

Introduction
  If we define a model as a physical  or
symbolic representation of reality, we find
among the set of all models  one called
the mathematical model, one particular
type of which consists of a series of one
or  more  algebraic  equations.
Mathematical models of this type are
extremely important  for they are found
almost everywhere, including the
environmental sciences, economics,
engineering, and science in general. The
use of an equation is understood by
almost everyone;  in  a  somewhat
"inelegant" sense, numbers "go into" the
equation and an answer is obtained. For
example, consider the following very
simple equation,
          Y = AB                    [1]
where Y might stand for an equipment
purchase cost, given that A is the unit
price and  B is the number of units
required. Alternatively, equation 1 might
represent an engineering calculation
where Y is the cross-sectional area of a
heating duct, given that A is its height
and  B  is its  width.  In any event, to
"solve"  equation 1, one has to know the
values of A and B. If A is equal to 2, for
example, and B is equal to 15, then Y =
2x15 = 30. Often, however, we are not
sure of the values of A or  of B. A might
be 3 and B might be 20; in such event Y
would be equal to 60, not 30. The greater
our uncertainty about the input variables
A  and  B,  clearly  the  greater  our
uncertainty about the output variable, Y.
  The  most  often  encountered
approaches  to   uncertainty   in
mathematical models are: (1) the best
value approach,  (2)  the  conservative
approach, and  (3) sensitivity analysis.
The first two are single-value approaches.
The  "best"  in  "best  value"  is  not
precisely defined;  generally  it refers to
some measure of central tendency such
as  an  average  or  a  mode.  In  the
equipment purchase  problem, we might
suppose that the values of A of 2 and of
B of 15  are average values. Presumably,
the answer of 30  is also  some sort of
average value.
  In the conservative approach, the input
values selected are not the average or
most  likely ones  but rather those  that
produce conservative results with regard
to  the  consequences   of  over-  or
underestimating.  For example,  in  the
conservative  approach  for the  duct
example, the values of A and B selected
to go into equation 1 would  be  greater
than their average values of 2 and  15
since overestimation is probably  better
than  underestimation  in  this case.  If
"best" values were used, there would be
too great  a chance  that the duct area
would be underestimated.
  Traditional sensitivity analysis  usually
starts with a best value estimate, followed
by a perturbation or change in one of the
input  variables  (holding all  other input
variables at their previous values).  The
perturbation can be either an increase or
a decrease in the value of the variable
and  hence  can  be  either  of  a
"conservative" nature or a "liberal" one.
The  perturbation is generally within the
known or believed uncertainty range of
the variable. The process is repeated for
as many  variables  and for  as  great a
change  as is  desired. For example, to
examine the effects of a modest change
in A in equation 1, we might increase the
value of  A by  10%  over its  "best"
estimate value of 2. A 10% increase  in A
(to 2.2)  results in an estimated Y-value of
33. If a  value of A of 2.2 is  "reasonably
likely"  to occur,  then the  traditional
sensitivity analysis suggests  that a value
of Y of 33 is also "reasonably likely" to
occur.
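
The one-variable-at-a-time perturbation described above can be sketched in a few lines. This is only an illustration of the traditional procedure applied to equation 1, not part of MOUSE; the function names `model` and `one_at_a_time` and the 10% step are assumptions chosen to match the example in the text.

```python
def model(a, b):
    """Equation 1: Y = A * B."""
    return a * b


def one_at_a_time(best, rel_change=0.10):
    """Raise each input by rel_change in turn, holding the others at their best values."""
    base = model(**best)
    results = {}
    for name, value in best.items():
        perturbed = dict(best)          # copy so the other inputs stay unchanged
        perturbed[name] = value * (1 + rel_change)
        results[name] = model(**perturbed)
    return base, results


# Best-value inputs from the text: A = 2, B = 15.
base, results = one_at_a_time({"a": 2.0, "b": 15.0})
print("best-value estimate of Y:", base)
for name, y in results.items():
    print(f"Y after a 10% increase in {name}: {y:.2f}")
```

Note that the sketch reports how far Y moves, but says nothing about how likely each perturbed value is, which is the weakness taken up in the next section.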

Problems with Traditional Approaches
  Clearly, the best value approach does
not address the problem of uncertainty at
all.  Furthermore, "best" input values do
not necessarily have high probabilities of
occurring. Another difficulty arises  when
the algebraic model  contains non-linear
elements such  as  multiplications or
divisions  and the variables are correlated.
If variables  A and B  of equation  1  were
correlated, for example, the average of Y
would not be equal  to  the average of A
times the average of B. In point of fact, if
A and B  were positively correlated, then
the average  of Y would be greater than
the product  of the averages; conversely,
if A and B were negatively correlated, the
average of  Y  would be less than the
product  of  the   averages.  The
conservative  approach also has  its
deficiencies.  First,  in  a  complex
calculation involving  many equations and
many input variables (some of which may
be  correlated),  it may not be  obvious
what  values  of the  input  variables
constitute   "conservative"  ones  with
respect to the output.  Second, because
conservative input values generally are
those with a low probability of occurring,
the estimates  obtained by using  such
values perforce  will  not have  a high
probability of occurring. The worst thing,
however,  about both of  these approaches
is that the point estimates involved do not
utilize all  of the information that is usually
available. One usually has at  least some
idea of the  uncertainties in  the  data at
hand.
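
The statement above about correlated inputs follows from the standard definition of covariance. The notation E[.] for an average and Cov(A,B) for the covariance is ours and does not appear elsewhere in this summary.

```latex
\[
\operatorname{Cov}(A,B) = E[AB] - E[A]\,E[B]
\quad\Longrightarrow\quad
E[Y] = E[AB] = E[A]\,E[B] + \operatorname{Cov}(A,B)
\]
```

A positive covariance therefore pushes the average of Y above the product of the averages, a negative covariance pushes it below, and the two agree only when the covariance is zero.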
  Sensitivity analysis, being  largely an
amalgamation  of elements of both the
best value and conservative approaches,
suffers the defects of both methods. An
arbitrary  change in the value  of an input
variable,  even though  the  change  falls
within the expected range of the variable,
tells us  little  about  the likelihood of
occurrence of the new estimate obtained.
In other words, if we know little about the
likelihood of such a change occurring, it
follows that we  know  little  about the
likelihood  of the  calculated  output
occurring.  Furthermore,  in  traditional
sensitivity analysis, all other variables are
held at their previous  values, the so-
called "all other things being equal" view
of the world.  The problem  is that "all
other  things"   are  seldom  equal.  In
actuality,  the change  we introduce in a
variable  under  traditional  sensitivity
analysis  may well be either mitigated or
intensified by what is happening to the
other variables. In short, traditional
sensitivity analysis does not show the
combined net effect of changes in all
variables or the likelihood of various
changes occurring together. Viewed in
this manner, the traditional sensitivity
analysis can be misleading.

Alternative Solutions to the Uncertainty Problem
  The first alternative, illustrated in Figure 1, is Direct or Complete Enumeration. The model of equation 1 is employed and we assume the uncertainties of A and B as given in the two probability distributions for these variables shown in the upper left-hand corner of the figure. In other words, we suppose that there is a 25% chance that A is equal to 1, 50% that it is equal to 2, and 25% that it is equal to 3. For B, there is a 50-50 chance that it is equal to either 10 or 20. (For this simple example, we assume no correlation between the two input variables.) In complete enumeration we list all of the possible combinations of the input variables and then calculate the probabilities of these combinations occurring. In this example there are 3 choices for A and 2 for B, resulting in 6 possible outcomes for Y. The probabilities of these combinations are shown in the middle top of the figure. Since some of the combinations are duplications, the table of combinations of A and B may be simplified to the 5 entries shown at the upper right of the figure. The average value of Y is shown to be 30. At the bottom of Figure 1 is a graph of the frequency or probability distribution of Y. Note that the most likely value is not the average (or "best") value but rather values to either side of it. Furthermore, one of the extreme values (Y = 60) is just as probable as the average value (each has a probability of .125). As can be seen, the complete enumeration method tells us everything about the distribution of Y, including its mean, standard deviation, minimum, maximum, and the probability of occurrence of any value of Y. Furthermore, it is an exact method. Unfortunately, if the number of outcomes is large (or if the probability distributions are continuous), the method is computationally impractical.

                    Model: Y = A x B

Basic Data
     A     p(A)          B     p(B)
     1     .25          10     .50
     2     .50          20     .50
     3     .25

Outcomes
     A      B      AB     p(AB)
     1     10      10     .125
     2     10      20     .250
     3     10      30     .125
     1     20      20     .125
     2     20      40     .250
     3     20      60     .125
                  Total = 1.000

Ordered Outcomes
     AB     p(AB)
     10     .125
     20     .375
     30     .125
     40     .250
     60     .125
    Total = 1.000

[Graph: Probability Distribution of Y. Vertical axis f(Y), .00 to .40; horizontal axis Y, 0 to 60.]

Figure 1. Direct (complete enumeration).

  The second alternative is the Probability Calculus method. This method, as the name implies, requires some knowledge of the calculus of probabilities (sometimes known in engineering as the "propagation of error"). Using the model of equation 1 as before, the method is illustrated in Figure 2. The error formula is given at the top of the figure and involves three terms and knowledge of the variances of A and B. The latter are calculated, as is shown in the figure, from the probability distributions of A and B given previously in Figure 1. The error formula shows the variance of Y to be 225. The Probability Calculus method produces no more than the mean and the variance of the output (i.e., Y) distribution. That the variance alone is not sufficient to determine the nature of uncertainty, however, is clear from Figure 1.
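
For readers who wish to verify the figures quoted above, the following is a minimal sketch of complete enumeration for equation 1, using the input distributions of Figure 1 and assuming (as the text does) that A and B are independent. It also evaluates the three-term variance formula for a product of independent variables so that the two methods can be compared. The variable and function names are ours, and exact fractions are used only to avoid rounding in so small an example.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Input distributions from Figure 1, stored as value -> probability.
p_a = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
p_b = {10: Fraction(1, 2), 20: Fraction(1, 2)}


def mean(dist):
    """Probability-weighted average of a discrete distribution."""
    return sum(v * p for v, p in dist.items())


def variance(dist):
    """Probability-weighted squared deviation from the mean."""
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())


# Complete enumeration: visit every (A, B) pair and pool duplicate products.
p_y = defaultdict(Fraction)
for (a, pa), (b, pb) in product(p_a.items(), p_b.items()):
    p_y[a * b] += pa * pb  # independence: joint probability is the product

# Probability calculus: variance of a product of two independent variables.
var_formula = (mean(p_a) ** 2 * variance(p_b)
               + mean(p_b) ** 2 * variance(p_a)
               + variance(p_a) * variance(p_b))

for y in sorted(p_y):
    print(y, float(p_y[y]))
print("mean of Y =", float(mean(p_y)))
print("variance of Y by enumeration =", float(variance(p_y)))
print("variance of Y by error formula =", float(var_formula))
```

Enumeration yields the whole distribution of Y, whereas the formula yields only its variance; that difference is the point made in the paragraph above.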
  The third approach to the problem of uncertainty is a form of Monte Carlo simulation known as Model Sampling. The idea of Model Sampling is relatively simple:
    1.   A value for  each of the input
        variables is  drawn  at  random
        from their  respective probability
        distributions,  and the model is
        computed  using  this particular
        set of values.

    2.   The above process  is repeated
        many times. Since  the  results
        vary with  each  iteration,  the
        outputs themselves (i.e., the  Y's)
        are  gathered in the form of  a
        probability  distribution. Thus the
        uncertainties of the  model's
        inputs  are transferred  to  the
        output that can then be studied
        and subsequently used  in
        decision processes.

  The procedure is shown schematically
in Figure  3. The  output  (and  the
accuracy) of the Monte Carlo simulation
method becomes almost equal to that of
complete enumeration as the  number of
iterations becomes  large.  Unlike  Direct
Enumeration, however,  large  and/or
complex problems are  tractable and
continuous  uncertainty  distributions are
easily  handled.  The Monte  Carlo
simulation method  forms the  basis for
MOUSE, the computerized  uncertainty
analysis system that is the subject of this
summary.
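
The two steps of Model Sampling can be sketched as follows for equation 1 and the distributions of Figure 1. This is an illustration written in Python with its standard library, not MOUSE code (MOUSE itself is based on FORTRAN); the function names, the iteration count, and the fixed seed are assumptions made so that the example is repeatable.

```python
import random
import statistics
from collections import Counter


def sample_a(rng):
    """Draw A at random: 25% chance of 1, 50% of 2, 25% of 3."""
    return rng.choices([1, 2, 3], weights=[0.25, 0.50, 0.25])[0]


def sample_b(rng):
    """Draw B at random: 50-50 chance of 10 or 20."""
    return rng.choice([10, 20])


def model_sampling(iterations, seed=12345):
    """Step 1 repeated many times (step 2): sample the inputs, compute Y = A * B."""
    rng = random.Random(seed)
    return [sample_a(rng) * sample_b(rng) for _ in range(iterations)]


ys = model_sampling(20000)
mean_y = statistics.mean(ys)
std_y = statistics.pstdev(ys)

print("mean =", mean_y)
print("standard deviation =", std_y)
print("coefficient of variation, % =", 100 * std_y / mean_y)
print("minimum =", min(ys), "maximum =", max(ys))

# Relative frequency of each value of Y, for comparison with Figure 1.
freq = Counter(ys)
for y in sorted(freq):
    print(y, freq[y] / len(ys))
```

With a large number of iterations the relative frequencies should approach the probabilities obtained by complete enumeration, although any single run will differ slightly from them.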

MOUSE, A Computerized Uncertainty Analysis System
  To conduct Monte Carlo simulation on
an environmental, economic, engineering,
or  scientific  model,   you  must
communicate the  model  and  any input
information  to the computer and instruct
it as to what is required  with regard to the
nature  and the output  of the analysis.

                    Model: Y = A x B

Error formula:
    $\mathrm{var}(Y) = \mathrm{var}(AB) = \bar{A}^2 \times \mathrm{var}(B) + \bar{B}^2 \times \mathrm{var}(A) + \mathrm{var}(A) \times \mathrm{var}(B)$

Let $\bar{A} = 2$ and $\bar{B} = 15$.

     A     p(A)     $(A - \bar{A})^2$     $p(A)(A - \bar{A})^2$
     1     .25      1                     .25
     2     .50      0                     .00
     3     .25      1                     .25
                                          var(A) = .50

     B     p(B)     $(B - \bar{B})^2$     $p(B)(B - \bar{B})^2$
    10     .50      25                    12.5
    20     .50      25                    12.5
                                          var(B) = 25.0

Therefore,
    $\mathrm{var}(Y) = 2^2(25.0) + 15^2(0.50) + (0.50)(25.0) = 225$

Figure 2. Probability calculus.
Since models are specific to the problem at hand, the model must be coded in some sort of high-level (i.e., English-like) computer language. For environmental engineering uncertainty analysis, the most commonly available high-level languages for personal computers are Compiled Basic, FORTRAN, and Pascal. With regard to coding a particular environmental model itself, there is little difference among them (e.g., the equation y = ab is written as Y = A*B in all three), although there are significant differences with regard to the sometimes arcane but necessary programming matters that are part of (a) the specification burdens associated with these languages (e.g., declaring and defining variables, arrays, and common storage), and (b) input/output. Skipping the arguments over the competing virtues of these languages, we note simply that FORTRAN was selected as the basis for MOUSE primarily because it is the most standardized of the three options. With any compiled language, the source text is generally created with an editor, saved to disk, and then processed by a compiler and linker before it is run. Thus, the basic requirements for MOUSE are simply a text editor and a FORTRAN compiler/linker.
  Anyone knowing a few arithmetic symbols can write even the most complex of models in any of these languages. The overhead burden (as exemplified by FORTRAN statements such as DIMENSION and COMMON) and the input/output procedures (as exemplified by FORTRAN statements such as OPEN and FORMAT) are quite another thing, however. MOUSE avoids the former problem by providing a utility program (called TRAP, for TRAnsformation Preprocessor) that actually writes the required specifications; it solves the latter problem by doing all of the output and most of the input itself. The output power of MOUSE is shown in Figure 4, where a typical MOUSE histogram is presented.
  Other than the model itself, the overhead burden, and the input/output procedures, a computer program must also contain instructions from the user as to what is required, e.g., what output variables are to be gathered and placed into histograms, what probability distributions (and their parameters) are to be used for the uncertain variables, what sensitivity analyses are to be performed, etc. In MOUSE, these are communicated via single lines inserted into the program that "call" MOUSE functions or subroutines. These calls must be written precisely, since computers are notoriously unhelpful when it comes to second guessing exactly what you want. MOUSE, however, assists you in three ways. Firstly, while you are writing your program (using a word processor of your choice), there is an on-line (memory-resident) program available at all times that immediately shows you the format of, and the arguments needed for, each call. Secondly, with a little prompting for the arguments, TRAP will write these calls for you. Thirdly, there is another utility program, called CHECKER, that will check your program for errors, line by line.
                    Model: Y = A x B

[Flowchart: start with i = 1; draw a random sample of A and a random sample of B; compute Y = A x B; record Y for iteration i; repeat until finished.]

From collection of Y's obtain:

1.  Mean
2.  Standard deviation
3.  Coefficient of variation
4.  Minimum
5.  Maximum
6.  Graph of frequency distribution
7.  Graph of cumulative frequency distribution

Figure 3.  Monte Carlo simulation.
            ******************************************
            *    DISTRIBUTION FOR QUANTITY TEST      *
            ******************************************

NUMBER OF ITERATIONS         =     5000
MEAN                         =    20361.21000
MINIMUM                      =     4219.64600
MAXIMUM                      =    82200.32000
STANDARD DEVIATION           =    12154.16000
COEFFICIENT OF VARIATION, %  =       59.69275

     LOWER     NUMBER OF    PERCENT    CUMULATIVE    CUMULATIVE
     LIMIT       ENTRIES    ENTRIES     % ENTRIES    COMPLEMENT
 4200.0000          60.       1.20          1.20         98.80
 6100.0000         245.       4.90          6.10         93.90
 8000.0000         443.       8.86         14.96         85.04
 9900.0000         585.      11.70         26.66         73.34
11800.0000         486.       9.72         36.38         63.62
13700.0000         467.       9.34         45.72         54.28
15600.0000         381.       7.62         53.34         46.66
17500.0000         374.       7.48         60.82         39.18
19400.0000         248.       4.96         65.78         34.22
21300.0000         239.       4.78         70.56         29.44
23200.0000         194.       3.88         74.44         25.56
25100.0000         158.       3.16         77.60         22.40
27000.0000         127.       2.54         80.14         19.86
28900.0000         121.       2.42         82.56         17.44
30800.0000         107.       2.14         84.70         15.30
32700.0000          94.       1.88         86.58         13.42
34600.0000          95.       1.90         88.48         11.52
36500.0000          69.       1.38         89.86         10.14
38400.0000          75.       1.50         91.36          8.64
40300.0000          70.       1.40         92.76          7.24
42200.0000          56.       1.12         93.88          6.12
44100.0000          56.       1.12         95.00          5.00
46000.0000          55.       1.10         96.10          3.90
47900.0000          35.        .70         96.80          3.20
49800.0000          26.        .52         97.32          2.68
51700.0000          24.        .48         97.80          2.20
53600.0000          28.        .56         98.36          1.64
55500.0000          17.        .34         98.70          1.30
57400.0000          11.        .22         98.92          1.08
59300.0000          15.        .30         99.22           .78
61200.0000           5.        .10         99.32           .68
63100.0000           6.        .12         99.44           .56
65000.0000           9.        .18         99.62           .38
OVERFLOW            19.        .38        100.00           .00

[Character plot: for each class interval, the frequency distribution is drawn as a row of asterisks and the cumulative distribution is marked with the letter Q.]

CUMULATIVE    CUMULATIVE      VALUE OF
 % ENTRIES    COMPLEMENT          TEST
       5.0          95.0     5673.4690
      10.0          90.0     6936.3430
      25.0          75.0     9630.4280
      50.0          50.0    14767.1900
      75.0          25.0    23536.7100
      90.0          10.0    36677.3300
      95.0           5.0    44100.0000
      99.0           1.0    57906.6700

Figure 4.  Typical example of statistics, histogram and graphs produced by MOUSE.
When it finds an error, it will tell where it is and what is wrong.
  It is very possible to write a computer program that contains no syntactical errors whatsoever but is rife with logical errors. One way to detect logical errors is to examine the results of intermediate calculations for reasonableness. Utilizing a device known as a "Trace Line," MOUSE will print out the value of any variable at 1, 20, 50, and 100 iterations of the Monte Carlo method. A utility program, called TRACER, will automatically insert these trace lines into your program and, when you are finished, remove them as well.
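
The idea behind a trace line can be sketched as follows. This is our own illustration in Python of printing intermediate values at iterations 1, 20, 50, and 100 of a Monte Carlo loop; it is not the code that TRACER inserts, and the names `TRACE_ITERATIONS` and `run_with_trace`, like the model and distributions borrowed from Figure 1, are assumptions.

```python
import random

# Iterations at which intermediate values are printed, as described in the text.
TRACE_ITERATIONS = {1, 20, 50, 100}


def run_with_trace(iterations, seed=1):
    """Monte Carlo loop for Y = A * B that reports A, B, and Y at the traced iterations."""
    rng = random.Random(seed)
    outputs = []
    traced = []
    for i in range(1, iterations + 1):
        a = rng.choices([1, 2, 3], weights=[0.25, 0.50, 0.25])[0]
        b = rng.choice([10, 20])
        y = a * b
        if i in TRACE_ITERATIONS:
            # The "trace line": expose intermediate values so they can be checked by eye.
            print(f"iteration {i}: A = {a}, B = {b}, Y = {y}")
            traced.append((i, a, b, y))
        outputs.append(y)
    return outputs, traced


outputs, traced = run_with_trace(200)
```

If a traced value looks unreasonable (a negative price, say, or an area larger than the building), the fault lies in the logic of the model rather than in its syntax.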
  It is not always clear what probability distributions should be used for the uncertain inputs of an environmental engineering model, and the fitting of probability distributions to sample data is a statistical skill not possessed by all. A MOUSE utility program known as IMP (Interactive Modeler for Probabilities) not only will fit a classical probability distribution to sample data, it will fit an empirical distribution hand-drawn on graph paper as well and also analyze a data set for auto- and bivariate-correlations.
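
To give a feel for the kind of analysis involved, the following sketch fits a normal distribution to a small sample by its mean and standard deviation and computes a bivariate correlation and a lag-1 autocorrelation. It does not reproduce IMP, whose fitting methods are not described in this summary; the choice of a normal distribution, the function names, and the sample data (invented purely for illustration) are all assumptions.

```python
import statistics


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lag1_autocorrelation(x):
    """Correlation of a series with itself shifted by one observation."""
    return pearson(x[:-1], x[1:])


# Hypothetical sample data: ten paired observations of unit price and units required.
unit_price = [1.8, 2.1, 2.0, 2.4, 1.9, 2.2, 2.0, 1.7, 2.3, 2.1]
units = [14, 16, 15, 18, 14, 17, 15, 13, 17, 16]

# Fit a normal distribution using the sample mean and standard deviation.
fit = statistics.NormalDist.from_samples(unit_price)

print("fitted normal: mean =", fit.mean, "standard deviation =", fit.stdev)
print("bivariate correlation =", pearson(unit_price, units))
print("lag-1 autocorrelation =", lag1_autocorrelation(unit_price))
```

A strong bivariate correlation such as the one in this invented sample is a warning that the two inputs should not be sampled independently.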
  For environmental engineering models involving algebraic equations, MOUSE is superior to either general purpose programming or simulation languages. It is concise, powerful, and convenient and easy to use. MOUSE can solve uncertainty problems of the algebraic kind faster and easier than can other languages. With MOUSE, your attention is on problem-solving, rather than on the details of coding a program to compute a solution. Further, MOUSE programs are easier to understand, explain to others and modify than are general purpose programming or simulation languages.

The EPA author, Albert J. Klee (also the EPA Project Officer, see below), is with the Risk Reduction Engineering Laboratory, Cincinnati, OH 45268.
The complete report consists of paper copy and diskette, entitled "MOUSE (Modular Oriented Uncertainty SystEm): A Computerized Uncertainty Analysis System:"
    Paper Copy (Order No. PB 90-172 560/AS; Cost: $31.00, subject to change)
    Diskette (Order No. PB 90-501370/AS; Cost: $80.00, subject to change)
    (Cost of diskette includes paper copy.)
The above items will be available only from:
       National Technical Information Service
       5285 Port Royal Road
       Springfield, VA 22161
       Telephone: 703-487-4650
The EPA Project Officer can be contacted at:
       Risk Reduction Engineering Laboratory
       U.S. Environmental Protection Agency
       Cincinnati, OH 45268