United States
Environmental Protection
Agency
Environmental Research
Laboratory
Duluth MN 55804
EPA 600 3-80 055
June 1980
Research and Development
vvEPA
Sampling Strategies for
Water Quality in the
Great Lakes
LIBRARY
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U S Environmental
Protection Agency, have been grouped into nine series These nine broad cate-
gories were established to facilitate further development and application of en-
vironmental technology Elimination of traditional grouping was consciously
planned to foster technology transfer and a maximum interface in related fields
The nine series are
1 Environmental Health Effects Research
2 Environmental Protection Technology
3 Ecological Research
4 Environmental Monitoring
5 Socioeconomic Environmental Studies
6 Scientific and Technical Assessment Reports (STAR)
7 Interagency Energy-Environment Research and Development
8. "Special" Reports
9 Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL MONITORING series
This series describes research conducted to develop new or improved methods
and instrumentation for the identification and quantification of environmental
pollutants at the lowest conceivably significant concentrations It also includes
studies to determine the ambient concentrations of pollutants in the environment
and/or the variance of pollutants as a function of time or meteorological factors.
This document is available to the public through the National Technical Informa-
tion Service, Springfield, Virginia 22161
-------
EPA-600/3-80-055
June 1980
SAMPLING STRATEGIES FOR WATER
QUALITY IN THE GREAT LAKES
by
Raymond P. Canale, Leon M. DePalma,
and William F. Powers
Civil and Aerospace Engineering
University of Michigan
Ann Arbor, Michigan 48109
Grant No. R803754-02-1
Project Officer
David M. Do Ian
Large Lakes Research Station
Environmental Research Laboratory-Duluth
Grosse lie, Michigan 48138
ENVIRONMENTAL RESEARCH LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
DULUTH, MINNESOTA 55804
1
U.S. mL<0«t»TAL
EDTSOI, 8, Jo OB817
-------
DISCLAIMER
This report has been reviewed by the Environmental Research Laboratory
Duluth, U.S. Environmental Protection Agency, and approved for publication.
Approval does not signify that the contents necessarily reflect the views
and policies of the U.S. Environmental Protection Agency, nor does mention
of trade names or commercial products constitute endorsement or recommenda-
tion for use.
-------
FOREWORD
One of the goals of the International Surveillance Program for the Great
Lakes is to detect long-term changes in the concentration of various para-
meters in each of the lakes. In order for this program to be successful,
data of sufficient quality and quantity must be provided so that trends can
be accurately assessed. However, each additional sample collected increases
the cost of the program. There is, therefore, a trade-off between accuracy
and cost and a sampling strategy must be identified that will obtain the
necessary accuracy at the minimum possible cost.
This report is the result of an effort to use basic ideas from control
theory and statistics to develop a methodology for identifying optimum
sampling strategies. Besides the applications included in this report, this
research effort has stimulated further work on different parameters and
space and time scales.
David M. Dolan
Environmental Scientist
Environmental Research Lab-Duluth
Large Lakes Research Station
-------
ABSTRACT
The major goal of this project was to investigate the potential applica-
tions of Kalman filtering and modern optimization techniques to the design
of sampling strategies for water quality in the Great Lakes. Two represent-
ative problems of general limnological interest with considerably different
characteristics were studied. The first problem concerns sampling require-
ments for long-term trend detection of chloride and total phosphorus concen-
trations in Lake Michigan. A minimum-cost sampling strategy has been deter-
mined in association with the Kalman filter and mathematical models that can
detect changes of 1.0 mg Cl/£ and 1.0 yg P/£ in Lake Michigan with 90 per-
cent probability. The cost of this sampling program is considerably less
than monitoring programs designed without benefit of mass balance models and
Kalman filter data processing. An investigation was conducted to determine
the sensitivity of the optimal cost to the accuracy of the mass balance
model parameters. It is concluded that the calculated sampling cost is most
sensitive to the variance in the apparent phosphorus settling velocity, not
as sensitive to the accuracy of laboratory analyses, and rather insensi-
tive to the accuracy of the effective flow at the Straits of Mackinac and
atmospheric loads.
The second problem concerns the potential applicability of Kalman fil-
tering techniques for concentration estimation and parameter identification
for a phytoplankton and nutrient cycling model for Saginaw Bay. The ap-
proach has been used to obtain estimates of the accuracy of model-predicted
phytoplankton concentrations in Saginaw Bay as a function of the uncertainty
associated with model parameters and initial conditions. It has been deter-
mined that the accuracy of uncalibrated model predictions is limited and de-
pends on the location within the bay. However, the accuracy of the predic-
tions can be improved if the uncertainties of the model parameters and ini-
tial conditions can be reduced by laboratory experiments or the availability
of field data. Examples are presented that show how suboptimal sampling
strategies can be developed and used to increase the predictive capabilities
of uncalibrated phytoplankton models in Saginaw Bay.
This report is submitted in fulfillment of Grant No. R803754-02-1 by the
University of Michigan under sponsorship of the U.S. Environmental Protec-
tion Agency.
-------
CONTENTS
Foreword i i i
Abstract iv
Figures vii
Tables viii
Acknowledgment ix
1. Introduction 1
2. Conclusions 3
3. Recommendations 4
4. Introduction to Kalman Filtering 5
Introduction 5
Kalman filter equations 5
A simple Kalman filter example 8
Summary 10
5. An Application to Lake Michigan Chloride and Total
Phosphorus Trend Detection 13
Introduction 13
Derivation of the stochastic model 14
Design of measurement strategies 26
Summary 33
6. An Application to Annual Plankton and Nutrient Cycles in
Saginaw Bay 34
Introduction 34
Saginaw Bay nutrient-plankton model 35
Error propagation without measurement 38
Suboptimal measurement strategy generation 43
Summary 57
References 59
Appendices
A. Discrete Kalman Filter and Optimization of Measurement
Strategies 62
Introduction 62
Discrete Kalman filter 62
Measurement strategy optimization 65
Application to nonlinear models 67
Summary of the Kalman filter and measurement
Strategy optimization 69
-------
B. Optimal Measurement Strategy: Overview of Computer
Program 71
Description 71
Problem formulation 72
Dimensions 73
Program operation 75
-------
FIGURES
Number Page
1 Changes in estimates of the state variable and variance
according to Kalman filtering technique 11
2 Overall summary of Kalman filter technique 12
3 Saginaw Bay model segments, sampling stations, and flow
pattern 36
4 Saginaw Bay model biochemical reactions and forcing
functions 37
5 Nominal states for model segment 5 40
6 Nominal state and standard deviation for chlorophyll a
for full and zero measurement cases 45
7 Nominal state and standard deviation for herbivorous
zooplankton for full and zero measurement cases 46
8 Nominal state and standard deviation for phosphate
for full and zero measurement cases 47
9 Nominal state and standard deviation for nitrate for
full and zero measurement cases 48
10 Parameter estimates and standard deviation for the phyto-
plankton growth rate for full and zero measurement
cases 49
11 Parameter estimates and standard deviations for the zoo-
plankton grazing rates for full and zero measurement-
cases 50
12 Parameter estimates and standard deviations for the zoo-
plankton respiration rate for full and zero measurement
cases 51
13 Parameter estimates and standard deviations for the phos-
phorus-chlorophyll ration for full and zero measurement
cases 52
vii
-------
TABLES
Number Page
...-.-,. ?
1 Summary of GLRD Data 16
2 Apparent Settling Velocity for Small Lakes 18
3 A Summary of Parameters, Constants, and Initial States ... 24
4 Required Number of Sampling Stations and Costs for Con-
servative Formulation of Tributary Load Error 28
5 Required Number of Sampling Stations and Cost for More
Realistic Formulation of Tributary Load Error 29
6 Optimal Measurement Strategy and Cost for Detection
Every Year 32
7 Estimated Means and Standard Deviations for Model
Parameters 39
8 Calculated Mean and Standard Deviations at the Peak
Populations for Zero Measurements 42
9 Calculated Standard Deviations of Parameter Values for
Various Measurement Strategies 54
10 Calculated Standard Deviations of Peak Populations for
Various Measurement Strategies 56
vm
-------
ACKNOWLEDGMENTS
The authors would like to acknowledge the assistance of several indi-
viduals who were involved in this study. Mr. Daniel Scharf from the Univer-
sity of Michigan expanded the computer program used in Section 5 and helped
perform the error propagation calculations in Section 6. Mr. Christopher
Hoogendyk also from the University of Michigan modified the USEPA phyto-
plankton model for Saginaw Bay so that it could be run on the Michigan
Terminal System. Mr. Steven Chapra from the Great Lakes Environmental Re-
search Laboratory provided some valuable suggestions with regard to the
formulation of the model for Lake Michigan in Section 5. Dr. Curtiss Davis
from the Great Lakes Research Division (University of Michigan) contributed
information concerning the accuracy of laboratory techniques for total phos-
phorus and chloride concentrations. Mr. William Richardson from the USEPA
provided cost estimates of the sampling strategies in Section 5. Finally,
special acknowledgment is directed to David Dolan and Nelson Thomas from the
USEPA who coordinated the project and provided many useful ideas concerning
problem formulation and preparation of the final report.
-------
SECTION 1
INTRODUCTION
The determination of effective sampling strategies for estimating water
quality in lakes depends upon many factors such as seasonal and annual vari-
ations, spatial variations, quality of the measurement techniques, accuracy
of the estimates desired, and economic feasibility. Because of these many
factors, a computer based approach could prove to be a very useful tool in
the design of sampling strategies for water quality in the Great Lakes.
This research is concerned with developing such a computer based approach
and demonstrating its feasibility on two different types of problems.
The basic mathematical approach employed in the solution of these
problems is an information processing technique known as Kalman filtering.
This technique couples knowledge of the quality of the measurement proce-
dures, mathematical models of the biochemical processes, and the adequacy
of these measurements and models (for example, variances associated with
measurement and model noise) to determine best estimates of the concentra-
tions of the water quality variables and the variance of the estimates. In
addition, if the number of variables involved is not too large, automatic
computer-based optimization techniques can be employed to define optimal
sampling strategies.
Mathematical models of long-term (30 years) phosphorus and chloride con-
centration .changes in Lake Michigan are developed along with variances asso-
ciated with these models in Section 5. This model and proposed monitoring
programs for the lake can be used to develop an optimal sampling strategy
that can be completely determined economically on a large computer. This
allows an extensive sensitivity study of the various factors that affect'the
measurement program. A modified LAKE 1 (Thomann e_t a1_., 1975) model is em-
ployed in Section 6 for the determination of a short term (1 year) measure-
ment strategy for phytoplankton, zooplankton, and nutrients in Saginaw Bay.
The resultant model contains eight concentration variables in each of five
segments. Furthermore, thirty-five additional differential equations are
used to improve some of the model parameter estimates. As a result, a
system has evolved that is too large for economical automatic computer opti-
mization. However, analysis of the variances developed by the Kalman filter
makes it possible to assess the adequacy of the model predictions and to
develop suboptimal measurement strategies.
A brief heuristic introduction to the Kalman filter is presented in Sec-
tion 4. More details concerning the mathematical procedures employed to
-------
develop the Kalman filter and to derive the minimum-cost sampling strategy
subject to the variance constraints are presented in Appendix A and in
DePalma (1977). The reader interested only in the water quality applica-
tions may proceed directly to Sections 5 and 6.
-------
SECTION 2
CONCLUSIONS
The results of this study indicate that Kalman filtering techniques can
be applied effectively to assist with the design of limnological and water
quality sampling programs in the Great Lakes. This conclusion is based on
encouraging results obtained for two demonstration problems. First, the
Kalman filter and optimization procedures have been used to design an opti-
mal long-term monitoring program for chloride and total phosphorus in Lake
Michigan. This sampling program detects concentration changes of 1 mgCIA
and 1 ygPA with 90 percent confidence and is considerably less expensive
than sampling programs suggested without benefits of mass balance models and
Kalman filtering. The procedures applied to this problem can also be used
to study the value of mass balance models for trend detection. As model
parameter uncertainties decrease, the model becomes more accurate and use-
ful, and actual field monitoring requirements and costs decrease. Prelimi-
nary calculations suggest that it may be appropriate to invest research
funds to improve analytical techniques for total phosphorus and to obtain
better estimates for the rate of phosphorus loss to the sediments.
Kalman filtering and state estimation techniques have been used to cal-
culate the uncertainty associated with phytoplankton model predictions as a
function of the accuracy of model parameters and initial conditions. Pre-
liminary calculations indicate that errors in uncalibrated model predictions
of phytoplankton in Saginaw Bay can be significant even if it is assumed
that the structure and nature of the system equations are perfect. Thus,
before accurate predictions can be made, it is necessary to improve esti-
mates of the model parameters with field monitoring or laboratory research
programs or to supplement model predictions with measured data to update
state estimates using the Kalman filter. These results have direct signifi-
cance when models are used for planning purposes such as predicting the im-
pact of nutrient control programs in the Great Lakes.
-------
SECTION 3
RECOMMENDATIONS
This study has determined an optimal, minimum-cost sampling strategy for
long-term trend detection of chloride and total phosphorus in Lake Michigan.
This strategy involves limited tributary samples and lake monitoring only
for those years when concentrations must be determined. It is recommended
that this sampling schedule be initiated if the only goal of the measurement
efforts is that of trend detection. Sampling requirements for other goals
(such as to define annual nutrient and plankton cycles) would obviously have
different characteristics. In such cases, it may be possible to again de-
fine the most efficient sampling program if the system dynamics can be for-
mulated to accommodate Kalman filtering methodology. Finally, it is recom-
mended that the optimal sampling be recomputed if research results or field
data become available in the future that would allow more accurate statisti-
cal assessment of the model parameters and processes.
The Kalman filtering and state estimation techniques applied in this
study have been used to estimate the uncertainty of phytoplankton model pre-
dictions in Saginaw Bay as a function of the accuracy of model parameters
and initial conditions. However, error propagation for only seven para-
meters (from a total of twenty) has been studied because of funding and time
limitations of this project. Furthermore, the error covariance calculations
depend on the adequacy of a linear approximation of the system state equa-
tions. It is recommended that the adequacy of this approximation and the
numerical methods employed be evaluated further before the results of the
calculations presented in this report are incorporated in water quality
planning projects.
-------
SECTION 4
INTRODUCTION TO KALMAN FILTERING
INTRODUCTION
The Kalman filter was derived by Kalman (1960) and Kalman and Bucy
(1961) and has had considerable application in aerospace engineering and,
more recently, to some problems concerned with water quality (Moore, 1971;
Beck and Young, 1976). In order to simplify the presentation, this section
describes the Kalman filter equations for a scalar system in which the water
quality variable is measured at distinct measurement times and the water
quality model is a differential equation. Complete derivation of the equa-
tions for other cases may be found in Gelb (1974), Jazwinski (1970), and
Sage and Melsa (1971), among others. Appendix A gives the Kalman filter
equations for a general discrete dynamic model for cases where only some of
the water quality variables are measured along with the method of employing
the filter in the determination of minimum cost measurement strategies for a
given level of required accuracy.
KALMAN FILTER EQUATIONS
The Kalman filter is an information processing technique that combines a
dynamic model and measurements to produce an improved estimate of the state
of the system (for example, the concentrations of the water quality vari-
ables). If an adequate dynamic model for the system does not exist, then
standard statistical techniques can be employed, using only the concentra-
tion measurements, to estimate the state of the system. Thus, Kalman fil-
tering is mainly applicable when some type of dynamic model of the system is
available. However, even dynamic models that give only seasonal or annual
trends usually justify the use of the Kalman filter.
Suppose that a mathematical model of the biological and chemical inter-
actions is a system of differential equations
x = f(t,x) + w(t) (4.1)
where x(t) is a n-vector that represents the concentration of the water
quality variables at time t (for example x-|(t) = phytoplankton concentration
at time t), f(t,x) is a function that describes the known rate of change of
x (for example, f contains the advective and dispersive transport, loadings,
and kinetic coefficients), and w(t) is the error in the model at time t.
-------
One of the most difficult problems in mathematical modeling is the quantifi-
cation of the error term w(t) because this term is an indicator of the value
of the model. If w(t) could be characterized completely, then, it could be-
come part of the f(t,x) term. For most applications, it is only possible to
statistically characterize the term w(t), which is usually assumed to be
zero-mean, white noise with variance Q(t). The white noise assumption is
conservative because it implies no time correlation in the model error w(t).
The accuracy of the dynamic equations is implied by the magnitude of Q.
Relatively large values of Q imply that f(t,x) is relatively inaccurate.
Quantitative determination of the true value of Q is very difficult, if not
impossible. Thus, in most applications conservative upper bounds are em-
ployed as estimates for Q. As an alternative, liberal estimates for Q (that
is, when Q = 0) give the estimation capabilities of a perfect model. Note
that w(t) includes any inaccuracies associated with the forcing functions of
the system.
In addition to statistically characterizing the error term in Equation
(4.1), it is also necessary to statistically characterize the initial condi-
tions of the system variables at time equal to zero, t0. Before the Kalman
filter can be employed, values for the mean (x0) and variance (P0) of the
initial state (for example, the concentrations of the water quality vari-
ables) must be available at a time prior to the first measurement.!
The problems solved in this report involve discrete sampling (as opposed
to continuous monitoring), therefore the measurement process can be des-
cribed as
y(ti) = x(ti) + v(ti) (i = 1,2,. ...N) (4.2)
where y(t-j) is the measured value of x at ti, v(ti) is the error in the
measurement at t-j, and N is the number of measurements. Appendix A dis-
cusses cases where only some of the concentrations are measured. It is as-
sumed that v has a mean value of zero and a variance R. Typically, the
measurement process is relatively well known and can be characterized ac-
curately.
Given the information in Equations (4.1) and (4.2) suppose it is desired
to estimate the concentration of a variable in a scalar system at t] > t0
(that is, at the first measurement time). This estimation will utilize both
the model prediction and measurement at t]. The Kalman filter is used to
calculate this estimate in two stages. The first stage involves a predic-
tion of x^at t] before the actual measurement is taken, which will be de-
noted by x(tT), and the variance associated with this estimate is denoted by
The predicted value of the concentration is simply the integration of
Equation (4.1) with w(t) = 0 from t0 to f| with x(t0) = x0. That is, inte-
grate
IIn this report, x refers to true values of the state variables, whereas x
refers to estimated values.
-------
x = f[t,x(t)], x(t0) = X0 (4.3)
to t] . This is equivalent to simply using the model to calculate concentra-
tions at t] under the assumption that the model is perfect. The variance of
the resultant concentration at t] for the scalar case is denoted as P(t])
and is calculated approximately by integrating
P = 2AP + Q (4.4)
with P(t0) = P0 from t0 to ti where
A - ^ tt,X] I (fl r\
A " ~ I (t,x(t)). (4'bj
This is called the sensitivity of the dynamic equation. This equation is
based on the theory of stochastic estimation and has been derived by
Jazwinski (1970) and others. If f(t,x) is linear in x, then Equation (4.4)
gives the true variance of x. Other approximations for P are discussed in
Appendix A. The approximation of Equations (4.3-4.5) is known as the ex-
tended Kalman filter.
Equation (4.4) is used to calculate the accuracy of the prediction based
on the characteristics of the system (for example, transport and kinetics)
and the model error. If the first term in Equation (4.4) is positive (for
example, during rapid concentration increases), then the associated vari-
ance increases between measurements, and initial condition errors are magni-
fied as time increases. Conversely, if concentrations are decreasing
rapidly, the first term in Equation (4.4) will be negative, and the accuracy
of the estimate could actually increase. A simple example will be presented
later in this section to illustrate this effect. The second term in Equa-
tion (4.4) (Q) is always positive. Thus, it can only cause the accuracy of
the estimate to deteriorate as time increases.
Integration of Equations (4.3) and (4.4) from t0 to t] is the first
stage of the Kalman filter, and x(ti) and P(tT) are the results. The second
stage involves processing the measurement y(t-|), as described by Equation
(4.2), to develop an improved estimate of the concentration at ti, which is
denoted by x(tj) and the associated concentration variance P(t|). The
Kalman filter equations for the scalar case for this stage are
x(t|) = x(tp + K(t1)[y(t1) - x(ti)] (4.6)
P(t{) = [1 - K(t1)]P(tp (4.7)
) + F^r1 (4.8)
where K(t]) is called the Kalman Gain. K is called a gain because, by
analogy with feedback gain in electronics, it determines how much weighting
is given to the measurement in developing the estimate for the concentration
at time tt. These equations have been developed by Jazwinski (1970) and
others ana are based on determining a minimum variance estimate for the con-
-------
centrations using information from both the mathematical model and imperfect
measurements.
It is instructive to consider two special extreme cases of Equations
(4.6-4.8). First, suppose that the measurement at t] is much more accurate
than the estimate generated by the model from to to t] (that is, P(t]) is
much greater than RI). Then,^from Equation (4.8) the Kalman Gain approaches
one, and from Equation (4.6) x(t]) ~ y(t]). Thus, the new information pro-
vided by the model has little significance. Also, note that from Equation
(4.7) the associated variance for this case is approximately zero. Of
course, P(t|) equals zero only for a perfect measurement defined by R-| = 0.
Next, consider the case where the measurement is poor and not as accu-
rate as the value predicted by the model (that is, P(tf) is much less^than
R]). Then, from Equation (4.8) the Kalman Gain approaches zero, and x(tj)~
x(t-T). In this case, the calculated concentration t| is approximately the
value determined by the model, or in the limit as R] ->°° it is as if no mea-
surements were conducted. ^Note that Equation (4.7) implies P(tt) ~ P(tp,
which must be the case if x(t|) ~ x(tp with R-| relatively large.
The remaining measurements y(t2),...,y(tn) are processed by simply pro-
ceeding in the same manner as the two stages noted above. That is, x(t|)
and P(tp are employed as initial conditions for the next time step, and
Equations (4.3) and (4.4.) are integrated from t] to t2 to produce x(t2") and
P(tp). Equations (4.6-4.8), with t2 and R2 (instead of t] and R-]), are then
employed to define x(t^) and P(t|). The Kalman filter is recursive (as
opposed to batch) because new concentration estimates are obtained each time
a measurement is taken. Another property of the Kalman filter, which is
very important in the determination of sampling strategies, is that if Equa-
tion (4.1) is linear in x or can be approximated by a linear function, then
the variance equations (Equations (4.4), (4.7), and (4.8)) are independent
of the actual measured concentrations y(t-)),.. ,y(tn) and depend only on the
quality of the measurements, defined by R], ,Rn. This allows the com-
putation of the variance and an evaluation of the effectiveness of a
sampling strategy before the sampling begins.
A SIMPLE KALMAN FILTER EXAMPLE
Probably the simplest mathematical model of the behavior of a pollutant
in a lake is the scalar differential equation
x = -ax + u(t) + w(t) (4.9)
where x(t) = concentration of the pollutant at time t, u(t) is the input
rate to the lake (for example, an industrial discharge), a is the positive
constant sinking or degradation rate, and w(t) is the model error resulting
from either imperfect understanding of the physical, chemical, or biological
mechanisms that affect x(t), uncertain values for a, or inadequate charac-
terization of the loading or forcing functions, or any combination of these.
Suppose that two measurements of x are to be made at t] and t2 with errors
v(t-|) and v(t2) that have mean values of zero and variances R] and R2-
8
-------
To apply the Kalman filter, statistics associated with the mathematical
model must also be supplied. Suppose that the period of interest begins at
t0 < t] with initial estimates of XQ and PQ that are obtained either by
measurement or by model propagation from a time prior to to- To illustrate
the behavior of the Kalman filter, suppose that the model errors are con-
stants with zero mean values and variances Qi and Q2. Also, suppose that
the forcing function, u(t), has a constant value of U] between t0 and t] and
a value of U2 between t0 and t].
With these assumptions, the prediction equations between measurements
corresponding to Equations (4.3) and (4.4) are
x = -ax + u
P = -2aP +
.j
which are easily integrated to form the solutions
x(t) = [x(tt) - u
(4.10)
(4.11)
(4.12)
P(t) =
Qi+1/2a
where x(t0) = x0 and P(t0) = PO- The values of x(t-j) and
surement are determined by Equations (4.6-4.8),
x(t{) = x(t') +[K] y(t1) -
P(t|) = (1-Kj) P(t')
P(t7)
(4.13)
after mea-
(4.14)
(4.15)
where, from Equations (4.12) and (4.13),
x(t-)
= (X
U]/a
P(t') = (P - Q/2a)e"2a(tl"to) + Q/2a
(4.17)
(4.18)
For the sake of illustration, suppose that y(t-|) is a very good measure-
ment compared to the prediction (R-j is much less than P(tJ)) and vice versa
for the y(t£) measurement (R2 is much greater then P(t2~)). Note that
because the model differential equation is stable (that is, -a < 0), use of
-------
the dynamic model improves the accuracy of the estimate if P > Q-i/2a (see
Figure 1).
Similar estimates can be developed at t2 with the results shown in Fi-
gure 1. In this case, Equation (4.16) implies K-| * 1 (because RI is much
less than P(tp) and l<2 ~ 0 (because R£ is much greater than PU^)).
SUMMARY
The Kalman filter equations for a scalar differential equation process
model with discrete measurements have been presented, and a simple water
quality example was employed to illustrate the behavior of the equations.
The Kalman filter equations provide appropriate weighting to both imperfect
model predictions and measurements to produce an overall best estimate of
concentrations based on minimizing the variance at each step. If the model
equations are linear or can be approximated by a linear system, then it is
possible to propagate the variance of the concentration values as a function
of the uncertainty of the measurements, model parameter values, model
forcing functions, and model structure. However, the estimates of the error
propagation are independent of the actual measurements themselves. The
overall techniques of Kalman filtering are summarized in Figure 2. In the
following sections, the Kalman filter equations will be used in the deter-
mination of optimal sampling strategies for total phosphorus and chlorides
in Lake Michigan and phytoplankton and nutrient dynamics in Saginaw Bay.
10
-------
Kalman State Estimate
Ka I man Variance Estimate
Figure 1. Changes in estimates of the state variable and variance
according to Kalman filtering technique.
11
-------
tr- t.
^v
^
or Statistics
ft_
UJ
CT
i
C
o
O)
s.
a;
-p
"£
E
o
V.
O
CL
o
c
g
i4_
O
c
*^-
c
o
o
M
p
(/>
v_
o
E
J»
(O
V-
o
(O
3
I/)
cr>
-------
SECTION 5
AN APPLICATION TO LAKE MICHIGAN CHLORIDE AND TOTAL PHOSPHORUS
TREND DETECTION
INTRODUCTION
One goal of the International Surveillance Program for the Great Lakes
is to detect long-term changes in the concentrations of chloride ion and
total phosphorus in each of the lakes. This section examines the cost-
effectiveness of such a surveillance program for Lake Michigan and demon-
strates how measurements of lake concentrations and mass balance models may
be effectively employed together for trend detection. When appropriate, the
extension of the techniques to other large lakes and water quality variables
will be discussed.
The present surveillance plan for Lake Michigan involves lakewide
sampling for two successive years every nine years and tributary sampling 26
times per year. Material balance models are not normally considered as
sources of information that can be utilized, along with data, to estimate
concentration changes and thereby detect trends. This research demonstrates
how uncertain parameters in mass balance models of lakewide average concen-
trations can be interpreted as random variables, so that both the model and
data can be employed in a Kalman filter. If the filter is used to estimate
the accuracy of lakewide average concentrations over the next thirty years
and a sampling program is prespecified, then the standard deviations asso-
ciated with the accuracy of the estimates can be precomputed. The estimated
errors must be small enough for trend detection, thus the standard devia-
tions must satisfy upper bound constraints.
It is also shown that the problem of choosing the appropriate number of
days for which tributaries must be sampled is equivalent to that of choosing
a controllable model error covariance matrix. The minimum-cost lake sur-
veillance and tributary sampling programs that meet accuracy criteria on the
estimates of lakewide average concentrations are determined. The criteria
are chosen to guarantee that trends will be detected with some specified
statistical confidence. A series of measurement strategy optimization
problems are solved to illustrate the role that Kalman filter techniques can
play in the design of monitoring programs.
13
-------
DERIVATION OF THE STOCHASTIC MODEL
The Indicator of Trend
Chloride is a conservative pollutant that occurs primarily in dissolved
form and is affected little by the annual biological cycle. Total phos-
phorus, however, includes phosphorus that is utilized in the biochemical
cycle of phytoplankton and other organisms. As phytoplankton cells mature
and die, they accumulate at the sediment-water interface, resulting in re-
moval of phosphorus from the water column. A measure of the phosphorus
available for biological production during a particular year is provided by
the lakewide average concentration of total phosphorus following spring
overturn. Because ice at this time may prevent a sampling program from
being implemented, it is convenient to assign May as the month when chloride
and total phosphorus concentrations are relevant to trend detection.
The lakewide average concentrations of chloride and total phosphorus for
May of year n are defined, respectively, by
Xi(n) = m^nJ/VCn) and X2(n) = m2(n)/V(n) (5.1)
where m-](n) and m2(n) are the pollutant masses in the lake (including the
nearshore zone but excluding the sediments) and where V(n) is the lake
volume. Lakewide average concentrations may be crude indicators of trend if
severe concentration gradients exist. However, if the lake is segmented
spatially, then mass balance models must be derived for each segment aver-
age, and the resultant model must include uncertain parameters for advective
and dispersive mass transfer.
In order to derive difference equations for the lakewide average concen-
trations, equations are first determined for the masses m-j(n) and m2(n).
During year n (May of year n to April of year n+1), mass exits the lake
through the Straits of Mackinac (SM) and the Chicago Sanitary and Ship Canal
(CSSC) and enters from tributary, atmospheric, and direct loading. Further-
more, a fraction of the total phosphorus that reaches the sediments during a
year is incorporated into refractory deposits, resulting in a net sedimenta-
tion loss (Hutchinson, 1957). These sources and sinks are approximated on a
net annual basis rather than on a seasonal basis, and their statistical
characteristics are evaluated with information presently available.
Straits of Mackinac
The net annual flow through the SM is toward Lake Huron; however, at
certain times of the year the flow is toward Lake Michigan. The net ex-
change of pollutant mass between the two lakes during one year is the dif-
ference between the flux towards Lake Michigan and the flux towards Lake
Huron. Each flux is the product of an annual flow and an effective concen-
tration. The effective concentration lies in the range of instantaneous
concentrations during the year. If qjn and qout are the components of an-
nual flow toward and away from Lake Michigan, and if X-jn and Xout are the
effective concentrations on the Lake Huron and Lake Michigan sides of the
SM, then the net loss from Lake Michigan is,
14
-------
Xout "out - Xin then the net loss is proportional to Qout - qin> which is
the net annual flow. However, chloride and total phosphorus concentrations
in Lake Huron are less than those in Lake Michigan because of dilution by
Lake Superior water. A conservative upper bound on the net loss is obtained
by setting Xin = 0» in which case the loss is proportional to q0ut- There-
fore, the actual loss must lie between Xout (qout - qin) and Xout q0ut and
is denoted by Xout qs, where qs is an uncertain effective outflow. Quinn
(1976) has computed net annual flows, qout - q-jn, of 224 x 10^£ to 522 x
10^£. The validity of the technique has been upheld by current meter data
from Saylor and Sloss (1976). Furthermore, Quinn suggests that these cur-
rent meter data can be employed to obtain the following relationship between
net flow and gross flow,
-«in>' (5'3)
If Equation (5.3) is used to estimate the gross outflow for 1950-1966 from
Quinn's net flow, then a range of 374 x 1CHU to 871 x 10"1U results.
The effective flow for each year, n, qs(n), is interpreted as a random
variable, uniformly distributed between the minimum net flow and maximum
gross flow for 1950-1966. The mean and standard deviation of this random
variable are, respectively, 547 x 1QTU and 186 x 10^£. The effective con-
centration on the Lake Michigan side of the SM for year n, Xou^-(n), must lie
in the range of concentrations in the outflowing water during the year. It
is assumed that this range is narrower than the range of concentrations over
the entire lake for May of year n. The effective concentration is repre-
sented as the sum of the lakewide average concentration in May, X(n), and a
deviation, es(n). That is,
XQut(n) = X(n) + es(n). (5.4)
The deviation is a zero mean random variable that is representative of the
spatial heterogeneity of the lake. Its standard deviation can be estimated
from unpublished data collected in the northern half of the lake by the
Great Lakes Research Division (GLRD) of the University of Michigan for April
and June 1976. A statistical summary of the chloride data from thirty-seven
stations sampled during the June cruise and similar results for total phos-
phorus for thirty-three stations sampled during the April cruise are pro-
vided in Table 1. Only stations of depth greater than twenty meters have
been included to avoid bias resulting from nearshore pollution. For each
station, the mean (unweighted with depth) and standard deviation of the data
have been computed. The standard deviation of the station means is assumed
to be representative of the variation in the data because the data suggest
that the horizontal variation exceeds the vertical variation. These stand-
ard deviations are 0.32 mg Cl/£ and 1.6 yg P/£. However, the total observed
variation is a result of not only spatial heterogeneity, but also laboratory
analysis error. Great Lakes Research Division has suggested that standard
15
-------
TABLE 1. SUMMARY OF GLRD DATA
Chloride
Station
no.
2
4
6
8
10
13
15
17
19
20
23
25
27
29
30
32
34
36
39
42
43
44
46
47
48
49
53
54
55
58
59
60
63
64
67
68
69
Number
of
samples
4
6
13
6
4
4
6
17
5
7
5
14
6
4
4
4
5
9
4
3
3
3
3
4
3
3
2
3
4
3
5
3
5
3
6
5
3
Mean
mg/£
7.73
7.68
7.69
7.70
7.80
7.70
7.72
7.73
7.86
7.89
7.73
7.73
7.68
7.63
7.79
7.72
7.74
7.73
7.58
7.70
7.68
7.68
7.34
7.58
7.55
7.45
7.01
7.10
7.32
6.58
7.07
7.06
7.00
6.89
7.84
7.83
7.90
Standard
deviation
mg/£
0.02
0.03
0.03
0.05
0.19
0.01
0.03
0.06
0.02
0.02
0.02
0.02
0.07
0.03
0.10
0.05
0.04
0.05
0.11
0.04
0.03
0.08
0.22
0.14
0.01
0.14
0.01
0.03
0.19
1.13
0.06
0.12
0.26
0.10
0.06
0.05
0.06
Total phosphorus
Station
no.
2
4
6
8
10
13
15
17
19
20
23
25
27
29
30
32
34
36
39
42
43
44
46
47
48
49
54
55
58
59
60
63
64
Number
of
samples
4
6
12
6
4
4
6
17
6
6
5
14
6
4
4
4
5
9
4
3
2
4
3
4
3
3
3
4
3
3
2
3
3
Mean
yg/£
7.3
7.9
8.1
9.1
14.8
8.6
7.4
6.7
7.0
7.1
6.9
7.4
6.9
6.8
8.2
5.9
6.5
7.9
6.3
6.5
6.5
7.8
6.6
6.1
6.4
6.2
8.4
5.9
5.8
6.1
6.0
6.4
5.2
Standard
deviation
yg/£
0.6
0.6
1.2
1.0
2.4
1.3
0.4
1.3
0.6
0.5
0.4
0.7
1.1
0.8
1.1
0.6
0.6
1.2
0.6
0.7
0.1
2.3
0.4
0.2
0.4
0.4
2.0
0.6
0.7
0.6
0.1
0.6
0.4
Mean of station means
Standard deviation of
station means
7.55 mg/£ Mean of station means 7.2
Standard deviation of
0.32 mg/£ station means 1.6 yg/£
16
-------
deviations for laboratory errors are 0.10 mg C1/& and 1.0 yg P/&J There-
fore, the standard deviations for station concentrations about the lakewide
average concentrations are 0.30 mg Cl/£ and 1.3 yg P/£. These are employed
for the standard deviations of £sj(n) and es2(n), respectively, where sub-
scripts have been introduced to distinguish chloride from total phosphorus.
The net losses of pollutant masses through the SM are, respectively,
(n) + es (nfl q$(n) and |~X2(n)
qs(n)
Chicago Sanitary and Ship Canal
The annual diversion of water from Lake Michigan to the Illinois River
through the CSSC has been limited to 28^7 x lO^A since 1970 by U.S. Supreme
Court decree. The outflow, denoted by qfc, is interpreted as a known con-
stant at this maximum value. The effective concentration in the outflow is
interpreted in the same manner as for the SM with resulting losses of
and
(5.6)
The zero mean random variables, £q(n) and ^(n), are different random
variables than esi(n) and es2(n) but have the same standard deviations, that
is, 0.30 mg Cl/£ and 1.3 yg P/JL
Refractory Sediment Deposits
For a lake in which the lakewide average phosphorus concentration in the
spring is at steady state, the phosphorus retention coefficient is
R = (min -
where m-jn and m0ut are the mass loading and mass outfall during a year. The
apparent phosphorus settling velocity S[m/yr] is that velocity such that the
net sediment loss for a year is Sm/h, where m is the lakewide mass in the
spring and R is the mean depth. According to Chapra (1975), the apparent
settling velocity can be related to the retention coefficient by
S = Rp(q/A)/(l-Rp),
(5.8)
where q is the total outflow and A is the lake surface area. Although Rn is
difficult to measure for Lake Michigan, data have been compiled for smaller
lakes. Resultant apparent settling velocities, computed with Equation
(5.8), are summarized in Table 2. These data suggest a range of 0.7 to 37.9
m/yr.
^Personal communication, Curtiss Davis, GLRD.
17
-------
TABLE 2. APPARENT SETTLING VELOCITY FOR SMALL LAKES
Lake
^Four Mile
Raven
Talbot
Bob
Twelve Mile + Boshkung
Halls
Beech
Pine
Eagle + Haliburton
Oblong + Haliburton
Rawaon
Clear
^Brewer
Clarke
Costello
Kearney
Aegerisse
Bodensee
Turlersee
Zurichsee
Hallwilersee
Griefensee
Pfaffikersee
Baldeggersee
Tahoe
Found
Little McCauley
S[m/yr]
11.0,11.0
11.3,13.3
9.3
16.2,16.3
22.9,21.4
25.5,29.4
8.6,16.8
5.0,1.4
8.6,14.6
14.5
5.6,4.9,6.0
4.3
3.7
17.0
8.7
6.5
12.1
37.9
26.0
11.3
4.1
15.3
23.1
11.8
5.7
7.2
8.8
Lake
3Mendota
^Joseph
Rousseau
5Conesus
Can ad ice
Honeoye
Washington
t
'Leman
Bay of Naples
Canadaligua
Carlos
Carry Falls
Cass
Charlevoix
Higgins
Houghton
Long (Aroostook Co.)
Long (Cumberland Co.)
Mattawamkegg
Moosehead
Pelican
Rangeley
Sebago
Winnipesaukee
S[m/yr]
11.1
8.8
13.6
35.8
30.9
23.3
27.0
3.2
14.1
5.1
4.5
20.2
15.1
8.9
2.1
1.6
5.3
8.6
14.4
4.6
0.7
4.9
7.8
11.0
iKirchner and Dillon (1975).
2Dillon and Kirchner (1975).
3sonzogni (174).
^Michalski et al_. (1973).
Sstewart and Markella (1974).
6Lorenzen e_t aj_. (1976).
7Larsen and Mercier (1976).
Multiple values represent different data sets.
18
-------
The mechanisms that govern the formation of refractory deposits are com-
plex. The exchange between the sediments and the overlying waters of a lake
depends on currents and degree of stratification, on phytop'iankton species
composition, and on biochemical cycles at the interface. Harrison et al.
(1972), for example, discuss the effect of oxygen and bacteria on the ex-
change. It would be extremely difficult to relate exchange mechanisms in
Lake Michigan with those of the lakes in Table 2. Therefore, S will be
interpreted as a random variable, uniformly distributed between 0.7 and 37.9
m/yr. The_mean and standard deviation are, respectively, 19.3 and 10.7
m/yr. If h(n) is the mean depth of Lake Michigan, then the net loss of
phosphorus to the sediments during the year n is
S(n) m2(n)/h(n). (5.9)
Tributary Loads
The GLWQB has identified twenty-seven major tributaries in the Lake
Michigan drainage basin. The mass of pollutant discharged into the lake
during a year by a tributary is the integral of the instantaneous flux over
the year. The Mean Value Theorem permits the integral to be replaced by a
sum
365
Load = Z) X , q, (5.10)
d=l d d
where q
-------
The error associated with this procedure is interpreted as a zero mean ran-
dom variable whose standard deviation is equal to that of data available for
the Maumee River (U.S. Army Corps of Engineers, 1975). The standard devia-
tions of these data are 15.6 mg Cl/£ and 271 yg P/£. These deviations are
probably conservative upper bounds on the standard deviations of error asso-
ciated with estimates of Lake Michigan tributary concentrations. This re-
port does not consider tributary-specific standard deviations, but rather
employs o^ = 15.6 mg Cl/£ and aj? = 2.71 yg P/£ for each tributary.
The error variances for the annual tributary loads are given by
2^2 22
0-i / j q , + crC x j q , and
(5.11)
22 2 2
°2 ^ qd + a2 ^ qd '
for chloride and total phosphorus. The calculations in Equation (5.11) re-
quire the tributary discharges qd- In order to compute optimal lake mea-
surement strategies for the next thirty years, however, standard deviations
must be available now. Therefore, the qj are replaced by the maximum daily
discharge on record for the tributary. This discharge for the jth tributary
is denoted by qmaxJ, and is computed from the maximum discharge at the
closest station to the mouth of the tributary by correcting for ungauged
drainage.
Conservative error variances for the total tributary loads for year n
are then computed by
27
r 2 2 i
a, D(n) + a' (365-D(n)) T, (qmax3)
L ' J j=l
E
2 2 I 2?
y D(n) + a; (365-D(n)) ^ (qmaxj)'
where it is assumed that both chloride and total phosphorus are analyzed in
the water samples collected on days D. The most appropriate number of days
of sampling for year n, D(n), is interpreted as a variable to be determined
in association with measurement strategy optimization.
Atmospheric Loads
Precipitation samples collected by Junge and Werby (1958) suggests that
the annual chloride atmospheric load for Lake Michigan is between 6.4 x 1012
and 8.7 x 1012 mg. In the mass balance model, the chloride load is denoted
20
-------
by the random variable a-|(n), which is uniformly distributed between these
extremes and, therefore has a mean of 7.6 x 10^ mg and a standard deviation
0.7 x 10^2 tng. Murphy and Doskey (1975) collected precipitation samples
from the shores of the lake. Their data are employed to compute a range of
12 x 10^4 to 18.5 x 10^4 yg for the annual total phosphorus atmospheric
load. This load is interpreted as a random variable, a2(n), uniformly dis-
tributed between the extremes, and thus has a mean of 15.3 x 10^ yg and a
standard deviation of 1.9 x 1014 yg. If data become available that define
more significant phosphorus loading, then the optimal strategies reported in
this report should be recomputed.
Direct Loads
Pollutants from some industries and municipalities discharge directly to
Lake Michigan or discharge to a tributary below the sampling point for that
tributary. The U.S. Environmental Protection Agency (1972) has computed a
direct annual phosphorus load for 1971 of 17 x 10^4 yg, which is significant
compared with the sum of 1970 tributary and direct loads of 97 x 10'4 yg
(Chapra, 1977). Corresponding data for the direct chloride loads are not
available, although the sum of 1960 tributary and direct loads was computed
by O'Connor and Mueller (1970). The direct loads are denoted by d](n) and
d2(n) and are interpreted as known constants because their standard devia-
tions are probably small compared to those of the tributary and atmospheric
loads.
Summary of the Model
The approximations for the sources and sinks of chloride and total phos-
phorus are employed to derive the stochastic difference equations for the
lakewide masses:
= m.,(n)- [x-,(n)+es (n)] qs(n)- [x](n)+£c (n)]
r -I r n (5J3)
m2(n+l) = m?(n)- X?(n)+e (n) q (n)- X9(n)+e (n) q
^ L t. :>2 J S L c. Cp J C
- S(n) m2(n)/R(n)+ [t2(n)+a2(n)+32(n)]
Quinn (1975) has computed Lake Michigan water levels for the first of
May for 1900-1972. The levels have a standard deviation of 1.1 feet and can
be converted to volumes with an accuracy of + 0.4 percent.' Volumes range
^Personal communication, Leonard Schutze, U.S. Army Corps of Engineers.
21
-------
from 4.89 x 1015£ to 4^97 x 1015£. Therefore, the volume has been inter^
preted as a constant, V = 4.93 x 1015. Similarly, the average depth is h
85m. Therefore, the equations for lakewide average concentrations are
X^n+1) = [l - qs/V - qc/V"]Xl(n)
- = (X e. (n) + q" e. (n) + X,(n) q (n)l
y L S S, C C-| I b .1
,(n) + a,(n) + d,(n
v L1 ' ' (5.14)
X2(n+l) = [l - q~s/V - q"c/V - S(n)/F]x2(n)
(n) + q e (n) + X2(n) q (n)l
C. 02 C- :> J
+ t [t2(n) + a2(n) + d"2(n)]
where qs(n) = qs(n) - Q^s and "qs is the mean of qs(n). The terms £s-j(n)qs(n)
and eso^^sC11) are small and have been neglected.
e
rr S
The apparent annual phosphorus loss rate may be interpreted as a model
state (X3 = S/h) with trivial dynamics
X3(n+l) = X3(n) (5.15)
and a random initial condition. Of course, by writing Equation (5.15), it
is hypothesized that the apparent loss rate is constant in time with some
uncertain value. Although this may be strictly incorrect, it is desirable
to demonstrate that the estimate of an uncertain model parameter can be im-
proved, that is, its error variance reduced by interpreting the parameter as
a model state. With this interpretation, Equations (5.14) and (5.15) repre-
sent a three-dimensional nonlinear stochastic model with states X], X2» and
*3-
At the initial year, n = 1976, the three state variables are uncertain.
From GLRD data in Table 1, mean values of 7.55 mg Cl/£ and 7.2 yg P/£ are
computed for X] (1976) and X2 (1976), respectively. The accuracy of these
estimates depends on the variance of point concentrations in the lake and on
the number of samples employed in the computation of the means. If the
average of L(n) station concentrations serves as the measurement, then it is
possible to determine a formula that can be used to relate the measurement
error and L(n) (DePalma, 1977). If 62 is the variance resulting from the
22
-------
heterogeneous nature of the lake and Y2 is the variance resulting from
laboratory analysis error, then the overall measurement error variance is
(62 + Y2)/L(n). (5.16)
The overall error variance of a measurement and Equation (5.16) are used to
compute the standard error of the initial concentration estimates. The
standard errors are 0.05 mg Cl/£ and 0.29 yg P/£ using data in Table 1.
An estimate for S in 1976 is 19.3 m/yr from Table 2. The standard de-
viation of the error in this initial estimate is the standard deviation
associated with the uniform distribution, that is, 10.7 m/yr. Note that
this does not equal the standard deviation of the data in Table 2 because
these data were only employed to establish a range for S. A summary of the
model parameters, constants, and initial states in given in Table 3.
Kalman filter State Estimation
Equations (5.13), (5.14), and (5.15) constitute a nonlinear stochastic
model with uncertain states (X], X2, and X3), uncertain parameter^ (es-|,_
£S2'-£C1' £C2' ^s» al' a2' t]» and t2)' and known constants ("qs, q"c> v» dl >
ana d2). The Kalman filter (Sage and Melsa, 1971) is a method that can be
used to calculate an unbiased minimum variance estimate of the state vari-
ables (chloride and total phosphorus). A detailed description of the basis
and approach of the Kalman filter is described in Appendix A. Briefly, the
method requires statistical characterization of both the model equations and
the measurements but does not depend on the actual measurements themselves.
It is this property of the filter that allows the development of an a priori
optimal measurement strategy. If estimates of the state variables are also
desired, then measurements are required. In order to implement the linear-
ized Kalman filter, the nonlinear state equations are linearized about nomi-
nal states that are computed by setting each uncertain parameter and initial
state to its mean value and then solving the state equations (Equations
(5.13), (5.14), and (5.15)). The nominal states have been computed assuming
that future loads are the same as present loads; however, the Kalman filter
approach is also applicable if future loads can be represented as known
functions of time.
The Kalman filter combines the model and measurements to compute an
estimate of the lakewide average concentration, which is theoretically more
accurate than estimates based on the model or measurements alone. The esti-
mation error variance, which is a measure of the accuracy of the Kalman
estimate, can be computed, and it depends on the model parameter variances
and measurement error variance but not on the actual measurements. Once the
number of days, D(n), on which tributary samples are collected and the num-
ber of stations, L(n), at which lake samples are collected have been speci-
fied (for a number of years beyond 1976), the estimation error variances can
be computed. Furthermore, D(n) and L(n) may be chosen to meet constraints
imposed on the estimation error variances.
23
-------
oo
LU
I
h-
oo
_1
l~«<
1
^ i
5^
i i
Q
^
**
oo
r-
e-f
i
OO
O
o
oo
ce:
LU
h-
LU
^f
f^S
t^
PI
u_
o
^_
rv
«g-
Kr1
)
OO
=c
%
oo
LU
1
CD
1
+J
C
c:
o o
3 4_i
-O ro
.p-
rO >
4-> CU
00 TO
f^
ro
CU
^
P-
£>
>r-
+->
Q.
(_
a
Q
C S«
p- a
rO +->
4J CU
S_ E
O) ro
0 S-
C ro
^ ^ ^ ^
CD CO CT> CD CO O} CO CT>
E^-S3. E ^ E 2-
i CM *d- CM ^t
r p-~ I p i
O O O O O
O O r r P- p- r-
OO OO OO OO
x xx xx
O i O r-
to i to i^. o~>
CO to
P- IP- O r-
CM i
CM 1
CM
CM
CM «d- CM ^d-
0 O O O
I 1 p 1
OOO OO XX XX
*^I" r^ to oo
oo to .
en r^ LD
ai .
to
\&~
c
ro
CU
E "a
E ro
0 E 0
E s- o
O *4-. S_
S- tt- -t->
t(_ c: o
0 20)
C T- 0 S_
O -M ' >r-
P- rO M- T3
-!-> S-
rO +-> CU TD
S_ C > E
4_>
d) C O "O
O O CU ro
C O M- O
0 M- i CM
o o cu to
cu oo cu >> -a
s: 01 oo a> 2: s- ro
OO ro O "3 OO fO O
S_ S_ 4-> P-
4_ CU M_ CU 4- 3
o > o > o ja u
n3 (O >r !
C C C S- S-
O CU O Q) O -I-J OJ
+J -p- +-> -P- -(-> "4 D-
rO 2 ro S ro O to
P- CU ( d) -P- O
>±> > -^ > E E
QJ ro CU ro CU 3 -(->
OP- O i O OO «=C
- «k
p CM
x 1 "O I'D
/ * ^ ' v + + .-^ ^^ v
r CM p- CM ^,
0,00 oo 01 /r,?0 J7~^
W OJ OJ W !CT +-> >-> ro ro
T3
CU
r- O
C
O tO
o -o
0
a.
cu
S-
o
o
to
c
0
i '
ro
P-
>
CD
~CT
S-
(O
T3
C
ro
to
t,_
O
OJ
CD
C
ro
S_
cu
_c
1
< N
to
~^
s-
o
.c
Q.
to
o
CL
^
O
r^
Ol
t
-a
p~
rd
a>
o
p-
i-
o
p^
^3
o
o
to
CTl
p
S-
0
a>
i- CM
rd i
00 A|
c~
rO O
11
E A|
cu ur>
JC 10
h- 00
.
. -»
00
3
S-
O
-C
a.
00
0
Q.
* *
=3"
r*^
CTi
P
T3
c~
ra
O)
1
S-
0
p
c~
o
un
LO
Ol
p
^_
0
ti-
0)
S-
ro
oo
C
ro
CL)
E
CU
.c
CM
24
-------
i LO
""o '~
oo
UJ
I
CO
c:
rd
l/l
C
O
00
JC
en
OJ
(J
O)
OJ
rd
CU
CO
oo
ro
CTl
in
co
O
^0
t/O
cn
rs
o
a>
Q.
a>
a
l/> O
!cr la-
t3
S-
rd
-o
C
rd
-4-J
oo
rd
r
4-5
C
I I
-P
r
C
ZD
C
O
^»
+->
rd
r
>
O)
rd
CD
c~
0
r
_[_>
Q.
r
^_
O
I/)
O)
+->
rd
to
^
CD
E
un
o
o
LT>
un
c:
0
i
i *
rd
S-
-4-J
C
0)
<_>
cr
O
o
O)
CD
rd
S-
0)
>
rd
O)
~O
r-
^
O)
rd
1
O)
~C3
s_
o
1
o
r--.
CT>
'""
~T^-
X
^
cn
o^
00
o
oo
o
f
_(_3
rd
S-
1 *
rr
O)
o
c:
o
o
O)
CD
rd
^_
O)
>
ro
CD
-a
i
s
O)
V
rd
00
S-
0
Q_
00
o
a.
to"
CT>
1
^C\l
X
1
S-
^o
oo
^-«
o
00
00
o
l-c
oo
cu
4J
rd
S-
co
(/I
o
r~
00
S-
0
Q.
00
O
Q-
VD~
C7i
^_
^"0"
X
25
-------
DESIGN OF MEASUREMENT STRATEGIES
The Great Lakes Water Quality Board (GLWQB) (1976) has recommended a
long-term schedule for intensive sampling of the off-shore waters of Lake
Michigan. The lake will be sampled two successive years every seven years
(1976, 1977, 1985, 1986 ... ). Tributaries will be sampled 26 times per
year. One goal of the program is to detect trends in the concentrations of
chloride and total phosphorus. This goal would be accomplished by comparing
estimates of lakewide average concentrations for the years when samples are
collected. The estimates of concentration for any one year would be based
only on data from that year. No procedure is presently available for intro-
ducing mass balance information into the estimation process.
It is appropriate to first examine the effectiveness of the proposed
GLWQB sampling program for the case where no mass balance information is
utilized and the data are not processed by a Kalman filter. Consider a
sampling program in which sixty deep-water stations are sampled two consecu-
tive years every seven years. If it is assumed that four depth samples are
collected at each station and are averaged to compute the station concentra-
tion and the mean of the station concentrations is interpreted as a measure-
ment of the lakewide average concentration, then from Equation (5.16) the
standard deviations of the chloride and phosphorus measurement errors are
0.04 mg/£ and 0.2 yg/£, respectively.
In order to evaluate the effectiveness of this surveillance program,
criteria for trend detection must be established. For demonstration pur-
poses, it will be assumed that changes of 1.0 mg Cl/£ and 1.0 yg P/£ must be
detected with 0.90 probability. Measurement errors must, therefore, fall
below 0.5 mg Cl/£ and 0.5 yg P/H with 0.90 probability. In order to convert
these criteria into constraints on the measurement error variances, it could
be assumed that measurement errors are normal random variables (which is not
true in general) or the Chebyschev Inequality (Davenport, 1970) could be
used that applies to any random variable regardless of its density function.
To avoid any unnecessary assumptions, the latter approach is chosen. This
inequality guarantees, with a probability of p, that the estimation error
will not exceed A, if the standard deviation does not exceed A /1-p. There-
fore, (for p = 0.90 and A - 0.5) the constraints on the standard deviations
are 0.158 mg Cl/£ and 0.158 yg P/£.
It is concluded that the GLWQB planned surveillance program does not in-
clude enough stations to meet the detection criterion for phosphorus (0.2
> 0.158 yg/£), but includes more than enough stations to meet the criterion
for chloride (0.04 < 0.158 mg/£). Note that the measurement error variance
will be the same for 1977, 1985, 1986, 1994, 1995, 2003, and 2004 if sixty
stations are sampled each year and the Kalman filter is not employed to pro-
cess the data. In order to meet the detection criteria (without the Kalman
filter) a minimum of about 108 stations must be sampled according to Equa-
tion (5.16) (Strategy I).
Although the criterion for total phosphorus trend detection is not met
by the monitoring program proposed by the GLWQB, a strategy may exist that
satisfies the criterion but costs less than Strategy I. An interactive
26
-------
optimization algorithm involving the solution of a sequence of linear pro-
gramming problems (DePalma, 1977) has been applied to compute the minimum-
cost strategy that satisfies the chloride and total phosphorus detection
criterion every nine years (Strategy II). The number of lake stations L(n)
is chosen between zero and 168, the latter corresponding to two ships,
operating two weeks, each sampling six stations per day. The number of
days of tributary sampling D(n) is chosen between twelve and 365. The opti-
mal strategy is summarized in Table 4. Tributary samples, beyond the basic
twelve per year, are not requested by the strategy, and lakewide sampling is
only requested for those years when constraints have been imposed on the
standard deviations. The standard deviations corresponding to Strategy II
and all other strategies to be discussed in this chapter have the following
basic characteristics. The chloride standard deviations are well below im-
posed constraints of 0.158 mg/ji for all thirty years, and the phosphorus
standard deviations equal the constraint of 0.158 yg/£ for the years when a
constraint is imposed and exceed 0.158 yg/£ for all other years. Clearly,
requirements for phosphorus detection control the measurement strategy and
the optimal strategy would not change if chloride were ignored in the opti-
mization .
Employment of the Kalman filter has reduced the cost required for trend
detection. If four water samples are collected at a station and analyzed
for all basic chemicals, including chloride and total phosphorus, then the
total cost incurred per transect is $3,320. If one water sample is col-
lected from each tributary and analyzed for all basic chemistry, then the
total cost incurred in sampling the system is $3,300 per day. "I Only twelve
days of tributary sampling are requested by the optimal strategy, which re-
presents a saving of $46,200 each year, and lake samples are requested only
for those years when constraints are imposed. The number of lake stations
required for each of the four years when a constraint is imposed is shown in
Table 4 as Strategy II. If no model were employed (Strategy I), 108 sta-
tions are required each year resulting in a four-year cost of $239,000. The
model permits a reduction in cost of lake sampling to $227,000. The overall
savings for lake and tributary sampling is $1,398,000 over a thirty-year
period.
Only $12,000 in lake sampling costs are saved by utilizing the mass
balance model because the formulation of the standard deviation of the tri-
butary load as a function of D(n) is very conservative (being based on maxi-
mum observed discharges and Maumee River concentration data). The above
minimum-cost strategy has been recomputed under somewhat more realistic as-
sumptions: that daily discharges equal the average daily discharge on re-
cord for a tributarty; that a-| and 02 are 1/50 of the values previously as-
sumed; and the a-j" and ap assume the values originally assigned to a-] and a?.
The number of lake stations requested by the optimal strategy (Strategy II)
is presented in Table 5 and is substantially less than for the conservative
assumptions. The costs for the more liberal ($153,000) and conservative
($227,000) cases bracket the cost that would be computed if actual values
for the discharges and standard deviations were available today.
'Sampling costs provided by William Richardson, EPA, Grosse He.
27
-------
TABLE 4. REQUIRED NUMBER OF SAMPLING STATIONS AND COSTS FOR CONSERVATIVE
FORMULATION OF TRIBUTARY LOAD ERROR
Year
Strategy
I
II
III
IV
V
VI
VII
VIII
IX
X
XI
XII
XIII
XIV
1977
108
106
106
106
92
84
83
76
67
88
70
52
34
25
1986
108
108
108
108
107
95
91
78
68
90
72
54
36
27
1995
108
100
100
100
100
94
91
72
63
82
65
46
29
21
2004
108
97
97
96
96
93
91
69
61
79
61
43
26
18
Cost
[$]
1,625,000
227,000
227,000
227,000
218,000
201,000
196,000
162,000
142,000
187,000
148,000
107,000
68,000
49,000
28
-------
TABLE 5. REQUIRED NUMBER OF SAMPLING STATIONS AND COST FOR MORE
REALISTIC FORMULATION OF TRIBUTARY LOAD ERROR
Year
Strategy
I
II
III
IV
V
VI
VII
VIII
IX
X
XI
XII
XIII
XIV
1977
108
106
106
106
84
56
55
76
67
88
70
51
33
25
1986
108
107
107
107
106
64
0
77
68
89
72
53
36
27
1995
108
45
41
33
45
36
0
33
29
36
27
18
11
8
2004
108
19
14
10
19
16
0
14
13
14
10
6
3
2
Cost
[$]
1,625,000
153,000
147,000
141,000
139,000
94,000
30,000
110,000
96,000
125,000
98,000
70,000
45,000
33,000
29
-------
It is of considerable interest to determine the reduction in measurement
cost that can be realized by applying research funds to the study of the
various mass balance model parameters. Inaccuracies in these parameters de-
pend on uncertainty in the pollutant concentrations in outflowing water, un-
certainty in the effective SM flow, uncertainty in atmospheric loads, and
uncertainty in tributary loads. Reduction of uncertainty associated with
the outflowing water would require the implementation of a routine sampling
program and will not be considered here. The second component may be re-
duced by intensive current metering and sampling in the SM to identify the
nature and magnitude of the return flow. Consider a hypothetical situation
in which the effective flow at SM is known precisely. The minimum-cost mea-
surement programs in Tables 4 and 5 (Strategies III) have been computed
based on this hypothesis, with conservative and liberal assumptions for the
formulation of tributary load error variance. The costs entered in these
tables represent the minimum costs achievable under such a research program
because the effective SM flow cannot be measured precisely. The difference
between Strategies II and III must be weighed against the cost of the re-
search.
A third component of the covariance matrix may be reduced by an inten-
sive atmospheric-load sampling program. The program would not have to be
routine because atmospheric loads would not be expected to change signifi-
cantly over the next thirty years. Consider a hypothetical situation where
the atmospheric loads are known precisely. The minimum-cost measurements in
Table 4 and 5 (Strategies IV) have been computed and represent an additional
reduction in cost compared to Strategies II. The savings must be weighed
against the cost of collecting and analyzing precipitation and dustfall
samples.
The fourth and final component of the covariance matrix, uncertainty in
tributary loads, has been assumed to be controllable in this report. How-
ever, because of the high cost of sampling the tributary system, none of the
strategies computed so far or to be computed, contain any samples beyond the
routine twelve per year. This is true for both the conservative and liberal
assumptions concerning the tributary load error variance as a function of
the number of samples.
The apparent annual phosphorus loss rate, X3, is also a model parameter
but has been interpreted as a state variable in the stochastic model. This
permits the linearized Kalman filter to estimate this rate in the same man-
ner as it estimates chloride and phosphorus lakewide average concentrations.
Of course, the estimate is not based on measurements of the rate itself but
rather on measurements of the total phosphorus lakewide average concentra-
tion. The standard deviation of X$(n) is constant between measurements and
decreases when a measurement is made. In general, the accuracy of the esti-
mate can be significantly improved by the end of the thirty year sampling
program. However, it would be desirable to dedicate enough research to the
mechanisms of total phosphorus removal in Lake Michigan to reduce the pre-
sent uncertainty in X3. If mechanisms of sinking and removal could be cor-
related with apparent settling velocity, then the range of uncertainty in X3
could be reduced from the present range of +_ 19.6 m/yr. Minimum-cost mea-
surement strategies in Tables 4 and 5 have been computed under hypotheses
30
-------
that the range of S is reduced immediately to ± 5.0 m/yr (Strategy V), ±1.0
m/yr (Strategy VI), or ± .5 m/yr (Strategy VII). It is evident that this
improvement in model accuracy can provide a significant reduction in mea-
surement cost. A decrease in the range of S beyond +. 5.0 m/yr, however may
be difficult given the present understanding of phosphorus exchange me-
chanisms.
Next, consider the effect of total phosphorus laboratory analysis error
on the optimal cost. The variance of laboratory analysis errors affects the
error of a lakewide measurement in Equation (5.16) where 6? is the variance
resulting from spatial heterogeneity and y^ "is the variance resulting from
laboratory error. Minimum-cost measurement (Strategy VIII) and (Strategy
IX) in Tables 4 and 5 correspond to cases where the laboratory standard de-
viation is reduced from its present value of 1.0 yg P/& to 0.5 yg P/£ and
0.1 yg P/£. The reduction in measurement costs is significant but must be
weighed against the research cost necessary to improve the present analysis
technique. Note that the reduction in error is limited by the variation of
point concentrations about the lakewide average concentration that cannot be
altered.
Finally, reductions in the optimal cost can be achieved by relaxing the
criterion of trend detection. Instead of requiring that changes of 1.0 mg
Cl/£ and 1.0 yg P/£ be detected with 0.90 probability, detection with 0.88,
0.85, 0.80, 0.70, and 0.60 probability may be acceptable. The Chebyschev
Inequality can be used to compute the constraints that must be imposed on
the standard deviations of estimation errors in order to meet the detection
criterion. If it is assumed that the estimation errors are normal, then, in
order to meet the detection criterion with 0.90 probability, standard devia-
tions would only have to be maintained below about 0.3 mg Cl/£ and 0.3 yg
P/&. This approximately corresponds to the constraint imposed by the
Chebyschev Inequality for 0.60 probability. Minimum-cost Strategies X to
XIV have been computed corresponding to probabilities of 0.88 to 0.60. The
cost of detection at 0.60 probability is about one-fifth the cost of detec-
tion at 0.90 probability, and a sharp increase in cost is incurred as the
probability increases from 0.80 to 0.90. Specification of trend-detection
criterion will likely remain an ad hoc scientific decision. The purpose of
the methodology applied in this section is to analyze the effect of parti-
cular uncertainties on the estimation process once detection criteria have
been established.
The optimal measurement strategies discussed in this section and sum-
marized in Tables 4 and 5 have been computed based on trend detection every
nine years. In general, the Kalman filter approach for trend detection de-
monstrates its superiority over conventional procedures (no model, only
data) more lucidly if detection is required every year. Consider the case
in which liberal statistics are employed for tributary loads and all other
statistics are as presented in Table 3. This case will, therefore, be
analogous to Strategy II except error variance constraints will be imposed
every year instead of every nine years. The optimal strategy and cost are
summarized in Table 6. It is observed that the required number of stations
is greatly reduced after the first few years of the program. If the Kalman
filter were not employed, then 108 stations would have to be visited each
31
-------
TABLE 6. OPTIMAL MEASUREMENT STRATEGY AND COST FOR DETECTION
EVERY YEAR
n
1
2
3
4
5
6
7
8
9
10
Number of
stations
106
89
70
54
42
33
27
22
18
15
n
11
12
13
14
15
16
17
18
19
20
Total
Number of
stations
12
10
9
8
7
6
5
4
4
3
Cost = $301,600
n
21
22
23
24
25
26
27
28
29
30
Number of
stations
3
3
3
2
2
2
2
1
1
1
32
-------
year costing $1,793,000 over thirty years. The optimal cost is only
$301,600 when the filter is employed.
SUMMARY
A nonlinar model has been derived that describes the long-term dynamics
of chloride and phosphorus lakewide average concentrations in Lake Michigan.
Statistics for each uncertain parameter in the model have been approximated
from available data. The nonlinear model has been linearized about a nomi-
mal, so that the linearized Kalman filter can be employed. Finally, the
role of this model was explored for measurement strategy design.
The monitoring strategy proposed by GLWQB cannot detect changes of 1.0
yg P/£ at 0.90 probability with or without the Kalman filter. However, a
minimum-cost strategy that meets the detection criterion every nine years
has been computed. The optimal strategy contains limited tributary samples
for both conservative and liberal formulations of the tributary load error
variance. It is concluded that such samples are not a cost-effective source
of information for detecting trends in lakewide average concentrations. It
is also concluded that the phosphorus error variance constraint controls the
number of lake stations required for trend detection because the chloride
error variance satisfies the constraints and thereby has no effect on the
optimal strategy. Furthermore, the optimal lake sampling strategy is very
simple. Lake samples must be collected only for those years when the detec-
tion criterion must be satisfied.
An investigation was conducted to determine the sensitivity of the opti-
mal cost to the statistics of the mass balance model parameters. It is con-
cluded that the calculated cost is most sensitive to the variance in the ap-
parent phosphorus settling velocity, not as sensitive to the accuracy of
laboratory analyses, and rather insensitive to the accuracy of the effective
SM flow and the atmospheric loads.
The actual design of a Lake Michigan surveillance program entails econo-
mic and practical considerations and would take into account other goals be-
sides chloride and total phosphorus trend detection. The analyses presented
here simply illustrate the manner in which Kalman filtering techniques can
aid in defining and subsequently reducing the requirements of trend surveil-
lance programs.
33
-------
SECTION 6
AN APPLICATION TO ANNUAL PLANKTON AND NUTRIENT
CYCLES IN SAGINAW BAY
INTRODUCTION
Section 5 described the use of Kalman filtering techniques to derive
optimal measurement strategies for long-term trend detection of chloride and
total phosphorus concentrations in Lake Michigan. The major goal of Section
6 is to conduct a brief preliminary investigation of the applicability of
similar techniques to define efficient sampling strategies for phytoplank-
ton, zooplankton, and nutrients such as nitrogen and phosphorus in Saginaw
Bay. Annual rather than long-term dynamic cycles and interactions will be
considered in an aquatic system where significant spatial gradients also
exist.
The EPA Large Lakes Research Station at Grosse He has supported an in-
tensive limnological monitoring program in the bay since 1973. Measurements
from this program have been used to develop nutrient and plankton models for
the system. These models have been used to predict phytoplankton levels in
the bay following enactment of various schemes to control the level of phos-
phorus input loading. These calculations of phytoplankton levels are based
on model equations and parameter estimates derived by comparing the model
output with actual phytoplankton concentrations observed during 1974 and
1975. It is assumed in these calculations that independent laboratory and
field data are not available to determine the model parameters. However,
data in the literature are used to establish ranges for these parameters.
Several interesting and practical questions arise concerning the reli-
ability of such procedures. For example, how good are the estimates of the
predicted peak or average phytoplankton concentrations under the assumed
conditions? It is known that uncertainties in the calculations are caused
by imperfect knowledge of each model parameter and each state variable ini-
tial condition. It would be useful to know how well the model parameters
and initial conditions must be known in order to calculate the peak phyto-
plankton concentration to within a known or specified degree of certainty.
If it is impractical or impossible to quantify the model parameters and ini-
tial conditions sufficiently, it may be necessary to supplement model cal-
culations for future conditions with measurements to obtain more accurate
estimates of peak phytoplankton concentrations. In these cases it is neces-
sary to determine the appropriate minimum number, frequency, and location of
measurements.
34
-------
Systematic and formal procedures have not previously been used to ad-
dress and resolve these kinds of questions associated with plankton-nutrient
models. However, the results in this section illustrate a promising ap-
proach to these issues using Kalman filtering techniques.
Another class of problems in Saginaw Bay concerns the design of cost ef-
ficient sampling programs that can be used to batter define the value of
model parameters and initial conditions. For example, suppose it is desired
to calculate the peak phytopjankton concentrations within a specified toler-
ance using the model without'supplemental data. Then, the techniques can be
used to estimate the required certainty associated with the model parameters
and initial conditions. This desire to determine the model parameters and
initial conditions is best fulfilled by performing field measurements
directly in the bay. It is of interest to determine cost efficient sampling
schemes (number and frequency of samples) to accomplish these goals. This
section introduces a methodology involving the use of Kalman filtering tech-
niques that can be used to calculate suboptimal and, eventually, the least
expensive sampling program.
SAGINAW BAY NUTRIENT-PLANKTON MODEL
Spatial and temporal variations of limnological variables in Saginaw Bay
are determined by solving dynamic differential mass balance equations for
each of five finite segments or cells as illustrated in Figure 3. This fi-
gure also shows the location of the Saginaw River, which is the major source
of nutrients to the bay. The bay is coupled to Lake Huron by advective and
dispersive hydraulic transport. The general flow pattern among various
model segments is represented by arrows in Figure 3. The major biological
and chemical kinetic reactions and forcing functions that affect the plank-
ton and nutrient annual dynamics are illustrated in Figure 4. The dynamic
behavior of the limnological state variables in each completely mixed finite
segment of the bay is obtained by numerically integrating a system of ordi-
nary differential equations of the form
V g£ - Ac + VR + W (6.1)
where V is an nxn diagonal matrix of volumes, c is an nxl vector of the
state variables, A is an nxn matrix of advective and dispersive transport, R
is an nxl vector of kinetic interactions, and W is an nxl vector of input
forcing functions for c. For m interacting variables and n finite segments,
a total of mxn equations must be solved. For this example, the state vari-
ables are phytoplankton (expressed as chlorophyll a), herbivorous zooplank-
ton, carnivorous zooplankton, organic nitrogen, nitrate nitrogen, ammonia
nitrogen, organic phosphorus, and reactive phosphorus.
A simulation model called LAKE-1 was developed by Thomann e_t a\_., (1975)
to describe the dynamics of similar phytoplankton-zooplankton-nutnent
interactions in Lake Ontario and the effects of nutrient loadings on those
interactions. This model was subsequently applied to Saginaw Bay by
35
-------
SAGINAW BAY
1974 Sampling Network
Legend oBoat Station
A Water Intake
Segment
Oi 110 KM
0" HOMI
Saginaw River
Figure 3. Saginaw Bay model segments, sampling stations, and
flow pattern.
36
-------
Material Sinks
(m/T)
Physical Transport
i
I
i
I
Phytoplankton
Concentration
Nutrient
Concentration
Time
Zooplonkton
(m/L3)
\
Prey
I
Gra*
1
'ing
Phytoplonkton
(m/L3)
Nutrient
Limitation
1
Nutr
(m/
i
Nut
\
tents
L3)
hentUse
\
i
Material Inputs
(m/T)
Input Flow Growth Rote Growth Rote Grazing Rate
S~~~
Temperature
^
Temperature
/^
Solar Radiation
jiA/n
HI I!V*~^/N«-W«
Time
Figure 4. Saginaw Bay model biochemical reactions and
forcing functions.
37
-------
Richardson and Bierman (1976). The Richardson and Bierman model was adapted
for this project to run on MTS (the University of Michigan Terminal System).
A general computer program was developed that implements models that can
be described by a set of differential equations using an Adams predictor-
corrector numerical integration method called DVDQ (Krogh, 1969). Solutions
for a particular model are obtained by calling a subroutine that evaluates
derivatives for the model using Equation (6.1) and any data files necessary
to define time variable forcing functions. Temperature and light cycles are
examples of time variable forcing functions used in the model. The equa-
tions for the derivatives and input data for this project were taken
directly from a computer listing supplied by the EPA.
ERROR PROPAGATION WITHOUT MEASUREMENT
The theoretical development in Section 4 has suggested how the Kalman
filter can be used to calculate the minimum-variance estimate of a state
variable based on both a linear stochastic difference model for the system
and actual field measurements. This section examines state variable error
propagation for cases where measurements are not employed. Such applica-
tions might arise when the models are used to predict future water quality
conditions that result from alternative nutrient control procedures.
Variance error propagation for linear systems is defined by Equation
(4.4). This equation can be applied to nonlinear systems by constructing a
linear system that defines the deviations of the states from a nominal solu-
tion generated from current best estimates of the parameters and state ini-
tial conditions (see Appendix A).
Estimates of the means and standard deviations for the model parameters
are listed in Table 7. Standard deviations for the parameters were obtained
by defining a range of values for a given parameter from the literature
(Canale ejt _al_., 1976, Richardson and Bierman, 1976, Thomann et jil_., 1975)
and assuming a uniform distribution over this range. For the purposes of
this illustration and to limit computer costs, only seven parameters were
chosen for study, while the other ten were assumed to be known perfectly
(see Table 7). The initial values of the states are dependent on observed
data and vary from model segment to segment. For the purposes of illustra-
tion, standard deviations associated with the states were arbitrarily
equated to 0.25 times their mean value.
The computer generated nominal gives the time variable behavior of the
states for each of the five model segments. These states depend on the
model structure, parameter values, and forcing functions. The nominal
states for model segment 5 are plotted in Figure 5. A 19-week time interval
was used in the calculations to approximate the first phase of the phyto-
plankton growing season and to reduce computation costs. Time zero in this
figure corresponds to a date of April 1. The calculated nominal states
agree reasonably well with measured values of the states for all model cells
and variables (Richardson and Bierman, 1976).
38
-------
TABLE 7. ESTIMATED MEANS AND STANDARD DEVIATIONS FOR MODEL PARAMETERS
Parameters
Nitrogen
Half-saturation constant
Organic nitrogen decompositon
rate
Ammonia to nitrate nitrifica-
tion rate
Nitrogen-chlorophyll ratio
Phosphorus
Half-saturation constant
Organic phosphorus decomposi-
tion rate
Phosphorus-chlorophyll ratio
Zooplankton
Conversion efficiency
Endogenous respiration rate
Herbivorous and carnivorous
zooplankton grazing rate
Phytoplankton
Chlorophyll half-saturation
constant
Endogenous respiration rate
(at 20 )
Settling velocity
Saturated growth rate
Optimum light intensity
Mean
value
0.025
0.007
0.015
0.01
0.005
0.007
0.00083
0.06
0.0026
0.04
20
0.0
0.05
0.20
350
Standard
deviation
0
0
0
0.00175
0
0
0.0003
0
0.0011
0.01
0
0.0225
0
0.091
0
Units
mg/£
-1 -1
day deg
day" deg"
mgN/ygCh
mg/£
day" ' deg"
mgP/ygCh
-i -I
day" 'deg"
£/mg C day deg
g/£
-i
day"1
m/day
day"!
langley/day
Carbon
Carbon-chlorophyll ratio
0.025
0.02
mgC/ygCh
39
-------
o 100
i
u
-006
o>
6
0031-
CL
(S)
O
a
0.8
CHLOR A
HERBIV
ZOO
3
0>
O
s
0.4 >
CO
o:
LU
X
0.02
PHOSPHATE
0.0\
CL
O
o:
o
1.0
Ld
a:
0.5
0.1
^
E
NITRATE N
AMMONIA N
0.05 §
0 5 10 15
WEEKS
Figure 5. Nominal states for model segment 5.
40
-------
Model application might require information regarding the uncertainty
associated with a model caused by imperfect knowledge of the parameters and
initial states, assuming that the model structure is exact in all other
ways. These standard deviations have been computed according to Equation
(4.4) assuming no measurements are made. Column (b) of Table 8 lists the
calculated standard deviations at the peak values of chlorophyll a, herbiv-
orous zooplankton, and carnivorous zooplankton using the data in Table 7 for
the uncertainty in the seven selected parameters. Column (c) from this
table lists the calculated standard deviations at the peak chlorophyll a and
zooplankton levels for a case where the standard deviations for the seven
selected parameters are reduced by a factor of ten. Column (d) gives simi-
lar calculations for the case where the original uncertainties are used for
the seven parameters but the uncertainties associated with the initial
states are reduced by a factor of ten. These results are an indication of
the magnitude of error propagation that would result when an uncalibrated
model is run using uncertain initial values, literature ranges for model
parameters, and accurate forcing functions.
The results in Table 9 show that the largest calculated standard de-
viations occur in segment 1 for all three states. Therefore, according to
these results, model predictions of the peak populations without measure-
ments are the most unreliable in segment 1. Furthermore, the calculated
standard deviations at the peaks in segment 1 are large compared to the mean
value of the peak populations (see column a). This, of course, indicates
that the model has limited value for predictions of peak states in segment 1
without measurements. Uncertainty accumulates as a function of time accord-
ing to Equation (4.4), which for this case depends only on the nature of the
system equations and parameter values. Relatively large errors occur in
segment 1 because it has a small volume and depth, relatively warm tempera-
tures, and high nutrient levels. On the other hand, the model has rela-
tively good predictive capability for peak plankton levels in segment 5,
which has a large volume and depth and low temperatures and nutrient levels.
The calculated standard deviations for the peak populations improve when
the standard deviations associated with the seven parameters are reduced by
a factor of ten. Such a reduction might be achieved by measurement of these
parameters in laboratory or field experiments. The effectiveness of these
improvements follows the flow pattern, which is in a counterclockwise direc-
tion. Standard deviations are only reduced about 1 percent in segment 4
whereas about 55 percent improvement is calculated for segment 5. An oppo-
site phenomenon occurs when the uncertainty of the initial states is reduced
and the uncertainty in the parameters remains the same. Reductions in the
calculated standard deviations are most dramatic upstream in segment 4
(about 84 percent) and smaller in segment 5 (about 11 percent).
These calculated trends for the standard deviations of the states are
consistent with physical considerations. Segments 4 and 2 are influenced
more by boundary forcing functions and initial conditions and less by the
structure of the equations and the model parameters. The changes in the
states of segments 1, 3, and 5 are effected by input from adjacent model
segments and the structure of the model and are relatively insensitive to
the initial conditions.
41
-------
CO
^r
o
HH
I
i
ZD
CL.
O
Q-
LU
Q_
LU
h-
1
GO
O GO
t I 1
=£ LU
. . «^-
l~~l ^~
>i i i
LU
LU o;
O ZD
GO
f ^ e^,
C£. LU
< s:
o
z o
1 LU
GO rvj
O rv
sz o
< u-
£.
LU
Q
LU
1
1
ZD
o
co
LU
CD
^f
1
o
cu
c >
T3 O O
^ re) -J3 Q.
T3 T3 res E
^ * c" »i »i
re) >
4-* CU S~
GO -O O
4-
T3
CU
C >
T3 0 0
S- -i- S-
O "O re) E
. * j^ >r »i
res >
4-> CU 5-
GO TD O
<+-
r-~
c~ re)
-DOE
C »^ t
> ^_
~ res 4-> O
.a ~a res c
« ^ ^- [
re) > s-
4-> O) O
GO T3 4_
^^
^,
T3 CO
c~ ^.
i re! i cu
i- co E CU
4-> 0) S- E
r- 4-> O (O
c re> c s_
r- 4-> re)
CO CX
CO ^~
S- res
cu E
4-> S- re) co
E C 4-> 4->
to -r- re)
S- -a c: 4->
re) c: -r- co
Q. rci
,_
co re!
M- i- -i-
O CU -t-J CO
1,3 , QJ
co eu c 4->
QJ £ -I- (O
3 res -t->
i S- T3 CO
res res c
> ex res
4->
cu re)
i cu re)
re) 4-> cu
> re) ex
c: co cu
res _c
CU C|_ 4J
2! 0
4 ^"^
O CO
cu rei cu
E CU CD
1 ' "
"CD
T3 r- 0
o cu c:
s: u
CO
CU
1 *
re!
i *
GO
i
CD
O
SI
CO f^. *s3~
en co «3-
CM CM en
r tO
^1" i LO
vo co ^d~
en en to
LO r
n
CO
co en ^*
O CM CO
O O CM
i > O O
-c: s- s-
CX 0 0
o > >
J_ !- -r-
O -Q C
< i- S-
^: cu re)
o x: o
co en CM
*<-
r CO IT)
CO CM
co CM r~~
tO LO LO
i CM
CO CO LO
en LO i
to LO to
i CM
CM <* O
LO CM i
CO 0 0
en «* r^
CM CM CM
c: c:
0 0
^ ^*
c c
re) re)
ex a.
o o
0 0
re) N N
i CO CO
r 3 ^
>> O O
-c s- s-
ex o o
0 > >
i- -1- -r-
O J3 C
i S- S-
.c CD re)
t > -r c i
OO r LO
*3- 1 1^
i CM CM
LO
to o oo
o «3- co
i en i
CO
A
CM
CM co r^-
CO LO LO
r en to
oo
«s
CM
oo to ^f
to ^t- >j-
IO O O
OO
to co to
OO CO CO
c c
0 0
^. -^
c c:
res re)
'a. "a.
0 0
o o
re) N N
i CO CO
i 3 3
>> O O
.c s- s-
cx o o
o > >
O -Q C
i S- S-
_C CU re)
C_> 31 C_3
oo en to
to o co
(^ O CM
^~
co en LO
to en oo
^=3-1
CM to r^.
r^ O co
^f LO i
CM r^ «3-
LO i i
=* O O
O LO 1^
«xj" *^- ^3"
C C
O O
-* JXL
$~_ £~
re) re)
"a. 'a.
O 0
o o
re) N N
i co co
r 3 3
>-> O O
-c i. s-
0. 0 0
0 > >
S_ -r- -r-
o *~* c
r S- S-
jc cu re)
c_> x: o
o <=3- r~.
<^- i oo
«3- o o
r ^" t
CM i «^-
CM 0 0
O O LO
en CM LO
*d- o o
CM en to
O CM i
^- o o
CO r LO
LO LO LO
C C
0 0
^£ ^.
c. c
re) re)
ex ex
o o
0 0
re) N N
i co co
i 33
>5 O O
-c s- s-
ex. o o
o > >
i_ *i )
O -Q C
S- i-
.c CD re)
o x: o
42
-------
The results of these few simulations are provocative and raise several
interesting questions. However, time and funding limitations and the origi-
nal purpose of this section (to investigate the large-scale problem poten-
tial of the approach described in Section 4) did not permit additional
analysis. For example, it would be important to study the effect of param-
eter identification with parameters other than the seven selected, to per-
form a more extensive study of the trade-off between more accurate knowledge
of initial conditions as opposed to more accurate knowledge of parameters,
and to perform Monte Carlo type simulations to determine the validity of
Equation (4.4) in predicting standard deviations for this class of problems.
SUBOPTIMAL MEASUREMENT STRATEGY GENERATION
The previous section has examined the propagation of errors for the peak
plankton populations when models alone were used to predict future condi-
tions without supplemental measurements. Because the uncertainties asso-
ciated with these predictions are unacceptably large, direct measurements
and Kalman filtering techniques have been employed to reduce the error vari-
ance associated with the state variable estimates. A major goal of this
section is to apply these techniques to a problem that is so large that
automatic optimization as applied to trend detection in Lake Michigan (Sec-
tion 5) is either too expensive or of questionable value because of limited
statistical knowledge of the chemical and biological mechanisms.
The computer program that was utilized in Section 5 to determine optimal
measurement strategies is a general purpose program (DePalma, 1977). The
optimization algorithm in the program is very efficient for certain classes
of problems, but it involves considerable storage. Automatic computation of
full optimal sampling strategies for Saginaw Bay involves computer storage
requirements well above the limit of the computing facilities at the Univer-
sity of Michigan. This difficulty can be addressed by modifying the ap-
proach to the problem to take advantage of the structure of the governing
equations or by employing only the simulation portion of the program to
analyze various measurement strategies. In this case, only the simulation
portion of the program has been employed to demonstrate how reasonable sub-
optimal measurement strategies can be developed with a minimal amount of
computation.
The mathematical model described earlier in this section is a nonlinear
differential equation system involving eight state variables in each of five
bay segments. Furthermore, it has been assumed that seven parameters in
each segment have unknown values that are to be improved during the measure-
ment process. Thus, seventy-five differential equations define the state
model. Only the growing season (ti = 1,2,...,19) is considered for the
range of admissible measurement times because of the large number of differ-
ential equations. The variance of a measurement can be improved by averag-
ing a larger number of samples. However, practical considerations dictate
that measurements at any time involve all eight state variables. In this
section, the minimum measurement variance in a given week corresponds to a
maximum practical sampling effort, while the maximum measurement variance
corresponds to no measurement. The measurement strategy designer then
43
-------
selects, with the aid of the Kalman filter, the set of weekly measurement
variances, times, and segments that results in a cost-effective sampling
program with acceptable accuracy.
The first step in this suboptimal measurement strategy selection process
is the generation of the covariance matrix for the full measurement case
(that is, minimum variance measurement for each week in each segment) which
corresponds to a calibrated model with a full data set. Next, the covari-
ance matrix is determined for the zero measurement case (that is, no mea-
surements at any time) which corresponds to an uncalibrated model. These
cases indicate the minimum and maximum variances attainable by the sampling
program. The purpose of the optimal sampling strategy is to determine which
partial data set will be adequate for accurate state estimation. Figures 6
to 9 show the nominal states and standard deviations for chlorophyll a,
herbivorous zooplankton, phosphate3 and nitrate for the full and zero mea-
surement cases in each cell. The dots represent the nominal values, the
vertical lines to the left of the dots indicate a one standard deviation
range for the zero measurement strategy, and the vertical lines to the right
give the one standard deviation range for the full measurement case. In
some segments the variance grows so rapidly that the zero measurement case
gives little information, whereas some predictive capability exists (without
measurement) in segment 5. This indicates that it is unacceptable to use
uncalibrated models for prediction without supplemental measurements. These
measurements of course correspond to what are normally required for model
calibration and subsequent verification. The Kalman filter can be used to
help define cost-efficient measurement programs.
Figures 10 to 13 show how measurements can be used to improve estimates
of the model parameters. The dots represent the best estimate of the param-
eter before the sampling program, the vertical bars to the left of the dots
indicate the one standard deviation range without any measurements, and the
vertical bars to the right of the dots indicate the one standard deviation
range with a full measurement strategy. Measurements can only improve
knowledge of a parameter. Note that in most cases the improvements are bet-
ter in segment 5 than in segment 1.
The full and zero measurement time history plots for all of the state
variables and parameters were used to hypothesize a simple suboptimal mea-
surement strategy. The following guidelines were useful in designing this
suboptimal strategy.
(1) Determine the segments in which measurements are the most
useful (for example, measurements in segment 5 are preferred
to measurements in segment 1).
(2) Determine the earliest time when most of the variance begins
noticeable decreases in the parameter plots (for example, in
most cases this occurs between t = 3 and t = 6).
(3) Determine the time region where the maximum variance decrease
occurs on most of the seventy-five plots (for example, this
occurs most frequently between t = 6 and t = 11.
44
-------
t
,
1
u
II
M
0
, '- ,
, » ,
It
H
L
J
J
&-
1
in
^n
^r
jj
jj
in
o
o in
f^ ^^ -
f/Bn/jv aoiHO
a- o
m
9w
^
UJ
Lu
m
0
O
UJ
LJ
IO
O
9£
UJ
UJ
U)
-o
E
fO
i-
o
q-
ro
o.
o
o
S-
o
q-
a
o
to
O)
-a
ro
-O
E 10
ro 0)
+-> in
00 ro
O
a
E -!->
(O E
O)
(O S-
4J 3
00 CO
ra
i d)
r- O
O CU
ZL N
0)
i-
O
45
-------
Oco
LU
UJ
in
UJ
LU
i-
o
tf-
E
O
- O
00
(T/5uj) ooz
o
ro
O
m
O
(r/6uj/ OOZ AI9H3H
y
H
HI _
H
M
H
H
M -
1 *
1
in
C)
H
JL,
A
A
A
A
H
a-
i
m
CM
O
o
CO
LU
LU
^
in
O
H O
_ '_.
-]
m
LU
LU
A
A
i
o
m
ro
O
(B
"o.
o
o
N
O
o
S-
o
q-
^-> S
10
o
fo a)
c: N
i
E -a
o c
'z. to
I
O)
3
C7>
ooz Aiea3H
46
-------
UJ
LU
-O
CM
O
o
3lVHdSOHd
O
UJ
UJ
O
O
m
O
BlVHdSOHd
o
s-
OJ
N
c
fO
4-
S-
o
(O
.E
O-
to
O
o
i
-t->
(O
>
(T3 tO
O) O
10 O)
r CU
CO S-
C 3
i (/)
E "3
O 0)
CO
S-
Z3
cn
47
-------
r
O CO
UJ
LU
5
o
in
m
s-
LU
LU
O
q
rd
o
01
c.
fO
q-
s_
o
q-
O)
-i->
re
5-
O
fO
>
OJ
-a
-a
S-
ea
o
E
(O
4->
CO
a to
E 0)
n3 co
(O
a> o
cu
fO S-
C 13
i CO
^~ fO
o cu
Z= E
a-i
cu
s-
en
Ni
48
-------
CO
O
CO
C>
I
ro
O
in
Oco
UJ
LL)
- O
Oi
UJ
in
O
(,/op) 31Vd H1MOH9
NOiMNVldOlAHd
to
O
c
O
rO
a.
o
O.
M- (/)
rc3
to O
c
O -4->
i- C
-i-> a>
(O E
i- a>
> s-
O) 3
O co
fO
-a aj
S- £
ta
^a o
c s-
(O O)
-M N
a
c:
ro
CO i
d)
E s-
i- o
4-> 4-
co
CU 0)
-t->
S- fd
a; s-
+->
a> ^:
E 4->
n3 2
i- O
ro S-
D- CD
a>
s-
H1MOW9
N01»NVHd01AHd
49
-------
in
O
m
O
in
o
in
LJ
UJ
m
- O
(6ap
9NIZVH9 NOlMNVTdOOZ
_ in
m
U-'
LU
$
O
8
c
o
O-
o
o
CU
in
O)
fO
O
O C.
r- CU
+J e
(O CD
r- S_
> 3
CU 5
O fO
CU
-a E
a) c
E T-
rci NI
S-- ro
res S-
Cu CD
s~
13
ONIZVb'J NOl>iN\ndOOZ
50
-------
8
o
00
O
O
o
O en
"~ y:
LU
LU
IT)
LO
LU
LU
o
O
O
IMOI.LVaidS3H NOlMNVldOOZ
S
o
o
c:
(O
"o.
o
o
tsl
IO
CD QJ
_c 10
-!-> (O
o
s_
O -M
(4- E
OJ
5 E
s= d)
O i-
to ra
-o o
S-
-O O)
S- N
fO
o -o
c: c
re fO
O 13
C 4-
na
s_
CO O
O) M-
+->
fO
r- (t3
4-> S-
co
(O
OJ S-
E T-
n3 CX
S- CO
fO Ol
Ou S-
O)
N01»NVldOOZ
51
-------
O
O
O
O
O in
~- ^
UJ
UJ
If)
O
9*
UJ
5
in
- O
Q.
O
S-
o
.C
O
S-
o
_c
Q.
l/l
O
^:
CL
CU
S-
o
O CO
r- OJ
-P to
rO rt3
r- O
>
O) 4->
-O E
O)
00 £
"O O
c: s-
03 CL)
N
to
O) TD
+-> c
res fo
tO
O>
M-
O)
E O
(O -i-
S- +->
fO fO
Q- S_
O)
S-
llAHdOdOlHD-SnHOHdSOHd
nAHdOdOlHD-SnaOHdSOHd
52
-------
Application of these guidelines suggests a suboptimal sampling program
that consists of full measurements only in segments 2 and 5 at t = 3, 8, 9,
and 10. The rationale for this strategy is that most of the variances do
not start decreasing until t = 3 and that the maximum variance decreases
occurred in the region t = 6 and t = 11. Both sides of the bay are sampled
because of the segmented structure of the system. Segment 5 is chosen be-
cause it gives the most improvements overall, and segment 2 is chosen from
the west side because it is connected to two segments. However, estimates
from segments 2 and 4 have similar variance levels, so the choice between
these segments is somewhat arbitrary. This suboptimal sampling strategy re-
quires approximately 8 percent of the effort of the full measurement
strategy.
A summary of some of the major characteristics of the suboptimal strat-
egy (as compared to the full measurement case) is presented in Tables 9 and
10. The current best estimates of the standard deviations of the parameters
are shown for all model segments in column (a) of Table 9. The standard
deviations attainable with the full measurement strategy are shown in column
(b). The standard deviations are reduced by approximately an order of mag-
nitude in segments 3 and 5, whereas approximately a half order of magnitude
reduction occurs in segments 1, 2, and 4. Larger reductions in segments 3
and 5 are probably due to the counterclockwise flow pattern in the bay and
the structure of the system equations. Thus, for parameter identification,
more information is gained from measurements in segments 3 and 5 compared to
segments 1, 2, and 4 for the same measurement cost.
The standard deviations for the parameters with the suboptimal measure-
ment strategy are shown in column (c) of Table 9. Substantial reductions in
the standard deviations are obtained even though this strategy involves only
8 percent of the effort of the full measurement strategy. Again, the most
significant improvements are obtained in segments 3 and 5. It is interest-
ing to note that estimates in segment 3 are more accurate than in segment 2
even though no measurements are performed in segment 3 while measurements
are performed in segments 2 at t = 3, 8, 9, and 10.
Important aspects of state variable (as opposed to parameter) estimation
are presented in Table 10. Standard deviations of the estimates of chloro-
phyll a, herbivorous zooplankton, and carnivorous zooplankton at their peak
population concentrations are presented for a number of cases. The calcu-
lated standard deviation at the peak population in each segment with the
full measurement case is given in column (a). Column (b) presents the
standard deviations at the peak population with the suboptimal strategy.
Suboptimal standard deviations are approximately an order of magnitude
larger than the full measurement standard deviations, except for chlorophyll
a in segments 2 and 5. Chlorophyll a is estimated more accurately in these
segments because the suboptimal strategy involves measurements in segments 2
and 5 during the critical peak growth period for chlorophyll a.
Herbivorous and carnivorous zooplankton are not estimated nearly as ac-
curately as chlorophyll a in segments 2 and 5. The reasons for this are
twofold. First, the suboptimal measurement strategy does not contain mea-
surements during the important growth periods for herbivorous and carniv-
53
-------
o;
0
i -
UL_
oo
LU
1
>
ry
LU
1
LU
CXL OO
CC LU
Q_ i i
O
U_ LU
O (
oo cc:
0 to
t t
1 1
< z
1 < 1 1 1
^^ ^-
LU LU
o o:
Q OO
eC LU
Q 2:
t^.
«=c oo
1 ZD
oo o
1 1
LU CC
1 >
eC
<
c '
.
cn
LU
m
1
-a
CU CO
-M > c:
i c o i o
3 CU S- <-
S- S- !- T- -O
CU 13 d C
4-> CO _E -i- O
14- ro +J O
ef CU -i
E 5
-o
CU
i -l-> > S-
i c o a.'
3 CU S- 4-> CO
C|- E O- CU CU
cu E E 3
T3 S- S- -i- ro r
~-- CU 3 S- rO
-1 J t/) C" f« ^
M- ro +J Q.
eC CU <-
E 3
i
i CU
S- ro S-
» a> i E 3 -M
O -M _Q !- CO C
- <4- 3 -(-> ro CU
0 E
1
CU
S- S-
* CU i 3 +->
jO -Mi CO C
- 14- 3 ro CU
13 CO
O CU
s-
CU
O 3
c
^_
cu
cu
E
S-
rd
Q-
o
^t~ *3~ r^
LO i OJ i OO
O cn i i LO r-~ o
UD O OJ O OJ O O
r i O O O O O
o o o o o o o
o
UD OJ CO LO OO
i o LO cvj cn *3- cvj
oo o f~~- r--. cn r o
LO CVJ O O O O O
o o o o o o o
o o o o o o o
o
oo
cn «3- co co oj
oo oo cn UD LO co O
oo cn to o cn i o
«* i O O O O O
o o o o o o o
UD
en r~>. LO LO
1JD 1 !*» i 1"-- CO O
CO <3- OJ O CO O O
CVJ r O O O O O
o o o o o o o
LO i
O LO O i O r-- oo
i OO CD i O i O
cn cvj i o oj o o
o o o o o o o
cu
ro
S-
o
-C CU !-
-4-> -M O -M
S ro CU T- »O
OS- -M O -M S-
S_ rd !- rd
cn c s- -t-> S- i
O CU n3 i
-a -i- -M c s- r >>
CU +-> rd O i -C
4-> ro S- !- i >, Q.
> Q. S-
rs CL c s- .c: o o
+J CO !- !- O- S- 1
rd CD N Q- O O JZ.
CO S- rd CO S- i CJ
S- CO O jc
c: c cn S- i o o
o o -E -t->
-M -M c c: o o
^: .* O o -t-> oo
C C -M +J O 3
rO rd -i} >-> O O S.- -M O
^: -E O O O
<£> r O O O O
O 0 O 0 O O O
o o o o o o o
OJ o
cn r-» LO ID
oo LO *3- «j- co oj i
OJ ID VO O O i O
«* O O r O O
o o o o o o o
ID
OJ ^t" ^^
r^« f^* Ovj ^xj" oo
LO co r^. i o oo O
cn i oj o «* o O
OJ ' O O O O O
o o o o o o o
LO i
O LO O O 1^ 00
r OJ O 1 O 1 O
cn oj r o cvj o o
o o o o o o o
CVJ CVJ OJ OJ OJ OJ OJ
cu
ro
o
-C CU -i-
+J -M O 4->
3: re CU -r rd
OS- .M O -M S-
S_ rd -r- rd
cn c: s_ -M s- r
O CU rd i
-0 -i- -M C S- , >,
CU +-> rd O i -C
+-> ro S- -i- i >> Q-
rd S- +-> r .c O
S- -i- cn ro >> CL i-
3 Q- C S- .C O O
rO CU M Q. O O -C
CO S- rO CO S- i O
S- CU O _C
c c cn s- u o
O O _C -M
-M -M C C O O
jy: ^K; o o -t-> co
C C -M +-> O 3
rO ro * v^ 4-> c J-
> c c: cu o
D. O. ro ro c Cn_c
O O i i O o O.
-P -M CL CL.a S- co
>> >^> O O S- 4-> O
£1 .c O O rd .- ^:
D_ a. M r-J c_3 2: Q.
r^ co
cn [« i~^ o r~»
oo LO cn LO i^^ ID o
< oo * o i^ r o
CO OJ O O CD O O
o o o o o o o
o o o o o o o
LO i
CO LO CO O CO
cn to co LO cvj i o
"JD r. CO O ID r O
OJ r O O O O O
o o o o o o o
o o o o o o o
1^,
cn cvj o
CVJ i CVJ «3- *
r^ cn o «3~ i cn O
CO CVJ OO O LO O O
o o o o o
o o o o o o o
OO I^~ CO
r*^ *^" r*^. co cn t-D <~~
^j~ ^d~ *JD o cn cvj o
ID LO O O O O O
O O O O CD O CD
O O O O O O O
LO i
O LO O i O f**. CO
r CVJ O r- O i O
en cvj i o oj o o
o o o o o o o
CO OO CO OO CO CO CO
cu
ro
S-
o
-c: cu [-
-t-> +-> O -M
S ro O -M S-
S_ rd -i- ro
cn c s- +-) s- i
O CU ro i
T3 -i- -M C S- i >)
CU 4-> ra O i .C
-M ro S- -r- r >> O.
rO S- -M i -C: o
S- <- cn ro >i Q. S-
3 a. c s- -c o o
rO CU M CL o O -E
co S- ro co S- > O
S- CU O .E
E E Cn S- r- O O
O O -E -M
-M 4-> E E O O
-*: .* O O +-> CO
E E 4-J +-> O 3
ro rO ^^ -^ -M E S-
i r E E CU O
Q. CL rO rO E Cn_E
O O r i O O Q-
-M -P Q. CL r*t j_ co
>i >> O O S- »-> O
J= .E O O (d -i- JC
Q- Q- M IVI C-3 ^ D_
^-^
-^
eu
3
>r_
-M
O
CJ
54
-------
. *.
~o
cu
r-
O
C_)
cn
UJ
1 J
CO
^£
1
T3
CU CO
r 4-> > C
i c o i o
3 o> S- res -r-
Ol CU E 4-> '-
cu 3 c c
4_ res 4-i o
ea: cu -i-
E S
-a
cu
i 4-> > S_
i C O CU
3 O) S- 4-> CO
c&_ g= Q_ CU CU
' > O) E E 3
ID J_ S- -r- res i
- cu 3 S- res
45 t/1 -^* fO ^
<4- reS 4-J Q-
=i: CU <
E 2
1
r CU
S_ res s-
' - CU 1 E 3 4->
O 4-) J3 -i- CO C
M- 3 4-> res a>
cC CO O- CU E
0 E
,
cu
s- $-
- O) i 3 4->
1 " <4 13 res cu
<^ cu. E
cu
[ * 4-3
c res
^ CU E
res s_ -r-
S- 4->
13 CO
c_> cu
S-
r O>
'aJ'E
C_> 3
J_
cu
4_>
OJ
E
res
S-
res
Q-
^i- cn CD
* -vi- cn i co
o i «3- i cn cn o
cn o CM o *d- o O
CM O O O O O
o o o o o o o
O 00
O I^ CM r
co ^J- cn co in cn CM
co cn r-~ o *3~ i o
00 i O O O O
o o o o o o o
O O O 0 O O O
O CM CO *d"
^J- CO O CD CD *3" i
cn r^> r^^ o co i o
CD i O O i O O
o o o o o o o
o cn
CM CO ^f i "3-
cn o CM i co o o
r^ CM co o Ln i O
CO i O O O O O
o o o o o o o
cn r
o cn o i o r«~ co
i CM O i O i O
cn CM i o CM o o
o o o o o o o
*3- <* ^t- «3- «=j- ** ^±
cu
4->
res
S-
0
^: a> <-
4-) 4-> O 4->
S res cu -i res
OS- 4-> O 4-> S-
S- res T- res
en c: S- 4-> S- i
O cu res i
T3 <- 4-> C S- i >,
a; 4-> res o i .c:
4-> res s- -i- r >> Q.
res s_ 4-> i jc o
s- -i- en res >i Q. s-
3 o. c S- -c o 0
4-3 CO -r- -f- Q. S- 1
res cu N Q. o o -C
co S- res co S- i o
s_ cu o .C
c c: cn s_ r o o
O O -C 4->
4-> 4-> C C U O
^* ^£ O C> 4-» CO
C C 4J 4-> O 3
res res ^^ NX 4-3 c S
r C C 0) 0
o_ Q- res res cz cn ^.
O O i r O O O_
4-> 4-> Q. Q.JD S- tO (
>> >> O O S_ 4-> O i
jn .c o o res -i- ^i
D- D- M rxi o z: a. |
55
in cn
co t oo ^)- LO
tn CD co co r-» i o
co t CM o cn CM o
CD CM O O O O O
o o o o o o o
O 0 O O O O O
CO ^J"
i co r~- i CD
rv. o co CM i i o
cn «=j- CM o en i o
CO i O O O O O
o o o o o o o
o o o o o o o
cn
cn co CM cn
cn o i CD en CM
r-. co r-. CM co cn O
i CO O O r O O
CM O O O O O O
O O O O O O 0
cn in co o
4-3 O 4-3
2 reS CU ' reS
OS- 4-> O 4-> S-
S- reS -i res
cn c S- 4-> S- r-
O cu res i
"D -r- 4-> C S- 1 >,
cu 4-> res o i -c
4-> res s- -i- i >, o.
res s- 4-J " .c o
s_ -i- cn res >> Q. s-
3 Q- C S- -C O O
4-3 CO !-!- Q. S- 1
res cu N O- o O x:
co s- res co S- i o
S- CU O .C
c c: cn s- i u o
O 0 -C 4->
4-3 4-> c c: u o
^ ^t O O 4-> CO
C C 4-> 4-> O 3
res res _*: _i*: 4-3 c: i-
i i C C QJ O
O- O- res res c: cnx:
O O i i O o Q.
4-> 4-J D- CL_D $_ co
>> >i 0 0 S- 4J O
-C ^: o o res -r- x:
Q- D- M r~>l C_3 Z Q-
-------
Q<
o
u_
oo
1 H
1
i
O-
O
D_
^
LU LU
CL. l I
03
LL. LU
O H-
^^
OO O£
z I
o oo
i i
& l~~
i i iii
^* ^~
LU LU
Q o:
§ <:
<=C LU
r"*t ^"
oo o
O O£
LU
<=C
^3
C_)
O
O
LU
1
CO
^
I -f_>
I C~
3 CL)
x 4 E
-O CD
S-. S-
CU 3
M ro
=£. 0)
E
-M
i C
3 CU
+- E
-^ CU
O S- S-
0) 3
_L j ,/ JO
<=C 00
S-
~-, OJ
rO +J
t|_
^
0) 00
> C
0 r 0
S_ ro <-
Q.-I- +J
E *-* '("~
r- -r- T3
C C
J^ ! O
-(-> 0
2
CU
> S-
O Q-
1
2
|
i 0)
ro S-
E =3 4J
Q. CD £
0 E
1
CU
s_
i 3 4->
r 00 C
=3 ro CU
H_ r^
ro oo
O> CU ^^
E Q- CU
f^ M- 2
0 --^
S-
i CU
O) i J3
O Q) 3
s: o c
00
0)
f~i
rO
»i
S_
ro
^»
CU
rO
j S
OO
oo o
F-^. 00
LO ro i
ro o o
CM 0 0
o o o
CM 00
ro oo i
OO O O
CM O O
o o o
CT) VO
o LO r^.
sf ro i
CM O O
CM O O
r O
00 CO
r^ oo i
OO O O
CM O O
O O 0
oo sj- r>-
r' i r
c: c
0 0
"^ S
c c;
ro ro
Q. Q.
0 O
O 0
ro N N
i OO OO
' Z3 ""^
>5 O 0
JT s- s-
Q. O O
0 > >
S_ -r- T-
O J3 C
±- s-
JZ CU ro
C_5 3Z O
CO
LO CO
GS LO r
O O O
00 O O
o o o
f^v. ^
r OO
i LO i
00 O O
CM O O
o o o
LO Sf
i f^, r^.
Sf Sf r^
00 O O
O 0 O
LO O
co cr>
O LO i
i O O
OO O O
o o o
CTi sf r*.
CM CM CM
5- j-
0 0
+2 t^
!Z C
ro ro
'Q.'Q.
0 0
o o
ro N N
i oo oo
=3 3
>i 0 0
-c s- s-
Q- O O
0 > >
0 JD c
s- s-
-CT Ol ro
C_3 HI C_)
O sf
r VO
i OO r
r-^ o o
sf O O
o o o
sf oo
O to
OO OO r
CM O O
sf O O
o o o
r*-* o
LO CT) IO
OO CM .
OO O O
oo o o
i LO
n to
sf OO i
en o o
sf O O
O O 0
to oo to
oo oo oo
c: c
0 0
"*"* ^
c c
ro ro
'Q.'Q.
0 0
0 0
rO NJ tsl
I 00 00
i ^3 ^3
>> 0 0
JT s- s_
CL O O
o > >
o Jo c
r- i- S-
JT OJ ro
O 31 O
r^
LO Sf
O to
CM O
CM O O
CTi O O
r- O O
O 0 O
r--
sf sf
CM O
LO O O
oo o o
o o
o o o
ro
oo sf
o to
i CM O
CM O O
sf O O
r- 0 0
00
to sf
o to
CM O
sf O O
O O O
CM O O
O 0 O
O LO r-^
sf sf sf
c c:
0 0
4-> 4->
c c
ro ro
Q. CL
O 0
0 0
ro tsl N
i 00 00
i 13 3
>> 0 0
J^ ^. i-
Q. O O
0 > >
i- -r- <-
O JO C
r- S- S-
jz rjj ro
0 3= C_>
sf
00 >
OO CM
oo r^ o
en o o
. 00
O O 0
r CM
CM r
r^ CM
CT> O O
to o o
r- 0 0
000
CTi CO
LO O
r^ LO CM
CM O O
CM O O
000
CM LO
Sf r
r^ CM
to o o
CTi O O
. 00
o o o
OO r LO
LO LO LO
c c
0 0
-(-> 4->
c c
rO ro
Q. CL
o o
0 0
(T3 hsj N
i 00 C/l
r 33 ^?
>» 0 0
JT s- s_
Q. O 0
0 > >
S_ ,- -r-
O r^ c~
S- S-
JZ CU ro
c_> re o
56
-------
orous zooplankton. Second, the variances for these two state variables
change most rapidly for t > 10, but the suboptimal measurement strategy does
not include measurements after t = 10. If peak population prediction of
zooplankton is desired, the measurements at t = 8 and 10 should probably be
eliminated and replaced by measurements t = 11 and t = 17. In such a case,
the suboptimal strategy would consist of full measurements at t = 3, 9, 11,
and 17.
It is also of interest to determine how the uncertainty of model states
changes when knowledge of the model initial states and parameters improves
when a full measurement strategy is employed. The first case involves full
measurements with an order of magnitude reduction in the initial standard
deviations of the parameters. The second case considers full measurements
with an order of magnitude reduction in the initial standard deviations of
the state variables. Physically, the first case corresponds to a systematic
improvement of the parameter values. This might occur during the season
year of a sampling program if the first year of monitoring resulted in an
order of magnitude reduction of the parameter standard deviations, or it
could occur as a consequence of laboratory or field experiments. Case 2
could correspond physically to a situation where an extensive initial
sampling effort allows the standard deviations of the state variables to be
reduced by an order of magnitude.
The results of these cases are summarized in columns (d) and (e) of
Table 9 and columns (c) and (d) of Table 10. A comparison of the results in
columns (d) and (e) with column (b) in Table 9 indicates that standard devi-
ations are reduced by approximately a factor of two to four and that im-
provements of parameters gives somewhat better results than improvements of
initial conditions for parameter estimation accuracy. It is interesting to
note that the peak concentration standard deviations are approximately the
same for all three full-measurement cases (see Table 10). Thus, an order of
magnitude reduction in the initial parameter or state variable standard de-
viations does not increase the capability of making more accurate peak value
predictions. This is probably because of large increases in the standard
deviations of chlorophyll a, carnivorous zooplankton, and herbivorous phyto-
plankton during critical growth periods, and only measurements in these re-
gions can insure reasonably accurate values. This is in sharp contrast to
the results presented earlier with no measurements where improvements in the
state initial conditions and parameter values significantly reduce the cal-
culated standard deviation at the peak populations (see Table 8).
SUMMARY
This section contains a preliminary investigation of the potential ap-
plicability of Kalman filtering techniques for concentration and parameter
identification associated with a phytoplankton and nutrient cycling model
for Saginaw Bay. It has been determined that the accuracy of uncalibrated
model predictions is limited. Therefore, field monitoring and laboratory
research programs are necessary to improve model predictive capabilities.
Cost-effective monitoring programs must be designed using trail-and-error
optimization rather than automatic optimization because of the relatively
57
-------
large number of differential equations required to describe the annual
nutrient and plankton cycles.
A suboptimal, but effective, sampling program has been developed to im-
prove the accuracy of the model calculations. An example monitoring program
for Saginaw Bay involves approximately 8 percent of the maximum number of
samples, it has been determined that this sampling program applied in the
context of the Kalman filter and the system model can significantly reduce
the uncertainty associated with predicted values of the state variables such
as chlorophyll a. Several tables are presented showing these results. Al-
though the limited analyses in this study are preliminary and incomplete, it
is concluded that Kalman filtering techniques can be applied to determine
the accuracy of limnological model predictions as a function of the uncer-
tainties associated with the model parameters or coefficients and initial
conditions. If the accuracy of such models is unacceptable, even with good
knowledge of parameters and initial conditions, it may be necessary to
supplement model predictions with field measurements of the concentrations.
In these cases, Kalman filtering and optimization techniques can be used to
determine efficient and cost-effective monitoring programs.
58
-------
REFERENCES
Beck, B., and P.C. Young. 1976. Systematic Identification of DO-BOD.
Journal of the Environmental Engineering Division, Vol. 102, No. EE5,
pp. 909-927.
Canale, R.P., L.M. DePalma, and A.H. Vogel. 1976. A Plankton-Based Food
Web Model for Lake Michigan. In: Modeling Biochemical Processes in
Aquatic Ecosystems, R.P. Canale, ed., Ann Arbor Science, Ann Arbor,
Michigan.
Chapra, S.C. 1975. Comment on an Empirical Method of Estimating the Reten-
tion of Phosphorus in Lakes by W.B. Kirchner and P.O. Dillon. Water
Resources Res., 11: 1033-1034.
Chapra, S.C. 1977. Total Phosphorus Model for the Great Lakes. ASCE J.
Environmental Engr. Division, Vol. 103, No. EE2, pp. 147-161.
Davenport, W.B., Jr. 1970. Probability and Random Processes. McGraw-Hill,
New York. 542 pp.
DePalma, L.M. 1977. A Class of Measurement Strategy Optimization Prob-
lemswith an Application to Lake Michigan Surveillance. Ph.D. dissert-
ation, University of Michigan, Ann Arbor, Michigan. 122 pp.
Dillon, P.J. and W.B. Kirchner. 1975. Reply to Chapra (1975). Water Re-
sources Res., 11: 1035-1036.
Gelb, A. 1974. Applied Optimal Estimation. MIT Press, Cambridge,
Massachusetts. 374 pp.
Great Lakes Water Quality Board. 1976. International Surveillance Program
for the Great Lakes. In: 1975 Annual Report, Great Lakes Water
Quality, Appendix B, Windsor, Ontario, Canada. 255 pp.
Harrison, M.J., R.E. Pacha, and R.Y. Morita. 1972. Solubilization of In-
organic Phosphates by Bacteria Isolated from Upper Klamath Lake Sedi-
ment. Limnology and Oceanography., 17: 50-57.
Hutchinson, G.E. 1957. A Treatise on Limnology, Vol. 1. John Wiley and
Son, New York. 1015 pp.
Jazwinski, A.H. 1970. Stochastic Processes and Filtering Theory. Aca-
demic Press, New York. 376 pp.
59
-------
Junge, C.E. and R.T. Werby. 1958. The Concentration of Chloride, Sodium,
Potassium, Calcium and Sulfate in Rain Water over the United States. J.
Meterology., 15: 417-425.
Kalman, R.E. 1960. A New Approach to Linear Filtering and Prediction Prob-
lems. Trans. ASME, J. Basic Engr., 82D: 34-35.
Kalman, R.E. and R. Bucy. 1961. New Results in Linear Filtering and Pre-
diction Theory. Trans. ASME, J. Basic Engr., 83D: 95-108.
Kirchner, W.B. and P.J. Dillon. 1975. An Empirical Method of Estimating
the Retention of Phosphorus in Lakes. Water Resources Res., 11:
182-183.
Krogh, F.T. 1969. VODQ/SVDQ/DVDQ - Variable Order Intergrators for the
Numerical Solution of Ordinary Differential Equations. Jet Propulsion
Laboratory TU Document No. CP-2308, NPO-11643. 22 pp.
Larsen, D,P. and H.J. Mercier. 1976. Phosphorus Retention Capacity of
Lakes. J. Fisheries Res. Board of Canada., 33: 1742-1750.
Lorenzen, M.W., D.J. Smith, and L.V. Kimrnel. 1976. A Long-Term Phosphorus
Model for Lakes: Application to Lake Washington. In: Modeling Bio-
chemical Processes in Aquatic Ecosystems, R.P. Canale, ed., Ann Arbor
Science, Ann Arbor, Michigan.
Michalski, M.F.P., M.G. Johnson, and D.M. Veal. 1973. Muskoka Lakes Water
Quality Evaluation, Report No. 3: Eutrophication of the Muskoka Lakes.
Ontario Ministry of the Environment, Toronto, Ontario, Canada. 57 pp.
Moore, S.F. 1971. The Application of Linear Filter Theory to the Design
and Improvement of Measurement Systems for Aquatic Environment. Ph.D.
Dissertation, University of California at Davis, California. 218 pp.
Murphy, T.J. and P.V. Doskey. 1975. Inputs of Phosphorus from Precipita-
tion to Lake Michigan. EPA-600/3-75-005, U.S. Environmental Protection
Agency, Duluth, Minnesota. 27 pp.
O'Connor, D.J. and J.A. Mueller. 1970. A Water Quality Model of Chloride
in Great Lakes. ASCE J. Sanitary Engr. Div., 4: 955-975.
Quinn, F.H. 1975. Lake Michigan Beginning-of-Month Water Levels and
Monthly Rates of Change of Storage. National Oceanic and Atmospheric
Administration Report No. TRERL326-GLERL2, Boulder, Colorado. 32 pp.
Quinn, F.H. 1977. Annual and Seasonal Flow Variations through the Straits
of Mackinac. Water Resources Res., Vol. 13, 137-144.
Richardson, W.L. and V.J. Bierman, Jr. 1976. A Mathematical Model of Pol-
lutant Cause and Effect in Saginaw Bay, Lake Huron. In: Water Quality
Criteria Research of the USEPA, EPA-600/3-76-079, Corvallis, Oregon.
cp. 138-158.
60
-------
Sage, A.P. and J.L. Melsa. 1971. Estimation Theory with Applications to
Communications and Control. McGraw-Hill, New York. 529 pp.
Saylor, J.H. and P.W. Sloss. 1976. Water Volume Transport and Oscillatory
Current Flow through the Straits of Mackinac. J. Physical Oceanography,
Vol. 6, 229-237.
Sonzogni, W.C. 1974. Effect of Nutrient Input Reduction on the Eutrophica-
tion of the Madison Lakes. Ph.D. Dissertation, University of Wisconsin,
Madison, Wisconsin. 352 pp.
Stewart, K.M. and S.J. Markella. 1974. Seasonal Variations in Concentra-
tions of Nitrate and Total Phosphorus and Calculated Nutrient Loading
for Six Lakes in Western New York. Hydrobiologia., 44: 61-89.
Thomann, R.E., D.M. DiToro, R.P. Winfield, and D.J. O'Connor. 1975. Mathe-
matical Modeling of Phytoplankton in Lake Ontario - 1. Model Develop-
ment and Verification. EPA-660/3-75-005, U.S. Environmental Protection
Agency, Corvallis, Oregon. 177 pp.
U.S. Army Corps of Engineers. 1975. Lake Erie Wastewater Management Study:
Preliminary Feasibility Report, Buffalo, New York. 167 pp.
61
-------
APPENDIX A
DISCRETE KALMAN FILTER AND OPTIMIZATION OF
MEASUREMENT STRATEGIES
INTRODUCTION
The Kalman filter equations for a scalar differential equation process
model with discrete, full state measurements were presented in Section 4 to
develop insight into the basic procedure. In this appendix, the Kalman fil-
ter equations for a discrete process model (for example, the discretization
of a system of differential equations by numerical integration formulas) and
discrete measurements will be presented along with the method employed to
minimize the cost subject to constraints on the state estimate variance.
DISCRETE KALMAN FILTER
In this section, the process model is assumed to be of the form
x(k+l) = <|>k x(k) + w(k), (k = 0,1,...,k) (A.I)
where x(k) is an n-vector describing the state at index k (for example, time
tk)» ^k is an nxn matrix describing the known relation between the state at
x(k) and x(k+l), and w(k) is the model equation error at k (for example,
model inadequacy, lack of knowledge of parameters in 4^, etc). (Since the
index k is usually associated with time, k will usually be referred to as
time instead of index.) Equation (A.I) is a linear difference equation;
however, most water quantity models are nonlinear. The Kalman filter and
optimization equations will be presented for the linear system of Equation
(A.I). It will be shown how to transform nonlinear water quality process
models into the form of Equation (A.I) later in this section.
Equation (A.I) may result naturally (for example, in Section 5 a dis-
crete model is employed to describe year-to-year changes of phosphorus,
chloride, and sinking rate) or from a discretization of a differential equa-
tion model (for example, in Section 6 the LAKE-1 differential equation model
is discretized to form a system of difference equations).
As with the development of Section 4, to apply the Kalman filtering ap-
proach to Equation (A.I), statistics associated with the initial state,
x(o), and the model error, w(k), must be supplied. Again it is assumed that
the model error is zero-mean, sequentially uncorrelated (white) noise, that
is,
62
-------
EJw(k)] = 0 , E[w(k)w(j)T]= Q(k)5kj (A. 2)
( 1 , k=j
where 5, . = , , . , and the initial state statistics are
KJ (. U , KrJ
E[X(O)] = xQ , PQ - E[(x(o)-xo)(x(o)-xo)T]. (A-3)
In Section 4 it was assumed that the full state was measured at each
measurement time. In this section, the general situation in which measure-
ments of linear combinations of the states are made at certain times is con-
sidered. A linear combination of states at time k is a weighted sum of the
individual states. If a row vector Cm is introduced to write concisely this
weighted sum, then a measurement of type m at time n may be denoted
= cm xCO + vmk> (A. 4)
where ym^ is the scalar measurement of the linear combination Cm x(k) and
vmk ^s the associated measurement error with
Rmk (A.5)
If at any time k all states are measured, then the Cm set is the identity
matrix. If M types of measurements are made at each time k = 1, ..., N,
then M times N measurements are made in total. As an example of this nota-
tion, consider an eutrophication model with states x-j, ..., X3 corresponding
to the eight water quality variables: phytoplankton, zooplankton, dissolved
phosphorus, particulate phosphorus, nitrate, ammonia, organic nitrogen, and
silicon, respectively. The measurement type corresponding to a measurement
of only zooplankton biomass would be
C-] = [0 1 0 0 0 0 0 0], (A. 6)
and the type corresponding to a Kjeldahl nitrogen measurement (ammonia plus
organic nitrogen would be
C2 = [0 0 0 0 0 1 1 01. (A. 7)
The error variance for a measurement of type m at time k, Rm|<, depends on
the number of vertical hauls made at the station. The variance of a total
phosphorus measurement would depend on the number of subsample digested and
analyzed.
The Kalman filter combines the information provided by the model in
Equation (A.I - A. 3) and the measurements in Equations (A. 4 and A.5) to com-
pute the conditional mean estimate of the system state, denoted by x(k),
which is the estimate based on all measurements taken prior to and at time
63
-------
kJ The filter is recursive in that it permits x(k) to be computed from
x(k-l) and the measurement vector at time k. Recursive estimation is a com-
mon procedure, for example, least squares curve fitting is commonly formu-
lated recursively so that each data point is processed as it is received
rather than the data being batch processed when all are available.
As in Section 4, the Kalman filter equations are applied recursively in
two stages. Given x(k-l), P(k-l), the first involves the propagation of x
and P to the next measurement time, k. The state prediction is simply the
difference equation without the model error, tha+ is,
x(k-) = k.-| x(k-l),
and the covariance propagation is
Q(k-D,
(A.8)
(A.9)
which has an interpretation analogous to Equation (4.8) in Section 4, that
is, the covariance is influenced by the dynamics through the first term and
the model inadequacy through the second term of Equation (A.9).
The second stage involves processing the measurement at time k, and the
equations are essentially the same as Equations (4.6 - 4.8) in Section 4
since discrete measurements were assumed there. In particular
M
P(k) =
r r "
m m
-1
R
'mk
P(k")
and
(A.10)
x(k) - x(k") + P(k)
m=l
ymk-Cmx(k"
(A.11)
The difference between these equations and Equations (4.6 - 4.8) result from
the following: the special way that the measurements are treated at each k
in Equation (A.4) (that is, as a set of scalar measurements); the fact that
a full state measurement is not assumed; and, because this form is more con-
venient for the measurement strategy optimization problem. Also, because
the measurements are treated as a set of scalar measurements at a given k
(which is advantageous for optimization of measurement strategy purposes),
the Kalman gain provides little insight for this case and is not introduced.
As with the case of Section 4, note that if Rm|< is relatively large (indi-
cating a relatively bad measurement), then the coefficient of P(k~) in Equa-
tion (A.10) is approximately I, which implies P(k) * P(k"), that is, the
n this section, x(k) is analogous to x(tk) of Section 4. The + sign is
not employed because it becomes awkward for indices of the form k-1, k+1,
etc.
64
-------
measurement has little influence. Conversely, if the R^ are relatively
small, the inverse of j
I + Y. CT C /R ,
m = 1 m m mk
will also be relatively small, which results in a reduction of P(k~) in
forming P(k).
MEASUREMENT STRATEGY OPTIMIZATION
It is evident from Equation (A.11) that the estimate for x(k) for all
k = 1, ..., K cannot be computed until the measurements have been taken.
that is, until ymk are available for each m = 1, ..., M and k = 1, ..., K.
However, the accuracy of the estimates, which is manifested in the covari-
ance matrix P(k) for k = 1, ..., K, can be precomputed from Equations (A.9)
and (A.10) once the necessary matrices, vectors, and scalars have been
specified. This point is vital to the technique for definition of optimal
measurement strategies. The estimation error covariance matrices P(k) can
be computed for every k = 1, ..., K once the measurement strategy has been
specified for k = 1, ..., K. Computation need not wait until the actual
measurements have been taken. The Kalman filter exhibits this property be-
cause the statistics of model uncertainty and measurement uncertainty are
not computed during the processing but rather are assumed known prior to
time zero. The manner in which the statistics might be quantified are de-
monstrated in the applications in Sections 5 and 6.
The vector estimates, x(k), for k - 1, ..., K are comprised of n indi-
vidual state estimates x-j(k) for i = 1, ..., n. The accuracy of x-j(k) is
reflected in the estimation error variance of x-j(k), tnat is the itn dia-
gonal entry of the estimation error covariance matrix P(k). It is of con-
siderable interest to investigate the effect of the measurement strategy on
the variances P-j-j(k). In particular, it may be of interest to calculate the
minimum-cost measurement strategy that satisfies constraints
P-j-jtk^Pmax-jk for i = 1, ..., n and k = 1, ..., K. (A. 12)
In Equation (A.12), P-j-j(k) is a scalar, and the Pmaxik are bounds that are
imposed to guarantee that the resulting estimates of x-j(k) are of sufficient
accuracy to serve some purpose. In Section 5, for example, values of Pmax-j^
are chosen to guarantee that the state estimates will be accurate enough for
long-term pollutant trend detection.
The optimization problem of this section will be formulated to be com-
patible with DePalma (1977). In order to accomplish this, assume that a
measurement ym|/ is actually the arithmetic average of a number of multiple
measurements. In particular, suppose L^ multiple measurements are made of
the form
y1' = Cmx(k) = r1' for i = 1, ..., l_k (A. 13}
65
-------
where ri is a measurement error (associated with y1) and zero mean and vari-
ance Rm. The resultant measurement is the arithmetic average of these L|<
measurements, or
If the correspondence
Lk
rmk = IT £ r' (A-
K 1 I
is made, then Equations (A.4) and (A.14) are equivalent. Furthermore, if r1'
are independent random variables, then the variance of the resultant mea-
surement error is inversely proportional to Lk,
Rmk = Rm/Lk- (A.16)
For many types of measurements, the number of multiple measurements is
limited by such practical considerations as the number of stations that can
be sampled in one day or the number of analyses a laboratory can schedule.
For such situations in which each measurement must be computed from no more
than Lmax multiple measurements,
0 £ L < Lmax for each k = 1, ..., K. (A.17)
Furthermore, the cost of multiple measurements is usually proportional to
the number of multiple measurements. Therefore, the total cost of a mea-
surement strategy would be
K
k^1 a Lk (A.18)
where a is the cost of taking one measurement of each type m = 1, ..., M.
The unit cost a and the maximum number of multiple measurements Lmax are as-
sumed not to be functions of time k.
In the optimization problem, Lk is interpreted as a variable that can be
chosen from the interval defined by Equation (A.17). Besides having control
over the measurement strategy, it may be desirable to control the model
error covariance matrices defined by Equation (A.9). The motivation for the
formulation that follows is the trend detection problem examined in Section
5. In that problem, one component of model error is caused by tributary
load uncertainty. The variance of this error, however, depends upon the
number of days for which the tributaries are sampled. In this optimization
problem, both a measurement strategy (in the sense defined in the preceding
66
-------
paragraph) and a tributary sampling program will be sought that minimize
cost and satisfy Equation (A.12).
First, define an nxn matrix at each time k
n(k) = Q(k)
(A.19)
and consider a situation in which fi(k) may be decomposed into the sum of an
uncontrollable matrix £2u(k) ^nd a controllable matrix ^c(k). Furthermore,
assume that only the diagonal elements of fic(k) can be controlled and that
other elements are zero. DePalma (1977) has demonstrated that the error
variancescof tributary load estimates can be interpreted as the diagonal
entries ^-j-j(k) of a controllable matrix ^c(k). Control over these elements
is feasible if it is possible to control the number of days, D(<, for which
tributary water samples are, collected. A cost of Dk will be assigned to the
tributary sampling program. The total cost, over K years, is then
K
E
(A.20)
The optimization problem considered by DePalma (1977) requests the mea-
surement strategy (Lk for k = 1, ..., K) and the number of days (Dk for
k = 0, ..., K-l) which minimizes the total cost (sum of Equations (A.18) and
(A.20)) subject to constraints on estimation error variances defined by
Equation (A.12). An algorithm is described by DePalma (1977) to solve this
problem, and it is applied to the trend detection problem of Section 5. The
theory behind the algorithm utilizes the theory of mathematical programming
and will not be presented here.
APPLICATION TO NONLINEAR MODELS
In the previous sections of this appendix, only linear models were em-
ployed as a source of information to estimate the state of a physical
system. Most physical systems,
mined in this report, cannot be
additive modeling uncertainties
X(k+l), is a nonlinear function
tainty in this transition cannot
including the water resource systems exa-
described adequately by a linear model with
In general, the system state at time k+1,
of the state at time k, X(k), and the uncer-
be interpreted as additive. A very general
nonlinear stochastic difference equations of the form
= fk[X(k), W(k)l (A.21)
will be considered where f is a vector of n nonlinear functions, each de-
pendent on some or all of the states X(k) and the uncertainties W(k). The
initial__state_and the random uncertainties have means and covariance ma-
trices X(0), W(k), P(0), and Q(k), (k - 1, ..., K), respectively.
The Kalman filter is a scheme for computing the conditional mean esti-
mate of X(k), based on a linear model and measurements. Exact schemes for
67
-------
computing the conditional mean of X(k) for a nonlinear model have not yet
been developed. Approximate schemes have been derived and are presented in
Sage and Melsa (1971). Most of these methods solve^two couple difference
equations, one for an approximate conditional mean X(k) and the other; for an
approximate estimation error covariance matrix P(k) associated with X(k).
Because the equations are coupled and X(k) depends upond the actual measure-
ment values, the covariance matrix P(k) also depends upon the actual mea-
surement values, the covariance matrix P(k) also depends on the measure-
ments. Therefore, P(k) for k = 1, ..., K cannot be precomputed, even if the
measurement strategy has been specified prior to time zero. Because of this
limitation, these methods for approximating the conditional mean will not be
considered further in this report.
However, a method has been proposed (the linearized Kalman filter) for
which the approximate estimation error covariance matrix can be precomputed
and is not dependent on the actual measurements. Rather than X(k) being
estimated directly, the deviation of X(k) from a nominal state Xnom(k) is
estimated. The nominal state is that which one would expect if random
model uncertainties W(k) were known (and were equal to their means) and the
initial state was equal to its mean. The nominal is computed by
Xnom(k + 1) = fk[Xnom(k), W(k)], (A. 22)
which employs the full nonlinear model with Xnom(O) = X(0). The deviation
of the state from the nominal is defined by
x(k) = X(k) - Xnom(k). (A. 23)
A linear approximation to the dynamics of x(k) can be derived by expand-
ing fk in a Taylor series and ignoring terms of order two or higher. In
order to simplify notation, the vectors of partial derivatives are denoted
by
f x. = 3fk/ax.(k)
k n k n
f w. = 3f./3w.(k)
k n k 1
Xnom(k)
W(k)
(A. 24)
Xnom(k)
W(k)
The dynamics of the deviation x(k) are approximately governed by
X] fkx2...fkxn]x(k) +w(k) (A.25)
where the initial deviation x(0) has zero mean_and covariance matrix P(0)
and where w(k) = [f|
-------
variance matrix Q(k). Note that the bracketed quantities are matrices be-
cause f^x-j and f^w-j are vectors. By making the correspondence
$k E Kxl fkx2 ' VnJ (A.26)
N |_ N I l\ t. l\ M J » /
it is seen that Equation (A.25) is a linear difference equation of the form
of Equation (A.I).
The measurement equation (assuming that the measurements are linear com-
binations of the state variables) is of the form
Ymk = V(k) + rmk. (A.27)
Upon substitution of Equation (A. 23) into (A.27),
Ymk = Cm[x(k) + Xnom(k)] + rmk (A. 28)
or,
* Ymk - CmXnom(k) = Cmx(k) + rmk. (A. 29)
Note that once the measurement Ym|< is available, the pseudo-measurement
can be computed and is of the form (A. 4). In the linearized Kalman filter,
Equation (A. 25) is interpreted as the model of the physical system and the
conditional mean of the deviation x(k), and its error covariance matrix is
computed as earlier in this section. Once x(k) has been computed, Equation
(A. 23) is employed to compute the state estimate
X(k) = Xnom(k) + x(k). (A. 30)
SUMMARY OF THE KALMAN FILTER AND MEASUREMENT STRATEGY OPTIMIZATION
The procedures described in this appendix can be employed to estimate
the state of any physical system for which an uncertain nonlinear model and
noisy measurements are available. The steps in this procedure are as
follows:
(1) Model the system and determine the necessary statistics
of the initial state and model uncertainty.
(2) Compute the nominal state.
(3) Linearize the nonlinear model about the nominal.
(4) Optimize the measurement strategy and controllable model
error covariance matrix.
(5) Compute the estimation error covariance matrix.
69
-------
(6) Collect and process the measurements to compute the estv
mate of the deviation.
(7) Compute the estimate of the state.
70
-------
APPENDIX B
OPTIMAL MEASUREMENT STRATEGY: OVERVIEW OF COMPUTER PROGRAM
This program computes the minimum dollar cost measurement strategy sub-
ject to satisfaction of specified limits on the diagonal elements of the
state covariance matrix at each event time. The theoretical details are
presented in DePalma (1977). An example of the usage of this program is
presented in Section 5.
DESCRIPTION
This program optimizes measurement strategy for the discrete time prob-
lem using linearized Kalman filter techniques. The method is described in
Appendix A. Briefly, given the following system
x = ViVi + Bk-iVi + Bk-iVi (BJ)
zk = Hkxk + vk (B.2)
where xk is an n-dimensional state vector, wk is an n-vector of process
noise, f|< is an m-dimensional input vector, zk is an ^-dimensional measure-
ment vector, and vk is an I vector of measurement noise. $k_-| (the state
transition matrix) and B(<-i are known n x n matrices, and Hk_-j is a known
£xn matrix. In addition, the statistics of wk and vk are known
0, , (B3)
EJwk]=
E[Vk]=0' E[vkvJ]=Rk '
Then, the covariance matrix P|< for the above system processed by a Kalman
filter can be calculated as follows:
P/MI -KkHk)Pk- (B.6)
where
71
-------
This is the Kalman gain. This program determines r^, rmi-n < r^, such that
the economic cost
V Ck ,
2-i , (c. a known measurement cost factor)
k=l rk k (B-8)
is minimized subject to inequality constraints on the diagonal elements of
the covariance matrix.
PROBLEM FORMULATION
The first step in optimizing a measurement strategy with this program is
to convert the system under consideration into a form useable by the pro-
gram. For a general continuous time problem with X = F(t,X,f), X an n-vec-
tor:
1. Form the linearized problem using a Taylor series, that is,
let x(t) = X(t) - Xnom(t) where Xnom(t) is the nominal tra-
jectory.
Then
«*>:M.
x(t) = A(t)x(t) ,
(B'9)
For the interval of interest [0,T], form k subdivisions of
length At, that is, At = i/k, where k is selected by accu-
racy considerations. Then the k state transition matrices
$1, ..., $[< are defined by the series
99 "I "I
. A(0)At . A(0)^ + + A_'(0)At_
~T! 2! n.
Af(k-l)*At]At , Ar(k-l)*AtrAt^ x + A[(k-l)*At]
+ fl + 2! ' n.!
"- "k
where n-j is selected such that
n.
^ ~. <^e (a prespecified small number)
72
-------
2. Determine Q(t) = E[w w^l where w is an n-vector of process
noise. Then form k nxn matrices (diagonal)
W0(n) = Q(n - At); n = 1, ..., k. (B.12)
3. The program permits the control of the diagonal matrix WQ by
measurements such that WQ = Wmax for no measurements, and
Wo = Wmin for maximum permissible measurements. If the prob-
lem being considered will use M such measurements, then form
the two M-vectors Wmin and Wmax for the M measurements, where
the Wmin and Wmax vectors define the diagonal elements of the
Wm-jn and Wmax matrices.
4. Determine the £ vector Rm-jn, which is the diagonal of E[v VT]
for maximum permissible measurements.
5. Determine the cost vectors 6w and BR such that is maximum per-
missible measurements are made, then the total cost is
k
E
i = 1
m
L
p . ' -i
£
L
n = 1
gR
n
minn
where 6W and 6^ are the same for each time k.
(B.13)
.E[(VXO>
-------
N = 3, £ = 2, M = 2.
In the program coding, the following integers are defined:
NT = K
NSEG
NSS = N/NSEG
NSR = £/NSEG
NSW = M/NSEG
NS = N
NR = £
NW = M
(the number of time steps)
(the number of segments)
(the number of state variables per segment)
(the number of state measurements per segment)
(the number of forcing function measurements per
segment)
(the total number of state variables)
(the total number of state measurements)
(the total number of forcing function measurements)
(Note: NSS*NSEG = NS, NSR*NSEG = NR, and NSW*NSEG = NW.)
The program must be dimensioned for each new problem since the amount of
core storage required is a function of N^ NT. The matrices should be
dimensioned as follows:
MATRIX
STNUMW, BETAW, WMIN, WMAX
STNUMR, RMIN, BETAR
PMAX, PMAXO
PO, TEN, D, PINVM, PM, PINV, P, PHIP
WO, PHI, E, SAVE
DIMENSION
(NW)
(NR)
(NT, NS)
(NS, NS)
(NS, NS, NT)
FORMAT
SYMBOL
DESCRIPTION
75E12.5
75E12.5
PHI(I,J,N)
WO(I,J,N)
The NT state transition matrices starting
from time = 0 and read row by row
The NT process noise matrices (diagonal)
read in same order as PHI(I,J,N)
4015
STNUMW(I)
The NW locations of the measured forcing
functions within the state vector. For
example, if NS = 4 and NW = 2, and the
forcing functions are measured on the
first and third states, then
STNUMW(i) - 1
STNUMW(2) = 3
75E12.5
75E12.5
75E12.5
WMIN(I)
WMAX(I)
BETAW(I)
The NW minimum variances on WO
The NW maximum variances on WO
The NW cost values for full measurements
74
-------
FORMAT
SYMBOL
DESCRIPTION
4015
STNUMR(I)
The NR locations of the measured state
ables within the state vector. For
example, if NS = 4 and second, third,
and fourth states are measured, then
STNUMR = 2,3,4
75E12.5
RMIN(I)
The NR minimum covariances for a maximum
measurement
75E12.5
75E12.5
NOTE:
75E12.5
75E12.5
BETAR(I)
PO(N,N)
if NW = 0 Skip
if NR = 0 Skip
u(D
PMAXO(NS,NT)
The NR cost values for full measurements
The initial covariance matrix read row by
row.
STNUMW, WMIN, WMAX, BETAW
STNUMR, RMIN, BETAR
The (NR + NW)*NT controls if a control is to
be tested; otherwise zeros
The previous N vectors of standard deviation
constraints on the states if using re-
start feature (see program operation);
otherwise, large, e.g., 1.00E + 05
PROGRAM OPERATION
The program is interactive and must be run from a terminal. The flow
chart for operation is presented below. The program prints out a number of
options for the user. The user either selects this option (by typing 1) or
ignores it (by typing zero). During the optimization the user can termi-
nate, continue, or dump (for restart with new bo"nds). He also has a choice
of outputting as the optimization proceeds. Briefly, the method starts with
the zero control and proceeds to the optimal control (that is, minimum cost
subject to the covariance constraints). When the cost does not change for
several iterates, it is assumed that the optimum has been found. The pro-
gram has five iterations per stage, and the output can be controlled for an
entire stage or for a single interation.
75
-------
STANDARD DEVIATIONS FOR THE CONTROL
READ FROM DEVICE 1 ARE PRINTED
ON DEVICE 6
STOP
STANDARD DEVIATIONS FOR ZERO CONTROL AND FULL
CONTROL PRINTED, IF BOUNDS ARE MET WITH ZERO CONTROL
THE TRIVIAL SOLUTION (ZERO CONTROL) IS THE OPTIMUM. IF THE
BOUNDS ARE NOT MET WITH FULL CONTROL (u=l) THEN THE
SOLUTION DOES NOT EXIST,
STOP 4
OLD AND NEW BOUNDS USED TO COMPUTE
NEW LINEARIZATION FROM DATA ON DEVICE 3
DATA READ FROM DEVICE 3, SAME BOUNDS,
»~FIVE ITERATIONS COMPLETED
TIME AND COST PRINTED FOR
TYPE-W AND TYPE-R CONTROLS
PRINTED IN TEMPORAL ORDER
I_J^_CONSRAINT INFORMATION CORRESSPONDING
TO NON-BASIC AND BASIC SLACKS PRINTED
STANDARD DEVIATIONS PRINTED FOR
DUMP DATA ON
DEVICE 2
-STOP
*-STOP
76
-------
TECHNICAL REPORT DATA
(Please read Instructions on the reverse before completing)
1 REPORT NO.
EPA-600/3-80-055
4 TITLE AND SUBTITLE
SAMPLING STRATEGIES FOR WATER QUALITY IN THE GREAT
LAKES
6. PERFORMING ORGANIZATION CODE
3. RECIPIENT'S ACCESSIOr*NO.
5. REPORT DATE
JUNE 1980 ISSUING DATE.
AUTHOR(S)
Raymond P.
William F.
8. PERFORMING ORGANIZATION REPORT NO.
Canale, Leon
Powers
M. DePalma, and
9 PERFORMING ORGANIZATION NAME AND ADDRESS
University of Michigan
Depts. of Civil and Aerospace Engineering
Ann Arbor, Michigan 48109
10. PROGRAM ELEMENT NO.
1BA608
11. CONTRACT/GRANT NO.
R803754
12. SPONSORING AGENCY NAME AND ADDRESS
Environmental Research Laboratory
Office of Research and Development
U.S. Environmental Protection Agency
Duluth, Minnesota 55804
13. TYPE OF REPORT AND PERIOD COVERED
Final
14. SPONSORING AGENCY CODE
EPA/600/03
15. SUPPLEMENTARY NOTES
16. ABSTRACT
The major goal of this project was to investigate the potential applications of
Kalman filtering and modern optimization techniques to the design of sampling
strategies for water quality in the Great Lakes. Two representative problems of
general limnological interest with considerably different characteristics were
studied. The first problem concerns sampling requirements for long-term trend de-
tection of chloride and total phosphorus concentrations in Lake Michigan. A mini-
mum-cost sampling strategy has been determined in association with the Kalman filter
and mathematical models that can detect changes of 1.0 mg Cl/£ and 1.0 yg P/£ in
Lake Michigan with 90 percent probability. The cost of this sampling program is
considerably less than monitoring programs designed without benefit of mass balance
models and Kalman filter data processing. An investigation was conducted to deter-
mine the sensitivity of the optimal cost to the accuracy of the mass balance model
parameters. It is concluded that the calculated sampling cost is most sensitive to
the variance in the apparent phosphorus settling velocity, not as sensitive to the
accuracy of laboratory analyses, and rather insensitive to the accuracy of the
effective flow at the Straits of Mackinac and atmospheric loads. The second problem
concerns the potential applicability of Kalman filtering techniques for concentration
estimation and parameter identification for a phytoplankton and nutrient cycling
model for Saginaw Bay.
17.
KEY WORDS AND DOCUMENT ANALYSIS
DESCRIPTORS
Lakes
Sampling
Mathematical models
Minimization
b.IDENTIFIERS/OPEN ENDED TERMS
Sampling strategy
Large lakes
Lake Michigan
Saginaw Bay
COSATI Field/Group
08/H
12/B
13. DISTRIBUTION STATEMENT
Release Unlimited
19 SECURITY CLASS (This Report)
Unclassified
21. NO. OF PAGES
87
20. SECURITY CLASS (This page)
Unclassified
22. PRICE
EPA Form 2220-1 (9-73)
77
A- U.S. G"VERflHFMT PniNTINr. OFFICE- 198H--657-1 FS/nniQ
------- |