United States Environmental Protection Agency
Environmental Monitoring and Support Laboratory
P.O. Box 15027
Las Vegas NV 89114
EPA-600/4-79-055
August 1979
Research and Development
EPA Regulatory Water Quality Monitoring Networks:
Statistical and Economic Considerations
-------
RESEARCH REPORTING SERIES
Research reports of the Office of Research and Development, U.S. Environmental
Protection Agency, have been grouped into nine series. These nine broad categories
were established to facilitate further development and application of environmental
technology. Elimination of traditional grouping was consciously planned to foster
technology transfer and a maximum interface in related fields. The nine series are:
1. Environmental Health Effects Research
2. Environmental Protection Technology
3. Ecological Research
4. Environmental Monitoring
5. Socioeconomic Environmental Studies
6. Scientific and Technical Assessment Reports (STAR)
7. Interagency Energy-Environment Research and Development
8. "Special" Reports
9. Miscellaneous Reports
This report has been assigned to the ENVIRONMENTAL MONITORING series. This series
describes research conducted to develop new or improved methods and instrumentation
for the identification and quantification of environmental pollutants at the lowest
conceivably significant concentrations. It also includes studies to determine the ambient
concentrations of pollutants in the environment and/or the variance of pollutants as a
function of time or meteorological factors.
This document is available to the public through the National Technical Information
Service, Springfield, Virginia 22161.
-------
EPA-600/4-79-055
August 1979
REGULATORY WATER QUALITY MONITORING NETWORKS:
STATISTICAL AND ECONOMIC CONSIDERATIONS
by
Jim C. Loftis and Robert C. Ward
Department of Agricultural and Chemical Engineering
Colorado State University
Fort Collins, Colorado 80523
Grant Number R-805759010
Project Officer
Donald B. Gilmore
Monitoring Systems Research and Development Division
Environmental Monitoring and Support Laboratory
Las Vegas, Nevada 89114
ENVIRONMENTAL MONITORING AND SUPPORT LABORATORY
OFFICE OF RESEARCH AND DEVELOPMENT
U.S. ENVIRONMENTAL PROTECTION AGENCY
LAS VEGAS, NEVADA 89114
-------
DISCLAIMER
This report has been reviewed by the Environmental Monitoring and
Support Laboratory-Las Vegas, Nevada, U.S. Environmental Protection Agency,
and approved for publication. Approval does not signify that the contents
necessarily reflect the views and policies of the U.S. Environmental Protection
Agency, nor does mention of trade names for commercial products
constitute endorsement or recommendation for use.
-------
FOREWORD
Protection of the environment requires effective regulatory actions that
are based on sound technical and scientific information. This information
must include the quantitative description and linking of pollutant sources,
transport mechanisms, interactions, and resulting effects on man and his
environment. Because of the complexities involved, assessment of specific
pollutants in the environment requires a total systems approach that tran-
scends the media of air, water, and land. The Environmental Monitoring and
Support Laboratory-Las Vegas contributes to the formation and enhancement of
a sound monitoring data base for exposure assessment through programs
designed to:
• develop and optimize systems and strategies for monitoring pollu-
tants and their impact on the environment
• demonstrate new monitoring systems and technologies by applying
them to fulfill special monitoring needs of the Agency's operating
programs.
This report covers a procedure for evaluating sampling frequencies of
established water quality monitoring networks. This report is intended to
assist monitoring systems managers to more efficiently distribute resources
between sampling sites and laboratory facilities in an effort to achieve
better data at a lower cost. For further information contact the Monitoring
Systems Research and Development Division at this Laboratory.
George B. Morgan
Director
Environmental Monitoring and Support
Laboratory
Las Vegas
-------
SUMMARY
During the past eight years, a number of procedures have been proposed
for designing fixed-station water quality monitoring networks for regulatory
water quality management purposes. Each new procedure is based upon a
specific, and increasingly higher, level of statistical analysis and may or
may not consider the economics of monitoring. As the statistics applied to
the monitoring network design have become more and more sophisticated,
practical use of the design procedures has become more confusing, especially
in light of the economic constraints under which all regulatory agencies
operate.
The purpose of this study is to examine and quantify the statistical
trade-offs associated with using various levels of statistical sophistication
in network design and to formulate a procedure for accounting for economic
constraints in the design process. Sampling frequency is the major aspect of
network design considered in the study; consequently, the results of the
study are directed toward use by regulatory agencies for the evaluation and
upgrading of existing networks.
Network design can be based on a number of different objectives; how-
ever, it is becoming increasingly clear that the estimate of trends is the
major goal of fixed-station networks (mainly because of the low sampling
frequencies used). Consequently, the network objective selected for this
statistical comparison study was that of estimating annual geometric means of
water quality variables within known or projected confidence intervals. The
sampling frequency that achieves the desired confidence interval is then
designated the sampling frequency for the network.
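The simplest of the statistical levels compared here can be sketched in a few lines. The example below is a minimal illustration, not the report's own code: it assumes the observations are independent and roughly lognormal, so the interval is computed on the logarithms and transformed back. The function names `geometric_mean_ci` and `samples_needed` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(values, confidence=0.95):
    """Return (geometric mean, lower limit, upper limit).

    Assumes independent observations; no seasonal or serial-correlation
    adjustment is made (the most basic of the three approaches).
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    k = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # standard normal deviate
    half_width = k * stdev(logs) / math.sqrt(n)       # half-width in log units
    center = mean(logs)
    return (math.exp(center),
            math.exp(center - half_width),
            math.exp(center + half_width))


def samples_needed(log_std, half_width_log, confidence=0.95):
    """Smallest n whose interval half-width (in log units) meets the target."""
    k = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((k * log_std / half_width_log) ** 2)
```

For instance, `samples_needed(0.5, 0.2)` asks how many samples per year would be required when the standard deviation of the logarithms is 0.5 and the desired half-width is 0.2 log units.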
Three levels of statistical sophistication were analyzed by computing
the confidence intervals about the geometric mean using:
1. A variance that accounted for seasonal variation and serial
correlation.
2. A variance that accounted for only seasonal variation.
3. A variance computed directly from the original data (i.e., no
accounting for either seasonal variation or serial correlation).
In order to compute the first two variances with confidence and to
consider network design, one must have a rather extensive water quality
record (e.g., frequent sampling over a long period of time for a range of
water quality variables and from a network of stations). Such records are
rare; however, an Illinois record met all of the criteria except the long
period of time. It was a one-year record with daily sampling at nine
stations covering 26 variables. The data record does permit a comparison of
the statistical levels in network design.
In order to compute the second variance, seasonal variation was assumed
to be deterministic and able to be predicted by estimating the coefficients
A and C of the equation:
$$y_t = A \cos(\omega t + C)$$
where $y_t$ = deterministic component at time $t$
$\omega$ = 360 degrees/number of samples per year
The deterministic seasonal variation was then subtracted from the original
data and the variance computed.
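A minimal sketch of this step follows, assuming equally spaced samples that cover a whole number of years and at least three samples per year, and reading A as the amplitude and C as the phase angle of a cosine cycle about the series mean. Under those assumptions the least-squares estimates have a closed form. The function names are hypothetical, and the angle is handled in radians rather than degrees.

```python
import math


def fit_annual_cycle(x, samples_per_year):
    """Estimate (mean, amplitude A, phase C) of an annual cosine cycle.

    Assumes len(x) is a whole number of years of equally spaced samples.
    """
    n = len(x)
    w = 2.0 * math.pi / samples_per_year          # angular step per sample
    xbar = sum(x) / n
    a = 2.0 / n * sum((xt - xbar) * math.cos(w * t) for t, xt in enumerate(x))
    b = 2.0 / n * sum((xt - xbar) * math.sin(w * t) for t, xt in enumerate(x))
    # A*cos(wt + C) = a*cos(wt) + b*sin(wt) with a = A*cos(C), b = -A*sin(C)
    return xbar, math.hypot(a, b), math.atan2(-b, a)


def remove_annual_cycle(x, samples_per_year):
    """Subtract the fitted cycle, leaving the mean and the residual noise."""
    _, amplitude, phase = fit_annual_cycle(x, samples_per_year)
    w = 2.0 * math.pi / samples_per_year
    return [xt - amplitude * math.cos(w * t + phase) for t, xt in enumerate(x)]
```

The variance of the list returned by `remove_annual_cycle` would then play the role of the second variance described above.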
The data from the above calculations were analyzed next to determine the
correlation structures. This was accomplished by fitting the coefficients,
$\phi_1$, $\phi_2$, and $\theta_1$ of the autoregressive, moving-average model:
$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + a_t - \theta_1 a_{t-1}$$
where $Z_t$ = value of time series at time $t$
$a_t$ = random noise at time $t$
for each water quality variable. The theoretical autocorrelation functions
were then calculated based on the fitted models. The first variance could
then be determined.
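To illustrate how a fitted model leads to the first variance, the sketch below handles the purely autoregressive AR(2) case (no moving-average term) and uses the standard result for the variance of the mean of n equally spaced, correlated observations, Var(mean) = (sigma^2 / n) [1 + 2 * sum over k of (1 - k/n) rho_k]. The function names and the choice of the AR(2) special case are assumptions made for the example.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical autocorrelations rho_0 .. rho_max_lag of an AR(2) process.

    Uses the Yule-Walker recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}.
    """
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]


def variance_of_mean(sigma2, n, daily_acf, interval_days=1):
    """Variance of the mean of n samples collected every interval_days days.

    daily_acf[k] is the lag-k autocorrelation of the daily series, so samples
    spaced interval_days apart have lag-k autocorrelation
    daily_acf[k * interval_days].
    """
    total = 0.0
    for k in range(1, n):
        total += (1.0 - k / n) * daily_acf[k * interval_days]
    return sigma2 / n * (1.0 + 2.0 * total)
```

The list passed as `daily_acf` must extend at least to lag `(n - 1) * interval_days`. Setting every autocorrelation beyond lag zero to zero recovers the familiar sigma^2 / n for independent samples.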
Confidence intervals were computed for each station and for each vari-
able. The confidence intervals using the first variance were assumed to be
correct since they included the most sophisticated analysis. Confidence
intervals with variances 2 and 3 were then compared.
Results indicate that, within a sampling frequency range of approxi-
mately 12 samples per year to 34 samples per year, there is only an 8 to 10
percent error in confidence intervals about the means between the simplest
and most sophisticated design approaches. In this range the effects of
seasonal variation and serial correlation tend to cancel each other out, so
either both must be considered or both ignored. Since this is the range
within which many regulatory agencies operate their networks, it appears that
using basic statistics in network design is sufficient, especially in light
of the limited data records available to support more sophisticated network
design statistics. It should be pointed out that the above range of fre-
quencies is an average over five water quality variables—individual vari-
ables may act quite differently from the average.
-------
A dynamic programming code was formulated to assign sampling frequencies
throughout a network with the goal of optimizing a statistical objective
function subject to an economic constraint. The objective function is the
sum (over several variables and all stations) of the normalized positive
deviation of the predicted confidence interval widths from preselected
design confidence interval widths. The code was designed to account for the
effects of deterministic seasonal variation and serial correlation by
incorporating the results of the time-series analysis described above. The
economic constraint ensures that the annual operating costs of travel and
laboratory analysis will not exceed the allowable budget.
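The structure of such a code can be sketched as follows. This is an illustrative formulation under stated assumptions, not the program described in this report: stations are the stages, the remaining budget is the state, and the number of samples per year at a station is the decision. The predicted interval width uses the independent-sample formula 2K(sigma)/sqrt(u) as a stand-in for the time-series result, the deviation is normalized by the design width, and each station has a single integer cost per sample covering travel and laboratory analysis. All names are hypothetical.

```python
import math
from functools import lru_cache


def assign_frequencies(stations, design_widths, allowed, budget, k=1.96):
    """Return (objective value, tuple of samples per year for each station).

    stations: list of dicts with 'cost_per_sample' (integer dollars) and
              'sigma' (one standard deviation per constituent).
    design_widths: design confidence interval width for each constituent.
    allowed: permitted sampling frequencies (samples per year).
    budget: total annual operating budget (integer dollars).
    """
    def penalty(i, u):
        # Sum of normalized positive deviations from the design widths.
        total = 0.0
        for sigma, design in zip(stations[i]["sigma"], design_widths):
            width = 2.0 * k * sigma / math.sqrt(u)  # assumed width model
            total += max(0.0, (width - design) / design)
        return total

    @lru_cache(maxsize=None)
    def best(i, remaining):
        # Optimal return for stations i..N-1 given the remaining budget.
        if i == len(stations):
            return 0.0, ()
        result = (math.inf, ())
        for u in allowed:
            cost = u * stations[i]["cost_per_sample"]
            if cost > remaining:
                continue
            tail_value, tail_plan = best(i + 1, remaining - cost)
            value = penalty(i, u) + tail_value
            if value < result[0]:
                result = (value, (u,) + tail_plan)
        return result

    return best(0, budget)
```

An infinite objective value with an empty plan signals that no combination of allowed frequencies fits within the budget. Restricting the decision to a short list of allowed frequencies keeps the search small.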
Within the economic framework used in this study, which deals strictly
with the operating costs of a monitoring system at a fixed scale of opera-
tion, the dynamic programming solution is relatively insensitive to variation
in the laboratory analysis and travel costs. The solution is more strongly
influenced by the water quality variables included in the design and by the
selection of design confidence interval widths.
-------
CONTENTS
Foreword
Summary
Figures
Tables
Abbreviations and Symbols
1. Introduction
   Objectives
   Organization of the report
2. Conclusions
3. Recommendations
4. Review of Literature
   Monitoring policy and evaluation
   Technical approaches to design
5. Statistical Analysis of Historic Data
   Basic statistical concepts
   Application
   Statistical analysis results
6. Incorporating Economics into Water Quality Monitoring System Design
   Assigning sampling frequencies via dynamic programming
   Application to Illinois network
   Sensitivity analysis
7. Synopsis
8. Assumptions
References
Appendix
-------
FIGURES
Number
1 Estimated autocorrelation function for daily sulfate
concentration, Grand River, Michigan
2 Estimated partial autocorrelation function for daily
sulfate concentration, Grand River, Michigan
3 Estimated autocorrelation function for daily specific
conductance, Grand River, Michigan
4 Estimated partial autocorrelation function for daily
specific conductance, Grand River, Michigan
5 Estimated autocorrelation function of residuals for AR(2)
model, specific conductance, Grand River, Michigan
6 Theoretical autocorrelation function for chloride
concentration, Red River, Manitoba
7 Theoretical autocorrelation function for total phosphate
concentration, Red River, Manitoba
8 Theoretical autocorrelation function for specific
conductance, Grand River, Michigan
9 Theoretical autocorrelation function for sulfate
concentration, Grand River, Michigan
10 95 percent confidence interval widths for the Michigan
chloride series as a function of sampling frequency
11 Average 95 percent confidence interval widths about the
geometric mean for total dissolved solids in the
Illinois network
12 Average 95 percent confidence interval widths about the
geometric mean for total organic carbon in the
Illinois network
13 Average 95 percent confidence interval widths about the
geometric mean for suspended solids in the Illinois
network
14 Average 95 percent confidence interval widths about the
geometric mean for total hardness in the Illinois
network
15 Average 95 percent confidence interval widths about the
geometric mean for nitrates in the Illinois network
16 Average 95 percent confidence interval widths for total
dissolved solids concentration, Illinois network
17 Average 95 percent confidence interval widths for total
hardness concentration, Illinois network
18 Effect of $\rho_1$ on 95 percent confidence interval widths
about the annual mean for first-order autoregressive
processes with unit variance
19 Flow diagram of dynamic programming code
-------
TABLES
Number
1 Results of fitting candidate models for conductivity time
series, Grand River, Michigan
2 Results of model fitting, Red River, Manitoba
3 Results of model fitting, Grand River, Michigan
4 Results of model fitting, Illinois network
5 Computed confidence intervals about geometric mean for
several water quality constituents
6 Relative error of confidence interval #3 as compared to
confidence interval #1
7 Median and range of confidence interval widths for water
quality constituents, Illinois network
8 Laboratory analysis costs for chemical constituents of
water quality
9 Results of dynamic programming design of sampling
frequencies for Illinois network
10 Design sampling frequencies based on varying travel costs
11 Design sampling frequencies based on alternate travel
distances
12 Effect of variation in total operating budget on design
sampling frequencies and monitoring system performance,
Illinois network
13 Effect of design confidence interval widths on design
sampling frequencies
14 Effect of water quality constituent selection on design
sampling frequencies and system performance, Illinois
network
-------
LIST OF ABBREVIATIONS AND SYMBOLS
ABBREVIATIONS
ARMA — autoregressive, moving-average
mg/l — milligrams per liter
SS — suspended solids
TDS — total dissolved solids
TOC — total organic carbon
SYMBOLS
A — amplitude of deterministic annual water quality cycle
$a_t$ — white-noise component or shock at time t in ARMA process
C — phase angle of deterministic annual water quality cycle
— annual operating cost of sampling for station i
cov(x,y) — covariance of x and y
— total annual operating budget for network
$f_i(S_i, u_i)$ — return function in dynamic programming
$f_i^*(S_i)$ — optimal return function in dynamic programming
$I_{ij}$ — information content for station i, constituent j
$I_{jD}$ — design information content for constituent j
$K_{\alpha/2}$ — standard normal deviate for a probability of $\alpha/2$
m — slope of deterministic linear trend in water quality
M — number of water quality constituents included in design
N — number of stations in network
$NO_3$ — nitrate
Q — test statistic used in ARMA model evaluation
R — one-half the confidence interval width
$r_k(a)$ — estimate of lag-k autocorrelation of residuals in ARMA
model evaluation
$S_i$ — state variable for stage i in dynamic programming
(budget remaining)
T — period of cycle (in this report, one year)
$u_i$ — decision variable for stage i in dynamic programming
(number of samples collected per year at station i)
$u_i^*(S_i)$ — optimal decision function in dynamic programming
var(X) — variance of X
$\bar{X}$ — sample mean of X
$X_{ij}$ — confidence interval width for constituent j at station i
$X_{jD}$ — design confidence interval width for constituent j
$X_t$ — value of water quality time series at time t
$y_t$ — value of deterministic annual cycle in water quality at
time t
$Z_t$ — value of water quality time series at time t after
seasonal variation is removed
$\alpha$ — statistical significance level
$\chi^2_n$ — $(1-\alpha)$th quantile of chi-square distribution with n
degrees of freedom
$\theta_q$ — qth moving average coefficient of an ARMA process
$\mu$ — population mean (of a water quality variable)
$\mu^*$ — population geometric mean (of a water quality variable)
$\rho(n)$ or $\rho_n$ — lag-n autocorrelation coefficient
$\sigma^2$ — population variance of water quality random variable
white-noise variance of a water quality time series
variance of a water quality time series with seasonal
variation removed
variance of water quality constituent j at station
$\phi_p$ — pth autoregressive coefficient of an ARMA process
-------
SECTION 1
INTRODUCTION
In the area of water pollution, Federal law contains well defined pro-
cedures for planning and implementing pollution control measures by local,
State, and Federal government agencies. Throughout the pertinent body of
legislation (PL 89-234, PL 92-500, and PL 95-217, the Clean Water Act) and
plans that have arisen from implementation of the law, there exists an almost
universal recognition of the need for regulatory water quality monitoring in
support of pollution control activities. Regulatory monitoring is defined
here as routine, fixed-station monitoring performed by a water quality management
agency to support its regulatory functions, such as discharge permit
issuance and renewal.
Unfortunately, neither the nature and scope nor the precise statistical
objectives of such monitoring programs have been spelled out. Thus regulatory
monitoring systems have often been put into service in response to demands for
immediate action and without the careful thinking beforehand that is necessary
to ensure success.
Researchers in both government and academic institutions have declared
that much of this careful thinking in relation to water quality data collec-
tion should occur through consideration of statistical concepts. As a result,
the application of statistical methods to the design of surface water monitor-
ing systems has received much attention in recent years, and considerable
research has been conducted in an attempt to develop statistically sound
design procedures. One might hope that all of this work has led to a fairly
widespread acceptance of some basic principles that could be applied in prac-
tical situations. Unfortunately this is not the case. Rather one is con-
fronted with many diverse opinions as to what sort of statistical approach and
what level of statistical sophistication to apply to the design of regulatory
water quality monitoring networks. Several design procedures have been pro-
posed, each based on one opinion or another, but none seem to have made it
into the hands of water quality management agencies who are required by law to
collect water quality data and who, consequently, collect a significant per-
centage of all the water quality data in the United States.
Thus the monitoring efforts of many, if not most, management agencies
have been criticized, and the value of information they collect has been
questioned.
A second major concern in the design of monitoring networks is that of
achieving cost effectiveness in operation. Although the final judgment of
whether or not the dollars devoted to monitoring are being well spent rests
on a difficult and subjective evaluation of the worth of data records, the
problem of improving statistical and economic efficiency in existing monitor-
ing programs is more straightforward. The above statements regarding the in-
adequacy of current water quality monitoring practices are explained more
fully in a report by the National Academy of Sciences (1977).
This research effort was undertaken in an attempt to refine currently
available scientific tools and to investigate their potential for application
to these two problems—the statistics and economics of water quality sampling.
The results of the research are directed toward use by regulatory water
quality management agencies for the twofold purpose of evaluating and upgrad-
ing their regulatory water quality monitoring programs.
The project involves the analysis of historic water quality records for
the purpose of reassigning sampling frequencies within a network. The level
of statistical sophistication employed is somewhat beyond the current capa-
bilities of most water quality management agencies. The practical realities
of applying this approach are then weighed against the consequences of using
only elementary statistics in sampling frequency design.
Finally, a mathematical programming procedure is developed that can make
use of statistical information (at any level of sophistication) to assign
sampling frequencies within a network while operating at a fixed budgetary
level.
OBJECTIVES
The overall objective of this research is to develop procedures for
evaluating and upgrading regulatory water quality monitoring networks in
terms of their ability to achieve desirable confidence interval widths about
sample geometric means for selected measured variables. This overall objec-
tive is accomplished via the following detailed objectives:
1. To identify a method for calculating the confidence interval width
about the sample mean or geometric mean of water quality variables
(concentrations of constituents) for given sampling frequencies when
the sample observations form a correlated time series.
2. To compare the results of computing confidence interval widths using
the same method as above with those obtained using simpler statis-
tical approaches.
3. To incorporate both statistical and economic information into sam-
pling frequency design by formulating and solving a mathematical
programming problem. The solution should minimize the overall
difference between design confidence interval widths and predicted
confidence interval widths throughout a network while operating at a
fixed budgetary level.
The first objective is accomplished through analysis of historic water
quality records. The time series of observations is assumed to consist of
three components: a linear trend, an annual cycle, and a correlated noise
component. Each of these components is modeled, and the effects of each are
accounted for in the computation of confidence intervals.
The third objective is accomplished by incorporating the results of the
first objective into a dynamic programming algorithm along with a provision
for computing the operating cost of sampling at individual stations. The
algorithm assigns sampling frequencies to each station in a network in order
to achieve optimum size and uniformity of confidence interval widths about
sample means or geometric means when considering several water quality con-
stituents simultaneously. A fixed annual operating budget is imposed as a
constraint.
Several basic assumptions are necessary. The assumptions that delineate
the scope of the research are as follows:
1. An adequate historic data record is available for each water quality
constituent and each station under consideration.
2. Future grab samples will be equally spaced in time.
3. Sampling frequencies for all water quality constituents at a given
station are identical—i.e., each sample will be analyzed for all
constituents.
4. The selection of constituents to be measured is assumed to have been
performed beforehand.
5. Water quality observations from the data records used are assumed to
be representative in both time and space of conditions that actually
existed at the station.
6. The variance of the estimates of annual means of the water quality
variables as a result of poor sampling procedure or laboratory error
is ignored.
7. The variance of the estimates of parameters in deterministic and
stochastic models used in computing confidence interval widths about
the annual means is ignored as well.
ORGANIZATION OF THE REPORT
The remainder of this report is organized in the following manner.
Section 4 contains a review of literature covering the current state of water
quality monitoring, statistical tools appropriate for designing stream quality
monitoring systems, and several approaches to design that have been suggested
in the past. Section 5 develops in detail those statistical methods (including
time series analysis) that are necessary for applying the confidence
interval approach to the assignment of sampling frequencies while considering
seasonal variation and serial correlation in water quality. The methods are
illustrated by application to data records from three areas: (1) the Red
River, Manitoba; (2) Grand River, Michigan; and (3) a nine-station network in
the State of Illinois. In Section 6 the dynamic programming algorithm is
developed and applied to the Illinois network. A sensitivity analysis
explores the effect of variation of the major input variables. Sections 7 and
8 conclude the report with a discussion of the practical implications of the
research, a brief summary statement, and suggestions for further research.
-------
SECTION 2
CONCLUSIONS
1. The effect of serial correlation on confidence interval widths
about annual means is important at high sampling frequencies and
lessens as the sampling frequency decreases. The point at which the
effect of serial correlation becomes insignificant varies among
water quality constituents and locations, depending on the correla-
tion structure of the individual time series. Typically, however,
this point will occur at a sampling interval of three to five
weeks.
2. The deterministic annual cycle significantly affects computed
confidence interval widths over the entire range of sampling fre-
quencies under consideration—daily to bimonthly. Experimental
results indicate that failure to account for this seasonal variation
can result in computation of confidence interval widths that
are 20 percent to 50 percent larger than those that actually apply
within this range of sampling frequencies.
3. For the water quality time series studied and for a certain range
of sampling intervals, typically two to four weeks, the effect of
serial correlation on computed confidence interval widths is
roughly offset by the effect of seasonal variation. Therefore
within this region, both of the above factors should be considered
in computing confidence interval widths, or neither should be
considered.
4. The water quality records studied in this research indicate that
the most likely candidate models of the ARMA type to be evaluated
for water quality time series are AR(1), AR(2), and ARMA (1,1).
5. Computed confidence interval widths about annual means are highly
sensitive to the value of the lag-one autocorrelation coefficient
of the time series in question. Estimated values of $\rho_1$ encountered
in this research range from approximately 0.5 to 0.9.
6. The dynamic programming code presented in this report provides a
fairly simple, efficient means of assigning sampling frequencies
throughout a network. The code is particularly useful when it is
desired to include the effect of serial correlation in the computa-
tion of confidence interval widths and when the selection of
sampling frequencies is limited to a few allowable values.
7. Within the economic framework used in this study, which deals
strictly with the operating costs of a monitoring system at a fixed
scale of operation, the dynamic programming solution is relatively
insensitive to variation in the laboratory analysis and travel
costs. The solution is more strongly influenced, however, by the
water quality constituents included in the design and by the selec-
tion of design confidence interval widths.
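As a rough illustration of conclusions 1 and 5, consider an assumed first-order autoregressive structure, for which the lag-k autocorrelation is the k-th power of the lag-one coefficient. Treating the lag-one values quoted above as applying to daily observations (an assumption made for this example only), the correlation between successive samples falls off quickly as the sampling interval lengthens:

```latex
\rho_k = \rho_1^{\,k}
\qquad\text{so that}\qquad
\rho_1 = 0.9:\ \ \rho_7 \approx 0.48,\ \ \rho_{28} \approx 0.052;
\qquad
\rho_1 = 0.5:\ \ \rho_7 \approx 0.008 .
```

Under these assumptions, samples taken four weeks apart are nearly uncorrelated even for a strongly correlated daily series, which agrees in general terms with the three-to-five-week interval noted in conclusion 1.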
-------
SECTION 3
RECOMMENDATIONS
Two groups of recommendations are made. The first group deals with
application of the current research, and the second group deals with sugges-
tions for future research.
RECOMMENDATIONS FOR APPLICATION
1. The effect of serial correlation on the widths of confidence intervals
about annual means or geometric means of quality constituents should be
considered by water quality management agencies for sampling intervals of
one month or less. These effects are important in both the design of
regulatory monitoring networks and the analysis of data subsequently
collected for management decisions.
2. The serial correlation effects should be quantified via the time series
analysis procedures described herein if sufficient data records are
available. An assumed AR(1) correlation structure with a regionalized or
estimated value of the lag-one autocorrelation coefficient is the sug-
gested alternative.
3. In analyzing water quality records, a deterministic annual cycle should
be computed and removed from the observations prior to the determination
of confidence interval widths whenever possible. A fairly long record is
needed for this purpose, but equally spaced observations are not
required.
4. In future water quality monitoring systems, samples should be collected
equally spaced in time to facilitate time series analysis and trend
detection.
5. Sampling frequencies should be allocated among various stations of a
regulatory network using some rational statistical basis. The dynamic
programming code presented here is suggested, with linear programming and
a stratified sampling approach as alternatives. The mathematical pro-
gramming approaches are preferable because they allow the incorporation
of economics into the analysis.
6. Management agencies should attempt to quantify as accurately as possible
the direct costs of travel and laboratory analysis that they experience
in sample collection and processing. The economic viewpoint adopted here
assumes that these costs are accurately known in order to establish the
annual operating budget. Thus these cost figures are critical in the
economic analysis even though the sensitivity analysis showed the example
dynamic programming solution to be relatively insensitive to changes in
them.
7. The eventual uses of water quality data in management decisions should be
considered in the design of regulatory monitoring networks, particularly
with respect to the selection of water quality constituents to be in-
cluded in the design.
SUGGESTIONS FOR FUTURE RESEARCH
1. Daily water quality data for several constituents should be collected at
several locations in various sections of the United States over a long
period of time.
2. Such daily records should be analyzed to determine appropriate regional
models of the ARMA type for various quality constituents.
3. Techniques should be explored for the design of monitoring networks based
on daily records of flow or total dissolved solids supplemented with
sparser records of other quality constituents. Such techniques would be
based on the cross correlation between flow or total dissolved solids and
other quality constituents.
4. "Cookbook" network design and data analysis packages that include well-
documented Fortran or programmable calculator programs should be prepared
and distributed to regulatory agencies (or made available through STORET).
These packages should provide the capability of estimating deterministic
components from data records, computing confidence interval widths about
annual means and geometric means for time series of various correlation
structures, and assigning sampling frequencies throughout a network via
mathematical programming.
-------
SECTION 4
REVIEW OF LITERATURE
Much has been written in recent years on the subject of water quality
monitoring, and adequate literature reviews that summarize the work reported
up to 1976 have also appeared. Such a summary will not be repeated here, but
those major reports prior to 1976 that form the basis for the current project
will be discussed. The literature reviews will be identified, and pertinent
publications that have appeared since 1976 will be pointed out.
The first portion of this literature review covers material of a general
nature dealing with nationwide needs for water quality data in support of
water quality management and the degree to which those needs are or are not
being met. The rest of the review covers material dealing with the more
technical considerations of designing water quality monitoring networks,
particularly the assignment of sampling frequencies. First, early approaches
to design with the objective of detecting pollution events are mentioned.
Then more currently applicable approaches with the objective of determining
annual means and using fairly basic statistical methods are reviewed. Turn-
ing to the more complex problem of determining annual means while considering
serial correlation, the necessary background material is covered next.
Finally, advanced design approaches that make use of this theory are reviewed.
MONITORING POLICY AND EVALUATION
Several reports have attempted to describe the state of water quality
monitoring in general terms and to identify specific problems and deficiencies
in current monitoring practices. Foremost among these is the National
Research Council of the National Academy of Sciences (1977) report entitled
"Environmental Monitoring." This report was prepared for the U.S. Environ-
mental Protection Agency (EPA) at the request of the U.S. Congress and deals
with monitoring programs both operated and supervised by EPA. The latter
would include regulatory monitoring by State water quality authorities. Many
of the conclusions and recommendations of this report underscore the need for
performing research of the type described here. Among the pertinent conclu-
sions are the following:
1. The objectives of monitoring programs should be more clearly defined.
2. An effort should be made to better incorporate scientific principles
(including statistics) into the design, evaluation, and operation of
monitoring systems.
3. The design of monitoring networks should incorporate an analysis of
the trade-offs between cost and effectiveness.
4. Data collected through monitoring efforts should be more thoroughly
analyzed with respect to the intended use, adequately summarized,
and more widely disseminated.
5. Decisions in environmental management should rely more heavily on
data collected in the past.
A more detailed summary of the National Academy of Sciences report is presented
by Kendrick (1977).
An interesting philosophical treatise on designing hydrologic data
collection networks is given by Moss et al. (1978). The paper includes a
restatement of an obvious, but sometimes forgotten, problem of design.
Namely, if enough information were known about the hydrologic phenomena
involved to perfectly design a network, there would be no need to collect
additional data. The need for further research into hydrologic network
design is emphasized, particularly with respect to careful definition of
monitoring objectives, determining the value of data obtained from monitoring,
and relating monitoring network design to data utilization.
Cleary (1978) makes this crucial point regarding the role of background
data in water quality management decisions:
Before billions of dollars are spent for measures to control the quality
of U.S. rivers, their elusive aspects should be understood. Unfortu-
nately those who try to understand—those responsible for forming public-
policy and deciding how those billions of dollars will be spent—face a
serious handicap. That handicap is a lack of information describing
past river quality and reasons for the variability in that quality.
Cleary recognizes the National Stream Quality Accounting Network (NASQAN)
established by the U.S. Geological Survey in 1972 as having the greatest
promise for improving the nationwide water quality data base. A description
of the NASQAN program is presented in Cragwall (1976), and summarized results
of the program, which attempt to describe the current state of national water
quality on a region-by-region basis, are available in Steele et al. (1974),
Hawkinson et al. (1977), and Briggs and Ficke (1978).
The NASQAN system is dedicated to detecting long-term changes in stream
quality on a nationwide basis. Therefore, station locations are chosen to
account for as much of the Nation's streamflow as possible, and uniform
sampling frequencies for several constituents are used throughout the network
(Hawkinson and Ficke, 1975).
Why then are States required to operate their own monitoring networks,
and why should a State network be subject to different design criteria than
NASQAN? The State water quality authorities must make management decisions
on a short-term basis (crisis-oriented management) and must have water
quality data that is tailored to this purpose. For example, biannual reports
on the current condition of water quality in the State are required by
Section 305-B of the Clean Water Act. Therefore, annual means (yearly aver-
ages) of pertinent water quality variables are meaningful statistics for this
short-term type of management strategy. The confidence interval approach to
design, to be discussed later, allows States to allocate their limited moni-
toring resources among stations in order to gain the largest possible amount
of usable information for a given level of expenditure. It further facili-
tates an understanding of the significance of changes from year to year of
annual means in water quality.
The emphasis in regulatory monitoring has traditionally been directed
toward assessing the effectiveness of municipal or point-source pollution
control. However, the Clean Water Act directs EPA to begin programs to
manage and control nonpoint sources. Pisano (1976) discusses Section 208 of
the Clean Water Act as the primary legislative mechanism through which States
will act to achieve water quality goals via control of both point- and
nonpoint-source pollution. Management under Section 208 involves both the
planning and implementation of pollution control practices and should be
supported through effective monitoring activities. Cooley (1976) provides an
editorial comment on some general considerations of monitoring nonpoint-
source pollution, including station location and constituent coverage.
A final article that is quite critical of the current approaches to
stream quality monitoring is that of Hines et al. (1977) of the U.S. Geologi-
cal Survey, who state:
Perhaps justifiably, in light of the complexities involved, enormous
amounts of time, money, and effort are spent on river-quality sampling
programs. In many basins, however, years of quality sampling have not
generated adequate information from which to establish environmental
standards or to make sound resource decisions. This situation has
resulted in a growing dissatisfaction with river-quality data programs.
These authors suggest an alternative approach to monitoring, consisting
of three major elements: (1) increased hydrologic analysis focusing on
streamflow, water temperature, channel morphology, and basin history, (2)
reinterpretation of existing data in light of the hydrologic analysis, and
(3) design of sampling programs based on short-term, synoptic surveys rather
than routine monitoring.
The authors indicate, at least indirectly, that this approach would be
appropriate for regulatory objectives (environmental decisions); however,
many of the practical realities of monitoring at the State level, such as a
dearth of expertise in water quality hydrology, are ignored. Also, the
intensive survey approach has been shown by Lettenmaier (1978) to be less
than satisfactory for trend detection; this will be covered later. Thus it
is apparent that much disagreement exists among experts as to what the basic
nature of monitoring programs should be.
TECHNICAL APPROACHES TO DESIGN
Detecting Pollution Events
Within the past six years, several major reports have appeared that
address the technical aspects of the design of monitoring systems for water
quality management purposes. Although each includes a statistical approach
to the design problem, they differ widely in their objectives and recommended
methods.
In the late 1960's and early 1970's when water quality management was
guided by the Federal Water Quality Act of 1965 (PL 89-234), it was thought
that stream quality monitoring should have as its goal the detection of
stream standard violations (pollution events). Ward (1973) proposed a moni-
toring network design that would include two classes of stations. A primary
network would be designed to detect stream standard violations, and a second-
ary network would be designed to detect trends. The primary network would be
designed using a simulation model to determine the effectiveness of a moni-
toring scheme in detecting pollutant spills. The secondary network would be
designed to determine annual means of river quality constituents with a given
width of confidence interval. In each case costs would be balanced against
effectiveness.
Beckers and Chamberlain (1974) presented a more sophisticated approach
to designing for detecting stream standard violations. They developed a
complete, computerized design package using more complex stream models and
more detailed cost-effectiveness analyses than were applied by Ward (1973).
With time the realization of two important factors led to a shift away
from the objective of detecting stream standard violations. The first factor
is that such networks require high sampling frequencies and thus high costs
to operate with any satisfactory level of performance. The second is that
effluent monitoring became recognized (as a result of the 1972 Federal Water
Pollution Control Act Amendments, PL 92-500) as the only practical way to
enforce pollution standards.
Determining Annual Means
Thus the emphasis in stream monitoring has now been placed on the deter-
mination of annual means and trends of various water quality constituents.
Montgomery and Hart (1974) and Sherwani and Moreau (1975) provide reviews of
basic statistical techniques that may be applied to the design of such a
network.
Ward et al. (1976) provide an excellent summary of those statistical
tools that might possibly be adopted by regulatory agencies within the near
future. The emphasis is placed on the basic statistics of the water quality
"population" such as the annual mean and variance. Design of sampling fre-
quencies in order to achieve desired widths of confidence intervals about
annual means is discussed. Stratified sampling and linear programming are
suggested as appropriate design tools. In addition, a comprehensive review
of literature is presented, which adequately presents the state of the art in
water quality monitoring at the time of publication. In a concluding section,
the relationship of data needs and data utilization by agencies to their
selection of monitoring strategies is discussed.
An additional population statistic of importance in water quality, which
is often used in place of the annual mean, is the geometric mean. The geo-
metric mean of flow is important in the work of Sanders (1974). The properties
of the geometric mean in the context of its use in water quality standards
are further discussed in Landwehr (1978).
The arguments of Sanders and Ward (1978) point out that the full value
of any statistical approach to the design of monitoring networks can only be
realized if State governments are willing to incorporate these same statis-
tics into water quality decisions such as the setting of stream standards and
defining standards violations.
Considering Serial Correlation
A major limitation of the work of Ward et al. (1976) would appear to be
the assumption that, in most water quality sampling for regulatory purposes,
the samples will be independent. The extension of the same statistical
concepts to the case of samples that are not independent but serially corre-
lated requires that the field of time series analysis be introduced into the
subject of monitoring design. The basic statistical concepts involved in
this extension are illustrated in the following papers.
Bourodimos et al. (1974) examine the stochastic nature of water quality
time series—specifically, streamflow, temperature, dissolved oxygen, and
biochemical oxygen demand. These time series are assumed to consist of three
components: (1) a trend; (2) a cyclic or seasonal component; and (3) irregu-
lar fluctuations or a random component. The trend component is modeled by a
polynomial, and the cyclic component by a sinusoidal expression. The random
component is studied through its autocorrelation and spectral density func-
tions. The essential concept, though, is that of viewing a water quality
time series as composed of deterministic and stochastic portions that may be
isolated and studied independently. This concept is also applied elsewhere
in the literature, for example in Steele et al. (1974) and in Sanders (1974).
The stochastic component of a water quality time series will in general
be serially correlated. The importance of this phenomenon relative to moni-
toring is explicitly conveyed in Matalas and Langbein (1962) in the following
statement.
Hydrologic series frequently consist of observations that are dependent
on one another. Such series are referred to as nonrandom series and may
be represented by a simple Markov model. The dependence between the
observations is measured by the autocorrelation coefficients. In a
nonrandom series each observation repeats part of the information con-
tained in past observations. Consequently a nonrandom series yields
less information about the mean than a random series having an equal
number of observations.
The means by which this reduction in information may be quantified mathemati-
cally will be developed in detail in Section 5.
The use of the first-order Markov model is illustrated in Rodriguez-
Iturbe (1969) as applied to annual river flow. This simple stochastic model
expresses the present value of a time series in terms of the immediately
preceding value and a random shock or uncorrelated noise term. The only
parameters of this model are the lag-one autocorrelation coefficient and the
mean and variance of the series. Rodriguez-Iturbe discusses the range of
errors involved in estimating these parameters for annual river flows, indi-
cating that large errors may result when fewer than 40 observations are used
in estimation.
The lag-one Markov model is a member of a larger class of time series
models known as autoregressive, moving-average (ARMA) models. The classic
text that covers this type of model is Box and Jenkins (1976). This text is
the basis for all of the time series analyses performed in connection with
the research described here.
Box-Jenkins type time series analysis has become extremely popular in
recent years, and journal articles have appeared describing the application
of ARMA models to many types of time series. An interesting study in which
time series models of the ARMA type were constructed for water temperature
and flow and then used for forecasting is described in McMichael and Hunter
(1972). A significant portion of the paper is devoted to introducing the
work of Box and Jenkins and describing the process of model construction.
The process of model construction has been refined since the appearance
of Box and Jenkins. These developments have been published in articles such
as Hipel et al. (1977) and McLeod et al. (1977). The first article includes
a summary of model identification, parameter estimation, and diagnostic
checks. The second paper includes application to three actual time series,
each of which is a "classic" that had been modeled and discussed previously.
Advances in ARMA modeling occur so rapidly that it is difficult to
define the state of the art at any one time. The British journal Biometrika
is a good source of information concerning the latest developments in model
construction, but the level of presentation is suitable for reading primarily
by statisticians. On a more practical level, the method by which one con-
structs a time series model will usually depend on the availability of compu-
ter programs to perform this function.
Advanced Design Techniques
Two reports have appeared that view water quality observations as seri-
ally correlated time series. Sanders (1974) studied the turbulent diffusion
of a conservative pollutant to establish mixing length criteria for sampling
station location. Sampling frequencies were determined based on daily obser-
vations of flow. Time series analysis was used to isolate the residual
random noise component of flow data, and then sampling frequencies were
assigned to achieve a given confidence interval width for the noise term.
This work represents the first attempt to take the confidence interval
approach beyond the simple situation of independent samples to cases where
serial correlation is important. The portion of the work dealing with the
assignment of sampling frequencies is summarized in Sanders and Adrian (1978).
Lettenmaier (1975) presented a rather sophisticated approach to designing
a network for trend detection. He discussed two major points: (1) the power
of nonparametric tests for detecting linear and step trends in equally spaced,
serially correlated water quality observations, and (2) coupling statistics
and water quality models to determine spatial location of sampling stations
such that the average power of trend detection would be a maximum. Both
Sanders (1974) and Lettenmaier (1975) include excellent reviews of the liter-
ature of water quality monitoring.
Two important papers supporting the above work are Lettenmaier and
Burges (1976a) and Lettenmaier (1976). The former paper is a summary of the
complete report, and the latter paper is a discussion of nonparametric
statistical tests for trend that can be applied to water quality records.
Procedures are described for assigning sampling frequencies in order to
achieve desired power of trend detection for either a linear or step trend.
Many of the specific results are dependent on the assumption that a simple
Markov (first-order autoregressive) model adequately describes most water
quality time series.
The most recent in this series of articles (and the most relevant to
this current research) is Lettenmaier (1978). This article deals further
with the problem of designing networks for trend detection. Specifically the
power of trend detection tests in the case of constant sampling frequencies
is compared to that for stratified sampling (sampling one year in three).
The conclusion is reached that constant sampling frequencies are superior for
trend detection even when two or three times as many samples may be collected
in stratified sampling. The importance of determining the correlation struc-
ture of water quality time series for sampling frequency design is emphasized.
Subsequently, the difficulty in actually evaluating the correlation coeffi-
cients from the type of data records normally available is recognized. A
regionalized approach to estimating correlation structure from the few com-
plete records available is suggested. Two other subjects—the use of inter-
vention analysis in trend detection and the geographical location of sampling
stations—are briefly touched upon.
There are few reports after Ward et al. (1976) that deal directly with
water quality monitoring network design in more specific terms, that is, that
spell out exactly how one should set up a sampling program. One notable
study, though, is that of Moore et al. (1976). Their research, directed
toward monitoring of eutrophication in lakes, produced a cost-effectiveness
analysis for assigning sampling frequencies. Using a simple water quality
model and estimation theory, the reduction in uncertainty (variance) in the
estimates of water quality variables achieved through increased sampling is
balanced against the resulting increased cost. The effectiveness of this
approach (and many others) is limited by available prior data, adequacy of
the water quality model, and the ability of the designer to place a value on
the reduction of uncertainty in estimates. An additional drawback, with
regard to application by State agencies, is that the necessary statistical
and mathematical theory is far beyond the capabilities of existing agency
personnel. This limitation is shared with the design approach presented by
Lettenmaier (1975) and is one of the more serious problems associated with
upgrading monitoring programs at the State level.
-------
SECTION 5
STATISTICAL ANALYSIS OF HISTORIC DATA
BASIC STATISTICAL CONCEPTS
Confidence Interval About Mean
The value (concentration) of a water quality constituent is a function
of time, represented as $X_t$. The annual mean concentration of the
constituent is precisely:
$$\mu = \frac{1}{T} \int_0^T X_t \, dt$$
where T is the period of consideration, in this case one year. The mean is
estimated from discrete grab samples as a sum:
$$\bar{X} = \frac{1}{k} \sum_{t=1}^{k} X_t \tag{1}$$
where k is the number of samples collected during period T and
$k = T/\Delta t$, where $\Delta t$ is the interval between samples (assumed equal).
Another parameter for characterizing water quality "populations" and the
one that is primarily used in this research is the geometric mean. The
geometric mean $\mu'$ is estimated from:
$$\bar{X}' = \exp\left[\frac{1}{k} \sum_{t=1}^{k} \ln X_t\right] = \left[\prod_{t=1}^{k} X_t\right]^{1/k}$$
Here $\bar{X}'$ is the sample geometric mean. The geometric mean is simply the
quantity that results from dealing with the logarithms of the water quality
observations rather than the raw observations.
The geometric mean is probably a more useful statistic of water quality
random variables in general than is the mean for several reasons. One reason
is that the logarithms of water quality observations are often more nearly
normally distributed than are the observations themselves. In this case, the
geometric mean more closely approximates the median than does the mean.
Water quality observations for a single constituent may range over orders of
magnitude. If only a few observations are taken, the mean will be heavily
influenced by a single large observed value. The geometric mean will be less
influenced by the large value and will provide a more meaningful indication
of central tendency. Finally, water quality time series composed of logs of
observations will be free of the very large fluctuation in magnitudes of
observations found in most untransformed series. Thus because the logarith-
mic series are better behaved, they can be modeled more effectively.
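The sensitivity of the two statistics to a single large observation can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the data records analyzed in this report.

```python
import math


def arithmetic_mean(xs):
    """Ordinary sample mean of the observations."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """Exponential of the mean of the natural logs of the observations."""
    if any(x <= 0 for x in xs):
        raise ValueError("the geometric mean requires strictly positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical concentrations spanning two orders of magnitude.
samples = [1.0, 10.0, 100.0]
print(arithmetic_mean(samples), geometric_mean(samples))
```

For these three values the arithmetic mean is pulled toward the largest observation, while the geometric mean remains at the middle value (to within rounding).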
If the population variance is known, a $(1 - \alpha) \times 100$ percent confidence
interval about the sample mean is given by:
$$\left[\bar{X} - K_{\alpha/2}\,[\mathrm{var}(\bar{X})]^{1/2},\ \bar{X} + K_{\alpha/2}\,[\mathrm{var}(\bar{X})]^{1/2}\right] \tag{2}$$
where $\mathrm{var}(\bar{X})$ = the variance of the sample mean
$K_{\alpha/2}$ = the standard normal deviate corresponding to a
probability of $\alpha/2$ (tabulated)
$(1 - \alpha) \times 100\%$ = confidence level
A 95 percent confidence interval may be defined by the following proba-
bility statement: There is a 95 percent probability that the true mean lies
within the interval about the sample mean given by (2) above.
The above holds if the sample mean is normally distributed. The Central
Limit Theorem indicates that this will usually be true for sample sizes
greater than 10 (Bendat and Piersol, 1971).
If the observations are independent, the variance of the sample mean is
given by:
$$\mathrm{var}(\bar{X}) = \frac{\sigma^2}{k} \tag{3}$$
where $\sigma^2$ = the population variance
k = number of observations
In this case it is a simple matter to compute the number of observations
(k) necessary to achieve a confidence interval of a particular width. Let
one-half the desired width of the confidence interval be represented by R,
where:
$$R = K_{\alpha/2} \frac{\sigma}{\sqrt{k}}$$
Thus:
$$k = \left[\frac{K_{\alpha/2}\,\sigma}{R}\right]^2 \tag{4}$$
This procedure is applied to water quality sampling in Ward et al. (1976).
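The sample-size calculation for independent observations can be sketched as follows. This is an illustration only: the function names and the numerical values are assumptions, and the normal deviate is taken from the Python standard library rather than from a table.

```python
import math
from statistics import NormalDist


def normal_deviate(alpha):
    """Standard normal value exceeded with probability alpha/2."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def half_width(sigma, k, alpha=0.05):
    """Half-width R of the confidence interval for k independent samples."""
    return normal_deviate(alpha) * sigma / math.sqrt(k)


def samples_required(sigma, r, alpha=0.05):
    """Smallest whole number of independent samples giving half-width <= r."""
    return math.ceil((normal_deviate(alpha) * sigma / r) ** 2)


# Hypothetical case: population standard deviation 10, desired half-width 5.
k = samples_required(10.0, 5.0)
print(k, half_width(10.0, k))
```

Because the number of samples must be a whole number, the result of equation (4) is rounded up, so the achieved half-width is slightly narrower than the one requested.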
If, on the other hand, the observations are not independent, the variance
of the sample mean is larger than $\sigma^2/k$ and the above procedure will
result in a sample size that is smaller than that actually required. This
problem may be handled as follows.
A serially correlated time series of water quality observations may be
written as a sum of deterministic and stochastic components:
$$X_t = y_t + Z_t$$
where $X_t$ = the value of the observation at time t
$y_t$ = the value of the deterministic component at time t
$Z_t$ = the value of the stochastic component at time t ($Z_t$ must be
stationary)
In general, both $X_t$ and $Z_t$ will be serially correlated.
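In order to determine a confidence interval about the sample mean, it is
necessary to develop an expression for $\mathrm{var}(\bar{X})$:
$$\mathrm{var}(\bar{X}) = \mathrm{var}\left[\frac{1}{k} \sum_{t=0}^{k-1} X_t\right] = \mathrm{var}\left[\frac{1}{k} \sum_{t=0}^{k-1} (y_t + Z_t)\right]$$
Since there is no variance associated with the $y_t$ terms, we have:
$$\mathrm{var}(\bar{X}) = \frac{1}{k^2} \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \mathrm{cov}(Z_{t+i}, Z_{t+j})$$
$$= \frac{\sigma_z^2}{k^2} \left[k + 2(k-1)\rho(1) + 2(k-2)\rho(2) + \cdots + 2\rho(k-1)\right]$$
$$= \frac{\sigma_z^2}{k^2} \left[k + 2 \sum_{n=1}^{k-1} (k-n)\rho(n)\right] \tag{5}$$
where $\rho(n)$ = lag-n autocorrelation coefficient
$\sigma_z^2 = \mathrm{var}[Z_t]$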
Note that equation (5) reduces to equation (3) if the function $\rho(n)$ is
zero for all n >= 1, which is the case when samples are independent.
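Equation (5) is straightforward to evaluate once an autocorrelation function is available. The sketch below is illustrative: the function name is hypothetical, and the geometrically decaying autocorrelation used in the example is an assumed (first-order autoregressive) form rather than a result from this report.

```python
def var_sample_mean(sigma2_z, rho, k):
    """Variance of the mean of k equally spaced samples, per equation (5).

    sigma2_z -- variance of the stochastic component
    rho      -- function returning the autocorrelation at sample lag n
    k        -- number of samples
    """
    total = k + 2.0 * sum((k - n) * rho(n) for n in range(1, k))
    return sigma2_z * total / k ** 2


# Independent samples: every autocorrelation beyond lag zero is zero.
independent = var_sample_mean(1.0, lambda n: 0.0, 12)

# Assumed positively correlated samples with rho(n) = 0.5 ** n.
correlated = var_sample_mean(1.0, lambda n: 0.5 ** n, 12)

print(independent, correlated)
```

With positive autocorrelation the variance of the sample mean exceeds the independent-sample value, so the same number of samples yields a wider confidence interval.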
If $Z_t$ is stationary:
$$\rho(n) = \frac{\mathrm{cov}[Z_t, Z_{t+n}]}{\mathrm{var}[Z_t]}$$
If $y_t$ is periodic, the sample mean must, of course,
be computed over an integral number of periods to have any meaning. In the
case of water quality constituents, an annual cycle is commonly exhibited, so
the annual mean is a useful statistic. For computing annual means, k in
the above derivation corresponds to the number of samples collected in one
year and the sampling interval in days would then be 365/k. Implicit in
this discussion is the assumption that:
$$\frac{1}{k} \sum_{t=0}^{k-1} X_t$$
is an unbiased estimator of the annual mean $\mu$, regardless of the point in
the deterministic cycle at which the first sample is taken. In order to use
equation (5) to compute the variance of the sample mean, it is necessary to
know the values of the autocorrelation function of the stochastic component
$\rho(n)$, n = 1, 2, ..., k - 1. These can be estimated from a historic data
record; however, one must have considerably more than k data points in
order to estimate k - 1 autocorrelation coefficients.
A better approach is to fit a time series model to the historic data
record with the deterministic component removed and then compute a theoreti-
cal autocorrelation function based on that model. This will generally
require less data and, it is hoped, will result in an autocorrelation function
that is "smoother" and more representative of the underlying stochastic
process than an autocorrelation function estimated from a particular reali-
zation of the process.
ARMA Models
The general class of time series models used for this purpose is auto-
regressive, moving average (ARMA) models as described in Box and Jenkins
(1976).
An ARMA (p,q) model possesses the following form:
$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \cdots + \phi_p Z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{6}$$
where $Z_t$ = present value of the time series
$Z_{t-1}$ = value of the series one time interval in the past, etc.
$\phi_1, \phi_2, \ldots, \phi_p$ = autoregressive coefficients
$\theta_1, \theta_2, \ldots, \theta_q$ = moving-average coefficients
$a_t$ = present value of a random noise term or "shock"
$a_{t-1}$ = value of random shock at time t-1
The model expresses the current value of the time series in terms of
previous values, a random shock, and previous values of the random shock.
One effect of the model is to remove all serial correlation from the time
series, reducing it to a series of independent noise terms or shocks, $a_t$.
The variance of the residuals, $\sigma_a^2$, will be less than the variance of the
time series, $\sigma_z^2$, since the variance of the time series includes the effect
of correlation. In a purely autoregressive model, AR(p), all of the moving-
average coefficients are zero. Likewise in a purely moving-average model,
MA(q), all of the autoregressive coefficients are zero.
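The relationship between the two variances can be made concrete for the simplest case. The following worked example, which assumes a stationary first-order autoregressive model with $a_t$ uncorrelated with $Z_{t-1}$, is offered as an illustration and is not taken from the report itself.

```latex
% AR(1) special case of equation (6): p = 1, q = 0
Z_t = \phi_1 Z_{t-1} + a_t
% Taking variances of both sides (stationarity gives var[Z_t] = var[Z_{t-1}]):
\sigma_z^2 = \phi_1^2 \sigma_z^2 + \sigma_a^2
\quad\Longrightarrow\quad
\sigma_a^2 = (1 - \phi_1^2)\,\sigma_z^2 \le \sigma_z^2
% The corresponding autocorrelation function decays geometrically:
\rho(n) = \phi_1^{\,n}, \qquad n = 0, 1, 2, \ldots
```

For example, an assumed coefficient of 0.8 would give a residual variance equal to 36 percent of the variance of the series.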
A particular ARMA (p, q) process with specified coefficients will have a
unique autocorrelation function associated with it. This function may then
be used to determine the variance of the sample mean for equally spaced
samples drawn from the process. The sampling frequency necessary to achieve
a specified confidence interval that accounts for the effect of serial corre-
lation may then be determined by computing the variance of the sample mean
and width of the confidence interval for various sample sizes and selecting
the appropriate one.
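The selection procedure described above can be sketched for a second-order autoregressive model fitted to daily data. Everything in this sketch is an assumption made for illustration: the function names, the coefficient values, the standard deviation, and the candidate sampling intervals (restricted to whole days so that each sample lag corresponds to an integer daily lag). The AR(2) autocorrelations follow from the standard Yule-Walker recursion.

```python
import math
from statistics import NormalDist


def ar2_acf(phi1, phi2, max_lag):
    """Theoretical autocorrelations rho(0..max_lag) of a stationary AR(2)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for n in range(2, max_lag + 1):
        rho.append(phi1 * rho[n - 1] + phi2 * rho[n - 2])
    return rho[: max_lag + 1]


def half_width(sigma_z, rho_daily, interval_days, days=365, alpha=0.05):
    """Samples per year and confidence-interval half-width, via equation (5)."""
    k = days // interval_days
    # Samples interval_days apart are correlated at daily lag n * interval_days.
    total = k + 2.0 * sum(
        (k - n) * rho_daily[n * interval_days] for n in range(1, k)
    )
    var_mean = sigma_z ** 2 * total / k ** 2
    return k, NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(var_mean)


def longest_interval(sigma_z, rho_daily, target, candidates=(1, 7, 14, 30)):
    """Longest candidate interval whose half-width meets the target, or None."""
    ok = [d for d in candidates
          if half_width(sigma_z, rho_daily, d)[1] <= target]
    return max(ok) if ok else None


# Hypothetical AR(2) coefficients and standard deviation of the series.
rho_daily = ar2_acf(0.6, 0.2, 365)
for d in (1, 7, 14, 30):
    k, r = half_width(0.5, rho_daily, d)
    print(f"interval {d:2d} days  k = {k:3d}  half-width = {r:.3f}")
```

If the model is fitted to log-transformed data, the standard deviation and the resulting half-width refer to the logarithms, and the interval applies to the log of the geometric mean.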
The autocorrelation function desired is that of the stochastic component
only. Therefore, the deterministic component must be removed before a time
series model is fitted. For this research, the deterministic component was
assumed to consist of a linear trend plus a sinusoidal component with a
period of one year. This deterministic function has the form:
$$y_t = mt + A(\cos \omega t + C) \tag{7}$$
where $y_t$ = deterministic component at time t
$\omega$ = 360 degrees/number of samples per year
m, A, C = fitted constants
A similar form of the deterministic function is also applied in Sanders
(1974) and in Steele et al. (1974).
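A least-squares fit of an annual cycle can be sketched as follows. This sketch makes several assumptions that are not taken from the report: the seasonal term is parameterized as a mean level plus a single sinusoid with fitted amplitude and phase, no linear trend is fitted, the record covers a whole number of years of equally spaced samples (in which case the least-squares coefficients reduce to simple sums), and angles are in radians rather than degrees. The function names are hypothetical.

```python
import math


def fit_annual_cycle(x, samples_per_year):
    """Least-squares mean, amplitude, and phase of mean + A*cos(w*t + C)."""
    n = len(x)
    if n % samples_per_year != 0 or samples_per_year < 3:
        raise ValueError("record must cover whole years, 3+ samples per year")
    w = 2.0 * math.pi / samples_per_year
    mean = sum(x) / n
    # Over whole periods the cosine and sine regressors are orthogonal.
    a = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(x))
    b = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(x))
    amplitude = math.hypot(a, b)
    phase = math.atan2(-b, a)
    return mean, amplitude, phase


def remove_annual_cycle(x, samples_per_year):
    """Subtract the fitted deterministic component, leaving the residuals."""
    mean, amplitude, phase = fit_annual_cycle(x, samples_per_year)
    w = 2.0 * math.pi / samples_per_year
    return [v - (mean + amplitude * math.cos(w * t + phase))
            for t, v in enumerate(x)]


# Synthetic weekly record: known cycle with no noise, so residuals vanish.
w = 2.0 * math.pi / 52
record = [5.0 + 2.0 * math.cos(w * t + 0.3) for t in range(52)]
print(fit_annual_cycle(record, 52))
```

The residual series returned by `remove_annual_cycle` is the one to which a time series model would then be fitted.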
APPLICATION
Data Records Used
Water quality records from three locations were analyzed using the
methods described above to determine their correlation structures. For
selected constituents a range of hypothetical sampling frequencies was then
assigned, confidence intervals were computed, and the results were compared
with those obtained when serial correlation was not considered. The water
quality records used were:
1. Approximately two years of weekly observations for four constitu-
ents from the Red River at Emerson, Manitoba. The period of record
considered was August 1960 through June 1962. Constituents consid-
ered were specific conductance, sodium, bicarbonate, and chloride.
2. Approximately 15 months of daily observations for four constituents
from the Grand River at 68th Avenue Bridge, Allendale, Michigan.
Period of record considered was March 1, 1976, through April 30,
1977. Constituents considered were specific conductance, total
phosphate, sulfate, and chloride.
3. One year of daily observations for five constituents at each of
nine stations in Illinois. Constituents considered were total
dissolved solids, total organic carbon, suspended solids, total
hardness, nitrates, and turbidity.
Data were obtained from the Monitoring and Surveys Division, Water
Quality Branch, Inland Waters Directorate, Ottawa, Canada; from the U. S.
Environmental Protection Agency, Region V, Chicago, Illinois; and from the
Illinois State Water Survey, Urbana, Illinois, respectively.
The data sets from the Canadian and Michigan locations were evaluated
first and used to gain experience in model fitting. (The results from these
two locations are presented first to describe the fitting process.) The
model fitting procedure was therefore more "streamlined" and efficient when
the much larger task of working with records from an entire network in
Illinois was begun.
For each of the Michigan data records, the parameters m, A, and C
in equation (7) were fitted by the method of least squares as described in
Sanders (1974), and then the function $y_t$ was subtracted from the data
record before an ARMA model was fitted.
In practice, it would be unwise to estimate a linear trend based on a
single year of observations. Therefore the linear trend was not removed from
the Canadian or Illinois time series that were used in the final design
example. If, however, longer historic records were available, one would
normally estimate a linear trend component and then apply a statistical test
to see if it were significant. A "t" test (as described in any basic statistics
text such as Bowker and Lieberman, 1972) strictly applies only if the
individual observations are independent and identically, normally distributed,
but it may be useful also when these conditions "roughly" hold. Modified "t"
tests for records with dependent observations and nonparametric tests (which
are not based on a probability distribution) for trend are given by Letten-
maier (1976, 1978).
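The ordinary "t" test for a linear trend can be sketched as below. This is the basic test for independent observations only, not one of the modified or nonparametric tests cited above; the function name and the data values are hypothetical.

```python
import math


def trend_t_statistic(y):
    """Least-squares slope against the time index and its t statistic.

    The statistic has n - 2 degrees of freedom and should be compared with
    a tabulated value of Student's t at the chosen level of significance.
    """
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical short record.
slope, t_stat = trend_t_statistic([1.0, 3.0, 2.0, 5.0, 4.0])
print(slope, t_stat)
```

When the observations are serially correlated, this statistic tends to overstate the significance of a trend, which is the reason for the modified tests.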
The deterministic coefficients that were estimated for the seasonal
components of the water quality time series under study are listed in the
Appendix.
Time series models of the ARMA type were fitted to each data record
using the IMSL (International Mathematics and Statistics Library) subroutines
and the Control Data Corporation CYBER 172 computer system at Colorado State
University.
An initial step in model fitting is often some transformation of the
data. In many cases a transformation is selected in order to obtain "better"
fits. However, for this research it is important to remember that the models
will ultimately be used to generate theoretical autocorrelation functions for
time series to be sampled in the future and to establish confidence interval
widths about some sample statistic. If the sample statistic of interest were
the annual geometric mean, a logarithmic transformation would be used as in
Sanders (1974). If the annual mean were of interest, no transformation would
be used. For the final design example based on the Illinois data, it is
desirable to deal only with geometric means for the sake of consistency, so a
logarithmic transformation is used in every case. This rule is followed for
the Canadian and Michigan data sets as well. Only models of degree p ≤ 2
and q ≤ 2 are considered.
Model Fitting
The process of ARMA model fitting is rather complex and is perhaps as
much an art as a science. A brief description will be attempted here, and a
detailed explanation may be found in Box and Jenkins (1970). Likely candidate
models are selected based on a visual inspection of autocorrelation and
partial autocorrelation functions estimated from the data for the first 20 or
so lags. An AR(p) process tends to exhibit exponentially decaying or damped
sine wave behavior in the autocorrelation function and has a partial auto-
correlation function that is zero except for the first p values. A moving
average model [MA(q)] would show similar behavior in reverse—exponential
decay in the partial autocorrelation function and q nonzero values in the
autocorrelation function. Mixed processes tend to exhibit a combination of
the above behavioral patterns. The estimated autocorrelation function and
partial autocorrelation function for the daily sulfate concentration time
series from the Grand River, Michigan, data are shown in Figures 1 and 2.
This time series was later fitted by a second-order autoregressive—AR(2)—
model.
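
The identification step described above can be sketched in a few lines. The
following is only an illustration, not the IMSL routines used in the study:
the function names sample_acf and pacf_from_acf are hypothetical, and the
partial autocorrelations are obtained with the standard Durbin-Levinson
recursion. As a check, the recursion is applied to the theoretical
autocorrelations of an AR(2) process, for which the partial autocorrelation
function should vanish after lag 2.

```python
def sample_acf(x, max_lag):
    """Return sample autocorrelations r_0 .. r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(r, max_lag):
    """Partial autocorrelations phi_kk, k = 1..max_lag (Durbin-Levinson).

    r is a list of autocorrelations with r[0] = 1.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


# Theoretical autocorrelations of an AR(2) process with illustrative
# coefficients 0.63 and 0.28 (the case-study values quoted later).
phi1, phi2 = 0.63, 0.28
rho = [1.0, phi1 / (1.0 - phi2)]
for k in range(2, 11):
    rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])

# The first two partial autocorrelations are nonzero; the rest are ~0.
print([round(v, 4) for v in pacf_from_acf(rho, 5)])
```

With real data one would pass the output of sample_acf to pacf_from_acf and
look for the lag beyond which the values fall inside the sampling error.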
After likely candidate models are selected, the autoregressive and
moving average coefficients are computed by the appropriate IMSL subroutines
using the method of maximum likelihood as described in Box and Jenkins (1976).
One model is then selected as the best from the various candidates based on
an examination of the correlation structure of the residuals and on the
magnitude of the residual variance, $\sigma_a^2$. If the model were perfect,
the residuals would be completely uncorrelated. One would usually select as
the best the model that has the smallest residual variance and the least
correlation remaining among the residuals. Other factors must be considered,
however. For example, one would like to adopt a model that has as few
parameters as possible.

Figure 1. Estimated autocorrelation function for daily sulfate
concentration, Grand River, Michigan. (Plot not reproduced; horizontal
axis: lag in days, 0 to 20.)

Figure 2. Estimated partial autocorrelation function for daily
sulfate concentration, Grand River, Michigan. (Plot not reproduced;
horizontal axis: lag in days, 0 to 20.)

Case Study in Model Fitting
In order to explain the time series model-fitting process in more detail
a case study is presented in which the procedures are described step by step.
The specific conductance record for the Grand River, Michigan, site is used
for this purpose.
After removing the deterministic seasonal component, the first step in
model fitting is a visual inspection of the estimated autocorrelation and
partial autocorrelation functions. These are given in Figures 3 and 4 and
appear very similar to the estimated functions for sulfate concentration in
Figures 1 and 2. Since the estimated autocorrelation function decays fairly
slowly, a significant autoregressive component is indicated. Only the first
two values of the estimated partial autocorrelation function appear to be
significant; therefore, no moving-average component appears to be present,
and the autoregressive operator would likely be of degree two. The most
likely candidate model, second-order autoregressive, is fitted first using
the appropriate IMSL subroutines. Values of the fitted parameters
$\phi_1$ = 0.63 and $\phi_2$ = 0.28 result along with a residual variance,
$\sigma_a^2$, of 0.0026.
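
For readers without access to a maximum likelihood routine, a rough starting
estimate of the AR(2) coefficients can be obtained from the first two
autocorrelations by the Yule-Walker (moment) equations. This is a different
and simpler technique than the maximum likelihood fit used in the study, and
the sketch below is offered only under that caveat. The function name is
hypothetical, and the input autocorrelations are the theoretical values
implied by the fitted coefficients, not estimates taken from the report.

```python
def yule_walker_ar2(r1, r2):
    """Moment estimates (phi1, phi2) of an AR(2) model from lag-1 and lag-2
    autocorrelations, obtained by solving the Yule-Walker equations."""
    denom = 1.0 - r1 * r1
    phi1 = r1 * (1.0 - r2) / denom
    phi2 = (r2 - r1 * r1) / denom
    return phi1, phi2


# Autocorrelations implied by phi1 = 0.63, phi2 = 0.28 (illustrative input).
r1 = 0.63 / (1.0 - 0.28)
r2 = 0.28 + 0.63 ** 2 / (1.0 - 0.28)
print(yule_walker_ar2(r1, r2))  # recovers approximately (0.63, 0.28)
```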
The next step is the evaluation of the fitted model. The estimated
autocorrelation function, $r_k(a)$, of the residuals is shown in Figure 5.
According to Box and Jenkins (1970), if the form of the model were known
exactly the $r_k(a)$ of the residuals would be normally distributed with mean
zero and standard error $1/\sqrt{n}$. Thus, the standard error limits as
shown in Figure 5 may serve as an approximate check on the adequacy of the
model. For the current example, the most significant value of $r_k(a)$ occurs
at a lag of seven. A logical explanation for this occurrence is that a weekly
cycle exists in the time series. A similar occurrence was noted, although to
a lesser extent, in other quality series at the same location but was not
observed at the other locations. Thus the adequacy of the deterministic
model, which should possibly include a weekly component for this location as
suggested by Sanders (1974), is questionable, but there is little reason to
doubt the adequacy of the AR(2) model.
A more quantitative check is given by a test statistic:

$$Q = n \sum_{k=1}^{K} r_k^2(a)$$

where $r_k(a)$ = lag-k autocorrelation coefficient of residuals
      n = number of observations used to fit the model
      K = number of lags used in the test (here, K = 20)
-------
Figure 3. Estimated autocorrelation function for daily specific
conductance, Grand River, Michigan. (Plot not reproduced; horizontal
axis: lag in days, 0 to 20.)

Figure 4. Estimated partial autocorrelation function for
daily specific conductance, Grand River, Michigan. (Plot not reproduced;
horizontal axis: lag in days, 0 to 20.)

Figure 5. Estimated autocorrelation function of residuals
for AR(2) model, specific conductance, Grand River,
Michigan. (Plot not reproduced; horizontal axis: lag k in days, 0 to 20;
vertical axis from -0.20 to 0.20 with standard error limits marked.)
-------
If the final model is appropriate, Q is approximately distributed as
chi-square with (K-p-q) degrees of freedom. Recall that p and q are the
orders of the autoregressive and moving-average components of the model,
respectively. By computing $\alpha$ such that the probability that Q is less
than $\chi^2_{1-\alpha}(K-p-q)$ equals $1-\alpha$, it is possible to gain
some insight into the adequacy of the model. [$\chi^2_{1-\alpha}(K-p-q)$ is
the $(1-\alpha)$th quantile of the chi-square distribution with K-p-q degrees
of freedom.] One would expect $\alpha$ to be fairly small (say less than
0.10) for adequate models. In practice, however, $\alpha$ may be much larger,
particularly when n is large, even when an inspection of the $r_k(a)$ plot
would indicate that the model is very good. Therefore, in this research,
$\alpha$ was computed primarily for the purpose of comparing models. A model
with a smaller $\alpha$ would normally be more acceptable than one with a
higher $\alpha$. For the case study, the fitted AR(2) model results in a
value of $\alpha$ = 0.48.
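
The statistic Q is simple to compute once the residuals of a fitted model are
in hand. The sketch below is a minimal illustration, not the routine used in
the study: the function name is hypothetical, and the residual
autocorrelations are computed without subtracting a mean, on the assumption
that the residuals have zero mean. The chi-square probability itself is left
to a statistics table or library.

```python
def portmanteau_q(residuals, max_lag, p, q):
    """Return (Q, degrees of freedom) for the residuals of an ARMA(p, q) fit.

    Q = n * sum of squared residual autocorrelations for lags 1..max_lag.
    The residuals are assumed to have zero mean, so no mean is subtracted.
    """
    n = len(residuals)
    c0 = sum(a * a for a in residuals)
    r = [sum(residuals[t] * residuals[t + k] for t in range(n - k)) / c0
         for k in range(1, max_lag + 1)]
    q_stat = n * sum(rk * rk for rk in r)
    return q_stat, max_lag - p - q


# Tiny made-up residual series, used only to show the call.
q_stat, dof = portmanteau_q([1.0, -1.0, 1.0, -1.0], max_lag=2, p=0, q=0)
print(q_stat, dof)
```

For an AR(2) model tested with K = 20 lags, the second value returned would
be 18 degrees of freedom.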
A third technique of model evaluation is that of overfitting or fitting
more parameters than are actually thought necessary. Thus an AR(3) model is
fitted for the case study series. Fitted parameters are $\phi_1$ = 0.63,
$\phi_2$ = 0.25, and $\phi_3$ = 0.02. A value of $\alpha$ = 0.64 and residual
variance ($\sigma_a^2$) of 0.0037 are also found. The fact that the value of
$\phi_3$ is quite small would lend support to the acceptance of an AR(2)
model. This conclusion is also supported by the values of $\alpha$ and
$\sigma_a^2$, which are both larger for the AR(3) model than for the AR(2)
model.
Two other candidate models were evaluated, AR(1) and ARMA (2,1). Both
appeared to provide poorer fits of the data than did the AR(2) model when
measured by the same criteria described above. Thus the AR(2) model was
selected. The results of fitting the four candidate models are summarized in
Table 1.
TABLE 1. RESULTS OF FITTING CANDIDATE MODELS FOR
CONDUCTIVITY TIME SERIES, GRAND RIVER, MICHIGAN

Candidate     Fitted Parameters              Residual     Chi-Square
Model         φ1      φ2      φ3      θ1     Variance,    Significance
                                             σa²          Level, α

AR(1)         0.89                           0.0029       0.60
AR(2)         0.63    0.28                   0.0026       0.48
AR(3)         0.63    0.25    0.02           0.0037       0.64
ARMA(2,1)     0.70    0.21            0.06   0.0037       0.66
-------
STATISTICAL ANALYSIS RESULTS
Results of Model Fitting
All of the models selected in this study are of the first-order auto-
regressive, second-order autoregressive, or first-order autoregressive-first-
order moving-average types. Tables 2, 3, and 4 indicate the type of model
adopted and values of fitted model parameters for each water quality constit-
uent considered for the Manitoba, Michigan, and Illinois locations, respec-
tively. Theoretical autocorrelation functions were computed for each model
using the following equations from Box and Jenkins (1976).
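For an AR(1) process:

$$\rho_1 = \phi_1 \qquad (8)$$

$$\rho_k = \phi_1^k, \quad k = 1, 2, 3, \ldots \qquad (9)$$

For an AR(2) process:

$$\rho_1 = \frac{\phi_1}{1 - \phi_2} \qquad (10)$$

$$\rho_2 = \phi_2 + \frac{\phi_1^2}{1 - \phi_2} \qquad (11)$$

$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}, \quad k = 3, 4, 5, \ldots \qquad (12)$$

For an ARMA (1,1) process:

$$\rho_1 = \frac{(1 - \phi_1 \theta_1)(\phi_1 - \theta_1)}{1 + \theta_1^2 - 2 \phi_1 \theta_1} \qquad (13)$$

$$\rho_k = \phi_1 \rho_{k-1}, \quad k = 2, 3, 4, \ldots \qquad (14)$$
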
Theoretical (computed) autocorrelation functions are plotted in Figures
6, 7, 8, and 9 for the Manitoba chloride model, AR(1) process; Manitoba
specific conductance model, ARMA(1,1) process; Michigan specific conductance
model, AR(2) process; and Michigan total phosphate model, ARMA (1,1) process,
respectively. Since each model contains an autoregressive component, all of
the autocorrelation functions decay exponentially. However, they differ
considerably in the rate of decay.
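
Equations (8) through (14) translate directly into code. The sketch below is
one possible transcription, with hypothetical function names; the parameter
values in the example calls are taken from Table 1 and Table 2, and the sign
convention for $\theta_1$ is assumed to be that of Box and Jenkins.

```python
def acf_ar1(phi1, max_lag):
    """Equations (8)-(9): rho_k = phi1 ** k."""
    return [phi1 ** k for k in range(max_lag + 1)]


def acf_ar2(phi1, phi2, max_lag):
    """Equations (10)-(12): start from rho_0 and rho_1, then recurse."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[:max_lag + 1]


def acf_arma11(phi1, theta1, max_lag):
    """Equations (13)-(14): rho_1 from both parameters, then AR-type decay."""
    rho1 = ((1.0 - phi1 * theta1) * (phi1 - theta1)
            / (1.0 + theta1 ** 2 - 2.0 * phi1 * theta1))
    rho = [1.0, rho1]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1])
    return rho[:max_lag + 1]


print(acf_ar1(0.644, 4))            # Manitoba chloride, lags in weeks
print(acf_ar2(0.63, 0.28, 4))       # Michigan conductance, lags in days
print(acf_arma11(0.40, -0.37, 4))   # Manitoba conductance, lags in weeks
```

The three calls show the differing rates of decay noted in the text: the
AR(2) model with coefficients summing to 0.91 decays much more slowly than
the AR(1) model with a coefficient of 0.644.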
Computed Confidence Interval Widths For Specific Models
The variance of the sample mean for each water quality constituent was
-------
TABLE 2. RESULTS OF MODEL FITTING, RED RIVER, MANITOBA

Constituent                          Model       φ1      φ2     θ1      σ²       σz²     μ

log specific conductance (μmhos/cm)  ARMA(1,1)   0.40           -0.37   0.0923   0.567   6.74
log bicarbonate (mg/l)               AR(2)       0.45    0.17           0.145    0.091   5.71
log sodium (mg/l)                    AR(1)       0.624                  0.350    0.210   3.98
log chloride (mg/l)                  AR(1)       0.644                  0.602    0.375   4.08
-------
TABLE 3. RESULTS OF MODEL FITTING, GRAND RIVER, MICHIGAN

Constituent                          Model       φ1      φ2      θ1     σ²       σz²      μ

log specific conductance (μmhos/cm)  AR(2)       0.64    0.28           0.0279   0.0136   6.418
log total phosphate (mg/l)           ARMA(1,1)   0.84            0.10   0.125    0.0953   -1.703
log sulfate (mg/l)                   AR(2)       0.4430  0.3771         0.058    0.0169   4.04
log chloride (mg/l)                  AR(2)       0.51    0.29           0.130    0.0664   3.661
-------
TABLE 4. RESULTS OF MODEL FITTING, ILLINOIS NETWORK

Constituent*  Model       φ1     φ2     θ1      σ²        σz²       μ

Station #1: Little Wabash River at Louisville, Illinois
TDS           ARMA(1,1)   0.86          0.03    0.1740    0.1176    5.6793
TOC           AR(1)       0.79                  0.1855    0.1833    2.4066
SS            AR(1)       0.86                  1.8174    1.1655    3.7091
Hardness      ARMA(1,1)   0.86          -0.02   0.2065    0.1468    5.2670
NO3-          AR(1)       0.90                  0.6773    0.4276    1.1365

Station #2: Kankakee River at Kankakee, Illinois
TDS           AR(1)       0.83                  0.00686   0.00600   5.8526
TOC           AR(2)       0.69   0.19           0.1920    0.1298    1.9213
SS            AR(2)       0.73   0.19           1.5596    0.5059    2.6832
Hardness      AR(1)       0.85                  0.00948   0.00900   5.7419
NO3-          ARMA(1,1)   0.93          -0.14   0.6289    0.5506    1.782

Station #3: Kankakee River near Lorenzo, Illinois
TDS           AR(2)       0.65   0.14           0.1658    0.0145    5.8875
TOC           AR(2)       0.61   0.19           0.07536   0.0749    2.0321
SS            AR(2)       0.76   0.11           0.5746    0.5634    3.0103
Hardness      AR(2)       0.69   0.20           0.01744   0.0158    5.6880
NO3-          AR(2)       0.53   0.33           0.4649    0.3842    2.2108

Station #4: Chicago Sanitary and Ship Canal at Lockport, Illinois
TDS           AR(1)       0.89                  0.3292    0.0281    6.0708
TOC           AR(2)       0.38   0.17           0.7536    0.0749    2.3513
SS            AR(2)       0.35   0.21           0.5746    0.5634    3.2369
Hardness      AR(2)       0.74   0.15           0.1639    0.0127    5.3434
NO3-          AR(1)       0.56                  0.40223   0.2330    2.0266

Station #5: Illinois River at Ottawa, Illinois
TDS           ARMA(1,1)   0.85          -0.05   0.01099   0.0105    6.0520
TOC           ARMA(1,1)   0.80          0.10    0.0658    0.0534    2.3383
SS            AR(2)       0.65   0.11           0.4685    0.3311    3.4309
Hardness      AR(1)       0.95                  0.01227   0.0116    5.5769
NO3-          ARMA(1,1)   0.88          -0.15   0.1619    0.1250    2.553

Station #6: Vermillion River at Pontiac, Illinois
TDS           AR(1)       0.96                  0.07003   0.0482    6.0069
TOC           AR(2)       0.50   0.20           0.1631    0.1409    2.0488
SS            AR(2)       0.55   0.25           1.3816    0.6979    3.5015
Hardness      AR(1)       0.93                  0.07233   0.0583    5.8522
NO3-          AR(2)       0.79   0.18           1.8465    1.2864    2.1514

Station #7: Eureka Lake at Eureka, Illinois
TDS           AR(2)       0.81   0.16           0.02458   0.0086    5.9165
TOC           AR(2)       0.42   0.45           0.0400    0.0381    2.5203
SS            AR(2)       0.46   0.32           0.3776    0.2270    2.6115
Hardness      AR(1)       0.97                  0.02234   0.0105    5.4766
NO3-          AR(1)       0.84                  1.6920    1.6083    0.2927

Station #8: Canton Lake at Canton, Illinois
TDS           AR(1)       0.98                  0.0153    0.0061    5.4071
TOC           ARMA(1,1)   0.93          0.15    0.03867   0.0300    1.8897
SS            AR(2)       0.58   0.29           1.0757    0.8078    1.6359
Hardness      AR(1)       0.99                  0.02077   0.0082    5.2039
NO3-          AR(2)       0.62   0.34           0.6245    0.3458    0.5083

Station #9: Sangamon River at Lake Decatur, Illinois
TDS           AR(1)       0.92                  0.1169    0.0083    5.7097
TOC           AR(2)       0.46   0.22           0.0801    0.0765    2.0290
SS            AR(2)       0.55   0.24           0.7142    0.2193    2.8744
Hardness      AR(2)       0.86   0.07           0.0146    0.0117    5.5574
NO3-          AR(2)       0.41   0.51           2.6655    0.5672    1.6438

*log (mg/l)
-------
Figure 6. Theoretical autocorrelation function for chloride concentration,
Red River, Manitoba. (Plot not reproduced; AR(1) process, $\phi_1$ = 0.64;
horizontal axis: lag in weeks.)

Figure 7. Theoretical autocorrelation function for specific conductance,
Red River, Manitoba. (Plot not reproduced; ARMA(1,1) process, $\phi_1$ = 0.40,
$\theta_1$ = -0.37; horizontal axis: lag in weeks.)

Figure 8. Theoretical autocorrelation function for specific conductance,
Grand River, Michigan. (Plot not reproduced; AR(2) process, $\phi_1$ = 0.63,
$\phi_2$ = 0.28; horizontal axis: lag in days.)

Figure 9. Theoretical autocorrelation function for total phosphate
concentration, Grand River, Michigan. (Plot not reproduced; ARMA(1,1)
process, $\phi_1$ = 0.84, $\theta_1$ = 0.10; horizontal axis: lag in days.)
-------
then computed over a range of sampling frequencies using equation (5). A 95
percent confidence interval about the sample mean is then:

$$\left[\, \mu - 1.96\,[\mathrm{var}(\bar{X})]^{1/2},\ \ \mu + 1.96\,[\mathrm{var}(\bar{X})]^{1/2} \,\right] \qquad (15)$$

Since a logarithmic transformation has been performed, the width of the
confidence interval about the geometric mean in the original units of the
constituent is found from:

$$R = e^{\mu + 1.96\,[\mathrm{var}(\bar{X})]^{1/2}} - e^{\mu - 1.96\,[\mathrm{var}(\bar{X})]^{1/2}} \qquad (16)$$

The results of these computations for the Canadian and Michigan locations are
presented in Table 5. Confidence interval #1 in Table 5 is computed as just
described and takes into account both the deterministic component and serial
correlation of the time series. Confidence interval #2 is computed from
equation (16) using:

$$\mathrm{var}(\bar{X}) = \frac{\sigma_z^2}{k}$$

where k = number of observations per year
      $\sigma_z^2$ = the variance of the time series with the deterministic
      component removed

$\sigma_z^2$ is estimated from water quality observations from which the
function y given by equation (7) has been subtracted. Confidence interval #3
is computed from equation (16) as well, this time using:

$$\mathrm{var}(\bar{X}) = \frac{\sigma^2}{k}$$

as in equation (3), where $\sigma^2$ is the variance of the time series
without considering either the deterministic component or serial correlation.
$\sigma^2$ is estimated from the logs of the raw water quality data. Thus
confidence intervals #1, #2, and #3 represent successively decreasing levels
of sophistication in the analysis.
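
The three interval widths can be reproduced with a short calculation. The
sketch below rests on an assumption: equation (5) is not reproduced in this
part of the report, so the variance of the mean of n equally spaced,
serially correlated observations is taken in its standard form,
$(\sigma_z^2/n)\,[1 + (2/n)\sum_{k=1}^{n-1}(n-k)\rho_k]$. Function names are
hypothetical. The inputs are the AR(2) coefficients of the case study (0.63,
0.28) and the variances and mean listed for specific conductance in Table 3.

```python
import math


def acf_ar2(phi1, phi2, max_lag):
    """Theoretical AR(2) autocorrelations, equations (10)-(12)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho


def var_of_mean(variance, rho, n, step):
    """Assumed form of equation (5): variance of the mean of n observations
    taken every `step` days, with autocorrelation rho[lag in days]."""
    weighted = sum((n - k) * rho[k * step] for k in range(1, n))
    return variance / n * (1.0 + 2.0 * weighted / n)


def geometric_width(mu, var_mean, z=1.96):
    """Equation (16): interval width in original units after a log transform."""
    half = z * math.sqrt(var_mean)
    return math.exp(mu + half) - math.exp(mu - half)


mu, var_raw, var_z = 6.418, 0.0279, 0.0136   # Table 3, specific conductance
step = 1                                     # daily sampling
n = 365 // step
rho = acf_ar2(0.63, 0.28, (n - 1) * step)

width_1 = geometric_width(mu, var_of_mean(var_z, rho, n, step))
width_2 = geometric_width(mu, var_z / n)
width_3 = geometric_width(mu, var_raw / n)
print(round(width_1, 1), round(width_2, 1), round(width_3, 1))
```

Under these assumptions the three widths come out close to the daily values
listed for specific conductance in Table 5 (74.3, 14.7, and 21.0 μmhos/cm),
which suggests that the assumed form of equation (5) is at least consistent
with the published results.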
Since the raw observations contain an apparent (or "false") variance as
a result of the deterministic component, confidence interval #2 is always
smaller than confidence interval #3. The effect of serial correlation is to
increase the variance of the sample mean. Therefore, confidence interval #1
is always larger than confidence interval #2.
TABLE 5. COMPUTED CONFIDENCE INTERVALS ABOUT GEOMETRIC
MEAN FOR SEVERAL WATER QUALITY CONSTITUENTS

Grand River at Allendale, Michigan

Sampling      Width of        Width of        Width of
Interval      Confidence      Confidence      Confidence
(days)        Interval #1     Interval #2     Interval #3

Specific Conductance (μmhos/cm)
   1            74.3            14.7            21.0
   3            74.8            25.5            36.5
   7            75.6            38.8            55.7
  14            78.5            55.0            78.7
  28            87.6            77.8           111.4
  36            94.7            88.7           127.1
  45           102.5            99.2           142.2

Total Phosphate (mg/l)
   1             5.4             1.5             2.8
   3             5.6             2.6             4.9
   7             5.8             4.0             7.5
  14             6.5             5.7            10.6
  28             8.2             8.1            15.0
  36             9.3             9.2            17.1
  45            10.3            10.3            19.1

Sulfate (mg/l)
   1             0.038           0.012           0.013
   3             0.039           0.020           0.023
   7             0.041           0.031           0.035
  14             0.047           0.043           0.050
  28             0.062           0.061           0.070
  36             0.070           0.070           0.080
  45             0.070           0.079           0.090

Chloride (mg/l)
   1             6.8             2.1             2.9
   3             7.0             3.6             5.0
   7             7.4             5.5             7.6
  14             8.5             7.7            10.8
  28            11.0            10.9            15.3
  36            12.5            12.5            17.5
  45            14.0            14.0            19.6

Red River at Emerson, Manitoba

Sampling      Width of        Width of        Width of
Interval      Confidence      Confidence      Confidence
(weeks)       Interval #1     Interval #2     Interval #3

Specific Conductance (μmhos/cm)
   1           189.1           109.2           139.4
   2           193.2           154.5           197.4
   3           209.4           191.3           244.4
   4           226.9           218.9           279.7
   6           281.0           279.5           357.6
   8           323.5           323.2           413.9

Bicarbonate (mg/l)
   1           105.7            49.6            62.4
   2           110.6            70.2            88.4
   3           115.8            86.9           109.5
   4           120.4            99.5           143.3
   6           138.5           127.2           160.6
   8           153.0           147.2           186.2

Sodium (mg/l)
   1            27.2            13.4            17.3
   2            28.1            18.9            24.5
   3            29.6            23.5            30.4
   4            31.0            26.9            35.0
   6            36.3            34.5            45.0
   8            40.9            40.1            52.5

Chloride (mg/l)
   1            41.8            19.7            25.0
   2            42.9            28.0            35.7
   3            45.1            34.8            44.5
   4            47.0            40.0            51.2
   6            54.9            51.5            66.5
   8            61.6            60.0            77.9

Confidence interval #1 accounts for both seasonal and serial correlation
effects; confidence interval #2 accounts for seasonal effects only; and
confidence interval #3 assumes independent random samples.

When samples are collected frequently, the effect of serial correlation
is large and confidence interval #1 is much larger than the other two.
However, as the interval between samples increases the effect of serial
correlation becomes equal to the decrease in variance that results from
considering the deterministic annual cycle. For example, this occurs for the
Michigan chloride series at a sampling interval of seven days at which point
confidence intervals #1 and #3 are equal. At some still larger sampling
interval, the effect of serial correlation will disappear entirely. This
occurs at about four weeks for the Michigan chloride series at which time
confidence intervals #1 and #2 are equal. These relationships are
illustrated graphically in Figure 10. Although a design approach using
confidence interval #2 is certainly feasible, it is presented here primarily
as an intermediate step in arriving at confidence interval #1.
One can readily see that the level of improvement in estimating confi-
dence interval widths that can be gained from the more sophisticated statis-
tical techniques is highly dependent on the sampling interval used and,
perhaps to a lesser extent, on the time series model that applies.
Table 6 gives a summary of these results by indicating the relative
error of confidence interval #3 as compared to confidence interval #1 for
selected sampling frequencies. Taking an average over the four constituents
of the Canadian data, confidence interval #3 is 36 percent too narrow in
weekly sampling, 12 percent too wide when samples are collected monthly, and
27 percent too wide when samples are collected eight weeks apart. For the
Michigan constituents, confidence interval #3 is on the average 61 percent
too small in daily sampling, 24 percent too large in biweekly sampling, and
45 percent too large when sampling every 45 days.
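
The relative errors can be checked directly from the widths in Table 5. The
formula used below, the difference between the two widths divided by the
width of confidence interval #1, is inferred from the tabulated values rather
than stated in the text, and the function name is hypothetical.

```python
def relative_error(width_simple, width_full):
    """Relative error of the simple interval (#3) against the full one (#1)."""
    return (width_simple - width_full) / width_full


# Grand River specific conductance widths from Table 5:
# sampling interval in days -> (interval #1, interval #3)
conductance = {1: (74.3, 21.0), 14: (78.5, 78.7), 45: (102.5, 142.2)}

for days, (full, simple) in conductance.items():
    print(days, format(relative_error(simple, full), "+.1%"))
```

A negative value means the simple interval is too narrow; a positive value
means it is too wide. The three results fall within about one percentage
point of the conductance entries for Michigan in Table 6.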
One can obtain a better idea of the significance of these results
through a simple hypothetical example.
Case 1: Suppose that an agency wishes to monitor the Michigan location
in order to determine the annual mean specific conductance with a 95 percent
confidence interval width of 40 μmhos/cm. If the simplest statistical
approach were used, the agency would implement a program of sampling two
times per week (Table 5) but would obtain an "actual" confidence interval
width (accounting for the deterministic cycle and serial correlation) of 75
μmhos/cm. If, alternatively, the time series approach described here were
used, the agency would realize that the desired confidence interval could not
be achieved with daily or less frequent sampling.
Case 2: Now suppose that the desired confidence interval width is 80
μmhos/cm. The agency could adopt biweekly sampling regardless of which
statistical approach was used (#1 or #3). Either would be appropriate in
this case.
Computed Confidence Interval Widths—Generalized Results
The widths of confidence intervals about the geometric means for five
constituents at each station in the Illinois network were computed over a
range of sampling frequencies as before. The computed widths were averaged
over the nine stations and plotted in Figures 11 through 15 for total dis-
solved solids, total organic carbon, suspended solids, total hardness, and
nitrates, respectively.
-------
Figure 10. 95 percent confidence interval widths for the Michigan chloride
series as a function of sampling frequency. (Plot not reproduced; horizontal
axis: sampling interval in days. Legend: confidence interval #1, accounting
for serial correlation and deterministic component; confidence interval #2,
accounting for deterministic component; confidence interval #3, based on raw
data.)
-------
TABLE 6. RELATIVE ERROR OF CONFIDENCE INTERVAL #3* AS COMPARED TO
CONFIDENCE INTERVAL #1†

Red River, Manitoba                         Grand River, Michigan
               Sampling Interval                           Sampling Interval
Constituent    1 Week   4 Weeks   8 Weeks   Constituent    1 Day   14 Days   45 Days

Conductance    -26%     +23%      +28%      Conductance    -71%      0%      +39%
Bicarbonate    -41%     + 4%      +22%      Phosphate      -48%    +63%      +85%
Sodium         -37%     +13%      +28%      Sulfate        -66%    + 6%      +14%
Chloride       -40%     + 9%      +27%      Chloride       -58%    +28%      +40%

Average        -36%     +12%      +27%      Average        -61%    +24%      +45%

*Confidence interval #3 assumes independent, random samples.
†Confidence interval #1 accounts for effects of serial correlation and
seasonal variation.
-------
Figure 11. Average 95 percent confidence interval widths about
the geometric mean for total dissolved solids in
the Illinois network. (Plot not reproduced; horizontal axis: sampling
interval in days, 0 to 60; series: confidence intervals #1, #2, and #3.)

Figure 12. Average 95 percent confidence interval widths about the geometric
mean for total organic carbon in the Illinois network. (Plot not reproduced;
horizontal axis: sampling interval in days, 0 to 60; series: confidence
intervals #1, #2, and #3.)

Figure 13. Average 95 percent confidence interval widths about the geometric
mean for suspended solids in the Illinois network. (Plot not reproduced;
horizontal axis: sampling interval in days, 0 to 60; series: confidence
intervals #1, #2, and #3.)

Figure 14. Average 95 percent confidence interval widths about
the geometric mean for total hardness in the Illinois
network. (Plot not reproduced; horizontal axis: sampling interval in days,
0 to 60; series: confidence intervals #1, #2, and #3.)

Figure 15. Average 95 percent confidence interval widths
about the geometric mean for nitrates in the
Illinois network. (Plot not reproduced; horizontal axis: sampling interval
in days, 0 to 60; series: confidence intervals #1, #2, and #3.)
-------
The general behavioral pattern seen in Figure 10 is repeated here. The
effect of serial correlation seems to become relatively insignificant for
sampling intervals of about 20 days or greater in the total organic carbon
and suspended solids series. The same thing occurs at an interval of about
30 days for total dissolved solids. However, the effect of serial
correlation in the total hardness and nitrate series seems to persist out
beyond sampling intervals of 40 days.
Recall that a "rule of thumb" has been suggested in the past by Ward et
al. (1976) and others that samples may be considered to be independent when
collected monthly or less frequently. The current analysis would indicate
that this assumption is reasonably correct. It should be noted, however,
that this assumption strictly applies only to data records with deterministic
seasonal variation removed, the seasonal component being significant over the
entire range of sampling frequencies.
The task of fitting a separate model to each constituent record at each
station is certainly a formidable one. Lettenmaier (1975) has suggested that
many, if not most, water quality time series might be modeled using an AR(1)
model. He further suggested using a parameter value of $\rho_1$ = 0.85 when
insufficient data are available for estimation of $\rho_1$. The error
involved in applying this assumption is explored in Figures 16 and 17, which
compare average confidence interval widths for total dissolved solids and
total hardness for the Illinois network using the fitted models with those
computed using an AR(1) model with $\rho_1$ = 0.85. The widths of the
"actual" confidence intervals are significantly greater in each case,
indicating that at least for these examples the time series tend to exhibit
stronger serial correlation than that of the postulated AR(1),
$\rho_1$ = 0.85 model.
In the absence of adequate data to fit a time series model of the ARMA
type, the use of the first-order autoregressive model is certainly
justifiable. However, confidence interval widths computed subsequent to this
assumption are highly dependent on the value of the lag-one autocorrelation
coefficient, $\rho_1$. Figure 18 shows 95 percent confidence interval widths
about the annual mean (not the geometric mean) for samples exhibiting AR(1)
type correlation with several values of the parameter $\rho_1$ and variance
equal to unity. A value of $\rho_1$ = 0 of course corresponds to uncorrelated
(independent) samples. These curves were obtained from equations (5) and
(15). Actual confidence interval widths would be obtained by multiplying the
width obtained from Figure 18 times the square root of the variance (standard
deviation) of a particular time series. A significant difference in
confidence interval widths exists at high sampling frequencies from the case
of $\rho_1$ = 0.85 to $\rho_1$ = 0.95.
-------
Figure 16. Average 95 percent confidence interval widths for
dissolved solids concentration, Illinois network,
computed from both actual fitted and hypothesized
AR(1) models. (Plot not reproduced; horizontal axis: sampling interval in
days, 0 to 60; series: C.I. widths based on fitted models and C.I. widths
based on AR(1) model with $\rho_1$ = 0.85.)

Figure 17. Average 95 percent confidence interval widths for
total hardness concentration, Illinois network,
computed from both actual fitted and hypothesized
AR(1) models. (Plot not reproduced; horizontal axis: sampling interval in
days, 0 to 60; series: C.I. widths based on fitted models and C.I. widths
based on AR(1) model with $\rho_1$ = 0.85.)

Figure 18. Effect of $\rho_1$ on 95 percent confidence interval
widths about the annual mean for first-order
autoregressive processes with unit variance. (Plot not reproduced;
horizontal axis: sampling interval in days, 0 to 60.)
-------
SECTION 6
INCORPORATING ECONOMICS INTO
WATER QUALITY MONITORING SYSTEM DESIGN
This section expands the application of the previously developed statis-
tical analysis techniques to consider assigning sampling frequencies within a
multistation network. Not only are confidence interval widths of several
constituents considered, but also the economics of actually obtaining and
analyzing the water quality sample are introduced. Given that the statis-
tical and economic objectives are generally competing, mathematical pro-
gramming is used to "optimize" the sampling frequencies.
The section is separated into three parts: (1) formulation; (2)
application; and (3) sensitivity analysis. The formulation portion objec-
tively defines the problem and establishes a dynamic programming approach to
solving it. The application portion applies the dynamic programming algo-
rithm to assigning sampling frequencies for an Illinois network for which
detailed statistics are currently available. The sensitivity analysis
portion evaluates the sensitivity of the sampling frequencies to changes in
the design variables (e.g., changes in costs, changes in travel distances,
changes in statistical design criteria, etc.).
ASSIGNING SAMPLING FREQUENCIES VIA DYNAMIC PROGRAMMING
The problem of redesigning a water quality monitoring network that is
already in place may be viewed as one of assigning new sampling frequencies
to each station in order to achieve improved performance. In this study the
performance of a network is evaluated in terms of the confidence interval
widths obtained about the annual means in quality constituents measured by
the network.
The primary consideration in design is that the confidence interval
widths obtained for each constituent of major concern be reasonably small and
as uniform as possible from station to station. The optimization procedure
presented here attempts to achieve this general objective while satisfying an
economic constraint. The statistical objective of the design is to minimize
the double summation—over both stations and constituents—of the positive
differences between predicted confidence interval widths and design confi-
dence interval widths for selected quality constituents. The economic
constraint ensures that the sum of the operating costs over all stations will
be less than the specified annual budget. The cost of sampling at each
station is assumed to consist of the direct travel cost plus the laboratory
cost of processing the samples.
-------
In general the design will be based only on a small number of quality
constituents that are considered to be most important by the management
agency rather than on the total number of constituents actually measured.
Also a limited number of possible sampling frequencies that reflect the
agency's options in this area will be considered.
Problem Definition
The optimization problem may be expressed mathematically as:

$$\text{Minimize} \quad \sum_{i=1}^{N} \sum_{j=1}^{M} \frac{X_{ij} - X_j^D}{X_j^D} \qquad (17)$$

subject to:

$$\sum_{i=1}^{N} C_i \le C_T \qquad (18)$$

$$C_i = f_c(u_i, i), \quad i = 1, 2, \ldots, N \qquad (19)$$

$$X_{ij} = f_x(u_i, j), \quad i = 1, 2, \ldots, N; \quad j = 1, 2, \ldots, M \qquad (20)$$

$$X_j^D = \text{a constant}, \quad j = 1, 2, \ldots, M \qquad (21)$$

$$C_T = \text{a constant} \qquad (22)$$

$$X_{ij} - X_j^D = 0 \quad \text{if} \quad X_{ij} - X_j^D < 0 \qquad (23)$$

where $X_{ij}$ = predicted confidence interval width for constituent j at
               station i
      $X_j^D$ = design confidence interval width for constituent j
      $C_i$ = annual cost of sampling at station i
      $C_T$ = total annual operating budget
      $u_i$ = number of samples collected per year at station i
      N = total number of stations considered
      M = total number of constituents included in the design
-------
Note that the difference $X_{ij} - X_j^D$ is set equal to zero if it would
otherwise be negative, since one would not normally wish to allow the objective
function to benefit from a station that had achieved a confidence interval
width that was smaller than that actually sought.
The differences, $X_{ij} - X_j^D$, are normalized by dividing by the design
confidence interval width, $X_j^D$, in order that the summation can be
performed over several different quality constituents whose confidence interval
widths might differ greatly in magnitude.
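The clipping and normalization just described can be illustrated with a few lines of Python. This is a minimal sketch, not the report's own code: the function name `objective` and the small numerical example are assumptions made for illustration, with widths chosen only to resemble the magnitudes discussed in this report.

```python
def objective(predicted, design):
    """Sum of positive, normalized deviations of predicted confidence
    interval widths from design widths, as in expressions (17) and (23).

    predicted[i][j] : predicted width for constituent j at station i
    design[j]       : design width for constituent j
    """
    total = 0.0
    for station_widths in predicted:                      # loop over stations i
        for x_ij, x_design in zip(station_widths, design):  # constituents j
            # Negative differences are set to zero, then normalized.
            total += max(x_ij - x_design, 0.0) / x_design
    return total


# Illustrative values only: two stations, two constituents.
design = [37.0, 2.29]
predicted = [[78.1, 3.14], [17.0, 2.30]]
print(round(objective(predicted, design), 3))
```

In this example the second station's first constituent is already narrower than the design width, so it contributes nothing rather than offsetting the shortfall at the first station.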
Also note that only operating costs are considered. This arises from the
assumption that the general level and extent of the monitoring operation have
been established and optimization is performed within a fairly narrow region
around this level. Thus fixed costs such as equipment costs and personnel
salaries will remain relatively constant within the range of decisions under
consideration. Another way to view the situation is that monetary resources
devoted to monitoring are being reallocated in order to improve system per-
formance. Only those resources that are actually affected by the reallocation
need be included in the analysis. Since changing sampling frequencies would
result in different numbers of samples to be processed and different distances
to be traveled, these are the costs considered in expression (19), which for
this study is assumed to be linear.
Linear Programming Approach
The functional relation (20)—and thus the objective function—is non-
linear; therefore, linear programming cannot be used to solve the problem
as it stands. If the objective function were expressed as a linear function
of sampling interval rather than numbers of samples collected, then expres-
sion (19) would become nonlinear and nothing is accomplished. One approach
suggested by Ward et al. (1976) is to reformulate the problem as follows:
$$\text{Min} \quad \sum_{i=1}^{N} \sum_{j=1}^{M} \left( I_{ij} - I_j^D \right) \tag{24}$$

subject to similar constraints, where $I_{ij}$ and $I_j^D$ are predicted and
design information contents, respectively. $I_{ij}$ is defined by

$$I_{ij} = \frac{u_i}{\sigma_{ij}^2} \tag{25}$$

where $\sigma_{ij}^2$ = variance of constituent j at station i, a known
population parameter.
Thus the information content is a linear function of the number of
samples collected, and linear programming techniques apply. Assuming inde-
pendent samples, the information content as defined above is the reciprocal
of the variance of the sample mean. It follows that achieving equal informa-
tion contents would also achieve equal confidence intervals. Of course the
expression for the variance of the sample mean becomes much more complicated
as serial correlation becomes important as shown in equation (5).
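For the independent-sample case, the link between information content and confidence interval width can be written out explicitly. The following is a short sketch under the usual normal approximation; the quantile $z_{\alpha/2}$ and the convention of taking the width as the full two-sided interval are assumptions introduced here for illustration.

$$\operatorname{Var}(\bar{x}_{ij}) = \frac{\sigma_{ij}^2}{u_i}, \qquad I_{ij} = \frac{u_i}{\sigma_{ij}^2} = \frac{1}{\operatorname{Var}(\bar{x}_{ij})}$$

$$X_{ij} = 2\, z_{\alpha/2}\, \frac{\sigma_{ij}}{\sqrt{u_i}} = \frac{2\, z_{\alpha/2}}{\sqrt{I_{ij}}}$$

Equal information contents therefore give equal widths, but the width varies as $I_{ij}^{-1/2}$ rather than linearly, so a deviation measured in information content is not proportional to the corresponding deviation in width.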
The linear programming approach, therefore, suffers from its inability
to deal with confidence interval widths directly. An additional drawback is
that computed "optimal" sampling frequencies may take on fractional values
unless an integer programming technique is used. An integer, linear pro-
gramming formulation is also presented in Ward et al. (1976).
Dynamic Programming Approach
A mathematical programming technique that overcomes all of these diffi-
culties and is both simple and computationally efficient is dynamic pro-
gramming. This is a technique that requires no assumption of linearity in
either the objective function or constraints. The discrete form of dynamic
programming, which is applied here, has the further advantage that the
analytical form of the objective function and constraints need not be known.
Additionally, only specific values of the decision variable—i.e., specific
sampling frequencies—need be considered. Thus dynamic programming can be
applied as an integer technique.
The limitations of dynamic programming that relate to this application
are that no "cookbook" formulation of a "typical" dynamic programming problem
exists, and, therefore, one cannot normally use an off-the-shelf computer
code for his particular problem as one can in the case of linear programming.
However, dynamic programming codes that are specific to a particular situa-
tion are usually not as difficult to write as are other types of optimization
codes.
There are also some advantages inherent in preparing one's own optimiza-
tion code. The first is that a code written for a particular problem is
usually more efficient than one prepared to handle all problems in a general
class. A second advantage is that the writer of a code will have a better
feel for its operation and therefore will be less inclined to accept erro-
neous results. Finally, one can gain valuable insight into the nature of the
system he is dealing with as a result of the careful thinking necessary to
formulate a mathematical programming code used to optimize that system.
Dynamic programming was, therefore, selected as the optimization technique
for this study, and a computer program was written to solve the specific
problem previously described.
Summary of the Dynamic Programming Algorithm
The theory and mechanics of dynamic programming may be found in many
optimization texts. A very simple presentation of the subject that is,
nevertheless, adequate for most purposes is contained in Hillier and
Lieberman (1974). A brief discussion will be presented here.
Dynamic programming is a technique for making a sequence of interrelated
decisions. The problem is divided into stages, with a policy decision
required at each stage. Each stage has a number of states associated with
it. The effect of a decision is to transform the current state into a new
state at the next stage. Each decision also makes an individual contribution
to the objective function. The optimization algorithm chooses the sequence
of decisions that minimizes (or maximizes) the overall value of the objective
function. The algorithm makes use of a recursive relation to find the
optimal sequence of decisions without considering all of the possible
sequences.
Given the current stage and state, the minimum value of the objective
function over all succeeding stages is called the return function. The
minimum value of the return function, which results from making the appro-
priate current decision, is known as the optimal return function. The
recursive relation expresses the value of the return function for the current
stage and state in terms of the contribution from making the current decision
and the value of the optimal return function for the new state at the next
stage that results.
Beginning at the last stage, the optimization proceeds by moving back-
wards, defining and storing the value of the optimal return function and the
associated minimizing decisions for each possible state in each stage. When
the first stage is reached, there is a single possible state known as the
initial condition. Thus, the optimal return function has a single value that
is the minimal value of the objective function for the overall problem. The
decision that achieves this minimum is chosen as the first policy decision.
Since this decision results in a unique state at the second stage, it is
possible to recall the decision associated with the optimal return function
for that state. This decision is chosen as the second policy decision, and
so on, until all policy decisions are made.
Dynamic Programming Formulation of the Monitoring Problem
The above general concepts are applied to the water quality monitoring
problem as follows.
Define:
$u_i$ = decision variable for stage i; number of samples collected per year
at station i
$S_i$ = state variable for stage i; portion of total budget remaining
$\sum_{j=1}^{M} \frac{X_{ij} - X_j^D}{X_j^D}$ = contribution to objective
function at stage i
$f_i(S_i, u_i)$ = return function for stage i
$f_i^*(S_i) = \min_{u_i} \left[ f_i(S_i, u_i) \right]$ = optimal return
function for stage i
i, j = subscripts referring to station and constituent, respectively
The value of the optimal return function corresponding to stage i and
state $S_i$ represents the minimum possible value of the objective function
that could result from sampling at stations i through N, given that there
are $S_i$ dollars currently remaining in the budget. Thus:

$$f_i^*(S_i) = \min_{u_i, \ldots, u_N} \sum_{\ell=i}^{N} \sum_{j=1}^{M} \frac{X_{\ell j} - X_j^D}{X_j^D} \tag{26}$$

The value of $u_i$ that minimizes $f_i(S_i, u_i)$ is denoted as $u_i^*(S_i)$.
The function $u_i^*(S_i)$ might be called an "optimal decision function"
analogous to the optimal return function $f_i^*(S_i)$. Therefore:

$$f_i^*(S_i) = f_i(S_i, u_i^*) \tag{27}$$
The value of the state variable at stage i+1 may be expressed in terms
of the current state and current decision as follows:

$$S_{i+1} = S_i - C_i \qquad \text{and} \qquad C_i = f_c(u_i, i) \tag{28}$$

Thus the amount of money remaining in the budget is reduced by an amount
equal to the annual cost of sampling station i at frequency $u_i$. This
relation is known as the equation of state.
The recursive relation for this problem may be expressed as follows:

$$f_i(S_i, u_i) = \sum_{j=1}^{M} \frac{X_{ij} - X_j^D}{X_j^D} + f_{i+1}^*(S_{i+1}) \tag{29}$$

$$= \sum_{j=1}^{M} \frac{X_{ij} - X_j^D}{X_j^D} + f_{i+1}^*(S_i - C_i)$$

$$= \sum_{j=1}^{M} \frac{X_{ij} - X_j^D}{X_j^D} + \min_{u_{i+1}, \ldots, u_N} \sum_{\ell=i+1}^{N} \sum_{j=1}^{M} \frac{X_{\ell j} - X_j^D}{X_j^D} \tag{30}$$

An important point to remember is that each minimization over $u_i$ is
constrained such that $S_{i+1}$ is always greater than or equal to zero. Thus
$S_{N+1} \ge 0$, and the total budget is never exceeded.
Recalling the definition given earlier, this relation expresses the
minimum value of the objective function (minimum sum of deviations from
design confidence intervals) resulting from sampling at stations i through
N, given a current state $S_i$ and making the current decision $u_i$, in terms
of the contribution from the current stage:

$$\sum_{j=1}^{M} \frac{X_{ij} - X_j^D}{X_j^D}$$

and the optimal return function for the new state at the next stage,
$f_{i+1}^*(S_{i+1})$.
The sequence of calculations involved in the operation of the dynamic
programming code is illustrated in the flow diagram of Figure 19. As
previously discussed, the general procedure is first to move backwards through
the stages, computing optimal return functions and optimal decision functions
for each stage. The second step is, beginning with the initial condition
(total budget) at the first stage, to move forward through the stages, computing
the new state at each stage based on the previous decision and recalling the
optimal policy decision associated with each new state, until the last stage
is reached and the overall optimal operating policy has been attained.
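The recursion can be sketched compactly in Python. This is an illustrative sketch only, not the program written for this study: it uses memoized recursion (which evaluates the same optimal return function, but only for reachable states) in place of an explicit backward pass over a table of states, and the names `allocate_samples`, `widths`, and `cost`, as well as the toy data, are assumptions made for the example.

```python
from functools import lru_cache


def allocate_samples(widths, design, cost, budget, frequencies):
    """Choose one sampling frequency per station to minimize the objective.

    widths[i][u] : predicted CI widths (one per constituent) at station i
                   when sampled u times per year (hypothetical lookup table)
    design[j]    : design CI width for constituent j
    cost(i, u)   : annual cost of sampling station i at frequency u
    budget       : total annual operating budget (integers cache best)
    Returns (objective value, tuple of frequencies), or None if infeasible.
    """
    n_stations = len(widths)

    def contribution(i, u):
        # Stage contribution: positive deviations, normalized by design width.
        return sum(max(x - d, 0.0) / d for x, d in zip(widths[i][u], design))

    @lru_cache(maxsize=None)
    def optimal_return(i, remaining):
        # Optimal return for station i onward, given the remaining budget.
        if i == n_stations:
            return 0.0, ()
        best = None
        for u in frequencies:
            spent = cost(i, u)
            if spent > remaining:        # the next state must stay >= 0
                continue
            tail = optimal_return(i + 1, remaining - spent)
            if tail is None:             # no feasible policy downstream
                continue
            value = contribution(i, u) + tail[0]
            if best is None or value < best[0]:
                best = (value, (u,) + tail[1])
        return best

    return optimal_return(0, budget)


# Illustrative data only: two stations, one constituent, $100 per sample.
widths = [{13: [20.0], 26: [14.0]}, {13: [12.0], 26: [9.0]}]
result = allocate_samples(widths, [10.0], lambda i, u: 100 * u, 3900, (13, 26))
print(result)
```

With these toy numbers the budget cannot cover 26 samples at both stations, and the sketch assigns 26 samples to the first station and 13 to the second, since the first station is farther from its design width.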
The first segment of the program is composed of statistical subroutines
that compute confidence intervals for each water quality constituent at each
station over the range of sampling frequencies of interest. The method of
computation is described in Section 5. Theoretical autocorrelation functions
for adopted ARMA models are used to account for serial correlation in the
time series. The required inputs for this portion include the numerical
values of the ARMA model parameters for each constituent of each station and
the variance of each series with deterministic seasonal variation removed.
The main portion of the program is the optimization algorithm itself,
which utilizes the computed confidence intervals from the first segment in
the computation of the objective function. The cost of sampling, which is
used in the equation of state, is computed in a separate subroutine; this
facilitates using any desired cost relations. For this research, the cost
of collecting and processing a sample is considered to be the sum of the
incremental cost of travel plus direct laboratory analysis costs. Thus:
$$\text{Total annual cost for station} = \left(\text{No. of samples per year}\right) \left[ \left(\text{Lab cost per sample}\right) + \left(\text{Incremental travel cost}\right) \left(\text{Incremental distance to station}\right) \right] \tag{31}$$

As mentioned previously, since only direct costs are considered in the
analysis, the optimization should be regarded as an efficient reallocation
of resources within a predetermined operating range, not as a "global"
optimization of the system.

Figure 19. Flow diagram of dynamic programming code.
[Diagram not reproduced. Box labels recoverable from the diagram: compute all
confidence intervals; begin at last stage (station); initialize state
(budget); initialize decision (sampling frequency); recall confidence
intervals; compute contribution to objective function; compute sampling cost
and remaining budget; compute return function; compute and store optimal
return function and optimal decision function; loops over next decision, next
state, and next stage; compute optimal decisions for each stage; print optimal
operating policy.]
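Equation (31) translates directly into a small cost function. The sketch below is illustrative; the function and parameter names are assumptions, and the numbers in the usage line are the baseline values given later in this section for the Illinois application ($110 per sample, $0.14 per km) together with a station distance of 58.9 km.

```python
def annual_station_cost(samples_per_year, lab_cost_per_sample,
                        travel_cost_per_km, incremental_distance_km):
    """Direct annual operating cost of one station, following equation (31)."""
    cost_per_sample = (lab_cost_per_sample
                       + travel_cost_per_km * incremental_distance_km)
    return samples_per_year * cost_per_sample


# One station sampled 26 times per year at the baseline unit costs.
print(round(annual_station_cost(26, 110.0, 0.14, 58.9), 2))
```

Because the cost is computed in its own function, a different cost relation (for example, one with a nonlinear travel term) could be substituted without touching the optimization itself.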
APPLICATION TO ILLINOIS NETWORK
As an illustration of the dynamic programming formulation described
earlier, the network design procedure was applied to nine stations of the
Illinois network for which time series models had been determined. The input
factors that must be determined in order to use this approach are the
following: (1) water quality constituents to be included in the design; (2) the
parameters of the ARMA models used, including autoregressive constants,
moving-average constants, and variances; (3) design confidence interval widths;
(4) incremental travel costs; (5) incremental distances for each station; (6)
laboratory analysis costs; (7) total annual operating budget; and (8)
sampling frequencies considered. The design values of the input variables
related to the above factors were selected as follows.
Constituents Included In Design
The choice of the type and number of water quality constituents to be
included in the analysis is highly subjective. The most important
considerations are probably: (1) to include those constituents that can serve
as indicators of the
major types of quality problems (pollution) expected; and (2) to avoid
including too many constituents in the design, which would result in a less
effective determination of means for major indicators. For the initial or
"baseline" run, five constituents were included:
1. Total dissolved solids (mg/L) as an indicator of overall water
quality and nonpoint-source pollution.
2. Total organic carbon (mg/L) as an indicator of organic (municipal
and industrial) pollution.
3. Suspended solids (mg/L) as an indicator of pollution from
agriculture, land clearing, and development.
4. Total hardness (mg/L as calcium carbonate) as an indicator of metal
ion concentration.
5. Nitrate (mg/L) as a nutrient indicator.
ARMA Models
The model used in the design was the selected "best" model of the three
candidates—AR(1), AR(2), and ARMA(1,1)—for each constituent at each station
as listed in Table 4. Since five constituents at nine stations were used, the
baseline run used 45 different models in computing confidence intervals.
Design Confidence Interval Widths
In order to assign design confidence interval widths to each constituent,
a hypothetical situation was constructed in which the State agency had
sufficient resources to sample all stations uniformly at a frequency of 26
samples per year. It was desired to reassign sampling frequencies in order
to achieve greater uniformity of confidence intervals. Therefore, confidence
interval widths were computed for each constituent at each station based on a
frequency of 26 samples per year, and the median width over all the stations
was selected for this demonstration as the design value for each constituent.
Other desirable criteria could be established for selecting the design value.
Thus, at the uniform frequency half of the stations would be achieving the
design confidence interval width and the program would reallocate samples in
order to improve upon this performance. The median and range of confidence
interval widths for the design constituents are presented in Table 7.
TABLE 7. MEDIAN AND RANGE OF CONFIDENCE INTERVAL WIDTHS FOR
WATER QUALITY CONSTITUENTS, ILLINOIS NETWORK
(95% Confidence Level, 26 Samples per Year)

                                   Median Confidence
Quality Constituent                Interval Width       Range
Total dissolved solids (mg/L)          37.0             17.0 - 132.1
Total organic carbon (mg/L)             2.29            1.26 - 3.80
Suspended solids (mg/L)                14.9             3.6 - 39.5
Total hardness (mg/L)                  36.8             23.0 - 93.1
Nitrates (mg/L)                         1.99            0.52 - 19.5
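The rule for choosing a design value (the median width over all stations at the uniform frequency) can be checked for one constituent with a short calculation. The sketch below is illustrative only; the list holds the nine uniform-frequency confidence interval widths for total dissolved solids as listed in Table 9, and the variable names are assumptions.

```python
from statistics import median

# Uniform-frequency (26 samples/year) TDS widths, mg/L, stations 1-9 (Table 9).
tds_uniform_widths = [87.2, 17.0, 35.3, 67.6, 37.0, 132.1, 58.8, 28.8, 33.3]

design_width = median(tds_uniform_widths)
stations_meeting_design = sum(w <= design_width for w in tds_uniform_widths)

print(design_width, stations_meeting_design)
```

With nine stations the median is the fifth-ranked width, so five stations meet or better the design value at the uniform frequency and four do not.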
Incremental Travel Costs
The Illinois State Water Survey (Harmeson, 1978) indicated a direct
travel cost of $0.14 per km for sampling the network used in this study.
This figure should be representative of the change in travel costs an agency
would experience as a result of relatively small changes in total distance
traveled and would apply for the type of reallocation in sampling discussed
here. This is the cost that applies in equation 31.
Laboratory Analysis Costs
The laboratory cost per sample is highly variable, depending on the type
and number of constituents measured and on whether the analysis is done
within the agency or contracted to an outside laboratory. The costs used in
this study were based on information obtained from the Colorado State Depart-
ment of Health, Water Quality Control Division. This agency routinely
measures at least 35 chemical constituents (Table 8). Also given are the
average costs of analysis for each constituent as reported by the U. S. Army
Corps of Engineers (1978) for several independent laboratories. These costs
would more nearly reflect average total costs of analysis. The State of
Colorado performs its own laboratory analyses and reports an average cost of
$111.91 per sample (Anderson, 1978). This cost is taken to represent
operating costs of the laboratory excluding overhead. Since this study deals
with changes in operating costs, a per-sample cost of $110 was used in the
network design. Note that if some samples were analyzed for fewer than the
total number of constituents, a smaller laboratory cost would apply. A more
complex economic analysis would be necessary, however, if the assumption that
all samples are processed identically were removed.

TABLE 8. LABORATORY ANALYSIS COSTS FOR CHEMICAL CONSTITUENTS
OF WATER QUALITY

                                     Laboratory Analysis Costs
                                     (dollars per sample)
                                     State of      Independent
Water Quality Constituent            Colorado*     Laboratories†
 1  Turbidity                           0.53           3.89
 2  Conductivity                        0.27           3.95
 3  Dissolved oxygen                    2.50           4.60
 4  Biochemical oxygen demand           9.96          27.19
 5  Chemical oxygen demand              9.96          14.38
 6  pH                                  0.83           3.05
 7  Total volatile solids               3.00           6.63
 8  Total dissolved solids              2.00           7.23
 9  Total solids                        2.00           7.00
10  Ammonia nitrogen                    1.66          11.81
11  Nitrite nitrogen                    1.66           7.50
12  Nitrate nitrogen                    1.66           8.17
13  Total phosphate                     1.66           5.57
14  Cyanide                             4.98          19.93
15  Total hardness                      1.66           4.85
16  Calcium                             1.66           7.56
17  Magnesium                           0.50           8.56
18  Sodium                              1.66           8.95
19  Chloride                            1.33           5.79
20  Sulfate                             2.67           8.75
21  Fluoride                            0.83          12.00
22  Arsenic                             9.96          16.29
23  Boron                               1.66          10.42
24  Cadmium                             4.98           7.56
25  Chromium                            4.98           9.05
26  Copper                              4.98           7.88
27  Iron                                1.66           8.24
28  Lead                                4.98          10.47
29  Manganese                           1.66          19.20
30  Silver                              1.66          10.70
31  Zinc                                1.66           9.63
32  Mercury                             7.47          17.50
33  Kjeldahl nitrogen                   9.96          18.05
34  Aluminum                            1.66          10.96
35  Potassium                           1.66           8.77
    Total                             111.91         342.08

* From Anderson, 1978.
† From U.S. Army Corps of Engineers, 1978.
Determining Incremental Distances
The cost of collecting additional samples at a particular station is
more nearly a function of the "remoteness" of that station from other sta-
tions in the network than it is of the actual distance from that station to
the laboratory. This is true of course since the sampling unit normally
travels from station to station rather than from station to laboratory each
time. An exact determination of travel costs would require a consideration
of the exact collection routes and would be an enormously complex task. A
simpler procedure, which was adopted for this study, was to use the average
one-way travel distance (approximated by a straight line) from the three
stations nearest the one in question. This represents the average additional
distance traveled to collect an additional sample at the station and is the
distance used in equation 31. Design distances for each station in the
network are presented in Table 9.
Total Budget
The design budget is the portion of the total monitoring budget that
represents the direct operating cost associated with travel of the sampling
units and laboratory analysis of samples. The design budget used in this
study is based on the same hypothetical situation described earlier. It is
assumed that the agency's total budget allows it to sample each station with
a uniform frequency of 26 samples per year. The portion of that budget that
is direct operating expense was calculated as follows.
The direct travel budget is the incremental travel cost (per km) times
the sum over all stations of the distances associated with each station times
the number of samples per year. The annual laboratory budget is the cost of
processing a sample times the number of stations times the number of samples
per year. The total annual operating budget is, of course, the sum of the
travel and laboratory budgets. Using the baseline costs of $0.14 per km for
travel, $110 per sample for laboratory analysis, and 26 samples per year, a
design budget of $28,050 was obtained.
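The budget arithmetic described above can be reproduced in a few lines. This is an illustrative check rather than the authors' calculation: the station distances are those listed for the nine stations in Table 10, and the variable names are assumptions.

```python
# Incremental distances (km) for stations 1-9, as listed in Table 10.
distances_km = [58.9, 69.1, 44.3, 54.1, 63.5, 67.2, 79.2, 119.7, 88.5]

travel_cost_per_km = 0.14    # dollars per km
lab_cost_per_sample = 110.0  # dollars per sample
samples_per_year = 26        # uniform frequency at every station

travel_budget = travel_cost_per_km * sum(distances_km) * samples_per_year
lab_budget = lab_cost_per_sample * len(distances_km) * samples_per_year
total_budget = travel_budget + lab_budget

print(f"travel ${travel_budget:,.0f}, lab ${lab_budget:,.0f}, "
      f"total ${total_budget:,.0f}")
```

Under these inputs travel accounts for less than a tenth of the direct operating budget, which is worth keeping in mind when reading the sensitivity results for travel cost.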
TABLE 9. RESULTS OF DYNAMIC PROGRAMMING DESIGN OF SAMPLING FREQUENCIES FOR ILLINOIS NETWORK

(A) Individual Results

                                                       95% Confidence Interval Widths
                                                                Uniform
Station  Distance  Frequency       Water Quality       Design   Frequency  Predicted
Number   (km)      (samples/year)  Constituent         (mg/L)   (mg/L)     (mg/L)
   1       58.9        52          TDS                  37.0      87.2       78.1
                                   TOC                   2.29      3.80       3.14
                                   Suspended solids     14.9      39.5       35.2
                                   Hardness             36.3      64.5       57.9
                                   Nitrate               1.99      1.99       1.87
   2       69.1        26          TDS                  37.0      17.0       17.0
                                   TOC                   2.29      2.30       2.30
                                   Suspended solids     14.9      12.0       12.0
                                   Hardness             36.3      25.1       25.10
                                   Nitrate               1.99      5.03       5.03
   3       44.3        26          TDS                  37.0      35.3       35.3
                                   TOC                   2.29      2.38       2.38
                                   Suspended solids     14.9      17.8       17.8
                                   Hardness             36.3      36.8       36.8
                                   Nitrate               1.99      5.39       5.39
   4       54.1        39          TDS                  37.0      67.6       64.3
                                   TOC                   2.29      2.22       1.81
                                   Suspended solids     14.9      14.9       12.2
                                   Hardness             36.3      23.0       22.1
                                   Nitrate               1.99      2.77       2.24
   5       63.5        26          TDS                  37.0      37.0       37.0
                                   TOC                   2.29      1.92       1.92
                                   Suspended solids     14.9      14.3       14.3
                                   Hardness             36.3      36.4       36.4
                                   Nitrate               1.99      4.14       4.14
   6       67.2        26          TDS                  37.0     132.1      132.1
                                   TOC                   2.29      2.29       2.29
                                   Suspended solids     14.9      23.6       23.6
                                   Hardness             36.3      93.1       93.1
                                   Nitrate               1.99     19.5       19.5
   7       79.2        13          TDS                  37.0      58.8       60.0
                                   TOC                   2.29      2.40       2.82
                                   Suspended solids     14.9       5.40       7.18
                                   Hardness             36.3      39.3       40.3
                                   Nitrate               1.99      1.49       2.01
   8      119.7        13          TDS                  37.0      28.8       32.8
                                   TOC                   2.29      1.65       2.30
                                   Suspended solids     14.9       6.90       9.17
                                   Hardness             36.3      31.9       35.1
                                   Nitrate               1.99      4.95       5.37
   9       88.5        13          TDS                  37.0      33.3       33.7
                                   TOC                   2.29      1.26       1.41
                                   Suspended solids     14.9       4.54       5.50
                                   Hardness             36.3      41.1       41.2
                                   Nitrate               1.99      1.62       1.67

(B) Summary Statistics

                                        Uniform Frequency
                                        Sampling           Predicted
Constituent        Statistic            (mg/L)             (mg/L)      % Change
TDS                Mean                   55.3               54.5        - 1.5
                   Standard deviation     35.2               34.8        - 1.1
TOC                Mean                    2.25               2.26       + 0.4
                   Standard deviation      0.70               0.52       -25.8
Suspended solids   Mean                   15.4               15.2        - 1.3
                   Standard deviation     10.9                9.3        -14.7
Total hardness     Mean                   43.5               43.1        - 0.9
                   Standard deviation     22.0               21.3        - 3.2
Nitrate            Mean                    5.21               5.25       + 0.7
                   Standard deviation      5.57               5.57         0.0

Average % change in mean confidence interval widths = -0.5
Average % change in standard deviation of confidence interval widths = -10.3

Sampling Frequencies Considered
The choice of sampling frequencies to consider is another highly
subjective aspect of the design process. One approach would be to consider
every integer sampling frequency between one sample per year and 365 samples
per year. However, most State agencies are constrained (or feel that they are)
to certain "standard" sampling frequencies, e.g., weekly, monthly, etc. The
dangers of considering only such frequencies are discussed elsewhere, such as
in Sanders (1974). Briefly, the problem is that of sampling at the same
point in every cycle of an underlying cyclic variation in quality. For
example, sampling on the same day in every week in a stream that exhibits a
weekly cycle in flow will cause aliasing of the collected water quality data.
A study of underlying cyclic variations in quality is, therefore, essential
before the selection of candidate sampling frequencies is made. A
mathematical tool for performing such a study is spectral analysis. The
application of this tool to water quality is described in Wastler (1963).
Nevertheless, for this study, "standard" sampling frequencies ranging
from twice a week to every two months were considered. Specifically, the
possible frequencies were 104, 52, 39, 26, 13, and 6 samples per year.
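The aliasing hazard of a fixed weekly schedule can be seen in a small numerical experiment. The sketch below is purely illustrative: the synthetic signal `weekly_cycle` (a mean of 10 with a 7-day sinusoidal cycle of amplitude 3) is a made-up stand-in for a quality variable, not data from the Illinois network.

```python
import math
from statistics import mean


def weekly_cycle(day):
    """Hypothetical quality variable with a true mean of 10 and a 7-day cycle."""
    return 10.0 + 3.0 * math.sin(2.0 * math.pi * day / 7.0)


# Same weekday every week versus every fifth day, over one year.
weekly_samples = [weekly_cycle(day) for day in range(2, 364, 7)]
five_day_samples = [weekly_cycle(day) for day in range(2, 364, 5)]

print(round(mean(weekly_samples), 2), round(mean(five_day_samples), 2))
```

Sampling on the same weekday always hits the same phase of the cycle, so the weekly samples show no variation and their mean is biased away from the true mean, whereas the five-day interval visits every phase of the weekly cycle.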
Results
The results of the design procedure may be found in Table 9. The table
shows the computed design sampling frequencies, design confidence interval
widths for each quality constituent, confidence interval widths that would
result from a uniform frequency of 26 samples per year at all stations, and
confidence interval widths that would result from the design sampling fre-
quency.
The level of improvement afforded by design sampling frequencies over
uniform sampling frequencies may be examined by comparing the means and
standard deviations of the confidence interval widths obtained in each case.
These results are also included in the table.
Taking an average over all five constituents, the mean confidence inter-
val widths decreased by 0.5 percent with respect to a uniform frequency
program, and the standard deviation of the confidence interval widths
decreased by 10.3 percent. Although using the same resources as uniform
sampling and allocating them via dynamic programming did not greatly improve
the average-size confidence intervals, it did provide for 10.3 percent more
uniformity in confidence widths across the network.
SENSITIVITY ANALYSIS
In order to determine how sensitive the dynamic programming solution is
to changes in the values of the input variables, the design problem was
repeated several times with a variety of input conditions. The results were
compared using the solution obtained previously as the standard of comparison
(or baseline run).
Incremental Travel Costs
The operating cost of travel was varied from $0.00 to $0.50 per km with
a baseline cost of $0.14 per km. Costs of $0.00, $0.10, $0.14, $0.18, $0.22,
and $0.26 per km yielded the same solution. However, a cost of $0.50 per km
produced an increase in number of samples at station #3, which has a distance
of 44.3 km, and a decrease at station #9, which has a distance of 88.5 km.
These results are presented in Table 10. The lack of sensitivity to travel
cost is expected since, for this example, the travel cost represents a rather
small fraction of the total design budget as indicated in the table.
TABLE 10. DESIGN SAMPLING FREQUENCIES BASED ON VARYING TRAVEL COSTS

                                  Design Frequency    Design Frequency
                     Distance     (samples/yr) for    (samples/yr) for
Station              (km)         Cost = $0.14/km     Cost = $0.50/km
1                      58.9            52                  52
2                      69.1            26                  26
3                      44.3            26                  39
4                      54.1            39                  39
5                      63.5            26                  26
6                      67.2            26                  26
7                      79.2            13                  13
8                     119.7            13                  13
9                      88.5            13                   6
Total distance        644.5
Annual travel cost                   2,350               8,350
Lab analysis cost                   25,750              25,750
Total annual cost                  $28,100             $34,100
Distances to Each Station
An alternate method of determining the distance associated with each
station was evaluated by using the one-way travel distance from the labora-
tory to the station. For these alternate distances, travel costs of $0.14
and $0.50 per km were applied. The results in the case of $0.14 per km are
the same as the baseline results except the alternate mileages produce a
sampling frequency of 6 samples per year rather than 13 at station #9. In
the case of $0.50 per km, there is additionally a shift of six samples per
year from station #1, which has a distance of 160 km, to station #7, which has
a distance of 120 km. These results are presented in Table 11.
Annual Operating Budget
The economic reasoning behind this study should apply for small
variations about some predetermined level of activity. A variation of less than
±10 percent in the total budget was investigated while maintaining the
baseline travel and laboratory analysis costs. As indicated in Table 12, a
considerable difference in solutions resulted. The $2,600 decrease in budget
resulted in a design total of 20 fewer samples collected per year, while the
$2,600 increase resulted in a total of 19 more samples per year than in the
baseline run.
TABLE 11. DESIGN SAMPLING FREQUENCIES BASED ON ALTERNATE TRAVEL DISTANCES

                                  Design Frequency    Design Frequency
                     Distance     (samples/yr) for    (samples/yr) for
Station              (km)         Cost = $0.14/km     Cost = $0.50/km
1                      160             52                  39
2                      122             26                  26
3                      150             26                  26
4                      174             39                  39
5                      158             26                  26
6                       98             26                  26
7                      120             13                  26
8                      168             13                  13
9                       78              6                   6
Total distance       1,228
Annual travel cost                   4,500              15,950
Lab analysis cost                   25,750              25,750
Total annual cost                  $30,250             $41,700
The means and standard deviations of confidence interval widths for each
constituent are also given in Table 12 at each of the three budget levels.
At the lower budget ($25,500), the mean confidence interval width
increased by an average of 1.1 percent compared to a uniform frequency pro-
gram while the average standard deviation of the confidence interval widths
decreased by 9.6 percent. At the higher budget ($30,700) the mean confidence
interval width decreased by 1.2 percent, and the average standard deviation
of the confidence interval widths decreased by 12.7 percent when compared to
the uniform frequency program. For comparison, the baseline budget of $28,100
produced decreases in the average means and average standard deviations of
confidence interval widths of 0.5 percent and 10.3 percent, respectively.
Note that the changes in the mean confidence interval widths are small (about
1 percent or less) in all cases, whereas the improvements in uniformity of
confidence intervals (roughly 10 to 13 percent) are more substantial.
TABLE 12. EFFECT OF VARIATION IN TOTAL OPERATING BUDGET ON
DESIGN SAMPLING FREQUENCIES AND MONITORING SYSTEM PERFORMANCE,
ILLINOIS NETWORK

(A) $25,500 Budget

                                 Predicted Confidence Interval Widths
Station   Design Sampling
Number    Frequency              TDS      TOC      SS       Hardness   NO3
          (samples/year)         (mg/L)   (mg/L)   (mg/L)   (mg/L)     (mg/L)
1              39                 80.7     3.32     36.4     59.8       1.91
2              26                 17.0     2.30     12.0     25.10      5.03
3              26                 35.3     2.38     17.8     36.8       5.39
4              39                 64.3     1.81     12.2     22.1       2.24
5              26                 37.0     1.92     14.3     36.4       4.14
6              26                132.1     2.29     23.6     93.1      19.5
7              13                 60.0     2.82      7.2     40.3       2.01
8              13                 32.8     2.30      9.2     35.1       5.37
9               6                 35.8     1.86      8.1     42.2       1.88
Total samples 214
Mean confidence interval width    55.0     2.33     15.6     43.4       5.27
% change over uniform frequency   -0.5     +3.5     +1.3     -0.2       +1.2
Standard deviation of
confidence interval widths        34.9     0.49      9.3     21.5       5.55
% change over uniform frequency   -0.9    -30.3    -14.4     -2.3       -0.3

Average % change in mean confidence interval width = 1.1
Average % change in standard deviation of confidence interval widths = -9.6

(B) $30,700 Budget

                                 Predicted Confidence Interval Widths
Station   Design Sampling
Number    Frequency              TDS      TOC      SS       Hardness   NO3
          (samples/year)         (mg/L)   (mg/L)   (mg/L)   (mg/L)     (mg/L)
1              52                 78.1     3.14     35.2     57.9       1.87
2              26                 17.0     2.30     12.0     25.10      5.03
3              39                 31.4     2.15     16.7     35.4       5.11
4              39                 64.3     1.81     12.2     22.1       2.24
5              26                 37.0     1.92     14.3     36.4       4.14
6              26                132.1     2.29     23.6     93.1      19.50
7              26                 58.8     2.40      5.4     39.3       1.49
8              13                 32.8     2.30      9.2     35.1       5.37
9               6                 35.8     1.86      8.1     42.2       1.88
Total samples 253
Mean confidence interval width    54.1     2.24     15.2     42.9       5.18
% change over uniform frequency   -2.2     -0.4     -1.3     -1.4       -0.5
Standard deviation of
confidence interval widths        34.9     0.40      9.2     21.4       5.59
% change over uniform frequency   -2.0    -42.6    -15.7     -2.7       +0.3

Average % change in mean confidence interval width = -1.2
Average % change in standard deviation of confidence interval widths = -12.7

Design Confidence Interval Widths
The design confidence interval widths for each constituent were reassigned
based on a uniform sampling frequency of 13 samples per year. The total
budget was accordingly reduced from $28,100 to $14,050. The alternate design
confidence interval widths and resulting sampling frequencies are shown in
Table 13. These results represent a reallocation of samples about a uniform
frequency of 13 samples per year rather than 26 samples per year as in
previous designs, but the same stations tend to get the greatest number of
samples as in the baseline run.
TABLE 13. EFFECT OF DESIGN CONFIDENCE INTERVAL WIDTHS ON
DESIGN SAMPLING FREQUENCIES

                           (a)                         (b)
                           Design Confidence           Design Confidence
                           Interval Widths Based on    Interval Widths Based on
Constituent                26 Samples/Year (mg/L)      13 Samples/Year (mg/L)
Total dissolved solids          37.0                        47.4
Total organic carbon             2.29                        2.82
Suspended solids                14.9                        19.7
Total hardness                  36.8                        40.3
Nitrates                         1.99                        5.10

                           Sampling Frequency          Sampling Frequency
Station                    (samples/year)              (samples/year)
1                               52                          26
2                               26                          13
3                               26                          13
4                               39                          13
5                               26                          13
6                               26                          13
7                               13                           6
8                               13                          13
9                               13                           6
Total samples                  234                         116
Water Quality Constituents Included
Sampling frequencies were determined based on a single constituent
(total dissolved solids) and on three constituents (total dissolved solids,
total organic carbon, and nitrates), in addition to the baseline run with
five constituents. The results are presented in Table 14 using the same
summary statistics as before. As one would expect, the greatest improvement
for the design constituents is seen when fewer are included in the design.
Since the concentrations of many water quality constituents are quite
correlated with each other (or to flow), one would expect an improvement in
overall uniformity of confidence interval widths from designs based on only a
-------
TABLE 14. EFFECT OF WATER QUALITY CONSTITUENT SELECTION ON
DESIGN SAMPLING FREQUENCIES AND SYSTEM PERFORMANCE,
ILLINOIS NETWORK

(A) Design Based on TDS Only

                                  Predicted Confidence Interval Widths
Station   Design Sampling      TDS      TOC       SS     Hardness    NO3
Number       Frequency        (mg/l)   (mg/l)   (mg/l)    (mg/l)    (mg/l)
           (samples/year)
   1            52             78.1     3.14     35.2      57.9      1.87
   2             6             33.0     4.00     17.8      47.4      7.57
   3            26             35.3     2.38     17.8      36.8      5.39
   4            52             62.8     1.54     10.9      21.7      1.99
   5            26             37.0     1.92     14.3      36.4      4.14
   6            26            132.1     2.29     23.6      93.1     19.50
   7            26             58.8     2.40      5.4      39.3      1.49
   8            13             32.8     2.30      9.2      35.1      5.37
   9             6             35.8     1.86      8.1      42.2      1.88

Total samples: 233

Mean confidence interval width
                               56.2     2.43     15.8      45.5      5.47
% change over uniform frequency
                               -1.6     +8.0     +2.6      +4.5     +5.0
Standard deviation of confidence interval widths
                               32.8     0.740     9.2      20.3      5.67
% change over uniform frequency
                               -6.8     +6.0    -15.6      -7.8     +1.8

Average % change in mean confidence interval width = +3.7
Average % change in standard deviation of confidence interval widths = -4.5
(continued)
-------
TABLE 14. (continued)

(B) Design Based on TDS, TOC, and NO3

                                  Predicted Confidence Interval Widths
Station   Design Sampling      TDS      TOC       SS     Hardness    NO3
Number       Frequency        (mg/l)   (mg/l)   (mg/l)    (mg/l)    (mg/l)
           (samples/year)
   1            39             80.7     3.32     36.4      59.8      1.91
   2            26             17.0     2.30     12.0      25.10     5.03
   3            26             35.3     2.38     17.8      36.8      5.39
   4            39             64.3     1.81     12.2      22.1      2.24
   5            26             37.0     1.92     14.3      36.4      4.14
   6            26            132.1     2.29     23.6      93.1     19.50
   7            26             58.8     2.40      5.4      39.3      1.49
   8            13             32.8     2.30      9.2      35.1      5.37
   9             6             35.8     1.86      8.1      42.2      1.88

Total samples: 227

Mean confidence interval width
                               54.9     2.29     15.4      43.3      5.22
% change over uniform frequency
                               -0.7     +1.8      0.0      -0.5     +0.2
Standard deviation of confidence interval widths
                               34.9     0.46      9.5      21.5      5.59
% change over uniform frequency
                               -0.8    -34.1    -12.8      -2.3     +0.4

Average % change in mean confidence interval widths = +1.6
Average % change in standard deviation of confidence interval widths = -9.9
-------
very few constituents. These results support that conclusion. Also, as more
constituents are included, the individual improvements become compromised and
less and less is gained. One could conclude that if each of the 30 measured
constituents were equally important, uniform frequency sampling would be the
best alternative.
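The summary statistics used in Tables 12 and 14 can be checked by hand from
the tabulated widths. The short sketch below does this for the TDS column and
the percent-change row of Table 14(A). The variable names are illustrative
only, and the use of the sample (n - 1) standard deviation is an assumption:
the report does not state which divisor was used, although the n - 1 form
appears to match the tabulated value.

```python
from statistics import mean, stdev

# TDS confidence interval widths (mg/l) for stations 1-9, copied from Table 14(A).
tds_widths = [78.1, 33.0, 35.3, 62.8, 37.0, 132.1, 58.8, 32.8, 35.8]

# Per-constituent % change in mean width over uniform frequency, Table 14(A).
pct_change_mean = [-1.6, 8.0, 2.6, 4.5, 5.0]

mean_width = mean(tds_widths)            # compare with the tabulated 56.2
sd_width = stdev(tds_widths)             # n - 1 divisor (assumed); compare with 32.8
avg_pct_change = mean(pct_change_mean)   # compare with the tabulated +3.7

print(f"mean = {mean_width:.1f}, sd = {sd_width:.1f}, avg change = {avg_pct_change:+.1f}")
```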
-------
SECTION 7
SYNOPSIS
Three sets of data records for several water quality constituents were
analyzed to examine the effects of using various levels of statistical
sophistication to design water quality monitoring networks. The effects
measured were the errors in computing confidence interval widths about the
annual mean for each constituent.
For each set of data, an estimate of the deterministic annual variation
and serial correlation structure was computed. The annual cycles were
determined by estimating the coefficients A and C of the equation:
$$ y_t = A \cos(\omega t + C) \tag{32} $$
where $y_t$ = deterministic component at time t
      $\omega$ = 360 degrees/number of samples per year
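One way to estimate A and C in Equation 32 is ordinary least squares on the
cosine and sine of the annual frequency. The sketch below is a minimal
illustration under stated assumptions, not the procedure used in this study:
the function name `fit_seasonal` is hypothetical, the series is assumed to be
evenly spaced with its mean already removed, and the phase C is returned in
radians rather than degrees.

```python
import math

def fit_seasonal(y, samples_per_year):
    """Least-squares fit of y_t ~ A*cos(w*t + C) to a mean-removed series."""
    w = 2.0 * math.pi / samples_per_year  # one full cycle per year, in radians
    # A*cos(w*t + C) = b1*cos(w*t) + b2*sin(w*t)
    # with b1 = A*cos(C) and b2 = -A*sin(C), so the fit is linear in b1, b2.
    s_cc = s_cs = s_ss = s_yc = s_ys = 0.0
    for t, yt in enumerate(y):
        c = math.cos(w * t)
        s = math.sin(w * t)
        s_cc += c * c
        s_cs += c * s
        s_ss += s * s
        s_yc += yt * c
        s_ys += yt * s
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_cc * s_ss - s_cs * s_cs
    b1 = (s_yc * s_ss - s_ys * s_cs) / det
    b2 = (s_ys * s_cc - s_yc * s_cs) / det
    amplitude = math.hypot(b1, b2)   # always non-negative
    phase = math.atan2(-b2, b1)      # radians, in (-pi, pi]
    return amplitude, phase

# Synthetic check: three years of biweekly data with known A and C.
w_true = 2.0 * math.pi / 26
series = [0.3 * math.cos(w_true * t + 1.2) for t in range(78)]
A_hat, C_hat = fit_seasonal(series, 26)
```

This sketch always returns a non-negative amplitude. A negative A with phase C
describes the same curve as a positive A with the phase shifted by 180 degrees,
so fitted values may differ from tabulated ones by that convention.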
The correlation structures were determined by fitting the coefficients,
$\phi_1$, $\phi_2$, and $\theta_1$, of the autoregressive, moving-average
model:
$$ z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + a_t - \theta_1 a_{t-1} \tag{33} $$
where $z_t$ = value of time series at time t
      $a_t$ = random noise at time t
for each water quality constituent and then calculating theoretical
autocorrelation functions based on the fitted models.
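The theoretical autocorrelation function of a model of the form of Equation
33 can be obtained from its infinite moving-average (psi-weight)
representation. The sketch below is one standard way to do this and is offered
as an illustration, not as the computation used in this study. The function
name `arma21_acf` and the truncation length `n_terms` are assumptions, and the
fitted model is assumed to be stationary.

```python
def arma21_acf(phi1, phi2, theta1, max_lag, n_terms=2000):
    """Autocorrelations rho_0..rho_max_lag of
    z_t = phi1*z_{t-1} + phi2*z_{t-2} + a_t - theta1*a_{t-1}."""
    # psi weights: z_t = sum_j psi_j * a_{t-j}
    psi = [1.0, phi1 - theta1]
    for j in range(2, n_terms + max_lag + 1):
        psi.append(phi1 * psi[j - 1] + phi2 * psi[j - 2])

    def gamma(k):
        # Autocovariance at lag k, up to the common factor var(a_t),
        # with the infinite sum truncated at n_terms.
        return sum(psi[j] * psi[j + k] for j in range(n_terms))

    g0 = gamma(0)
    return [gamma(k) / g0 for k in range(max_lag + 1)]

# Special cases with known answers:
ar1 = arma21_acf(0.5, 0.0, 0.0, max_lag=3)   # AR(1): rho_k = 0.5**k
ma1 = arma21_acf(0.0, 0.0, 0.5, max_lag=2)   # MA(1): rho_1 = -0.5/(1 + 0.25)
```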
Confidence interval widths about annual geometric means were then
determined for a range of sampling frequencies for each constituent. The
confidence interval widths were computed in three ways:
1. Based on the variance of the correlated noise ($\sigma_z^2$) and
accounting for the effect of serial correlation.
2. Based on the variance of the series with the deterministic component
removed (variance of correlated noise).
3. Based on the variance of the original time series.
The relative error resulting from using the simpler computational methods
(the second and third) rather than the first method was then examined.
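The difference between the first two methods can be illustrated with the
commonly used expression for the variance of the mean of n equally spaced,
serially correlated observations, var(mean) = (sigma^2/n) [1 + 2 sum over
k = 1..n-1 of (1 - k/n) rho_k]. The sketch below is a hedged illustration
only. The function name `ci_width`, the normal multiplier z = 1.96, and the
first-order autocorrelation pattern used in the example are assumptions, and
rho must be the autocorrelation at the sampling interval, not at the daily
interval of the original record.

```python
import math

def ci_width(sigma, n, rho=None, z=1.96):
    """Width of a two-sided interval about the mean of n equally spaced samples.

    sigma : standard deviation of the (log-transformed) series
    rho   : optional list with rho[k] = lag-k autocorrelation at the
            sampling interval (rho[0] == 1); None means independence.
    """
    inflation = 1.0
    if rho is not None:
        # Serial correlation inflates the variance of the sample mean.
        inflation += 2.0 * sum((1.0 - k / n) * rho[k] for k in range(1, n))
    return 2.0 * z * sigma * math.sqrt(inflation / n)

n = 26                                  # samples per year
sigma = 0.2                             # illustrative value, not from the report
rho = [0.6 ** k for k in range(n)]      # assumed AR(1)-type correlation

width_correlated = ci_width(sigma, n, rho)     # analogous to method 1
width_independent = ci_width(sigma, n)         # analogous to method 2
relative_error = (width_independent - width_correlated) / width_correlated
```

With positive serial correlation the interval that ignores correlation is too
narrow, so `relative_error` is negative in this example.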
A dynamic programming code was then formulated for the purpose of
assigning sampling frequencies throughout a network in order to minimize a
statistical objective function with an economic constraint. The objective
function is the sum (over several selected constituents and all stations)
of the normalized positive deviation of the predicted confidence interval
widths from preselected design confidence interval widths. The code was
designed to account for the effects of deterministic seasonal variation and
serial correlation by incorporating the results of the time series analysis
just described. The economic constraint ensures that the annual operating
cost of the system, including direct costs of travel and laboratory analysis,
will not exceed the allowable budget.
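The allocation problem just described has the structure of a multiple-choice
knapsack, which a dynamic program can solve station by station. The sketch
below illustrates that structure only and is not the code developed in this
study. The names `allocate`, `penalty`, and `inv_sqrt_widths` are
hypothetical, all costs and width constants are invented for illustration,
travel and laboratory costs are lumped into one whole-dollar cost per sample,
and predicted widths are assumed to shrink as 1/sqrt(n), which ignores the
seasonal and serial correlation effects the actual code accounts for.

```python
from math import sqrt

FREQS = (6, 13, 26, 39, 52)  # candidate samples/year, as seen in the tables

def penalty(pred_widths, design_widths):
    """Sum of normalized positive deviations from the design widths."""
    return sum(max(0.0, (p - d) / d) for p, d in zip(pred_widths, design_widths))

def inv_sqrt_widths(ks):
    """Hypothetical width model: width_i(n) = k_i / sqrt(n)."""
    return lambda n: [k / sqrt(n) for k in ks]

def allocate(stations, design_widths, budget, freqs=FREQS):
    """Return (objective, frequencies) minimizing total penalty within budget."""
    # best[spent] = (lowest objective, chosen frequencies) for stations so far
    best = {0: (0.0, ())}
    for st in stations:
        nxt = {}
        for spent, (obj, chosen) in best.items():
            for f in freqs:
                cost = spent + f * st["cost_per_sample"]
                if cost > budget:
                    continue  # economic constraint
                value = obj + penalty(st["width"](f), design_widths)
                if cost not in nxt or value < nxt[cost][0]:
                    nxt[cost] = (value, chosen + (f,))
        best = nxt
    if not best:
        raise ValueError("budget cannot cover the lowest frequency everywhere")
    return min(best.values(), key=lambda v: v[0])

# Invented three-station, two-constituent example.
stations = [
    {"cost_per_sample": 10, "width": inv_sqrt_widths([100.0, 5.0])},
    {"cost_per_sample": 12, "width": inv_sqrt_widths([300.0, 8.0])},
    {"cost_per_sample": 15, "width": inv_sqrt_widths([150.0, 20.0])},
]
design = [40.0, 2.0]
budget = 1000
objective, chosen_freqs = allocate(stations, design, budget)
```

Keeping one entry per distinct amount spent is what makes this a dynamic
program: the best completion for the remaining stations depends only on the
budget left, not on how it was spent.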
As an example situation, the dynamic programming code was used to assign
sampling frequencies to the nine stations in Illinois from which data had
been obtained and analyzed. Design confidence interval widths were adopted
as the median confidence interval width for each constituent over all sta-
tions based on a sampling frequency of 26 samples per year. Using five
design water quality constituents and representative travel and laboratory
costs, a baseline design was produced. A sensitivity analysis was then
performed by varying the values of the input parameters of the optimization
routine. The input variables that were varied included the cost of travel,
annual operating budget, design confidence interval widths, and number of
quality constituents included in the analysis.
-------
SECTION 8
ASSUMPTIONS
Water quality monitoring involves sampling a correlated time series,
which is a realization of a stochastic process, the exact nature of which is
unknown. Thus assumptions have to be made relative to the way in which a
water quality population varies over time. Additional assumptions are nec-
essary in order to facilitate an economic analysis. Consequently, the
results of this research should be viewed in light of the limitations under
which the research was conducted.
Perhaps the most serious limitation is the lack of adequate historical
water quality records. The records used in this study are among the best
available today, and yet their limited length is inadequate for an accurate
estimation of deterministic seasonal components and somewhat inadequate for
fitting time series models. Practically speaking, regulatory agencies will
seldom have daily water quality records. However, records consisting of
infrequent and unevenly spaced observations may be used to estimate deter-
ministic seasonal cycles provided the record length is adequate. A closely
related limitation is the assumption that estimated population parameters
such as the mean and variance represent true population values. Such an
assumption is, of course, only as good as the data used in estimation.
Although this report deals strictly with sampling frequency selection,
it is essential that sample collection procedures be evaluated and improved
as a part of any upgrading of a water quality monitoring system. This
improvement should be geared toward assuring that future samples are repre-
sentative in both space and time of water quality conditions as they actually
exist in the stream.
Some common problems in sample collection that produce distorted water
quality information are failure to account for weekly cyclic variation in
quality, failure to account for diurnal variation in quality, and failure to
account for cross-sectional variation in quality. Some possible solutions to
these problems are: (1) using sampling intervals other than multiples of a
week; (2) collecting 24-hour composite samples; and (3) collecting multiple
samples along each stream cross section, respectively. A more in-depth
discussion of these considerations is presented in Sanders (1974).
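The first of these solutions can be illustrated with simple arithmetic. A
sampling interval that is a multiple of seven days always falls on the same
day of the week, while an interval that shares no common factor with seven
cycles through every weekday. The helper name `weekdays_visited` below is
hypothetical and the intervals are examples only.

```python
def weekdays_visited(interval_days, n_samples, start_weekday=0):
    """Set of weekdays (0-6) sampled when visiting every interval_days days."""
    return {(start_weekday + i * interval_days) % 7 for i in range(n_samples)}

biweekly = weekdays_visited(14, 26)    # multiple of a week: one weekday only
fifteen_day = weekdays_visited(15, 26) # not a multiple: all weekdays covered
```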
A major limitation with regard to the application of these results is
that of statistical and mathematical expertise required by agency personnel.
Virtually all of the techniques described here would require some additional
training of personnel. However, all of them can be applied in "cookbook"
fashion without an understanding of all of the underlying mathematics.
Fitting ARMA models requires that the necessary software packages (such as
IMSL) be available, but the other procedures—fitting the seasonal component,
computing confidence interval widths, and assigning sampling frequencies via
dynamic programming—can be accomplished if well-documented Fortran programs
are supplied to agencies.
A third limitation is that imposed by the economic viewpoint taken here.
This viewpoint requires that an existing network be in operation, that the
overall scale of the monitoring network not be subject to change, that the
fraction of the total budget allocated to operating costs of travel and
laboratory analysis be identified, and that incremental costs resulting from
a reallocation of sampling frequencies be identified. This viewpoint is
admittedly restrictive, but in order to extend the analysis to optimize the
total monitoring program in an economic sense, it would be necessary to
translate the value of water quality data into dollars and cents. Such an
objective would require an extensive research effort beyond that attempted
here.
A final limitation on the value of these results is caused by the
failure of most regulatory agencies to carefully define the ultimate use of
water quality data in management decisions. Without such definition there
cannot exist fully rational approaches toward the more subjective aspects of
network design. In the design procedures outlined here, the subjective
aspects include the selection of water quality constituents to be included in
the analysis and the assignment of design confidence interval widths for each
constituent.
-------
REFERENCES
Anderson, D. J. 1978. Water Quality Control Division, Colorado Department
of Health, Denver, Colorado. Personal communication, July.
Beckers, C. V., and S. G. Chamberlain. 1974. Design of cost-effective water
quality surveillance systems. Environmental Protection Agency, Socio-
economic Environmental Studies Series Report No. EPA-600/5-74-004,
January.
Bendat, J. S., and A. G. Piersol. 1971. Random Data: Analysis and Measure-
ment Procedures. John Wiley and Sons, Inc., New York.
Bourdimos, E. L., S. L. Yu, and R. A. Hahn. 1974. Statistical analysis of
daily water quality data. Water Resources Research 10(5): 925-941.
Bowker, A. H., and G. J. Lieberman. 1972. Engineering Statistics, Second
Edition. Prentice-Hall, Inc., Englewood, New Jersey, 641 p.
Box, G. E. P., and G. M. Jenkins. 1976. Time Series Analysis: Forecasting and
Control. Holden-Day, San Francisco.
Briggs, J. C., and J. F. Ficke. 1978. Quality of rivers of the United
States, 1975 water year—Based on the National Stream Quality Accounting
Network (NASQAN). U. S. Geological Survey Open File Report 78-200,
August, 436 p.
Cleary, J. C. 1978. Perspective on river quality diagnosis. Journal Water
Pollution Control Federation 50(5): 825-832.
Cooley, J. L. 1976. Nonpoint pollution and water quality monitoring.
Journal of Soil and Water Conservation 31(2): 42-43.
Cragwall, J. S. 1976. The national stream quality accounting network. The
Military Engineer, No. 441, January-February.
Harmeson, H. H. 1978. Illinois State Water Survey, Urbana, Illinois.
Personal communication, June.
Hawkinson, R. O., and J. F. Ficke. 1975. The National Stream Quality
Accounting Network (NASQAN): Some questions and answers. U. S.
Geological Survey Circular 719, 23 p.
Hawkinson, R. O., J. F. Ficke, and L. G. Saindon. 1977. Quality of rivers
of the United States, 1974 Water Year—Based on the National Stream
Quality Accounting Network (NASQAN). U. S. Geological Survey Open File
Report 77-151, February.
Hillier, F. S., and G. J. Lieberman. 1974. Operations Research. Holden-
Day, Inc., San Francisco, California.
Hines, W. G., D. A. Rickert, and Stuart W. McKenzie. 1977. Hydrologic
analysis and river-quality data programs. Journal Water Pollution
Control Federation 49(9): 2031-2041.
Hipel, K. W., A. I. McLeod, and W. C. Lennox. 1977. Advances in Box-Jenkins
modeling, 1, model construction. Water Resources Research 13(3): 567-575.
Kendrick, P. J. 1977. What's right and wrong with EPA: an NAS analysis.
Journal Water Pollution Control Federation 49(9): 1951-1954.
Landwehr, J. M. 1978. Some properties of the geometric mean and its use in
water quality standards. Water Resources Research 14(3): 467-473.
Lettenmaier, D. P. 1975. Design of monitoring systems for detection of
trends in stream quality. Technical Report No. 39, Charles W. Harris
Hydraulic Laboratory, University of Washington, Seattle, August.
Lettenmaier, D. P. 1976. Detection of trends in water quality data from
records with dependent observations. Water Resources Research 12(5):
1037-1046.
Lettenmaier, D. P. 1978. Design considerations for ambient stream quality
monitoring. Water Resources Research 14(4): 884-901.
Lettenmaier, D. P., and S. J. Burges. 1976a. Design of trend monitoring
networks. Presented at the National Conference on Environmental Engi-
neering Research, Development, and Design, July 12-14, Seattle, Washing-
ton.
Lettenmaier, D. P., and S. J. Burges. 1976b. Use of state estimation
techniques in water resource system modeling. Water Resources Bulletin
12(1): 83-99.
Loftis, J. C. 1978. Statistical and economic considerations for improving
regulatory water quality monitoring. Ph.D. Thesis, Agricultural and
Chemical Engineering Department, Colorado State University, Fort
Collins, Colorado 80523.
McLeod, A. I., K. W. Hipel, and W. C. Lennox. 1977. Advances in Box-Jenkins
modeling, 2, applications. Water Resources Research 13(3): 577-586.
McMichael, F. C., and S. J. Hunter. 1972. Stochastic modeling of tempera-
ture and flow in rivers. Water Resources Research 8(1): 87-98.
Matalas, N. C., and W. B. Langbein. 1962. Information content of the mean.
Journal of Geophysical Research 67(9): 3441-3448.
Montgomery, H. A. C., and I. C. Hart. 1974. The design of sampling pro-
grammes for rivers and effluents. Journal of the Institute of Water
Pollution Control 33(1): 77-101.
Moore, S. F., G. C. Dandy, and R. J. DeLucia. 1976. Describing variance
with a simple water quality model and hypothetical sampling program.
Water Resources Research 12(4): 795-804.
Moss, M. E., D. P. Lettenmaier, and E. F. Wood. 1978. On the design of
hydrologic data networks. EOS, Transactions American Geophysical Union
59(8): 772-775.
National Academy of Sciences. 1977. Analytical studies for the U. S.
Environmental Protection Agency, Volume 4, Environmental Monitoring.
Pisano, M. 1976. Nonpoint pollution: an EPA view of areawide water quality
management. Journal of Soil and Water Conservation 31(3): 94-100.
Rodriguez-Iturbe, I. 1969. Estimation of statistical parameters for annual
river flows. Water Resources Research 5(6): 1418-1421.
Sanders, T. G. 1974. Rational design criteria for a river quality moni-
toring network. Ph.D. Thesis, Civil Engineering Department, University
of Massachusetts, August.
Sanders, T. G., and D. D. Adrian. 1978. Sampling frequency for river
quality monitoring. Water Resources Research 14(4): 569-576.
Sanders, T. G., and R. C. Ward. 1978. Relating stream standards to regu-
latory water quality monitoring practices. Proceedings of the American
Water Resources Association Symposium on Establishment of Water Quality
Monitoring Programs, June 12-14.
Sherwani, J. K., and P. H. Moreau. 1975. Strategies for water quality
monitoring. Report No. 107, Water Resources Research Institute of the
University of North Carolina, 124 Riddick Bldg., North Carolina State
University, Raleigh, June.
Steele, T. D., E. J. Gilroy, and R. O. Hawkinson. 1974. An assessment of
areal and temporal variations in streamflow quality using selected data
from the National Stream Quality Accounting Network (NASQAN). U. S.
Geological Survey Open File Report 74-217, August.
U. S. Army Corps of Engineers. 1978. Unpublished laboratory analysis cost
data collected by James Westhoff, Chief, Analytical Laboratory Group,
Environmental Laboratory, Waterways Experiment Station, Vicksburg,
Mississippi.
Ward, R. C. 1973. Data acquisition systems in water quality management.
Environmental Protection Agency, Socioeconomic Environmental Studies
Series Report No. EPA-R5-73-014, May.
Ward, R. C., K. S. Nielsen, and M. Bundgaard-Nielsen. 1976. Design of
monitoring systems for water quality management. Contributions from the
Water Quality Institute, Danish Academy of Technical Sciences, No. 3,
December.
Wastler, T. A. 1963. Application of spectral analysis to stream and estuary
field surveys: 1. Individual power spectra. Publication No. 999-WP-7,
U. S. Public Health Service, Washington, D. C.
-------
APPENDIX

ESTIMATED VALUES OF DETERMINISTIC COEFFICIENTS FOR SEASONAL COMPONENTS
OF WATER QUALITY TIME SERIES

                                                    y_t = A cos(ωt + C) + mt
                          Water Quality Constituent
Location                  log (mg/l)                   A         C         m

Grand River,              Specific conductance      -0.105    1.03      0.446
Michigan                    (µmhos/cm)
                          Total phosphate           -0.171    1.23      0.0013
                          Sulfate                   -0.119    0.462     0.0816
                          Chloride                  -0.334    0.718     0.0015

Red River,                Specific conductance       0.265    1.94
Manitoba                    (µmhos/cm)
                          Bicarbonate                0.276   65.4
                          Sodium                     0.508    1.37
                          Chloride                   0.653    1.47

Little Wabash River,      TDS                       -0.268    5.60
Illinois, Sta. 1          TOC                        0.0646  -0.0771
                          SS                        -1.15     1.46
                          Hardness                   0.344    8.81
                          Nitrate                    0.369    1.54

Kankakee River,           TDS                        0.0410   1.21
Illinois, Sta. 2          TOC                       -0.357    1.80
                          SS                        -1.47     1.72
                          Hardness                  -0.0301   0.997
                          Nitrate                   -0.394    0.790

Kankakee River,           TDS                        0.0648   1.14
Illinois, Sta. 3          TOC                       -0.282    1.84
                          SS                        -1.35     1.65
                          Hardness                   0.0578   2.20
                          Nitrate                   -0.401    1.42

Chicago Ship Canal,       TDS                       -0.0983   0.661
Illinois, Sta. 4          TOC                       -0.0297   0.985
                          SS                         0.151    1.46
                          Hardness                  -0.0858   0.874
                          Nitrate                    0.576   -0.017

Illinois River,           TDS                        0.032   14.9
Illinois, Sta. 5          TOC                       -0.159    1.91
                          SS                        -0.530    1.94
                          Hardness                  -0.0364   1.03
                          Nitrate                   -0.272   14.9

Vermillion River,         TDS                        0.211    1.61
Illinois, Sta. 6          TOC                        0.208    0.0401
                          SS                        -1.18     1.67
                          Hardness                   0.170    1.53
                          Nitrate                   -1.06     0.904

Eureka Lake,              TDS                        0.180    2.17
Illinois, Sta. 7          TOC                        0.0597  34.41
                          SS                        -0.553    2.15
                          Hardness                   0.156    1.79
                          Nitrate                   -0.413    2.09

Canton Lake,              TDS                        0.134   -3.43
Illinois, Sta. 8          TOC                       -0.133    2.00
                          SS                        -0.742    1.93
                          Hardness                  -0.157   -6.61
                          Nitrate                   -0.733    0.385

Sangamon River,           TDS                        0.0806   1.71
Illinois, Sta. 9          TOC                       -0.0846   1.49
                          SS                        -0.979    1.40
                          Hardness                   0.0753   1.46
                          Nitrate                   -2.019    1.22
-------
TECHNICAL REPORT DATA

1. REPORT NO.: EPA-600/4-79-055
3. RECIPIENT'S ACCESSION NO.:
4. TITLE AND SUBTITLE: REGULATORY WATER QUALITY MONITORING NETWORKS -
   STATISTICAL & ECONOMIC CONSIDERATIONS
5. REPORT DATE: August 1979
6. PERFORMING ORGANIZATION CODE:
7. AUTHOR(S): Jim C. Loftis, Robert C. Ward
8. PERFORMING ORGANIZATION REPORT NO.:
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Colorado State University,
   Fort Collins, Co. 80523
10. PROGRAM ELEMENT NO.: 1NE833
11. CONTRACT/GRANT NO.: R 805759010
12. SPONSORING AGENCY NAME AND ADDRESS: U.S. Environmental Protection
    Agency-Las Vegas, NV, Office of Research and Development,
    Environmental Monitoring and Support Laboratory, Las Vegas, Nevada 89114
13. TYPE OF REPORT AND PERIOD COVERED: Final
14. SPONSORING AGENCY CODE: EPA/600/07
15. SUPPLEMENTARY NOTES: EMSL-LV Project Officer for this report is
    Donald B. Gilmore, Commercial Telephone (702) 736-2969, X241 or
    FTS 595-2969, X241
16. ABSTRACT:
The purpose of this study is to examine and quantify the statistical trade-offs
associated with using various levels of statistical sophistication in network design
and to formulate a procedure for accounting for economic constraints in the design
process. Sampling frequency is the major aspect of network design considered in the
study; consequently, the results of the study are directed toward their use by
regulatory agencies for the evaluation and upgrading of existing networks.
17. KEY WORDS AND DOCUMENT ANALYSIS
    DESCRIPTORS / IDENTIFIERS/OPEN ENDED TERMS: Water quality; Design
    techniques; Water monitoring network; Design criteria
    COSATI Field/Group: 07B, 68D
18. DISTRIBUTION STATEMENT: RELEASE TO PUBLIC
19. SECURITY CLASS (This Report): UNCLASSIFIED
20. SECURITY CLASS (This page): UNCLASSIFIED
21. NO. OF PAGES: 90
22. PRICE: A05
EPA Form 2220-1 (Rev. 4-77) previous edition is obsolete
-------