MIDWEST RESEARCH INSTITUTE
METHODOLOGIES FOR DETERMINING TRENDS IN
WATER QUALITY DATA
by
Karin M. Bauer
William D. Glauz
Jairus D. Flora
FINAL REPORT
July 3, 1984
EPA Contract No. 68-02-3938, Assignment No. 29
MRI Project No. 8205-5(29)
Prepared for
Industrial Environmental Research Laboratories
U.S. Environmental Protection Agency
Research Triangle Park, North Carolina 27711
Attn: Ms. Susan Svirsky
Task Manager (WH-553)
MIDWEST RESEARCH INSTITUTE  425 VOLKER BOULEVARD, KANSAS CITY, MISSOURI 64110 • 816 753-7600
PREFACE
Section 305(b) of the U.S. Clean Water Act requires that the States
report biennially on the quality of their navigable waters. To assist the
States in the preparation of these reports, the Environmental Protection
Agency issues guidance information. This report on water quality trend
determination was prepared by Midwest Research Institute for use by EPA as
a portion of the guidance information for helping the States in preparing
their 1984 reports. The authors are indebted to the several EPA reviewers
who made many thoughtful and worthwhile suggestions to improve the earlier
draft.
TABLE OF CONTENTS
I. Introduction 1
II. Purpose 4
III. Important Considerations 6
A. What Is A Trend? 6
B. What Is A Change? 6
C. Seasonal Effects 7
D. What Comprises An Observed Series? 7
E. How Much Data? 10
F. How Often? 11
G. Hypothesis Testing 12
H. Estimation 14
I. The Normal Distribution 15
J. Parametric or Distribution-Free? 20
K. Plotting 22
L. Organization of this Document 23
IV. Parametric Procedures 27
A. Methods to Deseasonalize Data 27
B. Regression Analysis 28
C. Student's T-Test 32
D. Trend and Change 38
E. Time Series Analysis 39
V. Distribution-Free Methods 41
A. Runs Tests for Randomness 41
B. Kendall's Tau Test 45
C. The Wilcoxon Rank Sum Test (Step Trend) 50
D. Seasonal Kendall's Test for Trend 57
E. Aligned Rank Sum Test for Seasonal Data
(Step Trend) 64
F. Trend and Change 69
VI. Special Problems 71
A. Missing Data 71
B. Outlying Observations 72
C. Test for Normality 73
D. Detection Limits 75
E. Flow Adjustments 76
Bibliography 78
Appendix 82
List of Figures
Figure Title Page
1 Composite Series 8
2 Sample Trend Line 9
3 Two Normal Distributions of a Variable X 16
4 Standard Normal Distribution 17
5 Decision Tree 24
6 Sample Linear Regression Line 29
7 Monthly Concentrations of Total Phosphorus 58
8 Plot of Percentage Violations Versus Time 61
9 Example of Plot on Probability Paper 74
List of Tables
Table Title Page
1 Upper Tail Probabilities for the Standard Normal
Distribution 19
2 Two-Tailed Significance Levels of Student's T 33
3 The 5% (Roman Type) and 1% (Boldface Type) Points
for the Distribution of F 37
4 r-Tables Showing 5% Levels for Runs Test 42
5 Upper Tail Probabilities for the Null Distribution
of Kendall's K Statistic 48
6 Upper Tail Probabilities for Wilcoxon's Rank Sum W
Statistic 53
SECTION I. INTRODUCTION
Water quality reports prepared by the States and jurisdictions under
Section 305(b) of the Clean Water Act contain information on a wide variety
of parameters. Because the reports cover a relatively short 2-year period,
it is often difficult to determine with reasonable certainty whether water
quality has improved, remained constant, or degraded from one reporting
period to the next. In order to assess water quality trends, methods must be
used which differentiate a consistent trend in a given direction from natural
cyclic or seasonal changes or occasional excursions from the norm.
Generally, water quality data are affected by seasonal or cyclic
effects, an episodic or regular effect, a long-term monotonic trend, and
random noise. The purpose of this document is to provide guidance in the
use of statistical methods to separate out these effects and to test for
each of them.
The statistical methods described in this document provide the means for
the analyst to make this consistent trend assessment by using and testing
concepts of data validity and significance. These statistical techniques
will allow the analyst to detect the presence of long-term trends and to put
confidence intervals on the magnitude of trends or changes.
Once a specific statistical approach is established, it can be used in
succeeding years. As a result, the data-gathering process itself may be
modified, thereby increasing the quality and value of future data. Another
potential benefit is the discovery that historical data can have previously
unappreciated value; retrospective analyses then become possible.
In principle, statistical methods can be applied to all the data
collected during the water quality monitoring process. These data include
chemical parameter values for water, sediment, and tissue samples (e.g.,
DO, pH, nutrients, trace metals, specific organic compounds); physical
parameter values (e.g., turbidity, suspended solids, light penetration);
biological parameter values (e.g., macroinvertebrate populations, other
aquatic species); pollutant discharge or source data; and derived values
which combine several parameters in a composite evaluation of water quality
(e.g., trophic indices, water quality indices, violation statistics).
However, the water quality analyst is advised to apply statistical methods
initially to only those parameters which are of greatest interest or
priority and which offer the most extensive and reliable data. As the
analyst becomes more familiar with statistical methods, he will then be
able to expand the scope of analysis to most or all of the parameters.
Suggestions for the scope of an initial effort to use statistical
methods in determining water quality trends are as follows:
1. The analyst should select for analysis those chemical water
quality parameters which are of greatest importance to the State, to
selected areas of the State, or to the designated use in question.
2. The analyst should consider important biological parameters such
as macroinvertebrate populations and distributions.
3. A minimum of two or three of the important parameters, including
at least one from each of the categories defined in (1) and (2), should be
statistically analyzed by methods designed to obtain valid estimates of
trends.
4. The analyst will probably find that the period of these data bases
will need to be extended beyond the 2-year period specifically called for
in the 305(b) reports, so parameters should be selected for which
comparable data were collected in earlier years. Data from previous 305(b)
reporting periods could be used to augment the current period data.
These analyses should be conducted first for specific water bodies
rather than statewide. Further, it may be necessary to initially limit an
analysis to one part of a specific water system, with later extension to
the remainder of the system. The results of data analysis should tell the
analyst which is the appropriate and valid scope.
As mentioned earlier, the analyst may find it appropriate to use a
statistical approach to modify ongoing data-gathering activities. In
addition to improving the quality of the water quality data which are
collected, this may reduce the cost of data collection because data will
now be acquired more selectively.
In summary, there are two basic perspectives from which the analyst
should view the results of the application of statistical methods to water
quality data: the importance of the detected trends themselves, for what
they say about both the State's water quality and its water quality
programs; and the impact of the statistical methods and their results on
ongoing water quality data collection activities.
SECTION II. PURPOSE
The purpose of this document is to provide guidance to the States and
Regions in analyzing for trends in their water quality data. It is not a
statistical treatise. Indeed, some statisticians may be concerned at the
looseness with which terms and symbols seem to be used; the lack of mention
of all the ifs, ands, and buts; and the possibility that the reader may end
up indiscriminately using one test when another is more proper or more
powerful.
Rather, this document is aimed at persons with little background in
statistics, or even much in algebra. Most of the methods discussed require
only simple arithmetic and the use of standard tables. As such, it is a "how
to" approach, with little or no theory. In that respect, it is rather like
the statistical programs available now on most computers, which are designed
so that a nonstatistician can readily feed in data, select the statistical
method to be used, and then read and interpret the answers. In fact, many of
the methods are available as computer programs, and examples of these are
given in this document.
To be sure, there are distinct dangers in this approach, just as there
are when nonstatisticians use the "canned" computer programs. It is always
possible to use an inappropriate or incorrect test. This document will
therefore warn the reader of the major limitations and potential pitfalls to
be avoided. Ideally, of course, the analyst would always select the "best"
test; however, the "best" test is likely to require extra effort, access to
special computer programs, and the like. Realizing this, the analyst may be
tempted to give up and do nothing at all, or, worse, make arbitrary judgments
about "trends" in his data, with no justification. It seems better, in such
cases, to have some statistical backup, even if it is limited, before such
determinations are made.
Of course, if the analyst does have some background in statistics, has
a statistician available to assist or as a consultant, or has access to
statistical computer programs, these advantages should be applied. In such
cases, the material in this document should assist the analyst in making
even better use of his available resources.
Finally, for the reader desiring further information, all the
descriptions of the methods and tests given here include references to the
(more rigorous) statistical literature.
SECTION III. IMPORTANT CONSIDERATIONS
A. What Is A Trend?
The concept of "trend" is difficult to define. Generally, one thinks of
it as a smooth, long-term movement in an ordered series of measurements over
a "long" period of time. However, the term "long" is rather arbitrary and
what is long for one purpose may be short for another. For example, a
systematic movement in climatic conditions over a century would be regarded
as a trend for most purposes, but might be part of an oscillatory or cyclical
movement taking place over geological periods of time. In speaking of a
trend, therefore, one must bear in mind the length of the time period to
which the statement refers. For our purposes, we shall define a trend to be
that aspect of a series of observations that exhibits a steady increase or
decrease over the length of time of the observations, as opposed to a
"change," described next.
B. What Is A Change?
It is important to distinguish between the terms "trend" and "change";
they are often incorrectly used interchangeably in water quality reports.
However, they are not equivalent terms and should be carefully distinguished.
A "change" is a sudden difference in water quality associated with a discrete
event. For example, suppose for a period of time at a given station,
concentrations of some pollutants were persistently high. Then a new
treatment facility is placed into operation and, from then on, the
concentrations of these same pollutants are lower. This sudden improvement
in stream water quality should be referred to as a change, not a trend. A
change is sometimes called a "step trend" and, for clarity, the other is
called a "long-term trend."
C. Seasonal Effects
Observations taken over time at regular intervals often exhibit a
"seasonal" effect, that is, a regular cycle. For monthly data a
seasonal effect is a regular change over the year that more or less repeats
itself in succeeding years. This might be associated with different flow
rates corresponding to the seasonal pattern of precipitation and melting in
the watershed above a particular observation station. Here, seasonal
components will be restricted to refer to annual cycles, although in other
applications with different frequencies of observation, cycles could occur
quarterly, monthly or daily.
D. What Comprises An Observed Series?
A series of observations can be broken into several parts or components,
as illustrated in Figure 1 below. In this figure, each component of the
series has been generated separately. The seasonal component is represented
by a sine wave plotted in part a. A discrete change occurred at month 18 and
is shown in part b, along with a linear trend represented by the straight
line. Finally, part c shows an irregular or random component. These parts
are summed to give the series as it would be observed in part d of Figure 1.
In practice, one observes a series of data such as illustrated in part d of
Figure 1. The objective of trend analysis is to determine whether such a
series has a significant long-term trend or a discrete change associated with
a particular event.
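The four components and their sum can be sketched numerically. This is an illustrative reconstruction, not the data behind Figure 1; the seasonal amplitude, the month-18 step size, the trend slope, and the noise level are all assumed values.

```python
import math
import random

random.seed(1)
months = range(48)  # four years of monthly observations

seasonal = [2.0 * math.sin(2 * math.pi * m / 12) for m in months]  # annual cycle
trend = [0.05 * m for m in months]                                 # steady long-term trend
change = [0.0 if m < 18 else 1.5 for m in months]                  # discrete change at month 18
noise = [random.gauss(0, 0.5) for m in months]                     # irregular (random) component

# The observed series is simply the sum of the four components.
observed = [s + t + c + n for s, t, c, n in zip(seasonal, trend, change, noise)]
```

Trend analysis works in the opposite direction: given only `observed`, it attempts to recover evidence of the trend and the change while adjusting for the seasonal cycle and the noise.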
It is often the case that a trend is not immediately obvious from a
series of "raw" measurements. Figure 2 depicts a series of ammonia
concentration values measured monthly at a single station. Statistical
techniques can be used to obtain the "deseasonalized" data (more will be said
about this later), and the trend line for the deseasonalized data. (This
particular example was taken from STORET). The horizontal axis represents
time (months) and the vertical axis is the ammonia concentration in mg/1.
Trend analysis techniques attempt to sort out the data displayed in Figure 2
into components such as those illustrated in Figure 1, using statistical
techniques to test for the presence of a trend while adjusting for a seasonal
effect.

[Figure 1. Composite series: (a) seasonal component; (b) trend component with
change (a discrete change at month 18 plus a linear trend); (c) random
component; (d) water quality data, the sum of the components; plotted by
month, 0 to 64.]

[Figure 2. Sample trend line: observed data, deseasonalized data, and linear
regression of the deseasonalized data; ammonia concentration (mg/l) versus
month.]

The techniques discussed as trend analysis provide methods to
determine whether the data show a trend, a change (often referred to as a
step trend), its direction, and whether such a change or trend is sta-
tistically significant—larger than can be ascribed to random variation.
Most of the methods presented later in this document deal with trends,
although some tests dealing with changes are also included. We concentrate
on simple techniques. More detailed analyses of such data are referred to as
time series analysis and can become quite complex. These more detailed
methods are discussed briefly in Section IV-E.
E. How Much Data?
Three or four measurements do not make a trend. Imagine a friend
flipping a coin and getting "heads" four times in a row. We might consider
that to be an interesting result, but not one that would prompt us to
conclude that the coin had heads on both sides. The amount of data is just
not convincing enough. However, if the friend continued flipping the coin
and obtained 10 heads in a row, we would then seriously suspect that the coin
was indeed unusual. This assumes, of course, that we can rule out the
possibility that the friend is somehow manipulating the outcome of the coin
tosses and thus biasing the results.
The amount of water quality data needed depends on the frequency of
collection and the period of seasonality considered. Theoretically, the
amount of data needed can be quantified using the laws of probability if the
specific requirements of the analysis are stated. Such detailed sample size
calculations are quite specific to each situation. However, a general
discussion of the considerations can give some guidance. Referring to part a
of Figure 1, one can see that a short series of six to eight monthly
observations would give a rather misleading impression of the series. It
could show a marked increasing trend, a decreasing one, or one of two types
of curvature. Clearly, to identify a cyclical pattern, the number of
observations must be large enough to cover two complete cycles and preferably
more. Thus, with monthly data and a seasonal effect, a minimum of 24 to
30 months of data would be needed. On the other hand, if data are aggregated
to an annual basis, one might be able to identify a trend based on as few as
5 or 6 years. However, such an identification would not be able to
distinguish between a discrete change and a long-term trend, nor whether the
observed trend might be part of a longer cycle.
The more data, the more components of a series could be identified.
One rule of thumb is that there should be 10 observations (some authors
prefer 20) for each component to be tested for or adjusted for. In addition,
for cyclical effects, the series should cover at least two full periods. The
2 years of data reported in 305(b) would be just barely sufficient to apply a
technique that includes seasonality. It would be preferable to include data
from previous reporting periods, provided that they are compatible.
F. How Often?
The sampling scheme, i.e., the frequency of data collection, and the
number of sites and parameters, dictates to a great extent the type of
statistical procedures which should be used to analyze the data. Most water
quality characteristics (physical, chemical, biological) are collected on a
monthly basis; others are obtained once a year.
Examples of water quality characteristics measured in the EPA's basic
water monitoring program include:
                                        Sampling frequency
Characteristic                          in rivers and streams
Flow                                    monthly
Temperature                             monthly
Dissolved oxygen                        monthly
pH                                      monthly
Conductivity                            monthly
Fecal coliform                          monthly
Total Kjeldahl nitrogen                 monthly
Nitrate plus nitrite                    monthly
Total phosphorus                        monthly
Chemical oxygen demand                  monthly
Total suspended solids                  monthly
Representative fish/shellfish
  tissue analysis                       annually
The important point is that the trend analysis must consider sampling
frequency when determining the appropriate type of trend analysis. If a
cyclical component is suspected, the frequency must be a relatively small
fraction of the period in order to estimate the cycle. In addition, data for
at least two cycles would be needed. On the other hand, if data are
collected at intervals as long as or longer than a possible cycle, it is
important to ensure that data collection times are at the same point in the
cycle. Otherwise, spurious trends might appear or the variability of the
data might be substantially increased. Care should be taken to ensure that
data collection is under the same conditions each time a sample is obtained.
G. Hypothesis Testing
In hypothesis testing, a statement (hypothesis) to be tested is
specified. It is generally stated with enough detail so that probabilities
of any particular sample can be calculated if it is true. The hypothesis to
be tested is referred to as the "null" hypothesis. An example is the
hypothesis that a set of data are independent and identically distributed
according to the normal distribution with mean zero and variance
one. This would correspond to the hypothesis of no trend. (For most water
quality parameters a different mean and variance would be appropriate.) A
second hypothesis—the alternative—may be specified. This represents a
different situation that one wishes to detect, for example, a trend or
tendency for the concentration of a pollutant in water to decrease with time.
To test a hypothesis, one observes a sample of data and calculates the
probability of the data assuming that the hypothesis is true. If the
calculated probability is reasonably large, then the data do not contradict
the null hypothesis, and it is not rejected. On the other hand, if the
observed data are very unlikely under the null hypothesis, one is faced with
concluding either that a very unlikely event has occurred, or that the null
hypothesis is wrong. Generally, if the observed data are less likely than a
prespecified level, one agrees to conclude that the null hypothesis is wrong
and to reject it.
In testing a hypothesis one can make two types of errors. First, one
can reject the null hypothesis although it is true: this is called an error
of the first kind or a Type I error. The probability of a Type I error is
often denoted by the letter α. An example of a Type I error would be stating
that a trend existed when the data were actually random. On the other hand,
one can erroneously accept the null hypothesis when in fact it is false; this
is called an error of the second kind or a Type II error. The probability of
a Type II error is often denoted by the letter β. An example of a Type II
error would be the failure to detect a real trend in water quality. To
decide whether to reject a null hypothesis, probability levels usually
expressed as percentages are used. These probability levels are also called
significance levels and are commonly chosen to be 1, 5, or 10%. In addition,
confidence levels are defined as 100% minus the significance levels, e.g.,
99, 95, or 90%, respectively.
Consider again the friend with the coin. If the coin and the friend are
unbiased, there is an equal chance that on any given toss the coin will come
up heads or tails. The probability, usually denoted by p, that it will be
heads is thus 50% or 1/2. The probability of getting heads four times in a
row is:
(1/2)(1/2)(1/2)(1/2) = 1/16, or 6.25%.
That is, the odds are only 1 out of 16 that this result would occur purely by
chance. If the friend repeated the experiment 16 times, only once (on the
average) would he get heads 4 times in a row.
If one agrees to reject the hypothesis that p = 1/2 if he gets four
heads in a row, then the significance level of 0.0625 is the mathematical
probability of failing to accept the null hypothesis that the coin has both a
head and a tail that are equally likely to come up (based on this
experiment). An α-value of 0.0625 could be considered statistically
"significant." If one decides to reject only if heads appeared 10 times in a
row, the probability of the friend getting heads 10 times in a row is 1/2
multiplied by itself 10 times, which is approximately 0.001, or 1 chance in
1,000. Such a finding would usually be considered "highly" significant,
indeed.
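The coin-flip arithmetic above is simple to verify:

```python
def p_heads_in_a_row(k):
    """Probability that a fair coin shows heads k times in a row: (1/2)**k."""
    return 0.5 ** k

print(p_heads_in_a_row(4))   # 0.0625, i.e., 1 chance in 16
print(p_heads_in_a_row(10))  # about 0.001 (exactly 1/1024)
```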
H. Estimation
In addition to testing a hypothesis (of no trend against the alternative
that a trend exists for example), one may wish to estimate the magnitude of a
trend. Such a measure might be a change in concentration per year. Two
aspects of statistical estimation need to be kept in mind. First, one may
obtain a point estimate, which is a statistic that gives the "best" single
value for the size of the trend. However, this by itself is of little use.
It needs to be coupled with a test to determine whether that value is
significantly different from zero. An additional formulation is to give a
range or an interval estimate for the value in question. Such an interval in
statistics is called a confidence interval. This is an interval calculated
from the data in such a manner that it would contain the true but unknown
value of the measure in a specified proportion of the samples. The
proportion of the samples that would yield an interval that contains the
measure is called the confidence level. As mentioned previously, it relates
to the significance level, α, in that it is 100% minus the significance
level.
In general, one would want to estimate the magnitude of any trend and to
place confidence limits on the magnitude of the trend. If the sample size is
small, and/or the variance is quite large, these confidence limits will be
very wide, indicating that the trend is not well estimated. If the
confidence limits are quite close, one can be confident that the trend is
well estimated.
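As a minimal sketch of an interval estimate, the following computes a 95% confidence interval for a mean, assuming a normal model with a known standard deviation; the data values and sigma are invented for illustration. The same construction applies to the magnitude of a trend, with the appropriate standard error in place of sigma/sqrt(n).

```python
import math
from statistics import NormalDist, mean

data = [0.30, 0.41, 0.25, 0.38, 0.33, 0.29, 0.36, 0.31]  # hypothetical concentrations (mg/l)
sigma = 0.05                                             # assumed known standard deviation

xbar = mean(data)                        # the point estimate
z = NormalDist().inv_cdf(0.975)          # two-sided critical value for 95% confidence
half_width = z * sigma / math.sqrt(len(data))

# A narrow interval means the quantity is well estimated; a wide one means it is not.
lower, upper = xbar - half_width, xbar + half_width
print(f"point estimate {xbar:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```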
I. The Normal Distribution
The results of performing an experiment such as flipping a coin or mea-
suring the pH of a water sample can be tabulated as it is repeated over and
over again. If the frequency of each observed result is then plotted against
the result itself, the graph is called the frequency distribution, or dis-
tribution for short. A very important type of distribution often used in
conjunction with trend analysis (and many other types of statistics) is
called the normal distribution.
The normal distribution (also called the Gaussian distribution) has
dominated statistical practice and theory for centuries. It has been ex-
tensively and accurately tabulated, which makes it convenient to use when
applicable. Many variables such as heights of men, lengths of ears of corn,
and weights of fish are approximately normally distributed. In some cases
where the underlying distribution is not normal, a transformation of the
scale of measurement may induce approximate normality. Such transformations
as the square root and the logarithm of the variable are commonly used. The
normal distribution may then be applicable to the transformed values, even if
it is not applicable to the original data.
In many practical situations the variable of interest is not the
measurement itself (e.g., the weight of a fish), but the average of the
measurements (e.g., the average weight of 100 fish in a survey). Even if
the distribution of the original variable is far from normal, the
central-limit theorem states that the distribution of sample averages tends
to become normal as the sample size increases. This is perhaps the single
most important reason for the use of the normal distribution.
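The central-limit effect can be seen by simulation: even for a distinctly non-normal (uniform) parent distribution, sample averages cluster tightly and symmetrically around the true mean. The sample sizes below are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(2)

# 1,000 averages, each of a sample of 30 draws from the uniform distribution on [0, 1]
averages = [mean(random.random() for _ in range(30)) for _ in range(1000)]

# The raw values are flat over [0, 1] with mean 0.5; the averages also have
# mean 0.5, but a much smaller spread, and their histogram looks roughly normal.
print(round(mean(averages), 3), round(stdev(averages), 3))
```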
The normal distribution is completely determined by two quantities:
its average or mean (μ) and its variance (σ²). The square root of σ²,
denoted σ, is called the standard deviation. The mean locates the center
of the distribution and the standard deviation measures the spread or
variation of the individual measurements. Figure 3 depicts two normal
distributions, both with mean 0, but with different standard deviations.
[Figure 3. Two normal distributions of a variable X, plotted against the
value of X. Solid line: μ = 0, σ = 1; dotted line: μ = 0, σ = 1.5.]
The values of the mean and variance of water quality parameters could
be practically anything, depending upon the parameter, when and where it
was measured, etc. In order to simplify calculations it is desirable to
change (transform) the data to obtain a mean of 0 and a variance of 1. To
do so, compute a new variable

     Z = (X - μ)/σ

where X is the original variable with mean μ and variance σ². Then Z has a
mean of 0 and a variance of 1. The quantity Z is called the standard normal
deviate, and has what is known as the standard normal distribution, which is
extensively tabulated. (From Z, X can be computed as X = σZ + μ.)
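The standardization and its inverse are a one-line computation each; the numbers below are arbitrary illustrative values:

```python
def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma: transform to mean 0, variance 1."""
    return (x - mu) / sigma

def unstandardize(z, mu, sigma):
    """X = sigma * Z + mu: recover the original scale."""
    return sigma * z + mu

z = standardize(12.0, mu=10.0, sigma=2.0)
print(z)                                     # 1.0: one standard deviation above the mean
print(unstandardize(z, mu=10.0, sigma=2.0))  # 12.0: back to the original scale
```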
In statistical testing one compares a computed value with values in a
table. There are basically two types of tests—one-sided and two-sided. In
the coin flipping example where we were interested only in the result involv-
ing all heads, one would use a one-sided test. However, if the whole argu-
ment were repeated without reference to heads specifically, but only to the
fact that the same outcome was achieved four times or ten times in a row, the
odds would all change. The chance of getting four like results in a row is
only (1/2)(1/2)(1/2) = 1/8, because the first flip is immaterial; it only
matters that the last three match the first one, whatever it was. In this
case one would use a two-sided test. One often refers to the one-sided test
as using one tail (end) of a distribution (see Figure 4 below), and the
two-sided test as looking at both tails of the distribution. (By the way,
the term, two-sided, really has nothing to do with the two sides of a coin,
nor does a tail of the distribution refer to the tail side of a coin.)
[Figure 4. Standard normal distribution (μ = 0, σ = 1), plotted against the
value of Z, with the upper tail area shaded.]
Table 1 below shows one way of tabulating the standard normal
distribution. The following is an explanation of how to use the table for
testing. Each number in the table corresponds to the shaded area in Figure 4
for a particular, positive value of Z.
The water quality analyst can use the table either for a two-sided test
(e.g., is there a trend?) or for a one-sided test (e.g., is the trend in-
creasing?). The two-sided test would normally be used but, for simplicity,
the one-sided test will be explained first.
Suppose a calculated Z from some test is 1.53, that the alternative
hypothesis calls for a one-sided test, and we want to know if Z is signifi-
cant. We then ask: what is the probability of obtaining a Z greater than or
equal to 1.53? Referring to Table 1, in the left-hand column choose the line
at 1.5; across the top, choose the column at 0.03, since 1.53 = 1.5 + 0.03.
The number at the intersection of the line and the column is 0.0630. The
probability of obtaining a Z greater than or equal to 1.53, purely by chance,
is therefore 0.0630. If one had previously chosen a significance level of 5%
(α = 0.05), this value of Z would not be significant because the obtained
value, 0.0630, is greater than the chosen significance level of 0.05. That
is, we could not say confidently that the trend being tested was real; the
data could well be just random numbers.
To use the table for a two-sided test, with the same value of Z, we
first obtain the same probability from the table. However, the probability
we have to use is that of obtaining a Z greater than or equal to 1.53, or
less than or equal to -1.53. This probability is twice 0.0630 or 0.1260
since the curve is symmetrical about Z = 0.
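The table lookups can also be done in software. A sketch using Python's standard library (`statistics.NormalDist`) reproduces the Z = 1.53 example:

```python
from statistics import NormalDist

def upper_tail(z):
    """P(Z >= z) for the standard normal distribution (one-sided)."""
    return 1.0 - NormalDist().cdf(z)

p_one_sided = upper_tail(1.53)
p_two_sided = 2 * p_one_sided  # P(Z >= 1.53 or Z <= -1.53), by symmetry about 0
print(round(p_one_sided, 4), round(p_two_sided, 4))  # 0.063 0.126
```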
In summary, for the calculated Z of 1.53, we determined the one-sided
significance level as being 0.063 and the two-sided significance level as
being 0.126. Table 1 can also be used the other way around. For a given
significance level α of 0.025, for example, we wish to determine a value, Zc,
TABLE 1
UPPER TAIL* PROBABILITIES FOR THE STANDARD NORMAL DISTRIBUTION
  z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

 0.0   .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
 0.1   .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
 0.2   .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
 0.3   .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
 0.4   .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
 0.5   .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
 0.6   .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
 0.7   .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
 0.8   .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
 0.9   .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
 1.0   .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
 1.1   .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
 1.2   .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
 1.3   .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
 1.4   .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
 1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
 1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
 1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
 1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
 1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
 2.0   .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
 2.1   .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
 2.2   .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
 2.3   .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
 2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
 2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
 2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
 2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
 2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
 2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
 3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
 3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
 3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
 3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
 3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
*For a two-tailed significance level, multiply the probability in the
table by 2.
Source: Hollander and Wolfe (1973) p. 258
19
-------
such that a value of Z obtained from our data smaller than Zc would not be
significant, while a value of Z equal to or greater than Zc would be sig-
nificant. Zc is therefore called the critical value associated with the
significance level a. In the case of a = 0.025 and a one-sided test, the
critical value would be 1.96 (row at 1.9 and column at 0.06). As before, the
probability of Z being less than or equal to -1.96 is also 0.025. We can
thus say that the probability of Z being between -1.96 and +1.96 is
1 - 2(0.025) = 0.95; that is, 95% of the values of Z are between -1.96 and
+1.96. Thus, Zc = 1.96 is the critical value for a = 0.025 (one-sided) or
a = 0.05 (two-sided). If we want to determine, directly, the critical value
for a = 0.025 (two-sided), we look in Table 1 for 0.0125 = a/2 and find
Zc = 2.24 (row at 2.2 and column at 0.04).
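As a cross-check on reading Table 1, the same upper-tail probabilities can be computed from the complementary error function found in any standard math library. The short sketch below (plain Python, offered purely as an illustration and not part of the original guidance) reproduces the one-sided and two-sided probabilities for Z = 1.53 and recovers the critical value Zc for a = 0.025 by bisection.

```python
from math import erfc, sqrt

def upper_tail_p(z):
    # P(Z >= z) for a standard normal variable (the quantity tabulated in Table 1)
    return 0.5 * erfc(z / sqrt(2.0))

def critical_value(alpha):
    # Find Zc with P(Z >= Zc) = alpha by bisection on [0, 10]
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if upper_tail_p(mid) > alpha:
            lo = mid  # tail probability still too large: Zc lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0

p_one = upper_tail_p(1.53)   # one-sided probability, about 0.0630
p_two = 2.0 * p_one          # two-sided probability, about 0.1260
z_c = critical_value(0.025)  # critical value, about 1.96
```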
J. Parametric or Distribution-Free?
Suppose there are two years of monthly water quality data and we want to
test whether the water quality in one year differed from that in the other,
based on the level of a certain pollutant. The classical method to test for
differences in the concentration of the pollutant in the two years would be
to compute a two-sample t-test. This test, like all statistical tests, is
based on a number of assumptions about the underlying distribution from which
the data were drawn. For the two-sample t-test, these assumptions are:
(1) the errors are independent, (2) the variances are the same, (3) the
distribution is normal, and (4) the null hypothesis is that the two means are
the same. Procedures that specify the form of the underlying
distribution (e.g., normal) up to a few parameters are referred to as
parametric procedures. Many of these are based on the normal distribution or
on sampling distributions (e.g., t-, F-distribution) derived from it.
An alternative approach is to base the test statistic on less detailed
assumptions about the underlying distribution. For example, if the null
hypothesis merely specifies that the distribution of the water quality mea-
sure was the same in the two years and was continuous, then a nonparametric
or distribution-free test can be used. The most widely used such test is
20
-------
the Wilcoxon-Mann-Whitney two-sample rank test (see Section V-C). Because it
does not depend on the assumed form (e.g., normal) of the underlying distri-
bution, inferences based on the Wilcoxon rank sum test may be more reliable.
On the other hand, a distribution-free test will typically have lower power
to detect specific alternatives than will the appropriate parametric test if
all the assumptions of the parametric test hold. However, in most cases the
difference is negligible.
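The mechanics of the rank-based approach can be sketched in a few lines. The function below computes the Wilcoxon rank-sum statistic (the sum of the ranks of the first sample within the combined ordering, with mid-ranks for ties); the function name and the tie handling are illustrative choices, and the full test procedure with its critical values is the subject of Section V-C.

```python
def rank_sum_statistic(x, y):
    # Sum of the ranks of sample x within the combined sample x + y.
    # Tied observations share the average (mid-) rank.
    combined = sorted(x + y)

    def mid_rank(v):
        below = sum(1 for c in combined if c < v)
        equal = sum(1 for c in combined if c == v)
        return below + (equal + 1) / 2.0

    return sum(mid_rank(v) for v in x)
```

For example, rank_sum_statistic([1, 3], [2, 4]) gives 4.0 (ranks 1 and 3 in the combined ordering).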
In order for some of the commonly applied parametric statistics to be
valid, the data should be approximately normally distributed, or capable of
being transformed so that they become so. A check for normality is discussed
in Section VI. Unfortunately, water quality data are often far from
normally distributed. There are two drawbacks to transforming the
data. One is that it may be difficult to select a suitable transformation.
The second is that the transformation may induce a scale that is difficult to
interpret, or might change some of the other assumptions.
If the classical parametric statistical methods are not valid for the
data, we must rely on the less familiar nonparametric methods. These
procedures require fewer assumptions, so they have wider applicability. Most
commercial statistical computer program packages such as SAS, BMDP, and SPSS
include some nonparametric methods for data analysis, although these often
are restricted to hypothesis testing applications. (They may indicate
whether or not a trend exists, but not provide estimates or confidence
intervals of its magnitude).
In addition to the assumption of normality (or another specific dis-
tribution), statistical procedures—particularly parametric procedures—are
based on a number of other assumptions. The most common of these are
constant variance, linearity of a trend, and independence of errors. The
distribution-free procedures are less sensitive to violations of the assump-
tion of equal variances. Further, the distribution-free procedures generally
deal only with a monotonicity requirement for a trend rather than a specif-
ically linear one, so they may be better for testing. However, procedures
such as regression can easily accommodate various forms of a trend such as
21
-------
polynomials in estimating the magnitude of the trend. Both types of proce-
dures assume independence of errors and can be sensitive to correlation of
errors. Correlated errors call for more complex time series analysis that
specifically incorporates the error structure.
Many of the methods described in this document are nonparametric. How-
ever, some of the more common parametric tests are also presented. If the
data appear to be normally distributed or can be transformed to be so, the
user may prefer these more widely known tests.
K. Plotting
Before using any statistical method to test for trends, a recommended
first step is to plot the data. Each point should be plotted against time
using an appropriate scale (e.g., month or year). Figure 2, presented
earlier, is an example of such a plot.
In general, graphs can provide highly effective illustrations of water
quality trends, allowing the analyst to obtain a much greater feeling for the
data. Seasonal fluctuations or sudden changes, for example, may become quite
visually evident, thereby helping the analyst decide which sequence of
statistical tests to use. By the same token, extreme values can be easily
identified and then investigated. Water quality parameters, expressed as raw
concentrations or logarithms, loadings, or a water quality index, can easily
be plotted against time using STORET. SAS also features a plotting
procedure, although a less sophisticated one; for the reader familiar with
SAS/GRAPH, a series of computer graphics is available if the appropriate
hardware is on hand.
Another area where plotting plays an important and instructive role is in
testing the data for normality. A relatively simple manual plotting
procedure is available when the data base is not too large, since the
plotting is done by hand; all one needs is probability paper. The use of
probability paper is illustrated with an example in Section VI.
22
-------
L. Organization of this Document
The normal-theory or parametric procedures are discussed in Section IV,
while the distribution-free procedures are grouped into Section V. In
Section IV, after methods to deseasonalize data are presented, a test for
long-term trend is discussed, then a test for step trend (change), followed
by a test for both types of trend. Finally, time series analysis is briefly
discussed in the last subsection.
Section V is organized in a similar fashion. After discussing tests for
randomness in general terms, distribution-free methods applicable to
non-seasonal data are presented first for long-term trends and step trends
(Subsections B and C, respectively). Next, tests for long-term trends and
step trends in the case of seasonal data are discussed in Subsections D and
E, respectively. The section closes with a brief discussion of the
difficulties of dealing with the data when both types of trends are present.
Figure 5 below presents a decision tree indicating how to determine
which procedures are appropriate based on the characteristics of the data
available to the analyst. Thus, one can follow this as a road map and refer
to the sections where the methods to answer each question on the diagram are
discussed.
The first step is to determine whether the data are on an annual basis
or were more frequently recorded. If they are more frequent than annual,
then seasonality is an important consideration. In general, one must test
for seasonality if the data are on a quarterly or monthly basis. If
seasonality is found, it must be removed before a test for change or trend
can be done. Tests for seasonality can be based on runs tests, turning point
tests, serial correlations, or other general tests for randomness. Methods
for removing seasonality, or for accounting for it in the analysis, are given
in Sections IV and V, where appropriate.
23
-------
WATER QUALITY DATA
|
+-- ANNUAL DATA
|     |
|     +-- Normal? Yes --> Parametric procedures:
|     |                     test for change: t-test
|     |                     test for trend:  linear regression
|     |
|     +-- Normal? No  --> Distribution-free procedures:
|                           test for change: rank sum test
|                           test for trend:  Kendall's tau test
|
+-- MONTHLY OR QUARTERLY DATA
      |
      +-- Seasonal? No  --> proceed as for annual data
      |
      +-- Seasonal? Yes
            |
            +-- Normal? Yes --> Deseasonalize, then parametric procedures:
            |                     test for change: t-test
            |                     test for trend:  linear regression
            |
            +-- Normal? No  --> Distribution-free procedures:
                                  test for change: aligned rank sum test
                                  test for trend:  seasonal Kendall's test

Figure 5 - Decision tree.
-------
-------
If the tests indicate that there is no seasonality (as with annual data),
the test procedure is based on Kendall's tau. Application of this test is
presented in Section V-B. If seasonality is present, then a modified version
of Kendall's test can be used; it is presented in Section V-D.
A number of special problem areas are covered in Section VI. These
include how to handle missing observations; what to do with observations that
appear strange (outliers); how to test for normality; how to deal with
situations, sometimes fairly common, where the parameter value of interest
could not be measured because it was less than the detection limit of the
measurement technique; and how to make corrections to the data for variations
in flow (especially important with concentration parameters).
The document concludes with a selected bibliography. The papers and
texts listed are not all referenced directly, but provide additional
information on one or more of the tests discussed here. For general coverage
of statistical testing, the texts of Snedecor and Cochran (1980) and Johnson
and Leone (1977) are suggested; the books by Siegel (1956), and by Hollander
and Wolfe (1973), provide useful presentations of many nonparametric methods.
Also included in the bibliography are references to several common
packages of statistical computer programs. Perhaps the most versatile and
universally available to the States is SAS. The SAS software is interfaced
to STORET and may be used by STORET users. Documentation of how to use SAS
within STORET is available through the EPA. Where possible, references are
made in this document to the available procedures in SAS, for readers
familiar with or having access to SAS.
Finally, it should be emphasized that the examples used in the following
sections are hypothetical examples. Their purpose is to demonstrate the
different statistical techniques and how to arrive at a test statistic
necessary to perform a specific test. The reader should not be led to be-
lieve that one year of monthly data, for example, is sufficient to apply a
given test, because of the possibility of seasonal fluctuations.
26
-------
IV. PARAMETRIC PROCEDURES
A. Methods to Deseasonalize Data
If the data exhibit a seasonal cycle--typically an annual cycle for
monthly or quarterly data—then this seasonal effect must be removed or ac-
counted for before testing for a long-term trend or a step trend. The method
proposed here is simple and straightforward. Assume that the data are
monthly values. To remove the possible seasonal effect, calculate the mean
of the observations taken in different years, but during the same months.
That is, calculate the mean of the measurements in January, then the mean for
February, and so on for each of the 12 months.
After calculating the 12 monthly means, subtract the monthly mean from
each observation taken during that month. These differences will then have
any seasonal effects removed—the differences are thus deseasonalized data.
If data were taken on a quarterly basis, one would calculate the four
quarterly means, then subtract the mean for the first quarter from each of
the first quarter observations, subtract the mean for the second quarter from
each of the second quarter observations and so on. The resulting differences
can be used subsequently for testing for a long-term trend or a step trend.
In the parametric framework, deseasonalizing can also be accomplished
by using multiple regression with indicator variables for the months. Using
multiple regression, as described in Subsection D below, this method of
seasonal adjustment can be performed at the same time that a regression line
is fitted to the data.
Many other approaches to deseasonalize data exist. If the seasonal
pattern is regular it may be modeled with a sine or cosine function. Moving
averages can be used, or differences (of order 12 for monthly data) can be
used. Time series models may include rather complicated methods for desea-
sonalizing the data. However, the method described above should be adequate
for the water quality data. It has the advantage of being easy to understand
27
-------
and apply, and of providing natural estimates of the monthly effects via the
monthly means.
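The monthly-means method described above can be sketched in a few lines of code. The two years of monthly values below are hypothetical, and the slice data[m::12] simply collects every observation falling in calendar month m.

```python
# Hypothetical two years of monthly observations (Jan-Dec, twice)
data = [5.1, 4.8, 6.0, 7.2, 8.1, 9.0, 9.4, 8.8, 7.5, 6.3, 5.5, 5.0,
        5.5, 4.6, 6.4, 7.0, 8.5, 8.8, 9.8, 8.6, 7.9, 6.1, 5.7, 4.8]

N_MONTHS = 12
# Mean of all observations taken in the same calendar month
monthly_means = [sum(data[m::N_MONTHS]) / len(data[m::N_MONTHS])
                 for m in range(N_MONTHS)]
# Subtract each month's mean from every observation in that month
deseasonalized = [x - monthly_means[i % N_MONTHS] for i, x in enumerate(data)]
```

For quarterly data the same pattern applies with N_MONTHS replaced by 4.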
B. Regression Analysis
A parametric procedure commonly used to test for trends is regression
analysis. As stated earlier, however, water quality data often do not meet
the underlying assumptions such as constant variance and normally distributed
error terms, so regression analysis should be used with great caution. An
analysis of the residuals (i.e., differences between the observed and the
regression-predicted values of the water quality parameter) is therefore
recommended.
With this proviso in mind, the following example will illustrate the
method. Suppose an annual water quality index were available for a 7-year
period.
Year:      1977  1978  1979  1980  1981  1982  1983
Year No.:     1     2     3     4     5     6     7
WQI:         46    52    42    44    39    45    40
We can use the years themselves in the numerical calculations which follow,
but it is much easier and totally equivalent to just number them from 1 to 7
and use the numbers instead. Note that if a year is missing, then the other
years should be numbered accordingly (i.e., the corresponding number will be
missing also).
First, it is often a good idea to plot the data. Figure 6 shows these
sample data as points on a plot of WQI versus year, along with two calculated
points described subsequently.
The calculations with the data to determine the regression line, and to
test it for statistical significance (i.e., whether the slope and intercept
of the calculated line are statistically different from zero), can be done
28
-------
[Plot of WQI (vertical scale 0 to 60) versus year, 1977 to 1983, showing the
seven observed points and the two calculated points (small o's) that define
the fitted regression line.]
Figure 6 - Sample Linear Regression Line
29
-------
manually. However, the formulas are fairly lengthy and the computations are
quite tedious and therefore error-prone. These calculations are almost
always done on a computer or hand calculator. Most scientific hand
calculators, in fact, have the formulas built in, so we only have to enter
the data, pairs of year and WQI, into the calculator and then read out the
answers.
The equation for the regression line is:
Y = a + bX
where X is the year number (or year), Y is the WQI; a is called the intercept
of the line, and b is called the slope. Upon entering all the X and Y values
(the exact entry procedure depends upon the specific hand calculator being
used), we read out the values of a and b.
Using the sample data, we get a = 49 and b = -1.25. Thus, the fitted
regression line is:
Y = 49 - 1.25X .
The regression line can be drawn by plotting two arbitrary points from this
regression equation and connecting them with a straight line. For example,
choosing X = 1,
Y = 49 - 1.25(1) = 47.75
and for X = 6,
Y = 49 - 1.25(6) = 41.5 .
These points are also plotted in Figure 6 (as small o's). The straight line
through these two points is the calculated regression line.
30
-------
The fitted regression equation can be used to predict the value of the
water quality index Y which corresponds to a given value of X. The dif-
ference between the observed value of Y and the predicted value of Y for a
given X is called the residual at this point X. In our example, we have
Year No.: 1 2 3 4 5 6 7
Observed WQI: 46 52 42 44 39 45 40
Predicted WQI: 47.75 46.50 45.25 44.00 42.75 41.50 40.25
Residual: -1.75 5.50 -3.25 0 -3.75 3.50 -0.25
The residual measures the discrepancy between the observed value and the
value obtained from the regression line. If the points were perfectly lined
up, each residual would be zero and we would have a perfect fit. Note that
the residuals always add up to zero (rounding off aside). It is these re-
siduals that should be analyzed to test whether they are normally distrib-
uted. The normal distribution assumption can be graphically checked by
either using probability paper as explained in Section VI-C or by performing
a test of fit using a computer program such as the UNIVARIATE procedure in
SAS.
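The slope, intercept, and residuals reported above come from the standard least-squares formulas. The plain-Python sketch below (an illustration, not the SAS output) reproduces a = 49, b = -1.25, and the residuals for the sample WQI data.

```python
years = [1, 2, 3, 4, 5, 6, 7]
wqi = [46, 52, 42, 44, 39, 45, 40]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(wqi) / n
# Least-squares slope and intercept
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, wqi))
sxx = sum((x - mean_x) ** 2 for x in years)
b = sxy / sxx            # slope: -1.25
a = mean_y - b * mean_x  # intercept: 49
# Residual = observed - predicted; the residuals always sum to zero
residuals = [y - (a + b * x) for x, y in zip(years, wqi)]
```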
In this example, the WQI appears to have a decreasing trend. But, we
must then ask, is the slope of the regression line statistically significant?
In other words, is the decrease in the water quality index, by an amount of
1.25 per year, significantly different from zero? To test for this, compute
the ratio of the slope, b, to its standard deviation and test using a
t-table. This testing procedure is automatically done in SAS as shown in the
appendix. When using a hand calculator, however, we use an equivalent test
via r, the sample correlation coefficient. The Student's t variable with n-2
degrees of freedom is

t = r √(n-2) / √(1 - r²)
31
-------
In the example data, r = -0.619 and n = 7, and we obtain t = -1.762. Using
Table 2 below and reading at the intersection of the row headed by 5 (7-2
degrees of freedom) and the column headed 0.050 (two-sided significance level
of 5%), we read a critical value of 2.571. Since t = -1.762 falls between
-2.571 and +2.571, we conclude that the correlation coefficient, r, and
consequently the slope, b, are not statistically different from zero, or that
the apparent downward trend is not statistically significant.
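This significance computation can be verified with a short sketch (plain Python, illustrative only). Carrying full precision gives t of about -1.765; the 1.762 quoted above corresponds to computing t from r rounded to -0.619.

```python
from math import sqrt

years = [1, 2, 3, 4, 5, 6, 7]
wqi = [46, 52, 42, 44, 39, 45, 40]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(wqi) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, wqi))
sxx = sum((x - mean_x) ** 2 for x in years)
syy = sum((y - mean_y) ** 2 for y in wqi)

r = sxy / sqrt(sxx * syy)               # sample correlation, about -0.619
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # Student's t with n-2 degrees of freedom
```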
More details on regression analysis are available in textbooks by Draper
and Smith (1981), first chapter, and Chatterjee and Price (1977) who also
provide extended discussions on the analysis of residuals.
Regression analysis can be performed with SAS using one of several
procedures with their appropriate options. One SAS procedure would be
PROC REG which features an extensive output (see SAS User's Guide: Statis-
tics, p. 40). An example with output is included in the appendix.
C. Student's T-Test
Student's t-test is probably the most widely used parametric test to
compare two sets of data to determine whether the populations from which they
come have means that differ significantly from each other. This test would
commonly be used to determine if a change has occurred, as reflected in data
obtained before and after some event such as the coming on-line of a sewage
treatment plant. Student's t-test, being a parametric procedure, makes as-
sumptions about the underlying distribution of the population from which the
data are a sample. The basic assumption required to formally develop this
test procedure is that the population be normally distributed; however,
moderate departures from normality will not seriously affect the results.
When using a t-test, one should ensure that this basic assumption is not
violated. Mathematical transformations of the data (e.g., log transform,
exponential transform) can often be helpful in order to arrive at a
normally distributed sample. In comparing two samples, an additional require-
ment besides normality is that the two distributions have equal variances.
32
-------
TABLE 2
TWO-TAILED* SIGNIFICANCE LEVELS OF STUDENT'S T
                 Probability of a Larger Value, Sign Ignored

Degrees of
Freedom    0.500   0.400   0.200   0.100   0.050   0.025   0.010    0.005    0.001

   1       1.000   1.376   3.078   6.314  12.706  25.452  63.657  127.321  636.619
   2       0.816   1.061   1.886   2.920   4.303   6.205   9.925   14.089   31.598
   3       0.765   0.978   1.638   2.353   3.182   4.176   5.841    7.453   12.941
   4       0.741   0.941   1.533   2.132   2.776   3.495   4.604    5.598    8.610
   5       0.727   0.920   1.476   2.015   2.571   3.163   4.032    4.773    6.859
   6       0.718   0.906   1.440   1.943   2.447   2.969   3.707    4.317    5.959
   7       0.711   0.896   1.415   1.895   2.365   2.841   3.499    4.029    5.405
   8       0.706   0.889   1.397   1.860   2.306   2.752   3.355    3.832    5.041
   9       0.703   0.883   1.383   1.833   2.262   2.685   3.250    3.690    4.781
  10       0.700   0.879   1.372   1.812   2.228   2.634   3.169    3.497*   4.587
  11       0.697   0.876   1.363   1.796   2.201   2.593   3.106    3.497    4.437
  12       0.695   0.873   1.356   1.782   2.179   2.560   3.055    3.428    4.318
  13       0.694   0.870   1.350   1.771   2.160   2.533   3.012    3.372    4.221
  14       0.692   0.868   1.345   1.761   2.145   2.510   2.977    3.326    4.140
  15       0.691   0.866   1.341   1.753   2.131   2.490   2.947    3.286    4.073
  16       0.690   0.865   1.337   1.746   2.120   2.473   2.921    3.252    4.015
  17       0.689   0.863   1.333   1.740   2.110   2.458   2.898    3.222    3.965
  18       0.688   0.862   1.330   1.734   2.101   2.445   2.878    3.197    3.922
  19       0.688   0.861   1.328   1.729   2.093   2.433   2.861    3.174    3.883
  20       0.687   0.860   1.325   1.725   2.086   2.423   2.845    3.153    3.850
  21       0.686   0.859   1.323   1.721   2.080   2.414   2.831    3.135    3.819
  22       0.686   0.858   1.321   1.717   2.074   2.406   2.819    3.119    3.792
  23       0.685   0.858   1.319   1.714   2.069   2.398   2.807    3.104    3.767
  24       0.685   0.857   1.318   1.711   2.064   2.391   2.797    3.090    3.745
  25       0.684   0.856   1.316   1.708   2.060   2.385   2.787    3.078    3.725
  26       0.684   0.856   1.315   1.706   2.056   2.379   2.779    3.067    3.707
  27       0.684   0.855   1.314   1.703   2.052   2.373   2.771    3.056    3.690
  28       0.683   0.855   1.313   1.701   2.048   2.368   2.763    3.047    3.674
  29       0.683   0.854   1.311   1.699   2.045   2.364   2.756    3.038    3.659
  30       0.683   0.854   1.310   1.697   2.042   2.360   2.750    3.030    3.646
  35       0.682   0.852   1.306   1.690   2.030   2.342   2.724    2.996    3.591
  40       0.681   0.851   1.303   1.684   2.021   2.329   2.704    2.971    3.551
  45       0.680   0.850   1.301   1.680   2.014   2.319   2.690    2.952    3.520
  50       0.680   0.849   1.299   1.676   2.008   2.310   2.678    2.937    3.496
  55       0.679   0.849   1.297   1.673   2.004   2.304   2.669    2.925    3.476
  60       0.679   0.848   1.296   1.671   2.000   2.299   2.660    2.915    3.460
  70       0.678   0.847   1.294   1.667   1.994   2.290   2.648    2.899    3.435
  80       0.678   0.847   1.293   1.665   1.989   2.284   2.638    2.887    3.416
  90       0.678   0.846   1.291   1.662   1.986   2.279   2.631    2.878    3.402
 100       0.677   0.846   1.290   1.661   1.982   2.276   2.625    2.871    3.390
 120       0.677   0.845   1.289   1.658   1.980   2.270   2.617    2.860    3.373
  ∞        0.6745  0.8416  1.2816  1.6448  1.9600  2.2414  2.5758   2.8070   3.2905

*For a one-tailed significance level a, read in the column headed 2a.
df = n-1
Source: Snedecor and Cochran (1980) p. 469
33
-------
This means that the distributions of the two populations are identical in
shape, although they may differ in location (in other words, only their means
may differ).
Procedure
The following is a series of 18 concentration measurements for total
chromium, 8 before and 10 after implementation of a pollution control
measure.
     "Before"            "After"
  Concentrations      Concentrations
        99                  59
       111                  99
        74                  82
       123                  51
        71                  48
        75                  39
        59                  42
        85                  42
                            47
                            50
First it is necessary to calculate the mean and standard deviation of
each of the data sets. The mean (or average) is just the sum of all the
values divided by the number of values, n. Thus, for the "before" data, the
mean, mB, is:

mB = (99 + 111 + ... + 85)/8 = 87.1 µg/L

The standard deviation involves using the sum of the squares of each of the
values. Specifically:

SB = √{ [nB(99² + 111² + ... + 85²) - (99 + 111 + ... + 85)²] / [nB(nB - 1)] }
34
-------
where nB is the number of "before" data points (8). Carrying out all the cal-
culations gives:

SB = √481.8 = 22.0

Thus, we can compute:

"Before":  mB = 87.1 µg/L;  SB = 22.0 µg/L;  SB² = 481.8;  nB = 8
"After":   mA = 55.9 µg/L;  SA = 19.5 µg/L;  SA² = 380.1;  nA = 10.
The t-test for a step trend in this data set is simply a method of estimating
whether the means mB and mA of the two partitions ("before" and "after") of
the data set differ significantly at a chosen level of significance a. The
test statistic t is:

t = (mB - mA) / [Sp √(1/nB + 1/nA)]

where Sp, the pooled standard deviation, is computed as:

Sp = √{ [(nB - 1)SB² + (nA - 1)SA²] / (nB + nA - 2) }

assuming SB² and SA² are not statistically different. To test whether the "be-
fore" concentrations are on the average higher than the "after" concentrations
(i.e., a one-sided test) at the a-level of significance, we compare the computed
t with tabulated values of the Student's t distribution at probability level
(1-a) with (nB + nA - 2) degrees of freedom.
In the example,

Sp² = [7(481.8) + 9(380.1)] / 16 = 424.6,  so  Sp = 20.6
35
-------
thus:

t = (87.1 - 55.9) / [20.6 √(1/8 + 1/10)] = 3.18
The critical t at a = 0.05 (5%) with (8 + 10 - 2) = 16 degrees of freedom for
a one-sided test is t = 1.746 (see Table 2 above). Since the computed t of
3.18 is larger, one concludes that, indeed, the "before" concentrations were
higher on the average than the "after" concentrations, or that there is a
significant decreasing step trend in the data at the 95% confidence level
when the new treatment facility went into operation.
In the above example, it was assumed (without proof) that the variances,
SB² and SA², were not statistically different. This assumption can (and
should) be checked by using an F-test. Calculate:

F = SB²/SA² = 481.8/380.1 = 1.27

Note that due to tabulation restrictions, F is always computed as the larger
variance over the smaller. That is, if SA² were greater than SB², then
F = SA²/SB².
If the population variances are the same, one would expect F = 1. To
test whether the calculated F is statistically greater than 1, an F table is
used. Using Table 3 below, which is applicable at the a = 0.05 level of sig-
nificance, look up the critical F value in the column headed by 7 (= nB - 1)
and the row labeled 9 (= nA - 1). The critical F value in this example is 3.29.
If the computed F were greater than 3.29, then we would say with 95% confi-
dence that the two variances were different, and so the Student's t-test
could not be used. In our case, F is less than the critical value; we there-
fore cannot confidently say that the variances are different, and so we accept
the assumption of equal variances in the "before" and "after" groups.
In the case of unequal variances, when the Student's t-test cannot be
used, only an approximate t-test such as the Behrens-Fisher test (Snedecor
and Cochran (1980) p. 97) can be used to compare the two means.
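The whole "before/after" computation (the two means, the pooled standard deviation, the t statistic, and the F ratio for the equal-variance check) can be sketched as follows. This is an illustrative recomputation of the worked example in plain Python, not a substitute for the tabulated critical values.

```python
from math import sqrt

before = [99, 111, 74, 123, 71, 75, 59, 85]
after = [59, 99, 82, 51, 48, 39, 42, 42, 47, 50]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Sum of squared deviations divided by n - 1
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n_b, n_a = len(before), len(after)
m_b, m_a = mean(before), mean(after)
v_b, v_a = sample_variance(before), sample_variance(after)

# F ratio: larger sample variance over the smaller (about 1.27)
F = max(v_b, v_a) / min(v_b, v_a)

# Pooled standard deviation (about 20.6) and two-sample t statistic (about 3.2)
s_p = sqrt(((n_b - 1) * v_b + (n_a - 1) * v_a) / (n_b + n_a - 2))
t = (m_b - m_a) / (s_p * sqrt(1 / n_b + 1 / n_a))
```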
36
-------
TABLE 3
THE 5% (ROMAN TYPE) AND 1% (BOLDFACE TYPE) POINTS FOR THE DISTRIBUTION OF F
n1: degrees of freedom in numerator of F
n2: degrees of freedom in denominator of F
For each (n2, n1) cell, the 5% point is given on the first line (roman type)
and the 1% point on the second line (boldface in the original).

      n1:    1      2      3      4      5      6      7      8      9     10     11     12
n2
 1  5%:    161    200    216    225    230    234    237    239    241    242    243    244
    1%:   4052   4999   5403   5625   5764   5859   5928   5981   6022   6056   6082   6106
 2  5%:  18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38  19.39  19.40  19.41
    1%:  98.49  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.41  99.42
 3  5%:  10.13   9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81   8.78   8.76   8.74
    1%:  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.34  27.23  27.13  27.05
 4  5%:   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.93   5.91
    1%:  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.54  14.45  14.37
 5  5%:   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78   4.74   4.70   4.68
    1%:  16.26  13.27  12.06  11.39  10.97  10.67  10.45  10.29  10.15  10.05   9.96   9.89
 6  5%:   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.03   4.00
    1%:  13.74  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79   7.72
 7  5%:   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.63   3.60   3.57
    1%:  12.25   9.55   8.45   7.85   7.46   7.19   7.00   6.84   6.71   6.62   6.54   6.47
 8  5%:   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.34   3.31   3.28
    1%:  11.26   8.65   7.59   7.01   6.63   6.37   6.19   6.03   5.91   5.82   5.74   5.67
 9  5%:   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.13   3.10   3.07
    1%:  10.56   8.02   6.99   6.42   6.06   5.80   5.62   5.47   5.35   5.26   5.18   5.11
10  5%:   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.97   2.94   2.91
    1%:  10.04   7.56   6.55   5.99   5.64   5.39   5.21   5.06   4.95   4.85   4.78   4.71
11  5%:   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.86   2.82   2.79
    1%:   9.65   7.20   6.22   5.67   5.32   5.07   4.88   4.74   4.63   4.54   4.46   4.40
12  5%:   4.75   3.88   3.49   3.26   3.11   3.00   2.92   2.85   2.80   2.76   2.72   2.69
    1%:   9.33   6.93   5.95   5.41   5.06   4.82   4.65   4.50   4.39   4.30   4.22   4.16
13  5%:   4.67   3.80   3.41   3.18   3.02   2.92   2.84   2.77   2.72   2.67   2.63   2.60
    1%:   9.07   6.70   5.74   5.20   4.86   4.62   4.44   4.30   4.19   4.10   4.02   3.96

      n1:   14     16     20     24     30     40     50     75    100    200    500      ∞
n2
 1  5%:    245    246    248    249    250    251    252    253    253    254    254    254
    1%:   6142   6169   6208   6234   6261   6286   6302   6323   6334   6352   6361   6366
 2  5%:  19.42  19.43  19.44  19.45  19.46  19.47  19.47  19.48  19.49  19.49  19.50  19.50
    1%:  99.43  99.44  99.45  99.46  99.47  99.48  99.48  99.49  99.49  99.49  99.50  99.50
 3  5%:   8.71   8.69   8.66   8.64   8.62   8.60   8.58   8.57   8.56   8.54   8.54   8.53
    1%:  26.92  26.83  26.69  26.60  26.50  26.41  26.35  26.27  26.23  26.18  26.14  26.12
 4  5%:   5.87   5.84   5.80   5.77   5.74   5.71   5.70   5.68   5.66   5.65   5.64   5.63
    1%:  14.24  14.15  14.02  13.93  13.83  13.74  13.69  13.61  13.57  13.52  13.48  13.46
 5  5%:   4.64   4.60   4.56   4.53   4.50   4.46   4.44   4.42   4.40   4.38   4.37   4.36
    1%:   9.77   9.68   9.55   9.47   9.38   9.29   9.24   9.17   9.13   9.07   9.04   9.02
 6  5%:   3.96   3.92   3.87   3.84   3.81   3.77   3.75   3.72   3.71   3.69   3.68   3.67
    1%:   7.60   7.52   7.39   7.31   7.23   7.14   7.09   7.02   6.99   6.94   6.90   6.88
 7  5%:   3.52   3.49   3.44   3.41   3.38   3.34   3.32   3.29   3.28   3.25   3.24   3.23
    1%:   6.35   6.27   6.15   6.07   5.98   5.90   5.85   5.78   5.75   5.70   5.67   5.65
 8  5%:   3.23   3.20   3.15   3.12   3.08   3.05   3.03   3.00   2.98   2.96   2.94   2.93
    1%:   5.56   5.48   5.36   5.28   5.20   5.11   5.06   5.00   4.96   4.91   4.88   4.86
 9  5%:   3.02   2.98   2.93   2.90   2.86   2.82   2.80   2.77   2.76   2.73   2.72   2.71
    1%:   5.00   4.92   4.80   4.73   4.64   4.56   4.51   4.45   4.41   4.36   4.33   4.31
10  5%:   2.86   2.82   2.77   2.74   2.70   2.67   2.64   2.61   2.59   2.56   2.55   2.54
    1%:   4.60   4.52   4.41   4.33   4.25   4.17   4.12   4.05   4.01   3.96   3.93   3.91
11  5%:   2.74   2.70   2.65   2.61   2.57   2.53   2.50   2.47   2.45   2.42   2.41   2.40
    1%:   4.29   4.21   4.10   4.02   3.94   3.86   3.80   3.74   3.70   3.66   3.62   3.60
12  5%:   2.64   2.60   2.54   2.50   2.46   2.42   2.40   2.36   2.35   2.32   2.31   2.30
    1%:   4.05   3.98   3.86   3.78   3.70   3.61   3.56   3.49   3.46   3.41   3.38   3.36
13  5%:   2.55   2.51   2.46   2.42   2.38   2.34   2.32   2.28   2.26   2.24   2.22   2.21
    1%:   3.85   3.78   3.67   3.59   3.51   3.42   3.37   3.30   3.27   3.21   3.18   3.16
Source: Snedecor and Cochran (1980) p. 480
-------
A final note: many hand calculators will automatically compute means
and standard deviations, making the above calculations quite easy. Also, the
t-test can be performed using SAS. The procedure PROC TTEST may be used (see
SAS User's Guide: Statistics, p. 217). An example output is shown in the
appendix.
D. Trend and Change
Over a long period of time, a situation could arise where both a
long-term trend and a step trend due to the implementation of a new treatment
facility might be present in the series of data. In such a case, the two
methods described above (regression analysis and t-test procedure), would be
combined to test for both types of trends. Practically, this is done by per-
forming a multiple regression analysis where the dependent variable is the
water quality parameter of interest, one independent variable is the time
(e.g., month) and the second independent variable is a "dummy" variable
taking the value 0 before the known change and 1 after the change. In
addition, if a seasonal effect is suspected (this might be detected when
plotting the data), then the data need to be deseasonalized by introducing
12 indicator variables. The first one would take the value 1 for January and
0 otherwise; the second would be 1 for February and 0 otherwise, etc. The
regression equation can be written as:
    Y = a1X1 + a2X2 + . . . + a12X12 + bC + cT

where ai is the effect of month i, i = 1, . . ., 12,
      b is the change effect, and
      c is the slope for the long-term trend.

Algebraically, the above equation is equivalent to

    Y - a1X1 - a2X2 - . . . - a12X12 = bC + cT.
The left-hand side of this equation is just the deseasonalized data, while
the right-hand side is the contribution due to change and trend. The next
step is to estimate all 14 parameters (a1, . . ., a12, b, and c) via the
least squares method and to test whether b and c (the coefficients for change
and trend, respectively) are significantly different from zero.
A complete example, based on the data plotted in Figure 1 of Sec-
tion III, is shown in the appendix using the multiple regression procedure of
SAS.
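For readers working outside SAS, the combined model can be sketched in a few lines of Python. The data below are synthetic, and the variable layout (12 monthly indicators, a 0/1 change dummy, and a time index) simply mirrors the regression equation above; it is an illustration, not the report's SAS procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 48
t = np.arange(n_months)                 # time index T (months)
month = t % 12                          # calendar month, 0..11
C = (t >= 24).astype(float)             # dummy variable: 0 before change, 1 after

# Synthetic series: seasonal cycle, slope c = 0.05, step b = -1.5, plus noise
y = (10 + 2 * np.sin(2 * np.pi * month / 12) + 0.05 * t - 1.5 * C
     + rng.normal(0, 0.3, n_months))

# Design matrix: columns a1..a12 (monthly indicators), then C, then T
X = np.zeros((n_months, 14))
X[np.arange(n_months), month] = 1.0     # monthly effects absorb the intercept
X[:, 12] = C                            # b, the change effect
X[:, 13] = t                            # c, the long-term slope

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b, c = coef[12], coef[13]
print(round(b, 2), round(c, 3))         # estimates should fall near -1.5 and 0.05
```

The significance of b and c would then be judged from their standard errors, as the SAS output in the appendix does.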
E. Time Series Analysis
Time series analysis is a set of parametric procedures that can be
applied to a series of ordered observations of a quantitative measure, such
as a water quality indicator, taken at regular points in time. Although not
essential, it is common for the points to be equally spaced in time, for
example, monthly. The objective in time series analysis is to determine from
the set of data the pattern of change over time (e.g., time trends,
seasonality, cyclical variations, etc.). The various measured patterns are
extracted one by one until the remaining variation in the data is purely
random. When this has been done, all the meaningful information contained in
the original data has been "captured," and the random component that remains
is, by definition, worthless for forecasting. In general, the various trends
and patterns can then be used to forecast probable future behavior of the
series, and confidence intervals can be computed for future projections.
Various analytical methods have been developed to decompose a time
series into trend, seasonal, change, and irregular components. These methods
are fairly complex, nearly always require special computer programs, and thus
are beyond the scope of these guidelines. The interested reader is referred
to standard textbooks on time series such as Box and Jenkins (1970), Kendall
and Stuart (1966), and Glass et al. (1975) or to the publications by Box and
Tiao (1975) and Schlicht (1981).
Time series analysis is not suitable for use with many water quality
data bases because of missing data, values reported as below detection limits,
and changing laboratory techniques (see van Belle and Hughes (1982)). Also,
time series methods generally require several years of monthly observations.
For these reasons, the methods suggested here are generally more practical
for analyzing water quality data.
V. DISTRIBUTION-FREE METHODS
A. Runs Tests for Randomness
Given a series of measurements of a water quality parameter or indicator
derived therefrom, the question might be asked if the data vary in a random
manner or if they indicate a long-term trend or a seasonal or other periodic
fluctuation. An easy test to detect nonrandom components is the runs test
illustrated in the following simple example. For a given constituent, an
average monthly water quality index (WQI) has been computed and categorized
as either bad (B) or good (G). The figures are:

    Month:  1  2  3  4  5  6  7  8  9  10  11  12
    WQI:    G  G  G / B  B  B  B  B / G  G   G   G
In statistical terms, the above question can be formulated as a null
hypothesis: the G's and B's occur in random order; versus an alternative:
the order of the G's and B's deviates from randomness.
Each cluster of like observations is called a run. Thus, there are
3 runs in the series of data above. Let n1 = 5 be the number of B's, and
n2 = 7 the number of G's (n1 denotes the smaller of the two numbers). If
there are too many or too few runs, then the assumption that the sample is
random is probably not correct. If very few runs occur, a time trend or some
bunching due to lack of independence is suggested. If a great many runs
occur, systematic short-period cyclical fluctuations seem to be influencing
the WQI data. Tables showing probability levels for the runs test have been
developed for n1 and n2 up to 20; they can be found in Langley (1971). An
example showing 5% probability levels is included in Table 4. For n1 = 5 and n2 = 7
(at arrow) we use the table as follows: reject the null hypothesis that the
sample is a random ordering of G's and B's if the observed number of runs is
equal to or less than the smaller number in the table (i.e., 3), or is equal to
or greater than the larger number in the table (i.e., 11). Since we have a
series with three runs, we conclude that the fluctuations are not random;
TABLE 4
TABLES SHOWING 5% LEVELS FOR RUNS TEST

[For each pair (n1, n2), n1 <= n2 <= 20, the table gives the lower and
upper 5% critical numbers of runs; for n1 = 5 and n2 = 7, for example,
the critical values are 3 and 11. A dash (-) means that a 5% probability
level cannot be reached. Full table not reproduced here.]

Source: Langley (1971) p. 325
that is, there is only a small probability of obtaining three or fewer runs
(less than once in 20 times) if the sample was actually random. Should the
observed number of runs fall between the two values given in the table, then
we can accept the sample as being random. If we come across a dash (-) in
the table, it means that a 5% probability level cannot be reached in the
particular circumstances, regardless of the number of runs.
If the runs test is used for data sets where n2 is greater than 20, then
rather than using these special types of tables, a different approach is used
in which a Z-statistic is computed. The formula is:

    Z = |r - (2n1n2/N + 1)| / sqrt[2n1n2(2n1n2 - N) / (N^2(N - 1))]

where N is simply (n1 + n2), and r is the number of runs (3 in the above
example). The notation | | means use the absolute value of what is within;
that is, make the resulting number positive. For example, if n1 = 6,
n2 = 24, and there are 10 runs, then the numerator is:

    |10 - (2(6)(24)/30 + 1)| = |10 - 10.6| = 0.6

The denominator equals sqrt[(288)(258)/((900)(29))] = sqrt(2.85) = 1.69.
Thus, Z = 0.6/1.69 = 0.36.

Reference to Table 1 (page 19) shows that to reject the hypothesis at a
two-sided significance level of 0.05 requires a Z-value of 1.96 or greater.
Since we obtained Z = 0.36, which is less than the critical value of 1.96, we
cannot reject the null hypothesis of the sample being random.
Note: It should be emphasized that this latter approach, which involves
computing a Z-statistic and comparing it with the tabulated standard normal
deviates in Table 1, is only to be used with larger data sets, where n2 ex-
ceeds 20.
The above example was based on two kinds of observations (good, bad)
only. However, the runs test applies to any kind of data that can be cate-
gorized into two groups. If the data are quantitative measurements, first
find the median. Then assign a plus sign whenever the observation is above
the median, and a minus sign whenever the observation is below the median.
Then proceed as above, using the plus/minus categories in the same way as the
good/bad categories were used.
If a trend is the alternative to randomness that is of particular
importance, then a runs test appropriate to this alternative can be con-
structed as follows. Count as a plus each observation that exceeds the
preceding one; count as a minus each observation that is less than the
preceding one. Then use these plus/minus categories as before. By changing
the definition of the two categories one can arrive at different runs tests
for different alternatives to randomness.
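The counting and the large-sample Z-statistic can be sketched as follows. The function name is illustrative, and the sequence is constructed to match the n1 = 6, n2 = 24, 10-run example above.

```python
import math

def runs_test_z(labels):
    """Runs test for a sequence of two category labels.
    Returns the observed number of runs r and the large-sample Z-statistic."""
    cats = sorted(set(labels))
    n1, n2 = labels.count(cats[0]), labels.count(cats[1])
    N = n1 + n2
    # A new run starts wherever the label changes
    r = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    mean = 2 * n1 * n2 / N + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - N) / (N ** 2 * (N - 1))
    return r, abs(r - mean) / math.sqrt(var)

# 6 B's and 24 G's arranged in 10 runs, as in the example above
seq = list("GGGGG" "B" "GGGGG" "B" "GGGGG" "B" "GGGGG" "B" "GGGG" "BB")
r, z = runs_test_z(seq)
print(r, round(z, 2))   # 10 0.36
```

Since Z is well below 1.96, randomness is not rejected, in agreement with the text.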
Many types of data, however, have more than two categories. For exam-
ple, a water quality index could be partitioned into "very good," "good,"
"fair," "poor," and "very poor" categories. The above mentioned runs test
for two categories has been generalized to cases with any number of cate-
gories. The appropriate test statistics are derived Z-statistics and can
therefore be tested by using the tabulated standard normal deviates. Details
are given in Wallis and Roberts (1956), Chapter 18.
The runs test can detect a wide variety of departures from randomness.
However, if a specific type of departure is important, a test designed to
detect that type of departure will be more likely to detect it. With water
quality data taken over time there are three particular types of departures
from randomness that are of interest. One is a seasonal effect related to
different amounts of precipitation or weather changes over the year. A more
interesting effect is a long-term trend in which the water quality level is
gradually improving (or worsening) over time. Finally, a third is a sudden
change in water quality associated with a discrete event—opening of a new
factory or installation of a new treatment plant. Of these three,
seasonality is generally a nuisance. That is, one is not specifically
interested in it but rather must adjust for it before the other effects
(long-term trend or step trend) can be tested for. Specific tests for these
types of trend are presented in the following subsections.
B. Kendall's Tau Test
Kendall's tau is a rank correlation coefficient. A distribution-free
test based on Kendall's statistic is commonly used to compare one set of
numerical data with another to see if they tend to "track" together. In
water quality work, a series of readings of some parameter can be tested
against time (e.g., the series of months over which the parameter was ob-
tained) to see if it has any generally increasing or decreasing tendency.
This tendency does not need to be linear (i.e., straight line). To illus-
trate the method, assume one has the following 12 monthly, average water
quality indices (WQIs) at a given station:
    Month:  1   2   3   4   5   6   7   8   9   10  11  12
    WQI:    21  3   5   8   21  48  37  39  26  16  35   7
Statistically speaking, one may wish to test the null hypothesis that there
is no trend in the data, i.e., months and WQI values are unrelated, against
the alternative that there is a trend in the data, i.e., the two variables
are related.
The first step is to rank the months and the WQIs in order from lowest
to highest. Since the months are already in order, it is only necessary to
rank the WQIs. The smallest is 3 (in month 2), so it is ranked number 1.
Similarly, the second WQI in rank is 5, then 7, etc. Note that
the value 21 appears twice, so the WQIs for months 1 and 5 are "tied." This
is a common situation with this type of data, and is resolved by averaging
the ranks. In this case, the two WQIs in question share ranks 6 and 7, so
each is given the average value, (6+7)/2 = 6.5. The final rankings are tabu-
lated below.
    Month:  1    2   3   4   5    6   7   8   9   10  11  12
    WQI:    21   3   5   8   21   48  37  39  26  16  35   7
    Rank:   6.5  1   2   4   6.5  12  10  11   8   5   9   3
    k+:     --   0   1   2   3    5   5   6   5   3   7   2
    k-:     --   1   1   1   0    0   1   1   3   6   3   9
The test involves determining the extent to which the set of WQI values
is ordered in the same way as the months. The following explains the
procedure.
Take each WQI rank and count how many of the ranks to the left of it are
smaller; this gives the k+ line. Then sum up the 11 k+ values; this yields
K+, the number of concordant pairs (i.e., pairs ordered in the same way as
the months). Then repeat, but count how many of the values to the left of
each WQI rank are greater; this gives 11 k- values. The sum of these, K-, is
the number of discordant pairs. The tie at (21,21) is disregarded in the
counts of K+ and K-. Kendall's tau is then computed as:

    tau = [(K+) - (K-)] / [n(n-1)/2]
If there were no ties and concordant pairs only, then K+ = n(n-l)/2, K- = 0,
and tau = 1; if there were discordant pairs only and no ties, then K+ = 0,
K- = n(n-l)/2, and tau = -1. If K+ = K-, then tau = 0.
In our example,

    K+ = (0 + 1 + 2 + . . . + 7 + 2) = 39,
    K- = (1 + 1 + . . . + 3 + 9) = 26, and
    tau = (39 - 26) / 66 = 0.197.
For small sample sizes, n, the significance of tau is tested by means of
tabulations of values of Kendall's K = (K+)-(K-), rather than of tau itself.
There is a series of tables for sample sizes ranging from n = 4 to 40
(Hollander and Wolfe, 1973). Table 5 shows a sample of these tabulations.
For larger values of n a normal approximation, discussed subsequently, is
used.
Our example yields a K of (39-26) = 13. In Table 5, under Column n = 12
(number of observations), we read 0.230 at x = 12 and 0.190 at x = 14. Thus,
the one-tailed probability associated with K = 13 is about 0.21 (midway be-
tween 0.230 and 0.190); the two-tailed significance level associated with
K = 13 is thus 2 • 0.21 = 0.42. If we had initially chosen a desired level
of significance of 5% (0.05), we would conclude that there is no significant
trend in the data at this level since 0.42 is not small enough (it is greater
than 0.05).
We could also use Table 5 to determine the 5% critical value for n = 12.
In Column n = 12 we read down to find 0.022 (closest probability to
0.05/2 = 0.025), then read across to x = 30. The probability of 0.031 in the
same column yields x = 28. The critical K associated with an approximate
two-sided 5% significant level is therefore between 28 and 30, so one could
use 29, (or -29 if K were negative). Since K of 13 lies between -29 and +29,
we come to the same conclusion as above, i.e., since K is not big enough, nor
small enough, there is no significant trend. Note that if there are no ties,
K can take on only even values. This is why there is no entry for K=29 in
the table.
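The counting procedure is easy to mechanize; a minimal sketch using the 12 WQI values above (the function name is illustrative):

```python
def kendall_counts(values):
    """Count concordant (K+) and discordant (K-) pairs against time order.
    Tied pairs contribute to neither count."""
    kp = kn = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[j] > values[i]:
                kp += 1        # later value larger: concordant with time
            elif values[j] < values[i]:
                kn += 1        # later value smaller: discordant with time
    return kp, kn

wqi = [21, 3, 5, 8, 21, 48, 37, 39, 26, 16, 35, 7]
kp, kn = kendall_counts(wqi)
K = kp - kn
tau = K / (len(wqi) * (len(wqi) - 1) / 2)
print(kp, kn, K, round(tau, 3))   # 39 26 13 0.197
```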
TABLE 5
UPPER TAIL PROBABILITIES FOR THE NULL DISTRIBUTION OF KENDALL'S
K STATISTIC (Subtable)

[Columns for sample sizes n = 4, 5, 8, 9, 12, 13, 16, 17, and 20 give
P(K >= x) for even values of x. For n = 12, for example, P(K >= 12) = 0.230,
P(K >= 14) = 0.190, P(K >= 28) = 0.031, and P(K >= 30) = 0.022.
Full table not reproduced here.]

Source: Hollander and Wolfe (1973) p. 384-393
For large sample sizes, a normal distribution approximation to the
distributions of K or tau may be considered. Under the null hypothesis, the
expected value of tau is 0 and the variance of tau is 2(2n+5)/9n(n-l). Thus,
the ratio:

    Z = tau / sqrt[2(2n + 5) / (9n(n - 1))]

is approximately standard normally distributed, and Table 1 (page 19) is used
for testing.
Our example, based on a tau of 0.197, would yield a value

    Z = 0.197 / sqrt[2(2(12) + 5) / (9(12)(11))] = 0.197 / 0.221 = 0.89
In Table 1 we read a one-tailed probability of 0.1867 at Z = 0.89 from
which a two-tailed probability of twice 0.1867, or approximately 0.37, follows.
Again, the computed value of Z of 0.89 is not significant at the 5% level and
the same conclusion is reached. Note that from Table 5 we obtained the exact
probability of 0.42; the large sample size approximation yielded a probability
of 0.37. The discrepancy between the two probabilities arises because the
large sample size approximation was used for a sample size of only 12.
In case of ties, the value of tau as defined above is not affected.
However, the variance of tau when using the large sample size approximation
needs a correction for ties. The correction is lengthy in form and has
little effect when only a few ties are present (see Hollander and Wolfe,
1973, p. 187).
This test can be performed using the SAS procedure, PROC CORR, with the
KENDALL option (SAS User's Guide: Basics, p. 501). An example output with
the corresponding SAS statements is presented in the appendix.
Once a trend is found to be significant, the next step is to estimate
its magnitude. A common measure of the magnitude of the trend is the slope
of a straight line fitted to the data. To find the distribution-free
estimate of the slope, calculate the slopes for all possible pairs of
observations. That is, calculate

    bij = (Xj - Xi) / (j - i)

for i = 1, 2, . . ., n-1 and j = i+1, i+2, . . ., n. There are N = n(n-1)/2
such distinct pairs. For the example of WQI values used here, N = 12(11)/2 = 66.
Then the N slopes are arranged in ascending order, and the middle value
(the median) is the best estimate of the slope. In this case, the ordered
series of slopes becomes -28, -18, -13, .... 19, 20, 27. The median is the
average of the 33rd and 34th values in the series, or 1.49. Thus, we would
conclude that the WQI is, on the average, increasing by 1.49 units per month.
However, it must be remembered that we found no significant trend, so that
the value 1.49 is not significantly different from zero. Indeed, more
advanced procedures also allow one to calculate confidence bounds from these
ordered values; in this case the approximate 95% confidence interval for the
slope is (-4.33, +3.85), a wide interval that includes both negative and
positive possibilities. The interested reader may find further details in
Hollander and Wolfe (1973). SAS does not include routines for estimating the
magnitude and confidence bounds of a trend using Kendall's tau.
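Although SAS does not provide it, the point estimate itself takes only a few lines; a sketch using the same 12 WQI values:

```python
from statistics import median

wqi = [21, 3, 5, 8, 21, 48, 37, 39, 26, 16, 35, 7]
n = len(wqi)
# Slope between every distinct pair of observations (months i < j)
slopes = [(wqi[j] - wqi[i]) / (j - i)
          for i in range(n) for j in range(i + 1, n)]
print(len(slopes))               # 66 distinct pairs
print(round(median(slopes), 2))  # 1.49 units per month
```

`statistics.median` averages the 33rd and 34th ordered slopes here, reproducing the 1.49 reported above.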
C. The Wilcoxon Rank Sum Test (Step Trend)
This is a distribution-free test most appropriate for testing for a
so-called step trend. A step trend might be evident when some major event
such as placing a new treatment facility into operation occurred during the
data collection period.
Procedure
The general procedure involves comparing the rankings of one set of
values (e.g., readings obtained before a major event) with the rankings of
another set (e.g., readings obtained after a major event). There need not be
an equal number of values in each set. Because of the way the tables for W,
the Wilcoxon test statistic, are arranged, the set with the fewer number of
values (sample size n) is always compared to the larger set (sample size
m), and not vice versa. Thus n is always the smaller of the two sample
sizes.
Consider again the example used to demonstrate Student's t test, namely
the series of 18 concentration measurements for total chromium, 8 before
(denoted by Xi) and 10 after (denoted by Yj) implementation of a pollution
control measure.
"Before"
Concentrations
(ug/L)      Ranks
  99        15.5
  11        17
  74        11
  23        18
  71        10
  75        12
  59         8.5
  85        14

"After"
Concentrations
(ug/L)      Ranks
  59         8.5
  99        15.5
  82        13
  51         7
  48         5
  39         1
  42         2.5
  42         2.5
  47         4
  50         6
Let us first consider the following test situation: because we do not
expect (or are not interested in) a worsening of the water quality due to the
new, improved facility, we will test the null hypothesis that the "before"
and "after" concentrations are equally high or low (i.e., there is no change)
against the one-sided alternative that the "before" concentrations are higher
than the "after" concentrations (i.e., there is an improvement in the water
quality).
The first step, as with many other distribution-free procedures, is to
determine the ranks of the concentration values. The observations are
ordered as a single set of data (i.e., disregard "before" and "after"); if
ties are present, use average ranks. Wilcoxon's test statistic is simply the
sum of the ranks of the values in the group of smaller size (here the before
group with n = 8):
    W = (15.5 + 17 + . . . + 14) = 106.

For a one-sided test at the a = 5% level of significance, we reject the null
hypothesis if W is greater than or equal to the critical value, Wc, associated
with m, n, and a, and we accept the null hypothesis otherwise. Table 6
gives the one-sided levels of significance for n = 8 and several values of m.
Note that we might not always find the exact a that we have chosen, because
the significance levels are discrete and so are only tabulated for integer
values of m and n. In our example, for a = 0.051 (approximately 5%) we read
that x = 95 (= Wc) for n = 8 and m = 10 (see arrows). Since W of 106 is
greater than 95, we reject the null hypothesis of no change and conclude that
the concentrations have significantly decreased after implementation of the
improvement measure.
Also, Table 6 shows for n = 8 and m = 10 the probability of obtaining a
value of W greater than or equal to 106 to be 0.003. This means that the
probability of obtaining a W of 106 or more under the null hypothesis is
0.003, which is small enough to reject the hypothesis in favor of the
alternative.
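The ranking and the statistic W are easily computed by machine as well. A sketch with a small hypothetical data set (not the chromium data) shows the mechanics, including average ranks for ties; the function name is illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

before = [8.1, 7.4, 9.0]        # hypothetical, n = 3 (smaller group)
after = [6.2, 7.4, 5.9, 6.5]    # hypothetical, m = 4
ranks = average_ranks(before + after)
W = sum(ranks[:len(before)])    # sum of ranks of the smaller group
print(W)                        # 17.5
```

W would then be compared with the tabulated critical value for n = 3 and m = 4.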
Next let us consider another testing situation. Suppose a new indus-
trial discharge began, and the same "before" and "after" data as above were
obtained. Here, we will test the null hypothesis of no change against the
alternative that the water quality has degraded, i.e., the "before" concen-
trations are lower on the average than the "after" concentrations. (For
these data, we already know this is not true. However, we will go through
the analysis process anyway to illustrate the procedure.)
TABLE 6
UPPER TAIL PROBABILITIES FOR
WILCOXON'S RANK SUM W STATISTIC
(Subtable, n = 8)

[Columns for m = 8, m = 9, and m = 10 give P(W >= x). For n = 8 and
m = 10, for example, P(W >= 78) = 0.448, P(W >= 95) = 0.051,
P(W >= 98) = 0.027, P(W >= 99) = 0.022, and P(W >= 106) = 0.003.
Full table not reproduced here.]

Source: Hollander and Wolfe (1973) p. 272-282
The computation of W is unchanged (i.e., W = 106). However, we will
reject the null hypothesis at the a level of significance whenever W is
less than or equal to the critical value n(m+n+1) - Wc, where Wc is defined
as above, and accept the null hypothesis otherwise. Thus, with an a of
0.051, m = 10, and n = 8, n(m+n+1) - Wc = (8)(19) - 95 = 57, and we cannot
reject the hypothesis in favor of a degradation since 106 is not less than
the critical value of 57.
For a two-sided test of the null hypothesis of "no change" against the
alternative of "a change" at the 5% level of significance, we compare the
above W of 106 with the following two critical values: n(m+n+1) - W(a1, m, n)
and W(a2, m, n), where a1 + a2 = a. Most often, we cannot perform the test
at the exact a level, so we have to choose a1 and a2 from the table as close
to a/2 as possible. In our example, W(0.027, 10, 8) = 98 and W(0.022,
10, 8) = 99. Thus for a = 0.022 + 0.027 = 0.049, the two critical values
would be either (8)(19) - 98 = 54 and 99, or (8)(19) - 99 = 53 and 98.
Since the computed W of 106 lies outside the interval 54-99 or 53-98, we
reject the null hypothesis in favor of the alternative of a significant
change.
The upper tail probabilities associated with smaller x-values (see
Table 6) are not tabulated and are generally not of interest because the
corresponding a-values are greater than 0.5. However, they can be calculated.
If the probability P(W >= x) is not tabulated because it is greater than
0.500, then calculate:

    P(W >= x) = 1 - P[W >= (n(m+n+1) - x + 1)]

Example: n = 8, m = 10, x = 75:

    P(W >= 75) = 1 - P[W >= (8(19) - 75 + 1)]
               = 1 - P(W >= 78)
               = 1 - 0.448
               = 0.552
Note: Wilcoxon's W statistic is tabulated in Hollander and Wolfe
(1973), pp. 272-282 for values of m up to 20. Table 6 is just a sample of
these tabulations. For larger values of m, a large sample approximation is
generally used. Under the null hypothesis of no difference between the two
sets of data, the expected value of W is E(W) = n(m+n+1)/2, and the variance
is Var(W) = mn(m+n+1)/12. Then the distribution of

    Z = (W - E(W)) / sqrt(Var(W))

tends toward a standard normal distribution, and Table 1, page 19, can be
used for testing for significance as described in Section III.
Let us examine the situation of ties more closely. First, we use aver-
age ranks to compute W. It is clear that if a tie occurs among two or more
observations in the same group (e.g., "before" or "after" concentrations), it
does not affect the value of W if average ranks are used for ties, or just
assigned arbitrarily. For example, if the two concentrations of 42 in the
"after" group had arbitrarily been ranked as 2 and 3, W would be the same.
But, if ties occur among observations in different groups, W would change
depending on how the ties were "broken." Averaging, as we did in the exam-
ples, is the usual tie-breaking procedure.
The variance of W is affected by ties, regardless of whether they are
within or between groups. The variance, corrected for ties, is computed as
follows:
    Varc(W) = (mn/12) [ (m + n + 1) - SUM(j=1 to g) tj(tj^2 - 1) / ((m + n)(m + n - 1)) ]
where g is the number of sets of ties and tj is the size of tied set j. Then
compute Z as above with Varc(W) replacing Var(W).
Although the large sample approximation is not applicable in our exam-
ple, because m is only 10, we use it to demonstrate the computation of the
correction for ties. We observe:
2 values of 42 (ties within "after" group)
2 values of 59 (ties between groups)
2 values of 99 (ties between groups)
Thus g = 3 and t1 = 2, t2 = 2, and t3 = 2. Compute:

    SUM tj(tj^2 - 1) = 2(2^2 - 1) + 2(2^2 - 1) + 2(2^2 - 1) = 18

and

    Varc(W) = ((8)(10)/12) [ (8 + 10 + 1) - 18 / ((8 + 10)(8 + 10 - 1)) ] = 126.27
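The corrected variance and the resulting Z-statistic can be checked with a short numerical sketch:

```python
import math

m, n, W = 10, 8, 106
tie_sizes = [2, 2, 2]                  # g = 3 sets of ties, each of size 2

EW = n * (m + n + 1) / 2               # expected value of W under H0: 76.0
correction = sum(t * (t**2 - 1) for t in tie_sizes) / ((m + n) * (m + n - 1))
var_c = (m * n / 12) * ((m + n + 1) - correction)
Z = (W - EW) / math.sqrt(var_c)
print(round(var_c, 2), round(Z, 2))    # 126.27 2.67
```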
The Wilcoxon rank sum test is sometimes referred to as the Wilcoxon
two-sample test. An analogous test is the Mann-Whitney test which is based
on an equivalent test statistic, U. In the case of no ties between con-
centration values, U = W - n (n + l)/2, and therefore tests based on W and U
are equivalent. One-tailed probabilities for U can be found in Siegel
(1956), pp. 271-277. If there are ties, a correction is necessary in the
computation of the Mann-Whitney U-statistic. For details see Hollander and
Wolfe (1973) and Siegel (1956).
The Wilcoxon rank sum test can be performed using SAS. The procedure
PROC NPAR1WAY with the Wilcoxon option may be used (see SAS User's Guide:
Statistics, p. 205); an example output with the appropriate SAS statements is
presented in the appendix.
The above procedure demonstrated how to use the Wilcoxon rank sum test
to test for the presence of a step trend. It is also of interest to estimate
the magnitude of such a change. A method of estimation based on the
Wilcoxon test is presented next.
The first step in the estimation procedure is to calculate the dif-
ferences formed by subtracting each observation in the after group from each
observation in the before group. Denote these differences by Dij, where

    Dij = Xi - Yj,  for i = 1, . . ., n and j = 1, . . ., m.
In the example, m = 10 and n = 8, so that a total of 80 differences must be
calculated. Next, order the nm differences from least to greatest and take
the median (the middle value of the ordered differences) as the point esti-
mate of the difference in the two groups. This is the estimate of the
change.
Since the "after" observations were subtracted from the "before"
observations, if the point estimate is positive, the interpretation would be
that the introduction of the pollution control measure resulted in a decrease
in the concentration. Applying the calculations to the example data results
in a point estimate of 29 ug/L as the decrease in concentration that occurred
when the new pollution control measure was introduced. This is significantly
different from zero, as concluded by the test.
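A sketch of the estimator with a small hypothetical data set (the function name and data are illustrative, not the chromium example):

```python
from statistics import median

def step_change_estimate(before, after):
    """Median of all n*m pairwise differences Xi - Yj (before minus after)."""
    diffs = [x - y for x in before for y in after]
    return median(diffs)

# Hypothetical illustration: 3 "before" and 3 "after" observations
print(step_change_estimate([10, 12, 14], [5, 6, 7]))   # 6
```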
D. Seasonal Kendall's Test for Trend
This is a test procedure proposed by Hirsch et al. (1982) which uses a
modified form of Kendall's tau. In brief, if there are several years of
monthly data, Kendall's K (number of concordant minus discordant pairs),
presented earlier, is computed for each of the 12 months, and the 12 statistics
are then combined to provide a single overall test for trend. This method is
discussed by van Belle and Hughes (1982) where it is also compared to another
distribution-free test for trend proposed by Parrel (1980) using an aligned
rank order test of Sen (1968).
1. Rationale
Figure 7 below depicts a series of 8 years of monthly data recorded for
a given water quality parameter.
Figure 7. Monthly Concentrations of Total Phosphorus
(ref. Hirsch et al., 1982)
The plot of the data clearly exhibits a seasonal movement—peaks and
troughs recur at almost yearly intervals. This feature of having a period of
a year (other periodicities may exist) is a common pattern for water quality
parameters in general as discussed earlier. Comparing values between months
within a year will thus not help in detecting a possible long-term trend over
the time period considered. It would be more appropriate to compare data
from the same month across different years, thereby avoiding the problem
of seasonality, and then to combine the individual results into an overall test
statistic from which we can draw conclusions about a trend.
2. Procedure
The following demonstration is based on monthly measurements; the same
procedure can be applied to any sampling frequency (e.g., spring, summer,
fall, winter measurements or average measurements), provided that the sam-
pling scheme is identical from year to year. A general case with 12 months
and n years of data is presented first; a simplified numerical example will
then follow.
Arrange the monthly water quality measurements as follows:

                                Month(j)
    Year(i)        1       2       3     . . .     12
      1           X11     X12     X13    . . .    X1,12
      2           X21     X22     X23    . . .    X2,12
      .            .       .       .                .
      n           Xn1     Xn2     Xn3    . . .    Xn,12
    Number of
    Observations   n1      n2      n3    . . .     n12

where X11 is the observation for the first month of the first year, X21 is
the observation for the first month of the second year, X2,12 is the
observation for the 12th month of the 2nd year, and, generally, Xij is the
observation for the jth month of the ith year.
Note that the number of observations need not be the same from month to
month, i.e., there may be 5 January measurements (n1 = 5), 6 February
measurements (n2 = 6), etc.
Next, make a second table of numbers of concordant (K+) and discordant
(K-) pairs treating each month separately, using the procedure described with
Kendall's tau. When finished, we have the following array:
                                Month
                     1       2       3     . . .     12
    Number of concordant pairs (K+)
    Number of discordant pairs (K-)
    K = (K+) - (K-): K1      K2      K3    . . .     K12
Then sum the 12 monthly statistics to obtain K = K1 + K2 + . . . + K12.
If the sample measurements are truly random (no trend), this statistic has a
mean of 0 and a variance Var(K) = Var(K1) + . . . + Var(K12). The variance
of each monthly statistic Kj is computed as:

    Var(Kj) = [nj(nj - 1)(2nj + 5) - Σt ti(ti - 1)(2ti + 5)]/18

where ti is the size of the ith set of ties in the jth month.
Then compute the standard normal deviate Z with a continuity
correction of one unit as:

    Z = (K - 1)/√Var(K)    if K > 0
    Z = 0                  if K = 0
    Z = (K + 1)/√Var(K)    if K < 0

and use Table 1, page 19 of the standard normal deviates to determine the
significance of Z.
Hirsch et al. (1982) have shown that the normal approximation works
quite well with as few as 3 years of complete data. For fewer years of
record, the exact distribution of K1, . . ., K12, and therefore of K, has
been derived by Kendall (1975).
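The procedure above can be sketched in a short program. This is an illustrative Python sketch, not the FORTRAN routine cited later in the report; the function name and data layout are our own. Given each season's observations in year order (with None marking a missing year), it forms K, the tie-corrected variance, and the continuity-corrected Z:

```python
from math import sqrt
from collections import Counter

def seasonal_kendall(seasons):
    """seasons: one list per season, values in year order (None = missing)."""
    K = 0
    var_K = 0.0
    for obs in seasons:
        x = [v for v in obs if v is not None]
        n = len(x)
        # Concordant (later value higher) and discordant pairs, ignoring
        # ties, exactly as in the Kendall's tau procedure.
        k_plus = sum(1 for i in range(n) for j in range(i + 1, n) if x[j] > x[i])
        k_minus = sum(1 for i in range(n) for j in range(i + 1, n) if x[j] < x[i])
        K += k_plus - k_minus
        # Variance with the correction for sets of tied values.
        ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values() if t > 1)
        var_K += (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    # Standard normal deviate with a continuity correction of one unit.
    if K > 0:
        z = (K - 1) / sqrt(var_K)
    elif K < 0:
        z = (K + 1) / sqrt(var_K)
    else:
        z = 0.0
    return K, var_K, z

# The quarterly percent-violation example worked in the next subsection:
quarters = [
    [12.3, 10.5, 12.6, None],   # quarter 1 (year 4 missing)
    [11.5, 10.9, 10.9, 9.8],    # quarter 2
    [11.6, 9.5, 9.5, 9.5],      # quarter 3
    [15.3, 14.8, 13.9, 14.0],   # quarter 4
]
K, var_K, z = seasonal_kendall(quarters)
print(K, var_K, z)   # K = -11, Var(K) = 25, Z = -2.0 (up to rounding)
```

Run on the quarterly example below, this reproduces K = -11, Var(K) = 25, and Z = -2.0.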
60
-------
3. Numerical Example
The following is an oversimplified example used for demonstration pur-
poses only. For simplicity, we will assume that there were only four
observations per year, taken on a quarterly basis. Note that no data were
available in the first quarter of the fourth year. The example shows how
such missing data may be handled. Consider the array of percent violations
of a water quality standard for a given parameter:
                            Quarter (j)
    Year (i)        1       2       3       4

       1          12.3    11.5    11.6    15.3
       2          10.5    10.9     9.5    14.8
       3          12.6    10.9     9.5    13.9
       4            -      9.8     9.5    14.0
    Number of
    observations    3       4       4       4

These data are plotted in Figure 8 below.
[Figure: percent violations (9 to 15) plotted against time in quarters
(1 to 16), with the four years marked along the horizontal axis.]

Figure 8. Plot of Percent Violations Versus Time
61
-------
From the data we calculate the number of concordant and discordant
pairs within each quarter, again ignoring ties in the computation of each K+
and K-. We obtain:

                           Quarter (j)
                      1      2      3      4

    K+                2      0      0      1
    K-                1      5      3      5
    K = (K+)-(K-)     1     -5     -3     -4
Thus, K1 = 1 and n1 = 3; no ties
      K2 = -5 and n2 = 4; 1 set of 2 ties (10.9, 10.9)
      K3 = -3 and n3 = 4; 1 set of 3 ties (9.5, 9.5, 9.5)
      K4 = -4 and n4 = 4; no ties,

and K = K1 + K2 + K3 + K4 = -11. The four variances, corrected for ties,
are:

    Var(K1) = [3(2)(11) - 0]/18 = 66/18;
    Var(K2) = [4(3)(13) - 2(1)(9)]/18 = 138/18;
    Var(K3) = [4(3)(13) - 3(2)(11)]/18 = 90/18; and
    Var(K4) = [4(3)(13) - 0]/18 = 156/18 .
From here, Var(K) = (66 + 138 + 90 + 156)/18 = 450/18 = 25. Since K is
negative, we compute Z as

    Z = (K + 1)/√Var(K) = (-11 + 1)/√25 = -2.0 .

Since Z is less than -1.96, the lower critical value corresponding to a
two-sided 5% significance level (Table 1), we reject the null hypothesis of
no trend in favor of the alternative that a trend is present.
62
-------
Once we have identified a significant trend in a series of water quality
measurements, we might be interested in determining the magnitude of the
trend. For a set of stations at which trends have been detected, one could
then compare the different trend slopes for a given water quality indicator
and identify those stations where the trend slope is larger than average.
One way of computing the magnitude of a trend would be to compute the
slope b, of the regression line of the water quality measurement versus time
as we did earlier in Section IV. This technique, however, is recommended
only with caution since the underlying assumptions for regression analysis
are often violated when dealing with water quality measurements. A
distribution-free method for computing the magnitude of a trend has been sug-
gested by Hirsch et al. (1982). This method estimates the magnitude of trend
by means of the seasonal Kendall slope estimator, B, computed as follows.
Considering again the more general data arrangement above, compute the
quantities dijk for each month (season) as follows:

    dijk = (Xjk - Xik)/(j - i) ,  for all i < j,

where k = 1, 2, . . ., 12 and 1 ≤ i < j ≤ n. For monthly data, there will
be a total of 12(n)(n-1)/2 such differences. In general, with n years and m
measurements per year, the number of differences will be mn(n-1)/2. The
slope estimator, B, is the median of these dijk values (i.e., half the
dijk's exceed B and half fall short of it; if the number of differences is
even, then take the average of the two middle ones). The estimator, B, is
related to the seasonal Kendall test statistic S, which is simply the number
of positive dijk values minus the number of negative dijk values: if S is
positive, then B is positive or zero; if S is negative, then B is negative
or zero.
As a computational example for the second quarter (k = 2) and 4 years of
data (n = 4), from the 4 measurements X12, X22, X32, X42 compute the
differences:

    d122 = (X22 - X12)/1 = (10.9 - 11.5)/1 = -0.60
    d132 = (X32 - X12)/2 = (10.9 - 11.5)/2 = -0.30
    d142 = (X42 - X12)/3 = ( 9.8 - 11.5)/3 = -0.57
    d232 = (X32 - X22)/1 = (10.9 - 10.9)/1 =  0.00
    d242 = (X42 - X22)/2 = ( 9.8 - 10.9)/2 = -0.55
    d342 = (X42 - X32)/1 = ( 9.8 - 10.9)/1 = -1.10

Note that for the first quarter there will be only 3 (= (3)(2)/2) differences
since one year had missing data, while for each of the remaining three
quarters there will be (4)(3)/2 = 6 differences.
Continuing the above calculations for all 21 pairs for the example data,
one finds that the median of the differences is -0.5. Thus, we would
estimate the trend as a decrease of 0.5 percent violations per year. Recall
that this trend was found to be significantly different from zero at the
two-sided 5% level using the seasonal Kendall's test.
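The slope estimator can be sketched under the same per-season data layout used above (the function name is ours, not from the report): collect all within-season pairwise slopes and take their median.

```python
from statistics import median

def seasonal_kendall_slopes(seasons):
    """All within-season pairwise slopes d = (Xj - Xi)/(j - i), i < j."""
    diffs = []
    for obs in seasons:
        n = len(obs)
        for i in range(n):
            for j in range(i + 1, n):
                # Skip pairs involving a missing year; j - i is the number
                # of whole years between the two observations.
                if obs[i] is not None and obs[j] is not None:
                    diffs.append((obs[j] - obs[i]) / (j - i))
    return diffs

# The quarterly percent-violation example from the text:
quarters = [
    [12.3, 10.5, 12.6, None],   # quarter 1 (year 4 missing)
    [11.5, 10.9, 10.9, 9.8],    # quarter 2
    [11.6, 9.5, 9.5, 9.5],      # quarter 3
    [15.3, 14.8, 13.9, 14.0],   # quarter 4
]
diffs = seasonal_kendall_slopes(quarters)
B = median(diffs)
print(len(diffs), B)   # 21 differences; B = -0.5
```

Because the event years within a season are whole years apart, the estimate is automatically free of seasonal effects, as noted in the following paragraph of the text.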
The value, B, as a measure of trend magnitude, is quite resistant to the
effect of extreme values in the data, unlike the slope of the regression line
as computed in Section IV. It is also unaffected by seasonality because the
slope is always computed between values that are multiples of m months (e.g.,
12 months) apart.
A discussion of the seasonal Kendall's test for trend and other
statistical procedures applied to total phosphorus measurements at NASQAN
stations has been published by Smith et al. (1982). This document also
contains a FORTRAN subroutine to perform the seasonal Kendall procedures.
E. Aligned Rank Sum Test for Seasonal Data (Step Trend)
This test is a method for testing for a step trend when the data exhibit
seasonality. This seasonal effect is usually clearly visible after the data
have been plotted. In some cases, the analyst knows or suspects that a
specific water quality parameter may be affected by seasonality.
64
-------
When data are seasonal, the Wilcoxon rank sum test must be modified
before testing for a step trend. Again, assume that a discrete event such as
the opening of a new factory or the installation of a new pollution control
system has been identified. The question is whether this event has produced
a significant change in some measurement of a water quality parameter. The
following is an outline of the computation of the appropriate
distribution-free test statistic.
Assume measurements of a water quality indicator have been taken monthly
at a fixed station over several years. More generally, the data can be col-
lected for m seasons per year. The m times n measurements, where m is the
number of months (seasons) and n is the number of years of collection, can be
arranged as follows:
                                 Month
    Year        1       2       3     . . .     m       mean

     1         X11     X12     X13    . . .    X1m      X̄1.
     2         X21     X22     X23*   . . .    X2m      X̄2.
     .          .       .       .               .        .
     n         Xn1     Xn2     Xn3    . . .    Xnm      X̄n.

    mean       X̄.1     X̄.2     X̄.3    . . .    X̄.m      X̄..

The * indicates the time at which a major event has happened.
For example, X21 is the measurement in the first month (season) of the
second year. The symbol X̄2. denotes the average monthly measurement in the
second year, while X̄.1 would be the average of the January measurements over
the n years.
Procedure
1. Within each month (column) subtract the monthly average from each
measurement in the n years. This will result in an array of deseasonalized
data. For example, in January, calculate the n differences (X11 - X̄.1),
(X21 - X̄.1), . . ., (Xn1 - X̄.1). These monthly differences will then have
an average value of zero.
65
-------
2. Rank all the nm differences from 1 to nm, regardless of month and
year; this will produce the matrix of aligned ranks:

                                 Month
    Year        1       2       3     . . .     m       mean

     1         R11     R12     R13    . . .    R1m      R̄1.
     2         R21     R22     R23    . . .    R2m      R̄2.
     .          .       .       .               .        .
     n         Rn1     Rn2     Rn3    . . .    Rnm      R̄n.

    mean       R̄.1     R̄.2     R̄.3    . . .    R̄.m      R̄..
3. Now sum the ranks of all the observations taken before the event in
question. Let this sum of ranks be W. To construct the test we will use the
fact that for large sample sizes the distribution of W will be approximately
normal. Let bi be the number of observations from month i that occurred
before the change event, and let ai be the number after the change event. In
the example, b1 = 2, b2 = 2, b3 = 1, . . ., bm = 1, considering that the
event occurred in the third month of the second year.
4. Next calculate

    E = b1 R̄.1 + b2 R̄.2 + . . . + bm R̄.m .

E is the sum of the products of the number of pre-event observations in each
month and the mean rank of that month. E is the expected value of W.

5. Now calculate

    V = Σ(i = 1 to m) [ai bi /(ni(ni - 1))] Σ(j = 1 to ni) (Rji - R̄.i)² ,

where ni = ai + bi is the number of observations for month i. If all months
have the same number of observations this is n. V is the variance of W.
66
-------
6. Then the test is based on:

    Z = (W - E)/√V ,

which is tested using Table 1 (page 19).
Consider again the example used in Kendall's seasonal test. Assume that
the measurements are taken quarterly rather than monthly (for simplicity of
illustration) and that the event in question occurred with the beginning of
the third quarter of the second year (denoted by *). Note that no obser-
vation was available for the first quarter of the fourth year. This example
illustrates that the procedure can be used when there are missing data and
shows how to apply the procedure in this case.
                          Quarter
    Year         1       2       3       4

     1         12.3    11.5    11.6    15.3
     2         10.5    10.9     9.5*   14.8
     3         12.6    10.9     9.5    13.9
     4           -      9.8     9.5    14.0

    Mean       11.8    10.8    10.0    14.5
Next, "deseasonalize" the data by subtracting from each observation within a
quarter the mean value for this quarter. We obtain:
                          Quarter
    Year         1       2       3       4

     1         0.50    0.72    1.57    0.80
     2        -1.30    0.12   -0.53    0.30
     3         0.80    0.12   -0.53   -0.60
     4           -    -0.98   -0.53   -0.50

    Mean       0      -0.02   -0.02    0

Note: The means are not all exactly zero due to rounding errors.
The 15 new observations are then ranked and quarterly mean ranks
are computed. Again, average ranks are used to break ties. The table of
aligned ranks is as follows:
67
-------
                          Quarter
    Year         1       2       3       4

     1          11      12      15     13.5
     2           1       8.5     5     10
     3          13.5     8.5     5      3
     4           -       2       5      7

    Mean         8.5     7.75    7.5    8.38
W is the sum of ranks over all four quarters of year one and the first
two quarters of year two. (W = 11 + 12 + 15 + 13.5 + 1 + 8.5 = 61). The
expected value of W, assuming no change, is the sum of the average ranks over
the same period (E = 8.5 + 7.75 + 7.5 + 8.38 + 8.5 + 7.75 = 48.38). The
variance consists of four terms, one for each quarter, each proportional to
the variance of the ranks within that quarter. For example, for the first
quarter b1 = 2, a1 = 1, n1 = 3, and the sum of squared rank deviations is

    (11 - 8.5)² + (1 - 8.5)² + (13.5 - 8.5)² = 87.5 .

So the first term in the variance, corresponding to i = 1, is:

    [a1b1/(n1(n1 - 1))](87.5) = [(1)(2)/((3)(2))](87.5) = 29.17 .
Proceeding in the same way for the next three quarters and summing gives
V = 80.26. Then

    Z = (W - E)/√V = (61 - 48.38)/√80.26 = 1.41 .
Comparing this value of Z to the critical value of 1.96 at the two-sided 5%
level (Table 1) shows that the change is not significantly different from
zero because 1.41 falls between -1.96 and +1.96.
The positive sign for Z indicates that the "before" period had higher
ranks (after deseasonalizing) than the "after" period, thus, the direction of
change was from high to low. However, in this case the change could be due
to random fluctuations.
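The six steps above can be sketched as follows. This is an illustrative Python sketch with our own names; `before` lists, per season, how many of that season's recorded values precede the event (assumed to be the earliest ones in the list). Deseasonalized values are rounded so that values tied on paper remain tied in floating point.

```python
from math import sqrt

def aligned_rank_sum(seasons, before):
    """seasons: per-season observation lists in year order (None = missing)."""
    # Step 1: deseasonalize by subtracting each season's own mean.
    aligned = []
    for obs in seasons:
        vals = [v for v in obs if v is not None]
        m = sum(vals) / len(vals)
        aligned.append([None if v is None else round(v - m, 6) for v in obs])
    # Step 2: rank all differences together; ties get average ranks.
    flat = sorted(v for col in aligned for v in col if v is not None)
    def avg_rank(v):
        first = flat.index(v) + 1            # first 1-based position of v
        last = first + flat.count(v) - 1     # last position of v
        return (first + last) / 2.0
    ranks = [[None if v is None else avg_rank(v) for v in col] for col in aligned]
    # Steps 3-6: rank sum W before the event, its expectation E and
    # variance V, and the normal deviate Z = (W - E)/sqrt(V).
    W = E = V = 0.0
    for col, b in zip(ranks, before):
        r = [x for x in col if x is not None]   # ranks in year order
        n, a = len(r), len(r) - b
        rbar = sum(r) / n
        W += sum(r[:b])
        E += b * rbar
        V += a * b / (n * (n - 1)) * sum((x - rbar) ** 2 for x in r)
    return (W - E) / sqrt(V)

# The quarterly example from the text; the event occurs at the start of
# the third quarter of year 2, so before = [2, 2, 1, 1].
quarters = [
    [12.3, 10.5, 12.6, None],
    [11.5, 10.9, 10.9, 9.8],
    [11.6, 9.5, 9.5, 9.5],
    [15.3, 14.8, 13.9, 14.0],
]
z = aligned_rank_sum(quarters, before=[2, 2, 1, 1])
print(round(z, 2))   # 1.41
```

This reproduces W = 61, E = 48.38, V = 80.26, and Z = 1.41 from the worked example.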
68
-------
More details on the aligned rank sum test can be found in Lehmann
(1975), pp. 132-141. Unfortunately, this procedure is not available through
SAS.
F. Trend and Change
It is very difficult in nonparametric statistics to deal with a data set
containing both a step trend and a long-term trend, or to distinguish between
the two trends in a series of data values. In general one should only be
interested in a step trend when there is a definite external event that would
be likely to result in a change. That is, a step trend is only present when
an event occurred at a known point in time and influenced the data. The
testing procedures for change could be misled by the presence of a long-term
trend. Likewise the tests for trend could indicate an apparent trend in the
presence of (only) a step trend. If there is a change event, determining
whether it is significant and whether there is a trend as well, or only one,
is a difficult task. The importance of plotting the data must be
re-emphasized.
In the parametric case, as discussed in Section IV, one can use multiple
regression analysis to test for the presence of both a step trend and a
long-term trend in the same data series. Unfortunately, there is no
distribution-free procedure that is as well developed and as easy to apply.
One could try both types of tests (for change and trend). If one is
significant and the other not, then the answer is reasonably clear. If
neither is significant, then the data appear to be random. However, if both
are significant, then both types of nonrandomness may be present, or only
one. To determine whether both a change and a trend are present or only one,
and if only one, which, will require a series of analyses, and advice from a
statistician should be sought.
To test for a long-term trend in the presence of a step trend, one could
use the rank procedure as in subsection D. The "before" and "after" data
would be considered as two groups and separate means calculated. Differences
69
-------
between each observation and its group mean would be calculated and the
Kendall's test applied. In effect this would be the seasonal Kendall's test
with only two "seasons"—before and after. To test for the presence of a
step trend when a long-term trend is known to exist would require estimating
the slope and calculating the difference between each observation and the
trend line, then applying the Wilcoxon rank-sum test to the differences in a
manner similar to the procedure of Subsection E. A detailed presentation of
these procedures is beyond the scope of this document.
70
-------
VI. SPECIAL PROBLEMS
As mentioned throughout the preceding sections, water quality data do
not, in general, exhibit all the desired properties necessary for the use of
parametric statistical procedures. We have already mentioned the fact that
most water quality measurements show seasonal (or cyclical) effects. We have
then suggested methods to deseasonalize the data when using parametric proce-
dures (i.e., multiple regression) or distribution-free methods (i.e.,
seasonal Kendall's test, aligned rank sum test). (A seasonal adjustment
method has also been proposed in Schlicht (1981), although within the more
general setting of time series analysis.)
Other problems inherent to "real life" data bases are those of missing
data (incomplete records) and extreme or outlying observations. Another
issue, mentioned throughout Section IV, is the assumption of normality of the
deseasonalized data or residuals obtained from regression analysis. In
addition, a problem specific to water quality measurements, especially when
concentrations of pesticides, trace metals, etc. are estimated, is that of
measurements below the detection limit. One final and important point is the
problem of flow changes in rivers and streams, which will affect
concentrations of most constituents considered as potential water quality
indicators.
A. Missing Data
The basic assumption we make in treating missing data, from a statis-
tical point of view, is that data are missing because of mistakes, such as
lost records, and not because the analyst simply wants to ignore conflicting
or compromising data. In other words, any missing datum is assumed to follow
the same pattern as the recorded observations. A simple method to fill in a
few missing data values is to replace them by the sample mean. More involved
methods deal with least squares estimates; these methods are available
through standard statistical program packages such as SAS, BMDP and SPSS, and
are explained, for example, in Johnson and Leone (1977).
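The simplest of these fill-ins, substituting the sample mean, amounts to a few lines (an illustrative sketch with hypothetical data, not output from any of the packages named above):

```python
def fill_with_mean(series):
    """Replace each missing value (None) by the mean of the recorded ones."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

filled = fill_with_mean([12.3, None, 12.6, 11.7])   # hypothetical record
print(filled)   # the gap is filled with the mean of 12.3, 12.6, 11.7 (= 12.2)
```

As the text notes, this is only appropriate for a few missing values; the least squares approaches handle larger gaps more defensibly.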
-------
The application of the seasonal Kendall's test for trend is not
restricted to complete data sets, nor is it necessary to have full years of
data. As we mentioned earlier, the seasonal Kendall's test statistic can in
fact be computed with incomplete data. It has also been suggested (van Belle
and Hughes, 1982) that the season length be adjusted, if necessary, to obtain a
reasonable record within each season. When using parametric procedures, a
few missing data will reduce the sample size and slightly affect the means
and standard deviations. A large number of missing data, however, might
affect these statistics considerably.
B. Outlying Observations
Most often, outlying or extreme observations, also called outliers, can
be easily detected either when looking at a plot of the data or even earlier,
when closely examining the raw data sheets. Several logical steps can be
taken when outliers are found. The most basic first step is to double check
suspicious observations for transcription errors. If the entry was an error,
correct it if possible. If the proper entry cannot be retrieved but an error
is certain, then delete the datum. There are also statistical procedures to
test whether suspected outliers are in fact extreme observations. Such
methods are presented in detail in ASTM Standard E178-75 entitled "Standard
Recommended Practice for Dealing With Outlying Observations." Also, tests
for outliers based on ranks are presented in the "Handbook of Tables for
Probability and Statistics" (1966).
Extreme observations can be genuine, caused by flow changes, temperature
changes, etc. It is therefore recommended that ancillary data such as
time of day, water temperature and rate of discharge at the time of sample
collection be collected at the same time as the water quality data. Many
outliers can then be explained and/or corrected. Most rank tests, as
described earlier, are little affected by the magnitude of the observations
and therefore by outliers. In parametric tests, however, outliers affect
means and variances, and may therefore invalidate the resulting tests and
conclusions.
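One screening statistic consistent with the ASTM practice cited above is the extreme studentized deviate, max|x - x̄|/S; the observed value is then compared against a tabled critical value, which is not reproduced here. An illustrative sketch with hypothetical data:

```python
from statistics import mean, stdev

def extreme_studentized_deviate(x):
    """Return the most extreme observation and its studentized deviate."""
    m, s = mean(x), stdev(x)        # sample mean and (n-1) standard deviation
    suspect = max(x, key=lambda v: abs(v - m))
    return suspect, abs(suspect - m) / s

# Hypothetical percent-violation record with one transcription error:
data = [12.3, 10.5, 12.6, 11.5, 10.9, 45.0]
suspect, T = extreme_studentized_deviate(data)
print(suspect, round(T, 2))   # flags 45.0 as the suspect observation
```

The decision to reject the flagged value still rests on the ASTM critical values and, as the text stresses, on checking the raw data sheets and ancillary records first.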
72
-------
C. Test for Normality
Given a set of measurements of a variable X, one wishes to know whether
the variable comes from a normal distribution. If the data base is not too
extensive, the following simple plotting procedure can be used. Consider the
following example of n = 12 data points, rearranged in ascending order:
     i       X      (i/(n+1)) x 100%

     1     -1.45          7.7
     2     -1.35         15.4
     3     -0.78         23.1
     4     -0.62         30.8
     5     -0.01         38.5
     6      0.04         46.2
     7      0.22         53.8
     8      0.49         61.5
     9      0.72         69.2
    10      1.45         76.9
    11      1.79         84.6
    12      2.50         92.3
Should a value of X occur more than once, then the corresponding value of i
(cumulative frequency) increases appropriately. The maximum value of i is
always n, the total number of data points. The pairs of (X, (i/(n+1))x100)
values are then plotted on probability paper using an appropriate scale for X
on the horizontal axis. Figure 9 shows the results. The vertical axis for
the values of (i/(n+1))x100% is already scaled from 0.01 to 99.99. If the
data came from a normal distribution, then the plotted points would fall on a
straight line. In practice, a straight line can be drawn by hand through the
points and a judgment can be made as to the normality of the data. Also,
rough estimates of the mean, X̄, and standard deviation, S, can be made from
this plot. The horizontal line drawn through 50 cuts the plotted line at the
value of X̄, and the horizontal line through 84 cuts it at the value of X̄ + S;
these two numbers then yield the value of S by subtraction.
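The plotting positions used above are easy to generate; this sketch (names are ours) reproduces the third column of the example table:

```python
def plotting_positions(values):
    """Pair each sorted value with its plotting position (i/(n+1)) x 100%."""
    x = sorted(values)
    n = len(x)
    return [(v, 100.0 * i / (n + 1)) for i, v in enumerate(x, start=1)]

# The 12 example data points from the table above:
data = [-1.45, -1.35, -0.78, -0.62, -0.01, 0.04,
        0.22, 0.49, 0.72, 1.45, 1.79, 2.50]
for v, p in plotting_positions(data):
    print(f"{v:6.2f}  {p:5.1f}")   # -1.45 plots at 7.7%, ..., 2.50 at 92.3%
```

Each (X, percentage) pair is then plotted on normal probability paper as described in the text.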
More rigorous statistical tests are available for testing for normality,
such as the Kolmogorov-Smirnov test (see Hollander and Wolfe, 1973) or the χ²
goodness of fit test (see Snedecor and Cochran, 1980). The drawback of
using these tests is that they tend to easily reject the hypothesis
73
-------
[Figure: the 12 X values plotted against (i/(n+1))x100% on normal probability
paper, with a straight line drawn by hand through the points. The line
crosses the 50% level at X̄ = 0.25 and the 84% level at X̄ + S = 1.48, giving
S = 1.23.]

Figure 9. Example of plot on probability paper.
74
-------
of normality when the sample sizes are fairly large, even though a visual
examination of a probability plot tends to support the hypothesis. On the
other hand, when sample sizes are relatively small, as might be the case when
only two years' worth of data are available, the more rigorous tests tend not
to detect deviations from normality. Thus, a probability plot might be more
helpful. It should be noted here that SAS provides a probability plot, as
well as a test for normality, through the UNIVARIATE procedure (see example
output in the appendix).
D. Detection Limits
Sometimes in testing for metals, organic compounds, etc., the analytical
test will not be sensitive enough to quantify the amount of the particular
substance. The amount is below the detection limit of the test, so is
reported simply as "less than" that limit. These data, also referred to as
"censored" data, are not to be confused with missing data, because they are
not "missing"—they are known to be in a certain range, but the precise
values are not known.
There are, however, some ways to compute values that can be substituted
for "less than" values. Such options would include deleting the sample,
filling in with zeros, substituting the actual detection limit for the datum,
or filling in with a random number between zero and the detection limit based
on the underlying distribution of the data. The first three methods are all
biased in some way. The last one, on the other hand, requires information
about the distribution. A simple random number between zero and the
detection limit is sometimes used for substitution, implying a uniform
distribution.
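The substitution options listed above can be sketched as follows (illustrative names and hypothetical data; DL is the detection limit):

```python
import random

def substitute(values, dl, method):
    """values: floats, with None marking a '<DL' (censored) report."""
    if method == "delete":       # drop the censored samples entirely
        return [v for v in values if v is not None]
    if method == "zero":         # fill in with zeros
        return [0.0 if v is None else v for v in values]
    if method == "dl":           # fill in with the detection limit itself
        return [dl if v is None else v for v in values]
    if method == "uniform":      # random fill-in, implying a uniform
        return [random.uniform(0.0, dl) if v is None else v for v in values]
    raise ValueError(method)

obs = [0.8, None, 1.2, None, 0.9]   # hypothetical concentrations, DL = 0.5
print(substitute(obs, 0.5, "dl"))   # [0.8, 0.5, 1.2, 0.5, 0.9]
```

As the text cautions, the first three options are biased in some way, and the uniform fill-in presumes a distributional shape below the detection limit.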
Rank tests can handle "less than" values in some cases with less
difficulty than parametric tests. If the limit of detection is constant, all
"less than" values for a particular constituent are considered as "ties."
Alternatively, rank tests that treat censored data explicitly have been de-
veloped (e.g., Gehan 1965). Note that detection limit values for different
constituents are handled individually.
75
-------
E. Flow Adjustments
Caution must be exercised when interpreting trends found to be sig-
nificant by any of the previously described statistical procedures, espe-
cially when the measurements used are specific constituent concentrations.
It is common knowledge that for most constituents, concentrations change as
the flow changes, which in turn introduces considerable variability into the
measurements. Flow conditions can vary naturally due to climatic factors,
and artificially due to stream regulation and manipulation by man.
One way to correct for changing flow is to determine the relationship
between flow and concentration of the considered constituent. However, no
uniform equation exists since this relationship may vary from site to site
and from constituent to constituent. Hirsch et al. (1980) suggested some
nonlinear equations characterizing relationships between concentrations and
flow in cases where the increased discharge of a constituent is due to pre-
cipitation, snowmelt, or reservoir release. In another case, quadratic equa-
tions are proposed to relate concentrations and flow when the constituent
load may increase dramatically with an increase in discharge because of
runoff during a storm event.
When the effect of increased discharge is a simple dilution effect, the
relationship between concentration and discharge can be characterized by

    X = λ1 + λ2/Q ,  or
    X = λ1 + [λ2/(1 + λ3Q)] ,  for example,

where X is the concentration, Q is the discharge flow, the coefficients
λ1 and λ2 are equal to or greater than zero, and λ3 is greater than zero.
Generally the coefficients in these equations can be estimated via least
squares methods (e.g., regression analysis).
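Because the first dilution model is linear in 1/Q, its coefficients can be estimated by ordinary least squares after transforming the flow. An illustrative sketch on synthetic data (the function name and data are ours, not from the report):

```python
def fit_dilution(q, x):
    """Fit X = l1 + l2/Q by regressing concentration X on U = 1/Q."""
    u = [1.0 / qi for qi in q]
    n = len(q)
    ubar = sum(u) / n
    xbar = sum(x) / n
    suu = sum((ui - ubar) ** 2 for ui in u)
    sux = sum((ui - ubar) * (xi - xbar) for ui, xi in zip(u, x))
    lam2 = sux / suu             # slope in 1/Q, i.e., lambda-2
    lam1 = xbar - lam2 * ubar    # intercept, i.e., lambda-1
    return lam1, lam2

# Synthetic check: concentrations generated exactly from X = 2 + 30/Q.
flows = [10.0, 20.0, 40.0, 80.0, 160.0]
concs = [2.0 + 30.0 / qi for qi in flows]
l1, l2 = fit_dilution(flows, concs)
print(round(l1, 6), round(l2, 6))   # 2.0 30.0
```

The flow-adjusted concentrations used in the procedure that follows are then the residuals X - (λ1 + λ2/Q).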
76
-------
The sequence of procedures suggested by Hirsch et al. (1980) can be
summarized as follows:
1. First find the best fitting relationship between flow and concen-
tration using regression methods.
2. Compute the series of flow-adjusted concentrations whenever the
relationship determined in Step 1 is significant.
3. Apply the seasonal Kendall test for trend.
4. Compute the magnitude of the trend, if significant, using the sea-
sonal Kendall slope estimator.
For an example of these procedures, the interested reader is referred to
Smith et al. (1982), who applied all three methods—seasonal Kendall test for
trend, flow adjustment, and seasonal Kendall slope estimator—to measurements
of total phosphorus concentrations.
77
-------
BIBLIOGRAPHY
ASTM Designation: E178-75. 1975 "Standard Recommended Practice For Dealing
With Outlying Observations."
Bell, Charles B., and E. P. Smith. 1981. Water Quality Trends: Inference
for First-Order Autoregressive Schemes. Tech. Rep. 6, SIAM Instit. for Math.
in Soc., Biomath. Group, Univ. of Wash., Seattle.
Box, George E. P., and G. M. Jenkins. 1970. Time Series Analysis. Holden-
Day, San Francisco, Ca.
Box, George E. P., and G. C. Tiao. 1975. "Intervention Analysis with Appli-
cation to Economic and Environmental Problems." J. American Statistical Assoc.,
Vol. 70, pp. 70-79.
Chatterjee, Samprit, and B. Price. 1977. Regression Analysis by Example.
John Wiley and Sons, New York.
Draper, Norman R., and H. Smith. 1981. Applied Regression Analysis. Second
Edition, John Wiley and Sons, Inc., New York.
Dykstra, Richard L., and T. Robertson. 1983. "On Testing Monotone Tendencies."
J. American Statistical Assoc., Vol. 78, pp. 342-350.
Farrell, Robert L. 1980. Methods for Classifying Changes in Environmental
Conditions. Tech. Rep. VRI-EPA7.4-FR80-1, Vector Research, Inc., Ann Arbor,
Mich.
Gehan, Edmund A. 1965. "A Generalized Wilcoxon Test for Comparing Arbitrarily
Singly Censored Samples." Biometrika, Vol. 52, pp. 203-223.
General Accounting Office. 1981. Better Monitoring Techniques are Needed
to Assess the Quality of Rivers and Streams. Report CED-81-30, U.S. General
Accounting Office, Washington, D.C.
78
-------
Handbook of Tables for Probability and Statistics. 1966. Edited by Beyer,
William H. The Chemical Rubber Co.
Hirsch, Robert M., J. R. Slack, and R. A. Smith. 1982. "Techniques of Trend
Analysis for Monthly Water Quality Data." Water Resources Research, Vol. 18(1),
pp. 107-121.
Hollander, Myles, and D. A. Wolfe. 1973. Nonparametric Statistical Methods.
John Wiley and Sons, New York.
Jernigan, Robert W., and J. C. Turner. Seasonal Trends in Unequally Spaced
Data: Confidence Intervals for Spectral Estimates. Submitted for publication.
Johnson, Norman L., and F. C. Leone. 1977. Statistics and Experimental
Design in Engineering and the Physical Sciences, 2 Vols. Second Edition,
John Wiley and Sons, Inc., New York.
Kendall, Maurice G., and W. R. Buckland. 1971. A Dictionary of Statistical
Terms. Third Edition. Hafner Publishing Company, Inc., New York.
Kendall, Maurice G., and A. Stuart. 1966. The Advanced Theory of Statistics,
Volume 3. Hafner Publ. Co., New York, 342 pp.
Kendall, Maurice G. 1975. Rank Correlation Methods. Charles Griffin, London.
Langley, Russell A. 1971. Practical Statistics Simply Explained. Second
Edition, Dover Publications, Inc., New York.
Lehmann, Erich L. 1975. Nonparametrics: Statistical Methods Based on Ranks.
Holden-Day, San Francisco.
79
-------
Lettenmaier, Dennis P. 1976. "Detection of Trends in Stream Quality: Moni-
toring Network Design and Data Analysis." Tech. Rep. 51, Harris Hydraul. Lab.,
Dept. of Civil. Eng., Univ. of Wash., Seattle.
Lettenmaier, Dennis P. 1976. "Detection of Trends in Water Quality Data from
Records With Dependent Observations." Water Resources Research, Vol. 12(5),
pp. 1037-1046.
Mann, Henry B. 1945. "Nonparametric Tests Against Trend." Econometrica,
Vol. 13, pp. 245-259.
Sen, Pranab K. 1968. "On A Class of Aligned Rank Order Tests in Two-Way
Layouts." Annals of Mathematical Statistics, Vol. 39, pp. 1115-1124.
Schlicht, Ekkehart. 1981. "A Seasonal Adjustment Principle and a Seasonal
Adjustment Method Derived from this Principle." Journal of the American
Statistical Association, Vol. 76, pp. 374-378.
Siegel, Sidney. 1956. Nonparametric Statistics for the Behavioral Sciences.
McGraw-Hill, New York.
Smith, Richard A., R. M. Hirsch, and J. R. Slack. 1982. A Study of Trends
in Total Phosphorus Measurements at NASQAN Stations. U.S. Geological Survey
Water-Supply Paper 2190.
Snedecor, George W., and W. G. Cochran. 1980. Statistical Methods. Seventh
Edition, the Iowa State University Press, Ames, Iowa.
STORET User Handbook. 1980. U.S. Environmental Protection Agency, Office of
Water and Hazardous Materials.
van Belle, Gerald, and J. P. Hughes. 1982. "Nonparametric Tests for Trend
in Water Quality." SIMS Technical Report No. 11, University of Washington,
Seattle, Washington (to appear in Water Resources Research).
80
-------
van Belle, Gerald, and J. P. Hughes. 1983. "Monitoring for Water Quality:
Fixed Station versus Intensive Surveys." Journal Water Pollution Control
Federation. Vol. 55, pp. 400-404.
Wallis, W. Allen, and H. V. Roberts. 1956. Statistics: A New Approach.
The Free Press, New York.
Statistical Program Packages:
BMDP Statistical Software, 1980, University of California Press
SAS: Statistical Analysis System, SAS Institute, Inc.,
SAS User's Guide: Basics. 1982 Edition
SAS User's Guide: Statistics. 1982 Edition
Box 8000, Cary, North Carolina
SPSS: Statistical Package for the Social Sciences, 1982, McGraw-Hill.
81
-------
APPENDIX
This appendix is a collection of output examples using SAS and the data
used to demonstrate various procedures in the preceding sections. It is or-
ganized in the same fashion as the text, and the title in each output cor-
responds to the appropriate section and subsection. The following procedures
are shown:
1. Regression analysis (IV-B)
2. Student's t-test (IV-C)
3. Multiple regression analysis (IV-D)
4. Kendall's tau test (V-B)
5. Wilcoxon rank sum test (V-C)
A FORTRAN subroutine for the seasonal Kendall's test and trend magnitude
estimation is included in Smith et al. (1982).
82
-------
1. REGRESSION ANALYSIS (IV-B)
OPTIONS LINESIZE=100 NODATE;
DATA REGRESS;
INPUT YEAR WQI @@;
CARDS;
1977 46 1978 52 1979 42 1980 44 1981 39 1982 45 1983 40
;
TITLE REGRESSION ANALYSIS : WQI VERSUS TIME(YR);
TITLE3 OUTPUT EXAMPLE FOR SECTION IV B;
PROC REG;
MODEL WQI=YEAR;
OUTPUT OUT=RES
PREDICTED=PRED
RESIDUAL=RESID;
PROC PLOT DATA=RES;
PLOT PRED*YEAR='P'
WQI*YEAR='O'/OVERLAY;
PROC PRINT DATA=RES;
PROC UNIVARIATE DATA=RES PLOT NORMAL;
VAR RESID;
TITLE5 TEST OF NORMALITY FOR THE RESIDUALS;
83
-------
REGRESSION ANALYSIS : WQI VERSUS TIME(YR)
OUTPUT EXAMPLE FOR SECTION IV B

DEP VARIABLE: WQI

SOURCE      DF    SUM OF SQUARES    MEAN SQUARE    F VALUE    PROB>F
MODEL        1        43.750000      43.750000       3.114    0.1379
ERROR        5        70.250000      14.050000
C TOTAL      6       114.000000

ROOT MSE     3.748333      R-SQUARE    0.3838
DEP MEAN    44.000000      ADJ R-SQ    0.2605
C.V.         8.518939

                 PARAMETER      STANDARD      T FOR H0:
VARIABLE    DF    ESTIMATE       ERROR       PARAMETER=0    PROB > |T|
INTERCEP     1   2519.000000    1402.570        1.796         0.1324
YEAR         1     -1.250000       0.708368    -1.765         0.1379

[Annotations: the fitted equation is WQI = 2519 - 1.25(YEAR), i.e., the WQI
decreases by about 1.25 units per year; the slope is not significantly
different from zero at the 5% level (p = 0.1379).]
-------
REGRESSION ANALYSIS : WQI VERSUS TIME(YR)
OUTPUT EXAMPLE FOR SECTION IV B

[Overlay plot of predicted values (symbol P) and observed WQI (symbol O) versus
YEAR omitted; not reproducible from the source copy.]

OBS    YEAR    WQI     PRED    RESID
 1     1977    46     47.75    -1.75
 2     1978    52     46.50     5.50
 3     1979    42     45.25    -3.25
 4     1980    44     44.00     0.00
 5     1981    39     42.75    -3.75
 6     1982    45     41.50     3.50
 7     1983    40     40.25    -0.25
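The quantities in the PROC REG listing above can be verified by hand from the seven (YEAR, WQI) pairs. A minimal Python sketch, not part of the original SAS session (the variable names are ours):

```python
# Least-squares fit of WQI on YEAR for the Section IV-B example.
years = [1977, 1978, 1979, 1980, 1981, 1982, 1983]
wqi = [46, 52, 42, 44, 39, 45, 40]

n = len(years)
xbar = sum(years) / n
ybar = sum(wqi) / n
sxx = sum((x - xbar) ** 2 for x in years)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, wqi))

slope = sxy / sxx                             # -1.25 WQI units per year
intercept = ybar - slope * xbar               # 2519.0
ss_total = sum((y - ybar) ** 2 for y in wqi)  # 114.0   (C TOTAL)
ss_model = slope * sxy                        # 43.75   (MODEL)
ss_error = ss_total - ss_model                # 70.25   (ERROR)
mse = ss_error / (n - 2)                      # 14.05   (MEAN SQUARE, ERROR)
f_value = ss_model / mse                      # 3.114   (F VALUE)
r_square = ss_model / ss_total                # 0.3838  (R-SQUARE)
```

The slope, ANOVA sums of squares, F value, and R-square agree with the PROC REG output; the p value of 0.1379 means the downward trend is not significant at the 0.05 level.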
REGRESSION ANALYSIS : WQI VERSUS TIME(YR)
OUTPUT EXAMPLE FOR SECTION IV B
TEST OF NORMALITY FOR THE RESIDUALS

UNIVARIATE
VARIABLE=RESID    RESIDUALS

MOMENTS
N                 7        SUM WGTS           7
MEAN      1.787E-13        SUM        1.251E-12
STD DEV     3.42174        VARIANCE     11.7083
SKEWNESS   0.680336        KURTOSIS   -0.689156
USS           70.25        CSS            70.25
CV        1.915E+15        STD MEAN      1.2933
T:MEAN=0  1.381E-13        PROB>|T|           1
SGN RANK         -1        PROB>|S|    0.932647
NUM ^= 0          7
W:NORMAL   0.923974        PROB<W         0.482

QUANTILES(DEF=4)
100% MAX     5.5          99%    5.5
 75% Q3      3.5          95%    5.5
 50% MED   -0.25          90%    5.5
 25% Q1    -3.25          10%  -3.75
  0% MIN   -3.75           5%  -3.75
RANGE       9.25           1%  -3.75
Q3-Q1       6.75
MODE       -3.75

EXTREMES
LOWEST:  -3.75, -3.25, -1.75, -0.25, 1.705E-13
HIGHEST: -1.75, -0.25, 1.705E-13, 3.5, 5.5

[Stem-and-leaf display, box plot, and normal probability plot omitted; not
reproducible from the source copy.]
2. STUDENT'S T-TEST (IV-C)
OPTIONS LINESIZE=100 NODATE;
DATA WATERQ;
INPUT TIME $ CONC @@;
LABEL CONC=CONCENTRATION;
CARDS;
B 99 B 111 B 74 B 123 B 71 B 75 B 59 B 85
A 59 A 99 A 82 A 51 A 48 A 39 A 42 A 42 A 47 A 50
TITLE T-TEST PROCEDURE : TOTAL CHROMIUM CONCENTRATIONS;
TITLE3 OUTPUT EXAMPLE TO SECTION IV C;
PROC TTEST;
CLASS TIME;
VAR CONC;
PROC UNIVARIATE PLOT NORMAL;
VAR CONC;
TITLE5 TEST OF NORMALITY AND NORMAL PROBABILITY PLOT;
/*
T-TEST PROCEDURE : TOTAL CHROMIUM CONCENTRATIONS
OUTPUT EXAMPLE TO SECTION IV C

TTEST PROCEDURE

VARIABLE: CONC    CONCENTRATION

TIME     N           MEAN        STD DEV     STD ERROR
A       10    55.90000000    19.49615347    6.16522506
B        8    87.12500000    21.95083793    7.76079318

VARIANCES        T       DF    PROB > |T|
UNEQUAL     -3.1503    14.2        0.0070
EQUAL       -3.1946    16.0        0.0056

FOR H0: VARIANCES ARE EQUAL, F' = 1.27 WITH 7 AND 9 DF    PROB > F' = 0.7234
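The pooled (equal-variances) t statistic in the listing can be checked directly. A Python sketch, not part of the original SAS session:

```python
import math

# Two-sample pooled t-test on the total chromium data of Section IV-C.
before = [99, 111, 74, 123, 71, 75, 59, 85]        # period B, n = 8
after = [59, 99, 82, 51, 48, 39, 42, 42, 47, 50]   # period A, n = 10

def mean(x):
    return sum(x) / len(x)

def var(x):  # sample variance, n - 1 divisor
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

n1, n2 = len(after), len(before)
sp2 = ((n1 - 1) * var(after) + (n2 - 1) * var(before)) / (n1 + n2 - 2)
t = (mean(after) - mean(before)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
# t = -3.1946 with 16 df, matching the EQUAL line of the TTEST output
```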
T-TEST PROCEDURE : TOTAL CHROMIUM CONCENTRATIONS
OUTPUT EXAMPLE TO SECTION IV C
TEST OF NORMALITY AND NORMAL PROBABILITY PLOT

UNIVARIATE
VARIABLE=CONC    CONCENTRATION

MOMENTS
N                18        SUM WGTS          18
MEAN        69.7778        SUM             1256
STD DEV     25.5839        VARIANCE     654.536
SKEWNESS   0.653072        KURTOSIS   -0.612432
USS           98768        CSS          11127.1
CV          36.6648        STD MEAN     6.03018
T:MEAN=0    11.5714        PROB>|T|      0.0001
SGN RANK       85.5        PROB>|S|  .000212983
NUM ^= 0         18
W:NORMAL   0.920973        PROB<W        0.1661

QUANTILES(DEF=4)
100% MAX     123          99%    123
 75% Q3     88.5          95%    123
 50% MED      65          90%  112.2
 25% Q1    47.75          10%   41.7
  0% MIN      39           5%     39
RANGE         84           1%     39
Q3-Q1      40.75
MODE          42

EXTREMES
LOWEST:  39, 42, 42, 47, 48
HIGHEST: 85, 99, 99, 111, 123

[Stem-and-leaf display, box plot, and normal probability plot omitted; not
reproducible from the source copy. The Shapiro-Wilk statistic W = 0.921 with
PROB<W = 0.1661 does not reject normality.]
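The moments in the UNIVARIATE listing are easy to recompute. A Python sketch, not part of the original SAS session:

```python
# Summary statistics for the 18 chromium concentrations of Section IV-C.
conc = [99, 111, 74, 123, 71, 75, 59, 85,
        59, 99, 82, 51, 48, 39, 42, 42, 47, 50]

n = len(conc)                                   # 18
total = sum(conc)                               # 1256   (SUM)
mean = total / n                                # 69.7778
css = sum((v - mean) ** 2 for v in conc)        # 11127.1 (corrected SS)
variance = css / (n - 1)                        # 654.536
std_dev = variance ** 0.5                       # 25.5839

s = sorted(conc)
median = (s[n // 2 - 1] + s[n // 2]) / 2        # 65.0 (average of 9th and 10th)
```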
3. MULTIPLE REGRESSION ANALYSIS (IV-D)
OPTIONS LINESIZE=100 NODATE;
DATA MULTIPLE;
INPUT TIME CONC;
IF MOD(TIME,12)=1 THEN X1=1; ELSE X1=0;
IF MOD(TIME,12)=2 THEN X2=1; ELSE X2=0;
IF MOD(TIME,12)=3 THEN X3=1; ELSE X3=0;
IF MOD(TIME,12)=4 THEN X4=1; ELSE X4=0;
IF MOD(TIME,12)=5 THEN X5=1; ELSE X5=0;
IF MOD(TIME,12)=6 THEN X6=1; ELSE X6=0;
IF MOD(TIME,12)=7 THEN X7=1; ELSE X7=0;
IF MOD(TIME,12)=8 THEN X8=1; ELSE X8=0;
IF MOD(TIME,12)=9 THEN X9=1; ELSE X9=0;
IF MOD(TIME,12)=10 THEN X10=1; ELSE X10=0;
IF MOD(TIME,12)=11 THEN X11=1; ELSE X11=0;
IF MOD(TIME,12)=0 THEN X12=1; ELSE X12=0;
IF TIME LE 18 THEN C=0; ELSE C=1;
CARDS;
[72 data lines; see the PROC PRINT listing]
TITLE MULTIPLE REGRESSION ANALYSIS;
TITLE3 OUTPUT EXAMPLE TO SECTION IV D;
PROC PRINT;
FORMAT CONC 5.3;
PROC REG;
MODEL CONC=X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 C TIME/NOINT;
PROC PLOT; PLOT CONC*TIME=C/HAXIS=0 TO 72 BY 2 HREF=18;
/*
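The DATA step above builds twelve monthly indicator variables and a step indicator from the month counter TIME. The same construction in Python, not part of the original SAS session (the function name is ours):

```python
# One design-matrix row per month, mirroring the SAS DATA step:
# X1..X12 flag the calendar month of TIME (1..72), C flags TIME > 18.
def design_row(time):
    month = time % 12 or 12                 # MOD(TIME,12), with 0 meaning December
    x = [1 if month == m else 0 for m in range(1, 13)]
    c = 0 if time <= 18 else 1              # step indicator C
    return x + [c, time]
```

For example, TIME = 25 is a January after the step, so X1 = 1 and C = 1.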
MULTIPLE REGRESSION ANALYSIS
OUTPUT EXAMPLE TO SECTION IV D

[PROC PRINT listing of OBS, TIME, CONC, the seasonal dummies X1-X12, and the
step indicator C for the 72 monthly observations. The dummy-variable and C
columns are omitted here; CONC by TIME, reading across (TIME = 1 to 72):]

0.704  0.636  0.288  0.576  0.422  0.198  0.414  0.057  0.028  0.505  0.113  0.406
0.414  0.624  0.540  0.343  0.618  0.372  0.198  0.123  0.139  0.010  0.117  0.265
0.256  0.366  0.342  0.233  0.487  0.141  0.223  0.010  0.124  0.019  0.133  0.080
0.296  0.370  0.328  0.292  0.284  0.333  0.230  0.031  0.089  0.054  0.085  0.086
0.216  0.182  0.333  0.290  0.272  0.193  0.174  0.158  0.096  0.056  0.117  0.201
0.283  0.320  0.234  0.272  0.145  0.189  0.095  0.075  0.032  0.046  0.071  0.010
MULTIPLE REGRESSION ANALYSIS
OUTPUT EXAMPLE TO SECTION IV D

DEP VARIABLE: CONC

SOURCE      DF    SUM OF SQUARES    MEAN SQUARE     F VALUE    PROB>F
MODEL       14        5.742797       0.410200        42.558    0.0001
ERROR       58        0.559039       0.009638597
U TOTAL     72        6.301836

ROOT MSE    0.098176    R-SQUARE    0.9113
DEP MEAN    0.240778    ADJ R-SQ    0.8914
C.V.       40.77468

NOTE: NO INTERCEPT TERM IS USED. R-SQUARE IS REDEFINED.

VARIABLE    DF    PARAMETER ESTIMATE    STANDARD ERROR    T FOR H0: PARAMETER=0    PROB > |T|
X1           1        0.496496            0.044398              11.183               0.0001
X2           1        0.552580            0.044520              12.412               0.0001
X3           1        0.488331            0.044657              10.935               0.0001
X4           1        0.508082            0.044810              11.339               0.0001
X5           1        0.511333            0.044978              11.368               0.0001
X6           1        0.378917            0.045162               8.390               0.0001
X7           1        0.389222            0.046439               8.381               0.0001
X8           1        0.243806            0.046555               5.237               0.0001
X9           1        0.257057            0.046686               5.506               0.0001
X10          1        0.285308            0.046833               6.092               0.0001
X11          1        0.277559            0.046994               5.906               0.0001
X12          1        0.347477            0.047169               7.367               0.0001
C            1       -0.144327            0.040917              -3.527               0.0008
TIME         1       -0.0012509           0.0008483678          -1.474               0.1458
MULTIPLE REGRESSION ANALYSIS
OUTPUT EXAMPLE TO SECTION IV D

[Plot of CONC versus TIME, plotting symbol = value of the step indicator C,
with a reference line at TIME = 18; omitted, not reproducible from the source
copy.]
4. KENDALL'S TAU TEST (V-B)
OPTIONS LINESIZE=100 NODATE;
DATA WATERQ;
INPUT MONTH WQI @@;
ORD=_N_;
CARDS;
1 21 2 3 3 5 4 8 5 21 6 48 7 37 8 39 9 26 10 16 11 35 12 7
TITLE KENDALL'S TAU TEST;
TITLE3 OUTPUT EXAMPLE TO SECTION V B;
PROC CORR DATA=WATERQ KENDALL;
VAR MONTH WQI;
*;
*THE FOLLOWING STEPS ARE TO COMPUTE THE N*(N-1)/2 SLOPES;
*THIS WILL ALLOW OBTAINING THE ESTIMATE OF THE TREND MAGNITUDE;
*IF THE TREND IS STATISTICALLY SIGNIFICANT;
*;
DATA DIFFS; SET WATERQ;
RETAIN X1-X50 -1
       Y1-Y50 -1;
ARRAY XX (I) X1-X50;
ARRAY YY [remainder of the program lost at a page break in the source copy]
KENDALL'S TAU TEST
OUTPUT EXAMPLE TO SECTION V B

VARIABLE           MEAN        STD DEV         MEDIAN        MINIMUM        MAXIMUM
MONTH        6.50000000     3.60555128     6.50000000     1.00000000    12.00000000
WQI         22.16666667    15.02623968    21.00000000     3.00000000    48.00000000

OBS    MEDIAN
  1    1.48571    [median of the N*(N-1)/2 pairwise slopes: the trend magnitude estimate]

KENDALL TAU B CORRELATION COEFFICIENTS / PROB > |R| UNDER H0:RHO=0 / N = 12

             MONTH        WQI
MONTH      1.00000    0.19848
            0.0000     0.3716
WQI        0.19848    1.00000
            0.3716     0.0000
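The tau-b coefficient of 0.19848 and the slope-median trend estimate of 1.48571 can both be recomputed from the twelve (MONTH, WQI) pairs. A Python sketch, not part of the original SAS session:

```python
import math

# Kendall's tau-b and the median of the pairwise slopes (the trend
# magnitude estimate) for the Section V-B example.
month = list(range(1, 13))
wqi = [21, 3, 5, 8, 21, 48, 37, 39, 26, 16, 35, 7]

n = len(wqi)
s = 0          # concordant minus discordant pairs
ties_y = 0     # tied pairs in WQI (MONTH itself has no ties)
slopes = []
for i in range(n - 1):
    for j in range(i + 1, n):
        diff = wqi[j] - wqi[i]
        s += (diff > 0) - (diff < 0)
        if diff == 0:
            ties_y += 1
        slopes.append(diff / (month[j] - month[i]))

n0 = n * (n - 1) // 2                        # 66 pairs
tau_b = s / math.sqrt(n0 * (n0 - ties_y))    # 0.19848

slopes.sort()
m = len(slopes)
trend = (slopes[m // 2 - 1] + slopes[m // 2]) / 2   # 1.48571 WQI units per month
```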
5. WILCOXON RANK SUM TEST (V-C)
OPTIONS LINESIZE=100 NODATE;
DATA WATERQ;
INPUT TIME PERIOD $ CONC @@;
LABEL CONC=CONCENTRATION;
CARDS;
1 B 99 2 B 111 3 B 74 4 B 123 5 B 71 6 B 75 7 B 59 8 B 85
9 A 59 10 A 99 11 A 82 12 A 51 13 A 48 14 A 39 15 A 42 16 A 42 17 A 47 18 A 50
PROC PLOT DATA=WATERQ;
PLOT CONC*TIME=PERIOD/HREF=9;
TITLE WILCOXON RANK SUM TEST (STEP TREND);
TITLE3 EXAMPLE OUTPUT TO SECTION V C;
PROC NPAR1WAY DATA=WATERQ WILCOXON;
CLASS PERIOD;
VAR CONC;
/*
WILCOXON RANK SUM TEST (STEP TREND)
EXAMPLE OUTPUT TO SECTION V C

[Plot of CONC versus TIME, plotting symbol = value of PERIOD (B before, A
after), with a reference line at TIME = 9; omitted, not reproducible from the
source copy.]
WILCOXON RANK SUM TEST (STEP TREND)
EXAMPLE OUTPUT TO SECTION V C

ANALYSIS FOR VARIABLE CONC CLASSIFIED BY VARIABLE PERIOD
AVERAGE SCORES WERE USED FOR TIES

WILCOXON SCORES (RANK SUMS)
LEVEL     N    SUM OF SCORES    EXPECTED UNDER H0    STD DEV UNDER H0    MEAN SCORE
B         8        106.00              76.00               11.24            13.25
A        10         65.00              95.00               11.24             6.50

WILCOXON 2-SAMPLE TEST (NORMAL APPROXIMATION)
(WITH CONTINUITY CORRECTION OF .5)
S= 106.00    Z= 2.6252    PROB >|Z|=0.0087
T-TEST APPROX. SIGNIFICANCE = .0177

KRUSKAL-WALLIS TEST (CHI-SQUARE APPROXIMATION)
CHISQ= 7.13    DF= 1    PROB > CHISQ=0.0076
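The NPAR1WAY figures can be reproduced by ranking the pooled sample, averaging ranks for ties, and applying the normal approximation with a 0.5 continuity correction and a tie adjustment to the variance. A Python sketch, not part of the original SAS session:

```python
import math

# Wilcoxon rank-sum (step trend) for the Section V-C chromium data.
before = [99, 111, 74, 123, 71, 75, 59, 85]        # period B, n = 8
after = [59, 99, 82, 51, 48, 39, 42, 42, 47, 50]   # period A, n = 10

pooled = sorted(before + after)
n1, n2 = len(before), len(after)
N = n1 + n2

# average rank for each value (ties: two 42s, two 59s, two 99s)
rank = {v: sum(i + 1 for i, p in enumerate(pooled) if p == v) / pooled.count(v)
        for v in set(pooled)}

s_b = sum(rank[v] for v in before)      # 106.0  (SUM OF SCORES for B)
expected = n1 * (N + 1) / 2             # 76.0   (EXPECTED UNDER H0)

ties = [pooled.count(v) for v in set(pooled) if pooled.count(v) > 1]
correction = sum(t ** 3 - t for t in ties) / (N * (N - 1))
variance = n1 * n2 / 12 * (N + 1 - correction)
z = (s_b - expected - 0.5) / math.sqrt(variance)   # 2.6252, as in the output
```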