Martin Marietta Environmental Systems
variable (e.g., DO, chlorophyll-a), variables can be integrated
over time and/or space (e.g., volume or duration of hypoxia),
or multiple water quality variables can be reduced to a univar-
iate response variable. The latter can be achieved using
predefined water quality indices (see Reed and McErlean 1979)
or multivariate statistical techniques. An example of a
multivariate statistical technique would be the use of the
values of the first principal component from a Principal
Components Analysis (PCA) applied to several water quality
variables. Multivariate analyses can be especially useful for
defining univariate response variables, since they provide an
objective means for capturing in a single measure the compli-
cated nature of the Bay's water quality. A review of multivar-
iate techniques appropriate to CBP water quality monitoring
data is an area that deserves further investigation.
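As an illustration of the PCA-based approach, the following is a minimal sketch of how first principal component scores could be computed from several water quality variables. The function name `first_pc_scores`, the choice to standardize each variable (a PCA of the correlation matrix), and the use of power iteration are assumptions made for this example and are not specified in the report.

```python
import math

def first_pc_scores(rows, n_iter=200):
    """Scores of each sample on the first principal component.

    rows: list of samples, each a list of water quality measurements.
    Each variable is standardized first (assumes no variable is constant),
    so this is a PCA of the correlation matrix.
    """
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]
    # Power iteration: repeated multiplication by the correlation matrix
    # converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The score of a sample is its projection onto the first component.
    scores = [sum(z[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores
```

The returned scores could then serve as the univariate response variable in any of the trend detection methods discussed later.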
For this report, we focus on response variables defined as
individual water quality variables measured in the monitoring
program. The methods are generally applicable to any response
variable. Water quality variables that are likely candidates for
response variables are DO, chlorophyll-a, dissolved nitrogen and
phosphorus forms, total nitrogen and phosphorus, and total
suspended solids (TSS). In addition, Secchi depth may be
treated as a response variable, although it is a special case
since only one measurement is associated with a station-cruise
combination.
Explanatory Variables
Based on the earlier discussion, our general approach to the
incorporation of explanatory variables for these response
variables is to include only empirically oriented explanatory
variables to remove "noise" from the data. The only exception
would be analyses of DO trends, for which we would perform
analyses with and without the inclusion of a functionally oriented
explanatory variable that measures the intensity of stratification
(e.g., some measure of the rate of change of salinity
with depth).
It is important to realize that measures of the intensity
of stratification are also likely to be related to differences
in tributary flow. Tributary flow may, in turn, be related to
nutrient loadings to the Bay, and nutrient loadings are one of
the major targets of management actions. Care must therefore
be taken to ensure that the variation in DO removed by a measure of
the intensity of stratification does not include trends in DO that
may be attributable to reduced nutrient loadings from the
tributaries.
Information on wind direction, magnitude, and duration
may also be a useful explanatory variable, especially for
analyses during the fall season (for turnover events) and
perhaps for understanding "outlier" DO observations.
Censored Data
For many of the dissolved forms of the nutrients measured
by the CBP monitoring program (e.g., dissolved inorganic phos-
phorus, ammonia), substantial numbers of observations are
below detection concentrations, resulting in censored values.
These observations below detection can also affect data on
the particulate and total forms of the nutrients if calculation
of their concentrations involves the dissolved components.
The central issue with censored data is that concentrations
between zero and the detection limit exist, but these concen-
trations are not observable.
If a substantial number of observations are below detection
concentrations and the overall range of observable values is
relatively narrow, problems can arise with a statistical analysis
for trend. In such situations, censored variables can result
in data that cause violations of the data requirements and
assumptions underlying parameter estimation and hypothesis
testing for many of the statistical methods. Furthermore,
determination of trends in a censored response variable, or
use of a censored variable as an explanatory variable, can be
difficult because the unobservability of values between zero
and detection concentrations may cause an important portion of
the variation in the data to be masked.
Censored data are of special concern to the CBP monitoring
program. Due to differences in laboratory measurement methods
among the data generators (i.e., OEP, ODU, VIMS), detection limits
vary spatially. In addition, measurement methods have changed over time
for some of the data generators, causing temporal variation in
detection limits. Variations in detection limits with time
and space can introduce artificial temporal and spatial trends
in the data, thereby further confounding statistical determin-
ation of trends attributable to management actions.
There are five general approaches for dealing with censored
data:
(1) Categorize the censored variable.
(2) Assign the same value to individual censored values
(e.g., detection levels, zero).
(3) Pool observations and assume a statistical distribution
for the censored values in order to obtain less biased
estimates of distributional parameters of the pooled
data (e.g., mean, variance).
(4) Pool observations, assume a statistical distribution for
all observations, and use regression or maximum likelihood
methods to obtain less biased estimates of distributional
parameters of the pooled data.
(5) Pool observations and use measures of central tendency
unaffected by censored data (e.g., for < 50% of the
observations censored, the sample median).
The first two approaches result in individual censored observations
being assigned a specific value. The latter three
approaches utilize pooled observations, and thus do not result
in the assignment of values to individual censored observations.
The third and fourth approaches use statistical methods to
obtain less biased estimates of the distributional parameters
of the pooled observations (e.g., mean and variance of the
pooled data), and the fifth approach simply uses statistical
measures of central tendency unaffected by censored data.
Examples of the approaches involving the assignment of
values and the use of statistical methods (methods 2, 3, and
4 above) were presented and compared using Monte Carlo methods
in Gilliom and Helsel (1986) and Helsel and Gilliom (1986).
They found that the approach of assigning a specific value to
censored observations was inferior to several of the statisti-
cal approaches. Furthermore, among the statistical methods,
the most robust method for estimating means and variances of
pooled observations was a regression method assuming a lognormal
distribution of observations (denoted LR in Gilliom and Helsel,
1986). This method is implemented in the following manner:
(1) Calculate the logarithm of uncensored pooled observa-
tions (the response variable).
(2) Calculate the probit of the ranks (standardized to be
between zero and one) of all of the pooled observations
(the explanatory variable).
(3) Apply linear regression to the above response and
explanatory variables.
(4) Calculate probit values for ranks from one to the number
of censored observations.
(5) Use the regression model (step 3) to calculate values
of the response variable corresponding to probit values
generated in step 4.
(6) Estimate the mean and variance using the uncensored
data combined with the regression generated values of
the censored data.
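Steps (1) through (6) can be sketched as follows. This is an illustrative sketch only, not the implementation used by Gilliom and Helsel (1986). It assumes a single detection limit (so that all censored values rank below all uncensored values), standardizes ranks as rank/(n + 1), and back-transforms the regression-generated values from the log scale before computing the mean and variance. The function name `lr_censored_estimates` is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def lr_censored_estimates(uncensored, n_censored):
    """Log-probit regression fill-in for pooled data with censored values."""
    probit = NormalDist().inv_cdf
    obs = sorted(uncensored)
    n = len(obs) + n_censored
    # Step 1: logarithms of the uncensored observations (response).
    y = [math.log(v) for v in obs]
    # Step 2: probits of standardized ranks. Censored values occupy
    # ranks 1..n_censored, so uncensored ranks start at n_censored + 1.
    x = [probit((n_censored + i + 1) / (n + 1)) for i in range(len(obs))]
    # Step 3: ordinary least squares fit of y on x.
    xb, yb = mean(x), mean(y)
    slope = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
             / sum((a - xb) ** 2 for a in x))
    intercept = yb - slope * xb
    # Steps 4 and 5: predicted values at the probits of the censored ranks,
    # back-transformed from the log scale (an assumption of this sketch).
    filled = [math.exp(intercept + slope * probit(r / (n + 1)))
              for r in range(1, n_censored + 1)]
    # Step 6: mean and variance of observed plus regression-generated values.
    combined = filled + obs
    return mean(combined), variance(combined), filled
```

The generated values are used only to estimate parameters of the pooled data; they are not meant to be interpreted as estimates of individual censored observations.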
The categorization approach and the use of measures of
central tendency unaffected by censored values are alternatives
to the other three approaches, which attempt in some manner to
reconstruct values for the censored data. Rather than "filling
in" censored values, the idea behind categorization and unaf-
fected measures of central tendency is to deal directly with
the information available (i.e., the censored value falls
somewhere between zero and the detection concentration).
Use of a measure of central tendency requires that observa-
tions be pooled. An alternative to the use of the mean as a
measure of central tendency is the sample median. In fact, if
less than 50% of the pooled observations are censored, then the
sample median is unaffected by the censored values.
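A short sketch may help show why the median is unaffected. Here censored observations are marked with `None`, and the function name `censored_median` and its arguments are hypothetical. The sketch assumes a single detection limit with all observed values at or above it.

```python
from statistics import median

def censored_median(values, detection_limit):
    """Median of pooled observations; None marks a censored observation."""
    observed = [v for v in values if v is not None]
    n_censored = len(values) - len(observed)
    if 2 * n_censored >= len(values):
        raise ValueError("50% or more censored: median is not determined")
    if any(v < detection_limit for v in observed):
        raise ValueError("observed value below the stated detection limit")
    # Any placeholder between zero and the detection limit gives the same
    # result, because the middle of the sorted data lies among observed values.
    filled = observed + [0.0] * n_censored
    return median(filled)
```

For example, `censored_median([None, 0.05, None, 0.08, 0.12], 0.02)` returns the same value regardless of what the two censored concentrations actually were.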
The categorization approach involves defining categories
for the censored variable based on some criterion. Analysis
then proceeds using the categorized variable. Any number of
categories can be defined, although to ensure sufficient numbers
of observations in each category usually two or three categories
are used. For two categories, the sample median is commonly used
to define the "low" and "high" categories, since this results in
equal numbers of observations in each category. Provided less
than 50% of the observations are censored, categorizing the
censored variable into two categories based on the median is
exact. All values less than the median, some portion of which
are censored, are assigned to the "low" category, and all
values greater than the median are assigned to the "high"
category. There is no need to attempt to distinguish among
the censored values. Other criteria, as well as three or
more categories, can be implemented in a similar manner. Care
must be used to ensure that categories are defined such that
all censored values are assigned to the same category.
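The two-category case can be sketched as below. Censored observations are marked with `None`. The function name `categorize_by_median` is hypothetical, and assigning values equal to the median to the "low" category is an assumption of this example, since the text does not address ties with the median.

```python
from statistics import median

def categorize_by_median(values):
    """Label each observation 'low' or 'high' relative to the sample median."""
    n_censored = sum(v is None for v in values)
    if 2 * n_censored >= len(values):
        raise ValueError("50% or more censored: median is not determined")
    # With fewer than 50% censored, the placeholder value does not change
    # the median, so zero is used for convenience.
    med = median(0.0 if v is None else v for v in values)
    # Every censored value lies below the median and is labeled "low".
    return ["low" if v is None or v <= med else "high" for v in values]
```

Because all censored values receive the same label, no attempt is made to distinguish among them.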
In summary, in situations of heavily censored variables,
the feasible options are pooling data and defining "robust"
measures of central tendency (e.g., median), pooling data and
implementing statistical techniques to obtain less biased
estimates of distributional parameters (e.g., means, variances),
and dealing with individual observations by categorizing the
censored variable. We recommend the use of robust measures of
central tendency or categorization. In Chapters V and VI we
discuss how these various alternatives can be used with univariate
methods for trend detection.
Homogeneity of Trend
It is important in a monitoring program as large as the
CBP monitoring program (40 stations, 20 cruises/year, multiple
depths) to perform analyses on temporal and spatial scales
that allow for interpretation and generalization of results.
On one hand, results of analyses of trends in water quality at
individual stations, depths, or cruises are difficult to gen-
eralize to statements concerning trends in Chesapeake Bay. On
the other hand, if all of the data are included in a single
analysis, interpretation of results is extremely difficult.
As discussed in more detail in Chapter VI, we advocate con-
ducting analyses for trend on seasonal-, regional-, and depth
layer-specific bases. Depth layers could be defined as either
the water column or above and below pycnocline layers.
In situations where trend analyses are being performed on
data grouped into season, region, and depth layer, the issue
of homogeneity of trend becomes important. Improper aggrega-
tion of data into groups can cause trends in some members of
the group to mask trends in some, or even most, of the other
members of the group. Homogeneity of trend issues can involve
the consistency of trend in individual stations grouped into a
region, in multiple cruises in a year grouped into a season,
and in multiple depth measurements grouped into a depth layer.
For all of the methods described in Chapter IV, homogeneity of
trend is an important consideration.
IV. UNIVARIATE STATISTICAL METHODS
FOR TEMPORAL TREND DETECTION
A. INTRODUCTION
This chapter presents a brief overview of selected
univariate methods for temporal trend detection. These methods
were selected based on their appropriateness for detecting trends
in water quality data. Additional information on these and other
methods for trend detection is presented in the bibliography
(Appendix A). Some methods not included in this chapter may
also be appropriate for analysis of CBP water quality data (e.g.,
two-sample tests). As new methods become available, or as
existing methods are deemed appropriate, these can be easily
incorporated into the analysis framework.
Univariate statistical methods for trend detection have been
organized into the following general categories:
Box-Jenkins Intervention analysis
Parametric methods (GLM and linear logit models)
Distribution free (nonparametric) methods.
In the next sections, we provide brief overviews of methods in
each of these categories.
B. BOX-JENKINS INTERVENTION ANALYSIS
Description
Box-Jenkins time series analysis is an empirical approach
that entails the estimation of a model from the data (see Box and
Jenkins 1976; McCleary and Hay 1980). The observed time series
of data is assumed to be a single realization from a stochastic
process. The goal of Box-Jenkins modeling is to estimate the
parameters of the underlying stochastic process that generated
the observed time series of observations.
Box-Jenkins methods require stationary time series (i.e.,
no trend or drift). Differencing of a time series is typically
used to obtain a stationary time series, and analyses proceed
using the differenced time series.
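Differencing itself is a simple operation. The helper below is a minimal sketch (the name `difference` is hypothetical) in which each pass replaces the series by the changes between successive observations.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing to a time series."""
    for _ in range(d):
        # Each pass shortens the series by one observation.
        series = [b - a for a, b in zip(series, series[1:])]
    return series
```

A series with a linear trend becomes constant after one round of differencing, and a series with a quadratic trend becomes constant after two.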
The general form of Box-Jenkins models is an ARIMA
(autoregressive integrated moving average) process. ARIMA
models are specified with three parameters (ARIMA(p,d,q)):
p refers to the autoregressive structure in a model (i.e.,
the preceding p values of the response variable are used
to predict the present value of the response variable).
q refers to the moving average component of the model (i.e.,
the preceding q random shocks (or errors) are used to
predict the present value of the response variable).
d indicates the order of differencing that was applied to
the response variable time series.
Thus an ARIMA (2,1,2) model would be of the following form:
$$Y_t - Y_{t-1} = \frac{(1 - \theta_1 B - \theta_2 B^2)\, e_t}{1 - \phi_1 B - \phi_2 B^2}$$
where:
$Y_t$ = response variable
$B$ = backshift operator ($B^n Y_t = Y_{t-n}$)
$\theta$ = moving average parameters
$\phi$ = autoregressive parameters
$e_t$ = errors.
Specific Box-Jenkins models can consist of either an autore-
gressive component (AR), a moving average component (MA), or
both (ARIMA).
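To make the ARIMA(2,1,2) form concrete, the sketch below simulates such a process by writing the model as a recursion on the differenced series W_t = Y_t - Y_{t-1}: W_t = phi_1 W_{t-1} + phi_2 W_{t-2} + e_t - theta_1 e_{t-1} - theta_2 e_{t-2}. The function name, the zero starting level, and the treatment of terms before the start of the series as zero are assumptions of this example.

```python
import random

def simulate_arima_212(n, phi, theta, sigma=1.0, seed=0):
    """Simulate n observations from an ARIMA(2,1,2) process."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, sigma) for _ in range(n)]  # white noise errors
    w = []  # differenced series W_t = Y_t - Y_{t-1}
    for t in range(n):
        value = e[t]
        for k in (1, 2):
            if t - k >= 0:
                # Autoregressive term minus moving average term at lag k.
                value += phi[k - 1] * w[t - k] - theta[k - 1] * e[t - k]
        w.append(value)
    # Undo the differencing (d = 1) by cumulative summation.
    y, level = [], 0.0
    for step in w:
        level += step
        y.append(level)
    return y, w, e
```

Setting both `phi` and `theta` to zeros reduces the model to a random walk, which is the simplest case requiring differencing to achieve stationarity.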
The Box-Jenkins models described thus far do not take into
account any seasonal signals in the data (e.g., with these models,
DO in June can be modeled as a function of DO in the preceding
April and May). Seasonal ARIMA models allow for inclusion of
additional seasonal signals into the model (e.g., DO in June as
a function of DO in the previous June, as well as the preceding
April and May). Seasonal ARIMA models are denoted as ARIMA
(p,d,q)x(P,D,Q), where P, D, and Q refer to the structure of
the seasonal model.
Explanatory Variables
Explanatory variables are incorporated into Box-Jenkins
models by the use of transfer functions. A transfer function
relates the present value of the response time series to a linear
function of present and previous values of the explanatory
variable.
Relevant Data Requirements
Box-Jenkins methods generally require a long time history of
data. Observations are assumed to be from continuous variables.
Single observations of the response variable are required (i.e.,
replicate values are not allowed), and these observations must be
uniformly spaced in time with no missing values. Recently,
several methods have been proposed for Box-Jenkins modeling of
data with missing values (e.g., see Lettenmaier 1980; Sturges
1983).
Assumptions
The following assumptions are required for parameter
estimation and hypothesis testing in Box-Jenkins models:
(1) $\mathrm{mean}(e_t) = 0$
(2) $\mathrm{var}(e_t) = \sigma^2$ for all $t$ (homoscedasticity)
(3) $\mathrm{cov}(e_t, e_{t+k}) = 0$ for all $t$ and $k \neq 0$ (independence)
(4) $e_t$ are normally distributed.
Assumptions 1-4 imply a white noise process for the errors ($e_t$).
Model Building
The Box-Jenkins approach involves three steps to model
building: model identification, parameter estimation, and
diagnostic checking.
Model Identification
The autocorrelation function (ACF) and partial autocorrelation
function (PACF) are examined to assess the
stationarity of the time series and to identify the
general structure of the ARIMA model (i.e., values for
p, d, and q). When a transfer function is included,
the cross correlation function (CCF) between the response
and explanatory time series is examined to
determine the form of the transfer function.
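For reference, the sample ACF examined during identification can be computed as in the sketch below. The function name `sample_acf` is hypothetical, and the estimator shown (lagged cross-products divided by the lag-zero sum of squares) is one common convention rather than one prescribed by the report.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(series)
    m = sum(series) / n
    c0 = sum((v - m) ** 2 for v in series)  # lag-zero sum of squares
    return [
        sum((series[t] - m) * (series[t + k] - m) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]
```

An ACF that decays very slowly with increasing lag is a typical indication that the series is not stationary and that differencing may be needed.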
Parameter Estimation
Initial parameter estimates for the identified model
are first obtained directly from the ACF, PACF, and,
if appropriate, CCF. An iterative maximum likelihood
procedure is then used to obtain final estimates for
the parameters. Hypothesis testing of whether individual
parameter estimates are significantly different
from zero is performed using normal theory. Insignificant
parameters may be eliminated from the identified
model.
Diagnostic Checking
Four diagnostics, based on the residuals (observed Yt
minus predicted Yt), are typically examined to
assess the adequacy of the determined model. These
diagnostics are plots of residuals versus time, the
ACF and PACF of the residuals, and tests of whether
residuals are normally distributed (e.g., Kolmogorov-
Smirnov test). Residuals based on an "adequate"
model would appear as a white noise process in these
diagnostics. If the residuals do not appear as white
noise, Box-Jenkins methods can be used to identify
structure in the residuals, and this information can
be utilized to modify the response time series model.
Parameter estimation and diagnostic checking are then
applied to the modified model. This process continues
until a model is obtained that results in white noise
residuals.
Trend Detection
Box-Jenkins methods can be used for determining whether an
action (an "intervention") has had a significant effect on the
behavior of the response time series. These analyses are termed
intervention analysis or ARIMA Impact assessment (see McCleary
and Hay 1980; McDowall et al. 1980). Intervention analysis
requires that the time of the onset of the intervening action be
known. Furthermore, intervention analysis requires a long time
history of both pre- and post-intervention data. Four general
types of responses to the intervention can be detected (see
Figure IV-1). These responses involve whether the response is a
permanent or temporary shift in the mean level of the time
series and whether the shift is abrupt or gradual. In addition,
for all of these cases, the response may be coincident with
the onset of the intervention or may be delayed from the onset
of the intervention.
Figure IV-1.
Four general types of responses to an intervention
that can be detected with Box-Jenkins intervention
analyses (reproduced from McCleary and Hay 1980).
[Figure not reproduced; panels are labeled by duration of the response: permanent or temporary.]
Interventions are represented in Box-Jenkins models as a
special case of a transfer function (i.e., a step or pulse
transfer function). The form of the intervention transfer
function can be specified a priori. Model identification is
performed using the pre-intervention data, and parameter
estimation is performed using both pre- and post-intervention
data. Essentially, the idea is to determine if incorporation of
an intervention transfer function enables the model developed
from the pre-intervention data to also be applicable to
post-intervention data. Conclusions concerning the effect of
the intervention are based on significance testing of the
parameters associated with the intervention transfer function.
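The step and pulse forms can be illustrated with a simple first-order transfer function, f_t = delta * f_{t-1} + omega * x_t, where x_t is a step or pulse indicator. This particular functional form and the names `omega` and `delta` follow common textbook usage and are assumptions here; the report does not give the form explicitly.

```python
def intervention_response(n, onset, omega, delta=0.0, kind="step"):
    """Expected shift in the series level produced by an intervention.

    kind="step":  indicator is 1 from the onset onward (permanent change).
    kind="pulse": indicator is 1 only at the onset (temporary change).
    delta=0 gives an abrupt response; 0 < delta < 1 gives a gradual
    approach (step) or a gradual decay (pulse).
    """
    response, previous = [], 0.0
    for t in range(n):
        if kind == "step":
            x = 1.0 if t >= onset else 0.0
        else:
            x = 1.0 if t == onset else 0.0
        previous = delta * previous + omega * x
        response.append(previous)
    return response
```

Under these assumptions, a step input with `delta=0` gives an abrupt, permanent shift; a step input with `delta` between 0 and 1 gives a gradual, permanent shift; and a pulse input with `delta` between 0 and 1 gives an abrupt, temporary shift that decays back toward zero.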
C. PARAMETRIC METHODS
General Linear Models (GLM)
Description
GLM are a suite of methods that partition variation in a
continuous response variable to hypothesized sources (i.e.,
explanatory variables). The principle of GLM is that the
response variable is a linear function of parameters
corresponding to terms for individual explanatory variables, or
combinations of explanatory variables (see Neter et al. 1985).
Explanatory Variables
Explanatory variables can be categorical or continuous.
GLM that include only continuous explanatory variables are
called regression models. GLM that include only categorical
explanatory variables (which are termed factors) are ANOVA
models. Models that include a mixture of categorical and
continuous explanatory variables are known as ANCOVA models.
ANOVA/ANCOVA models containing more than one factor can be
characterized as crossed and/or nested designs. To illustrate,
consider a response variable observed on two cruises in each year
for two years. A crossed ANOVA model with cruise and year as
factors would be appropriate if the first cruise in each year
has the same effect on the response variable for year 1 as for
year 2. In this situation, the ANOVA model would contain
terms corresponding to a cruise effect, year effect, and
interaction effect between cruise and year.
$$Y_{ij} = U + \mathrm{CRUISE}_i + \mathrm{YEAR}_j + (\mathrm{CRUISE} \times \mathrm{YEAR})_{ij} + e_{ij}$$
where:
$Y_{ij}$ = value of response variable for the $i$th cruise and $j$th year
$U$ = overall average of the response variable
$\mathrm{CRUISE}_i$ = the effect on the response variable of the $i$th cruise
$\mathrm{YEAR}_j$ = the effect on the response variable of the $j$th year
$(\mathrm{CRUISE} \times \mathrm{YEAR})_{ij}$ = the interaction effect on the response variable due to the $i$th cruise in the $j$th year
$e_{ij}$ = error.
A nested ANOVA model with cruise and year as factors would be
appropriate if the first cruise in year 1 was not related to the
first cruise in year 2, and the second cruise in year 1 was not
related to the second cruise in year 2. That is, the model does
not involve estimation of the "effects" of a given cruise
across years. The nested model would contain terms corresponding
to year effects and cruise nested in year effects:
$$Y_{ij} = U + \mathrm{YEAR}_j + \mathrm{CRUISE(YEAR)}_{i(j)} + e_{ij}$$
where:
$Y_{ij}$ = value of response variable for the $i$th cruise and $j$th year
$U$ = overall average of the response variable
$\mathrm{YEAR}_j$ = the effect on the response variable of the $j$th year
$\mathrm{CRUISE(YEAR)}_{i(j)}$ = the effect on the response variable of the $i$th cruise in the $j$th year
$e_{ij}$ = error.
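The difference between the crossed and nested parameterizations can be seen by estimating the effects from a table of cell means. The sketch below assumes a balanced design (every cruise observed in every year) and the usual sum-to-zero constraints; the function name and the dictionary keys are hypothetical.

```python
def crossed_and_nested_effects(cell_means):
    """Effect estimates from cell_means[i][j] (cruise i, year j), balanced."""
    n_cruise, n_year = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_cruise * n_year)
    cruise = [sum(row) / n_year - grand for row in cell_means]
    year = [sum(cell_means[i][j] for i in range(n_cruise)) / n_cruise - grand
            for j in range(n_year)]
    # Crossed model: what remains after both main effects is the interaction.
    interaction = [[cell_means[i][j] - grand - cruise[i] - year[j]
                    for j in range(n_year)] for i in range(n_cruise)]
    # Nested model: cruise effects are measured within each year only.
    cruise_in_year = [[cell_means[i][j] - grand - year[j]
                       for j in range(n_year)] for i in range(n_cruise)]
    return {"U": grand, "CRUISE": cruise, "YEAR": year,
            "CRUISE x YEAR": interaction, "CRUISE(YEAR)": cruise_in_year}
```

In a balanced design, each nested cruise-in-year effect equals the sum of the corresponding crossed cruise effect and interaction effect, which is why the nested model does not estimate the effect of a given cruise across years.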
Factor effects in an ANOVA/ANCOVA model can be characterized
as fixed or random effects. Fixed effects imply conclusions are
restricted to observed levels of factors. Random effects imply
that inferences will extend to a population of factor levels
(not all of which are observed). We will be dealing with
fixed effects models.
Relevant Data Requirements
GLM requires that the response variable be a continuous
variable. In addition, it is preferable (although not
necessary) to have replicate values of the response variable for
each combination of factors included in a model.
Assumptions
The following assumptions are required for GLM:
(1) $\mathrm{mean}(e_k) = 0$
(2) $\mathrm{var}(e_k) = \sigma^2$ for all $k$ (homoscedasticity)
(3) $\mathrm{cov}(e_k, e_{k+m}) = 0$ for all $k$ and $m \neq 0$ (independence)
(4) $e_k$ are normally distributed.
Note that these assumptions correspond to the assumptions
underlying Box-Jenkins methods. For GLM, assumptions 1-3 are
required for parameter estimation and assumption 4 is required
for hypothesis testing.
Parameter Estimation/Hypothesis Testing
GLM base parameter estimation on ordinary least squares
methods and hypothesis testing on the F-statistic. Consider
the following two-factor crossed ANOVA model:
$$Y_{ij} = U + A_i + B_j + (A \times B)_{ij} + e_{ij}$$
The following hypotheses are typically evaluated:
(1) $H_0$: $A_i = B_j = (A \times B)_{ij} = 0$ for all $i$ and $j$
(2) $H_0$: $(A \times B)_{ij} = 0$ for all $i$ and $j$
(3) $H_0$: $A_i = 0$ for all $i$
(4) $H_0$: $B_j = 0$ for all $j$.
Hypothesis 1 is a test of the significance of the overall
model. Hypotheses 2-4 involve significance tests of interaction
and main effects. Significant interaction effects imply that
the effects of one factor are not homogeneous across levels of
another factor. Evaluation of hypotheses concerning the main
effect of a factor is difficult to interpret if interactions
involving that factor are significant. In such situations,
separate analyses can be performed at each level of one of the
factors involved in the interaction.
In situations of interpretable and significant main effects,
pairwise (for ANOVA and ANCOVA) and multiple comparison procedures
(for ANOVA) can be applied to estimated means of the
response variable for each level of the significant main effect.
Examples of multiple range procedures are Duncan's New Multiple
Range Test and Tukey's Studentized Range Test.
Diagnostics
Determination of model adequacy is based on the analysis of
residuals (observed minus predicted values of the response
variable). Residuals can be examined for their adherence to the
assumptions made on the errors using graphical techniques and,
with sufficient data, hypothesis testing. Examples of statistical
tests include tests for homoscedasticity (Bartlett
test; Hartley test), normality (chi-square goodness-of-fit test), and
independence (Durbin-Watson test). A common remedy for heteroscedasticity
is the use of transformations applied to the
response variable.
Trend Detection
GLM can be used for trend detection by inclusion of a time-related
factor or covariate into the model. Use of a covariate
involves the specification of the functional form of the trend
(e.g., a linear term for the time-related covariate in an
ANCOVA). Use of a time-related factor does not require the
specification of the functional form of the trend. Rather,
pairwise (for ANCOVA) or multiple comparison (for ANOVA) tests
can be used to look for temporal patterns in a significant
time-related factor.
Detection of long-term trends with GLM can be difficult in
data that exhibit short-term periodicities. Two approaches to
dealing with short-term periodicities in the data are to include
a covariate (e.g., Lorda and Saila 1986) or factor (e.g., cruise
nested in year) in the model to account for these periodicities.
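The covariate approach in its simplest form is a regression of the response variable on time, with the trend tested through the slope. The sketch below is a minimal illustration using ordinary least squares; the function name `linear_trend` is hypothetical, and the returned t statistic would be compared with a t distribution on n - 2 degrees of freedom.

```python
import math

def linear_trend(times, values):
    """OLS fit of values on times; returns slope, intercept, t statistic."""
    n = len(times)
    tb = sum(times) / n
    yb = sum(values) / n
    sxx = sum((t - tb) ** 2 for t in times)
    slope = sum((t - tb) * (y - yb) for t, y in zip(times, values)) / sxx
    intercept = yb - slope * tb
    # Residual mean square on n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * t)) ** 2
              for t, y in zip(times, values))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf
    return slope, intercept, t_stat
```

Short-term periodicities could be handled before this step, for instance by subtracting cruise-specific means, which corresponds to the factor approach described above.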
Linear Logit Models
Description
Linear logit models and loglinear models deal with
categorical variables (see Fienberg 1980; SAS, Inc. 1985).
These methods focus on the analysis of the number of observa-
tions (counts) observed at various combinations of levels of
variables. Linear logit models are appropriate when a distinc-
tion is made between response and explanatory variables.
Loglinear models are used to examine relationships among
categorical variables, without distinguishing between response
and explanatory variables. In a general sense, loglinear
models can be viewed as the extension of contingency table
analysis. Most linear logit models can be recast in terms of
a loglinear model.
The form of linear logit models is analogous to that of
GLM; the major difference is the definition of the response
variable. Suppose a variable $Y_i$ has three categories ($i$ =
low, medium, high). The response variable ($Z$) for the linear
logit model would be the generalized logit function with two
possible outcomes:
$$Z = \left[\, \log\frac{P_{LOW}}{P_{HIGH}},\; \log\frac{P_{MEDIUM}}{P_{HIGH}} \,\right]$$
where:
$P_{LOW}$ = proportion of low observations
$P_{MEDIUM}$ = proportion of medium observations
$P_{HIGH}$ = proportion of high observations.
The calculation of $P_{LOW}$, $P_{MEDIUM}$, and $P_{HIGH}$ depends on the terms
in the model. For example, suppose we have observations of $Y_i$
for two cruises in each of two years. A nested linear logit
model with year, and cruise nested in year, as factors would have
four cruise-year combinations. For each of these cruise-year
combinations, the proportions of low, medium, and high responses
would be determined. These proportions would then be used in the
generalized logit function.
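For a single cruise-year combination, the generalized logits can be computed from category counts as sketched below. The function name is hypothetical, and the constant of 0.5 added when a zero count is present is an assumption of this example; the report says only that a small constant is added to each count.

```python
import math

def generalized_logits(counts, reference, constant=0.5):
    """Log of each category proportion relative to a reference category."""
    adjusted = dict(counts)
    if any(c == 0 for c in adjusted.values()):
        # Zero counts make the logit undefined, so every count is offset.
        adjusted = {k: c + constant for k, c in adjusted.items()}
    total = sum(adjusted.values())
    props = {k: c / total for k, c in adjusted.items()}
    return {k: math.log(props[k] / props[reference])
            for k in props if k != reference}
```

With counts of 10 low, 20 medium, and 40 high observations and "high" as the reference category, the two logits are log(0.25) and log(0.5).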
Explanatory Variables
As with GLM, categorical explanatory variables can be
included as factors (either crossed and/or nested) and continuous
explanatory variables can be included as covariates (termed
logistic regression).
Relevant Data Requirements
Linear logit models require non-zero counts for each level
of the variable ($Y_i$) for each combination of factors defined
by the model. When zero counts are present, the standard
procedure is to add a small constant value to each count.
Assumptions
Linear logit models are based on the assumption of counts
following a product multinomial distribution.
Parameter Estimation/Hypothesis Testing
Parameter estimation in linear logit models can be performed
based on either maximum likelihood or weighted least squares
techniques. Hypothesis testing is based on the chi-square statistic. As
with GLM, significance tests are performed on hypotheses
involving interaction and main effects, and interpretation of
a significant main effect is difficult when interactions involving
that factor are present. In situations of an interpretable
and significant main effect, pairwise comparisons can be
performed on the estimates of the main effect parameters.
Trend Detection
Trend detection with linear logit models can be performed in
the same manner as with GLM. A time-related explanatory variable
is incorporated into analyses either as a factor or covariate.
Pairwise comparisons applied to the parameter estimates of the
significant time-related factor can then be used to discern
temporal trends.
D. DISTRIBUTION FREE METHODS
Distribution free methods involve the analysis of scored
values of the response and explanatory variables. Commonly used
score functions include the sign function and rank transforma-
tions. We highlight four general distribution free methods
appropriate for trend detection:
Cox-Stuart test for trend
Kendall's Tau
Spearman rank correlation
Friedman's two-way rank ANOVA.
Conover (1971) and Hollander and Wolfe (1973) provide excellent
discussions of distribution free statistical methods. Note that
the use of rank transformed data with parametric methods has
recently been proposed as a bridge between distribution free and
parametric analyses (see Conover and Iman 1981).
Cox-Stuart Test
Description
The Cox-Stuart test is a test for trend and is based on the sign
test. Observations are grouped into pairs that are equidistant
in time. A sign test is performed on the differences between the
observations in each pair. The test statistic is the number of
positive differences. To illustrate, suppose there are 10
observations of a response variable ordered in time ($Y_t$,
$t = 1, \ldots, 10$). The Cox-Stuart test would involve a sign test applied to
the five differences ($Y_t - Y_{t+5}$, $t = 1, \ldots, 5$).
Explanatory Variables
Explanatory variables cannot be directly incorporated
into the Cox-Stuart test for trend. Two alternatives for
incorporating explanatory variables are the use of a linear
regression model (GLM) and an alignment procedure. With linear
regression, the response variable is regressed on the explanatory
variable (or some function of the explanatory variable, e.g.,
ranks). The Cox-Stuart test is then applied to the time ordered
residuals from the regression model.
The data alignment procedure is analogous to the regression
approach in that residuals are analyzed for temporal trend. The
difference between the two methods is that with the data align-
ment procedure, the explanatory variable is first categorized.
The response variable is then adjusted by the average value of
the response variable for each category of the explanatory
variable. For example, with an explanatory variable with two
categories (low and high):
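$$\mathrm{RESIDUAL}_t = \begin{cases} Y_t - \bar{Y}_{LOW} & \text{if } X_t \text{ is low} \\ Y_t - \bar{Y}_{HIGH} & \text{if } X_t \text{ is high} \end{cases}$$
where:
$Y_t$ = response variable
$X_t$ = explanatory variable
$\bar{Y}_{LOW}$ = average value of response variable for all low $X_t$
$\bar{Y}_{HIGH}$ = average value of response variable for all high $X_t$.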
With sufficient data, any number of categories, as well as any
number of explanatory variables, can be incorporated.
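The two-category alignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `align_by_median` is hypothetical, the low/high split is made at the median of the explanatory variable (values at or below the median are treated as "low"), and the numbers are made up rather than taken from any dataset in this report.

```python
from statistics import mean, median

def align_by_median(y, x):
    """Subtract from each y the mean of y within its low/high category of x.

    Assumption: "low" means x at or below the median of x; the rest is "high".
    Both categories must contain at least one observation.
    """
    cut = median(x)
    is_low = [xi <= cut for xi in x]
    y_low = mean(yi for yi, low in zip(y, is_low) if low)
    y_high = mean(yi for yi, low in zip(y, is_low) if not low)
    return [yi - (y_low if low else y_high) for yi, low in zip(y, is_low)]

# Made-up illustrative values, ordered in time
y = [6.0, 5.0, 4.0, 3.0]   # response variable
x = [1.0, 3.0, 2.0, 4.0]   # explanatory variable
residuals = align_by_median(y, x)
print(residuals)
```

The resulting residuals, kept in time order, would then be passed to whichever trend test is being used.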
Relevant Data Requirements
The Cox-Stuart test requires a single value of the response
variable over time (i.e., replicate values are not allowed). The
test only requires data to be sufficient to allow calculation of
the signs of paired differences. Thus, at a minimum, data must
be ordinal. Note that the Cox-Stuart test ignores tied values.
Assumptions
As is characteristic of distribution free methods, the
Cox-Stuart test does not require distributional assumptions.
Hypothesis Testing
The hypotheses tested with the Cox-Stuart statistic are as
follows:
$H_0$: $\Pr(Y_t < Y_{t+c}) = \Pr(Y_t > Y_{t+c})$ (no trend exists)
$H_a$: $\Pr(Y_t < Y_{t+c}) \neq \Pr(Y_t > Y_{t+c})$ (trend exists)
where:
$c$ = number of paired observations.
The above test is a two-tailed test; similar one-tailed tests can
be performed for detection of upward trend and downward trend.
Critical values for the Cox-Stuart statistic can be obtained from
a binomial distribution table assuming equal probability of
positive and negative signs. With large sample sizes, a
standardized Cox-Stuart statistic can be compared to normal
critical values.
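As a rough illustration of the procedure, the following Python sketch computes the Cox-Stuart statistic and an exact two-sided binomial p-value. The function name `cox_stuart` is hypothetical, dropping the middle observation when the series length is odd is an assumed convention, and the data are made up.

```python
from math import comb

def cox_stuart(y):
    """Cox-Stuart trend test on a time-ordered series y.

    Returns (number of positive differences, number of non-tied pairs,
    two-sided exact binomial p-value with success probability 0.5).
    """
    n = len(y)
    c = n // 2
    first = y[:c]          # Y_1 ... Y_c
    second = y[n - c:]     # paired later observations (middle value dropped if n is odd)
    diffs = [a - b for a, b in zip(first, second)]
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    m = pos + neg          # tied pairs are ignored
    k = min(pos, neg)
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    p_value = min(1.0, 2.0 * tail)
    return pos, m, p_value

# Made-up, steadily decreasing series of 10 observations
pos, m, p_value = cox_stuart([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
print(pos, m, p_value)
```

With only five pairs, even a perfectly monotonic series cannot reach the 0.05 level in a two-tailed test, which shows why the method needs a fairly long time series.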
Spearman Rank Correlation
Description
Spearman rank correlation is the distribution free analog
to Pearson product-moment correlation. Calculation of Spearman rank
correlation is performed by computing the Pearson product-moment
correlation between rank transformed values of the two variables.
Spearman rank correlation measures the monotonic association
between two variables. Use of Spearman rank correlation for
trend detection simply involves the correlation of a response
variable with time.
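The calculation described above (Pearson correlation of rank transformed values) can be sketched as follows. The helper names are hypothetical, ties are given average ranks (an assumption, since the text does not specify tie handling), and the DO values are made up for illustration.

```python
from math import sqrt

def midranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson product-moment correlation (undefined if either input is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def spearman(x, y):
    return pearson(midranks(x), midranks(y))

# Made-up yearly values of a response variable
years = [1980, 1981, 1982, 1983, 1984]
do = [5.1, 4.8, 4.9, 4.0, 3.5]
rho = spearman(years, do)
print(rho)
```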
Explanatory Variables
Direct incorporation of additional explanatory variables
into correlation analysis involves the use of partial Spearman
correlation coefficients.
In addition, incorporation of explanatory variables can be
achieved using regression analysis or alignment procedures.
Analyses would proceed in the same manner as described for the
Cox-Stuart test (i.e., correlation of residuals with time).
Relevant Data Requirements
Spearman rank correlation requires that the response
variable be at least ordinal.
Assumptions
Spearman rank correlation does not require any
distributional assumptions.
Hypothesis Testing
With Spearman rank correlation, the null hypothesis is that
the two variables are independent. For trend detection, this is
a test of whether the response and time-related variables are
independent. Tabled critical values of Spearman rank correlation
coefficients are available. Approximate significance tests of
partial Spearman rank correlation coefficients can be performed
using critical values of Pearson correlation coefficients
(Conover and Iman 1981).
Kendall's Tau
Description
Kendall's Tau is a distribution free measure of concordance
between two variables. Use of Kendall's Tau in trend detection
is based on time ordered observations of a response variable
($Y_t$, $t = 1, \ldots, n$). In this situation, Kendall's Tau involves the
sum of the signs of all unique pairwise comparisons. First
the value of Kendall's statistic ($K$) is calculated:

$$
K = \sum_{j=1}^{n-1} \sum_{i=j+1}^{n} \operatorname{sgn}(Y_i - Y_j)
$$

where:

$$
\operatorname{sgn}(a) =
\begin{cases}
1 & \text{if } a > 0 \\
0 & \text{if } a = 0 \\
-1 & \text{if } a < 0
\end{cases}
$$
Kendall's Tau is then obtained by transforming K to a value
between -1 and 1.
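A minimal Python sketch of this calculation is given below. The function names are hypothetical, and the scaling of K by the number of pairs, n(n-1)/2, is an assumption (the text only says that K is transformed to a value between -1 and 1). The data are made up.

```python
def sgn(a):
    """Sign function: 1, 0, or -1."""
    return (a > 0) - (a < 0)

def kendall_k(y):
    """Sum of signs of all unique pairwise differences (later minus earlier)."""
    n = len(y)
    return sum(sgn(y[i] - y[j]) for j in range(n - 1) for i in range(j + 1, n))

def kendall_tau(y):
    """K divided by the number of pairs (assumed scaling; no tie adjustment)."""
    n = len(y)
    return kendall_k(y) / (n * (n - 1) / 2)

# Made-up time ordered values of a response variable
y = [5.1, 4.8, 4.9, 4.0, 3.5]
k = kendall_k(y)
tau = kendall_tau(y)
print(k, tau)
```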
Suppose we are examining year to year trends in a response
variable. In situations when observations are available for
multiple time periods within each year (e.g., monthly values),
two possible options are:
- Data can be averaged over months within each year and
  Kendall's Tau can be applied to these yearly averages.
- Month-specific Kendall's Taus can be computed. These
  values can then be combined into a single statement
  concerning trend, assuming independence (denoted as
  S' in Hirsch et al. 1982) or accounting for possible
  dependence (Hirsch and Slack 1984). Note that Van
  Belle and Hughes (1984) provide a $\chi^2$-based test for
  determining homogeneity of trend for the different
  months.
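The second option can be illustrated with a short Python sketch that sums month-specific Kendall statistics and their variances, assuming the months are independent. The function names are hypothetical; the variance n(n-1)(2n+5)/18 is the usual no-ties variance of Kendall's statistic, and the continuity correction and normal approximation are assumptions about how the combined statistic would be standardized. The data are made up.

```python
from math import sqrt, erfc

def kendall_s(y):
    """Kendall's statistic: sum of signs of all later-minus-earlier differences."""
    n = len(y)
    return sum((y[i] > y[j]) - (y[i] < y[j])
               for j in range(n - 1) for i in range(j + 1, n))

def seasonal_kendall(series_by_month):
    """Combine month-specific statistics, assuming independence among months."""
    s_total = 0
    var_total = 0.0
    for y in series_by_month:
        n = len(y)
        s_total += kendall_s(y)
        var_total += n * (n - 1) * (2 * n + 5) / 18.0   # no-ties variance
    # Continuity-corrected standard normal deviate
    if s_total > 0:
        z = (s_total - 1) / sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / sqrt(var_total)
    else:
        z = 0.0
    p_value = erfc(abs(z) / sqrt(2.0))                  # two-sided
    return s_total, var_total, z, p_value

# Made-up yearly values for two months (six years each)
july = [6, 5, 4, 3, 2, 1]
august = [5, 6, 3, 4, 1, 2]
s_total, var_total, z, p_value = seasonal_kendall([july, august])
print(s_total, var_total, z, p_value)
```

Because ranks are compared only within a month, differences in level between months do not affect the combined statistic.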
Explanatory Variables
Incorporation of explanatory variables with Kendall's Tau
can be achieved using residuals from regression analysis or an
alignment procedure.
Relevant Data Requirements
Kendall's Tau requires the response variable to be at least
ordinal. Note that Kendall's Tau is applied to single values of
the response variable over time.
Assumptions
Kendall's Tau does not require any distributional assump-
tions.
Hypothesis Testing
With Kendall's Tau, the null hypothesis is that the two
variables are independent. For trend detection, this is a
test of whether the response variable is independent of time
order. Tabled critical values of Kendall's Tau are available.
For large sample sizes, Kendall's Tau can be standardized and
compared to normal critical values.
Friedman's Two-Way Rank ANOVA
Description
Friedman's two-way rank ANOVA is a distribution free analog
to a two-way ANOVA without interactions. Analyses are performed
on rank transformed data. The general model is:
$$
Y_{ij} = \mu + a_i + b_j + e_{ij}
$$

where:
$a_i$ = the effect on the response variable of the
$i$th level of factor $a$
$b_j$ = the effect on the response variable of the
$j$th level of factor $b$
$e_{ij}$ = error.
With Friedman's two-way rank ANOVA, one of the factors ($a_i$) is
the effect of interest (treatment effect) and the other factor
($b_j$) is considered a block effect. Observations are ranked
within each level of the block factor, and ranks are summed for
each level of the treatment factor. Use of Friedman's two-way
rank ANOVA for trend detection involves defining the effect of
interest as a time-related factor.
Explanatory Variables
Friedman's two-way rank ANOVA allows for a design factor
(e.g., multiple stations) to be incorporated as the block factor.
Relevant Data Requirements
Friedman's two-way rank ANOVA requires a single observation
of the response variable for each combination of levels of the
treatment and block factors. The response variable must be
continuous.
Assumptions
The errors are independent random variables from the same
continuous distribution. Note that a specific form of the
distribution of the errors is not assumed.
Hypothesis Testing
The null hypothesis for Friedman's two-way rank ANOVA is that
treatment effects are zero ($H_0$: $a_i = 0$ for all $i$; $H_a$: $a_i \neq 0$ for
some $i$). Significance testing for this hypothesis is based on a
$\chi^2$ statistic. Trend detection with Friedman's two-way rank ANOVA
involves a time-related treatment factor. Rejection of the null
hypothesis (i.e., treatment effects are not all equal to zero)
implies that there are significant differences in the response
variable over time. Multiple comparison procedures can be
applied to discern patterns in the significant time-related
treatment factor. Alternatively, a test designed specifically
for trend detection is Page's test. Page's test is based on an
ordered alternative hypothesis ($H_a$: $a_1 \le a_2 \le \ldots \le a_n$, with at
least one inequality being strict).
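The ranking scheme behind both tests can be sketched in Python as follows. The function names are hypothetical, the sketch assumes no ties within a block, and it uses the standard large-sample forms of the Friedman $\chi^2$ statistic and of the standardized Page statistic, which are not spelled out in the text above. The data are made up, with blocks as rows (e.g., stations) and time-ordered treatments as columns (e.g., years).

```python
from math import sqrt

def rank_within_block(row):
    """Ranks 1..k of the observations in one block (assumes no ties)."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks

def friedman_and_page(table):
    """table[block][treatment], one observation per cell, treatments in time order.

    Returns (Friedman chi-square, Page's L, standardized Page statistic Z).
    """
    b, k = len(table), len(table[0])
    ranks = [rank_within_block(row) for row in table]
    col_sums = [sum(r[t] for r in ranks) for t in range(k)]
    chi2 = 12.0 / (b * k * (k + 1)) * sum(R * R for R in col_sums) - 3.0 * b * (k + 1)
    page_l = sum((t + 1) * R for t, R in enumerate(col_sums))
    mean_l = b * k * (k + 1) ** 2 / 4.0
    var_l = b * k ** 2 * (k + 1) * (k ** 2 - 1) / 144.0
    z = (page_l - mean_l) / sqrt(var_l)
    return chi2, page_l, z

# Made-up table: 4 stations (rows) by 3 years (columns), decreasing in every station
table = [[6, 5, 4], [7, 5, 3], [5, 4, 2], [8, 6, 5]]
chi2, page_l, z = friedman_and_page(table)
print(chi2, page_l, z)
```

The Friedman statistic would be referred to a $\chi^2$ distribution with k - 1 degrees of freedom. A negative Z from Page's statistic corresponds to a downward ordering over time.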
V. DEMONSTRATION OF SELECTED METHODS
A. INTRODUCTION
In this chapter, we use historical time series data at
four stations off the Calvert Cliffs area of Chesapeake Bay
and data collected as part of the present CBP water quality
monitoring program to demonstrate some of the methods described
above. All analyses were performed using procedures in the
Statistical Analysis System (SAS), the database for CBP data.
The primary objective of these analyses is to demonstrate
how the statistical methods that were described in general
terms above can be applied to CBP water quality data. In this
context, the Calvert Cliffs data can be viewed as a "microcosmic"
version of CBP monitoring data (i.e., both include data
collected at multiple depths and multiple stations during multiple
cruises in a year).
Many of the distribution-free methods (as well as Box-
Jenkins methods) require a fairly long time series of data.
At the present time, only the first 18 months of data from the
present monitoring program are available for analysis, and these
are insufficient for application of the distribution free
methods. Therefore, the Calvert Cliffs data were assembled from
a variety of sources so that the distribution free approaches
could be demonstrated with actual Chesapeake Bay water quality
data.
Parametric methods (GLM and linear logit models) and most
of the distribution free methods are applied to at least one of
the demonstration datasets. The parametric method of GLM is
applied to both Calvert Cliffs and present monitoring data to
illustrate the versatility of the parametric methods. The
application of GLM to the present monitoring data also provides a
good example of the use of results from multivariate techniques
in univariate analyses for trend.
Box-Jenkins methods were not included in the demonstration
because a relevant "time of intervention" could not be defined at
Calvert Cliffs. Furthermore, because of the strict data
requirements associated with Box-Jenkins methods, we do not foresee
their general application to data from the present CBP monitoring
program.
Because the data used in analyses are limited, we do
not rigorously assess the adherence to the assumptions
underlying these methods. Rather, we emphasize application of
the methods to the data. Future analyses of CBP monitoring
data will involve more data, thus allowing for more rigorous
assessment of adherence to assumptions.
B. CALVERT CLIFFS AND CBP MONITORING DATA
The data collection methods used for the present CBP
monitoring program were summarized in Chapter III.
The Calvert Cliffs historical data consist of approximately
monthly values of surface and bottom DO, salinity, and
temperature for four stations off Calvert Cliffs for 1969 to
1986. These data were assembled from a variety of sources.
Figure V-1 shows the locations of the four stations (labelled
A, B, C, and D) in Chesapeake Bay. All four stations have a bottom
depth of approximately 30 ft (9 m). Figures V-2 to V-4 show
the time series data for variables used in analyses (bottom DO,
bottom salinity, and surface salinity) for each station.
C. ANALYSIS OF CALVERT CLIFFS DATA
General Analysis Strategy
The objectives of the analysis were, first, to detect year-
to-year trends in summertime bottom DO at Calvert Cliffs and,
second, to attempt to attribute any detected trends to year-to-
year differences in the intensity of stratification. We would
expect that in years of intense stratification (due to flow
patterns in the tributaries, lack of wind-induced mixing, or
other reasons), bottom DO would tend to be lower than in
years with the same conditions but less intense stratification.
The explanatory variable used as the measure of the
intensity of stratification was the difference between bottom
and surface salinity values. Figures V-5 and V-6 show bottom
DO values and salinity differences (calculated as bottom minus
surface) for August of each year for each station. However,
care must be used in attempting to attribute trends in summer DO
to salinity differences. While proposed management actions
will not directly affect the intensity of stratification,
salinity differences in the summer at Calvert Cliffs may be
correlated with flows in the Susquehanna River, and consequently
salinity differences may be correlated with nutrient loadings.
Any trends in DO attributed to salinity differences may not be
due to stratification differences, but rather may be due to
differences in nutrient loadings. Therefore, all methods (GLM
and distribution free) were applied with and without salinity
difference as an explanatory variable.
Figure V-1. Location of the four Calvert Cliffs stations in
Chesapeake Bay (A, B, C, D) for which historical
data were assembled
Figure V-2. Time series of bottom DO (mg/L) at each of the four Calvert Cliffs stations
Figure V-3. Time series of bottom salinity (ppt) at each of the four Calvert Cliffs stations
Figure V-4. Time series of surface salinity (ppt) at each of the four Calvert Cliffs stations
Figure V-5. August bottom DO by year for each station
Figure V-6. August bottom minus surface salinity difference (ppt) by year for each station
Distribution Free Methods
A suite of analyses using distribution free methods was
applied to the Calvert Cliffs data to attempt to detect trends
in summer bottom DO and to attribute these trends to salinity
differences. These analyses are organized into two groups
based on whether or not salinity differences were included as
an explanatory variable.
Analysis of DO without Salinity Differences
Kendall's Tau and Spearman rank correlation coefficients
were computed between bottom DO and year for each station for
July, for August, and for the minimum DO value within the year.
The results indicate that July and August values of bottom DO
display similar trends, and these trends are similar for the four
stations (Table V-1). We therefore decided to pursue grouping
data over stations for August and for July and August (which
we term "summer"). August DO values were examined, since August
values showed the strongest downward trends in bottom DO.
Additional analyses were not performed on minimum DO.
Once data are grouped over stations and months (July and
August), two ways of collapsing the data are possible. Data
can be averaged over stations and months and then analyzed, or
all data can be analyzed but with observations treated as
station-specific or month-specific values (i.e., not averaged
over stations or months).
We computed Kendall's Tau and Spearman rank correlation
coefficients between bottom DO and year, with DO averaged over
stations for August only and averaged over stations for the
summer (July and August). The results show a significant
downward trend in bottom DO averaged over all stations for
August and for the summer (Table V-2).
An alternative to completely collapsing the data by
averaging over stations and months is provided by Hirsch's
modification to Kendall's Tau and by the use of Friedman's
two-way rank ANOVA. In this demonstration analysis, Hirsch's
modification was used to compute Kendall's Tau for July and
for August, and then to combine these statistics into a single
statement about trend for these months. In practice, before
applying Hirsch's modification to Kendall's Tau, homogeneity
of trend needs to be established. In this example, the
homogeneity of trend involving the four stations and the two months
(July and August) was confirmed both graphically and with
Kendall's Tau and Spearman rank correlations applied to trends
with time on a month-specific and station-specific basis
(not all results shown). For illustrative purposes, we applied
Van Belle's test to confirm the homogeneity of trends in July
and August bottom DO (averaged over stations). As expected,
Van Belle's test showed that the trends in July and August
bottom DO were consistent ($\chi^2 = 0.44$, ns (p > 0.05)).
Therefore, we computed Kendall's Taus for these two months
and used Hirsch's modification (Hirsch et al. 1982) to combine
the statistics. The result is that bottom DO exhibits a
significant downward trend for July and August (S' = -0.44,
p < 0.01).

Table V-1. Values and probability levels (p-values) of
Kendall's Tau and Spearman rank correlation
coefficients computed between bottom DO and year
for each station for July, for August, and for the
minimum DO value within the year

Table V-2. Values and probability levels (p-values) of
Kendall's Tau and Spearman rank correlation
coefficients computed between bottom DO (averaged
over stations for August only) and year,
and between bottom DO (averaged over stations
and averaged over July and August (Summer)) and
year

                      August     Summer
Kendall's Tau         -0.52      -0.59
  p-value              0.003      0.0006
Spearman rank         -0.70      -0.77
  p-value              0.001      0.0002

Friedman's two-way rank ANOVA was applied to summer aver-
age (July and August) bottom DO to detect year-to-year differ-
ences, with station as the blocking variable. Note that we
have already established the homogeneity of trend among
stations. The results from Friedman's test show that there
are significant year-to-year differences in average summer
bottom DO ($\chi^2 = 96.5$, p < 0.0001). One option (not shown)
for detection of trends with Friedman's test is to apply
multiple comparison tests (e.g., Tukey's Studentized Range Test)
to determine any trends in ranked values of bottom DO with
year. Another option is to use the same two-way station-by-year
table of average summer bottom DO as with Friedman's test, and
apply Page's test for monotonic trend. Page's test showed a
significant downward monotonic trend in summer bottom DO over
the four stations (Z = -6.72, p < 0.001).
Analysis of DO with Salinity Differences
Having determined that summer bottom DO at Calvert Cliffs
exhibits significant downward trends, we next attempted to
relate those trends to the salinity difference explanatory
variable. As discussed in Chapter IV, explanatory variables cannot
easily be directly incorporated into distribution free methods.
We illustrate two different, though related, techniques for
incorporating explanatory variables into these methods. A
third alternative (not demonstrated) is to perform approximate
significance tests on partial Spearman rank correlation coef-
ficients between DO and salinity differences, using significance
levels for Pearson correlation values.
The first method is to use linear regression of DO on
salinity differences, and then to apply the distribution free
methods to detect temporal trends in the residuals (i.e.,
observed DO minus predicted DO from the regression model).
The second method for incorporating salinity differences as an
explanatory variable is the data alignment procedure. Average
salinity differences are categorized into low and high categories
based on the median value of salinity differences, and
distribution free methods are applied to time ordered residual DO
values calculated for high and low salinity difference years.
The advantage of the alignment procedure, as discussed in
Chapter IV, is that it allows for censored explanatory variables
to be easily incorporated into the distribution free methods.
For illustrative purposes, we apply the alignment procedure to
salinity differences, even though salinity difference is an
uncensored variable.
Kendall's Tau and Spearman rank correlation coefficients
were computed between bottom DO (averaged over all stations) and
year for August only and for July/August averages, after removing
the variation accounted for by bottom minus surface salinity
differences (i.e., intensity of stratification) using linear
regression and the alignment procedure. For analyses applied
to August DO, salinity differences were computed for each
station and then the average "August" salinity difference was
calculated for each year. For analyses applied to summer
(July/August) DO, bottom minus surface salinity differences were
computed for each station for each month, and the overall average
difference was calculated for each year.
Within the linear regression approach, two alternative
regression models were investigated: DO as a linear function
of salinity differences ($DO = B_0 + B_1 \cdot SALDIFF$) and DO as an
exponential function of salinity differences
($\ln(DO + 1) = B_0 + B_1 \cdot SALDIFF$, or equivalently,
$DO = B_0' \exp(B_1 \cdot SALDIFF) - 1$, where $B_0' = \exp(B_0)$).
The logarithm of (DO + 1) is used to
avoid the problem of the logarithm of zero being undefined.
The exponential model is based on inspections of plots of
bottom August DO and summer DO values versus salinity
differences, and the fact that, beyond some large value of salinity
difference, DO is bounded below by zero.
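For illustration, both models can be fitted by ordinary least squares and turned into residual series for the trend tests, as in the Python sketch below. The function names are hypothetical, and computing the exponential-model residuals on the original DO scale (observed minus back-transformed prediction) is an assumption, since the text does not state the scale used. The values are made up.

```python
from math import log, exp

def ols(x, y):
    """Simple least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

def linear_residuals(do, saldiff):
    """Observed DO minus DO predicted by DO = B0 + B1 * SALDIFF."""
    b0, b1 = ols(saldiff, do)
    return [d - (b0 + b1 * s) for d, s in zip(do, saldiff)]

def exponential_residuals(do, saldiff):
    """Observed DO minus DO predicted by ln(DO + 1) = B0 + B1 * SALDIFF.

    Assumption: predictions are back-transformed to the DO scale first.
    """
    b0, b1 = ols(saldiff, [log(d + 1.0) for d in do])
    return [d - (exp(b0 + b1 * s) - 1.0) for d, s in zip(do, saldiff)]

# Made-up yearly values, in time order
saldiff = [1.0, 3.5, 2.0, 4.0, 2.5]   # bottom minus surface salinity (ppt)
do = [4.2, 1.1, 3.0, 0.6, 2.1]        # bottom DO (mg/L)
res_lin = linear_residuals(do, saldiff)
res_exp = exponential_residuals(do, saldiff)
print(res_lin)
print(res_exp)
```

Either residual series, kept in year order, would then be passed to Kendall's Tau or Spearman rank correlation with year.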
The linear and exponential regression models were
significant for both August DO and summer (July/August) DO
(Table V-3). Figure V-7 shows observed and predicted values
of August bottom DO for the linear and exponential regression
models. As expected, the negative slope coefficient ($B_1$)
indicates that the greater the difference between bottom and
surface salinities, the lower the value of bottom DO.
We examined residual plots to determine if a more compli-
cated model of salinity differences was appropriate or if
there were obvious heteroscedasticity problems. Figure V-8
shows examples of residual plots (in this case, residual DO
versus salinity differences) for August DO for the linear and
exponential models. For both models, these plots do not
indicate any obvious structure in DO residuals that could be
accounted for by a more complex model of salinity differences.
Similar results were obtained for residual plots involving
summer bottom DO and salinity differences.
Table V-3. Results of the linear and exponential regression models relating August and summer (July/August) bottom DO to bottom minus surface salinity differences
Figure V-7. Observed and predicted values of August bottom DO (mg/L) for the linear and exponential regression models
Figure V-8. Residual DO versus salinity differences for August DO for the linear and exponential regression models
Conceptually, detection of temporal trends in these
residuals is based on the analysis of the residuals reordered
in time (years), rather than as a function of salinity
differences. (For August DO, compare Figs. V-8 and V-9.) Table V-4
shows the results of Spearman rank correlation and Kendall's
Tau applied to residual bottom DO values with year for August
only and for July/August averages, with residuals computed
from the linear and exponential regression models. After
removing variation accounted for by salinity differences,
there is still a significant downward trend in residual August
DO values for both the linear and exponential regression models,
and for summer DO residuals with the linear regression model.
However, there is not a significant trend in the summer DO
residuals using the exponential regression model.
With the given information, it is difficult to determine
which of the regression models is more appropriate. Deciding
between the linear and exponential regression models is a model
building problem when using a parametric method (GLM). The
original intent of the analysis was to use distribution free
methods to detect trends in DO. The analyses have become a
functional analysis of the specific relationship between DO
and salinity differences. The choice of the functional form
for the relationship between DO and salinity differences
affects conclusions concerning trends in DO.
If salinity were a censored variable, application of the
alignment procedure would involve assigning salinity differences
to high and low categories using the median value of salinity
differences (1.74 for August only and 2.36 for summer). We
elected to use two categories to ensure that a sufficient number
of observations occurred in each category to allow calculation
of the average DO for "high" and "low" salinity difference
years. With additional data, any number of categories could
be defined, as well as any number of explanatory variables.
Average bottom DO was calculated for each salinity difference
category for August only and for summer (July/August). The
average value for the category corresponding to the salinity
difference for that year was then subtracted from each
observation of DO. Spearman rank correlation and Kendall's Tau were
calculated between these differences in DO and year (Table V-5).
After accounting for differences in salinity from year to year
using the alignment procedure, we did not find a significant
downward trend in bottom August DO, but we did for summer
(July/August) DO.
When analyses for trend detection include a functional
explanatory variable, collinearity can be a problem. This is
illustrated by the analysis described above. Both bottom DO
and salinity differences exhibit significant trends in time
(see Tables V-2 and V-6). Salinity differences were included
Figure V-9. Residual August bottom DO versus year for the linear and exponential regression models
Table V-4. Spearman rank correlation and Kendall's Tau between residual bottom DO and year, for August only and for July/August averages, with residuals computed from the linear and exponential regression models